I have around 5 million records, including JSON columns, in our Cassandra DB, and I am trying to move the data from Cassandra to MemSQL.
I tried exporting the data to CSV and importing it, but I am running into issues due to:
Null or empty column values
Commas inside the JSON values clashing with the CSV comma delimiter
The large size of the CSV file
Given the volume, working around the null values and the embedded JSON commas has not worked as expected.
Is there a better way to do this?
It’s hard to understand your requirement given what you’ve written so far. Can you post a subset of the file you are trying to load, say the first few records, so we can see the format?
If that is not possible, here are a few things to consider:
NULL or empty column values: this should not matter; e.g. ",," should load as an empty string. You can also use an IF expression to convert empty strings to NULL afterwards.
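As a sketch of that IF cleanup (the table name `t` and column name `c` are placeholders, not from your schema):

```sql
-- Read empty strings back as NULL during a later transform:
SELECT IF(c = '', NULL, c) AS c FROM t;

-- Or fix them in place after loading:
UPDATE t SET c = NULL WHERE c = '';
```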
Load all the data into a staging table with a single JSON column, then run INSERT INTO … SELECT … FROM to transform the data from JSON into a standard table format as a second step.
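That two-step approach might look roughly like this; the table names, column names, and JSON keys below are all assumptions for illustration, using MemSQL's JSON_EXTRACT_ functions:

```sql
-- Step 1: stage each record as one JSON value.
CREATE TABLE staging (doc JSON);
-- LOAD DATA INFILE 'export.csv' INTO TABLE staging ...;

-- Step 2: shred the JSON into a standard table.
CREATE TABLE target (
  id BIGINT,
  name VARCHAR(255),
  amount DOUBLE
);

INSERT INTO target
SELECT JSON_EXTRACT_BIGINT(doc, 'id'),
       JSON_EXTRACT_STRING(doc, 'name'),
       JSON_EXTRACT_DOUBLE(doc, 'amount')
FROM staging;
```

Keeping the file as one JSON column per row also sidesteps the comma-delimiter conflict, since you can load with a delimiter that never appears in the JSON.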
Split the file into N parts, then write an application program that reads a file and uses INSERT to add the data to the target table. Run several copies of the app in parallel on different parts to speed things up, so you can get through the 5M rows more quickly.
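A minimal sketch of such a loader in Python, under stated assumptions: the batching helper is generic, the target table and its three columns are hypothetical, and the driver calls are commented out because host, credentials, and schema are unknown (MemSQL speaks the MySQL wire protocol, so a driver such as pymysql would fit):

```python
import csv
import itertools

BATCH = 1000  # rows per multi-row INSERT; tune for your workload


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def load_part(path):
    """Load one CSV part using batched multi-row INSERTs.

    The connection is a placeholder: MemSQL is MySQL-wire-compatible,
    so e.g. `pymysql.connect(...)` would work here.
    """
    # conn = pymysql.connect(host="...", db="...")  # placeholder
    with open(path, newline="") as f:
        for batch in batches(csv.reader(f), BATCH):
            values = ",".join(["(%s,%s,%s)"] * len(batch))  # 3 columns assumed
            params = [v for row in batch for v in row]
            # with conn.cursor() as cur:
            #     cur.execute("INSERT INTO target VALUES " + values, params)
            # conn.commit()


# Split the export first (e.g. `split -l 625000 export.csv part_`),
# then run several copies in parallel, for example:
#   from multiprocessing import Pool
#   Pool(8).map(load_part, sorted(glob.glob("part_*")))
```

Multi-row INSERTs amortize the per-statement round trip, and running one process per file part keeps the parallelism simple since the parts never overlap.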