Hi,
I’m currently evaluating MemSQL and am in the process of loading data into the developer instance of the server.
I am attempting to import NDJSON data from S3 into MemSQL. The data is stored as .json.gz files.
The documentation is not at all clear about this, so I am assuming that .gz files are decompressed automatically on import.
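One way to check the data side of that assumption without involving MemSQL is to download one of the objects and decompress it locally. Below is a minimal Python sketch, assuming the object was fetched to disk first. It gunzips the raw bytes and confirms that every non-empty line is a standalone JSON object. The helper name `parse_ndjson_gz` is hypothetical. It says nothing about how MemSQL itself handles .gz files.

```python
import gzip
import json


def parse_ndjson_gz(data):
    """Gunzip raw bytes and parse each non-empty line as one JSON object."""
    # Raises OSError (gzip.BadGzipFile in recent Python versions) if the bytes are not gzip.
    text = gzip.decompress(data).decode("utf-8")
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(obj, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        records.append(obj)
    return records
```

Usage, with a hypothetical local file name: `with open("part-0000.json.gz", "rb") as fh: rows = parse_ndjson_gz(fh.read())`. If this raises on decompression, the objects are not actually gzip-compressed. If it raises on a specific line, that line is not valid NDJSON.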
I have newline-delimited JSON files containing rows like this:
{"user":"fc2ebc75782e6d395d0807081bd3bfb2","country":"AT","region":"AT-9","location":"12","loc_key":"AT~AT-9~12","birth_year":"1992","stream_date":"2019-01-01","upc":"22122122122","isrc":"BAZZL1100254","track_artists":"Foo Bar Baz","track_title":"Bar Baz","track_version":"2","album_artist":"Foo Bar Baz","album_name":"BarBaz Bar","parent_identifier":"22122122122","content_type":"audio","territory":"AT","vendor_identifier":"1c6c7cd166bab4dfba949af06d8d4a0e","playlist_id":"","units":"1","max_seconds":240,"source":"other","device":"speaker","os":"speaker","gender":"male","age":"25to34","account":"premium","shuffles":"0","repeats":"0","completed":"0","skipped":"1"}
My pipeline definition looks like this. Unfortunately, the documentation is rather sparse, and the few JSON-parsing examples I have found so far all use nested JSON as the sample, whereas my rows are flat.
CREATE PIPELINE IF NOT EXISTS pipeline_memsql_test_streaming_dataset AS
LOAD DATA S3 "bucket-test/data/prefix/xxx_user_streams"
CREDENTIALS '{"aws_secret_access_key": "******", "aws_access_key_id": "*****"}'
WITH TRANSFORM ('memsql://json', '', '-r " [.user, .country, .region, .location, .loc_key, .birth_year, .stream_date|tostring, .upc|tostring, .isrc|tostring, .track_artists, .track_title, .track_version|tonumber, .album_artist, .album_name, .parent_identifier, .content_type, .territory, .vendor_identifier, .playlist_id, .units|tonumber, .max_seconds|tonumber, .source, .device, .os, .gender, .age, .account, .shuffles|tonumber, .repeats|tonumber, .completed|tonumber, .skipped|tonumber] | @tsv"')
INTO TABLE xxx_user_streams
LINES TERMINATED BY '\n';
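To see what the jq filter above is meant to hand to LOAD DATA, the following Python sketch rebuilds the same tab-separated line for one record. It uses the same column order and the same `tostring`/`tonumber` conversions. It approximates jq's behaviour (null becomes an empty cell, tabs/newlines/backslashes are escaped, `tonumber` rejects non-numeric strings). It is not the `memsql://json` transform itself. The names `COLUMNS`, `record_to_tsv` and the helpers are hypothetical.

```python
import json

# Column order copied from the jq array in the pipeline's TRANSFORM clause.
COLUMNS = [
    "user", "country", "region", "location", "loc_key", "birth_year",
    "stream_date", "upc", "isrc", "track_artists", "track_title",
    "track_version", "album_artist", "album_name", "parent_identifier",
    "content_type", "territory", "vendor_identifier", "playlist_id", "units",
    "max_seconds", "source", "device", "os", "gender", "age", "account",
    "shuffles", "repeats", "completed", "skipped",
]
# Fields the filter pipes through `tostring` and `tonumber`, respectively.
TOSTRING = {"stream_date", "upc", "isrc"}
TONUMBER = {"track_version", "units", "max_seconds", "shuffles",
            "repeats", "completed", "skipped"}


def _to_number(key, value):
    """Rough stand-in for jq's tonumber: numbers pass through, numeric strings are parsed."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise ValueError(f"{key}: {value!r} cannot be converted to a number")
    if isinstance(value, str):
        try:
            value = int(value)
        except ValueError:
            try:
                value = float(value)
            except ValueError:
                raise ValueError(f"{key}: {value!r} cannot be parsed as a number") from None
    # Print whole-valued floats without a fractional part (240.0 -> 240).
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return value


def _tsv_field(value):
    """Approximate jq's @tsv cell formatting: null -> empty, special characters escaped."""
    if value is None:
        return ""
    text = value if isinstance(value, str) else json.dumps(value)
    return (text.replace("\\", "\\\\").replace("\t", "\\t")
                .replace("\n", "\\n").replace("\r", "\\r"))


def record_to_tsv(record):
    """Build the tab-separated line the jq filter is meant to emit for one record."""
    cells = []
    for key in COLUMNS:
        value = record.get(key)  # a missing key behaves like jq's null
        if key in TOSTRING and not isinstance(value, str):
            value = json.dumps(value)  # jq's tostring JSON-encodes non-strings (null -> "null")
        elif key in TONUMBER:
            value = _to_number(key, value)
        cells.append(_tsv_field(value))
    return "\t".join(cells)
```

Running `record_to_tsv(json.loads(line))` on the sample row should give 31 tab-separated cells. Because LOAD DATA maps fields by position unless a column list is given, this assumes `xxx_user_streams` declares its columns in this same order. If the local output looks right, the filter logic is probably not the problem.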
Running TEST PIPELINE produces no output, whereas I’d expect it to show the rows it would load.
How can I go about debugging this and getting it working?
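As a first debugging step outside MemSQL, it can help to scan one downloaded object for rows the jq filter is likely to reject. jq's `tonumber` errors on values such as an empty string or null. This Python sketch reports, per line, any record that is not valid JSON or whose `tonumber` fields do not look numeric. The helper `find_problem_rows` is hypothetical. Its numeric check uses Python's `float()` as an approximation of jq's parser. It does not show how `memsql://json` reacts to such rows.

```python
import gzip
import json

# Fields the jq filter pipes through `tonumber` (copied from the pipeline definition).
NUMERIC_FIELDS = ("track_version", "units", "max_seconds", "shuffles",
                  "repeats", "completed", "skipped")


def _is_number_like(value):
    """True for JSON numbers and strings that Python can parse as a float."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, str):
        try:
            float(value)
            return True
        except ValueError:
            return False
    return False  # null, objects and arrays


def find_problem_rows(data):
    """Return (line number, message) pairs for rows the filter would likely reject."""
    problems = []
    text = gzip.decompress(data).decode("utf-8")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        for key in NUMERIC_FIELDS:
            value = record.get(key)
            if not _is_number_like(value):
                problems.append((lineno, f"{key}={value!r} is not numeric"))
    return problems
```

An empty result for a representative file suggests the rows themselves are fine. In that case the pipeline's source path, credentials or transform arguments are the more likely places to look.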