Pipeline to ingest CSV data into a table with auto increment

sanjeev.mishra · January 3, 2020, 1:52pm

Hi,

I want to create a pipeline that ingests CSV or Parquet data into a table whose primary key is defined as auto increment. MemSQL 7.0 gives the following error:

ERROR 1749 (HY000): Feature ‘Pipeline with auto increment column’ is not supported by MemSQL Distributed.

Are there any suggestions or alternatives?

franck.leveneur · January 3, 2020, 5:16pm

Hi Sanjeev,
I would suggest ingesting your CSV data into a simple staging table, then having a second process read that data and insert it into another table (the one with the auto-increment PK). Note that a MemSQL auto-increment PK does not guarantee sequential IDs (unlike MySQL). It does guarantee unique values, though.
I suggest adding a “status” field with a default of 0 to your ingestion table. When you want to process the rows, set the status to 2, then set it to 1 when done.

table_A is your ingestion table; it has a status column with default 0.

UPDATE table_A SET status = 2 WHERE status = 0;

INSERT INTO table_with_pk_auto_increment SELECT fields needed FROM table_A WHERE status = 2;

UPDATE table_A SET status = 1 WHERE status = 2;

This can easily be done in a stored procedure, which you could call every x seconds or minutes.
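As a rough sketch, the three statements above could be wrapped in a procedure like the one below. The procedure name process_staged_rows and the columns col1 and col2 are placeholders I made up for illustration; substitute your own fields. Wrapping the steps in a transaction is my assumption about what you would want, so that a failure partway through does not leave rows stuck at status 2.

```sql
DELIMITER //

-- Hypothetical procedure: moves newly ingested rows from the staging
-- table into the auto-increment table, using the status flag to mark progress.
CREATE OR REPLACE PROCEDURE process_staged_rows() AS
BEGIN
  START TRANSACTION;

  -- Claim the rows that have not been processed yet.
  UPDATE table_A SET status = 2 WHERE status = 0;

  -- Copy only the claimed rows; the id column is omitted so that
  -- the target table generates it (col1, col2 are placeholder columns).
  INSERT INTO table_with_pk_auto_increment (col1, col2)
    SELECT col1, col2 FROM table_A WHERE status = 2;

  -- Mark the claimed rows as done.
  UPDATE table_A SET status = 1 WHERE status = 2;

  COMMIT;
END //

DELIMITER ;

-- Run it from whatever scheduler you use (cron, an application timer, etc.).
CALL process_staged_rows();
```

Rows that the pipeline loads while the procedure is running keep status 0 and are simply picked up on the next call.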

Hope this helps.

mikeczabator · January 4, 2020, 1:54pm

Hey Sanjeev,

This is easy. You can use an aggregator pipeline, created with the syntax CREATE OR REPLACE AGGREGATOR PIPELINE …. That way the data is loaded only through the aggregator rather than through the leaves.
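For illustration only, an aggregator pipeline reading CSV files from a local directory might look roughly like this. The pipeline name, file path, table name, and column names are all placeholders I made up, and the FS source is just one option; adjust the LOAD DATA clause for your actual source.

```sql
-- Hypothetical target table with an auto-increment primary key.
CREATE TABLE events (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  col1 INT,
  col2 VARCHAR(255)
);

-- Aggregator pipeline: rows are loaded through the aggregator.
-- The id column is left out of the column list so it is generated automatically.
CREATE OR REPLACE AGGREGATOR PIPELINE events_csv_pipeline AS
  LOAD DATA FS '/path/to/csv/files/*.csv'
  INTO TABLE events
  FIELDS TERMINATED BY ','
  (col1, col2);

START PIPELINE events_csv_pipeline;
```

Keep in mind that routing every row through the aggregator gives up the parallel loading that regular pipelines get from the leaves, so throughput may be lower on large loads.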

See more details here: SingleStoreDB Cloud · SingleStore Documentation