Hi,
I want to create a pipeline that ingests CSV or Parquet data into a table whose primary key is defined as auto-increment. MemSQL 7.0 gives the following error:
ERROR 1749 (HY000): Feature ‘Pipeline with auto increment column’ is not supported by MemSQL Distributed.
Are there any suggestions or alternatives?
Hi Sanjeev,
I would suggest ingesting your CSV data into a simple table, then having a second process that grabs that data and inserts it into another table (with an auto-increment PK). Note that, unlike in MySQL, an auto-increment PK in MemSQL does not guarantee sequential IDs. It does guarantee unique values, though.
I suggest adding a “status” field to your ingestion table with a default of 0 (not yet processed). When you want to process the rows, set the status to 2 (in progress), then set it to 1 when done.
Table_A is your ingestion table; it has a status column with a default of 0.
- UPDATE table_A SET status = 2 WHERE status = 0
- INSERT INTO table_with_pk_auto_increment SELECT <fields needed> FROM table_A WHERE status = 2
- UPDATE table_A SET status = 1 WHERE status = 2
This can easily be done in a stored procedure, which you could call every x seconds or minutes.
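The three steps above could be wrapped in a stored procedure roughly like the sketch below. The column names (col1, col2), their types, and the procedure name process_staged_rows are made up for illustration; only table_A, table_with_pk_auto_increment, and the status values come from the steps above.

```sql
-- Hypothetical staging table: the pipeline loads col1 and col2,
-- and status falls back to its default of 0 (not yet processed).
CREATE TABLE table_A (
  col1 VARCHAR(64),
  col2 INT,
  status TINYINT NOT NULL DEFAULT 0
);

-- Hypothetical target table with the auto-increment primary key.
CREATE TABLE table_with_pk_auto_increment (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  col1 VARCHAR(64),
  col2 INT
);

DELIMITER //
CREATE OR REPLACE PROCEDURE process_staged_rows() AS
BEGIN
  -- Step 1: claim the new rows (0 = not yet processed, 2 = in progress).
  UPDATE table_A SET status = 2 WHERE status = 0;

  -- Step 2: copy only the claimed rows; id is generated by the target table.
  INSERT INTO table_with_pk_auto_increment (col1, col2)
    SELECT col1, col2 FROM table_A WHERE status = 2;

  -- Step 3: mark the copied rows as done (1 = processed).
  UPDATE table_A SET status = 1 WHERE status = 2;
END //
DELIMITER ;

-- Call this from whatever scheduler you use (cron, an application timer, etc.).
CALL process_staged_rows();
```

Rows that the pipeline loads while the procedure is running keep status 0, so they are simply picked up by the next call.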
Hope this helps.
Hey Sanjeev,
This is easy. You can use an aggregator pipeline, with the syntax CREATE OR REPLACE AGGREGATOR PIPELINE …. That way, data is loaded only through the aggregator rather than through the leaves.
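As a rough sketch, an aggregator pipeline loading CSV files directly into the auto-increment table might look like this. The pipeline name, file path, and column names (col1, col2) are placeholders, and a filesystem (FS) source is assumed here only for illustration; replace it with whatever source you actually use.

```sql
-- Hypothetical example: pipeline name, path, and columns are placeholders.
-- With an FS source, the files are read from the aggregator's filesystem.
CREATE OR REPLACE AGGREGATOR PIPELINE load_csv_with_auto_increment
AS LOAD DATA FS '/path/to/csv/files/*'
INTO TABLE table_with_pk_auto_increment
FIELDS TERMINATED BY ','
(col1, col2);  -- the auto-increment id column is not listed, so it is generated

START PIPELINE load_csv_with_auto_increment;
```

Keep in mind that routing every row through the aggregator gives up the parallel loading through the leaves that a regular pipeline provides, so throughput may be lower.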
See more details here: SingleStoreDB Cloud · SingleStore Documentation