I have successfully created a pipeline that connects to S3, but I am getting this bogus error message:
```
ERROR 1262 ER_WARN_TOO_MANY_RECORDS: Leaf Error (node-71a961e2-6fe3-4556-baf6-977cea79e9dd-leaf-ag2-1.svc-71a961e2-6fe3-4556-baf6-977cea79e9dd:3306): Leaf Error (node-71a961e2-6fe3-4556-baf6-977cea79e9dd-leaf-ag1-0.svc-71a961e2-6fe3-4556-baf6-977cea79e9dd:3306): Row 1 was truncated; it contained more data than there were input columns
```
The reason I say "bogus" is that I have used these exact same buckets to ingest into Redshift, Aurora, and Athena. I have carefully counted the columns in the table, the columns in the text files, and the columns specified in the CREATE PIPELINE statement, and they all match. I have also specified the correct delimiters ('\t' and '\n').
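For reference, my statement has roughly this shape (bucket, table, and credential values are anonymized placeholders, not my real ones):

```sql
CREATE PIPELINE my_pipeline AS
LOAD DATA S3 'my-bucket/my-prefix/'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "...", "aws_secret_access_key": "..."}'
INTO TABLE my_table
-- Tab-delimited fields, newline-terminated rows, matching the source files:
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
```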
One thing that might help is disambiguating the error message: I can't tell whether it thinks the extra columns are in the source file, in the table schema, or in the pipeline spec.
Also, as an aside: the JSON configuration for pipelines mentions an "extended_null" field, but I don't see a way to include it in the CREATE PIPELINE command. I am going to need it.
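If it matters, my naive guess was to set it inside the CONFIG JSON, along these lines, but I have no idea whether that is actually valid syntax:

```sql
-- Pure guesswork on my part: I do NOT know that CONFIG accepts "extended_null";
-- this is just where I expected to be able to set it.
CREATE PIPELINE my_pipeline AS
LOAD DATA S3 'my-bucket/my-prefix/'
CONFIG '{"region": "us-east-1", "extended_null": true}'
CREDENTIALS '{"aws_access_key_id": "...", "aws_secret_access_key": "..."}'
INTO TABLE my_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
```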