Thank you for the quick response! I tried that yesterday and the issue was with what I am seeing in the “information_schema.pipeline_files.”
When I write my data to S3, I overwrite the existing files, which have names similar to:
0000_part_00
0001_part_00
0002_part_00
and so forth.
When I truncate the table, those files are still shown as “loaded”.
Is there a way to get the pipeline to grab the files and overwrite the table?
Do I need to delete the rows in information_schema.pipeline_files for this pipeline, or can I unload those files from that table so the pipeline will accept the same files again?
I think it is important for me to state my goal: I want to overwrite the same data in the S3 bucket with updated data every week. When the S3 bucket has been loaded with new data, I want to start a pipeline that grabs the updated data and overwrites the data in the table.
JoYo should be responding to your specific question shortly.
In the meantime, I wanted to ask how exactly you deployed MemSQL. In case you did not know, MemSQL is available in AWS Marketplace as well as AWS QuickStart. You can spin up either a BYOL listing or the PAID (on-demand) listing.
I see. If you want to start a pipeline from the beginning, you can run
alter pipeline <name> set offsets earliest
If you want to tell a pipeline to reload a specific file, you can run
alter pipeline <name> drop file <file>
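For example, here is a minimal sketch assuming a pipeline named my_pipeline (a placeholder, not your real pipeline name) and a file recorded as 0000_part_00. The stored file name may include the key prefix inside the bucket, so it is worth listing the files first and copying the name exactly. In the versions I am aware of the view is spelled PIPELINES_FILES, so adjust the name if yours differs.

```sql
-- my_pipeline is a placeholder; substitute your own pipeline name.
-- List the files the pipeline has recorded, with their load state.
SELECT file_name, file_state
FROM information_schema.pipelines_files
WHERE pipeline_name = 'my_pipeline';

-- Make the pipeline forget one file so that it is loaded again.
-- Use the file_name value exactly as the query above returns it.
ALTER PIPELINE my_pipeline DROP FILE '0000_part_00';
```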
There is no way to do this automatically. If the contents of a file change, the pipeline won't detect it.
If you’re completely rewriting your bucket, it’s probably best to just set the offsets to earliest.
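Putting that together for the weekly overwrite, a rough sketch of the sequence might look like the following. The names my_pipeline and my_table are placeholders, and stopping the pipeline before altering it is just the cautious ordering I would use, not something confirmed for every version.

```sql
-- my_pipeline and my_table are placeholder names.
-- Run this after the weekly S3 write has finished.

-- Stop the pipeline so nothing is loaded while the table is reset.
STOP PIPELINE my_pipeline;

-- Clear out last week's rows.
TRUNCATE TABLE my_table;

-- Forget which files were already loaded, so every file in the
-- bucket is treated as new on the next run.
ALTER PIPELINE my_pipeline SET OFFSETS EARLIEST;

-- Resume loading; the pipeline re-reads the bucket from the start.
START PIPELINE my_pipeline;
```

You would need to trigger this yourself each week (for example from whatever job writes the files), since the pipeline will not notice the overwrite on its own.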