I am creating an HDFS pipeline, and my source directory already contains more than a hundred files. I want the pipeline to ignore all of these files and read only those that arrive in the directory from today onwards. Is there any way to achieve this?
ALTER PIPELINE <pipeline_name> SET OFFSETS LATEST should work here. It causes the pipeline to mark all currently known files as “already loaded”. Future batches will then load only files that were not in HDFS at the time the query ran.
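As a rough sketch of the sequence, assuming the pipeline has already been created and is not yet running (the name my_hdfs_pipeline is a placeholder, not something from your setup):

```sql
-- my_hdfs_pipeline is a hypothetical name; substitute your own pipeline.
-- Assumption: the pipeline exists (CREATE PIPELINE was already run) and is stopped.

-- Mark every file currently visible in the HDFS directory as already loaded.
ALTER PIPELINE my_hdfs_pipeline SET OFFSETS LATEST;

-- Start the pipeline; only files that arrive after the ALTER should be picked up.
START PIPELINE my_hdfs_pipeline;

-- Optional check (assumes the information_schema.PIPELINES_FILES view is
-- available in your version): list the files the pipeline knows about and their state.
SELECT FILE_NAME, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'my_hdfs_pipeline';
```

If the pipeline is already running, you will probably need to run STOP PIPELINE first, since altering a running pipeline is generally not allowed.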