With the solution above, I can see that periodically deleting the files improves performance.
However, when loading is delayed for some reason (network delay, etc.), the pipeline stops if a file that has not been loaded yet is deleted.
To avoid this, it would be better to distinguish between the files the pipeline has already loaded and those it has not, and then delete only the loaded files.
However, as you can see in the second question below, it takes a long time to query the list of files.
So this doesn’t seem to be a good approach. Can you suggest a better plan?
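For reference, the loaded/unloaded approach I describe above could be sketched roughly as follows. This is only a minimal sketch: the function name `delete_loaded_files` is hypothetical, and I am assuming the file list comes back as (file name, state) pairs in which an already loaded file has the state 'Loaded', for example from a query on the “PIPELINES_FILES” table. It still depends on that query, which is the slow part.

```python
import os
from typing import Iterable, List, Tuple


def delete_loaded_files(rows: Iterable[Tuple[str, str]], directory: str) -> List[str]:
    """Delete only the files whose state is 'Loaded' and return their names.

    rows: (file_name, file_state) pairs. The 'Loaded' state value and the
    shape of the rows are assumptions for this sketch.
    """
    deleted = []
    for file_name, file_state in rows:
        if file_state != "Loaded":
            continue  # keep files that are not loaded yet (or in any other state)
        base_name = os.path.basename(file_name)
        path = os.path.join(directory, base_name)
        try:
            os.remove(path)
            deleted.append(base_name)
        except FileNotFoundError:
            pass  # the file was already removed by an earlier run
    return deleted
```

Files that are not marked as loaded are never touched, so a delayed load would not lose its input file.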
As sensor data is generated continuously during actual operation, the “PIPELINES_FILES” table will keep growing as the pipeline processes these files.
As you can see below, after about 12 hours of sensor data had been generated, selecting the files took a long time (53.97s).
I’d like to know whether checking for new files gets slower as these files pile up.
If so, is there a setting that manages the “PIPELINES_FILES” table so that it doesn’t keep growing?
In addition, the pipeline-related ‘PIPELINES_CURSORS’ and ‘PIPELINES_OFFSETS’ tables also continue to grow. How can they be managed effectively?
It appears that the pipeline recognizes a newly created file by its filename when it loads the file.
Is it possible to use the file creation time as the criterion instead?
In addition, what do ‘batch time’ and ‘batch interval’ mean?
I would like to know how they relate to the time the data was last loaded into the database.
If I keep adding leaf nodes to handle a large volume of data loads at a given time, will the network traffic also keep increasing without bound because of the reshuffle?
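To make the question concrete, here is my own back-of-envelope model, which may well be wrong. I am assuming that each row is first read by some leaf and then hash-distributed uniformly across all leaves, so it has to be forwarded unless it happens to belong to the leaf that read it. The function name and the uniform-hashing assumption are mine and are not taken from the documentation.

```python
def expected_reshuffle_bytes(total_bytes: float, num_leaves: int) -> float:
    """Expected bytes forwarded between leaves for one load of total_bytes.

    Assumption: a row lands on the leaf that read it with probability
    1 / num_leaves, so the forwarded fraction is (num_leaves - 1) / num_leaves.
    """
    if num_leaves < 1:
        raise ValueError("num_leaves must be at least 1")
    return total_bytes * (num_leaves - 1) / num_leaves
```

Under this model the reshuffle traffic for a fixed amount of data approaches the data size as leaves are added but never exceeds it. Is that how it actually behaves?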
Thanks in advance.