I would like to create a data pipeline that runs inside SingleStore using multiple SingleStore pipelines, e.g.
Pipeline A ingests external sources (CSV files) and writes them to table A1.
Pipeline B “listens” for changes in table A1 (i.e., it is triggered once new data is written to A1), reads its input from A1, and writes its output to table A2.
Is defining something like this (pipeline B) possible?
Are you sure you need two pipelines? Pipeline A can invoke a stored procedure, and that procedure can insert the batch into A1 and then invoke the same logic to copy from A1 to A2.
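A minimal sketch of this pattern, using SingleStore's `CREATE PIPELINE ... INTO PROCEDURE` syntax. All table, column, and file names here are hypothetical placeholders; the key idea is that the procedure receives each batch as a `QUERY` parameter and can write it to both tables in one transaction:

```sql
-- Hypothetical target tables; adjust columns to your schema.
CREATE TABLE a1 (id BIGINT, val DOUBLE);
CREATE TABLE a2 (id BIGINT, total DOUBLE);

DELIMITER //
CREATE OR REPLACE PROCEDURE ingest_batch(batch QUERY(id BIGINT, val DOUBLE))
AS
BEGIN
    -- Step 1: persist the raw batch into A1.
    INSERT INTO a1 SELECT * FROM batch;
    -- Step 2: run the downstream logic on the same batch,
    -- writing the derived result into A2.
    INSERT INTO a2
        SELECT id, SUM(val) FROM batch GROUP BY id;
END //
DELIMITER ;

-- Pipeline A hands every ingested CSV batch to the procedure.
CREATE PIPELINE pipeline_a AS
    LOAD DATA FS '/data/input/*.csv'
    INTO PROCEDURE ingest_batch
    FIELDS TERMINATED BY ',';

START PIPELINE pipeline_a;
```

Because both inserts happen inside the same procedure call, A2 stays consistent with A1 without a second pipeline polling for changes.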
Let me rephrase: I would like to create a single data pipeline that performs several operations on the data, and possibly persists the data into SingleStore after each phase (for troubleshooting / replay purposes), e.g. something like:
filter invalid records -> persist to table A1 -> aggregate rows -> persist to table A2 -> calculate additional fields / enrich data -> persist to table A3 -> etc.
The logic in each step (filter, aggregate, calculate) should be configurable from the outside. I was wondering if/how there is a way to implement something like this using SingleStore.
You can add multiple steps or phases to a single stored procedure (SP), which operates on each pipeline batch as its input and can write to as many tables and aggregations as you need.
You can also implement the steps/phases as separate stored procedures, and have the main pipeline SP call the others depending on either configuration or an aggregation result. In that case, it may make sense to put the configuration into yet another table.
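A rough sketch of the configuration-driven variant. The table name, phase names, and sub-procedure are all hypothetical; the point is that the main SP reads a config table per batch and dispatches to separate SPs for each phase:

```sql
-- Hypothetical configuration table: one row per phase toggle.
CREATE TABLE pipeline_config (phase VARCHAR(32) PRIMARY KEY, enabled BOOLEAN);

DELIMITER //
CREATE OR REPLACE PROCEDURE process_batch(batch QUERY(id BIGINT, val DOUBLE))
AS
DECLARE
    do_aggregate BOOLEAN;
BEGIN
    -- Phase 1: filter invalid records and persist to A1.
    INSERT INTO a1 SELECT * FROM batch WHERE val IS NOT NULL;

    -- Look up configuration to decide whether to run the next phase.
    do_aggregate = SCALAR(SELECT enabled FROM pipeline_config
                          WHERE phase = 'aggregate');
    IF do_aggregate THEN
        -- Phase 2 lives in its own SP (copies/aggregates A1 -> A2),
        -- so its logic can be replaced without touching the pipeline.
        CALL aggregate_phase();
    END IF;
END //
DELIMITER ;
```

Keeping each phase in its own SP also gives you the replay property you mentioned: since each phase's input is persisted in a table, a phase can be re-run standalone against that table.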
I hope this helps. Would you be willing to show us the code for your use case? It sounds non-trivial and interesting.