Ultimately, the batch datasets have to be fed into the procedures by calling a pipeline. My concern is that we only load the base tables from S3 at the outset, and the documentation does not show how any of the batches corresponding to the example stored procedures is materialized in a pipeline.
In a nutshell, how do I identify the upstream batch dataset that needs to be fed to a pipeline, which will then call the stored procedure?
The pipeline takes care of loading the data from the source in batches; you just need to define the stored procedure with the required signature.
Calling the stored procedure with the relevant batches is done internally by the pipeline. In the case of S3, a batch would be a set of files that the pipeline picks up to load at once.
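As a minimal sketch (table, column names, and bucket path are hypothetical, not from your setup): the stored procedure receives each batch as a `QUERY`-typed parameter, and the pipeline loads files from S3 and calls the procedure once per batch.

```sql
-- Hypothetical destination table
CREATE TABLE events (
    id BIGINT,
    event_type VARCHAR(64),
    created_at DATETIME
);

-- The procedure receives each pipeline batch as a QUERY-typed parameter
DELIMITER //
CREATE OR REPLACE PROCEDURE load_events(
    batch QUERY(id BIGINT, event_type VARCHAR(64), created_at DATETIME))
AS
BEGIN
    INSERT INTO events (id, event_type, created_at)
    SELECT id, event_type, created_at FROM batch;
END //
DELIMITER ;

-- The pipeline picks up CSV files from S3 and feeds them to the procedure
CREATE PIPELINE events_pipeline
AS LOAD DATA S3 'my-bucket/events/'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "...", "aws_secret_access_key": "..."}'
INTO PROCEDURE load_events
FIELDS TERMINATED BY ',';

START PIPELINE events_pipeline;
```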
OK, but suppose I want to ‘stage’ a CSV or Parquet file from S3 as Table A with the help of pipeline X, then do some data manipulation on Table A through a stored procedure, and automate that second part through pipeline Y so that pipeline Y follows pipeline X. Would that be possible? In this case, pipeline Y executing stored procedure S would feed entirely off Table A, already materialized in SingleStore.
Or is the answer that a pipeline's source batch can't be a table resident in SingleStore?
Pipelines bring data from external data sources → optionally apply transformations → store it in a destination table. The data source can't be a SingleStore table.
Is it a requirement to maintain both Table A and the transformed Table B?
If so, I would suggest writing the logic in the pipeline's stored procedure so that it does the insertions into both Table A and Table B, as in the sketch below.
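A rough sketch of that idea (table and column names are hypothetical): the same batch is written raw into Table A and in transformed form into Table B within one procedure call.

```sql
DELIMITER //
CREATE OR REPLACE PROCEDURE load_and_transform(
    batch QUERY(id BIGINT, amount DOUBLE, created_at DATETIME))
AS
BEGIN
    -- Keep the raw rows in the staging table
    INSERT INTO table_a (id, amount, created_at)
    SELECT id, amount, created_at FROM batch;

    -- Write a transformed version of the same batch to the second table
    INSERT INTO table_b (id, amount_usd, created_day)
    SELECT id, amount * 1.1, DATE(created_at) FROM batch;
END //
DELIMITER ;
```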
Or load into Table A first and transform it periodically through an external app or SingleStore's scheduled notebook jobs feature.
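With that second option, the scheduled job would simply run a periodic transformation statement against Table A, for example (hypothetical names, assuming `id` identifies rows not yet transformed):

```sql
-- Periodic transform: copy rows from table_a that are not yet in table_b
INSERT INTO table_b (id, amount_usd, created_day)
SELECT a.id, a.amount * 1.1, DATE(a.created_at)
FROM table_a a
LEFT JOIN table_b b ON b.id = a.id
WHERE b.id IS NULL;
```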