Is there a way to have the the name of the file that the pipeline is executing?
As an example I have a lot of files with a pattern like: file_name__.gz
depending on the ID and date value I need to enhance the data loaded trough the pipeline.
I was thinking to create multiple pipelines but the variation is quite large.
Yes! In 6.8 and later, you can use pipeline_source_file() builtin in the set clause. For instance, if you want to set a column in your table called source_file to the source file, you can do
create table t(col1 bigint, col2 bigint, col3 bigint, source_file blob);
create pipeline p
as load data fs '/path/to/data'
into table t(col_1, col_2, col_3)
set source_file = pipeline_source_file();
Procedure should work but you still have to pass set source_file = pipeline_source_file(). The source file has to be known before the procedure is called.
How to do this, when I am using a pipeline with a procedure
CREATE PIPELINE load_sales_proc_test
AS LOAD DATA FS ‘/home/memsql/*.csv’
INTO PROCEDURE sales_transformations
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’ ignore 1 lines;