File name for pipeline

tudor · August 6, 2019, 10:23pm

Is there a way to have the the name of the file that the pipeline is executing?

As an example I have a lot of files with a pattern like: file_name__.gz
depending on the ID and date value I need to enhance the data loaded trough the pipeline.

I was thinking to create multiple pipelines but the variation is quite large.

JoYo · August 6, 2019, 10:36pm

Yes! In 6.8 and later, you can use pipeline_source_file() builtin in the set clause. For instance, if you want to set a column in your table called source_file to the source file, you can do

create table t(col1 bigint, col2 bigint, col3 bigint, source_file blob);

create pipeline p
as load data fs '/path/to/data'
into table t(col_1, col_2, col_3)
set source_file = pipeline_source_file();

tudor · June 16, 2020, 7:43pm

Procedure should work but you still have to pass set source_file = pipeline_source_file(). The source file has to be known before the procedure is called.

ddsprasad · July 20, 2020, 7:53pm

How to do this, when I am using a pipeline with a procedure

CREATE PIPELINE load_sales_proc_test
AS LOAD DATA FS ‘/home/memsql/*.csv’
INTO PROCEDURE sales_transformations
FIELDS TERMINATED BY ‘,’
LINES TERMINATED BY ‘\n’ ignore 1 lines;