Hi, can anyone help with an example of a pipeline that uses R as the transform step? Thanks a lot!
Could you give us more information on using R as a transform step? Do you have any code written that we could help with?
Sure, Jacky. This is a basic example that I am trying to implement.
R code:
#!/usr/bin/env Rscript
##------Input from CSV file, to be changed to MemSQL input----------
##setwd("/home/memsql/TestFiles")
## Read sample data
sampledata <- read.csv("Sample.csv")
sampledata$Test_Calc <- sampledata$Sample * 3
##sampledata
##-----------------------------------------------------------------------
Test_CalcData <- sampledata$Test_Calc
##---Output to file, to be changed to MemSQL table-------
##write.csv(Test_CalcData, 'TestCalc.csv.log')
write.csv(sampledata, "Sample.csv", row.names = FALSE)
##-------------------------------------------------------
MemSQL pipeline code:
CREATE AGGREGATOR PIPELINE tp2 AS
LOAD DATA FS '/home/memsql/TestFiles/Sample.csv'
WITH TRANSFORM ('file://localhost/home/memsql/TestCode_Pipe.R', '', '')
INTO TABLE Test_Raw
FIELDS TERMINATED BY ',';
CSV
Sample,Test_Calc,
26.48416667,
102.265,
321.6091667,
1,
158.5141667,
1699.884167,
-32.9225,
145.8925,
20.50833333,
4454.87,
533.4758333,
2.32,
26.2875,
1538.288333,
Thanks for the help
I'm not familiar enough with R to know exactly how to do this, but a MemSQL transform reads the input file on stdin and writes its output (for example, CSV) to stdout. You should research how to do this in R; I assume it's possible. I don't know how good R is as a scripting language, though.
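Along those lines, here is a minimal sketch of TestCode_Pipe.R rewritten as a stdin/stdout transform. It assumes (not confirmed in this thread) that for an FS pipeline the whole file, header row included, arrives on stdin; that Test_Raw has exactly the columns Sample and Test_Calc, in that order; and that Rscript is installed on the host that runs the transform. `transform_csv` is a hypothetical helper name.

```r
#!/usr/bin/env Rscript
# Sketch of TestCode_Pipe.R as a pipeline transform. Assumptions: the raw CSV
# file arrives on stdin, and Test_Raw has the columns Sample and Test_Calc.

# transform_csv() is a hypothetical helper: it takes the input as a character
# vector of lines and returns the output CSV lines.
transform_csv <- function(lines) {
  # Drop blank lines; an empty input then yields an empty output instead of
  # a read.csv() error.
  lines <- lines[nzchar(trimws(lines))]
  if (length(lines) == 0) return(character(0))

  # The first line is treated as the header (Sample,Test_Calc,).
  sampledata <- read.csv(text = lines, stringsAsFactors = FALSE)

  # Same calculation as the original script.
  sampledata$Test_Calc <- sampledata$Sample * 3

  # Keep only the two columns assumed to exist in Test_Raw. This also drops
  # the extra column read.csv() creates for the trailing comma in the header.
  out <- sampledata[, c("Sample", "Test_Calc")]

  # No header row and no row names, so every output line is one table row.
  capture.output(
    write.table(out, sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)
  )
}

# file("stdin") is the process's C-level standard input (see ?stdin), which is
# where the pipeline sends the file contents. The result is printed to stdout.
input_lines <- readLines(file("stdin"), warn = FALSE)
writeLines(transform_csv(input_lines))
```

Printing without a header or row names is deliberate: each output line is meant to map to one row of Test_Raw, and the header row of Sample.csv is consumed by read.csv() instead of being loaded as data.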
To test your program, make the script executable (chmod +x TestCode_Pipe.R) and run:
cat your_input_file.csv | ./TestCode_Pipe.R
and, if it prints CSV to the terminal, you have won.