Hi,
What is the exact syntax needed to create a pipeline to GCS?
I understand it should be supported (as of which exact version?).
I guess it is based on the following:
CREATE PIPELINE mypipeline AS
LOAD DATA S3 'my-bucket-name'
CONFIG '{"region": "us-west-1"}'
CREDENTIALS '{"aws_access_key_id": "your_access_key_id", "aws_secret_access_key": "your_secret_access_key", ["role_arn": "replace_with_your_role_arn"]}'
INTO TABLE my_table
Hi,
To clarify, I mean Google Cloud Storage: using a pipeline to load files from a bucket.
For backups, MemSQL already uses the S3 API.
Can the same be done with pipelines?
If yes - in what version, and with what exact syntax?
If no - are you planning to provide it? Roughly when?
This feature is going to be released in 7.1 (mid-Spring, I believe).
Is this still relevant to you? If yes, what version of MemSQL are you currently running and what flavor: cloud (Helios) or self-managed? If there still is an interest, I’ll ask around to see if we have spare cycles to backport it or maybe even do a custom test-build.
The docs are coming really soon – sorry for that. Meanwhile I’ll try to describe here.
Basically the syntax is similar to the S3 one:
CREATE PIPELINE library
AS LOAD DATA GCS 'my-bucket-name'
CREDENTIALS '{"access_id": "YOUR_ACCESS_KEY_ID", "secret_key": "YOUR_SECRET_ACCESS_KEY"}'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ',';
The differences are GCS (which stands for Google Cloud Storage) instead of S3, and that in CREDENTIALS you specify access_id and secret_key fields, where the S3 syntax uses aws_access_key_id and aws_secret_access_key respectively.
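Not part of the official docs, but if you generate these statements programmatically, building the CREDENTIALS JSON with a JSON library sidesteps quoting and escaping mistakes (e.g. smart quotes pasted from a document). A minimal sketch, with function names of my own invention:

```python
import json

def s3_credentials(access_key_id, secret_access_key, role_arn=None):
    # Field names expected by the S3 pipeline syntax
    creds = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    if role_arn is not None:  # role_arn is optional
        creds["role_arn"] = role_arn
    return json.dumps(creds)

def gcs_credentials(access_id, secret_key):
    # GCS pipelines rename the fields to access_id / secret_key
    return json.dumps({"access_id": access_id, "secret_key": secret_key})
```

The resulting string can be interpolated into the CREATE PIPELINE statement in place of the hand-written CREDENTIALS literal.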
access_id and secret_key are Google's HMAC keys. You can follow this guide to create them. As a quick sanity check: access_id is usually a 24- or 60-character alphanumeric string linked to the Google account, typically all uppercase and starting with "GOOG"; secret_key is usually a 40-character Base64-encoded string linked to a specific access_id.
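As a rough illustration, those rules of thumb can be turned into a heuristic check (heuristic only — the exact key formats are Google's to change):

```python
import base64

def looks_like_gcs_hmac_pair(access_id: str, secret_key: str) -> bool:
    """Heuristic sanity check for a GCS HMAC key pair.

    access_id: usually a 24- or 60-character uppercase alphanumeric
    string starting with "GOOG".
    secret_key: usually a 40-character Base64-encoded string.
    """
    id_ok = (
        len(access_id) in (24, 60)
        and access_id.isalnum()
        and access_id == access_id.upper()
        and access_id.startswith("GOOG")
    )
    if len(secret_key) != 40:
        return False
    try:
        # validate=True rejects characters outside the Base64 alphabet
        base64.b64decode(secret_key, validate=True)
    except ValueError:
        return False
    return id_ok
```

This only catches obvious copy-paste mistakes; a pair that passes can still be invalid or revoked.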