I am trying to create an HDFS pipeline, but I’m getting an error.
CREATE PIPELINE pi_hdfs
AS LOAD DATA HDFS ‘hdfs://~~:8020/~~’
CONFIG ‘{“disable_gunzip”: true}’
INTO TABLE ‘~~’(data_line)
SET file_path = pipeline_source_file();
========================================
ERROR 1993 ER_EXTRACTOR_GET_LASTEST_OFFSETS:
Cannot get source metadata for pipeline. could not walk folder /CTTM/TEST/~~/:stat /CTTM/TEST/~~/:getFileInfo call failed with FATAL_UNAUTHORIZED (org.apache.hadoop.security.AuthorizationException)
The account used to access Hadoop is actually ‘sysadmin’, so I ran the following query, but I got the same error again.
CREATE PIPELINE pi_hdfs
AS LOAD DATA HDFS ‘hdfs://~~:8020/~~’
CONFIG ‘{“disable_gunzip”: true}’
CREDENTIALS '{"user":"sysadmin"}'
INTO TABLE ‘~~’(data_line)
SET file_path = pipeline_source_file();
Is there any other way to change the account that the pipeline uses to access Hadoop?
First, there were several typos in my question. The Hadoop access account is ‘gpadmin’, not ‘sysadmin’,
and the same applies to the query. The mixed quotation marks in the query were also a typing error.
I am sorry for the confusion.
Now, to answer your questions:
The user that owns the hdfs:// path is ‘gpadmin’.
The mixed quotation marks were a typo.
Here is the corrected second query:
CREATE PIPELINE pi_hdfs
AS LOAD DATA HDFS 'hdfs://~~:8020/~~'
CONFIG '{"disable_gunzip": true}'
CREDENTIALS '{"user":"gpadmin"}'
INTO TABLE '~~'(data_line)
SET file_path = pipeline_source_file();
Additionally, I tried running the query with this syntax too, but the same error occurred.
Thanks for your reply, mkobyakov
I checked all subfolders with the hdfs ls command, and ‘gpadmin’ has permission to access all of these directories. Is there anything else I need to check?
Hmm, those are most of the things I would suspect on the MemSQL side. Can you describe your Hadoop cluster in more detail? Do you by any chance use an authentication module like Kerberos?
I’m not using any authentication module in Hadoop.
All subfolders under the top-level path in Hadoop grant r-x permissions to other users, so the ‘memsql’ account has access to the files and directories.
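The permission check described above could be sketched roughly as follows. This is only an illustration, not part of MemSQL or Hadoop: the helper names `ancestor_dirs` and `others_can_traverse` are made up, the path `/CTTM/TEST/example` is a hypothetical stand-in for the real one, and the permission strings are assumed to be copied by hand from the hdfs ls output. It lists every directory from the root down to the target, then tests whether the "other" bits of a permission string allow listing and entering a directory.

```python
from pathlib import PurePosixPath


def ancestor_dirs(path: str) -> list[str]:
    """Return every directory from the root down to `path`, inclusive."""
    p = PurePosixPath(path)
    # p.parents runs from the nearest parent up to "/", so reverse it.
    return [str(a) for a in reversed(p.parents)] + [str(p)]


def others_can_traverse(mode: str) -> bool:
    """True if a permission string such as 'drwxr-xr-x' gives 'other'
    users both read (list) and execute (enter) on a directory."""
    other = mode[7:10]  # characters 7..9 are the 'other' permission bits
    return mode[0] == "d" and other[0] == "r" and other[2] in ("x", "t")


if __name__ == "__main__":
    # Hypothetical path; each entry would be checked with hdfs ls.
    for d in ancestor_dirs("/CTTM/TEST/example/"):
        print(d)
    # Hypothetical permission strings as they might appear in the listing.
    print(others_can_traverse("drwxr-xr-x"))
    print(others_can_traverse("drwxr-x---"))
```

Every directory printed by the first loop needs to pass the second check for an account that is neither the owner nor in the owning group, not only the final folder.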
Could this pipeline problem depend on the version? I’m using MemSQL version 7.1.10.