I am using AWS Glue for an ETL job, with MemSQL as the source.
I have created a connection, a crawler, and a database.
The crawler created a table in the database.
I want to use this database and table to read data from MemSQL.
I am able to get data from MemSQL with this code:

    val datasource0 = glueContext.getSourceWithFormat(
      connectionType = "marketplace.spark",
      options = JsonOptions(s"""{
        "ddlEndpoint": "endpoint:3306",
        "database": "poc",
        "dbtable": "tablename",
        "user": "user",
        "password": "pass",
        "dataSource": "memsql",
        "className": "com.memsql.spark"
      }"""),
      transformationContext = "datasource0"
    ).getDynamicFrame()

But I don’t want to hard-code the username and password here.
What I want to use is something like this:

    val options = JsonOptions(s"""{
      "connectionType": "marketplace.spark",
      "dataSource": "memsql",
      "className": "com.memsql.spark"
    }""")

    val datasource0 = glueContext.getCatalogSource(
      database = "memsql-test",
      tableName = "poc_sales",
      redshiftTmpDir = "",
      transformationContext = "datasource0",
      pushDownPredicate = " ",
      additionalOptions = options
    ).getDynamicFrame()

Here, database and tableName are the Glue Data Catalog names created by the crawler.
Hi @anilvivekabhi,
You need to provide the user and password options, because the job can’t connect without credentials.
But if you don’t want to expose credentials in the code, you can use AWS Secrets Manager to supply them securely.
For more details, see the AWS blog post about using Secrets Manager with AWS Glue.
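As a rough illustration of the Secrets Manager approach, here is a minimal sketch. It assumes the AWS SDK for Java v1 Secrets Manager client and Jackson are on the job's classpath, that the job's IAM role is allowed to call secretsmanager:GetSecretValue, and that the secret is stored as a JSON string with "username" and "password" keys. The secret name "memsql/poc" is hypothetical; the connector options are copied from the question.

```scala
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.secretsmanager.AWSSecretsManagerClientBuilder
import com.amazonaws.services.secretsmanager.model.GetSecretValueRequest
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val glueContext = new GlueContext(new SparkContext())

    // Hypothetical secret name; replace with the name of your own secret.
    val secretId = "memsql/poc"

    // Fetch the secret at run time instead of hard-coding credentials.
    val client = AWSSecretsManagerClientBuilder.defaultClient()
    val secretString = client
      .getSecretValue(new GetSecretValueRequest().withSecretId(secretId))
      .getSecretString

    // Assumption: the secret is JSON with "username" and "password" keys.
    val secret = new ObjectMapper().readTree(secretString)
    val user = secret.get("username").asText()
    val password = secret.get("password").asText()

    // Same options as in the question, with credentials filled in from the secret.
    // Assumption: the values contain no characters that need JSON escaping.
    val datasource0 = glueContext.getSourceWithFormat(
      connectionType = "marketplace.spark",
      options = JsonOptions(s"""{
        "ddlEndpoint": "endpoint:3306",
        "database": "poc",
        "dbtable": "tablename",
        "user": "$user",
        "password": "$password",
        "dataSource": "memsql",
        "className": "com.memsql.spark"
      }"""),
      transformationContext = "datasource0"
    ).getDynamicFrame()
  }
}
```

With this layout the script itself contains only the secret's name, and access to the credentials is governed by the job role's IAM policy.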
Is there no way to use the database and tables created in Glue with the MemSQL connection and crawlers?
I have created a Glue connection, where I provided the username and password.
The Glue crawler created a table in Glue with the metadata and schema fetched from the MemSQL tables.
Can’t I use these Glue tables the same way as with other databases and connections?
I can connect to RDS like this:

    val source = glueContext.getCatalogSource(
      database = "poc",
      tableName = "table",
      redshiftTmpDir = "",
      transformationContext = "datasourceCustomer"
    ).getDynamicFrame()

Here, database and tableName are the Glue database and table created by the crawler.
@anilvivekabhi If your Glue connection already contains the user and password, you can try passing that connection's name as the connectionName option in your Glue ETL job, like this:

    val options = JsonOptions(s"""{
      "connectionType": "marketplace.spark",
      "dataSource": "memsql",
      "className": "com.memsql.spark",
      "connectionName": "<your connection name>"
    }""")
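
Put together with the getCatalogSource call from the question, the suggestion might look roughly like the sketch below. It is untested against MemSQL; the connection name "memsql-connection" is a hypothetical placeholder, and the database and table names are the catalog names from the question.

```scala
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.JsonOptions
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val glueContext = new GlueContext(new SparkContext())

    // Hypothetical name of the Glue connection that stores the credentials.
    val connectionName = "memsql-connection"

    // Connector options without user/password; the assumption is that Glue
    // resolves the credentials from the named connection.
    val options = JsonOptions(s"""{
      "connectionType": "marketplace.spark",
      "dataSource": "memsql",
      "className": "com.memsql.spark",
      "connectionName": "$connectionName"
    }""")

    // Read through the Data Catalog table created by the crawler.
    val datasource0 = glueContext.getCatalogSource(
      database = "memsql-test",
      tableName = "poc_sales",
      transformationContext = "datasource0",
      additionalOptions = options
    ).getDynamicFrame()

    // Quick sanity check that rows come back.
    println(datasource0.count)
  }
}
```

If the job still fails to authenticate, it may also be worth checking that the same connection is attached to the job in its configuration.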