Is there any way I can connect to MemSQL using PySpark? I have a data pipeline using Kafka and Spark, and now I want to store the data in MemSQL. I could only find a connector for Scala. Is there a connector for PySpark or Java?
Hi mbhanda2
Yes, there is a way: you can use the MariaDB Java connector from both Java and PySpark.
Here is a simple example:
# launch PySpark with the MariaDB JDBC jar on the driver and executor classpaths
pyspark --driver-class-path "PATH_TO_JAR" --jars "PATH_TO_JAR"
host="172.17.0.2"
port="3306"
database="name"
jdbcUrl = "jdbc:mariadb://{0}:{1}/{2}".format(host, port, database)
properties = {
"user": "root",
"password": "password",
}
df = spark.read.jdbc(url=url, table="tb", mode="MODE_NAME", properties=properties)
df.show()
df.write.jdbc(url=jdbcUrl, table="tb", mode="MODE_NAME", properties=properties)
NOTE: sql_mode should be set to 'ANSI_QUOTES' so that MemSQL treats double-quoted identifiers as identifiers rather than string literals.
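If it is not set already, you can enable it from any SQL client connected to MemSQL (this assumes your user has the privileges to change global variables):

-- allow double-quoted identifiers, which the JDBC integration generates
SET GLOBAL sql_mode = 'ANSI_QUOTES';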
Thanks Ramzes.
Can you give me an example in Java too?
Sure:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Main {
    static final String DB_URL = "jdbc:mariadb://172.17.0.2:3306/db";

    public static void main(String[] args) {
        // no Class.forName needed: JDBC 4+ drivers register themselves automatically;
        // try-with-resources closes the connection, statement and result set for us
        try (Connection conn = DriverManager.getConnection(DB_URL, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM tb")) {
            while (rs.next()) {
                // process each row, e.g. print the first column
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
NOTE: the MariaDB JDBC driver must be accessible from the project (add the jar to the classpath or declare it as a project dependency).
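For example, with Maven you can declare it as a dependency (the version here is only an illustration; use whichever release fits your environment):

<dependency>
    <groupId>org.mariadb.jdbc</groupId>
    <artifactId>mariadb-java-client</artifactId>
    <!-- example version only -->
    <version>2.7.4</version>
</dependency>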
Thanks Ramzes… I couldn't find any documentation on this topic. You saved my project.