Hi there,
I am trying to connect to SingleStore from Apache Spark in an offline environment, but I am getting the following error:
java.lang.ClassNotFoundException: Failed to find data source: singlestore. Please find packages at http://spark.apache.org/third-party-projects.html
Since I am working in an offline environment I can't use the --packages option, which points to Maven coordinates in an online repository, so instead I am listing the jars with the --jars parameter.
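For completeness, I believe an equivalent way to register the jars offline is to set spark.jars in conf/spark-defaults.conf (this is my understanding from the Spark configuration docs; the paths below are the ones from my environment):

# conf/spark-defaults.conf
# spark.jars takes a comma-separated list of local jar paths and, as I
# understand it, adds them to the driver and executor classpaths.
spark.jars /lib/jars/sqljdbc_6.0/enu/jre8/sqljdbc42.jar,/lib/jars/memsql-spark-connector_2.11-3.0.5-spark-2.4.4.jar,/lib/jars/mariadb-java-client-2.7.2.jar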
I have tried following the instructions here and here, but I am still running into this error.
I have pasted the spark-shell session below, which shows that the jars are registered with the Spark session and that the MSSQL JDBC jar can be used successfully. Any ideas on what I am doing wrong? Thanks in advance for any help:
[root@hadoop-namenode spark-2.4.5-bin-hadoop2.7]# bin/spark-shell --jars /lib/jars/sqljdbc_6.0/enu/jre8/sqljdbc42.jar,/lib/jars/memsql-spark-connector_2.11-3.0.5-spark-2.4.4.jar,/lib/jars/mariadb-java-client-2.7.2.jar
21/03/26 05:16:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/03/26 05:16:16 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://hadoop-namenode:4041
Spark context available as 'sc' (master = local[*], app id = local-1616728576328).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_252)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sparkContext.listJars.foreach(println)
spark://hadoop-namenode:52984/jars/mariadb-java-client-2.7.2.jar
spark://hadoop-namenode:52984/jars/sqljdbc42.jar
spark://hadoop-namenode:52984/jars/memsql-spark-connector_2.11-3.0.5-spark-2.4.4.jar

scala> :paste
// Entering paste mode (ctrl-D to finish)

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://xxx.xxx.x.xx:1434")
  .option("databasename", "xxxxx")
  .option("dbtable", "xxxxx")
  .option("user", "xxxxxxxxx")
  .option("password", "xxxxxxxxxxx")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load

println("Test Number of Rows: " + df.count)
// Exiting paste mode, now interpreting.
21/03/26 05:18:50 WARN util.Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
Test Number of Rows: 0
df: org.apache.spark.sql.DataFrame = [x,x,x,x: int ... 75 more fields]

scala> :paste
// Entering paste mode (ctrl-D to finish)

spark.conf.set("spark.datasource.singlestore.ddlEndpoint", "xxx.xx.x.xx")
spark.conf.set("spark.datasource.singlestore.user", "root")
spark.conf.set("spark.datasource.singlestore.password", "xxxxxxxxx")

val df = spark.read
  .format("singlestore")
  .load("test.cust")

// Exiting paste mode, now interpreting.
java.lang.ClassNotFoundException: Failed to find data source: singlestore. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 57 elided
Caused by: java.lang.ClassNotFoundException: singlestore.DefaultSource
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
... 59 more
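For reference, here is a quick probe I can run in the same spark-shell session to check whether the connector's DefaultSource class is visible on the classpath at all. The two class names are only my guesses based on the connector artifact name; I have not confirmed which package this connector version actually uses:

scala> :paste
// Classpath probe (class names are assumptions, not confirmed):
// older memsql-spark-connector releases may register under com.memsql.spark,
// newer SingleStore releases under com.singlestore.spark.
Seq("com.memsql.spark.DefaultSource", "com.singlestore.spark.DefaultSource")
  .foreach { name =>
    println(name + " loadable: " + scala.util.Try(Class.forName(name)).isSuccess)
  }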