Hello again - No problem, and thank you for your replies.
You should be able to use the JDBC call you provided to achieve this, but performance will likely be better with the connector, as mentioned above. However, it is difficult to say how much better, since that depends on the data and other hardware characteristics.
There are a couple of reasons why the connector version may not be working for you based on the info you provided:
- It appears you are using the older version of the memsql connector in your example (e.g., memsql-connector_2.10-1.3.3.jar), but with the newer syntax. The newer syntax won't work with the older connector. All the artifacts for the latest connector are available here (via Maven). You can also access the GitHub repo here.
- Your hostname and user/password must match your own cluster's settings rather than the placeholders from our example (e.g., our example shows the master aggregator (ddlEndpoint) hostname as memsql-master.cluster.internal and the password as s3cur3-pa$$word). Did you replace these with your own cluster details? Otherwise, it will not work (see the configuration sketch after this list).
- What version of Spark are you using? The MemSQL 3.0.* connector versions are compatible with Spark 2.3 and 2.4. The 3.1.* line (in beta) is compatible with Spark 3. If you are using Spark 3, you should test our 3.1.* beta; if you are using Spark 2.3 or 2.4, you should test with our production releases of the 3.0.* connector (e.g., everything from 3.0.0 to 3.0.5 here).
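As referenced above, once the correct connector version is on the classpath, the 3.0 connector is configured through Spark settings and read via the memsql data source. Here is a minimal Scala sketch, where the endpoint, credentials, and test.my_table name are placeholders you would replace with your own details:

import org.apache.spark.sql.SparkSession

// Minimal sketch: the endpoint, user, password, and table below are
// placeholders from our example -- replace them with your own cluster details.
val spark = SparkSession.builder()
  .appName("memsql-connector-example")
  .config("spark.datasource.memsql.ddlEndpoint", "memsql-master.cluster.internal")
  .config("spark.datasource.memsql.user", "admin")
  .config("spark.datasource.memsql.password", "s3cur3-pa$$word")
  .getOrCreate()

// Read a table through the connector (database.table)
val df = spark.read
  .format("memsql")
  .load("test.my_table")

df.show()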
To add the dependency to your cluster, you technically do not need the jar file. If you do use the jar file, you will also need to install all of its dependencies manually (see below):
- To add the dependency when launching the shell (spark-shell or pyspark), you can use the following command:
$SPARK_HOME/bin/spark-shell --packages com.memsql:memsql-spark-connector_2.11:3.0.<insert-connector-version>-spark-<insert-spark-version>
For example, if you wanted to use Spark Connector version 3.0.5 w/ Spark 2.4, this would be:
$SPARK_HOME/bin/spark-shell --packages com.memsql:memsql-spark-connector_2.11:3.0.5-spark-2.4.4
Alternatively, you can use Maven or SBT to add the dependency.
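For example, in SBT, adding something like the following to build.sbt should pull the connector and its transitive dependencies from Maven (the version string here is one of the 3.0.* release artifacts; adjust it to match your Spark version):

scalaVersion := "2.11.12"
// Resolves the connector plus its transitive dependencies from Maven
libraryDependencies += "com.memsql" %% "memsql-spark-connector" % "3.0.5-spark-2.4.4"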
If you need to add the jar file manually, you will also have to install these dependencies:
"org.mariadb.jdbc" % "mariadb-java-client" % "2.+",
"io.spray" %% "spray-json" % "1.3.5",
If you do use the jar file, you can use a command similar to the --packages one above, via:
$SPARK_HOME/bin/spark-shell --jars <path_to_spark-connector>,<path_to_additional_lib1>,<path_to_additional_lib2>
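For instance, with hypothetical local paths (the exact file names will depend on the versions you download), that could look like:
$SPARK_HOME/bin/spark-shell --jars /path/to/memsql-spark-connector_2.11-3.0.5-spark-2.4.4.jar,/path/to/mariadb-java-client-2.4.2.jar,/path/to/spray-json_2.11-1.3.5.jar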
Best,
Roxanna