Hello Roxanna,
Thanks for the reply.
I tried your solution.
df.write
  .format("memsql")
  .option("loadDataCompression", "LZ4")
  .option("truncate", "false")
  .mode(SaveMode.Overwrite)
  .save("foo.bar")
But I received two errors:
Failed to find data source: memsql. Please find packages at Third-Party Projects | Apache Spark
Exception in thread "main" java.util.NoSuchElementException: No value found for 'LZ4'
I am not able to change the Maven dependency from 2.0.1 to 3.0.0-beta-spark-2.3.4 because Maven throws a "dependency not found" error.
So I stayed on the current 2.0.1 and modified the code as follows:
df.write
  .format("com.memsql.spark.connector")
  .option("truncate", "false")
  .mode(SaveMode.Overwrite)
  .save("foo.bar")
but the duplicates are still there; the table is not overwritten.
My MemSQL dependency in Maven is:
<dependency>
    <groupId>com.memsql</groupId>
    <artifactId>memsql-connector_2.11</artifactId>
    <version>2.0.1</version>
</dependency>
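I wonder whether the 3.0.0 beta is published under a different artifactId than the 2.x connector, which could explain the "dependency not found" error. If so, the dependency might need to look something like this (the artifactId memsql-spark-connector_2.11 is my guess, not verified against the Maven repository):

```xml
<!-- Guessed coordinates for the 3.0.0 beta connector; artifactId unverified -->
<dependency>
    <groupId>com.memsql</groupId>
    <artifactId>memsql-spark-connector_2.11</artifactId>
    <version>3.0.0-beta-spark-2.3.4</version>
</dependency>
```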
I created a DataFrame
val df1 = Seq(empTable(1, "Tim", "dev"), empTable(2, "Tom", "dev"), empTable(3, "Fank", "hr")).toDF
and wrote to memsql:
df1.write
  .format("com.memsql.spark.connector")
  .option("truncate", "false")
  .mode(SaveMode.Overwrite)
  .save("foo.bar")
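For reference, empTable in the snippets above is a simple case class along these lines (the field names here are my paraphrase of the actual schema, included only so the snippets are self-contained):

```scala
// Schema for the test rows; the field names become MemSQL column names
case class empTable(id: Int, name: String, dept: String)
```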
Then I created another DataFrame and wrote it to the same MemSQL table:
val df2 = Seq(empTable(2, "Tom", "dev"), empTable(3, "Fank", "hr"), empTable(4, "kim", "hr")).toDF
df2.write
  .format("com.memsql.spark.connector")
  .option("truncate", "false")
  .mode(SaveMode.Overwrite)
  .save("foo.bar")
Ideally the MemSQL table should now contain 4 records, but it has 6, i.e., duplicates were created.
I also noticed an additional column, 'memsql insert time', created automatically, which is different for each record. Could that be the reason?
Regards
Smitha