Singlestore Helios now supports user-defined functions written in C, C++ and Rust with our new Code Engine — Powered by Wasm.
Application developers can now use libraries of existing code, or code from their applications, in database functions and call them from SQL. That can tremendously reduce the need to move data from the database to the application to analyze it, speeding things up and simplifying development. We illustrate the power of Code Engine for Wasm with a story about a developer who finds a new way to create an important sentiment analysis report — easier and faster.
You're an application developer for an eCommerce company, and you've been asked several times to produce reports that do sentiment analysis on the review comments for products your company has sold. It is getting mighty tedious because your sentiment analysis code lives in a Rust program, so every report requires you to export data to the app and score it there. Then, you have to write query processing-style logic in your app to create the report. It's honestly kind of fun, but you — and your boss — think your time would be better spent elsewhere.
That's when you learn from your DBA that your database, SingleStoreDB, has a new feature called Code Engine — Powered by Wasm that lets you extend the database engine with user-defined functions (UDFs) in C, C++ or Rust, and other languages soon. It dawns on you that you can take the sentiment analysis code from your application and move it into the database as a Wasm UDF. That means those reports people have been asking for can be created using a pretty straightforward SQL SELECT statement. Better yet, you can teach the analysts to write the queries themselves. Then, you don't even need to hear about it!
We'll hear more of this story later!
What Is Wasm?
Wasm is short for WebAssembly. It's a machine-independent instruction format and compilation target. Key points to remember about Wasm are:
- It supports many languages. Backends exist for a set of different source languages that can generate compiled .wasm files, which are like dynamic link libraries (DLLs) containing Wasm instructions. Wasm was created to enable developers to write code that can run in a web browser, using just about any language, not just the pervasive JavaScript language. This lets them build on libraries of existing code in those other languages.
- It's fast. Compiled .wasm files can run 30x or more faster than JavaScript. In some cases, they can be within 10% of the speed of C code compiled to native machine instructions.
- It's safe. To make this happen, the Wasm community has mastered the challenge of allowing code to run safely in the browser using robust sandboxing. In the browser environment, the compiled code (whether C, C++, Rust, Go or some other language) cannot escape the sandbox, period. No starting process. No opening files. No writing network messages. No system calls of any kind.
The Wasm world began in the browser. But it has spread to the server! Wasm runtimes like Wasmtime can now be embedded in many kinds of apps, not just browsers. SingleStoreDB embeds a Wasm runtime environment, enabling you to write UDFs in C, C++ Rust and (soon) more languages.
Benefits of Wasm Extensibility in SingleStoreDB
At the beginning of our story, our application developer suffered from having to export data to the application to get a sentiment score for it (slow), and the need to write application code to do query-processing-like things (labor intensive). SingleStoreDB's Code Engine — Powered by Wasm makes these sore points go away.
Benefits of Wasm extensibility include:
- Faster performance by moving the computation to the data, instead of moving data to the applications.
- It's not necessary to write your own query processing-like logic in the application; you can rely on the SingleStoreDB SQL processor to do it for you.
Sentiment Analysis in SQL With Wasm
Let's return to our developer tasked with creating sentiment analysis reports. The first report asks to find the five comments with the highest sentiment score. Let's suppose this is the schema:
create database demo;
use demo;
create table products(id int, name varchar(70), category varchar(70));
create table comments(id int, ts datetime, user varchar(30), pid int, comment_text text);
And this is some sample data:
iinsert into products values
(1,"running shoe","sporting goods"),
(2,"soccer ball","sporting goods"),
(3,"cotton balls","cosmetics");
insert into comments values
(1, "2022-06-25 22:11:25", "joe",1,"fantastic shoe"),
(2, "2022-06-25 22:58:01", "sue",2,"ball has poor bounce"),
(3, "2022-06-25 22:59:00", "amy",2,"amazingly durable ball, and it looks great"),
(4, "2022-06-25 23:05:10","mila",3,"cotton balls are nice and fluffy -- love them!"),
(5, "2022-06-25 23:06:37","joao",3,"cotton balls were not fluffy; I don't like this brand");
Our hero realizes that if there was a function "sentiment()", then they could write a simple SQL query to get the report data — instead of writing reams of code to pull the comments into the app and implement a non-trivial top-five calculation. That looks something like this:
select id, comment_text, sentiment(comment_text) as s
from comments
order by s desc
limit 5;
And gets these results:
+------+-------------------------------------------------------+---------------------+
| id | comment_text | s |
+------+-------------------------------------------------------+---------------------+
| 4 | cotton balls are nice and fluffy -- love them! | 0.8069730414548824 |
| 3 | amazingly durable ball, and it looks great | 0.6248933269389457 |
| 1 | fantastic shoe | 0.5573704017131537 |
| 5 | cotton balls were not fluffy; I don't like this brand | 0.20746990495811898 |
| 2 | ball has poor bounce | -0.4766576055745744 |
+------+-------------------------------------------------------+---------------------+
This sentiment() function combined with the power of SingleStore Code Engine for Wasm and SingleStoreDB's distributed and parallel query performance allows this calculation to happen many times faster than if you did it in the app, while also saving development time.
And of course, once you have ability to do sentiment scoring in SQL, you can vary your reports, using conventional SQL structures intermixed with sentiment analysis. For example, if you want to calculate the top three highest-sentiment comments for sporting goods that happened on 2022-06-25, you can do this:
select c.id, c.user, p.name,
c.comment_text, sentiment(c.comment_text) as s
from comments c, products p
where p.id = c.pid
and (c.ts :> date) = "2022-06-25"
and p.category = "sporting goods"
order by s desc
limit 3;
And get this result:
+------+------+--------------+--------------------------------------------+---------------------+
| id | user | name | comment_text | s |
+------+------+--------------+--------------------------------------------+---------------------+
| 3 | amy | soccer ball | amazingly durable ball, and it looks great | 0.6248933269389457 |
| 1 | joe | running shoe | fantastic shoe | 0.5573704017131537 |
| 2 | sue | soccer ball | ball has poor bounce | -0.4766576055745744 |
+------+------+--------------+--------------------------------------------+---------------------+
See the appendix for source code for the sentiment function, and how to create a Wasm UDF for it in the database.
Customer Validation
Our customers are excited about the possibilities of running application functions in SQL to get more value from their data more easily. Abel Mascarenhas, IT Unit Manager at Millennium BCP, the largest bank in Portugal, says:
“The Code Engine for Wasm in SingleStoreDB is a catalyst for extracting value from our data faster and cheaper by leveraging our enterprise code base in real-time SQL.”
Another one of our financial customers has a trading application that stores 4K-byte packets, which are messages sent by their application, in BINARY fields of records in SingleStoreDB. They'd like to implement a function to simply convert the binary packet to text so they can pattern match against it with LIKE filters, allowing people to read the contents of the field more easily in the output of queries.
Summary
The real beauty of SingleStoreDB Code Engine — Powered by Wasm is that it brings the power of existing code, whether from libraries or your applications, into SQL. That allows you to apply the logic of this code in a parallel and distributed fashion, close to the data and benefit from all the power of SQL. What logic will you move close to your data?
Experience Code Engine — Powered by Wasm for yourself. Try SingleStoreDB free.
Appendix: Code for Sentiment Function
Source Code
The source code for the sentiment analysis UDF is as follows:
/* file examples/sentimentudf/src/lib.rs */
wit_bindgen_rust::export!("sentiment.wit");
struct Sentiment;
impl sentiment::Sentiment for Sentiment {
fn sentiment(input: String) -> f64 {
lazy_static::lazy_static! {
static ref ANALYZER:
vader_sentiment::SentimentIntensityAnalyzer<'static> =
vader_sentiment::SentimentIntensityAnalyzer::new();
}
let scores = ANALYZER.polarity_scores(input.as_str());
scores["compound"]
}
}
WIT File
Create the Wasm Interface Types (WIT) file sentimentudf/sentiment.wit with the following contents:
sentiment: func(input: string) -> float64
Building the Code
To build it, use a Cargo.toml file with the following contents:
# Cargo.toml
[package]
name = "sentiment"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
wit-bindgen-rust = { git = "https://github.com/bytecodealliance/wit-bindgen.git" }
vader_sentiment = { git = "https://github.com/ckw017/vader-sentiment-rust" }
lazy_static = "1.4.0"
[lib]
crate-type = ["cdylib"]
And run:
cargo wasi build --lib --release
Uploading the Code to Cloud Storage
Then, upload the sentiment.wasm file from target/wasm32-wasi/release/ under your VS Code folder for the sentiment UDF project.
There are many different ways to make your sentiment function accessible to Singlestore Helios. One way is to upload it to an AWS S3 bucket. As an example you could:
- Start at https://s3.console.aws.amazon.com/s3
- Create a bucket mybucket
- Upload the sentiment.wasm file
- Then navigate to the properties page for the file
- And in the upper right corner, choose "Object Actions"
- Then "Share with a presigned URL"
- Enter a number of hours to share the file, say 12, and click the button to complete the action
- Then, on the upper right corner, choose "Copy presigned URL"
- Then paste the URL into a create function statement similar to the one in the script below, between the single quotes after wasm from http
- Repeat the above steps, but for the sentiment.wit file, except paste the URL between the single quotes after the wit from http clause
use demo;
create function sentiment as
wasm from http 'https://mybucket.s3.us-west-2.amazonaws.com/sentiment.wasm?response-content-disposition=inline&X-Amz-Security-Token=IQ<redacted>b7d8'
with wit from http 'https://hansonpublic.s3.us-west-2.amazonaws.com/sentiment.wit?response-content-disposition=inline&X-Amz-Security-Token=IQ<redacted>61ec'
;
Then, run the script. Now, you can run the DDL and SELECT statements from the section Sentiment Analysis in SQL with Wasm.
Sentiment Analysis with a Wasm TVF
For another variation of sentiment analysis that uses a Wasm table-valued function (TVF), a capability still in preview, clone this github project:
https://github.com/singlestore-labs/demo-sentiment-analysis
The TVF version outputs not just an overall score, but instead a single row with four values:
- compound
- positive
- negative
- neutral
After building and uploading the sentimentable function from this project, you can use it like this:
select c.comment_text,
format(s.compound,3) cpd, format(s.positive,3) pos,
format(s.negative,3) neg, format(s.neutral,3) ntrl
from comments c, sentimentable(c.comment_text) s;
There's an implicit lateral join (cross apply) between c and the sentimentable function. Here are the results:
+-------------------------------------------------------+--------+-------+-------+-------+
| comment_text | cpd | pos | neg | ntrl |
+-------------------------------------------------------+--------+-------+-------+-------+
| fantastic shoe | 0.557 | 0.783 | 0.000 | 0.217 |
| cotton balls were not fluffy; I don't like this brand | 0.207 | 0.185 | 0.000 | 0.815 |
| amazingly durable ball, and it looks great | 0.625 | 0.406 | 0.000 | 0.594 |
| ball has poor bounce | -0.477 | 0.000 | 0.508 | 0.492 |
| cotton balls are nice and fluffy -- love them! | 0.807 | 0.510 | 0.000 | 0.490 |
+-------------------------------------------------------+--------+-------+-------+-------+
So you can do sentiment analysis with a single simple score with a UDF if that suits your needs, or get all four components of sentiment analysis with a TVF if you need them.