SingleStore Now 2024 Raffle

Note

This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.

The dataset used in this competition/demo contains e-commerce data about customers and the products they have purchased. In this notebook, we will run a few queries using SingleStore Kai, which lets us migrate MongoDB data and run MongoDB queries directly against SingleStore. To create your entry for the raffle, please open and complete the following form: https://forms.gle/n8KjTpJgPL29wFHV9

If you have any issues while completing the form, please reach out to a SingleStore team member at the event.

Install libraries and import modules

First, we will install the necessary dependencies into our notebook environment, including the Python libraries needed to run our queries.

In [1]:

!pip install pymongo pandas ipywidgets --quiet

To ensure we have a database to work with, we will check that one exists. If it doesn't, the notebook will create one for us.

In [2]:

shared_tier_check = %sql show variables like 'is_shared_tier'
if shared_tier_check and shared_tier_check[0][1] == 'ON':
    current_database = %sql SELECT DATABASE() as CurrentDatabase
    database_to_use = current_database[0][0]
else:
    database_to_use = "new_transactions"
    %sql CREATE DATABASE {{database_to_use}}

Next, let's import the needed dependencies, including pymongo, which we will use to connect both to SingleStore and to the MongoDB instance where the initial data is stored.

In [3]:

import os
import time
import numpy as np
import pandas as pd
import pymongo
from pymongo import MongoClient

Connect to Atlas and SingleStore Kai endpoints

Next, we will connect to the MongoDB Atlas instance using a Mongo client. We will need to connect to this instance to get our initial data, currently stored in Mongo.

In [4]:

# No need to edit anything
myclientmongodb = pymongo.MongoClient("mongodb+srv://mongo_sample_reader:SingleStoreRocks27017@cluster1.tfutgo0.mongodb.net/?retryWrites=true&w=majority")
mydbmongodb = myclientmongodb["new_transactions"]
mongoitems = mydbmongodb["items"]
mongocusts = mydbmongodb["custs"]
mongotxs = mydbmongodb["txs"]

Then, we will connect to the SingleStore Kai API, which allows us to import and access the Mongo data we will move over from MongoDB Atlas.

In [5]:

# connection_url_kai is provided by the SingleStore notebook environment
db_to_use = database_to_use
s2clientmongodb = pymongo.MongoClient(connection_url_kai)
s2dbmongodb = s2clientmongodb[db_to_use]
s2mongoitems = s2dbmongodb["items"]
s2mongocusts = s2dbmongodb["custs"]
s2mongotxs = s2dbmongodb["txs"]

Copy Atlas collections into SingleStore Kai

Next, we need to move our MongoDB data hosted in Atlas over to SingleStore. The following code replicates the selected Mongo collections into our SingleStore instance. This makes the MongoDB data available in SingleStore, allowing us to migrate away from MongoDB and perform all of our data storage and queries in a single database instead of maintaining multiple data silos.

In [6]:

mongocollections = [mongoitems, mongocusts, mongotxs]
for mongo_collection in mongocollections:
    df = pd.DataFrame(list(mongo_collection.find())).reset_index(drop=True)
    data_dict = df.to_dict(orient='records')
    s2mongo_collection = s2dbmongodb[mongo_collection.name]
    s2mongo_collection.insert_many(data_dict)
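For larger collections, copying documents in fixed-size batches avoids materializing the entire collection in memory at once. A minimal sketch of that pattern (the `chunked` helper is our own, not part of pymongo; the commented usage assumes the cursors defined above):

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage against the collections above (not run here):
# for batch in chunked(mongo_collection.find(), 1000):
#     s2mongo_collection.insert_many(batch)

# Quick sanity check on plain data:
batches = list(chunked(range(10), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each `insert_many` call then carries at most 1,000 documents, keeping memory usage bounded regardless of collection size.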

QUERY 1: Total quantity of products sold across all products

Our first query on the newly migrated data will retrieve the total quantity of products sold across all products in our dataset. As you'll see, even though we are running in SingleStore, we can still use MongoDB query syntax thanks to SingleStore Kai.

In [7]:

num_iterations = 10

# Pipeline for total quantity of products sold across all products
pipeline = [
    {"$group": {"_id": None, "totalQuantity": {"$sum": "$item.quantity"}}}
]

# Time the aggregation against SingleStore Kai
s2_times = []
for i in range(num_iterations):
    s2_start_time = time.time()
    s2_result = s2mongoitems.aggregate(pipeline)
    s2_stop_time = time.time()
    s2_times.append(s2_stop_time - s2_start_time)

# Retrieve the total quantity from the result
total_quantity = next(s2_result)["totalQuantity"] if s2_result else 0
print("Total Product Quantity Sold is", total_quantity)
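The `$group` stage above sums `item.quantity` over every document in the collection. On hypothetical sample documents (made up for illustration, not from the real dataset), the same computation in plain Python looks like this:

```python
# Hypothetical sample documents shaped like those in the `items` collection
sample_items = [
    {"item": {"name": "Mug", "quantity": 2}},
    {"item": {"name": "Shirt", "quantity": 5}},
    {"item": {"name": "Mug", "quantity": 3}},
]

# Equivalent of {"$group": {"_id": None, "totalQuantity": {"$sum": "$item.quantity"}}}
total_quantity = sum(doc["item"]["quantity"] for doc in sample_items)
print(total_quantity)  # 10
```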

ACTION ITEM!

Take the output from this query and put it into the ANSWER NUMBER 1 field in the Google Form.

QUERY 2: Top selling Product

Our next query will be to find the top selling product within our data. Once again, we are issuing a Mongo query against our SingleStore instance. If we had an application integrated with MongoDB but wanted to migrate to SingleStore, we could do so without having to rewrite the queries within our application!

In [8]:

# Pipeline to return the #1 selling product based on total quantity sold
pipeline = [
    {"$group": {
        "_id": "$item.name",                               # Group by product name
        "total_quantity_sold": {"$sum": "$item.quantity"}  # Sum of quantities sold
    }},
    {"$sort": {"total_quantity_sold": -1}},  # Sort by total quantity sold, descending
    {"$limit": 1}                            # Keep only the top product
]

s2_result = s2mongoitems.aggregate(pipeline)

# Retrieve the name of the #1 selling product
top_product = next(s2_result, None)
if top_product:
    product_name = top_product["_id"]
    total_quantity_sold = top_product["total_quantity_sold"]
else:
    product_name = "No Data"
    total_quantity_sold = 0

print("Top-Selling product:", product_name, "with total quantity sold:", total_quantity_sold)
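The group-sort-limit pattern in this pipeline maps neatly onto `collections.Counter` in plain Python. A sketch on made-up sample documents (hypothetical data, purely for illustration):

```python
from collections import Counter

# Hypothetical sample documents shaped like those in the `items` collection
sample_items = [
    {"item": {"name": "Hat", "quantity": 1}},
    {"item": {"name": "Mug", "quantity": 2}},
    {"item": {"name": "Shirt", "quantity": 4}},
    {"item": {"name": "Mug", "quantity": 3}},
]

# Equivalent of $group by name + $sum of quantity, then $sort desc + $limit 1
totals = Counter()
for doc in sample_items:
    totals[doc["item"]["name"]] += doc["item"]["quantity"]

top_name, top_qty = totals.most_common(1)[0]
print(top_name, top_qty)  # Mug 5
```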

ACTION ITEM!

Take the output from this query and put it into the ANSWER NUMBER 2 field in the Google Form.

QUERY 3: Top selling Location

In [9]:

# Pipeline to get the top-selling location, excluding "Online"
pipeline = [
    {"$lookup": {
        "from": "custs",
        "localField": "customer.email",
        "foreignField": "email",
        "as": "transaction_links",
    }},
    {"$match": {"store_location": {"$ne": "Online"}}},  # Exclude the Online location
    {"$limit": 100},
    {"$group": {
        "_id": {"location": "$store_location"},
        "count": {"$sum": 1}
    }},
    {"$sort": {"count": -1}},
    {"$limit": 1}
]

s2_result = s2mongotxs.aggregate(pipeline)

# Retrieve the top-selling location excluding "Online"
top_location = next(s2_result, None)
if top_location:
    location_name = top_location["_id"]["location"]
    transaction_count = top_location["count"]
else:
    location_name = "No Data"
    transaction_count = 0

print("Top-Selling Location:", location_name, "with transaction count:", transaction_count)

ACTION ITEM!

Take the output from this query and put it into the ANSWER NUMBER 3 field in the Google Form.

Clean up and submit!

Make sure to click submit on your Google Form to make sure you've been entered into the SingleStore NOW 2024 raffle!

Additionally, if you'd like to clean up your instance, you can run the statement below. To learn more about SingleStore, please connect with one of our SingleStore reps here at the conference!

Action Required

If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.

In [10]:

shared_tier_check = %sql show variables like 'is_shared_tier'
if not shared_tier_check or shared_tier_check[0][1] == 'OFF':
    %sql DROP DATABASE IF EXISTS new_transactions;

Details


About this Template

"Explore the power of SingleStore in this interactive notebook by creating an account, loading data, and running queries for a chance to win the SingleStore Now 2024 Raffle!"

This Notebook can be run in Shared Tier, Standard and Enterprise deployments.

Tags

mongo, embeddings, vector, genai, kai, starter

License

This Notebook has been released under the Apache 2.0 open source license.