Working with Vector Data
Notebook
Note
This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.
Using vector embeddings has become popular recently, but getting vector data into your database can leave you with a lot of questions. This notebook shows various ways to load vectors into SingleStoreDB from Python using the Python client, SQLAlchemy, pandas, and the SQL magic commaands. It covers vectors in the form of numpy arrays or Python lists of numerics.
We'll use the following function to reset the vector data table between examples.
In [1]:
1def reset_table():2 """Reset the table for use in the examples below."""3 with s2.connect() as conn:4 with conn.cursor() as cur:5 cur.execute('DROP TABLE IF EXISTS vectors;')6 cur.execute(r'''7 CREATE TABLE vectors (8 vec_f32 BLOB9 );10 ''')
At any time, if you want to see the actual query being sent to the database, you can set the following environment variable before making the query to the server.
In [2]:
1import os2 3# os.environ['SINGLESTOREDB_DEBUG_QUERIES'] = '1'
Action Required
If you have a Free Starter Workspace deployed already, select the database from drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
Create a database for our examples.
In [3]:
1shared_tier_check = %sql show variables like 'is_shared_tier'2if not shared_tier_check or shared_tier_check[0][1] == 'OFF':3 %sql DROP DATABASE IF EXISTS vector_data;4 %sql CREATE DATABASE vector_data;
Action Required
Make sure to select the vector_data database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by the %%sql magic command and SQLAlchemy to make connections to the selected database.
Generate numpy arrays containing vector data
The code belowe generates 1,000 rows of 10 random 32-bit float numbers in a numpy array. This data will be used in the following examples.
In [4]:
1import numpy as np
In [5]:
1vec_f32 = [np.random.rand(10).astype(np.float32) for _ in range(1000)]2vec_f32[:3]
Out [5]:
[array([0.94529617, 0.5391597 , 0.3666218 , 0.27040002, 0.33434793,
0.81942284, 0.7387169 , 0.0188459 , 0.07248586, 0.5413058 ],
dtype=float32),
array([0.39620587, 0.56646174, 0.09738464, 0.6073699 , 0.86925113,
0.40876037, 0.17535466, 0.5120548 , 0.03570552, 0.842908 ],
dtype=float32),
array([0.10918448, 0.43081337, 0.03388631, 0.55986017, 0.80183506,
0.6763027 , 0.25283858, 0.41930103, 0.7678156 , 0.13405219],
dtype=float32)]Create a Python list of float values from the numpy array
We will show how to work with both numpy arrays and Python lists in the following examples. This cell creates a list of Python lists of floats equivalent to the numpy arrays above.
In [6]:
1vec_f32_list = [list([float(y) for y in x]) for x in vec_f32]2vec_f32_list[:3]
Out [6]:
[[0.9452961683273315,
0.5391597151756287,
0.36662179231643677,
0.2704000174999237,
0.3343479335308075,
0.8194228410720825,
0.7387169003486633,
0.018845897167921066,
0.07248586416244507,
0.5413057804107666],
[0.396205872297287,
0.5664617419242859,
0.09738463908433914,
0.6073698997497559,
0.8692511320114136,
0.4087603688240051,
0.17535465955734253,
0.5120548009872437,
0.03570551797747612,
0.8429080247879028],
[0.10918448120355606,
0.43081337213516235,
0.03388631343841553,
0.5598601698875427,
0.8018350601196289,
0.6763026714324951,
0.2528385818004608,
0.41930103302001953,
0.7678155899047852,
0.13405218720436096]]Upload and downloading data to SingleStoreDB
In the following sections, we'll describe how to use the SingleStoreDB Python client, SQLAlchemy, the %%sql magic,
and pandas to upload and download vector data.
Using SingleStoreDB Python client
In [7]:
1import singlestoredb as s22 3conn = s2.connect()4cursor = conn.cursor()
Working with numpy arrays
The SingleStoreDB Python client supports numpy arrays natively. If a numpy array is passed as a parameter to a query,
it will be converted to a byte string containing the contents of the array. The data type of the numpy array is
preserved, so you need to ensure that it is the proper numpy dtype before uploading. You can change the data type
of a numpy array by using the astype method.
In [8]:
1reset_table()
Recall that vec_f32 contained numpy arrays of float32 values.
In [9]:
1vec_f32[:3]
Out [9]:
[array([0.94529617, 0.5391597 , 0.3666218 , 0.27040002, 0.33434793,
0.81942284, 0.7387169 , 0.0188459 , 0.07248586, 0.5413058 ],
dtype=float32),
array([0.39620587, 0.56646174, 0.09738464, 0.6073699 , 0.86925113,
0.40876037, 0.17535466, 0.5120548 , 0.03570552, 0.842908 ],
dtype=float32),
array([0.10918448, 0.43081337, 0.03388631, 0.55986017, 0.80183506,
0.6763027 , 0.25283858, 0.41930103, 0.7678156 , 0.13405219],
dtype=float32)]The executemany method will insert multiple rows of data in a single SQL query.
In [10]:
1cursor.executemany('INSERT INTO vectors(vec_f32) VALUES (%s)', vec_f32)
Out [10]:
1000To download the vector data from SingleStoreDB, you simple execute a SELECT statement. The data is held in
blob columns, so the result will simply contain byte strings.
In [11]:
1cursor.execute('SELECT vec_f32 FROM vectors LIMIT 5')
Out [11]:
5Since we want to use the data as numpy arrays, we can "reconstitute" the arrays as we read the data using the np.frombuffer function.
In [12]:
1out_f32 = [np.frombuffer(x[0], dtype=np.float32) for x in cursor]2out_f32
Out [12]:
[array([0.7970012 , 0.34688511, 0.14492278, 0.73332036, 0.37237272,
0.15225586, 0.26400378, 0.1706023 , 0.2066024 , 0.2986435 ],
dtype=float32),
array([0.15736586, 0.7116634 , 0.55218774, 0.8527479 , 0.8662606 ,
0.21808125, 0.19137949, 0.19946271, 0.31750116, 0.4048979 ],
dtype=float32),
array([0.329683 , 0.7214109 , 0.48456433, 0.6002015 , 0.4030805 ,
0.32164323, 0.02634622, 0.10913838, 0.13508031, 0.33974582],
dtype=float32),
array([0.35889304, 0.37261793, 0.22267127, 0.57628405, 0.10873934,
0.66360027, 0.67708856, 0.69097304, 0.8924684 , 0.07560002],
dtype=float32),
array([0.46105748, 0.24171682, 0.3059963 , 0.95824414, 0.33805165,
0.30686185, 0.89336896, 0.70329565, 0.45199844, 0.6623023 ],
dtype=float32)]Working with Python lists
It is also possible to upload Python lists without going through a numpy array using the struct package. In this method, we convert
the floats to a byte string and pass that byte string as the parameter to the INSERT statement. The possible format codes are as follows.
The little-endian indicator (<) should also be used.
f - float32
d - float64
b - int8
h - int16
l - int32
q - int64
In [13]:
1reset_table()
In [14]:
1import struct2 3# Construct the format for a vector of 10 32-bit floats, in this case it is '<10f'4fmt = '<{}f'.format(len(vec_f32_list[0]))5 6vec_f32_list_bytes = [struct.pack(fmt, *x) for x in vec_f32_list]7vec_f32_list_bytes[:3]
Out [14]:
[b'\xee\xfeq?_\x06\n?\xda\xb5\xbb>\xdfq\x8a>\xa7/\xab>\xb2\xc5Q?\x8d\x1c=?\xb6b\x9a<xs\x94=\x04\x93\n?',
b'\x7f\xdb\xca>\xa3\x03\x11?\x99q\xc7=\x98|\x1b?>\x87^?\nI\xd1>,\x903>\x06\x16\x03?\xf3?\x12=\xd2\xc8W?',
b'\x1d\x9c\xdf=\x92\x93\xdc>`\xcc\n=\xffR\x0f?\x10EM?,"-?\x0ft\x81>\xa0\xae\xd6>\x90\x8fD?\xfaD\t>']The INSERT and SELECT code is the same as for numy arrays
In [15]:
1cursor.executemany('INSERT INTO vectors(vec_f32) VALUES (%s)', vec_f32_list_bytes)
Out [15]:
1000In [16]:
1cursor.execute('SELECT vec_f32 FROM vectors LIMIT 5')
Out [16]:
5To unpack the rows as Python lists, we use the struct package again.
In [17]:
1out_f32_list = [list(struct.unpack(fmt, x[0])) for x in cursor]2out_f32_list[:3]
Out [17]:
[[0.3937252461910248,
0.5025281310081482,
0.17226243019104004,
0.7201003432273865,
0.987917423248291,
0.36919161677360535,
0.03498654067516327,
0.7478368878364563,
0.34253644943237305,
0.33940786123275757],
[0.761231005191803,
0.8932342529296875,
0.06776423007249832,
0.8769919276237488,
0.48779383301734924,
0.9544709920883179,
0.8270399570465088,
0.9150049686431885,
0.8350704908370972,
0.9739500880241394],
[0.9656015634536743,
0.4987963140010834,
0.6006644368171692,
0.000701306969858706,
0.5339081287384033,
0.22828376293182373,
0.3365790545940399,
0.2838159203529358,
0.3415278196334839,
0.7082713842391968]]Using SQLAlchemy
In order to use SingleStoreDB with SQLAlchemy, you need to install the sqlalchemy-singlestoredb dialect as follows.
pip install sqlalchemy-singlestoredb
In [18]:
1import sqlalchemy as sa2 3eng = sa.create_engine(connection_url)4conn = eng.connect()
The SQLAlchemy method works much like the SingleStoreDB method. However, SQLAlchemy (v2+) requires parameters to be
in a dictionary, and the substitution syntax is of the form :var_name where 'var_name' in the key in the dictionary.
Working with numpy arrays
In [19]:
1reset_table()
SQLAlchemy requires you to construct the query as a sa.text object. Parameters for inserting multple
rows are in a list of dictionaries.
In [20]:
1query = sa.text('INSERT INTO vectors(vec_f32) VALUES (:vec_f32)')2conn.execute(query, [dict(vec_f32=x) for x in vec_f32])
Out [20]:
<sqlalchemy.engine.cursor.CursorResult at 0x7ed3e54ea120>Selecting the data works much as before as well.
In [21]:
1result = conn.execute(sa.text('SELECT vec_f32 FROM vectors LIMIT 5'))
We can use the np.frombuffer function again to convert the byte strings to numpy arrays.
In [22]:
1out_f32 = [np.frombuffer(x[0], dtype=np.float32) for x in result]2out_f32
Out [22]:
[array([0.06707381, 0.3336899 , 0.23638362, 0.54146034, 0.21330866,
0.57814604, 0.7436944 , 0.21778256, 0.32921487, 0.18143076],
dtype=float32),
array([0.17625922, 0.6122456 , 0.65093136, 0.680956 , 0.99456173,
0.785619 , 0.8397423 , 0.34446132, 0.9549833 , 0.53008443],
dtype=float32),
array([0.12105445, 0.27007556, 0.33191404, 0.35239697, 0.104354 ,
0.560923 , 0.95614606, 0.6793355 , 0.12789273, 0.01870769],
dtype=float32),
array([0.05535996, 0.13312466, 0.9434161 , 0.52270526, 0.24034844,
0.73964477, 0.8723515 , 0.02157358, 0.14537902, 0.8052284 ],
dtype=float32),
array([0.17335513, 0.87024 , 0.11818643, 0.40915504, 0.65390265,
0.519701 , 0.1028851 , 0.8442223 , 0.64491796, 0.31468135],
dtype=float32)]Working with Python lists
To upload Python lists of values, you use the struct package to construct the byte strings as described in the
"Uploading Python Lists" in the previous section. The rest of the code here stays the same with the exception of
replacing vec_f32 with vec_f32_list_bytes as the query parameter for the INSERT query.
In [23]:
1reset_table()
Recall that we create a list of bytes (vector) objects in the previous example. This list of vectors
can be passed to the INSERT as well as numpy arrays.
In [24]:
1vec_f32_list_bytes[:3]
Out [24]:
[b'\xee\xfeq?_\x06\n?\xda\xb5\xbb>\xdfq\x8a>\xa7/\xab>\xb2\xc5Q?\x8d\x1c=?\xb6b\x9a<xs\x94=\x04\x93\n?',
b'\x7f\xdb\xca>\xa3\x03\x11?\x99q\xc7=\x98|\x1b?>\x87^?\nI\xd1>,\x903>\x06\x16\x03?\xf3?\x12=\xd2\xc8W?',
b'\x1d\x9c\xdf=\x92\x93\xdc>`\xcc\n=\xffR\x0f?\x10EM?,"-?\x0ft\x81>\xa0\xae\xd6>\x90\x8fD?\xfaD\t>']In [25]:
1query = sa.text('INSERT INTO vectors(vec_f32) VALUES (:vec_f32)')2conn.execute(query, [dict(vec_f32=x) for x in vec_f32_list_bytes])
Out [25]:
<sqlalchemy.engine.cursor.CursorResult at 0x7ed3e54e9e10>In [26]:
1result = conn.execute(sa.text('SELECT vec_f32 FROM vectors LIMIT 5'))
Unpacking the Python lists works as before as well.
In [27]:
1out_f32_list = [list(struct.unpack(fmt, x[0])) for x in result]2out_f32_list[:3]
Out [27]:
[[0.6067176461219788,
0.2106485515832901,
0.6345869302749634,
0.6352039575576782,
0.829525887966156,
0.2823314368724823,
0.017225714400410652,
0.22034095227718353,
0.24362443387508392,
0.7712428569793701],
[0.26649677753448486,
0.6021978259086609,
0.8979067206382751,
0.9429398775100708,
0.589701771736145,
0.24339258670806885,
0.3752290904521942,
0.34352484345436096,
0.647399365901947,
0.19694264233112335],
[0.6997039914131165,
0.08066725730895996,
0.19695895910263062,
0.08963707834482193,
0.3289657235145569,
0.8245747089385986,
0.782729983329773,
0.43013912439346313,
0.765410304069519,
0.8552709817886353]]Using pandas
The pandas package has utilities for working with databases. The two primary methods / functions are
DataFrame.to_sql which uploads DataFrame data to a table, and pd.read_sql which downloads
data from a table.
In [28]:
1reset_table()
In [29]:
1import pandas as pd
First, we'll create a pandas DataFrame with our numpy arrays.
Working with numpy arrays
In [30]:
1df = pd.DataFrame(dict(vec_f32=pd.Series(vec_f32)))2df.head()
Out [30]:
| vec_f32 | |
|---|---|
| 0 | [0.94529617, 0.5391597, 0.3666218, 0.27040002,... |
| 1 | [0.39620587, 0.56646174, 0.09738464, 0.6073699... |
| 2 | [0.10918448, 0.43081337, 0.033886313, 0.559860... |
| 3 | [0.027094776, 0.03226529, 0.49422556, 0.171387... |
| 4 | [0.65606296, 0.022113293, 0.57438064, 0.867151... |
We can use the to_sql method of the DataFrame to upload the data. Notice that we are using the SQLAlchemy
connection we created in the previous section as the con parameter.
In [31]:
1df.to_sql('vectors', con=conn, if_exists='append', index=False)
Out [31]:
1000To read the data, we use the read_sql function. As before, we are getting byte strings back that will need to be
converted into numpy arrays.
In [32]:
1out_df = pd.read_sql('vectors', con=conn)2out_df.head(3)
Out [32]:
| vec_f32 | |
|---|---|
| 0 | b"\xc4.\x81<\x8a\x0fW?\x11~\xcd>_\x82@?Vq\x05?... |
| 1 | b'rp\xdc>U\xd7\x89>\xe6BC?\xe7\xcd\xfb>P\xe4\x... |
| 2 | b"\xaf\x10,?\xc9\x8c\\?\xa3\xccQ>c\xd0'?\xe2y\... |
We apply the np.frombuffer function to each element in the vec_f32 column to reconstruct the numpy array.
In [33]:
1out_df['vec_f32'] = out_df['vec_f32'].apply(lambda x: np.frombuffer(x, dtype=np.float32))
In [34]:
1out_df.head(3)
Out [34]:
| vec_f32 | |
|---|---|
| 0 | [0.01576937, 0.84008086, 0.40135244, 0.7519893... |
| 1 | [0.4305454, 0.26922098, 0.76273954, 0.4918053,... |
| 2 | [0.6721296, 0.8615232, 0.20488219, 0.6555235, ... |
Working with Python lists
Because Python lists are not typed arrays like numpy arrays, we have to convert them to bytes before uploading them.
In [35]:
1reset_table()
Construct a DataFrame using Python lists as the data.
In [36]:
1df = pd.DataFrame(dict(vec_f32=vec_f32_list))2df.head(3)
Out [36]:
| vec_f32 | |
|---|---|
| 0 | [0.9452961683273315, 0.5391597151756287, 0.366... |
| 1 | [0.396205872297287, 0.5664617419242859, 0.0973... |
| 2 | [0.10918448120355606, 0.43081337213516235, 0.0... |
Note that we are using our fmt value from a previous section to convert the Python lists
to bytes using struct.pack.
In [37]:
1fmt
Out [37]:
'<10f'In [38]:
1df['vec_f32'] = df['vec_f32'].apply(lambda x: struct.pack(fmt, *x))
In [39]:
1df['vec_f32'].head(3)
Out [39]:
0 b'\xee\xfeq?_\x06\n?\xda\xb5\xbb>\xdfq\x8a>\xa...
1 b'\x7f\xdb\xca>\xa3\x03\x11?\x99q\xc7=\x98|\x1...
2 b'\x1d\x9c\xdf=\x92\x93\xdc>`\xcc\n=\xffR\x0f?...
Name: vec_f32, dtype: objectUse the to_sql method to upload the DataFrame.
In [40]:
1df.to_sql('vectors', con=conn, if_exists='append', index=False)
Out [40]:
1000In [41]:
1out_df = pd.read_sql('vectors', con=conn)2out_df.head(3)
Out [41]:
| vec_f32 | |
|---|---|
| 0 | b'>kS?;\xec\x8b>0\xe5\x19?\r\xddF?\xf2\x044?u\... |
| 1 | b'\xedJ[=\xbfq\xce=G#\xa4<\xcbV\xe3=\xeb:;?\xa... |
| 2 | b'\x0e\x08n>\xe8\xb2\x98>\x10\x133>\xd4\xf7\x1... |
We now have to convert the byte strings back to Python lists.
In [42]:
1out_df['vec_f32'] = out_df['vec_f32'].apply(lambda x: list(struct.unpack(fmt, x)))
In [43]:
1out_df.head(3)
Out [43]:
| vec_f32 | |
|---|---|
| 0 | [0.8258551359176636, 0.2732866704463959, 0.601... |
| 1 | [0.0535382516682148, 0.10080289095640182, 0.02... |
| 2 | [0.2324526011943817, 0.29823994636535645, 0.17... |
Using the %%sql / %sql magic commands
While the SQL magic commands are convenient for invoking basic SQL commands, they aren't quite as good for complex queries that insert data. The primary issue is that you must construct the query as a string and ensure that all of your data is properly escaped. We'll demonstrate some basics here, but the methods described in the previous sections are likely to work better.
In [44]:
1reset_table()
Working with numpy arrays or Python lists
The SQL magic commands do not do any of the automatic conversions of data to query parameters, so this must be done manually before creating the query. This is done the same way whether the source is numpy arrays or Python lists. In either case, you must convert the objects to byte strings as we have in the previous sections, then convert that byte string into a hex literal that can be used in the query.
In [45]:
1# Convert an element of the numpy array to a hex string2vec_f32[0].tobytes().hex()
Out [45]:
'eefe713f5f060a3fdab5bb3edf718a3ea72fab3eb2c5513f8d1c3d3fb6629a3c7873943d04930a3f'In [46]:
1# Convert an element of the Python list to a hex string2struct.pack(fmt, *vec_f32_list[0]).hex()
Out [46]:
'eefe713f5f060a3fdab5bb3edf718a3ea72fab3eb2c5513f8d1c3d3fb6629a3c7873943d04930a3f'To construct the query string for the %%sql command, we need to build the entire list of values to insert
in a separate step. We'll insert the X at the beginning of the string to indicate a hex literal to
SingleStoreDB. We'll also add the parentheses around the value for inserting multiple rows of data using
the INSERT statement.
In [47]:
1params = ["(X'{}')".format(x.tobytes().hex()) for x in vec_f32]2params[:3]
Out [47]:
["(X'eefe713f5f060a3fdab5bb3edf718a3ea72fab3eb2c5513f8d1c3d3fb6629a3c7873943d04930a3f')",
"(X'7fdbca3ea303113f9971c73d987c1b3f3e875e3f0a49d13e2c90333e0616033ff33f123dd2c8573f')",
"(X'1d9cdf3d9293dc3e60cc0a3dff520f3f10454d3f2c222d3f0f74813ea0aed63e908f443ffa44093e')"]In [48]:
1%%sql2INSERT INTO vectors(vec_f32) VALUES {{ ','.join(params) }}
Out [48]:
We can now select the data.
In [49]:
1%%sql out <<2SELECT * FROM vectors LIMIT 5
In [50]:
1out
Out [50]:
| vec_f32 |
|---|
| b'\xfe\xd7@?\xecr\xec>\xbdW1?\xbb_\xcb>\x88\xb9\xf4>\x04G\xaa>#d\xf3=\x07\xb5\xcb>\xcd\xd7&?{Es?' |
| b'\x0e\x08n>\xe8\xb2\x98>\x10\x133>\xd4\xf7\x1b?Q$-?t\x11\xfa>,}S?\xa8\x14k;\x1a\xf8h>\xf8\xbc-?' |
| b'1~\x01>\xb7>/?\xd3\x10\x1f?Z\xcc\x05=>X\xa8>\x01\r\x1b>Q\xf1\xc3>/-\xab=\xea\x9c6?\xbc\xd6|?' |
| b'\xe3\x02\xba=zz)?<\xa33?\x15\x03\x14>\x99\x97\x19?(\x13#?!\xe9\xe9>\xba#i?\xdc\xe1y?\xe1\xc1)?' |
| b'\x8c\x86\x12?\xc3+h?\xba?=<\xa45\xda>\xb5\xf2\r?\xdb\xa8\x08?]\x84&?\xf2\xd1s?\xec\xce\xab>\x10\x19k?' |
At this point, there is nothing we can do with SQL magic commands to convert the data back into numpy arrays or Python lists. We need to drop to Python for that.
In [51]:
1out_df = pd.DataFrame(out)
In [52]:
1out_df['vec_f32'] = out_df['vec_f32'].apply(lambda x: np.frombuffer(x, dtype=np.float32))
In [53]:
1out_df.head(3)
Out [53]:
| vec_f32 | |
|---|---|
| 0 | [0.7532958, 0.46181428, 0.69274503, 0.39721474... |
| 1 | [0.2324526, 0.29823995, 0.1748774, 0.6092503, ... |
| 2 | [0.12645794, 0.6845507, 0.62135047, 0.03266558... |
Using JSON
It is also possible to use JSON to create vectors, however, this method require serializing and deserializing JSON on either
end which isn't quite a efficient as the techniques above. It also requires using the JSON_ARRAY_PACK and JSON_ARRAY_UNPACK
functions in your queries to go back and forth between the vector bytes and JSON. Here is an example of inserting the
Python list of floats.
In [54]:
1import json
In [55]:
1params = ['(JSON_ARRAY_PACK("{}"))'.format(json.dumps(x)) for x in vec_f32_list]2params[:3]
Out [55]:
['(JSON_ARRAY_PACK("[0.9452961683273315, 0.5391597151756287, 0.36662179231643677, 0.2704000174999237, 0.3343479335308075, 0.8194228410720825, 0.7387169003486633, 0.018845897167921066, 0.07248586416244507, 0.5413057804107666]"))',
'(JSON_ARRAY_PACK("[0.396205872297287, 0.5664617419242859, 0.09738463908433914, 0.6073698997497559, 0.8692511320114136, 0.4087603688240051, 0.17535465955734253, 0.5120548009872437, 0.03570551797747612, 0.8429080247879028]"))',
'(JSON_ARRAY_PACK("[0.10918448120355606, 0.43081337213516235, 0.03388631343841553, 0.5598601698875427, 0.8018350601196289, 0.6763026714324951, 0.2528385818004608, 0.41930103302001953, 0.7678155899047852, 0.13405218720436096]"))']In [56]:
1%%sql2INSERT INTO vectors(vec_f32) VALUES {{ ','.join(params) }}
Out [56]:
If you use the JSON_ARRAY_UNPACK function in your SELECT statement, you can download the data as JSON.
In [57]:
1%%sql out <<2SELECT JSON_ARRAY_UNPACK(vec_f32) AS 'vec_f32' FROM vectors LIMIT 5
In [58]:
1out = pd.DataFrame(out)2out
Out [58]:
| vec_f32 | |
|---|---|
| 0 | [0.177021876, 0.717939079, 0.825487614, 0.7764... |
| 1 | [0.749338746, 0.589595854, 0.704872251, 0.9270... |
| 2 | [0.971682549, 0.574555218, 0.174982488, 0.4692... |
| 3 | [0.814588428, 0.773147047, 0.970053494, 0.9038... |
| 4 | [0.247024894, 0.828292727, 0.599695325, 0.4499... |
In [59]:
1out['vec_f32'][0]
Out [59]:
[0.177021876,
0.717939079,
0.825487614,
0.77646929,
0.137723535,
0.358667195,
0.41495508,
0.027805429,
0.291372836,
0.413403481]Notice that since the data type of the column in the SELECT is JSON, it automatically gets converted to a Python list
in the client.
In [60]:
1type(out['vec_f32'][0])
Out [60]:
listConclusion
As you can see, there are various interfaces available for uploading and downloading vector data. Depending on which Python framework you are using and what format your data is in, you can pick and choose which methods work for your use-case.
Action Required
If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
In [61]:
1shared_tier_check = %sql show variables like 'is_shared_tier'2if not shared_tier_check or shared_tier_check[0][1] == 'OFF':3 %sql DROP DATABASE IF EXISTS vector_data;

Details
About this Template
This example demonstrates the various ways to upload and download vector data from SingleStoreDB using Python.
This Notebook can be run in Shared Tier, Standard and Enterprise deployments.
Tags
License
This Notebook has been released under the Apache 2.0 open source license.
See Notebook in action
Launch this notebook in SingleStore and start executing queries instantly.