Working with Vector Data
Notebook
Note
This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace navigate to Start using the left nav. You can also use your existing Standard or Premium workspace with this Notebook.
Using vector embeddings has become popular recently, but getting vector data into your database can leave you with a lot of questions. This notebook shows various ways to load vectors into SingleStoreDB from Python using the Python client, SQLAlchemy, pandas, and the SQL magic commaands. It covers vectors in the form of numpy arrays or Python lists of numerics.
We'll use the following function to reset the vector data table between examples.
In [1]:
def reset_table():"""Reset the table for use in the examples below."""with s2.connect() as conn:with conn.cursor() as cur:cur.execute('DROP TABLE IF EXISTS vectors;')cur.execute(r'''CREATE TABLE vectors (vec_f32 BLOB);''')
At any time, if you want to see the actual query being sent to the database, you can set the following environment variable before making the query to the server.
In [2]:
import os# os.environ['SINGLESTOREDB_DEBUG_QUERIES'] = '1'
Action Required
If you have a Free Starter Workspace deployed already, select the database from drop-down menu at the top of this notebook. It updates the connection_url to connect to that database.
Create a database for our examples.
In [3]:
shared_tier_check = %sql show variables like 'is_shared_tier'if not shared_tier_check or shared_tier_check[0][1] == 'OFF':%sql DROP DATABASE IF EXISTS vector_data;%sql CREATE DATABASE vector_data;
Action Required
Make sure to select the vector_data database from the drop-down menu at the top of this notebook. It updates the connection_url which is used by the %%sql magic command and SQLAlchemy to make connections to the selected database.
Generate numpy arrays containing vector data
The code belowe generates 1,000 rows of 10 random 32-bit float numbers in a numpy array. This data will be used in the following examples.
In [4]:
import numpy as np
In [5]:
vec_f32 = [np.random.rand(10).astype(np.float32) for _ in range(1000)]vec_f32[:3]
Out [5]:
[array([0.94529617, 0.5391597 , 0.3666218 , 0.27040002, 0.33434793,
0.81942284, 0.7387169 , 0.0188459 , 0.07248586, 0.5413058 ],
dtype=float32),
array([0.39620587, 0.56646174, 0.09738464, 0.6073699 , 0.86925113,
0.40876037, 0.17535466, 0.5120548 , 0.03570552, 0.842908 ],
dtype=float32),
array([0.10918448, 0.43081337, 0.03388631, 0.55986017, 0.80183506,
0.6763027 , 0.25283858, 0.41930103, 0.7678156 , 0.13405219],
dtype=float32)]
Create a Python list of float values from the numpy array
We will show how to work with both numpy arrays and Python lists in the following examples. This cell creates a list of Python lists of floats equivalent to the numpy arrays above.
In [6]:
vec_f32_list = [list([float(y) for y in x]) for x in vec_f32]vec_f32_list[:3]
Out [6]:
[[0.9452961683273315,
0.5391597151756287,
0.36662179231643677,
0.2704000174999237,
0.3343479335308075,
0.8194228410720825,
0.7387169003486633,
0.018845897167921066,
0.07248586416244507,
0.5413057804107666],
[0.396205872297287,
0.5664617419242859,
0.09738463908433914,
0.6073698997497559,
0.8692511320114136,
0.4087603688240051,
0.17535465955734253,
0.5120548009872437,
0.03570551797747612,
0.8429080247879028],
[0.10918448120355606,
0.43081337213516235,
0.03388631343841553,
0.5598601698875427,
0.8018350601196289,
0.6763026714324951,
0.2528385818004608,
0.41930103302001953,
0.7678155899047852,
0.13405218720436096]]
Upload and downloading data to SingleStoreDB
In the following sections, we'll describe how to use the SingleStoreDB Python client, SQLAlchemy, the %%sql
magic,
and pandas to upload and download vector data.
Using SingleStoreDB Python client
In [7]:
import singlestoredb as s2conn = s2.connect()cursor = conn.cursor()
Working with numpy arrays
The SingleStoreDB Python client supports numpy arrays natively. If a numpy array is passed as a parameter to a query,
it will be converted to a byte string containing the contents of the array. The data type of the numpy array is
preserved, so you need to ensure that it is the proper numpy dtype before uploading. You can change the data type
of a numpy array by using the astype
method.
In [8]:
reset_table()
Recall that vec_f32
contained numpy arrays of float32 values.
In [9]:
vec_f32[:3]
Out [9]:
[array([0.94529617, 0.5391597 , 0.3666218 , 0.27040002, 0.33434793,
0.81942284, 0.7387169 , 0.0188459 , 0.07248586, 0.5413058 ],
dtype=float32),
array([0.39620587, 0.56646174, 0.09738464, 0.6073699 , 0.86925113,
0.40876037, 0.17535466, 0.5120548 , 0.03570552, 0.842908 ],
dtype=float32),
array([0.10918448, 0.43081337, 0.03388631, 0.55986017, 0.80183506,
0.6763027 , 0.25283858, 0.41930103, 0.7678156 , 0.13405219],
dtype=float32)]
The executemany
method will insert multiple rows of data in a single SQL query.
In [10]:
cursor.executemany('INSERT INTO vectors(vec_f32) VALUES (%s)', vec_f32)
Out [10]:
1000
To download the vector data from SingleStoreDB, you simple execute a SELECT
statement. The data is held in
blob columns, so the result will simply contain byte strings.
In [11]:
cursor.execute('SELECT vec_f32 FROM vectors LIMIT 5')
Out [11]:
5
Since we want to use the data as numpy arrays, we can "reconstitute" the arrays as we read the data using the np.frombuffer
function.
In [12]:
out_f32 = [np.frombuffer(x[0], dtype=np.float32) for x in cursor]out_f32
Out [12]:
[array([0.7970012 , 0.34688511, 0.14492278, 0.73332036, 0.37237272,
0.15225586, 0.26400378, 0.1706023 , 0.2066024 , 0.2986435 ],
dtype=float32),
array([0.15736586, 0.7116634 , 0.55218774, 0.8527479 , 0.8662606 ,
0.21808125, 0.19137949, 0.19946271, 0.31750116, 0.4048979 ],
dtype=float32),
array([0.329683 , 0.7214109 , 0.48456433, 0.6002015 , 0.4030805 ,
0.32164323, 0.02634622, 0.10913838, 0.13508031, 0.33974582],
dtype=float32),
array([0.35889304, 0.37261793, 0.22267127, 0.57628405, 0.10873934,
0.66360027, 0.67708856, 0.69097304, 0.8924684 , 0.07560002],
dtype=float32),
array([0.46105748, 0.24171682, 0.3059963 , 0.95824414, 0.33805165,
0.30686185, 0.89336896, 0.70329565, 0.45199844, 0.6623023 ],
dtype=float32)]
Working with Python lists
It is also possible to upload Python lists without going through a numpy array using the struct package. In this method, we convert
the floats to a byte string and pass that byte string as the parameter to the INSERT
statement. The possible format codes are as follows.
The little-endian indicator (<
) should also be used.
f - float32
d - float64
b - int8
h - int16
l - int32
q - int64
In [13]:
reset_table()
In [14]:
import struct# Construct the format for a vector of 10 32-bit floats, in this case it is '<10f'fmt = '<{}f'.format(len(vec_f32_list[0]))vec_f32_list_bytes = [struct.pack(fmt, *x) for x in vec_f32_list]vec_f32_list_bytes[:3]
Out [14]:
[b'\xee\xfeq?_\x06\n?\xda\xb5\xbb>\xdfq\x8a>\xa7/\xab>\xb2\xc5Q?\x8d\x1c=?\xb6b\x9a<xs\x94=\x04\x93\n?',
b'\x7f\xdb\xca>\xa3\x03\x11?\x99q\xc7=\x98|\x1b?>\x87^?\nI\xd1>,\x903>\x06\x16\x03?\xf3?\x12=\xd2\xc8W?',
b'\x1d\x9c\xdf=\x92\x93\xdc>`\xcc\n=\xffR\x0f?\x10EM?,"-?\x0ft\x81>\xa0\xae\xd6>\x90\x8fD?\xfaD\t>']
The INSERT
and SELECT
code is the same as for numy arrays
In [15]:
cursor.executemany('INSERT INTO vectors(vec_f32) VALUES (%s)', vec_f32_list_bytes)
Out [15]:
1000
In [16]:
cursor.execute('SELECT vec_f32 FROM vectors LIMIT 5')
Out [16]:
5
To unpack the rows as Python lists, we use the struct
package again.
In [17]:
out_f32_list = [list(struct.unpack(fmt, x[0])) for x in cursor]out_f32_list[:3]
Out [17]:
[[0.3937252461910248,
0.5025281310081482,
0.17226243019104004,
0.7201003432273865,
0.987917423248291,
0.36919161677360535,
0.03498654067516327,
0.7478368878364563,
0.34253644943237305,
0.33940786123275757],
[0.761231005191803,
0.8932342529296875,
0.06776423007249832,
0.8769919276237488,
0.48779383301734924,
0.9544709920883179,
0.8270399570465088,
0.9150049686431885,
0.8350704908370972,
0.9739500880241394],
[0.9656015634536743,
0.4987963140010834,
0.6006644368171692,
0.000701306969858706,
0.5339081287384033,
0.22828376293182373,
0.3365790545940399,
0.2838159203529358,
0.3415278196334839,
0.7082713842391968]]
Using SQLAlchemy
In order to use SingleStoreDB with SQLAlchemy, you need to install the sqlalchemy-singlestoredb
dialect as follows.
pip install sqlalchemy-singlestoredb
In [18]:
import sqlalchemy as saeng = sa.create_engine(connection_url)conn = eng.connect()
The SQLAlchemy method works much like the SingleStoreDB method. However, SQLAlchemy (v2+) requires parameters to be
in a dictionary, and the substitution syntax is of the form :var_name
where 'var_name' in the key in the dictionary.
Working with numpy arrays
In [19]:
reset_table()
SQLAlchemy requires you to construct the query as a sa.text
object. Parameters for inserting multple
rows are in a list of dictionaries.
In [20]:
query = sa.text('INSERT INTO vectors(vec_f32) VALUES (:vec_f32)')conn.execute(query, [dict(vec_f32=x) for x in vec_f32])
Out [20]:
<sqlalchemy.engine.cursor.CursorResult at 0x7ed3e54ea120>
Selecting the data works much as before as well.
In [21]:
result = conn.execute(sa.text('SELECT vec_f32 FROM vectors LIMIT 5'))
We can use the np.frombuffer
function again to convert the byte strings to numpy arrays.
In [22]:
out_f32 = [np.frombuffer(x[0], dtype=np.float32) for x in result]out_f32
Out [22]:
[array([0.06707381, 0.3336899 , 0.23638362, 0.54146034, 0.21330866,
0.57814604, 0.7436944 , 0.21778256, 0.32921487, 0.18143076],
dtype=float32),
array([0.17625922, 0.6122456 , 0.65093136, 0.680956 , 0.99456173,
0.785619 , 0.8397423 , 0.34446132, 0.9549833 , 0.53008443],
dtype=float32),
array([0.12105445, 0.27007556, 0.33191404, 0.35239697, 0.104354 ,
0.560923 , 0.95614606, 0.6793355 , 0.12789273, 0.01870769],
dtype=float32),
array([0.05535996, 0.13312466, 0.9434161 , 0.52270526, 0.24034844,
0.73964477, 0.8723515 , 0.02157358, 0.14537902, 0.8052284 ],
dtype=float32),
array([0.17335513, 0.87024 , 0.11818643, 0.40915504, 0.65390265,
0.519701 , 0.1028851 , 0.8442223 , 0.64491796, 0.31468135],
dtype=float32)]
Working with Python lists
To upload Python lists of values, you use the struct
package to construct the byte strings as described in the
"Uploading Python Lists" in the previous section. The rest of the code here stays the same with the exception of
replacing vec_f32
with vec_f32_list_bytes
as the query parameter for the INSERT
query.
In [23]:
reset_table()
Recall that we create a list of bytes (vector) objects in the previous example. This list of vectors
can be passed to the INSERT
as well as numpy arrays.
In [24]:
vec_f32_list_bytes[:3]
Out [24]:
[b'\xee\xfeq?_\x06\n?\xda\xb5\xbb>\xdfq\x8a>\xa7/\xab>\xb2\xc5Q?\x8d\x1c=?\xb6b\x9a<xs\x94=\x04\x93\n?',
b'\x7f\xdb\xca>\xa3\x03\x11?\x99q\xc7=\x98|\x1b?>\x87^?\nI\xd1>,\x903>\x06\x16\x03?\xf3?\x12=\xd2\xc8W?',
b'\x1d\x9c\xdf=\x92\x93\xdc>`\xcc\n=\xffR\x0f?\x10EM?,"-?\x0ft\x81>\xa0\xae\xd6>\x90\x8fD?\xfaD\t>']
In [25]:
query = sa.text('INSERT INTO vectors(vec_f32) VALUES (:vec_f32)')conn.execute(query, [dict(vec_f32=x) for x in vec_f32_list_bytes])
Out [25]:
<sqlalchemy.engine.cursor.CursorResult at 0x7ed3e54e9e10>
In [26]:
result = conn.execute(sa.text('SELECT vec_f32 FROM vectors LIMIT 5'))
Unpacking the Python lists works as before as well.
In [27]:
out_f32_list = [list(struct.unpack(fmt, x[0])) for x in result]out_f32_list[:3]
Out [27]:
[[0.6067176461219788,
0.2106485515832901,
0.6345869302749634,
0.6352039575576782,
0.829525887966156,
0.2823314368724823,
0.017225714400410652,
0.22034095227718353,
0.24362443387508392,
0.7712428569793701],
[0.26649677753448486,
0.6021978259086609,
0.8979067206382751,
0.9429398775100708,
0.589701771736145,
0.24339258670806885,
0.3752290904521942,
0.34352484345436096,
0.647399365901947,
0.19694264233112335],
[0.6997039914131165,
0.08066725730895996,
0.19695895910263062,
0.08963707834482193,
0.3289657235145569,
0.8245747089385986,
0.782729983329773,
0.43013912439346313,
0.765410304069519,
0.8552709817886353]]
Using pandas
The pandas package has utilities for working with databases. The two primary methods / functions are
DataFrame.to_sql
which uploads DataFrame
data to a table, and pd.read_sql
which downloads
data from a table.
In [28]:
reset_table()
In [29]:
import pandas as pd
First, we'll create a pandas DataFrame
with our numpy arrays.
Working with numpy arrays
In [30]:
df = pd.DataFrame(dict(vec_f32=pd.Series(vec_f32)))df.head()
Out [30]:
vec_f32 | |
---|---|
0 | [0.94529617, 0.5391597, 0.3666218, 0.27040002,... |
1 | [0.39620587, 0.56646174, 0.09738464, 0.6073699... |
2 | [0.10918448, 0.43081337, 0.033886313, 0.559860... |
3 | [0.027094776, 0.03226529, 0.49422556, 0.171387... |
4 | [0.65606296, 0.022113293, 0.57438064, 0.867151... |
We can use the to_sql
method of the DataFrame
to upload the data. Notice that we are using the SQLAlchemy
connection we created in the previous section as the con
parameter.
In [31]:
df.to_sql('vectors', con=conn, if_exists='append', index=False)
Out [31]:
1000
To read the data, we use the read_sql
function. As before, we are getting byte strings back that will need to be
converted into numpy arrays.
In [32]:
out_df = pd.read_sql('vectors', con=conn)out_df.head(3)
Out [32]:
vec_f32 | |
---|---|
0 | b"\xc4.\x81<\x8a\x0fW?\x11~\xcd>_\x82@?Vq\x05?... |
1 | b'rp\xdc>U\xd7\x89>\xe6BC?\xe7\xcd\xfb>P\xe4\x... |
2 | b"\xaf\x10,?\xc9\x8c\\?\xa3\xccQ>c\xd0'?\xe2y\... |
We apply the np.frombuffer
function to each element in the vec_f32
column to reconstruct the numpy array.
In [33]:
out_df['vec_f32'] = out_df['vec_f32'].apply(lambda x: np.frombuffer(x, dtype=np.float32))
In [34]:
out_df.head(3)
Out [34]:
vec_f32 | |
---|---|
0 | [0.01576937, 0.84008086, 0.40135244, 0.7519893... |
1 | [0.4305454, 0.26922098, 0.76273954, 0.4918053,... |
2 | [0.6721296, 0.8615232, 0.20488219, 0.6555235, ... |
Working with Python lists
Because Python lists are not typed arrays like numpy arrays, we have to convert them to bytes before uploading them.
In [35]:
reset_table()
Construct a DataFrame
using Python lists as the data.
In [36]:
df = pd.DataFrame(dict(vec_f32=vec_f32_list))df.head(3)
Out [36]:
vec_f32 | |
---|---|
0 | [0.9452961683273315, 0.5391597151756287, 0.366... |
1 | [0.396205872297287, 0.5664617419242859, 0.0973... |
2 | [0.10918448120355606, 0.43081337213516235, 0.0... |
Note that we are using our fmt
value from a previous section to convert the Python lists
to bytes using struct.pack
.
In [37]:
fmt
Out [37]:
'<10f'
In [38]:
df['vec_f32'] = df['vec_f32'].apply(lambda x: struct.pack(fmt, *x))
In [39]:
df['vec_f32'].head(3)
Out [39]:
0 b'\xee\xfeq?_\x06\n?\xda\xb5\xbb>\xdfq\x8a>\xa...
1 b'\x7f\xdb\xca>\xa3\x03\x11?\x99q\xc7=\x98|\x1...
2 b'\x1d\x9c\xdf=\x92\x93\xdc>`\xcc\n=\xffR\x0f?...
Name: vec_f32, dtype: object
Use the to_sql
method to upload the DataFrame
.
In [40]:
df.to_sql('vectors', con=conn, if_exists='append', index=False)
Out [40]:
1000
In [41]:
out_df = pd.read_sql('vectors', con=conn)out_df.head(3)
Out [41]:
vec_f32 | |
---|---|
0 | b'>kS?;\xec\x8b>0\xe5\x19?\r\xddF?\xf2\x044?u\... |
1 | b'\xedJ[=\xbfq\xce=G#\xa4<\xcbV\xe3=\xeb:;?\xa... |
2 | b'\x0e\x08n>\xe8\xb2\x98>\x10\x133>\xd4\xf7\x1... |
We now have to convert the byte strings back to Python lists.
In [42]:
out_df['vec_f32'] = out_df['vec_f32'].apply(lambda x: list(struct.unpack(fmt, x)))
In [43]:
out_df.head(3)
Out [43]:
vec_f32 | |
---|---|
0 | [0.8258551359176636, 0.2732866704463959, 0.601... |
1 | [0.0535382516682148, 0.10080289095640182, 0.02... |
2 | [0.2324526011943817, 0.29823994636535645, 0.17... |
Using the %%sql
/ %sql
magic commands
While the SQL magic commands are convenient for invoking basic SQL commands, they aren't quite as good for complex queries that insert data. The primary issue is that you must construct the query as a string and ensure that all of your data is properly escaped. We'll demonstrate some basics here, but the methods described in the previous sections are likely to work better.
In [44]:
reset_table()
Working with numpy arrays or Python lists
The SQL magic commands do not do any of the automatic conversions of data to query parameters, so this must be done manually before creating the query. This is done the same way whether the source is numpy arrays or Python lists. In either case, you must convert the objects to byte strings as we have in the previous sections, then convert that byte string into a hex literal that can be used in the query.
In [45]:
# Convert an element of the numpy array to a hex stringvec_f32[0].tobytes().hex()
Out [45]:
'eefe713f5f060a3fdab5bb3edf718a3ea72fab3eb2c5513f8d1c3d3fb6629a3c7873943d04930a3f'
In [46]:
# Convert an element of the Python list to a hex stringstruct.pack(fmt, *vec_f32_list[0]).hex()
Out [46]:
'eefe713f5f060a3fdab5bb3edf718a3ea72fab3eb2c5513f8d1c3d3fb6629a3c7873943d04930a3f'
To construct the query string for the %%sql
command, we need to build the entire list of values to insert
in a separate step. We'll insert the X
at the beginning of the string to indicate a hex literal to
SingleStoreDB. We'll also add the parentheses around the value for inserting multiple rows of data using
the INSERT
statement.
In [47]:
params = ["(X'{}')".format(x.tobytes().hex()) for x in vec_f32]params[:3]
Out [47]:
["(X'eefe713f5f060a3fdab5bb3edf718a3ea72fab3eb2c5513f8d1c3d3fb6629a3c7873943d04930a3f')",
"(X'7fdbca3ea303113f9971c73d987c1b3f3e875e3f0a49d13e2c90333e0616033ff33f123dd2c8573f')",
"(X'1d9cdf3d9293dc3e60cc0a3dff520f3f10454d3f2c222d3f0f74813ea0aed63e908f443ffa44093e')"]
In [48]:
%%sqlINSERT INTO vectors(vec_f32) VALUES {{ ','.join(params) }}
Out [48]:
We can now select the data.
In [49]:
%%sql out <<SELECT * FROM vectors LIMIT 5
In [50]:
out
Out [50]:
vec_f32 |
---|
b'\xfe\xd7@?\xecr\xec>\xbdW1?\xbb_\xcb>\x88\xb9\xf4>\x04G\xaa>#d\xf3=\x07\xb5\xcb>\xcd\xd7&?{Es?' |
b'\x0e\x08n>\xe8\xb2\x98>\x10\x133>\xd4\xf7\x1b?Q$-?t\x11\xfa>,}S?\xa8\x14k;\x1a\xf8h>\xf8\xbc-?' |
b'1~\x01>\xb7>/?\xd3\x10\x1f?Z\xcc\x05=>X\xa8>\x01\r\x1b>Q\xf1\xc3>/-\xab=\xea\x9c6?\xbc\xd6|?' |
b'\xe3\x02\xba=zz)?<\xa33?\x15\x03\x14>\x99\x97\x19?(\x13#?!\xe9\xe9>\xba#i?\xdc\xe1y?\xe1\xc1)?' |
b'\x8c\x86\x12?\xc3+h?\xba?=<\xa45\xda>\xb5\xf2\r?\xdb\xa8\x08?]\x84&?\xf2\xd1s?\xec\xce\xab>\x10\x19k?' |
At this point, there is nothing we can do with SQL magic commands to convert the data back into numpy arrays or Python lists. We need to drop to Python for that.
In [51]:
out_df = pd.DataFrame(out)
In [52]:
out_df['vec_f32'] = out_df['vec_f32'].apply(lambda x: np.frombuffer(x, dtype=np.float32))
In [53]:
out_df.head(3)
Out [53]:
vec_f32 | |
---|---|
0 | [0.7532958, 0.46181428, 0.69274503, 0.39721474... |
1 | [0.2324526, 0.29823995, 0.1748774, 0.6092503, ... |
2 | [0.12645794, 0.6845507, 0.62135047, 0.03266558... |
Using JSON
It is also possible to use JSON to create vectors, however, this method require serializing and deserializing JSON on either
end which isn't quite a efficient as the techniques above. It also requires using the JSON_ARRAY_PACK
and JSON_ARRAY_UNPACK
functions in your queries to go back and forth between the vector bytes and JSON. Here is an example of inserting the
Python list of floats.
In [54]:
import json
In [55]:
params = ['(JSON_ARRAY_PACK("{}"))'.format(json.dumps(x)) for x in vec_f32_list]params[:3]
Out [55]:
['(JSON_ARRAY_PACK("[0.9452961683273315, 0.5391597151756287, 0.36662179231643677, 0.2704000174999237, 0.3343479335308075, 0.8194228410720825, 0.7387169003486633, 0.018845897167921066, 0.07248586416244507, 0.5413057804107666]"))',
'(JSON_ARRAY_PACK("[0.396205872297287, 0.5664617419242859, 0.09738463908433914, 0.6073698997497559, 0.8692511320114136, 0.4087603688240051, 0.17535465955734253, 0.5120548009872437, 0.03570551797747612, 0.8429080247879028]"))',
'(JSON_ARRAY_PACK("[0.10918448120355606, 0.43081337213516235, 0.03388631343841553, 0.5598601698875427, 0.8018350601196289, 0.6763026714324951, 0.2528385818004608, 0.41930103302001953, 0.7678155899047852, 0.13405218720436096]"))']
In [56]:
%%sqlINSERT INTO vectors(vec_f32) VALUES {{ ','.join(params) }}
Out [56]:
If you use the JSON_ARRAY_UNPACK
function in your SELECT
statement, you can download the data as JSON.
In [57]:
%%sql out <<SELECT JSON_ARRAY_UNPACK(vec_f32) AS 'vec_f32' FROM vectors LIMIT 5
In [58]:
out = pd.DataFrame(out)out
Out [58]:
vec_f32 | |
---|---|
0 | [0.177021876, 0.717939079, 0.825487614, 0.7764... |
1 | [0.749338746, 0.589595854, 0.704872251, 0.9270... |
2 | [0.971682549, 0.574555218, 0.174982488, 0.4692... |
3 | [0.814588428, 0.773147047, 0.970053494, 0.9038... |
4 | [0.247024894, 0.828292727, 0.599695325, 0.4499... |
In [59]:
out['vec_f32'][0]
Out [59]:
[0.177021876,
0.717939079,
0.825487614,
0.77646929,
0.137723535,
0.358667195,
0.41495508,
0.027805429,
0.291372836,
0.413403481]
Notice that since the data type of the column in the SELECT
is JSON, it automatically gets converted to a Python list
in the client.
In [60]:
type(out['vec_f32'][0])
Out [60]:
list
Conclusion
As you can see, there are various interfaces available for uploading and downloading vector data. Depending on which Python framework you are using and what format your data is in, you can pick and choose which methods work for your use-case.
Action Required
If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI.
In [61]:
shared_tier_check = %sql show variables like 'is_shared_tier'if not shared_tier_check or shared_tier_check[0][1] == 'OFF':%sql DROP DATABASE IF EXISTS vector_data;
Details
About this Template
This example demonstrates the various ways to upload and download vector data from SingleStoreDB using Python.
This Notebook can be run in Shared Tier, Standard and Enterprise deployments.
Tags
License
This Notebook has been released under the Apache 2.0 open source license.