Version 6
CREATE TABLE features (
  id bigint(11) NOT NULL AUTO_INCREMENT,
  features_extracted binary(1920) DEFAULT NULL,
  user_id varchar(256) DEFAULT NULL,
  bbox_id varchar(256) DEFAULT NULL,
  KEY id (id) USING CLUSTERED COLUMNSTORE
);
Error: ERROR 1074 (42000): Column length too big for column 'features_extracted' (max = 255); use BLOB or TEXT instead
I use EUCLIDEAN distance, which does not work well with my food dataset, so I have to run experiments all over again to find a sweet spot for the threshold (18.0 now). Is a COSINE distance somehow available?
For COSINE SIMILARITY (CS), use the DOT_PRODUCT() function (see SingleStoreDB Cloud · SingleStore Documentation) and normalize all the input vectors to length 1. You can then compute the cosine distance from that as 1 - CS.
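A minimal sketch of that idea in plain Python (the names `normalize`, `a`, and `b` are illustrative; in the database the dot product would be done by DOT_PRODUCT() on pre-normalized stored vectors):

```python
import math

def normalize(v):
    # Scale the vector to unit length, so the dot product of two
    # normalized vectors equals their cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

cos_sim = sum(x * y for x, y in zip(a, b))  # dot product of unit vectors
cos_dist = 1.0 - cos_sim                    # cosine distance
```

Normalizing once at insert time keeps the per-query work down to a single dot product.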
Hi, but when we use BLOB and save the features in it, we see a decrease in speed: a SELECT from a 6 million row table takes approx. 40 seconds. What can we do to speed up the process? Your example here, Image Recognition at the Speed of Memory Bandwidth, shows creation of a BINARY(4096) field. But since we got an error when creating it, is that not possible?
Thanks for your fast reply. We inserted our 512-d feature vectors into a varbinary(4096) column as you suggested.
We have only 13 million vectors in our table, but a SELECT takes 65-70 seconds.
Let me explain our lifecycle:
We inserted all our 13 million vectors as you suggested with json_array_pack(vector);
We get new vector from incoming image (lst) and do:
"SELECT id, username, DOT_PRODUCT(feature_vector, JSON_ARRAY_PACK(concat('%s'))) as score from FR.our_db order by score desc limit 1;" % (lst)
Maybe we are doing something wrong on our side, but the SELECT returns a good result - it shows the row corresponding to our (lst) vector - yet it takes 65-70 seconds.
Could you give us any other instructions on how to speed up the process?
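As an aside on the query in step 3 above, interpolating the vector into the SQL string with `%` is fragile; a hedged sketch of the same query built with `json.dumps` and a bound parameter (the `cursor` object is a hypothetical DB-API cursor, and `lst` is shortened here for illustration):

```python
import json

lst = [0.12, 0.34, 0.56]  # incoming feature vector (shortened for the sketch)

# Serialize the vector as a JSON array string instead of concat('%s'),
# then bind it as a single query parameter.
vec_json = json.dumps(lst)

query = ("SELECT id, username, "
         "DOT_PRODUCT(feature_vector, JSON_ARRAY_PACK(%s)) AS score "
         "FROM FR.our_db ORDER BY score DESC LIMIT 1")

# cursor.execute(query, (vec_json,))  # hypothetical DB-API cursor
```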
Make sure you are sharding in such a way that you get an even distribution of rows across partitions. And make sure you have the same number of partitions per node as you have hardware threads.
Also, vectors are interpreted as single-precision, 4 bytes per element, so a 512-d vector needs 2048 bytes. varbinary(2048) might work for you, although it shouldn't change things, since varbinary(4096) can hold 2048 bytes.
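The size arithmetic above can be checked with a short sketch (plain Python; assuming the vector is packed as 32-bit floats, which is how the single-precision elements are stored):

```python
import struct

vec = [0.1] * 512  # example 512-d feature vector

# Each element is a single-precision (32-bit) float,
# so 512 dimensions occupy 512 * 4 = 2048 bytes.
packed = struct.pack('512f', *vec)
print(len(packed))  # 2048 -> fits in varbinary(2048)
```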