i upgraded to 8.9 yesterday. It broke our existing vector indices (incredible slow, unable to drop and re-created), a few Kafka pipelines (ER_RESULT_TABLE_NOT_EXIST) and the FTS2 is also much slower than with 8.7.
Update: i was able to fix the pipelines by setting MAX_OFFSETS_PER_BATCH_PARTITION to 1000. Also i was able to drop the vector index and re-create after a few cluster restarts. Sadly it’s still much slower than with 8.7
I tested HNSW_PQ and IVF_PQFS both are extremely slow/unusable right now. Here are my observations:
According to the docs you have to use vector indices like this: set @vec = '[...]':>vector(512); select @vec <*> embedding as score from tbl order by score desc
If i do an explain of this query i can see that the index is used: ColumnStoreFilter [INTERNAL_VECTOR_SEARCH(0, @vec, ?, '', 0) index]
If i do an explain of select DOT_PRODUCT(@vec, embedding) as score from tbl order by score desc it doesn’t use the vector index.
select '[...]':>vector(512) <*> embedding as score from tbl order by score desc doesn’t use vector index, too.
It’s not possible to set a variable in the Data API. In 8.5 it was possible to work around this limitation by structuring the query like this: select DOT_PRODUCT(emb.embedding, tbl.embedding) as score from tbl inner join (select '[...]':>vector(512) as embedding) emb on true order by score desc
At some point with 8.7 it was possible to use the vector directly: select DOT_PRODUCT('[...]':>vector(512), embedding) as score from tbl oder by score desc
SHOW STATUS EXTENDED LIKE 'alloc_vector_index' doesn’t exist/returns an empty result
index hint doesn’t work, too
Now in 8.9 suddenly none of the approaches work anymore. Only way for me to get the index to work is by setting a variable and use this variable. But even then the query time is ~20x slower.
It’s nothing new that new features are always rushed and do have some weird caveats to workaround, but IMO this is pretty bad. Is there anything i am missing? Sorry if it comes over rude, but i think honest feedback is important to improve still think SingleStore is the best database out there
Regarding FTS V2 i didn’t have much time yet to dive into. We are on the default config, any specifics you need? First i have to fix the vector stuff so our app is usable again
That’s unfortunate. Thanks for sharing all the details. Do you have LIMIT clauses on your queries that you just omitted? Without a LIMIT it has to scan everything.
We’ve identified the vector search performance bug (whereby the ANN index does not get used) and a fix is being developed. We expect to ship this fix in an 8.9 patch release next week.
The issue can occur with use of both infix dot product and the dot_product() function.
The issue does not occur for infix euclidean distance or the euclidean_distance() function.
Alright, thank you for the update! Going to try euclidean distance as a workaround. Is this the same issue which results in a poor performance when using the data api / not using a variable?
For the data API, you should be able to include a vector literal in your query and cast it to VECTOR(512) once this fix is in, and it should use the index.
all leaf nodes went offline immediately after running select @vec <-> embedding as score from tbl order by score asc limit 12
details:
vector(512)
hnsw_pq
table contains rows where embedding col is null
i will not test further configurations as we only have our prod setup right now will try to spin sth up next week and test a little bit, but feel free to test, too
I will pass this along to the devs as well. Not good. It should not crash.
Just FYI, it is essential to use the same METRIC_TYPE for creating the index that you use when searching, for good recall. But regardless of that, it should not crash.
Do you think you could show us your CREATE TABLE statement and ALTER TABLE statement to add the index, that led to this crash?
Please provide the output of SHOW CREATE TABLE too, so we can see all the index definitions and types (in case the table already existed and you did not just create it). If you know what SingleStore version you created the index on, please tell us that.
We shipped a patch 8.9.2 today that fixes the performance regression with vector similarity search with dot product. Such queries will now use the vector ANN index as expected.