8.9 broke a few things for us

Hi,

i upgraded to 8.9 yesterday. It broke our existing vector indices (incredible slow, unable to drop and re-created), a few Kafka pipelines (ER_RESULT_TABLE_NOT_EXIST) and the FTS2 is also much slower than with 8.7.

Are these known issues?

Best,
Tom

Update: i was able to fix the pipelines by setting MAX_OFFSETS_PER_BATCH_PARTITION to 1000. Also i was able to drop the vector index and re-create after a few cluster restarts. Sadly it’s still much slower than with 8.7 :confused:

Sorry to hear that. I will pass this on to our dev team.

What kind of ANN indexes are you using? Any details on your configuration for FTS V2 or your app that you can share?

I tested HNSW_PQ and IVF_PQFS both are extremely slow/unusable right now. Here are my observations:

According to the docs you have to use vector indices like this:
set @vec = '[...]':>vector(512);
select @vec <*> embedding as score from tbl order by score desc

If i do an explain of this query i can see that the index is used: ColumnStoreFilter [INTERNAL_VECTOR_SEARCH(0, @vec, ?, '', 0) index]

If i do an explain of select DOT_PRODUCT(@vec, embedding) as score from tbl order by score desc it doesn’t use the vector index.

select '[...]':>vector(512) <*> embedding as score from tbl order by score desc doesn’t use vector index, too.

It’s not possible to set a variable in the Data API. In 8.5 it was possible to work around this limitation by structuring the query like this: select DOT_PRODUCT(emb.embedding, tbl.embedding) as score from tbl inner join (select '[...]':>vector(512) as embedding) emb on true order by score desc
At some point with 8.7 it was possible to use the vector directly: select DOT_PRODUCT('[...]':>vector(512), embedding) as score from tbl oder by score desc

SHOW STATUS EXTENDED LIKE 'alloc_vector_index' doesn’t exist/returns an empty result

index hint doesn’t work, too

Now in 8.9 suddenly none of the approaches work anymore. Only way for me to get the index to work is by setting a variable and use this variable. But even then the query time is ~20x slower.
It’s nothing new that new features are always rushed and do have some weird caveats to workaround, but IMO this is pretty bad. Is there anything i am missing? Sorry if it comes over rude, but i think honest feedback is important to improve :sweat_smile: still think SingleStore is the best database out there :eyes:

Regarding FTS V2 i didn’t have much time yet to dive into. We are on the default config, any specifics you need? First i have to fix the vector stuff so our app is usable again :sweat_smile:

That’s unfortunate. Thanks for sharing all the details. Do you have LIMIT clauses on your queries that you just omitted? Without a LIMIT it has to scan everything.

We’ll get back to you with more information.

yes, limit 12 offset 0 for every query. thank you! :slight_smile:

We’ve identified the vector search performance bug (whereby the ANN index does not get used) and a fix is being developed. We expect to ship this fix in an 8.9 patch release next week.

The issue can occur with use of both infix dot product and the dot_product() function.

The issue does not occur for infix euclidean distance or the euclidean_distance() function.

Alright, thank you for the update! Going to try euclidean distance as a workaround. Is this the same issue which results in a poor performance when using the data api / not using a variable?

For the data API, you should be able to include a vector literal in your query and cast it to VECTOR(512) once this fix is in, and it should use the index.

awesome, thank you! really appreciate the responsiveness :slight_smile:

euclidean distance crashed our whole cluster

all leaf nodes went offline immediately after running select @vec <-> embedding as score from tbl order by score asc limit 12
details:
vector(512)
hnsw_pq
table contains rows where embedding col is null

i will not test further configurations as we only have our prod setup right now :sweat_smile: will try to spin sth up next week and test a little bit, but feel free to test, too :smiley:

I will pass this along to the devs as well. Not good. It should not crash.

Just FYI, it is essential to use the same METRIC_TYPE for creating the index that you use when searching, for good recall. But regardless of that, it should not crash.

By the way, “SingleStore recommends IVF_PQFS and HNSW_FLAT”. If you want to save RAM, we recommend IVF_PQFS, not HNSW_PQ.

Do you think you could show us your CREATE TABLE statement and ALTER TABLE statement to add the index, that led to this crash?

Please provide the output of SHOW CREATE TABLE too, so we can see all the index definitions and types (in case the table already existed and you did not just create it). If you know what SingleStore version you created the index on, please tell us that.

If you can give us a repro of the crash, that will really help. We have not been able to repro it.

We shipped a patch 8.9.2 today that fixes the performance regression with vector similarity search with dot product. Such queries will now use the vector ANN index as expected.