I’m sure I’m missing something here, but I noticed some unexpected performance degradation in one of my queries. (Note that I’m replacing ?
with 0
in MemSQL Studio just to run the Explain/Profile.) Let me call it Query 0:
select e from datoms_long
where p = 11
and e in (select e from datoms_ref
where p = 11
and a = ?
and v = ?)
and a = ?;
Same performance issue with a join
approach (Query 1):
select datoms_long.e
from datoms_long
inner join
datoms_ref
on datoms_ref.p = 11
and datoms_long.p = 11
and datoms_ref.e = datoms_long.e
and datoms_long.a = ?
and datoms_ref.a = ?
and datoms_ref.v = ?;
Query 0 and Query 1 each take ~500ms, but if I break Query 0 into two parts (Queries 2 and 3), each part runs essentially instantaneously:
select e from datoms_ref
where p = 11
and a = ?
and v = ?;
select e from datoms_long
where p = 11
and e in (?, ?) -- output of `select e from datoms_ref ...`
and a = ?;
Each of the tables involved have the same indices:
index eav (e desc, a desc, v desc),
index av ( a desc, v desc),
index vv ( v desc),
constraint pk primary key (p desc, e desc, a desc, v desc),
shard key (p)
The Profile shows Queries 0 and 1 doing a full table scan (~500K rows) every time even when using the primary key. By contrast, Queries 2 and 3 scan only 300 rows. Happy to post the output of explain
etc.
Note that p
= “partition” (distinct ~15 of these), e
= “entity id” (distinct 100Ks of these), a
= “attribute id” (distinct ~50 of these), and v
= “value” (distinct 100Ks of these).
I’ve recently re-read up on the behavior of indexes in both MemSQL and MySQL, but I’ve got to be missing something fundamental here. Any ideas?
I really appreciate your help!
—————
EDIT: I figured this out. First I went to EverSQL (super useful site by the way!) and when I pasted in the query above, with 0
s instead of ?
s, it said, among other things, “Avoid Implicit Casts When Searching for Strings … It’s recommended not to compare string columns to numeric values, as it will result in a cast that will prevent index usage.” This appears to have been exactly what was happening:
- I was using
0
instead of e.g.convert(0, binary)
in MemSQL Studio, which made the datatype of the parameter ambiguous and caused the query optimizer not to use the appropriate indexes, just as EverSQL said. - I was doing
.setObject
in JDBC which also made the datatype of the parameter ambiguous. However, interestingly, I still needed to wrap the param inconvert(..., binary)
even after calling.setBytes
in JDBC.
As a simple test, wrapping the params / 0
s in convert(..., binary)
made the query execute virtually instantly instead of doing a full table scan. Hooray for obscure index knowledge!