Unexpected query performance degradation

alex · August 22, 2019, 5:32am

I’m sure I’m missing something here, but I noticed some unexpected performance degradation in one of my queries. (Note that I’m replacing ? with 0 in MemSQL Studio just to run the Explain/Profile.) Let me call it Query 0:

select e from datoms_long
  where     p = 11
        and e in (select e from datoms_ref
                  where     p = 11
                        and a = ?
                        and v = ?)
        and a = ?;

Same performance issue with a join approach (Query 1):

select datoms_long.e
  from datoms_long
  inner join
  datoms_ref
  on     datoms_ref.p = 11
     and datoms_long.p = 11
     and datoms_ref.e = datoms_long.e
     and datoms_long.a = ?
     and datoms_ref.a = ?
     and datoms_ref.v = ?;

Query 0 and Query 1 each take ~500ms, but if I break Query 0 into two parts (Queries 2 and 3), each part runs essentially instantaneously:

select e from datoms_ref
  where     p = 11
        and a = ?
        and v = ?;

select e from datoms_long
  where     p = 11
        and e in (?, ?) -- output of `select e from datoms_ref ...`
        and a = ?;

Each of the tables involved have the same indices:

  index eav (e desc, a desc, v desc),
  index av  (        a desc, v desc),
  index vv  (                v desc),

  constraint pk primary key (p desc, e desc, a desc, v desc),

  shard key (p)

The Profile shows Queries 0 and 1 doing a full table scan (~500K rows) every time even when using the primary key. By contrast, Queries 2 and 3 scan only 300 rows. Happy to post the output of explain etc.

Note that p = “partition” (distinct ~15 of these), e = “entity id” (distinct 100Ks of these), a = “attribute id” (distinct ~50 of these), and v = “value” (distinct 100Ks of these).

I’ve recently re-read up on the behavior of indexes in both MemSQL and MySQL, but I’ve got to be missing something fundamental here. Any ideas?

I really appreciate your help!

—————

EDIT: I figured this out. First I went to EverSQL (super useful site by the way!) and when I pasted in the query above, with 0s instead of ?s, it said, among other things, “Avoid Implicit Casts When Searching for Strings … It’s recommended not to compare string columns to numeric values, as it will result in a cast that will prevent index usage.” This appears to have been exactly what was happening:

I was using 0 instead of e.g. convert(0, binary) in MemSQL Studio, which made the datatype of the parameter ambiguous and caused the query optimizer not to use the appropriate indexes, just as EverSQL said.
I was doing .setObject in JDBC which also made the datatype of the parameter ambiguous. However, interestingly, I still needed to wrap the param in convert(..., binary) even after calling .setBytes in JDBC.

As a simple test, wrapping the params / 0s in convert(..., binary) made the query execute virtually instantly instead of doing a full table scan. Hooray for obscure index knowledge!

adam · August 22, 2019, 9:13pm

EXPLAIN in MemSQL should warn you about similar problems. If you paste your table schema’s we can check on why the warning wasn’t issued.

i.e.,

MemSQL [test]> create table t(c1 int primary key);

MemSQL [test]> explain select * from t where c1 = '1';
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: Comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance. Consider changing the datatypes, or adding explicit typecasts. See https://docs.memsql.com/docs/mismatched-datatypes for more information. |
|                                                                                                                                                                                                                                                                        |
| WARNING: Comparison between mismatched datatypes: (`t`.`c1` = '1'). Types 'int(11) NOT NULL' vs 'longtext CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL'.                                                                                                        |
|                                                                                                                                                                                                                                                                        |
| Gather partitions:single alias:remote_0                                                                                                                                                                                                                                |
| Project [t.c1]                                                                                                                                                                                                                                                         |
| IndexSeek test.t, PRIMARY KEY (c1) scan:[c1 = '1']                                                                                                                                                                                                                     |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
7 rows in set (0.02 sec)

alex · August 22, 2019, 9:16pm

Good point @adam — when I run explain <query> it does say:

WARNING: Comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance. Consider changing the datatypes, or adding explicit typecasts. See https://docs.memsql.com/docs/mismatched-datatypes for more information.

WARNING: Comparison between mismatched datatypes: (`datoms_long`.`a` = 0). Types 'binary(16) NOT NULL' vs 'bigint(20) NOT NULL'.

WARNING: Comparison between mismatched datatypes: (`datoms_ref`.`a` = 0). Types 'binary(16) NOT NULL' vs 'bigint(20) NOT NULL'.

WARNING: Comparison between mismatched datatypes: (`datoms_ref`.`v` = 0). Types 'binary(16) NOT NULL' vs 'bigint(20) NOT NULL'.

However, the issue was that I was using visual explain, which didn’t display any warnings.

adam · August 22, 2019, 9:27pm

Thanks Alex. Let me check on why Studio doesn’t display those warnings. They are pretty important.

alex · August 22, 2019, 9:29pm

Sounds great — thanks so much!

Jacky · August 22, 2019, 10:01pm

Hey Alex

It’s Jacky, product manager at MemSQL

I don’t much have much to input regarding your question, but I see that you use EverSQL. What are your thoughts / experience on using it for MemSQL? Does it help optimize performance much?

alex · October 9, 2019, 6:56pm

Hey @Jacky! Sorry, missed this till now. EverSQL has been extremely useful actually — for instance, it advised me in this case to use convert(..., binary), which made MemSQL leverage indexes and dramatically sped up the query. There was another, much more complex query we were doing in which we were using several joins and or clauses; it recommended we instead use in and exists statements with and clauses. Using this advice we were able to speed up the query from ~6 seconds to ~20ms. MemSQL has amazing query explanation/visualization (best I’ve ever seen — much better than a MySQL explain!), but EverSQL fills in the gaps really nicely by providing suggestions for performance improvement (also the best I’ve ever seen).

EverSQL currently supports a number of DB implementations and versions thereof, but it doesn’t support MemSQL (probably since it’s proprietary and its performance characteristics are so different). We analyzed MemSQL queries in EverSQL under the MySQL setting and that did get us pretty far, but we’d get a lot of value out of EverSQL supporting MemSQL performance recommendations (or maybe a feature in MemSQL Studio that does EverSQL-type recommendations).

Jacky · October 9, 2019, 8:35pm

Hey Alex

Thanks for the feedback!

I have shared this with the team.

alex · October 9, 2019, 9:27pm

You’re very welcome! Thanks for your initial question and for passing my thoughts along.