Hello, community,
I am looking for the wisdom of the crowd. I have six tables: one main table A and five other tables that describe additional dimensions of table A. For instance, table A describes a page and table AB describes the devices used to view that page, so each relationship is one-to-many. It is a “star schema” with A at the center.
The data distributions look like the following:
A: 0.5M records
AB: 1M records
AC: 10M records
AD: 5M records
AE: 20M records
The query does left joins to get a denormalized view for each record, filtering by several parameters on A, such as date, location, etc.
The question is: what is a reasonable performance to expect when each record in A has about 100 related records in the other tables? The tables are joined on their primary keys.
In some circumstances you can certainly expect a multi-way join query to run in under 10 ms, but from what you’ve shown here, it’s not possible to tell whether yours will. You say that your schema is like a star schema with A in the middle. Normally, though, the middle table is the “fact table” and is far larger than the other tables, which are the “dimension tables.” That is not the case here: A is the smallest table.
Nevertheless, on big enough hardware or with small enough data, a star-join query with group-by and aggregation over several dimension tables, joined on integer surrogate keys against a columnstore fact table, can run in under 10 ms.
For a more transactional workload, with row stores and nested-loop joins via indexes on the join columns, a multi-way join can also run in under 10 ms, even on large databases.
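The key ingredient for that nested-loop plan is an index on each child table’s join column, so every qualifying row of A finds its matches by index lookup instead of a full scan. A small sqlite3 sketch (made-up names) showing the planner picking up such an index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a  (a_id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE ab (ab_id INTEGER PRIMARY KEY, a_id INTEGER, device TEXT);
-- Index on the join column: the nested-loop join probes this index
-- once per qualifying row of A instead of scanning all of AB.
CREATE INDEX ab_a_id ON ab (a_id);
""")

plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT a.a_id, ab.device
FROM a LEFT JOIN ab ON ab.a_id = a.a_id
WHERE a.location = 'US'
""").fetchall()
for row in plan:
    print(row[-1])  # e.g. "SEARCH ab USING INDEX ab_a_id (a_id=?)"
```

Without the index, the same plan would show a scan of AB for every row of A, and the 10 ms budget would be gone.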
This all assumes that you compile the query once and run it many times; the first run may take a substantial fraction of a second to compile.