Trying to perform Full join on 2 tables.
Table 1 has 1.1 Billion Records
Table 2 has 180 Million Records
Table Structure is,
CREATE TABLE TABLE1(
COLUMN1 VARCHAR(20),
COLUMN2 VARCHAR(20),
COLUMN3 VARCHAR(20),
COLUMN4 VARCHAR(20),
COLUMN5 VARCHAR(20),
COLUMN6 VARCHAR(20),
CREATE_DATE DATE,
KEY (COLUMN1,COLUMN2,COLUMN3) USING CLUSTERED COLUMNSTORE,
SHARD KEY (COLUMN1,COLUMN2));
CREATE TABLE TABLE2(
COLUMN1 VARCHAR(20),
COLUMN2 VARCHAR(20),
COLUMN3 VARCHAR(20),
STAT1 INT,
STAT2 INT,
STAT3 INT,
STAT4 INT,
CREATE_DATE DATE,
KEY (COLUMN1,COLUMN2,COLUMN3) USING CLUSTERED COLUMNSTORE,
SHARD KEY (COLUMN1,COLUMN2));
When this query is run, filter is not done before doing the join,
SELECT
(CASE WHEN t1.COLUMN3 is null then t2.COLUMN3 else t1.COLUMN3) AS PIVOT_1,
SUM(STAT1),
SUM(STAT2)
FROM
TABLE1 t1 FULL JOIN TABLE t2 ON
(t1.COLUMN1 = t2.COLUMN1 and t1.COLUMN2 = t2.COLUMN2)
WHERE
(t1.COLUMN1 = “STRING” or t2.COLUMN1 is null) or (t2.COLUMN1 = “STRING” and t1.COLUMN1 is null) or (t1.COLUMN1 = “STRING” and t2.COLUMN1 = “STRING”)
GROUP BY PIVOT1;
Sample profile screenshot:
Whereas if i manually filter the tables and then do a join, it is very fast. Should this filtering not be done implicitly by the singlestore.
SELECT
(CASE WHEN t1.COLUMN3 is null then t2.COLUMN3 else t1.COLUMN3) AS PIVOT_1,
SUM(STAT1),
SUM(STAT2)
FROM
(SELECT * FROM TABLE1 where t1.COLUMN1 = “STRING”) t1
FULL JOIN (SELECT * FROM TABLE1 where t1.COLUMN1 = “STRING”) t2
ON (t1.COLUMN1 = t2.COLUMN1 and t1.COLUMN2 = t2.COLUMN2)
GROUP BY PIVOT_1;
Sample profile screenshot: