I analyzed a slow query created from the B/E app and found a costly ShuffleGroupBy operation in its query’s profile.
Then I try reduce the ShuffleGroupBy time by adding the WITH(LEAF_PUSHDOWN=TRUE) hint, however it does not help much. When checking the operation again in the profile (with the leaf_pushdown hint), it shows the following 2 messages:
encoded_group_by_unsupported: “at least one join does not meet requirements for push down for columnstore scan”
encoded_join_unsupported: “join key is not single integer column at join depth 1”
I have some questions about the query performance as below:
- What are the requirements for push down for columnsotre scan?
- Does the table “x_tmp_hz_cv_r_sales_inv_store_by_channel_all” has the proper shard key for this query?
- What is the root cause for the slowness and any suggestions to improve?
Thank you in advance for your time and reply!
Query:
profile select WITH(LEAF_PUSHDOWN=TRUE)
CATALOG_4.parent_id ascatalog_4
,
period_id,
sum(sales_inv_hist.purchase_spent) aspurchase_spent
,
sum(sales_inv_hist.purchase_discount) aspurchase_discount
,
sum(sales_inv_hist.purchase_units) aspurchase_units
,
sum(sales_inv_hist.cost_amt) ascost_amt
,
sum(sales_inv_hist.gross_margin) asmargin
,
sum(sales_inv_hist.purchase_spent) assales_amount
,
– sum(case
– when sales_inv_hist.cal_dt = to_date then sales_inv_hist.inventory_on_hand_qty
– end) aseop_inventory_on_hand_qty
,
– sum(case
– when sales_inv_hist.cal_dt = to_date then sales_inv_hist.inventory_on_order_qty
– end) asinventory_on_order_eop_qty
,
– sum(sales_inv_hist.oos_flg) asoos_flag
,
– sum(case
– when sales_inv_hist.cal_dt = to_date then sales_inv_hist.oos_flg
– end) aseop_oos_flag
,
– sum(sales_inv_hist.over_stock_flg) asover_stock_flag
,
– sum(sales_inv_hist.current_oos_impact) asoos_impact
,
– sum(sales_inv_hist.current_low_stock_impact) aslow_stock_impact
,
– sum(case
– when sales_inv_hist.cal_dt = to_date then sales_inv_hist.inventory_on_hand_cost_amt
– end) asoh_eop_cost
,
– avg(sales_inv_hist.inventory_on_hand_cost_amt) asavg_oh_cost
,
– sum(case
– when sales_inv_hist.cal_dt = to_date then sales_inv_hist.inventory_on_hand_amt
– end) astotal_oh_eop
,
– sum(price_amt) asprice_amt
,
– avg(sales_inv_hist.inventory_on_hand_amt) asavg_oh_retail
,
– sum(sales_inv_hist.inventory_on_hand_qty) asinventory_on_hand_qty
,
– sum(sales_inv_hist.avg_sales_qty) asavg_sales_qty
,
– sum(inventory_on_order_qty) asinventory_on_order_qty
,
– avg(sales_inv_hist.inventory_on_hand_qty) asavg_inventory_oh_qty
,
count(distinct case
when sales_inv_hist.days_on_hand > 0 then sales_inv_hist.catalog_internal_id
end) asnum_items
,
sum(case
when sales_inv_hist.cal_dt = to_date then sales_inv_hist.inventory_on_order_cost_amt
end) aseop_inventory_on_order_cost
,
sum(sales_inv_hist.damage_amt) asdamage_amount
,
sum(sales_inv_hist.waste_amt) aswaste_amount
from x_tmp_hz_cv_r_sales_inv_store_by_channel_all assales_inv_hist
join x_tmp_hz_dt asperiods
oncal_dt
= periods.cal_dt1
straight_join (
select child_id
from (
select distinct child_id
from (
select child_id
from cv_r_all_parents_full_view
where (
hier = 1
and parent_id in (
2, 7)
and parent_level = 4
and child_level = 0
)
) asall_parents1
) asunion1
) asrelevant_products
on sales_inv_hist.catalog_internal_id = relevant_products.child_id
straight_join cv_r_all_parents_full_view asCATALOG_4
on (
CATALOG_4.child_id = sales_inv_hist.catalog_internal_id
and CATALOG_4.child_level = 0
and CATALOG_4.hier = 1
and CATALOG_4.parent_level = 4
)
– where (
– sales_inv_hist.cal_dt between date(‘2022-06-27’) and date(‘2022-11-13’)
– or sales_inv_hist.cal_dt between date(‘2023-06-26’) and date(‘2023-11-13’)
– )
group by
catalog_4,
period_id
;
Schema:
– x_tmp_hz_cv_r_sales_inv_store_by_channel_all definition
CREATE TABLE
x_tmp_hz_cv_r_sales_inv_store_by_channel_all
(
cal_dt
date DEFAULT NULL,
cv_pos_id
bigint(20) DEFAULT NULL,
catalog_internal_id
bigint(20) DEFAULT NULL,
dc_internal_id
bigint(20) DEFAULT NULL,
purchase_spent
decimal(19,3) DEFAULT NULL,
purchase_units
decimal(19,3) DEFAULT NULL,
purchase_discount
decimal(19,3) DEFAULT NULL,
price_amt
decimal(19,3) DEFAULT NULL,
cost_amt
decimal(19,3) DEFAULT NULL,
gross_margin
decimal(19,3) DEFAULT NULL,
net_margin
decimal(19,3) DEFAULT NULL,
inventory_on_hand_qty
decimal(19,3) DEFAULT NULL,
inventory_on_hand_amt
decimal(19,3) DEFAULT NULL,
inventory_on_hand_cost_amt
decimal(19,3) DEFAULT NULL,
inventory_on_order_qty
decimal(19,3) DEFAULT NULL,
inventory_on_order_amt
decimal(19,3) DEFAULT NULL,
inventory_on_order_cost_amt
decimal(19,3) DEFAULT NULL,
…
rec_insert_tmsp
varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
SORT KEYcal_cv_r_sales_inv_store_by_channel_all_sort_key
(cal_dt
DESC),
SHARD KEYcal_cv_r_sales_inv_store__by_channel_all_shard_key
(cal_dt
,catalog_internal_id
),
UNIQUE KEYcal_dt
(cal_dt
,catalog_internal_id
,cv_pos_id
,dc_internal_id
) USING HASH,
KEYhz__catalog_idx
(catalog_internal_id
) USING HASH,
KEYhz__pos_idx
(cv_pos_id
) USING HASH
) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE=‘STRICT_ALL_TABLES’;
– x_tmp_hz_dt definition
CREATE TABLE
x_tmp_hz_dt
(
cal_dt1
date DEFAULT NULL,
period_id
bigint(20) DEFAULT NULL,
from_date
date DEFAULT NULL,
to_date
date DEFAULT NULL,
SORT KEY__UNORDERED
(),
KEYhz__cal_dt1
(cal_dt1
) USING HASH,
KEYhz__pcal_dt1
(period_id
,cal_dt1
) USING HASH
, SHARD KEY ()
) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE=‘STRICT_ALL_TABLES’;
Profile:
cannot paste the profile here because of the post size limit. please let me know if there is a way to share.
Here is part of profile with the ShuffleGroupBy operation: