Yes, that join is the concerning one (the red text below). All of the conditions need to be checked, and they are independent of one another.
The query (with consistent obfuscation) is below:
select distinct
       a.sale_id,
       a.test_date,
       a.product_id as original_product_id,
       vw2.product_id,
       a.volume as volume,
       b.pair_rank
from not_sold_locations a
inner join vw_product vw2
        using (product_group_name, product_class_code,
               product_type_code, sale_end_date)
inner join product_mapping b
        on a.product_group_name = b.left_product_group_name
       and a.product_node_name = b.left_product_node
       and a.product_type_code = b.left_product
       and vw2.product_node_name = b.right_product_node
       and vw2.product_group_name = b.right_product_group_name
       and vw2.product_type_code = b.right_product
inner join mapping_ref i
        on vw2.product_group_name || '.' || vw2.product_node_name = i.product_node_name
       and vw2.product_class_code = i.product_class_code
       and vw2.product_type_code = i.product_type_code
       and vw2.sale_end_date between i.first_product_date and i.last_product_date;
not_sold_locations (a) has 836 rows. vw_product (vw2) has 785k rows and is a view that joins 11 tables together to provide a consolidated view of all products, sales locations, etc.
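To narrow down where it goes wrong, one thing I can do is EXPLAIN ANALYZE the first join on its own; a minimal sketch using the tables from the query above (the selected columns are arbitrary):

    -- Does the planner's row estimate for not_sold_locations joined
    -- to vw_product match the actual row count on that node?
    explain analyze
    select a.sale_id, vw2.product_id
    from not_sold_locations a
    inner join vw_product vw2
            using (product_group_name, product_class_code,
                   product_type_code, sale_end_date);

Repeating that for the product_mapping and mapping_ref joins should show which set of clauses the planner misjudges.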
On Thu, Sep 27, 2018 at 01:08:05PM -0400, Arjun Ranade wrote:
> When I look at the EXPLAIN ANALYZE output, it seems that it's using a
> drastically different query plan for the INSERT+SELECT than SELECT by
> itself.
The fast SELECT plan is using parallel query, which isn't available for INSERT+SELECT.
Using parallel query in this case happens to mitigate the effects of the bad plan.
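You can see this directly in the plans; a minimal sketch of the check, assuming a scratch target table named some_target (that name is mine, not from the thread):

    -- Hypothetical target with matching columns, just for the test:
    create table some_target as
    select * from vw_product with no data;

    -- The standalone SELECT is eligible for parallelism, so its plan
    -- may contain Gather / Parallel Seq Scan nodes:
    explain (costs off)
    select count(*) from vw_product;

    -- Wrapping the same SELECT in an INSERT makes the query write data,
    -- which disqualifies it from parallel query, so no Gather appears:
    explain (costs off)
    insert into some_target
    select * from vw_product;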
I see Tom responded, and you got an improvement by changing the join threshold.
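Tom's suggestion isn't quoted here; if the threshold in question was one of the planner's collapse limits (an assumption on my part), the change would have looked something like:

    -- Assumed settings: let the planner consider reordering all of the
    -- relations in one search space instead of freezing the textual
    -- join order once the default limit of 8 is reached.
    set join_collapse_limit = 12;
    set from_collapse_limit = 12;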
But I think you could perhaps get a better plan if the rowcount estimates were fixed. That's probably more important than anything else - changing settings is only a workaround for bad estimates.
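The usual first steps are finer-grained statistics on the join columns; a sketch under the assumption that the misestimate comes from coarse or stale stats (column names are taken from the query above, the statistics target of 1000 is arbitrary):

    -- Collect a larger sample for one of the join columns, then re-analyze:
    alter table product_mapping
        alter column left_product_group_name set statistics 1000;
    analyze product_mapping;

    -- On v10+, extended statistics can record dependencies between
    -- columns that always appear together (note: this improves clause
    -- selectivity within one table and may not fix join estimates):
    create statistics product_mapping_left_stats (dependencies)
        on left_product_group_name, left_product_node, left_product
        from product_mapping;
    analyze product_mapping;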
In the slow/INSERT plan, this join is returning 55000x more rows than expected (not 55k more: 55k TIMES more).
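Each plan node shows the estimate right next to reality, so the gap is easy to spot once you know to look for it; an invented excerpt purely to illustrate the shape (none of these numbers are from your plan):

    Nested Loop  (cost=0.42..1234.56 rows=12 width=64)
                 (actual time=0.103..980.221 rows=660000 loops=1)

Here rows=12 was the planner's guess and rows=660000 is what actually came back: a 55000x underestimate on that node.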