Re: TPC-H Q20 from 1 hour to 19 hours! - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: TPC-H Q20 from 1 hour to 19 hours! |
Date | |
Msg-id | 060a9d76-67ab-0152-ff2e-36bf9156fd78@2ndquadrant.com Whole thread Raw |
In response to | [HACKERS] TPC-H Q20 from 1 hour to 19 hours! (Rafia Sabih <rafia.sabih@enterprisedb.com>) |
Responses |
Re: TPC-H Q20 from 1 hour to 19 hours!
(Tomas Vondra <tomas.vondra@2ndquadrant.com>)
|
List | pgsql-hackers |
On 03/29/2017 09:00 PM, Robert Haas wrote: > On Mon, Mar 6, 2017 at 1:22 AM, Rafia Sabih > <rafia.sabih@enterprisedb.com> wrote: >> This is to bring to notice a peculiar instance I found recently while >> running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to >> complete ... > > That's bad. > >> It is clear that selectivity estimations are really bad in this case >> particularly at node, >> -> Merge Join (cost=52959586.72..60024468.82 rows=85 width=16) >> (actual time=1525322.753..2419045.641 rows=1696742 loops=1) >> Merge Cond: ((lineitem.l_partkey = >> partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey)) >> Join Filter: >> ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity)))) >> Rows Removed by Join Filter: 3771 > > So, the selectivity estimation here is bad both before and after Tom's > commit, but it's significantly worse after (actual value 1696742, old > estimate 3771, new estimate 85). > >> Still this puzzled me as during earlier runs of this benchmark I never >> encountered such prolonged running times. On further investigation I >> found that on reverting the commit >> 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218 >> Author: Tom Lane <tgl@sss.pgh.pa.us> >> Date: Sat Dec 17 15:28:54 2016 -0500 >> Fix FK-based join selectivity estimation for semi/antijoins. > > I don't think the problem originates at the Merge Join, though, > because the commit says that at is fixing semi and anti-join estimates > - this is a plain inner join, so in theory it shouldn't change. > However, it's a bit hard for me to piece through these plans, the > formatting kind of got messed up - things are wrapped. Could you > possibly attach the plans as attachments? > I've been looking into this today, and it seems to me the simplest query triggering this issue (essentially a part of q20) is this: select ps_suppkey from partsupp, ( select l_partkey agg_partkey, l_suppkey agg_suppkey from lineitem group by l_partkey, l_suppkey ) agg_lineitem where agg_partkey = ps_partkey and agg_suppkey = ps_suppkey and ps_partkey in ( select p_partkey from part where p_name like 'hot%' ); which does actually include a semijoin. What seems to trigger the issue is the join to the aggregated lineitem table - when replacing it with a plain table, everything seems to be estimated perfectly fine. Attached is a simple SQL script, that runs three variants of the query: (1) with the join to the aggregated lineitem table (2) with a join to a plain lineitem table (3) with a join to a plain lineitem table and to 'part' (without the semijoin) First the queries are executed on tables without any foreign keys (between those three), then with a FK between lineitem and partsupp, and finally with additional FK between partsupp and part. Initially the estimates are bad, but once the first foreign key is added, the estimates get very accurate - except for the case (1). I've only ran the queries on 10GB data set, but that should be enough. The plans are from current master - I'll rerun the script on an older release later today. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: