Re: TPC-H Q20 from 1 hour to 19 hours! - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: TPC-H Q20 from 1 hour to 19 hours!
Date
Msg-id 060a9d76-67ab-0152-ff2e-36bf9156fd78@2ndquadrant.com
Whole thread Raw
In response to [HACKERS] TPC-H Q20 from 1 hour to 19 hours!  (Rafia Sabih <rafia.sabih@enterprisedb.com>)
Responses Re: TPC-H Q20 from 1 hour to 19 hours!  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers

On 03/29/2017 09:00 PM, Robert Haas wrote:
> On Mon, Mar 6, 2017 at 1:22 AM, Rafia Sabih
> <rafia.sabih@enterprisedb.com> wrote:
>> This is to bring to notice a peculiar instance I found recently while
>> running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to
>> complete ...
> 
> That's bad.
> 
>> It is clear that selectivity estimations are really bad in this case
>> particularly at node,
>>    ->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16)
>> (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
>>                             Merge Cond: ((lineitem.l_partkey =
>> partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
>>                             Join Filter:
>> ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
>>                             Rows Removed by Join Filter: 3771
> 
> So, the selectivity estimation here is bad both before and after Tom's
> commit, but it's significantly worse after (actual value 1696742, old
> estimate 3771, new estimate 85).
> 
>> Still this puzzled me as during earlier runs of this benchmark I never
>> encountered such prolonged running times. On further investigation I
>> found that on reverting the commit
>> 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
>> Author: Tom Lane <tgl@sss.pgh.pa.us>
>> Date:   Sat Dec 17 15:28:54 2016 -0500
>>      Fix FK-based join selectivity estimation for semi/antijoins.
> 
> I don't think the problem originates at the Merge Join, though,
> because the commit says that at is fixing semi and anti-join estimates
> - this is a plain inner join, so in theory it shouldn't change.
> However, it's a bit hard for me to piece through these plans, the
> formatting kind of got messed up - things are wrapped.  Could you
> possibly attach the plans as attachments?
>

I've been looking into this today, and it seems to me the simplest query 
triggering this issue (essentially a part of q20) is this:

select
         ps_suppkey
from
         partsupp,
         (
                 select
                         l_partkey agg_partkey,
                         l_suppkey agg_suppkey
                 from
                         lineitem
                 group by
                         l_partkey,
                         l_suppkey
         ) agg_lineitem
where
         agg_partkey = ps_partkey
         and agg_suppkey = ps_suppkey
         and ps_partkey in (
                 select
                         p_partkey
                 from
                         part
                 where
                         p_name like 'hot%'
         );

which does actually include a semijoin. What seems to trigger the issue 
is the join to the aggregated lineitem table - when replacing it with a 
plain table, everything seems to be estimated perfectly fine.

Attached is a simple SQL script, that runs three variants of the query:

(1) with the join to the aggregated lineitem table
(2) with a join to a plain lineitem table
(3) with a join to a plain lineitem table and to 'part' (without the 
semijoin)

First the queries are executed on tables without any foreign keys 
(between those three), then with a FK between lineitem and partsupp, and 
finally with additional FK between partsupp and part.

Initially the estimates are bad, but once the first foreign key is 
added, the estimates get very accurate - except for the case (1).

I've only ran the queries on 10GB data set, but that should be enough. 
The plans are from current master - I'll rerun the script on an older 
release later today.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Jesper Pedersen
Date:
Subject: Re: Page Scan Mode in Hash Index
Next
From: Petr Jelinek
Date:
Subject: Re: logical replication access control patches