Home > mailing lists

Re: TPC-H Q20 from 1 hour to 19 hours! - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: TPC-H Q20 from 1 hour to 19 hours!
Date	March 30, 2017 01:14:30
Msg-id	060a9d76-67ab-0152-ff2e-36bf9156fd78@2ndquadrant.com Whole thread Raw
In response to	[HACKERS] TPC-H Q20 from 1 hour to 19 hours! (Rafia Sabih <rafia.sabih@enterprisedb.com>)
Responses	Re: TPC-H Q20 from 1 hour to 19 hours!
List	pgsql-hackers

Tree view


On 03/29/2017 09:00 PM, Robert Haas wrote:
> On Mon, Mar 6, 2017 at 1:22 AM, Rafia Sabih
> <rafia.sabih@enterprisedb.com> wrote:
>> This is to bring to notice a peculiar instance I found recently while
>> running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to
>> complete ...
> 
> That's bad.
> 
>> It is clear that selectivity estimations are really bad in this case
>> particularly at node,
>>    ->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16)
>> (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
>>                             Merge Cond: ((lineitem.l_partkey =
>> partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
>>                             Join Filter:
>> ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
>>                             Rows Removed by Join Filter: 3771
> 
> So, the selectivity estimation here is bad both before and after Tom's
> commit, but it's significantly worse after (actual value 1696742, old
> estimate 3771, new estimate 85).
> 
>> Still this puzzled me as during earlier runs of this benchmark I never
>> encountered such prolonged running times. On further investigation I
>> found that on reverting the commit
>> 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
>> Author: Tom Lane <tgl@sss.pgh.pa.us>
>> Date:   Sat Dec 17 15:28:54 2016 -0500
>>      Fix FK-based join selectivity estimation for semi/antijoins.
> 
> I don't think the problem originates at the Merge Join, though,
> because the commit says that at is fixing semi and anti-join estimates
> - this is a plain inner join, so in theory it shouldn't change.
> However, it's a bit hard for me to piece through these plans, the
> formatting kind of got messed up - things are wrapped.  Could you
> possibly attach the plans as attachments?
>

I've been looking into this today, and it seems to me the simplest query 
triggering this issue (essentially a part of q20) is this:

select
         ps_suppkey
from
         partsupp,
         (
                 select
                         l_partkey agg_partkey,
                         l_suppkey agg_suppkey
                 from
                         lineitem
                 group by
                         l_partkey,
                         l_suppkey
         ) agg_lineitem
where
         agg_partkey = ps_partkey
         and agg_suppkey = ps_suppkey
         and ps_partkey in (
                 select
                         p_partkey
                 from
                         part
                 where
                         p_name like 'hot%'
         );

which does actually include a semijoin. What seems to trigger the issue 
is the join to the aggregated lineitem table - when replacing it with a 
plain table, everything seems to be estimated perfectly fine.

Attached is a simple SQL script, that runs three variants of the query:

(1) with the join to the aggregated lineitem table
(2) with a join to a plain lineitem table
(3) with a join to a plain lineitem table and to 'part' (without the 
semijoin)

First the queries are executed on tables without any foreign keys 
(between those three), then with a FK between lineitem and partsupp, and 
finally with additional FK between partsupp and part.

Initially the estimates are bad, but once the first foreign key is 
added, the estimates get very accurate - except for the case (1).

I've only ran the queries on 10GB data set, but that should be enough. 
The plans are from current master - I'll rerun the script on an older 
release later today.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

From: Jesper Pedersen
Date: 30 March 2017, 00:06:13
Subject: Re: Page Scan Mode in Hash Index

From: Petr Jelinek
Date: 30 March 2017, 02:01:38
Subject: Re: logical replication access control patches

Re: TPC-H Q20 from 1 hour to 19 hours! - Mailing list pgsql-hackers

Attachment

Previous

Next