Re: TPC-H Q20 from 1 hour to 19 hours! - Mailing list pgsql-hackers

From Robert Haas
Subject Re: TPC-H Q20 from 1 hour to 19 hours!
Date
Msg-id CA+TgmoZJCKAp70A8kFiq5ZfZAMiA99g0wN60sbqBVR+LdMz+4g@mail.gmail.com
Whole thread Raw
In response to [HACKERS] TPC-H Q20 from 1 hour to 19 hours!  (Rafia Sabih <rafia.sabih@enterprisedb.com>)
List pgsql-hackers
On Mon, Mar 6, 2017 at 1:22 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
> This is to bring to notice a peculiar instance I found recently while
> running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to
> complete ...

That's bad.

> It is clear that selectivity estimations are really bad in this case
> particularly at node,
>   ->  Merge Join  (cost=52959586.72..60024468.82 rows=85 width=16)
> (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
>                            Merge Cond: ((lineitem.l_partkey =
> partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
>                            Join Filter:
> ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
>                            Rows Removed by Join Filter: 3771

So, the selectivity estimation here is bad both before and after Tom's
commit, but it's significantly worse after (actual value 1696742, old
estimate 3771, new estimate 85).

> Still this puzzled me as during earlier runs of this benchmark I never
> encountered such prolonged running times. On further investigation I
> found that on reverting the commit
> 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Sat Dec 17 15:28:54 2016 -0500
>     Fix FK-based join selectivity estimation for semi/antijoins.

I don't think the problem originates at the Merge Join, though,
because the commit says that at is fixing semi and anti-join estimates
- this is a plain inner join, so in theory it shouldn't change.
However, it's a bit hard for me to piece through these plans, the
formatting kind of got messed up - things are wrapped.  Could you
possibly attach the plans as attachments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [PATCH] Reduce src/test/recovery verbosity
Next
From: Alvaro Herrera
Date:
Subject: Re: Schedule and Release Management Team for PG10