On Mon, Mar 6, 2017 at 1:22 AM, Rafia Sabih
<rafia.sabih@enterprisedb.com> wrote:
> This is to bring to notice a peculiar instance I found recently while
> running TPC-H benchmark queries. Q20 of the benchmark took 19 hours to
> complete ...
That's bad.
> It is clear that selectivity estimations are really bad in this case
> particularly at node,
> -> Merge Join (cost=52959586.72..60024468.82 rows=85 width=16)
> (actual time=1525322.753..2419045.641 rows=1696742 loops=1)
> Merge Cond: ((lineitem.l_partkey =
> partsupp.ps_partkey) AND (lineitem.l_suppkey = partsupp.ps_suppkey))
> Join Filter:
> ((partsupp.ps_availqty)::numeric > ((0.5 * sum(lineitem.l_quantity))))
> Rows Removed by Join Filter: 3771
So, the selectivity estimation here is bad both before and after Tom's
commit, but it's significantly worse after (actual value 1696742, old
estimate 3771, new estimate 85).
> Still this puzzled me as during earlier runs of this benchmark I never
> encountered such prolonged running times. On further investigation I
> found that on reverting the commit
> 7fa93eec4e0c9c3e801e3c51aa4bae3a38aaa218
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Sat Dec 17 15:28:54 2016 -0500
> Fix FK-based join selectivity estimation for semi/antijoins.
I don't think the problem originates at the Merge Join, though,
because the commit says that at is fixing semi and anti-join estimates
- this is a plain inner join, so in theory it shouldn't change.
However, it's a bit hard for me to piece through these plans, the
formatting kind of got messed up - things are wrapped. Could you
possibly attach the plans as attachments?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company