
From: Tomas Vondra
Subject: Re: WIP: multivariate statistics / proof of concept
Msg-id: 6a0b84cf-24e3-bf28-c4ac-b3ef33c3b55f@2ndquadrant.com
In response to: Re: WIP: multivariate statistics / proof of concept (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 11/21/2016 11:10 PM, Robert Haas wrote:
> [ reviving an old multivariate statistics thread ]
>
> On Thu, Nov 13, 2014 at 6:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 12 October 2014 23:00, Tomas Vondra <tv@fuzzy.cz> wrote:
>>
>>> It however seems to be working sufficiently well at this point, enough
>>> to get some useful feedback. So here we go.
>>
>> This looks interesting and useful.
>>
>> What I'd like to check before a detailed review is that this has
>> sufficient applicability to be useful.
>>
>> My understanding is that Q9 and Q18 of TPC-H have poor plans as a
>> result of multi-column stats errors.
>>
>> Could you look at those queries and confirm that this patch can
>> produce better plans for them?
>
> Tomas, did you ever do any testing in this area?  One of my
> colleagues, Rafia Sabih, recently did some testing of TPC-H queries @
> 20 GB.  Q18 actually doesn't complete at all right now because of an
> issue with the new simplehash implementation.  I reported it to Andres
> and he tracked it down, but hasn't posted the patch yet - see
> http://archives.postgresql.org/message-id/20161115192802.jfbec5s6ougxwicp@alap3.anarazel.de
>
> Of the remaining queries, the slowest are Q9 and Q20, and both of them
> have serious estimation errors.  On Q9, things go wrong here:
>
>                                  ->  Merge Join
> (cost=5225092.04..6595105.57 rows=154 width=47) (actual
> time=103592.821..149335.010 rows=6503988 loops=1)
>                                        Merge Cond:
> (partsupp.ps_partkey = lineitem.l_partkey)
>                                        Join Filter:
> (lineitem.l_suppkey = partsupp.ps_suppkey)
>                                        Rows Removed by Join Filter: 19511964
>                                        ->  Index Scan using
> [snip]
>
> Rows Removed by Filter: 756627
>
> The estimate for the index scan on partsupp is essentially perfect,
> and the lineitem-part join is off by about 3x.  However, the merge
> join is off by about 4000x, which is real bad.
>

At this point the patch only deals with statistics on base relations,
not joins. It's meant to be extended in that direction, and the syntax
allows for that, but for now there are no join statistics at all.
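
Just to illustrate what the patch does cover: the statistics are
defined on columns of a single table, roughly like this (a rough
sketch, not necessarily the exact syntax in the current patch; the
statistics name and column list are made up):

    -- multivariate statistics on correlated columns of ONE table
    CREATE STATISTICS lineitem_stats ON l_partkey, l_suppkey FROM lineitem;

Anything crossing two tables, like the partsupp/lineitem join above,
is out of scope for now.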

That being said, this particular estimate should already be improved
in 9.6, provided you create a foreign key between the two tables. In
fact, that patch was motivated exactly by Q9.
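
That is, with a foreign key in place, something like this (a sketch;
the constraint name is made up, columns as in the usual TPC-H schema):

    ALTER TABLE lineitem
      ADD CONSTRAINT lineitem_partsupp_fkey
      FOREIGN KEY (l_partkey, l_suppkey)
      REFERENCES partsupp (ps_partkey, ps_suppkey);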

This is how the join estimate looks on scale 1 without the FK between 
the two tables:
                               QUERY PLAN
-------------------------------------------------------------------------
 Merge Join  (cost=19.19..700980.12 rows=2404 width=261)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND
                (lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Index Scan using idx_lineitem_part_supp on lineitem
                (cost=0.43..605856.84 rows=6001117 width=117)
   ->  Index Scan using partsupp_pkey on partsupp
                (cost=0.42..61141.76 rows=800000 width=144)
(4 rows)


and with the foreign key:
                               QUERY PLAN
-------------------------------------------------------------------------
 Merge Join  (cost=19.19..700980.12 rows=6001117 width=261)
             (actual rows=6001215 loops=1)
   Merge Cond: ((lineitem.l_partkey = partsupp.ps_partkey) AND
                (lineitem.l_suppkey = partsupp.ps_suppkey))
   ->  Index Scan using idx_lineitem_part_supp on lineitem
                (cost=0.43..605856.84 rows=6001117 width=117)
                (actual rows=6001215 loops=1)
   ->  Index Scan using partsupp_pkey on partsupp
                (cost=0.42..61141.76 rows=800000 width=144)
                (actual rows=6001672 loops=1)
 Planning time: 3.840 ms
 Execution time: 21987.913 ms
(6 rows)
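
(Both plans are just the lineitem/partsupp join at the heart of Q9,
i.e. roughly a query like this - a simplified sketch, not the full Q9;
the exact target list does not matter much for the estimates:

    EXPLAIN (ANALYZE)
    SELECT partsupp.ps_suppkey
      FROM lineitem
      JOIN partsupp ON (lineitem.l_partkey = partsupp.ps_partkey
                    AND lineitem.l_suppkey = partsupp.ps_suppkey);
)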


> On Q20, things go wrong here:
> [snip]
>
> The estimate for the GroupAggregate feeding one side of the merge join
> is quite accurate.  The estimate for the part-partsupp join on the
> other side is off by 8x.  Then things get much worse: the estimate for
> the merge join is off by 400x.
>

Well, most of the estimation error comes from the join, but sadly the
aggregate in between makes it impossible to use the foreign keys, at
least in the current version. I'm not sure whether that can be
improved.
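
Roughly, the problematic shape is a join against a grouped subquery,
something like this (a simplified sketch of the structure you describe,
assuming the Q20 subquery gets rewritten into a join; not the actual
TPC-H query text):

    SELECT partsupp.ps_suppkey
      FROM partsupp
      JOIN (SELECT l_partkey, l_suppkey,
                   0.5 * sum(l_quantity) AS half_qty
              FROM lineitem
             GROUP BY l_partkey, l_suppkey) agg
        ON (agg.l_partkey = partsupp.ps_partkey
        AND agg.l_suppkey = partsupp.ps_suppkey)
     WHERE partsupp.ps_availqty > agg.half_qty;

The FK between lineitem and partsupp is still there, but the lineitem
side enters the join only through the GroupAggregate, so the FK-based
estimation cannot be applied.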

> I'm not really sure whether the multivariate statistics stuff will fix
> this kind of case or not, but if it did it would be awesome.
>

Join statistics are something I'd like to add eventually, but I don't 
see how it could happen in the first version. Also, the patch received 
no reviews this CF, and making it even larger is unlikely to make it 
more attractive.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


