Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H - Mailing list pgsql-hackers

From Feng Tian
Subject Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
Date
Msg-id CAFWGqnsxryEevA5A_CqT3dExmTaT44mBpNTy8TWVsSVDS71QMg@mail.gmail.com
In response to pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Wed, Jun 17, 2015 at 10:52 AM, Tomas
Vondra<span dir="ltr"><<a href="mailto:tomas.vondra@2ndquadrant.com"
target="_blank">tomas.vondra@2ndquadrant.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0
00 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br /><br /> I'm currently running some tests on a 3TB TPC-H
dataset, and I tripped over a pretty bad n_distinct underestimate, causing OOM in HashAgg (which somehow illustrates
theimportance of the memory-bounded hashagg patch Jeff Davis is working on).<br /><br /> The problem is Q18,
particularlythis simple subquery:<br /><br />     select l_orderkey<br />     from lineitem<br />     group by
l_orderkey<br/>     having sum(l_quantity) > 313;<br /><br /> which is planned like this:<br /><br />              
                QUERY PLAN<br /> ---------------------------------------------------------------------------------<br
/> HashAggregate  (cost=598510163.92..598515393.93 rows=418401 width=12)<br />    Group Key: l_orderkey<br />  
 Filter:(sum(l_quantity) > '313'::double precision)<br />    ->  Seq Scan on lineitem  (cost=0.00..508509923.28
rows=18000048128width=12)<br /> (4 rows)<br /><br /> but sadly, in reality the l_orderkey cardinality looks like
this:<br/><br />     tpch=# select count(distinct l_orderkey) from lineitem;<br />        count<br />    
------------<br/>      4500000000<br />     (1 row)<br /><br /> That's a helluva difference - not the usual one or two
ordersof magnitude, but 10000x underestimate.<br /><br /> The usual thing to do in this case is increasing statistics
target,and while this improves the estimate, the improvement is rather small:<br /><br />    statistics target   
 estimate       difference<br />    --------------------------------------------------<br />    100                   
 429491            10000<br />    1000                   4240418              1000<br />    10000               
 42913759              100<br /><br /> I find the pattern rather strange - every time the statistics target increases
10x,the difference decreases 10x - maybe that's natural, but the perfect proportionality is suspicious IMHO.<br /><br
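For reference, the per-column statistics target used in the experiments
above can be raised with ALTER TABLE followed by a fresh ANALYZE -- a
minimal sketch against the table from the quoted query. ANALYZE samples
300 * target rows, which matches the samplerows values in the logs
further down:

    ALTER TABLE lineitem ALTER COLUMN l_orderkey SET STATISTICS 10000;
    ANALYZE lineitem;   -- at target=10000 this samples ~3M of the ~18 billion rows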
> Also, this is a quite large dataset - the table has ~18 billion rows, and
> even with target=10000 we're sampling only 3M rows, which is ~0.02%.
> That's a tiny sample, so inaccuracy is naturally expected, but OTOH the
> TPC-H dataset is damn uniform - there's pretty much no skew in the
> distributions AFAIK. So I'd expect a slightly better result.
>
> With target=10000 the plan switches to GroupAggregate, because the
> estimate gets large enough to exceed work_mem (2GB). But it's still way
> off, and it's mostly just a lucky coincidence.
>
> So I'm wondering if there's some bug because of the dataset size (an
> integer overflow or something like that), so I added a bunch of logging
> into the estimator, logging all the parameters computed:
>
> target=100 (samplerows=30000)
> -----------------------------
> WARNING: attnum=1 attname=l_orderkey f1=27976 ndistinct=28977
>          nmultiple=1001 toowide_cnt=0 d=28977 numer=869310000.000000
>          denom=2024.046627 stadistinct=429491.094029
> WARNING: ndistinct estimate attnum=1 attname=l_orderkey current=429491.09
>          adaptive=443730.00
>
> target=1000 (samplerows=300000)
> -------------------------------
> WARNING: attnum=1 attname=l_orderkey f1=279513 ndistinct=289644
>          nmultiple=10131 toowide_cnt=0 d=289644 numer=86893200000.000000
>          denom=20491.658538 stadistinct=4240418.111618
> WARNING: ndistinct estimate attnum=1 attname=l_orderkey current=4240418.11
>          adaptive=4375171.00
>
> target=10000 (samplerows=3000000)
> ---------------------------------
> WARNING: attnum=1 attname=l_orderkey f1=2797888 ndistinct=2897799
>          nmultiple=99911 toowide_cnt=0 d=2897799
>          numer=8693397000000.000000 denom=202578.313396
>          stadistinct=42913759.396282
> WARNING: ndistinct estimate attnum=1 attname=l_orderkey current=42913759.40
>          adaptive=44449882.00
>
> It's totalrows=18000049031 in all cases. The logs also show the estimate
> produced by the adaptive estimator (discussed in a separate thread), but
> apparently that does not change the estimates much :-(
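The logged numer and denom values line up exactly with the Haas-and-Stokes
"Duj1" estimator that ANALYZE uses, stadistinct = n*d / ((n - f1) + f1*n/N),
where n = samplerows, d = distinct values in the sample, f1 = values seen
exactly once, and N = totalrows -- so the arithmetic itself does not look
like an overflow. A quick check of the target=100 numbers, runnable in psql:

    -- n=30000, d=28977, f1=27976, N=18000049031, taken from the log above
    SELECT (30000.0 * 28977)
         / ((30000 - 27976) + 27976 * 30000.0 / 18000049031);
    -- ~429491.09, matching the logged stadistinct=429491.094029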
>
> Any ideas?
>
> --
> Tomas Vondra                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
While a better sample/stats estimate is important for choosing a good plan,
in this query hash agg really is the right plan. If a sort agg is chosen,
the performance will be really, really bad. The patch that Jeff is working
on is critical for a decent TPC-H number (unless you have an unlimited
amount of memory).

Thanks,
