Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
Date
Msg-id CAMkU=1xWa5GvE3tsrcUsWCQaZ_mSjLtKGEAEbee5VbLFa47SCA@mail.gmail.com
Whole thread Raw
In response to pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Wed, Jun 17, 2015 at 10:52 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I'm currently running some tests on a 3TB TPC-H data set, and I tripped over a pretty bad n_distinct underestimate, causing OOM in HashAgg (which somehow illustrates the importance of the memory-bounded hashagg patch Jeff Davis is working on).

The problem is Q18, particularly this simple subquery:

    select l_orderkey
    from lineitem
    group by l_orderkey
    having sum(l_quantity) > 313;

which is planned like this:

                               QUERY PLAN
---------------------------------------------------------------------------------
 HashAggregate  (cost=598510163.92..598515393.93 rows=418401 width=12)
   Group Key: l_orderkey
   Filter: (sum(l_quantity) > '313'::double precision)
   ->  Seq Scan on lineitem  (cost=0.00..508509923.28 rows=18000048128 width=12)
(4 rows)

but sadly, in reality the l_orderkey cardinality looks like this:

    tpch=# select count(distinct l_orderkey) from lineitem;
       count
    ------------
     4500000000
    (1 row)

That's a helluva difference - not the usual one or two orders of magnitude, but 10000x underestimate.


Is the row order in the table correlated with the value l_orderkey?

Could you create copy of the table ordered at random, and see if it exhibits the same estimation issue?

Cheers,

Jeff

pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: pgbench - allow backslash-continuations in custom scripts
Next
From: Tom Lane
Date:
Subject: Re: pgbench - allow backslash-continuations in custom scripts