Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
Date	June 20, 2015 14:17:30
Msg-id	CA+TgmoYtOZyfFp47KBUvL5+Q=RZJcHM+Lk7=rd6cvihfk36c5A@mail.gmail.com
In response to	pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: pretty bad n_distinct estimate, causing HashAgg OOM on TPC-H
List	pgsql-hackers

On Wed, Jun 17, 2015 at 1:52 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> I'm currently running some tests on a 3TB TPC-H data set, and I tripped over
> a pretty bad n_distinct underestimate, causing OOM in HashAgg (which somehow
> illustrates the importance of the memory-bounded hashagg patch Jeff Davis is
> working on).

Stupid question, but why not just override it using ALTER TABLE ...
ALTER COLUMN ... SET (n_distinct = ...)?

I think it's been discussed quite often on previous threads that you
need to sample an awful lot of the table to get a good estimate for
n_distinct.  We could support that, but it would be expensive, and it
would have to be done again every time the table is auto-analyzed.
The above syntax supports nailing the estimate to either an exact
value or a percentage of the table, and I'm not sure why that isn't
good enough.
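
A minimal sketch of that override, assuming a TPC-H-style table named
lineitem with a column l_orderkey (both names are assumptions for
illustration; the message does not say which column was affected).  A
positive n_distinct is taken as an absolute number of distinct values; a
negative value between -1 and 0 is taken as a fraction of the row count,
so it keeps scaling as the table grows.  The -0.25 and 750000000 figures
below are placeholders, not measurements.

```sql
-- Assumed names: lineitem / l_orderkey are illustrative only.

-- Option 1: pin the estimate as a fraction of the table's rows.
-- -0.25 means "distinct values = 25% of the row count".
ALTER TABLE lineitem ALTER COLUMN l_orderkey SET (n_distinct = -0.25);

-- Option 2: pin the estimate to an absolute count instead
-- (placeholder number; use one or the other, not both).
-- ALTER TABLE lineitem ALTER COLUMN l_orderkey SET (n_distinct = 750000000);

-- The override is only applied when statistics are next gathered.
ANALYZE lineitem;

-- Check what the planner will now see.
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'lineitem' AND attname = 'l_orderkey';

-- Go back to the sampled estimate (takes effect at the next ANALYZE).
-- ALTER TABLE lineitem ALTER COLUMN l_orderkey RESET (n_distinct);
```

Because the setting is stored with the column, later auto-analyze runs
should keep using it rather than recomputing the estimate from the sample.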

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
