I'm currently running some tests on a 3TB TPC-H data set, and I tripped over a pretty bad n_distinct underestimate, causing an OOM in HashAgg (which nicely illustrates the importance of the memory-bounded hashagg patch Jeff Davis is working on).
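For context: HashAgg sizes its hash table from the estimated number of groups and (without that patch) never spills to disk, so an underestimate like this translates directly into unbounded memory use. The only blunt workaround I know of is to disable hash aggregation for the session, forcing a sort-based GroupAggregate instead, e.g.:

    -- per-session mitigation only; this sidesteps the bad estimate, it doesn't fix it
    set enable_hashagg = off;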
The problem is Q18, particularly this simple subquery:
select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 313;
which is planned like this:
                                    QUERY PLAN
---------------------------------------------------------------------------------
 HashAggregate  (cost=598510163.92..598515393.93 rows=418401 width=12)
   Group Key: l_orderkey
   Filter: (sum(l_quantity) > '313'::double precision)
   ->  Seq Scan on lineitem  (cost=0.00..508509923.28 rows=18000048128 width=12)
(4 rows)
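Note that rows=418401 on the HashAggregate is already after the default 1/3 selectivity applied to the HAVING inequality, so the planner expects only about 1.25M groups in the hash table. That number comes straight from the n_distinct statistic for l_orderkey, which can be inspected directly (pg_stats is the standard statistics view; negative values mean a fraction of reltuples):

    select n_distinct from pg_stats
     where tablename = 'lineitem' and attname = 'l_orderkey';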
but sadly, in reality the l_orderkey cardinality looks like this: