Re: Spilling hashed SetOps and aggregates to disk - Mailing list pgsql-hackers

From: Jeff Davis
Subject: Re: Spilling hashed SetOps and aggregates to disk
Msg-id: 1528919590.8818.93.camel@j-davis.com
In response to: Re: Spilling hashed SetOps and aggregates to disk (Claudio Freire <klaussfreire@gmail.com>)
Responses: Re: Spilling hashed SetOps and aggregates to disk
List: pgsql-hackers
On Wed, 2018-06-13 at 11:50 -0300, Claudio Freire wrote:
> In essence, the technique I've been using always uses a fixed-size
> hash table to do partial grouping. The table is never grown; when
> collisions do happen, the existing entry in the hash table is flushed
> to disk and the aggregate state in that bucket is reset for the
> incoming key. To avoid excessive spilling due to frequent collisions,
> we use a kind of "lazy cuckoo" hash table. Lazy in the sense that it
> does no relocations: it just checks 2 hash values, and if it has to
> evict, it evicts the "oldest" value (with some cheap definition of
> "old").
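
For concreteness, here is a minimal sketch of such a fixed-size,
evict-on-collision table. It assumes int64 keys, COUNT(*) as the only
aggregate, and a plain temp file as the spill target; all names
(AggBucket, accumulate, spill_partial, ...) are made up for
illustration, not taken from any patch:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NBUCKETS 4096           /* fixed size: the table is never grown */

typedef struct AggBucket
{
    bool        used;
    int64_t     key;
    int64_t     count;          /* aggregate state (here: COUNT(*)) */
    uint64_t    age;            /* cheap notion of "old": last-touched tick */
} AggBucket;

static AggBucket buckets[NBUCKETS];
static uint64_t tick = 0;

/* two cheap hashes; real code would use the datatype's hash support func */
static uint32_t
hash1(int64_t k)
{
    return (uint32_t) (((uint64_t) k * 0x9E3779B97F4A7C15ULL) % NBUCKETS);
}

static uint32_t
hash2(int64_t k)
{
    return (uint32_t) (((uint64_t) k * 0xC2B2AE3D27D4EB4FULL) % NBUCKETS);
}

/* write an evicted partial group to the spill file for a later sort+merge */
static void
spill_partial(FILE *spill, const AggBucket *b)
{
    fprintf(spill, "%lld %lld\n", (long long) b->key, (long long) b->count);
}

static void
accumulate(FILE *spill, int64_t key)
{
    AggBucket  *b1 = &buckets[hash1(key)];
    AggBucket  *b2 = &buckets[hash2(key)];
    AggBucket  *slot;

    tick++;

    /* check the two candidate buckets; no relocations, ever */
    if (b1->used && b1->key == key)
        slot = b1;
    else if (b2->used && b2->key == key)
        slot = b2;
    else if (!b1->used)
        slot = b1;
    else if (!b2->used)
        slot = b2;
    else
    {
        /* both buckets hold other keys: evict the "oldest" one to disk */
        slot = (b1->age <= b2->age) ? b1 : b2;
        spill_partial(spill, slot);
        slot->used = false;
    }

    if (!slot->used)
    {
        slot->used = true;
        slot->key = key;
        slot->count = 0;        /* reset aggregate state for the incoming key */
    }
    slot->count++;              /* advance the aggregate state */
    slot->age = tick;
}

int
main(void)
{
    FILE       *spill = tmpfile();
    int64_t     keys[] = {1, 2, 1, 3, 1, 2, 4, 1};
    size_t      i;

    for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
        accumulate(spill, keys[i]);

    /* at end of input, surviving in-memory groups would be flushed too */
    fclose(spill);
    return 0;
}

In real executor code the spill target would be a tuplesort/tuplestore
and the hash values would come from the datatype's hash opclass; the
point is just that a collision never grows the table, it evicts a
partial group instead.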

...

> An adaptive hash agg node would start as a hash agg, and turn into a
> "partial hash agg + sort + final group agg" when OOM is detected.
> 
> The benefit over ordinary sort+group agg is that the sort is
> happening on a potentially much smaller data set. When the buffer is
> large enough to capture enough key repetitions, the output of the
> partial hash agg can be orders of magnitude smaller than the scan.
>
> My proposal was to use this for all group aggs, not only when the
> planner chooses a hash agg, because it can speed up the sort and
> reduce temp storage considerably. I guess the trick lies in
> estimating that cardinality reduction to avoid applying this when
> there's no hope of getting a payoff. The overhead of such a lazy hash
> table isn't much, really. But yes, its cache locality is terrible, so
> it needs to be considered.
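
To illustrate that flow, here is a sketch of the "sort + final group
agg" step over the spilled partials. The flat array and qsort() stand
in for a tuplesort over the spill file, and the names (Partial,
finalize_groups, ...) are hypothetical:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct Partial
{
    int64_t     key;
    int64_t     count;          /* partially aggregated COUNT(*) */
} Partial;

static int
partial_cmp(const void *a, const void *b)
{
    int64_t     ka = ((const Partial *) a)->key;
    int64_t     kb = ((const Partial *) b)->key;

    return (ka > kb) - (ka < kb);
}

static void
finalize_groups(Partial *partials, size_t n)
{
    size_t      i = 0;

    /* "sort": order the spilled partial groups by key */
    qsort(partials, n, sizeof(Partial), partial_cmp);

    /* "final group agg": merge runs of equal keys into final groups */
    while (i < n)
    {
        int64_t     key = partials[i].key;
        int64_t     total = 0;

        while (i < n && partials[i].key == key)
            total += partials[i++].count;

        printf("key=%lld count=%lld\n", (long long) key, (long long) total);
    }
}

int
main(void)
{
    /* e.g. what the fixed-size table above might have spilled + flushed */
    Partial     partials[] = {{42, 3}, {7, 1}, {42, 2}, {7, 4}, {13, 1}};

    finalize_groups(partials, sizeof(partials) / sizeof(partials[0]));
    return 0;
}

The partial hash agg has already collapsed repeated keys, so the input
to the sort here is the spilled (key, partial state) pairs rather than
the raw scan output, which is where the temp-storage and sort-cost
savings come from.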

I think this is a promising approach because it means the planner has
one less decision to make. And the planner doesn't have great
information to make its decision on, anyway (ndistinct estimates are
hard enough on plain tables, and all bets are off a few levels up in
the plan).

Unfortunately, it means that either all data types must support hashing
and sorting[1], or we need to have a fallback path that can get by with
hashing alone (which would not be tested nearly as well as more typical
paths).

I still like this idea, though.

Regards,
    Jeff Davis


[1] https://www.postgresql.org/message-id/9584.1528739975%40sss.pgh.pa.us


