Re: Spilling hashed SetOps and aggregates to disk - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Spilling hashed SetOps and aggregates to disk
Msg-id 5bc4336f-c243-183b-4dc2-9bbf2ff22ee2@2ndquadrant.com
In response to Re: Spilling hashed SetOps and aggregates to disk  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 06/11/2018 08:13 PM, Jeff Davis wrote:
> On Mon, 2018-06-11 at 19:33 +0200, Tomas Vondra wrote:
>> For example, we hit the work_mem limit after processing 10% of the
>> tuples; switching to sort would mean spilling and sorting 900GB of
>> data. Or we might say - hmm, we're 10% through, so we expect to hit
>> the limit 10x, so let's spill the hash table and then sort that,
>> writing and sorting only 10GB of data. (Or merging it in some
>> hash-based way, per Robert's earlier message.)
> 
> Your example depends on large groups and a high degree of group
> clustering. That's fine, but it's a special case,
> 

True, it's a special case and it won't work in general. It was
merely an example for Andres.

OTOH it's not entirely unrealistic, I think. Consider something like

  SELECT
    extract(year from ts) AS y,
    extract(month from ts) AS m,
    extract(day from ts) AS d,
    string_agg(x, ','),
    array_agg(z)
  FROM fact_table
  GROUP BY y, m, d;

which is likely very correlated (assuming new data is appended to the
table), and the string_agg/array_agg states are likely to grow fairly
large (roughly proportional to the number of tuples in the group).
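Just to illustrate the correlation point - one can look at the
physical-ordering statistics the planner already collects (using the
hypothetical fact_table from above):

  SELECT attname, correlation
    FROM pg_stats
   WHERE tablename = 'fact_table' AND attname = 'ts';

For an append-only table with an ever-increasing ts, the correlation
should come out close to 1.0.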

Another example might be an HLL aggregate, although in that case the
transition state does not grow, so it may not be that bad (and the
default estimate of 1kB would work pretty nicely). But there certainly
are other aggregates with large transition states, where this might not
be the case, and we currently have no way to communicate that to the
planner - except for setting work_mem much lower :-/
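For context, the HLL case I have in mind looks something like this
(assuming the postgresql-hll extension, which provides hll_hash_text,
hll_add_agg and hll_cardinality, and a hypothetical user_id column):

  SELECT
    extract(year from ts) AS y,
    hll_cardinality(hll_add_agg(hll_hash_text(user_id))) AS distinct_users
  FROM fact_table
  GROUP BY y;

Here the per-group state is a fixed-size HLL counter no matter how many
tuples fall into the group - unlike the string_agg/array_agg case above.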

However, I now realize I've ignored the fact that we typically don't
sort the whole table but only a few columns (the grouping keys and
aggregate inputs), so the example was not entirely fair - we would not
sort the whole remaining 900GB but likely much less.
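A quick way to put a number on that, sticking with the example query
above (column names hypothetical; pg_column_size(fact_table.*) measures
the full row):

  SELECT
    avg(pg_column_size(ts) + pg_column_size(x) + pg_column_size(z))
      AS sort_input_width,
    avg(pg_column_size(fact_table.*)) AS full_row_width
  FROM fact_table;

If the grouping key and aggregate inputs are, say, 10% of the row
width, the 900GB in the example shrinks to roughly 90GB of sort input.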

> and complexity does have a cost, too.

Sure.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

