From: Jeff Davis <pgsql@j-davis.com>
Subject: Re: Memory-Bounded Hash Aggregation
Msg-id: 2688fb94f3d0e571768bf851b7fde0eae780e898.camel@j-davis.com
List: pgsql-hackers

Committed.

There's some future work that would be nice (some of these are just
ideas and may not be worth it):

* Refactor MemoryContextMemAllocated() to be a part of
MemoryContextStats(), but allow it to avoid walking through the blocks
and freelists.
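
A minimal standalone sketch of the counter idea (the struct and
function names here are hypothetical, not the real memory context
code): maintain a running total at block alloc/free time, so querying
the allocated size is O(1) and a stats pass could reuse it without any
walk:

#include <stddef.h>

/* Hypothetical context; the real MemoryContextData differs. */
typedef struct Context
{
    size_t mem_allocated;       /* running total of block bytes */
    /* ... block lists, freelists, etc. ... */
} Context;

static void
block_alloc(Context *cxt, size_t blksize)
{
    cxt->mem_allocated += blksize;      /* account when malloc'd */
}

static void
block_free(Context *cxt, size_t blksize)
{
    cxt->mem_allocated -= blksize;      /* ... and when released */
}

/* O(1): no walk over blocks or freelists */
static size_t
mem_allocated(const Context *cxt)
{
    return cxt->mem_allocated;
}

int main(void)
{
    Context c = {0};
    block_alloc(&c, 8192);
    block_alloc(&c, 16384);
    block_free(&c, 8192);
    return mem_allocated(&c) == 16384 ? 0 : 1;
}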

* Improve the choice of the initial number of buckets in the hash
table. For this patch, I tried to preserve the existing behavior of
estimating the number of groups and trying to initialize with that many
buckets. But my performance tests seem to indicate this is not the best
approach. More work is needed to find what we should really do here.
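
For reference, a standalone sketch of the general shape of such a
policy (not the actual executor code): clamp the planner's group
estimate to a power of two and cap it by the memory limit. The open
question is what the starting point should really be:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Round up to the next power of two; bucket arrays are sized this way. */
static uint64_t
next_pow2(uint64_t n)
{
    uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

static uint64_t
choose_num_buckets(double est_groups, size_t mem_limit, size_t bucket_width)
{
    uint64_t max_buckets = mem_limit / bucket_width;
    uint64_t nbuckets = next_pow2((uint64_t) est_groups);

    if (nbuckets > max_buckets)
    {
        nbuckets = next_pow2(max_buckets);
        if (nbuckets > max_buckets)
            nbuckets >>= 1;             /* largest power of two <= the cap */
    }
    if (nbuckets < 256)
        nbuckets = 256;                 /* arbitrary floor for the sketch */
    return nbuckets;
}

int main(void)
{
    /* e.g. ~1e6 estimated groups but only 4MB of memory for buckets */
    printf("%llu\n", (unsigned long long)
           choose_num_buckets(1000000.0, 4 * 1024 * 1024, 8));
    return 0;
}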

* For workloads that fit in neither work_mem nor system memory, and
need to actually go to storage, I see poor CPU utilization because the
CPU and IO work is not effectively overlapped. Perhaps buffering or
readahead changes could improve this, or async IO (even better).
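
As one illustration of the readahead idea (a sketch, nothing that is in
the patch): before draining a spill file sequentially, hint the kernel
so reads can overlap with the CPU work on the current batch.
posix_fadvise() is available on most POSIX platforms; async IO would go
further:

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hint that we'll soon read 'len' bytes at 'off' so the kernel can start
 * readahead while we still aggregate the previous batch.
 */
static int
prefetch_spill_range(int fd, off_t off, off_t len)
{
    return posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
}

int main(void)
{
    /* usage sketch: prefetch the next spill batch before processing this one */
    int fd = open("/tmp/spill.bin", O_RDONLY);  /* hypothetical spill file */

    if (fd >= 0)
    {
        (void) prefetch_spill_range(fd, 0, 1 << 20);
        close(fd);
    }
    return 0;
}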

* Project unnecessary attributes away before spilling tuples to disk.

* Improve logtape.c API so that the caller doesn't need to manage a
bunch of tape numbers.
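
What I have in mind is roughly the following interface sketch
(hypothetical declarations only, not the current logtape.c functions,
which identify tapes by integer number): the tape set hands out opaque
handles, so callers stop tracking tape numbers themselves:

#include <stddef.h>

/* Hypothetical handle-based API; today callers pass (tapeset, tapenum). */
typedef struct TapeSet TapeSet;     /* opaque */
typedef struct Tape Tape;           /* opaque handle replacing a tape number */

extern TapeSet *tapeset_create(void);
extern Tape    *tapeset_new_tape(TapeSet *ts);     /* allocated on demand */
extern size_t   tape_write(Tape *tp, const void *buf, size_t len);
extern void     tape_rewind_for_read(Tape *tp);
extern size_t   tape_read(Tape *tp, void *buf, size_t len);
extern void     tape_close(Tape *tp);              /* releases its blocks */
extern void     tapeset_destroy(TapeSet *ts);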

* Improve the estimate of the hash entry size. This patch doesn't
change the way the planner estimates it, but I observe that the actual
size seen at runtime is significantly different. This is connected to
the initial number of buckets for the hash table.
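
To make the mismatch concrete, the planner-side arithmetic is
essentially of this shape (a simplified sketch, not the exact
hash_agg_entry_size() computation), and the runtime number drifts when
any of the inputs is off:

#include <stddef.h>

/* Simplified sketch of a per-entry estimate; real overheads differ. */
static size_t
est_hash_entry_size(size_t key_width, size_t transition_space,
                    size_t per_entry_overhead)
{
    return per_entry_overhead       /* bucket slot + entry bookkeeping */
         + key_width                /* grouping key columns */
         + transition_space;        /* aggregate transition state(s) */
}

int main(void)
{
    /* e.g. 16-byte keys, 64-byte states, ~40 bytes of overhead */
    return est_hash_entry_size(16, 64, 40) == 120 ? 0 : 1;
}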

* In recursive steps, I don't have a good estimate for the number of
groups, so I just estimate it as the number of tuples in that spill
tape (which is pessimistic). That could be improved by doing a real
cardinality estimate as the tuples are spilling (perhaps with HLL?).
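
A minimal standalone sketch of that (PostgreSQL already has a
HyperLogLog implementation in src/backend/lib/hyperloglog.c that could
presumably be reused instead): feed each spilled tuple's group hash
into the sketch as it goes to the tape, then size the recursive pass
from the estimate rather than from ntuples. The mixer in main() is just
for the demo, not PostgreSQL's hash function:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HLL_BITS 10                     /* 2^10 = 1024 registers */
#define HLL_REGS (1u << HLL_BITS)

typedef struct { uint8_t reg[HLL_REGS]; } HLL;

/* Call once per spilled tuple with its (already computed) group hash. */
static void
hll_add(HLL *h, uint64_t hash)
{
    uint32_t idx = (uint32_t) (hash >> (64 - HLL_BITS)); /* top bits: register */
    uint64_t rest = hash << HLL_BITS;                    /* remaining bits */
    uint8_t rank = 1;

    while (rank < 64 - HLL_BITS && !(rest & (1ULL << 63)))
    {
        rest <<= 1;
        rank++;                         /* position of first 1-bit */
    }
    if (rank > h->reg[idx])
        h->reg[idx] = rank;
}

static double
hll_estimate(const HLL *h)
{
    double alpha = 0.7213 / (1.0 + 1.079 / HLL_REGS);
    double sum = 0.0;
    uint32_t i;

    for (i = 0; i < HLL_REGS; i++)
        sum += 1.0 / (double) (1ULL << h->reg[i]);
    return alpha * HLL_REGS * HLL_REGS / sum;   /* raw estimate, no corrections */
}

int main(void)
{
    HLL h;
    uint64_t i;

    memset(&h, 0, sizeof(h));
    for (i = 1; i <= 100000; i++)
    {
        uint64_t x = i * 0x9E3779B97F4A7C15ULL; /* cheap mixer for the demo */
        x ^= x >> 32;
        x *= 0xD6E8FEB86659FD93ULL;
        x ^= x >> 32;
        hll_add(&h, x);
    }
    printf("estimated ndistinct: %.0f (true 100000)\n", hll_estimate(&h));
    return 0;
}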

* Many aggregates with pass-by-ref transition states don't provide a
great aggtransspace. We should consider doing something smarter, like
having negative numbers represent a per-row size that gets multiplied
by the size of the group (e.g. ARRAY_AGG would have a size dependent on
the group size, not a constant).
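
A sketch of that convention (hypothetical; today aggtransspace is only
meaningful as a non-negative per-group estimate):

#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical interpretation of aggtransspace: a positive value is a
 * flat per-group estimate (as today); a negative value -N means N bytes
 * per input row, so ARRAY_AGG-style states grow with the group.
 */
static size_t
est_trans_state_size(int32_t aggtransspace, double rows_per_group,
                     size_t default_size)
{
    if (aggtransspace > 0)
        return (size_t) aggtransspace;
    if (aggtransspace < 0)
        return (size_t) (-(double) aggtransspace * rows_per_group);
    return default_size;        /* 0 means "no estimate provided" */
}

int main(void)
{
    /* e.g. ARRAY_AGG of 8-byte values, groups averaging 500 rows */
    return est_trans_state_size(-8, 500.0, 1024) == 4000 ? 0 : 1;
}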

* If we want to handle ARRAY_AGG (and the like) well, we can consider
spilling the partial states in the hash table when the memory is full.
That would add a fair amount of complexity because there would be two
types of spilled data (tuples and partial states), but it could be
useful in some cases.
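
A sketch of the two-record spill format that would require (names
hypothetical, stdio standing in for logtape.c): tag each record so a
tape can mix raw input tuples with serialized partial states, at the
cost of handling both kinds on the read side:

#include <stdint.h>
#include <stdio.h>

/* A tape may hold raw tuples and serialized partial aggregate states,
 * distinguished by a tag byte in front of each record. */
typedef enum { SPILL_TUPLE = 1, SPILL_PARTIAL_STATE = 2 } SpillTag;

static int
spill_write_record(FILE *tape, SpillTag tag, const void *data, uint32_t len)
{
    uint8_t t = (uint8_t) tag;

    return fwrite(&t, 1, 1, tape) == 1
        && fwrite(&len, sizeof(len), 1, tape) == 1
        && fwrite(data, 1, len, tape) == len;
}

static int
spill_read_record(FILE *tape, SpillTag *tag, void *buf, uint32_t buflen,
                  uint32_t *len)
{
    uint8_t t;

    if (fread(&t, 1, 1, tape) != 1 || fread(len, sizeof(*len), 1, tape) != 1)
        return 0;               /* end of tape */
    if (*len > buflen)
        return 0;               /* sketch: no partial reads */
    *tag = (SpillTag) t;
    return fread(buf, 1, *len, tape) == *len;
}

int main(void)
{
    FILE *tape = tmpfile();
    char buf[64];
    uint32_t len;
    SpillTag tag;

    spill_write_record(tape, SPILL_PARTIAL_STATE, "state-bytes", 11);
    rewind(tape);
    if (spill_read_record(tape, &tag, buf, sizeof(buf), &len))
        printf("tag=%d len=%u\n", (int) tag, len);  /* tag=2 len=11 */
    fclose(tape);
    return 0;
}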

Regards,
    Jeff Davis