Re: Memory-Bounded Hash Aggregation - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Memory-Bounded Hash Aggregation |
Date | |
Msg-id | b2c6d4c1-9298-4dfd-2763-cd080b23a826@iki.fi Whole thread Raw |
In response to | Re: Memory-Bounded Hash Aggregation (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Memory-Bounded Hash Aggregation
(Jeff Davis <pgsql@j-davis.com>)
|
List | pgsql-hackers |
On 03/02/2020 20:29, Jeff Davis wrote: > 1. Use a minheap for the freelist. The original design used an array > that had to be sorted between a read (which frees a block) and a write > (which needs to sort the array to consume the lowest block number). The > comments said: > > * sorted. This is an efficient way to handle it because we expect > cycles > * of releasing many blocks followed by re-using many blocks, due to > * the larger read buffer. > > But I didn't find a case where that actually wins over a simple > minheap. With that in mind, a minheap seems closer to what one might > expect for that purpose, and more robust when the assumptions don't > hold up as well. If someone knows of a case where the re-sorting > behavior is important, please let me know. A minheap certainly seems more natural for that. I guess re-sorting the array would be faster in the extreme case that you free almost all of the blocks, and then consume almost all of the blocks, but I don't think the usage pattern is ever that extreme. Because if all the data fit in memory, we wouldn't be spilling in the first place. I wonder if a more advanced heap like the pairing heap or fibonacci heap would perform better? Probably doesn't matter in practice, so better keep it simple... > Changing to a minheap effectively solves the problem for HashAgg, > though in theory the memory consumption of the freelist itself could > become significant (though it's only 0.1% of the free space being > tracked). We could fairly easily spill parts of the freelist to disk, too, if necessary. But it's probably not worth the trouble. > 2. Lazily-allocate the read buffer. The write buffer was always lazily- > allocated, so this patch creates better symmetry. More importantly, it > means freshly-rewound tapes don't have any buffer allocated, so it > greatly expands the number of tapes that can be managed efficiently as > long as only a limited number are active at once. Makes sense. > 3. Allow expanding the number of tapes for an existing tape set. This > is useful for HashAgg, which doesn't know how many tapes will be needed > in advance. I'd love to change the LogicalTape API so that you could allocate and free tapes more freely. I wrote a patch to do that, as part of replacing tuplesort.c's polyphase algorithm with a simpler one (see [1]), but I never got around to committing it. Maybe the time is ripe to do that now? [1] https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83@iki.fi - Heikki
pgsql-hackers by date: