Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Use generation context to speed up tuplesorts |
Date | |
Msg-id | CAApHDvpnpoGrEcoJT0bQjCR55_zqZgMhd0mmgQ748cyXZoQQ8w@mail.gmail.com Whole thread Raw |
In response to | Re: Use generation context to speed up tuplesorts (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: Use generation context to speed up tuplesorts
|
List | pgsql-hackers |
On Sun, 13 Feb 2022 at 09:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > I'm not against pushing the generation context improvements, e.g. > something like the patches posted in [1], because those seem reasonable > in general. But I'm somewhat confused about the last patch (adjusting > allocChunkLimit) causing fairly significant regressions. My thoughts here are that we should pursue the patch that allows growing of the block size in the generation context. I do think the investigation of the malloc() behaviour and performance is worth some pursuit, I just don't think it should be done here or as part of this patch. I think it's a fairly large undertaking to ensure any changes to the malloc options are an overall win and not just a win for whatever benchmark we test them on. If they're really an overall win, then shouldn't glibc know about them and maybe adopt them as the standard options? To get the ball rolling again on the changes to the generation context, I took your patches, Tomas, and fixed a few things around the keeper blocks not working correctly. I've now made the keeper block behaviour match how aset.c works. There the keeper block is allocated in the same allocation as the context itself. That reduces the number of malloc() calls when setting up a new memory context. On testing this, I discovered a problem regarding the use of generation contexts for storing tuples in tuplesort.c. The problem is when the sort is bounded (e.g. SELECT * FROM ... ORDER BY .. LIMIT N), we'll store and directly throw away any tuples that sort greater than the existing set of N tuples. This causes a problem because once we move away from using the keeper block, initially, we'll at some point end up storing a tuple in a newly allocated generation block then proceed to pfree() that tuple due to it sorting greater than the existing N tuples. Because we currently always free() newly empty blocks, we end up thrashing free()/malloc(). This behaviour results in one of the regression test queries in tuplesort.sql going from taking 10 ms to 1 minute, on my machine. I had a few thoughts about how best to work around this problem. Firstly, I thought that we should just *not* use the Generation context when the sort is bounded. That turns out to be a bit icky as we only make the sort bounding when we call tuplesort_set_bound(), which is after we've allocated the memory context for storing tuples. I thought about maybe just adding another bool flag named "allowBounding" and use a Generation context if that flag is not set. I just don't really like that as a solution. We also cannot make the set bound part of the tuplesort_begin_heap() call as nodeIncrementalSort.c does reset the bound. Possibly we could ditch tuplesort_set_bound() and invent tuplesort_reset_bound() and change the API so the bound is set when making the Tuplesortstate. The other way I thought to fix it was by changing the logic for when generation blocks are freed. In the problem case mentioned above, the block being freed is the current block (which was just allocated). I made some changes to adjust this behaviour so that we no longer free the block when the final chunk is pfree()'d. Instead, that now lingers and can be reused by future allocations, providing they fit inside it. If they don't fit then we must free the block, otherwise it would just remain empty forever. The problem I see with this method is that there still could be some pathological case that causes us to end up storing just a single tuple per generation block. Hitting the worst case here does seem pretty unlikely, so I'm a bit drawn between if we should just play it safe and stick to the standard memory allocator for bounded sort. I've attached 2 patches. The 0001 patch adds the new logic to generation.c to allow the block to grow and also adds the logic to skip freeing the current block when it becomes empty. The 0002 patch simply makes tuplesort.c use the generation context for tuples unconditionally. I've also attached the benchmark results I got from this patch running the attached sortbenchall script. It's fairly obvious that there are performance improvements here even when the malloc() usage pattern is similar to aset.c's David
Attachment
pgsql-hackers by date: