Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers
| From | Ronan Dunklau | 
|---|---|
| Subject | Re: Use generation context to speed up tuplesorts | 
| Date | |
| Msg-id | 3470027.R56niFO833@aivenronan Whole thread Raw | 
| In response to | Re: Use generation context to speed up tuplesorts (Ronan Dunklau <ronan.dunklau@aiven.io>) | 
| Responses | Re: Use generation context to speed up tuplesorts | 
| List | pgsql-hackers | 
Le jeudi 20 janvier 2022, 08:36:46 CET Ronan Dunklau a écrit : > Le jeudi 20 janvier 2022, 02:31:15 CET David Rowley a écrit : > > On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud <rjuju123@gmail.com> wrote: > > > I'm also a bit confused about which patch(es) should actually be > > > reviewed > > > in that thread. My understanding is that the original patch might not > > > be > > > relevant anymore but I might be wrong. The CF entry should probably be > > > updated to clarify that, with an annotation/ title change / author > > > update > > > or something. > > > > > > In the meantime I will switch the entry to Waiting on Author. > > > > I think what's going on here is a bit confusing. There's quite a few > > patches on here that aim to do very different things. > > > > The patch I originally proposed was just to make tuplesort use > > generation memory contexts. This helps speed up tuplesort because > > generation contexts allow palloc() to be faster than the standard > > allocator. Additionally, there's no power-of-2 wastage with generation > > contexts like there is with the standard allocator. This helps fit > > more tuples on fewer CPU cache lines. > > > > I believe Andres then mentioned that the fixed block size for > > generation contexts might become a problem. With the standard > > allocator the actual size of the underlying malloc() grows up until a > > maximum size. With generation contexts this is always the same, so we > > could end up with a very large number of blocks which means lots of > > small mallocs which could end up slow. Tomas then posted a series of > > patches to address this. > > > > I then posted another patch that has the planner make a choice on the > > size of the blocks to use for the generation context based on the > > tuple width and number of tuple estimates. My hope there was to > > improve performance further by making a better choice for how large to > > make the blocks in the first place. This reduces the need for Tomas' > > patch, but does not eliminate the need for it. > > > > As of now, I still believe we'll need Tomas' patches to allow the > > block size to grow up to a maximum size. I think those patches are > > likely needed before we think about making tuplesort use generation > > contexts. The reason I believe this is that we don't always have good > > estimates for the number of tuples we're going to sort. nodeSort.c is > > fairly easy as it just fetches tuples once and then sorts them. The > > use of tuplesort for nodeIncrementalSort.c is much more complex as > > we'll likely be performing a series of smaller sorts which are much > > harder to get size estimates for. This means there's likely no magic > > block size that we can use for the generation context that's used to > > store the tuples in tuplesort. This is also the case for order by > > aggregates and probably most other usages of tuplesort. > > You left out the discussion I started about glibc's malloc specific > behaviour. > > I tried to demonstrate that a big part of the performance gain you were > seeing were not in fact due to our allocator by itself, but by the way > different block sizes allocations interact with this malloc implementation > and how it handles releasing memory back to the system. I also tried to > show that we can give DBAs more control over how much memory to "waste" as > the price for faster allocations. > > I agree that the generation context memory manager is useful in the > tuplesort context, if only for the fact that we fall back to disk less > quickly due to the reduced wastage of memory, but I would be wary of > drawing conclusions too quickly based on a specific malloc implementation > which shows threshold effects, which are compounded by our growing block > sizes code in general. > > I have on my TODO list to run the same kind of benchmarks using jemalloc, to > better isolate the performance gains expected from the generation allocator > itself from the side effect of avoiding glibc's pattern of releasing memory > back to the kernel too quickly. > I've run the same 1-to-32-columns sort benchmark, using both glibc malloc and jemalloc, with the standard aset memory manager or with David's generation v2 patch. To use jemalloc, I just use a LD_PRELOAD env variable before starting postgres. I have not tried to measure jemalloc's memory footprint like I did for glibc's malloc in the previous benchmark, as I'm not trying to evaluate jemalloc as a glibc's malloc replacement. Please find the results attached. As I suspected, most of the gain observed with David's patch come from working around glibc's malloc idiosyncracies but the good news is that it also helps (to a lesser degree) with jemalloc. I'm not sure how that would behave with growing block sizes though: as Tomas patch and stress-testing benchmarks showed, we can hit some particular thresholds (and regressions) in glibc's self-tuning behaviour. -- Ronan Dunklau
Attachment
pgsql-hackers by date: