Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Use generation context to speed up tuplesorts |
Date | |
Msg-id | d24b7d97-7a8f-7416-b932-2c0a781c2409@enterprisedb.com Whole thread Raw |
In response to | Re: Use generation context to speed up tuplesorts (Ronan Dunklau <ronan.dunklau@aiven.io>) |
Responses |
Re: Use generation context to speed up tuplesorts
|
List | pgsql-hackers |
On 1/20/22 08:36, Ronan Dunklau wrote: > Le jeudi 20 janvier 2022, 02:31:15 CET David Rowley a écrit : >> On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud <rjuju123@gmail.com> wrote: >>> I'm also a bit confused about which patch(es) should actually be reviewed >>> in that thread. My understanding is that the original patch might not be >>> relevant anymore but I might be wrong. The CF entry should probably be >>> updated to clarify that, with an annotation/ title change / author update >>> or something. >>> >>> In the meantime I will switch the entry to Waiting on Author. >> >> I think what's going on here is a bit confusing. There's quite a few >> patches on here that aim to do very different things. >> >> The patch I originally proposed was just to make tuplesort use >> generation memory contexts. This helps speed up tuplesort because >> generation contexts allow palloc() to be faster than the standard >> allocator. Additionally, there's no power-of-2 wastage with generation >> contexts like there is with the standard allocator. This helps fit >> more tuples on fewer CPU cache lines. >> >> I believe Andres then mentioned that the fixed block size for >> generation contexts might become a problem. With the standard >> allocator the actual size of the underlying malloc() grows up until a >> maximum size. With generation contexts this is always the same, so we >> could end up with a very large number of blocks which means lots of >> small mallocs which could end up slow. Tomas then posted a series of >> patches to address this. >> >> I then posted another patch that has the planner make a choice on the >> size of the blocks to use for the generation context based on the >> tuple width and number of tuple estimates. My hope there was to >> improve performance further by making a better choice for how large to >> make the blocks in the first place. This reduces the need for Tomas' >> patch, but does not eliminate the need for it. >> >> As of now, I still believe we'll need Tomas' patches to allow the >> block size to grow up to a maximum size. I think those patches are >> likely needed before we think about making tuplesort use generation >> contexts. The reason I believe this is that we don't always have good >> estimates for the number of tuples we're going to sort. nodeSort.c is >> fairly easy as it just fetches tuples once and then sorts them. The >> use of tuplesort for nodeIncrementalSort.c is much more complex as >> we'll likely be performing a series of smaller sorts which are much >> harder to get size estimates for. This means there's likely no magic >> block size that we can use for the generation context that's used to >> store the tuples in tuplesort. This is also the case for order by >> aggregates and probably most other usages of tuplesort. >> > > You left out the discussion I started about glibc's malloc specific behaviour. > > I tried to demonstrate that a big part of the performance gain you were seeing > were not in fact due to our allocator by itself, but by the way different block > sizes allocations interact with this malloc implementation and how it handles > releasing memory back to the system. I also tried to show that we can give > DBAs more control over how much memory to "waste" as the price for faster > allocations. > > I agree that the generation context memory manager is useful in the tuplesort > context, if only for the fact that we fall back to disk less quickly due to > the reduced wastage of memory, but I would be wary of drawing conclusions too > quickly based on a specific malloc implementation which shows threshold effects, > which are compounded by our growing block sizes code in general. > Yeah, I share this concern - how much of the improvement is real and how much is due to accidentally not hitting some glibc-specific self-tuning stuff? Would a realistic workloads get the same improvement or would it cause regression? What about non-glibc systems with other malloc? I'm not against pushing the generation context improvements, e.g. something like the patches posted in [1], because those seem reasonable in general. But I'm somewhat confused about the last patch (adjusting allocChunkLimit) causing fairly significant regressions. I'll try investigating this a bit more next week. regards [1] https://www.postgresql.org/message-id/flat/bcdd4e3e-c12d-cd2b-7ead-a91ad416100a%40enterprisedb.com#a5ecd4e5a9e7d5b07ff25b5684da894f -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: