Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Use generation context to speed up tuplesorts
Date
Msg-id d24b7d97-7a8f-7416-b932-2c0a781c2409@enterprisedb.com
Whole thread Raw
In response to Re: Use generation context to speed up tuplesorts  (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses Re: Use generation context to speed up tuplesorts  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 1/20/22 08:36, Ronan Dunklau wrote:
> Le jeudi 20 janvier 2022, 02:31:15 CET David Rowley a écrit :
>> On Tue, 18 Jan 2022 at 19:45, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>> I'm also a bit confused about which patch(es) should actually be reviewed
>>> in that thread.  My understanding is that the original patch might not be
>>> relevant anymore but I might be wrong.  The CF entry should probably be
>>> updated to clarify that, with an annotation/ title change / author update
>>> or something.
>>>
>>> In the meantime I will switch the entry to Waiting on Author.
>>
>> I think what's going on here is a bit confusing. There's quite a few
>> patches on here that aim to do very different things.
>>
>> The patch I originally proposed was just to make tuplesort use
>> generation memory contexts. This helps speed up tuplesort because
>> generation contexts allow palloc() to be faster than the standard
>> allocator. Additionally, there's no power-of-2 wastage with generation
>> contexts like there is with the standard allocator. This helps fit
>> more tuples on fewer CPU cache lines.
>>
>> I believe Andres then mentioned that the fixed block size for
>> generation contexts might become a problem. With the standard
>> allocator the actual size of the underlying malloc() grows up until a
>> maximum size.  With generation contexts this is always the same, so we
>> could end up with a very large number of blocks which means lots of
>> small mallocs which could end up slow.  Tomas then posted a series of
>> patches to address this.
>>
>> I then posted another patch that has the planner make a choice on the
>> size of the blocks to use for the generation context based on the
>> tuple width and number of tuple estimates. My hope there was to
>> improve performance further by making a better choice for how large to
>> make the blocks in the first place.  This reduces the need for Tomas'
>> patch, but does not eliminate the need for it.
>>
>> As of now, I still believe we'll need Tomas' patches to allow the
>> block size to grow up to a maximum size.  I think those patches are
>> likely needed before we think about making tuplesort use generation
>> contexts.  The reason I believe this is that we don't always have good
>> estimates for the number of tuples we're going to sort.  nodeSort.c is
>> fairly easy as it just fetches tuples once and then sorts them.  The
>> use of tuplesort for nodeIncrementalSort.c is much more complex as
>> we'll likely be performing a series of smaller sorts which are much
>> harder to get size estimates for. This means there's likely no magic
>> block size that we can use for the generation context that's used to
>> store the tuples in tuplesort.  This is also the case for order by
>> aggregates and probably most other usages of tuplesort.
>>
> 
> You left out the discussion I started about glibc's malloc specific behaviour.
> 
> I tried to demonstrate that a big part of the performance gain you were seeing 
> were not in fact due to our allocator by itself, but by the way different block 
> sizes allocations interact with this malloc implementation and how it handles 
> releasing memory back to the system. I also tried to show that we can give 
> DBAs more control over how much memory to "waste" as the price for faster 
> allocations.
> 
> I agree that the generation context memory manager is useful in the tuplesort 
> context, if only for the fact that we fall back to disk less quickly due to 
> the reduced wastage of memory, but I would be wary of drawing conclusions too 
> quickly based on a specific malloc implementation which shows threshold effects, 
> which are compounded by our growing block sizes code in general.
> 

Yeah, I share this concern - how much of the improvement is real and how
much is due to accidentally not hitting some glibc-specific self-tuning
stuff? Would a realistic workloads get the same improvement or would it
cause regression? What about non-glibc systems with other malloc?

I'm not against pushing the generation context improvements, e.g.
something like the patches posted in [1], because those seem reasonable
in general. But I'm somewhat confused about the last patch (adjusting
allocChunkLimit) causing fairly significant regressions.

I'll try investigating this a bit more next week.


regards


[1]

https://www.postgresql.org/message-id/flat/bcdd4e3e-c12d-cd2b-7ead-a91ad416100a%40enterprisedb.com#a5ecd4e5a9e7d5b07ff25b5684da894f

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Use generation context to speed up tuplesorts
Next
From: Justin Pryzby
Date:
Subject: buildfarm warnings