Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Use generation context to speed up tuplesorts
Date
Msg-id bc765deb-ca2b-838e-f980-16a77dd0922b@enterprisedb.com
Whole thread Raw
In response to Re: Use generation context to speed up tuplesorts  (Andres Freund <andres@anarazel.de>)
Responses Re: Use generation context to speed up tuplesorts  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On 7/30/21 10:38 PM, Andres Freund wrote:
> Hi,
> 
> On 2021-07-30 18:42:18 +1200, David Rowley wrote:
>> While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum
>> sorts for single column sorts), I noticed that we use a separate
>> memory context to store tuple data and we just reset when we're done
>> if the sort fits in memory, or we dump the tuples to disk in the same
>> order they're added and reset the context when it does not.  There is
>> a little pfree() work going on via writetup_heap() which I think
>> possibly could just be removed to get some additional gains.
>>
>> Anyway, this context that stores tuples uses the standard aset.c
>> allocator which has the usual power of 2 wastage and additional
>> overheads of freelists etc.  I wondered how much faster it would go if
>> I set it to use a generation context instead of an aset.c one.
>>
>> After running make installcheck to make the tenk1 table, running the
>> attached tuplesortbench script, I get this:
> 
>> So it seems to save quite a bit of memory getting away from rounding
>> up allocations to the next power of 2.
>>
>> Performance-wise, there's some pretty good gains. (Results in TPS)
> 
> Very nice!
> 

Yes, very nice. I wouldn't have expected such a significant difference, 
particularly in CPU usage. It's pretty interesting that it reduces both 
memory and CPU usage; I'd have guessed it'd be one or the other.

> 
> I wonder if there's cases where generation.c would regress performance
> over aset.c due to not having an initial / "keeper" block?
> 

Not sure. I guess such a workload would need to allocate and free a single 
block (so very little memory) very often. I guess that's possible, but 
I'm not aware of a place doing that very often. Although maybe decoding 
could do that for a simple (serial) workload.

I'm not opposed to adding a keeper block to Generation, similarly to 
what was discussed for Slab not too long ago.

> 
>> The patch is just a simple 1-liner at the moment.  I likely do need to
>> adjust what I'm passing as the blockSize to GenerationContextCreate().
>>    Maybe a better number would be something that's calculated from
>> work_mem, e.g Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64))
>> so that we just allocate at most a 16th of work_mem per chunk, but not
>> bigger than 8MB. I don't think changing this will affect the
>> performance of the above very much.
> 
> I think it's bad that both generation and slab don't have internal
> handling of block sizes. Needing to err on the side of too-big blocks to
> handle large amounts of memory well, just so the contexts don't need to
> deal with variably sized blocks, isn't a sensible tradeoff.
> 

Well, back then it seemed like a sensible trade-off to me, but I agree 
it may have negative consequences. I'm not opposed to revisiting this.

> I don't think it's acceptable to use ALLOCSET_DEFAULT_MAXSIZE or
> Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64) for
> tuplesort.c. There are plenty of cases where we'll just sort a handful
> of tuples, and increasing the memory usage of those by a factor of 1024
> isn't good. The Min() won't do any good if a larger work_mem is used.
> 
> Nor will it be good to use thousands of small allocations for a large
> in-memory tuplesort just because we're concerned about the initial
> allocation size. Both because of the allocation overhead, but
> importantly also because that will make context resets more expensive.
> 

True.

> To me this says that we should transplant aset.c's block size growing
> into generation.c.
> 

Yeah, maybe.

> 
> There is at least one path using tuplecontext that reaches code that
> could end up freeing memory to a significant enough degree to care about
> generation.c effectively not using that memory:
> tuplesort_putdatum()->datumCopy()->EOH_flatten_into()
> On a quick look I didn't find any expanded record user that frees
> nontrivial amounts of memory, but I didn't look all that carefully.
> 

Not sure, I'm not familiar with EOH_flatten_into or expanded records. 
But I wonder if there's some sort of metric that we could track in 
Generation and use it to identify "interesting" places.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company