Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Use generation context to speed up tuplesorts
Date:
Msg-id: bc765deb-ca2b-838e-f980-16a77dd0922b@enterprisedb.com
In response to: Re: Use generation context to speed up tuplesorts (Andres Freund <andres@anarazel.de>)
Responses: Re: Use generation context to speed up tuplesorts
List: pgsql-hackers
On 7/30/21 10:38 PM, Andres Freund wrote:
> Hi,
>
> On 2021-07-30 18:42:18 +1200, David Rowley wrote:
>> While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum
>> sorts for single column sorts), I noticed that we use a separate
>> memory context to store tuple data, and we just reset it when we're done
>> if the sort fits in memory, or we dump the tuples to disk in the same
>> order they're added and reset the context when it does not. There is
>> a little pfree() work going on via writetup_heap() which I think
>> possibly could just be removed to get some additional gains.
>>
>> Anyway, this context that stores tuples uses the standard aset.c
>> allocator, which has the usual power of 2 wastage and additional
>> overheads of freelists etc. I wondered how much faster it would go if
>> I set it to use a generation context instead of an aset.c one.
>>
>> After running make installcheck to make the tenk1 table and then
>> running the attached tuplesortbench script, I get this:
>
>> So it seems to save quite a bit of memory getting away from rounding
>> up allocations to the next power of 2.
>>
>> Performance-wise, there are some pretty good gains. (Results in TPS)
>
> Very nice!
>

Yes, very nice. I wouldn't have expected such a significant difference,
particularly in CPU usage. It's pretty interesting that it reduces both
memory and CPU usage; I'd have guessed it would be one or the other.

> I wonder if there's cases where generation.c would regress performance
> over aset.c due to not having an initial / "keeper" block?
>

Not sure. I guess such a workload would need to allocate and free a
single block (so very little memory) very often. I guess that's
possible, but I'm not aware of a place doing that very often. Although
maybe decoding could do that for a simple (serial) workload.

I'm not opposed to adding a keeper block to Generation, similarly to
what was discussed for Slab not too long ago.

>> The patch is just a simple 1-liner at the moment. I likely do need to
>> adjust what I'm passing as the blockSize to GenerationContextCreate().
>> Maybe a better number would be something that's calculated from
>> work_mem, e.g. Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64),
>> so that we just allocate at most a 16th of work_mem per chunk, but not
>> bigger than 8MB. I don't think changing this will affect the
>> performance of the above very much.
>
> I think it's bad that both generation and slab don't have internal
> handling of block sizes. Needing to err on the side of too-big blocks to
> handle large amounts of memory well, just so the contexts don't need to
> deal with variably sized blocks, isn't a sensible tradeoff.
>

Well, back then it seemed like a sensible trade-off to me, but I agree
it may have negative consequences. I'm not opposed to revisiting this.

> I don't think it's acceptable to use ALLOCSET_DEFAULT_MAXSIZE or
> Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64) for
> tuplesort.c. There's plenty of cases where we'll just sort a handful of
> tuples, and increasing the memory usage of those by a factor of 1024
> isn't good. The Min() won't do any good if a larger work_mem is used.
>
> Nor will it be good to use thousands of small allocations for a large
> in-memory tuplesort just because we're concerned about the initial
> allocation size. Both because of the allocation overhead, but
> importantly also because that will make context resets more expensive.
>

True.

> To me this says that we should transplant aset.c's block size growing
> into generation.c.
>

Yeah, maybe.
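To make the blockSize discussion above a bit more concrete, the change
being debated amounts to something like the following in tuplesort.c's
per-tuple context setup. This is a sketch pieced together from the
descriptions in this thread, not the actual patch; the sizing policy
shown here is exactly what is still open:

    /*
     * Illustrative sketch only: create the per-tuple context with
     * generation.c instead of aset.c.  work_mem is in kilobytes, so
     * ((Size) work_mem) * 64 bytes is a 16th of work_mem, capped here
     * at ALLOCSET_DEFAULT_MAXSIZE (8MB).
     */
    state->tuplecontext = GenerationContextCreate(state->sortcontext,
                                                  "Caller tuples",
                                                  Min(ALLOCSET_DEFAULT_MAXSIZE,
                                                      ((Size) work_mem) * 64));

A fixed blockSize like this has to pick its poison: sized from work_mem
it over-allocates for sorts of just a handful of tuples, and sized small
it turns a large in-memory sort into thousands of block allocations and
a more expensive context reset, which is why transplanting aset.c's
block-size doubling into generation.c comes up above.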
> There is at least one path using tuplecontext that reaches code that
> could end up freeing memory to a significant enough degree to care about
> generation.c effectively not using that memory:
> tuplesort_putdatum()->datumCopy()->EOH_flatten_into()
> On a quick look I didn't find any expanded record user that frees
> nontrivial amounts of memory, but I didn't look all that carefully.
>

Not sure, I'm not familiar with EOH_flatten_into or expanded records.
But I wonder if there's some sort of metric that we could track in
Generation and use it to identify "interesting" places.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company