Use generation context to speed up tuplesorts - Mailing list pgsql-hackers

From David Rowley
Subject Use generation context to speed up tuplesorts
Date
Msg-id CAApHDvoH4ASzsAOyHcxkuY01Qf++8JJ0paw+03dk+W25tQEcNQ@mail.gmail.com
Whole thread Raw
Responses Re: Use generation context to speed up tuplesorts
Re: Use generation context to speed up tuplesorts
List pgsql-hackers
Hackers,

While reviewing and benchmarking 91e9e89dc (Make nodeSort.c use Datum
sorts for single column sorts), I noticed that we use a separate
memory context to store tuple data and we just reset when we're done
if the sort fits in memory, or we dump the tuples to disk in the same
order they're added and reset the context when it does not.  There is
a little pfree() work going on via writetup_heap() which I think
possibly could just be removed to get some additional gains.

Anyway, this context that stores tuples uses the standard aset.c
allocator which has the usual power of 2 wastage and additional
overheads of freelists etc.  I wondered how much faster it would go if
I set it to use a generation context instead of an aset.c one.

After running make installcheck to make the tenk1 table, running the
attached tuplesortbench script, I get this:

Master:
work_mem = '4MB';
   Sort Method: external merge  Disk: 2496kB
work_mem = '4GB';
   Sort Method: quicksort  Memory: 5541kB

Patched:
work_mem = '4MB';
   Sort Method: quicksort  Memory: 3197kB

So it seems to save quite a bit of memory getting away from rounding
up allocations to the next power of 2.

Performance-wise, there's some pretty good gains. (Results in TPS)

work_mem = '4GB';
Test    master    gen sort    compare
Test1    317.2    665.6    210%
Test2    228.6    388.9    170%
Test3    207.4    330.7    159%
Test4    185.5    279.4    151%
Test5    292.2    563.9    193%

If I drop the work_mem down to standard the unpatched version does to
disk, but the patched version does not.  The gains get a little
bigger.

work_mem = '4MB';
Test    master    gen sort    compare
Test1    177.5    658.2    371%
Test2    149.7    385.2    257%
Test3    137.5    330.0    240%
Test4    129.0    275.1    213%
Test5    161.7    546.4    338%

The patch is just a simple 1-liner at the moment.  I likely do need to
adjust what I'm passing as the blockSize to GenerationContextCreate().
  Maybe a better number would be something that's calculated from
work_mem, e.g Min(ALLOCSET_DEFAULT_MAXSIZE, ((Size) work_mem) * 64))
so that we just allocate at most a 16th of work_mem per chunk, but not
bigger than 8MB. I don't think changing this will affect the
performance of the above very much.

David

Attachment

pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Added schema level support for publication.