Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers

From David Rowley
Subject Re: Use generation context to speed up tuplesorts
Date
Msg-id CAApHDvrTNYtHZUjUbYbDJ-hPWNbotnCXr2pN7+EJURirz7urPw@mail.gmail.com
In response to Re: Use generation context to speed up tuplesorts  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Fri, 10 Sept 2021 at 01:38, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I've spent quite a bit of time investigating the regressions clearly
> visible in the benchmark results for some allocation/free patterns.
> Attached is a subset of results from a recent run, but the old results
> show mostly the same thing.

I started looking at the original patch again and wondered if it might
be a good idea to use the planner's estimates of the tuple width and
the number of tuples to help determine the size of the chunks in the
generation context.  The number of tuples could also help size the
array that we're sorting, which would save us from allocating just
1024 elements and doubling the array each time we run out of space in
it.
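To illustrate the cost being avoided, here's a standalone C sketch (not the actual tuplesort code; `grow_steps` and `presized_cap` are made-up names for illustration) comparing the double-on-overflow growth pattern with sizing the array once from an estimate:

```c
#include <stddef.h>

/* Hypothetical illustration: growing a sort array by doubling from an
 * initial 1024 entries implies one repalloc-and-copy cycle per
 * doubling before ntuples fit. */
static size_t
grow_steps(size_t ntuples)
{
    size_t cap = 1024;
    size_t steps = 0;

    while (cap < ntuples)
    {
        cap *= 2;   /* double on each overflow */
        steps++;    /* each step implies a reallocation + copy */
    }
    return steps;
}

/* With a planner estimate available, the array could instead be sized
 * once up front (never below the usual starting size). */
static size_t
presized_cap(size_t est_tuples)
{
    return est_tuples > 1024 ? est_tuples : 1024;
}
```

For a million-tuple sort, the doubling scheme goes through ten grow-and-copy cycles that a single estimate-based allocation would skip.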

The main problem is that there will be many cases where we need to
sort but have no estimates from the planner, so even if we do this,
we'll likely still need the chunk-growing code.

I've attached some benchmark results that I took recently.  The
spreadsheet contains results from three builds: master, master +
patches 0001-0002, and master + patches 0001-0003.  The 0003 patch
makes the code a bit more conservative about the chunk sizes it
allocates and also tries to size the tuple array according to the
number of tuples we expect to be able to sort in a single batch when
the sort is not estimated to fit inside work_mem.

For the 4MB work_mem test in the spreadsheet, this does end up using
2MB chunks rather than 4MB chunks, because the 10k tuple test comes
so close to requiring the full 4MB of memory to perform the sort.
This does mean that for the 10k tuple test, the 0003 patch looks
quite a bit slower than with just the 0001-0002 patches.  However, I
still think it's a good idea not to make the chunk size the same as
work_mem.

The patch is still WIP.  I still have a few magic numbers in there,
for example the 48 in:

state->est_tuple_memory = pg_nextpower2_64(state->est_tuples *
(state->est_tupwidth + 48));

I should really have a sizeof() in there.  I've not yet looked for
opportunities other than nodeSort.c to pass tuple estimates down into
tuplesort.c.
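As a standalone illustration of the rounding that snippet performs (reimplementing the power-of-two rounding locally rather than using PostgreSQL's pg_nextpower2_64; the 48-byte per-tuple overhead is kept as a placeholder constant, per the sizeof() remark above, and `est_tuple_memory` here is just a sketch, not the patch's code):

```c
#include <stdint.h>

/* Round up to the next power of two (64-bit).  PostgreSQL provides
 * pg_nextpower2_64() for this; reimplemented here so the sketch is
 * self-contained. */
static uint64_t
next_pow2_64(uint64_t v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= v >> 32;
    return v + 1;
}

/* Placeholder for the magic 48: presumably per-tuple bookkeeping
 * overhead that a sizeof() would make explicit. */
#define PER_TUPLE_OVERHEAD 48

static uint64_t
est_tuple_memory(uint64_t est_tuples, uint64_t est_tupwidth)
{
    return next_pow2_64(est_tuples * (est_tupwidth + PER_TUPLE_OVERHEAD));
}
```

For example, 10,000 tuples at ~100 bytes each gives 1,480,000 bytes, which rounds up to 2MB - consistent with the 2MB chunks mentioned for the 4MB work_mem test.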

David

Attachment
