Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Use generation context to speed up tuplesorts
Date
Msg-id CAKU4AWq+KRmH9Cb8bsO-+usjxZkn8JsDRGU=fxy8YMFwdD1wOw@mail.gmail.com
Whole thread Raw
In response to Re: Use generation context to speed up tuplesorts  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hi:

On Thu, Jan 20, 2022 at 9:31 AM David Rowley <dgrowleyml@gmail.com> wrote:

As of now, I still believe we'll need Tomas' patches to allow the
block size to grow up to a maximum size.  I think those patches are
likely needed before we think about making tuplesort use generation
contexts.  The reason I believe this is that we don't always have good
estimates for the number of tuples we're going to sort. 

+1
 
I spent some times to study the case here and my current thought is:
we can discuss/commit the minimum committable changes which 
should be beneficial for some cases and no harm for others. 
Tomas's patch 0002 would make there are no more blocks needed
if we switch to GenerationContext (compared with Standard Context). and
David's patch can obviously reduce total memory usage and improve
the performance.  so IMO Tomas's patch 0002 + David's patch is a committable
patchset at first round.  and Tomas's 0001 patch would be good to have
as well. 

I double checked Tomas's 0002 patch, it looks good to me.  and then applied
David's patch with ALLOCSET_DEFAULT_SIZES,  testing the same workload.
Here is the result (number is tps):

work_mem = '4GB'

| Test Case | master | patched |
|-----------+--------+---------|
| Test 1    |    306 |     406 |
| Test 2    |    225 |     278 |
| Test 3    |    202 |     248 |
| Test 4    |    184 |     218 |
| Test 5    |    281 |     360 |

work_mem = '4MB'

| Test Case | master | patched |
|-----------+--------+---------|
| Test 1    |    124 |     409 |
| Test 2    |    106 |     280 |
| Test 3    |    100 |     249 |
| Test 4    |    97  |     218 |
| Test 5    |    120 |     369 |


I didn't get the performance improvement as much as David's at the beginning,  I
think that is because David uses the ALLOCSET_DEFAULT_MAXSIZE directly which
will need less number of times for memory allocation.

AFAICS,  Tomas's patch 0002 + David's patch should be ready for commit for round 1.
We can try other opportunities like use rows estimation to allocate initial memory and
GenerationContext improves like 0003/0004.  Would this work? 

--
Best Regards
Andy Fan

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [RFC] building postgres with meson
Next
From: "Jonathan S. Katz"
Date:
Subject: Re: Release notes for February minor releases