Re: Use generation context to speed up tuplesorts - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Use generation context to speed up tuplesorts
Date
Msg-id 13808af0-2bb5-b506-62d0-1fb67e3385d0@enterprisedb.com
In response to Re: Use generation context to speed up tuplesorts  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Use generation context to speed up tuplesorts  (David Rowley <dgrowleyml@gmail.com>)
Re: Use generation context to speed up tuplesorts  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On 8/6/21 3:07 PM, David Rowley wrote:
> On Wed, 4 Aug 2021 at 02:10, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> A review would be nice, although it can wait - it'd be interesting to
>> know if those patches help with the workload(s) you've been looking at.
> 
> I tried out the v2 set of patches using the attached scripts.  The
> attached spreadsheet includes the original tests and compares master
> with the patch which uses the generation context vs that patch plus
> your v2 patch.
> 
> I've also included 4 additional tests, each of which starts with a 1
> column table and then adds another 32 columns testing the performance
> after adding each additional column. I did this because I wanted to
> see if the performance was more similar to master when the allocations
> had less power of 2 wastage from allocset. If, for example, you look
> at row 123 of the spreadsheet you can see both patched and unpatched
> the allocations were 272 bytes each yet there was still a 50%
> performance improvement with just the generation context patch when
> compared to master.
> 
> Looking at the spreadsheet, you'll also notice that in the 2 column
> test of each of the 4 new tests the number of bytes used for each
> allocation is larger with the generation context. 56 vs 48.  This is
> due to the GenerationChunk struct being larger than AllocSet's
> version by 8 bytes.  This is because it also stores a pointer to the
> owning GenerationBlock.  So with the patch there are some cases where we'll
> use slightly more memory.
> 
> Additional tests:
> 
> 1. Sort 10000 tuples on a column with values 0-99 in memory.
> 2. As #1 but with 1 million tuples.
> 3. As #1 but with a large OFFSET to remove the overhead of sending to the client.
> 4. As #2 but with a large OFFSET.
> 
> Test #3 above is the most similar one to the original tests and shows
> similar gains. When the sort becomes larger (1 million tuple test),
> the gains reduce. This indicates the gains are coming from improved
> CPU cache efficiency from the removal of the power of 2 wastage in
> memory allocations.
> 
> All of the tests show that the patches to improve the allocation
> efficiency of generation.c don't help to improve the results of the
> test cases. I wondered if it's maybe worth trying to see what happens
> if instead of doubling the allocations each time, quadruple them
> instead. I didn't try this.
> 

Thanks for the scripts and the spreadsheet with results.

I doubt quadrupling the allocations will help very much, but I suspect 
the problem might be in the 0004 patch - at least that's what shows 
regression in my results. Could you try with just 0001-0003 applied?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


