PG15 beta1 sort performance regression due to Generation context change - Mailing list pgsql-hackers
| From | David Rowley |
|---|---|
| Subject | PG15 beta1 sort performance regression due to Generation context change |
| Date | |
| Msg-id | CAApHDvqXpLzav6dUeR5vO_RBh_feHrHMLhigVQXw9jHCyKP9PA@mail.gmail.com |
| Responses | Re: PG15 beta1 sort performance regression due to Generation context change |
| | Re: PG15 beta1 sort performance regression due to Generation context change |
| List | pgsql-hackers |
Hackers,

Over the past few days I've been gathering some benchmark results together to show the sort performance improvements in PG15 [1].

One of the test cases I did was to demonstrate Heikki's change to use a k-way merge (65014000b). The test I did to try this out was along the lines of:

set max_parallel_workers_per_gather = 0;
create table t (a bigint not null, b bigint not null, c bigint not null, d bigint not null, e bigint not null, f bigint not null);
insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x; -- 10GB!
vacuum freeze t;

The query I ran was:

select * from t order by a offset 140247142;

I tested various sizes of work_mem, starting at 4MB and doubling all the way up to 16GB. For many of the smaller values of work_mem the performance is vastly improved by Heikki's change, however for work_mem = 64MB I detected quite a large slowdown: PG14 took 20.9 seconds and PG15 beta 1 took 29 seconds!

I've been trying to get to the bottom of this today and have finally discovered it is due to the tuple size allocations in the sort being exactly 64 bytes. Prior to 40af10b57 (Use Generation memory contexts to store tuples in sorts) the tuples for the sort would be stored in an aset context. After 40af10b57 we use a generation context. The idea with that change is that the generation context does no power-of-2 round-ups for allocations, so we save memory in most cases. However, because this particular test has a tuple size of 64 bytes, there was no power-of-2 wastage with aset.

The problem is that generation chunks have a larger chunk header than aset chunks do, because they must store a pointer to the block the chunk belongs to so that GenerationFree() can increment the nfree chunks in the block. aset.c does not require this as freed chunks just go onto a freelist that's global to the entire context.

Basically, for my test query, the slowdown is because instead of being able to store 620702 tuples per tape over 226 tapes with an aset context, we can now only store 576845 tuples per tape, resulting in requiring 244 tapes when using the generation context.

If I had added a column "g" to make the tuple size 72 bytes, causing aset's code to round allocations up to 128 bytes while generation.c keeps them at 72 bytes, then the sort would have stored 385805 tuples over 364 batches for aset and 538761 tuples over 261 batches using the generation context. That would have been a huge win. (A rough back-of-envelope check of these numbers follows at the end of this message.)

So it basically looks like I've discovered a very bad case that causes a significant slowdown, yet other cases that are not an exact power of 2 stand to gain significantly from this change. One thing 40af10b57 does is stop those terrible performance jumps when the tuple size crosses a power-of-2 boundary. The performance should be more aligned to the size of the data being sorted now... Unfortunately, that seems to mean regressions for large sorts with power-of-2 sized tuples.

I'm unsure exactly what I should do about this right now.

David

[1] https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/speeding-up-sort-performance-in-postgres-15/ba-p/3396953#change4
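As a rough back-of-envelope check (just arithmetic on the figures above, assuming tuples-per-tape is roughly work_mem divided by the per-tuple memory cost; the column aliases are only illustrative):

select (64*1024*1024) / 620702 as aset_64byte_tuple,       -- ~108 bytes per tuple
       (64*1024*1024) / 576845 as generation_64byte_tuple, -- ~116 bytes per tuple
       (64*1024*1024) / 385805 as aset_72byte_tuple,       -- ~174 bytes per tuple
       (64*1024*1024) / 538761 as generation_72byte_tuple; -- ~124 bytes per tuple

The ~8 byte per-tuple increase in the 64-byte case is consistent with the extra 8-byte block pointer in the generation chunk header on a 64-bit build, and the ~50 byte per-tuple saving in the 72-byte case is roughly the avoided round-up from 72 to 128 bytes minus that larger header. The remaining few dozen bytes per tuple would presumably be the SortTuple array entry and other per-tuple bookkeeping, so treat these figures as approximate.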