Re: Optimize WindowAgg's use of tuplestores - Mailing list pgsql-hackers
From: Ashutosh Bapat
Subject: Re: Optimize WindowAgg's use of tuplestores
Msg-id: CAExHW5tf2HBXrPLAqoDNbte3h0Gz8o5HcuJ3Yq_wPUw2=w5OuQ@mail.gmail.com
In response to: Optimize WindowAgg's use of tuplestores (David Rowley <dgrowley@gmail.com>)
Responses: Re: Optimize WindowAgg's use of tuplestores
List: pgsql-hackers
On Sun, Jul 7, 2024 at 4:27 PM David Rowley <dgrowley@gmail.com> wrote:
>
> As part of making tuplestores faster [1], I noticed that in WindowAgg,
> when we end one partition we call tuplestore_end() and then we do
> tuplestore_begin_heap() again for the next partition in
> begin_partition(), and then go on to set up the tuplestore read
> pointers according to what's required for the given frameOptions of
> the WindowAgg. This might make sense if the frameOptions could change
> between partitions, but they can't, so I don't see any reason why we
> can't just do tuplestore_clear() at the end of a partition. That
> resets the read pointer positions back to the start, ready for the
> next partition.
>
> I wrote the attached patch and checked how it affects performance. It
> helps quite a bit when there are lots of partitions.
>
> CREATE TABLE a (a INT NOT NULL);
> INSERT INTO a SELECT x FROM generate_series(1,1000000)x;
> VACUUM FREEZE ANALYZE a;
>
> bench.sql:
> SELECT a,count(*) OVER (PARTITION BY a) FROM a OFFSET 1000000;
>
> master:
> $ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
> latency average = 293.488 ms
> latency average = 295.509 ms
> latency average = 297.772 ms
>
> patched:
> $ pgbench -n -f bench.sql -T 60 -M prepared postgres | grep latency
> latency average = 203.234 ms
> latency average = 204.538 ms
> latency average = 203.877 ms
>
> About 45% faster.

I repeated your measurements, but varied the number of partitions and ran pgbench 5 times instead of 3. The idea is to see the impact of the change at lower partition counts.

10 partitions: SELECT a,count(*) OVER (PARTITION BY a % 10) FROM a OFFSET 1000000;
100 partitions: SELECT a,count(*) OVER (PARTITION BY a % 100) FROM a OFFSET 1000000;
1000 partitions: SELECT a,count(*) OVER (PARTITION BY a % 1000) FROM a OFFSET 1000000;
original query, 1M partitions: SELECT a,count(*) OVER (PARTITION BY a) FROM a OFFSET 1000000;

Notice that the OFFSET is kept the same throughout, to avoid any impact it may have on query execution.

Here are the results.

master:
no. of partitions, latencies of the 5 runs
10, 362.166 ms, 369.313 ms, 375.203 ms, 368.798 ms, 372.483 ms
100, 372.885 ms, 381.463 ms, 385.372 ms, 382.915 ms, 383.630 ms
1000, 390.834 ms, 395.653 ms, 400.339 ms, 407.777 ms, 389.906 ms
1000000, 552.848 ms, 553.943 ms, 547.806 ms, 541.871 ms, 546.741 ms

patched:
no. of partitions, latencies of the 5 runs
10, 356.980 ms, 371.223 ms, 375.550 ms, 378.011 ms, 381.119 ms
100, 392.307 ms, 385.087 ms, 380.383 ms, 390.999 ms, 388.422 ms
1000, 405.136 ms, 397.576 ms, 399.021 ms, 399.572 ms, 406.604 ms
1000000, 394.711 ms, 403.741 ms, 399.008 ms, 392.932 ms, 393.335 ms

Observations:
1. The numbers for 10 and 100 partitions are higher when patched. That might just be noise; I don't see any reason why the change would hurt when there are only a few partitions. The low-partition cases also have more rows per partition, though, so is the difference between MemoryContextDelete() and MemoryContextReset() making a difference here? It may be worth verifying those cases carefully. Otherwise, up to 1000 partitions, the patch doesn't show any difference.
2. For 1M partitions it does make a difference, about 35% in my case. Moreover, the change seems to make the execution times (more or less) independent of the number of partitions. Combining this with the first observation, it might be worth looking at the execution times when there are many rows per partition together with a high number of partitions.

--
Best Wishes,
Ashutosh Bapat
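A rough C sketch of the pattern under discussion may help readers following the thread. It is reconstructed from David's description above, not from the attached patch, so treat it as illustrative only: tuplestore_begin_heap(), tuplestore_end(), tuplestore_clear(), work_mem and winstate->buffer are real PostgreSQL names from nodeWindowAgg.c and tuplestore.c, while the function bodies and the _before/_after names are simplified stand-ins.

#include "postgres.h"
#include "miscadmin.h"			/* work_mem */
#include "nodes/execnodes.h"	/* WindowAggState */
#include "utils/tuplestore.h"

/*
 * Before: each partition pays for a full tuplestore teardown in
 * release_partition() and a rebuild in begin_partition().
 */
static void
release_partition_before(WindowAggState *winstate)
{
	tuplestore_end(winstate->buffer);
	winstate->buffer = NULL;
}

static void
begin_partition_before(WindowAggState *winstate)
{
	winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
	/* ... then allocate read pointers according to frameOptions ... */
}

/*
 * After: frameOptions cannot change between partitions, so the
 * tuplestore and its read pointers can be created once and reused.
 * tuplestore_clear() discards the stored tuples and rewinds every
 * read pointer back to the start, ready for the next partition.
 */
static void
release_partition_after(WindowAggState *winstate)
{
	tuplestore_clear(winstate->buffer);
}

Under this sketch, the per-partition cost drops from a destroy/create cycle (including read-pointer setup) to a reset, which is consistent with the gains above showing up mainly when there are many small partitions.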