Re: Optimize WindowAgg's use of tuplestores - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Optimize WindowAgg's use of tuplestores |
Date | |
Msg-id | CAExHW5sAak2LJ9Xp+EBqM2Chj5nsfYWbyq6F-7YhK5SiHu5WFA@mail.gmail.com |
In response to | Re: Optimize WindowAgg's use of tuplestores (David Rowley <dgrowleyml@gmail.com>) |
Responses | Re: Optimize WindowAgg's use of tuplestores |
List | pgsql-hackers |
On Thu, Jul 11, 2024 at 5:39 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 10 Jul 2024 at 02:42, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > Observations
> > 1. The numbers corresponding to 10 and 100 partitions are higher when
> > patched. That might be just noise. I don't see any reason why it would
> > impact negatively when there are a small number of partitions. The
> > lower partition cases also have a higher number of rows per partition,
> > so is the difference between MemoryContextDelete() vs
> > MemoryContextReset() making any difference here. May be worth
> > verifying those cases carefully. Otherwise upto 1000 partitions, it
> > doesn't show any differences.
>
> I think this might just be noise as a result of rearranging code. In
> terms of C code, I don't see any reason for it to be slower. If you
> look at GenerationDelete() (as what is getting called from
> MemoryContextDelete()), it just calls GenerationReset(). So resetting
> is going to always be less work than deleting the context, especially
> given we don't need to create the context again when we reset it.
>
> I wrote the attached script to see if I can also see the slowdown and
> I do see the patched code come out slightly slower (within noise
> levels) in lower partition counts.
>
> To get my compiler to produce code in a more optimal order for the
> common case, I added unlikely() to the "if (winstate->all_first)"
> condition. This is only evaluated on the first time after a rescan,
> so putting that code at the end of the function makes more sense. The
> attached v2 patch has it this way. You can see the numbers look
> slightly better in the attached graph.

The change to all_first seems unrelated to the tuplestore
optimization. But it's bringing the results in line with master for
lower numbers of partitions.

Thanks for the script. I have similar results on my laptop.
From master

Testing with 1000000 partitions
latency average = 505.738 ms
latency average = 509.407 ms
latency average = 522.461 ms
Testing with 100000 partitions
latency average = 329.026 ms
latency average = 327.504 ms
latency average = 342.556 ms
Testing with 10000 partitions
latency average = 299.496 ms
latency average = 298.266 ms
latency average = 306.773 ms
Testing with 1000 partitions
latency average = 299.006 ms
latency average = 302.188 ms
latency average = 301.701 ms
Testing with 100 partitions
latency average = 305.411 ms
latency average = 286.935 ms
latency average = 302.432 ms
Testing with 10 partitions
latency average = 288.091 ms
latency average = 294.506 ms
latency average = 305.082 ms
Testing with 1 partitions
latency average = 301.121 ms
latency average = 319.615 ms
latency average = 301.141 ms

Patched

Testing with 1000000 partitions
latency average = 351.683 ms
latency average = 352.516 ms
latency average = 352.086 ms
Testing with 100000 partitions
latency average = 300.626 ms
latency average = 303.584 ms
latency average = 306.959 ms
Testing with 10000 partitions
latency average = 289.560 ms
latency average = 302.248 ms
latency average = 297.423 ms
Testing with 1000 partitions
latency average = 308.600 ms
latency average = 299.215 ms
latency average = 289.681 ms
Testing with 100 partitions
latency average = 301.216 ms
latency average = 286.240 ms
latency average = 291.232 ms
Testing with 10 partitions
latency average = 305.260 ms
latency average = 296.707 ms
latency average = 300.266 ms
Testing with 1 partitions
latency average = 316.199 ms
latency average = 314.043 ms
latency average = 309.425 ms

Now that you are also seeing the slowdown with your earlier patch, I
am wondering whether adding unlikely() by itself is a good
optimization. There might be some other reason behind the perceived
slowdown. How do the numbers look when you just add unlikely() without
any other changes?

--
Best Wishes,
Ashutosh Bapat