On Tue, 13 Jul 2021 at 15:15, David Rowley <dgrowleyml@gmail.com> wrote:
> In theory, we likely could get rid of the small regression by having
> two versions of ExecSort() and setting the correct one during
> ExecInitSort() by setting the function pointer to the version we want
> to use in sortstate->ss.ps.ExecProcNode.
Just to see how it would perform, I tried what I mentioned above. I've
included what I ended up with in the attached POC patch.
I got the following results on my AMD hardware.
Test master v8 patch comparison
Test1 448.0 671.7 149.9%
Test2 316.4 317.5 100.3%
Test3 299.5 381.6 127.4%
Test4 219.7 229.5 104.5%
Test5 226.3 254.6 112.5%
Test6 197.9 217.9 110.1%
Test7 179.2 185.3 103.4%
Test8 389.2 544.8 140.0%
I'm a little surprised by your results.
Test1 and Test8 look pretty good to me.
What is compiler and environment?
I repeated (3 times) the benchmark with v8 here,
and the results were not good.
| HEAD | v6 | v7b | v8 | v6 vs head | v8 vs v6 | v8 vs v7b |
Test1 | 288,149636 | 449,018541 | 550,48505 | 468,168165 | 155,83% | 104,26% | 85,05% |
Test2 | 94,766955 | 95,451406 | 94,718982 | 94,800275 | 100,72% | 99,32% | 100,09% |
Test3 | 190,521319 | 260,279802 | 278,115296 | 262,538383 | 136,61% | 100,87% | 94,40% |
Test4 | 78,779344 | 78,253455 | 77,941482 | 78,471546 | 99,33% | 100,28% | 100,68% |
Test5 | 131,362614 | 142,662223 | 149,639041 | 144,849303 | 108,60% | 101,53% | 96,80% |
Test6 | 112,884298 | 124,181671 | 127,58497 | 124,29376 | 110,01% | 100,09% | 97,42% |
Test7 | 69,308587 | 68,643067 | 69,087544 | 69,437312 | 99,04% | 101,16% | 100,51% |
Test8 | 243,674171 | 364,681142 | 419,259703 | 369,239176 | 149,66% | 101,25% | 88,07% |
This time I saw no regression on tests 2, 4 and 7.
I looked to see if there was anywhere else in the executor that
conditionally uses a different exec function in this way and found
nothing, so I'm not too sure if it's a good idea to start doing this.
Specialized functions can be a way to optimize. The compilers themselves do it.
But the ExecSortTuple and ExecSortDatum are much more similar,
which can cause maintenance problems.
I don't think in this case it would be a good idea.
It would be good to get a 2nd opinion about this idea. Also, more
benchmark results with v6 and v8 would be good too.
Yeah, another different machine.
I would like to see other results with v7b.
Attached the file with all results from v8.
regards,
Ranier Vilela