Re: dropping datumSort field - Mailing list pgsql-hackers

From David Rowley
Subject Re: dropping datumSort field
Date
Msg-id CAApHDvqikGZYG=xg+4i=TKXQbnkQ6m9JiLwjV-N40Epp0EY6vQ@mail.gmail.com
Whole thread Raw
In response to Re: dropping datumSort field  (Zhihong Yu <zyu@yugabyte.com>)
Responses Re: dropping datumSort field
List pgsql-hackers
On Wed, 10 Aug 2022 at 03:16, Zhihong Yu <zyu@yugabyte.com> wrote:
> On Tue, Aug 9, 2022 at 8:01 AM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> One problem with this patch is that, if I apply it, PostgreSQL does not compile:
>>
>> nodeSort.c:197:6: error: use of undeclared identifier 'tupDesc'
>>         if (tupDesc->natts == 1)
>>             ^
>> 1 error generated.
>>
>> Leaving that aside, I don't really see any advantage in this sort of change.

I'm with Robert on this one. If you look at the discussion for that
commit, you'll find quite a bit of benchmarking was done around this
[1].  The "if" test here is in quite a hot code path, so we want to
ensure whatever is being referenced here causes the least amount of
CPU cache misses.  Volcano executors which process a single row at a
time are not exactly an ideal workload for a modern processor due to
the bad L1i and L1d hit ratios. This becomes worse with more plan
nodes as these caches are more likely to get flushed of useful cache
lines when mode notes are visited. Various other fields in 'node' have
just been accessed in the code leading up to the "if
(node->datumSort)" check, so we're probably not going to encounter any
CPU pipeline stalls waiting for cache lines to be loaded into L1d.  It
seems you're proposing to change this and have offered no evidence of
no performance regressions from doing so.  Going by the compilation
error above, it seems unlikely that you've given performance any
consideration at all.

I mean this in the best way possible; for the future, I really
recommend arriving with ideas that are well researched and tested.
When you can, arrive with evidence to prove your change is good.  For
this case, evidence would be benchmark results.  The problem is if you
were to arrive with patches such as this too often then you'll start
to struggle to get attention from anyone, let alone a committer. You
don't want to build a reputation for bad quality work as it's likely
most committers will steer clear of your work.  If you want a good
recent example of a good proposal, have a look at Yuya Watari's
write-up at [2] and [3]. There was a clear problem statement there and
a patch that was clearly a proof of concept only, so the author was
under no illusion that what he had was ideal.  Now, some other ideas
were suggested on that thread to hopefully simplify the task and
provide even better performance. Yuya went off and did that and
arrived back armed with further benchmark results. I was quite
impressed with this considering he's not posted to -hackers very
often.  Now, if you were a committer, would you be looking at the
patches from the person who sends in half-thought-through ideas, or
patches from someone that has clearly put a great deal of effort into
first clearly stating the problem and then proving that the problem is
solved by the given patch?

David

[1] https://www.postgresql.org/message-id/CAApHDvrWV%3Dv0qKsC9_BHqhCn9TusrNvCaZDz77StCO--fmgbKA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJ2pMkZNCgoUKSE+_5LthD+KbXKvq6h2hQN8Esxpxd+cxmgomg@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAJ2pMkZKFVmPHovyyueBpwb_nYYVk2+GaDqgzxZVnjkvxgtXog@mail.gmail.com



pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: optimize lookups in snapshot [sub]xip arrays
Next
From: Zhihong Yu
Date:
Subject: Re: dropping datumSort field