Re: check for null value before looking up the hash function - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: check for null value before looking up the hash function
Date
Msg-id 31ac953f-b0eb-1954-0ada-d4cc7704335f@enterprisedb.com
Whole thread Raw
In response to Re: check for null value before looking up the hash function  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-hackers
On 5/21/22 19:24, David G. Johnston wrote:
> On Sat, May 21, 2022 at 10:04 AM Ranier Vilela <ranier.vf@gmail.com
> <mailto:ranier.vf@gmail.com>> wrote:
> 
>     Em sáb., 21 de mai. de 2022 às 13:13, Tom Lane <tgl@sss.pgh.pa.us
>     <mailto:tgl@sss.pgh.pa.us>> escreveu:
> 
>         Ranier Vilela <ranier.vf@gmail.com <mailto:ranier.vf@gmail.com>>
>         writes:
>         > Em sáb., 21 de mai. de 2022 às 12:05, Tomas Vondra <
>         > tomas.vondra@enterprisedb.com
>         <mailto:tomas.vondra@enterprisedb.com>> escreveu:
>         >> That's a quite bold claim, and yet you haven't supported it
>         by any
>         >> argument whatsoever. Trade-offs between complexity and
>         efficiency are a
>         >> crucial development task, so complicating the logic clearly
>         does matter.
> 
>         > What I meant is that complicating the logic in search of
>         efficiency is
>         > worth it, and that's what everyone is looking for in this thread.
>         > Likewise, not complicating the logic, losing a little bit of
>         efficiency,
>         > applied to all the code, leads to a big loss of efficiency.
>         > In other words, I never miss an opportunity to gain efficiency.
> 
>         [ shrug... ]  You quietly ignored Tomas' main point, which is
>         that no
>         evidence has been provided that there's actually any efficiency
>         gain.
> 
>     IMHO, the point here, is for-non-commiters everything needs benchmarks.
>     But, I have been see many commits withouts benchmarks or any
>     evidence gains.
>     And of course, having benchmarks is better, but for
>     micro-optimizations,
>     It doesn't seem like it's needed that much.
> 
> 
> Mostly because committers don't tend to do this kind of drive-by
> patching that changes long-established code, which are fairly
> categorized as premature optimizations.
> 

FWIW I find the argument that committers are somehow absolved of having
to demonstrate the benefits of a change rather misleading. Perhaps even
offensive, as it hints committers are less demanding/careful when it
comes to their own patches. Which goes directly against my experience
and understanding of what being a committer is.

I'm not going to claim every "optimization" patch submitted by a
committer had a benchmark - some patches certainly are pushed without
it, with just some general reasoning why the change is beneficial.

But I'm sure that when someone suggest the reasoning is wrong, it's
taken seriously - the discussion continues, there's a benchmark etc. And
I can't recall a committer suggesting it's fine because some other patch
didn't have a benchmark either.

> 
> 
>         (1) Sure, in the case where only null values are encountered
>         during a
>         query, we can save a cache lookup, but will that be even
>         micro-measurable
>         compared to general query overhead?  Seems unlikely, especially
>         if this is
>         changed in only one place.  That ties into my complaint about
>         how this is
>         just one instance of a fairly widely used coding pattern.
> 
>     Of course, changing only in one place, the gain is tiny, but, step
>     by step,
>     the coding pattern is changing too, becoming new "fairly widely".
> 
> 
> Agreed, but that isn't what was done here, there was no investigation of
> the overall coding practice and suggestions to change them all to the
> improved form.
> 

Right. If we think the coding pattern is an improvement, we should tweak
all the places, not just one (and hope the other places will magically
switch on their own).

More importantly, though, I kinda doubt tweaking more places will
actually make the difference more significant (assuming it actually does
improve things). How likely is it you need to hash the same data type
multiple times, with just NULL values? And even if you do, improvements
like this tend to sum, not multiply - i.e. if the improvement is 1% for
one place, it's still 1% even if the query hits 10 such places.

> 
>         (2) What are the effects when we *do* eventually encounter a
>         non-null
>         value?  The existing coding will perform all the necessary lookups
>         at first call, but with the proposed change those may be spread
>         across
>         query execution.  It's not implausible that that leads to a net loss
>         of efficiency, due to locality-of-access effects.
> 
>     Weel the current code, test branch for nulls first.
>     Most of the time, this is not true.
>     So, the current code has poor branch prediction, at least.
>     What I proposed, improves the branch prediction, at least.
> 
> 
> Per my other reply, the v3 proposal did not, IMHO, do a good job of
> branch prediction either.
> 
> I find an improvement on code complexity grounds to be warranted, though
> the benefit seems unlikely to outweigh the cost of doing it everywhere
> (fixing only one place actually increases the cost component).
> 
> Even without the plausible locality-of-access argument the benefit here
> is likely to be a micro-optimization that provides only minimal
> benefit.  Evidence to the contrary is welcomed but, yes, the burden is
> going to be placed squarely on the patch author(s) to demonstrate the
> benefit accrued from the code churn is worth the cost.
> 

IMHO questions like this are exactly why some actual benchmark results
would be the best thing to move this forward. I certainly can't look at
C code and say how good it is for branch prediction.


regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: docs: mention "pg_read_all_stats" in "track_activities" description
Next
From: Michael Paquier
Date:
Subject: Re: docs: mention "pg_read_all_stats" in "track_activities" description