Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers

From James Coleman
Subject Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date
Msg-id CAAaqYe_Jqd+eZ2javcn=i6iAJ+1FGp_-kc7OGZgpNOuTwRCQsQ@mail.gmail.com
Whole thread Raw
In response to Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Wed, Apr 29, 2020 at 11:17 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Wed, Apr 29, 2020 at 10:26:12AM -0400, James Coleman wrote:
> >On Tue, Apr 28, 2020 at 7:05 PM Andres Freund <andres@anarazel.de> wrote:
> >>
> >> Hi,
> >>
> >> On 2020-04-28 18:22:20 -0400, James Coleman wrote:
> >> > I cc'd Andres given his commit introduced simplehash, so I figured
> >> > he'd probably have a few pointers on when each one might be useful.
> >> > [...]
> >> > Do you have any thoughts on what the trade-offs/use-cases etc. are for
> >> > dynahash versus simple hash?
> >> >
> >> > From reading the commit message in b30d3ea824c it seems like simple
> >> > hash is faster and optimized for CPU cache benefits. The comments at
> >> > the top of simplehash.h also discourage it's use in non
> >> > performance/space sensitive uses, but there isn't anything I can see
> >> > that explicitly tries to discuss when dynahash is useful, etc.
> >>
> >> Benefits of dynahash (chained hashtable):
> >> - supports partitioning, useful for shared memory accessed under locks
> >> - better performance for large entries, as they don't need to be moved
> >>   around in case of hash conflicts
> >> - stable pointers to hash table entries
> >>
> >> Benefits of simplehash (open addressing hash table):
> >> - no indirect function calls, known structure sizes, due to "templated"
> >>   code generation (these show up substantially in profiles for dynahash)
> >> - considerably faster for small entries due to previous point, and due
> >>   open addressing hash tables having better cache behaviour than chained
> >>   hashtables
> >> - once set-up the interface is type safe and easier to use
> >> - no overhead of a separate memory context etc
> >>
> >>
> >> > Given the performance notes in that commit message, I thinking
> >> > switching to simple hash is worth it.
> >>
> >> Seems plausible to me.
> >>
> >>
> >> > But I also wonder if there might be some value in a README or comments
> >> > addition that would be a guide to what the various hash
> >> > implementations are useful for. If there's interest, I could try to
> >> > type something short up so that we have something to make the code
> >> > base a bit more discoverable.
> >>
> >> That'd make sense to me.
> >
> >Cool, I'll work on that as I have time then.
> >
> >One question: what is the reasoning behind having SH_STORE_HASH? The
> >only things I could imagine would be a case where you have external
> >pointers to some set of values or need to be able to use the hash for
> >other reasons besides the hash table (and so can avoid calculating it
> >twice), but maybe I'm missing something.
> >
>
> I believe it's because computing the hash may be fairly expensive for
> some data types, in which case it may be better to just store it for
> future use.

But is it storing it for use primarily by the hash table
implementation (i.e., does it need the hash stored this way to avoid
repeated recalculation) or for caller's use?

James



pgsql-hackers by date:

Previous
From: Antonin Houska
Date:
Subject: Re: Accidental use of the PVC_RECURSE_WINDOWFUNCS flag?
Next
From: Tomas Vondra
Date:
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays