Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers

From James Coleman
Subject Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date
Msg-id CAAaqYe88MaKdnK=Gpn_0VvjeJb9UBuPMvnWzbuTW_rtM5G+G4Q@mail.gmail.com
Whole thread Raw
In response to Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 01/05/2020 05:20, James Coleman wrote:
> > On Tue, Apr 28, 2020 at 8:25 AM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com> wrote:
> > ...
> >> Any particular reasons to pick dynahash over simplehash? ISTM we're
> >> using simplehash elsewhere in the executor (grouping, tidbitmap, ...),
> >> while there are not many places using dynahash for simple short-lived
> >> hash tables. Of course, that alone is a weak reason to insist on using
> >> simplehash here, but I suppose there were reasons for not using dynahash
> >> and we'll end up facing the same issues here.
> >
> > I've attached a patch series that includes switching to simplehash.
> > Obviously we'd really just collapse all of these patches, but it's
> > perhaps interesting to see the changes required to use each style
> > (binary search, dynahash, simplehash).
> >
> > As before, there are clearly comments and naming things to be
> > addressed, but the implementation should be reasonably clean.
>
> Looks good, aside from the cleanup work that you mentioned. There are a
> few more cases that I think you could easily handle with very little
> extra code:
>
> You could also apply the optimization for non-Const expressions, as long
> as they're stable (i.e. !contain_volatile_functions(expr)).

Is that true? Don't we also have to worry about whether or not the
value is stable (i.e., know when a param has changed)? There have been
discussions about being able to cache stable subexpressions, and my
understanding was that we'd need to have that infrastructure (along
with the necessarily invalidation when the param changes) to be able
to use this for non-Const expressions.

> Deconstructing the array Datum into a simple C array on first call would
> be a win even for very small arrays and for AND semantics, even if you
> don't use a hash table.

Because you wouldn't have to repeatedly detoast it? Or some other
reason I'm not thinking of? My intuition would have been that (aside
from detoasting if necessary) there wouldn't be much real overhead in,
for example, an array storing integers.

James



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: More aggressive vacuuming of temporary tables
Next
From: David Zhang
Date:
Subject: Re: PATCH: Attempt to make dbsize a bit more consistent