Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers

From David Rowley
Subject Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date
Msg-id CAApHDvqkXamVm6p=T4+p72pMv6+V+q4B=t2iKmwZhw_9wZOoyg@mail.gmail.com
Whole thread Raw
In response to Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, 24 Apr 2020 at 02:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote:
> >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >As to the choice of 9 elements: I just picked that as a starting
> >point; Andres had previously commented off hand that at 8 elements
> >serial scanning was faster, so I figured this was a reasonable
> >starting point for discussion.
> >
> >Perhaps it would make sense to determine that minimum not as a pure
> >constant but scaled based on how many rows the planner expects us to
> >see? Of course that'd be a more invasive patch...so may or may not be
> >as feasible as a reasonable default.
> >
>
> Not sure. That seems a bit overcomplicated and I don't think it depends
> on the number of rows the planner expects to see very much. I think we
> usually assume the linear search is cheaper for small arrays and then at
> some point the binary search starts winning The question is where this
> "break even" point is.
>
> I think we usually use something like 64 or so in other places, but
> maybe I'm wrong. The current limit 9 seems a bit too low, but I may be
> wrong. Let's not obsess about this too much, let's do some experiments
> and pick a value based on that.

If single comparison for a binary search costs about the same as an
equality check, then wouldn't the crossover point be much lower than
64? The binary search should find or not find the target in log2(N)
rather than N.  ceil(log2(9)) is 4, which is of course less than 9.
For 64, it's 6, so are you not just doing a possible 58 equality
checks than necessary?  Of course, it's a bit more complex as for
values that *are* in the array, the linear search will, on average,
only check half the values. Assuming that, then 9 does not seem too
far off.  I guess benchmarks at various crossover points would speak a
thousand words.

David



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: backup manifests
Next
From: Andres Freund
Date:
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays