Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers

From: James Coleman
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Msg-id: CAAaqYe_q5-VRNkCs3JUEc4iDufUjshqnFK-d2AWz8XCugk3jow@mail.gmail.com
In response to: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
On Fri, Apr 24, 2020 at 5:55 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Apr 24, 2020 at 09:38:54AM -0400, James Coleman wrote:
> >On Thu, Apr 23, 2020 at 10:55 AM Tomas Vondra
> ><tomas.vondra@2ndquadrant.com> wrote:
> >>
> >> On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote:
> >> >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra
> >> ><tomas.vondra@2ndquadrant.com> wrote:
> >> >>
> >> >> On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote:
> >> >> >Over in "execExprInterp() questions / How to improve scalar array op
> >> >> >expr eval?" [1] I'd mused about how we might be able to optimize
> >> >> >scalar array ops with OR'd semantics.
> >> >> >
> >> >> >This patch implements a binary search for such expressions when the
> >> >> >array argument is a constant so that we can avoid needing to teach
> >> >> >expression execution to cache stable values or know when a param has
> >> >> >changed.
> >> >> >
> >> >> >The speed-up for the target case can be pretty impressive: in my
> >> >> >admittedly contrived and relatively unscientific test with a query of
> >> >> >the form:
> >> >> >
> >> >> >select count(*) from generate_series(1,100000) n(i) where i in (<1000
> >> >> >random integers in the series>)
> >> >> >
> >> >> >shows ~30ms for the patch versus ~640ms on master.
> >> >> >
> >> >>
> >> >> Nice improvement, although 1000 items is probably a bit unusual. The
> >> >> threshold used in the patch (9 elements) seems a bit too low - what
> >> >> results have you seen with smaller arrays?
> >> >
> >At least in our systems we regularly work with batches of 1000 items,
> >which means you get IN clauses of identifiers of that size. Admittedly
> >> >the most common case sees those IN clauses as simple index scans
> >> >(e.g., WHERE <primary key> IN (...)), but it's also common to have a
> >> >broader query that merely filters additionally on something like "...
> >> >AND <some foreign key> IN (...)" where it makes sense for the rest of
> >> >the quals to take precedence in generating a reasonable plan. In that
> >> >case, the IN becomes a regular filter, hence the idea behind the
> >> >patch.
> >> >
> >> >Side note: I'd love for us to be able to treat "IN (VALUES)" the same
> >> >way...but as noted in the other thread that's an extremely large
> >> >amount of work, I think. But similarly you could use a hash here
> >> >instead of a binary search...but this seems quite good.
> >> >
> >> >As to the choice of 9 elements: I just picked that as a starting
> >> >point; Andres had previously commented off hand that at 8 elements
> >> >serial scanning was faster, so I figured this was a reasonable
> >> >starting point for discussion.
> >> >
> >> >Perhaps it would make sense to determine that minimum not as a pure
> >> >constant but scaled based on how many rows the planner expects us to
> >> >see? Of course that'd be a more invasive patch...so may or may not be
> >> >as feasible as a reasonable default.
> >> >
> >>
> >> Not sure. That seems a bit overcomplicated and I don't think it depends
> >> on the number of rows the planner expects to see very much. I think we
> >> usually assume the linear search is cheaper for small arrays and then at
> >> some point the binary search starts winning. The question is where this
> >> "break even" point is.
> >
> >Well, since it has to do preprocessing work (expanding the array and
> >then sorting it), the number of rows processed matters, right? For
> >example, doing a linear search on 1000 items only once is going to be
> >cheaper than preprocessing the array and then doing a binary search,
> >but at a very large row count the binary search + preprocessing might
> >very well win out even for a 10 element array.
> >
>
> Hmmm, good point. Essentially the initialization (sorting of the array)
> has some cost, and the question is how much extra per-tuple cost this
> adds. It's probably not worth it for a single lookup, but for many
> lookups it's probably OK. Let's see if I can do the math right:
>
>    N - number of lookups
>    K - number of array elements
>
> Cost to sort the array is
>
>     O(K * log(K)) = C1 * K * log(K)
>
> and the cost of a lookup is C2 * log(K), so with the extra cost amortized
> for N lookups, the total "per lookup" cost is
>
>     C1 * K * log(K) / N + C2 * log(K) = log(K) * (C1 * K / N + C2)
>
> We need to compare this to the O(K) cost of simple linear search, and
> the question is at which point the linear search gets more expensive:
>
>     C3 * K = log(K) * (C1 * K / N + C2)
>
> I think we can assume that C3 is somewhere in between 0.5 and 1, i.e. if
> there's a matching item we find it half-way through on average, and if
> there is not we have to walk the whole array. So let's say it's 1.
>
> C1 and C2 are probably fairly low, I think - C1 is typically ~1.4 for
> random pivot choice IIRC, and C2 is probably similar. With both values
> being ~1.5 we get this:
>
>     K = log(K) * (1.5 * K/N + 1.5)
>
> for a fixed K, we get this formula for N:
>
>     N = log(K) * 1.5 * K / (K - 1.5 * log(K))
>
> and for a bunch of K values the results look like this:
>
>          K |     N
>     -------|-------
>          1 |     0
>         10 |  5.27
>        100 |  7.42
>       1000 | 10.47
>      10000 | 13.83
>     100000 | 17.27
>
> i.e. the binary search with 10k values starts winning over linear search
> with just ~13 lookups.
>
> (Assuming I haven't made some silly mistake in the math ...)
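
Plugging in other constants is easy enough; here's a small standalone
sketch (same assumptions as above: C1 = C2 = 1.5, C3 = 1, natural log)
that reproduces that table:

/*
 * Reproduce the break-even table above: solve
 *     C3 * K = log(K) * (C1 * K / N + C2)
 * for N, with C1 = C2 = 1.5 and C3 = 1.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		ks[] = {10, 100, 1000, 10000, 100000};

	for (int i = 0; i < 5; i++)
	{
		double		K = ks[i];
		double		N = log(K) * 1.5 * K / (K - 1.5 * log(K));

		printf("%8.0f | %6.2f\n", K, N);
	}

	return 0;
}

which prints the same ~5.27 / 7.42 / 10.47 / 13.83 / 17.27 values.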
>
> Obviously, this only accounts for cost of comparisons and neglects e.g.
> the indirect costs for less predictable memory access patterns mentioned
> by Andres in his response.
>
> But I think it still shows the number of lookups needed for the binary
> search to be a win is pretty low - at least for reasonable number of
> values in the array. Maybe it's 20 and not 10, but I don't think that
> changes much.
>
> The other question is if we can get N at all and how reliable the value
> is. We can probably get the number of rows, but that will ignore other
> conditions that may eliminate the row before the binary search.
>
> >I'm not trying to argue for more work for myself here: I think the
> >optimization is worth it on its own, and something like this could be
> >a further improvement on its own. But it is interesting to think
> >about.
> >
>
> I don't know. Clearly, if the user sends a query with 10k values and
> only does a single lookup, that won't win. And if we can reasonably and
> reliably protect against that, I wouldn't mind doing that, although it
> means a risk of not using the bin search in case of underestimates etc.
>
> I don't have any hard data about this, but I think we can assume the
> number of rows processed by the clause is (much) higher than the number
> of keys in it. If you have a clause with 10k values, then you probably
> expect it to be applied to many rows, far more than the "break even"
> point of about 10-20 rows ...
>
> So I wouldn't worry about this too much.

Yeah. I think it becomes a lot more interesting in the future if/when
we end up with a way to use this with params and not just constant
arrays. Then the "group" size would matter a whole lot more.

For now, the constant amount of overhead is quite small, so even if we
only execute it once we won't make the query that much worse (or, at
least, the total query time will still be very small). Also, because
it's only applied to constants, there's a natural limit to how much
overhead we're likely to introduce into a query.
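
For reference, here's a minimal sketch of the shape of the optimization
(not the patch's actual code: plain ints stand in for Datums, and libc
qsort/bsearch stand in for the type's comparison support function):

/*
 * Sort the constant array once when the expression is initialized, then
 * answer each per-tuple "x = ANY (array)" probe with a binary search
 * instead of a linear scan.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
	int			ia = *(const int *) a;
	int			ib = *(const int *) b;

	return (ia > ib) - (ia < ib);
}

int
main(void)
{
	int			arr[] = {42, 7, 19, 3, 88, 61, 25, 14, 99};	/* the IN list */
	size_t		nelems = sizeof(arr) / sizeof(arr[0]);
	int			probes[] = {19, 20, 99};

	/* One-time setup cost: O(K log K) sort of the constant array. */
	qsort(arr, nelems, sizeof(int), cmp_int);

	/* Per-tuple cost: O(log K) lookup instead of an O(K) scan. */
	for (size_t i = 0; i < sizeof(probes) / sizeof(probes[0]); i++)
	{
		bool		found = bsearch(&probes[i], arr, nelems,
									sizeof(int), cmp_int) != NULL;

		printf("%d => %s\n", probes[i], found ? "match" : "no match");
	}

	return 0;
}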

James


