Thread: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Over in "execExprInterp() questions / How to improve scalar array op expr eval?" [1] I'd mused about how we might be able to optimized scalar array ops with OR'd semantics. This patch implements a binary search for such expressions when the array argument is a constant so that we can avoid needing to teach expression execution to cache stable values or know when a param has changed. The speed-up for the target case can pretty impressive: in my admittedly contrived and relatively unscientific test with a query in the form: select count(*) from generate_series(1,100000) n(i) where i in (<1000 random integers in the series>) shows ~30ms for the patch versus ~640ms on master. James [1]: https://www.postgresql.org/message-id/flat/CAAaqYe-UQBba7sScrucDOyHb7cDoNbWf_rcLrOWeD4ikP3_qTQ%40mail.gmail.com
Attachment
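To make the idea concrete, here is a minimal sketch of the approach: sort the constant array once up front, then binary-search it for every input value. This uses plain C ints and libc qsort/bsearch rather than the patch's actual Datum and comparison-function machinery, and the function names are made up for illustration.

/*
 * Minimal sketch (not the patch itself): sort the constant array once
 * when the expression is initialized, then binary-search it for every
 * input value.
 */
#include <stdbool.h>
#include <stdlib.h>

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* One-time setup, analogous to work done at expression initialization. */
static void
saop_init(int *elems, int nelems)
{
    qsort(elems, nelems, sizeof(int), cmp_int);
}

/* Per-tuple check, analogous to the per-row expression step. */
static bool
saop_match(const int *elems, int nelems, int value)
{
    return bsearch(&value, elems, nelems, sizeof(int), cmp_int) != NULL;
}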
On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: >Over in "execExprInterp() questions / How to improve scalar array op >expr eval?" [1] I'd mused about how we might be able to optimized >scalar array ops with OR'd semantics. > >This patch implements a binary search for such expressions when the >array argument is a constant so that we can avoid needing to teach >expression execution to cache stable values or know when a param has >changed. > >The speed-up for the target case can pretty impressive: in my >admittedly contrived and relatively unscientific test with a query in >the form: > >select count(*) from generate_series(1,100000) n(i) where i in (<1000 >random integers in the series>) > >shows ~30ms for the patch versus ~640ms on master. >

Nice improvement, although 1000 items is probably a bit unusual. The threshold used in the patch (9 elements) seems a bit too low - what results have you seen with smaller arrays?

Another idea - would a bloom filter be useful here, as a second optimization? That is, for large arrays build a small bloom filter, allowing us to skip even the binary search.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: > >Over in "execExprInterp() questions / How to improve scalar array op > >expr eval?" [1] I'd mused about how we might be able to optimized > >scalar array ops with OR'd semantics. > > > >This patch implements a binary search for such expressions when the > >array argument is a constant so that we can avoid needing to teach > >expression execution to cache stable values or know when a param has > >changed. > > > >The speed-up for the target case can pretty impressive: in my > >admittedly contrived and relatively unscientific test with a query in > >the form: > > > >select count(*) from generate_series(1,100000) n(i) where i in (<1000 > >random integers in the series>) > > > >shows ~30ms for the patch versus ~640ms on master. > > > > Nice improvement, although 1000 items is probably a bit unusual. The > threshold used in the patch (9 elements) seems a bit too low - what > results have you seen with smaller arrays? At least in our systems we regularly work with 1000 batches of items, which means you get IN clauses of identifiers of that size. Admittedly the most common case sees those IN clauses as simple index scans (e.g., WHERE <primary key> IN (...)), but it's also common to have a broader query that merely filters additionally on something like "... AND <some foreign key> IN (...)" where it makes sense for the rest of the quals to take precedence in generating a reasonable plan. In that case, the IN becomes a regular filter, hence the idea behind the patch. Side note: I'd love for us to be able to treat "IN (VALUES)" the same way...but as noted in the other thread that's an extremely large amount of work, I think. But similarly you could use a hash here instead of a binary search...but this seems quite good. As to the choice of 9 elements: I just picked that as a starting point; Andres had previously commented off hand that at 8 elements serial scanning was faster, so I figured this was a reasonable starting point for discussion. Perhaps it would make sense to determine that minimum not as a pure constant but scaled based on how many rows the planner expects us to see? Of course that'd be a more invasive patch...so may or may not be as feasible as a reasonable default. > Another idea - would a bloom filter be useful here, as a second > optimization? That is, for large arrays build s small bloom filter, > allowing us to skip even the binary search. That's an interesting idea. I actually haven't personally worked with bloom filters, so didn't think about that. Are you thinking that you'd also build the filter *and* presort the array? Or try to get away with using only the bloom filter and not expanding and sorting the array at all? Thanks, James
On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote: >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: >> >Over in "execExprInterp() questions / How to improve scalar array op >> >expr eval?" [1] I'd mused about how we might be able to optimized >> >scalar array ops with OR'd semantics. >> > >> >This patch implements a binary search for such expressions when the >> >array argument is a constant so that we can avoid needing to teach >> >expression execution to cache stable values or know when a param has >> >changed. >> > >> >The speed-up for the target case can pretty impressive: in my >> >admittedly contrived and relatively unscientific test with a query in >> >the form: >> > >> >select count(*) from generate_series(1,100000) n(i) where i in (<1000 >> >random integers in the series>) >> > >> >shows ~30ms for the patch versus ~640ms on master. >> > >> >> Nice improvement, although 1000 items is probably a bit unusual. The >> threshold used in the patch (9 elements) seems a bit too low - what >> results have you seen with smaller arrays? > >At least in our systems we regularly work with 1000 batches of items, >which means you get IN clauses of identifiers of that size. Admittedly >the most common case sees those IN clauses as simple index scans >(e.g., WHERE <primary key> IN (...)), but it's also common to have a >broader query that merely filters additionally on something like "... >AND <some foreign key> IN (...)" where it makes sense for the rest of >the quals to take precedence in generating a reasonable plan. In that >case, the IN becomes a regular filter, hence the idea behind the >patch. > >Side note: I'd love for us to be able to treat "IN (VALUES)" the same >way...but as noted in the other thread that's an extremely large >amount of work, I think. But similarly you could use a hash here >instead of a binary search...but this seems quite good. > >As to the choice of 9 elements: I just picked that as a starting >point; Andres had previously commented off hand that at 8 elements >serial scanning was faster, so I figured this was a reasonable >starting point for discussion. > >Perhaps it would make sense to determine that minimum not as a pure >constant but scaled based on how many rows the planner expects us to >see? Of course that'd be a more invasive patch...so may or may not be >as feasible as a reasonable default. > Not sure. That seems a bit overcomplicated and I don't think it depends on the number of rows the planner expects to see very much. I think we usually assume the linear search is cheaper for small arrays and then at some point the binary search starts winning The question is where this "break even" point is. I think we usually use something like 64 or so in other places, but maybe I'm wrong. The current limit 9 seems a bit too low, but I may be wrong. Let's not obsess about this too much, let's do some experiments and pick a value based on that. >> Another idea - would a bloom filter be useful here, as a second >> optimization? That is, for large arrays build s small bloom filter, >> allowing us to skip even the binary search. > >That's an interesting idea. I actually haven't personally worked with >bloom filters, so didn't think about that. > >Are you thinking that you'd also build the filter *and* presort the >array? Or try to get away with using only the bloom filter and not >expanding and sorting the array at all? > Yeah, something like that. 
My intuition is that the bloom filter is useful only above some number of items, and that number is higher than for the binary search. So we'd end up with two thresholds, the first one enabling binary search, the second one enabling the bloom filter.

Of course, the "unknown" variable here is how often we actually find the value in the array. If 100% of the queries have a match, then the bloom filter is a waste of time. If there are no matches, it can make a significant difference.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 24 Apr 2020 at 02:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote: > >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >As to the choice of 9 elements: I just picked that as a starting > >point; Andres had previously commented off hand that at 8 elements > >serial scanning was faster, so I figured this was a reasonable > >starting point for discussion. > > > >Perhaps it would make sense to determine that minimum not as a pure > >constant but scaled based on how many rows the planner expects us to > >see? Of course that'd be a more invasive patch...so may or may not be > >as feasible as a reasonable default. > > > > Not sure. That seems a bit overcomplicated and I don't think it depends > on the number of rows the planner expects to see very much. I think we > usually assume the linear search is cheaper for small arrays and then at > some point the binary search starts winning The question is where this > "break even" point is. > > I think we usually use something like 64 or so in other places, but > maybe I'm wrong. The current limit 9 seems a bit too low, but I may be > wrong. Let's not obsess about this too much, let's do some experiments > and pick a value based on that.

If a single comparison in a binary search costs about the same as an equality check, then wouldn't the crossover point be much lower than 64? The binary search should find or not find the target in log2(N) comparisons rather than N. ceil(log2(9)) is 4, which is of course less than 9. For 64, it's 6, so aren't you potentially doing 58 more equality checks than necessary?

Of course, it's a bit more complex, as for values that *are* in the array the linear search will, on average, only check half the values. Assuming that, then 9 does not seem too far off.

I guess benchmarks at various crossover points would speak a thousand words.

David
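To put rough numbers on the comparison-count argument above (and only on comparison counts - the cache effects Andres raises in the next message are ignored here), a throwaway snippet such as the following prints the expected comparisons for linear and binary search. It is purely illustrative and not taken from the patch.

/*
 * Comparison counts only: linear search costs ~K/2 comparisons on a hit
 * and K on a miss; binary search costs ~ceil(log2(K)) either way.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
    int sizes[] = {8, 9, 16, 64, 1000};

    for (int i = 0; i < 5; i++)
    {
        int K = sizes[i];

        printf("K=%4d  linear hit ~%5.1f  linear miss %4d  binary ~%2.0f\n",
               K, K / 2.0, K, ceil(log2(K)));
    }

    return 0;
}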
Hi,

On 2020-04-24 10:09:36 +1200, David Rowley wrote: > If single comparison for a binary search costs about the same as an > equality check, then wouldn't the crossover point be much lower than > 64?

The costs aren't quite as simple as that though. Binary search usually has issues with cache misses: in contrast to linear accesses, each step will be a cache miss, as the address is not predictable; and even if the CPU couldn't predict accesses in the linear search case, often multiple entries fit on a single cache line.

Additionally, out-of-order execution is usually a lot more effective for linear searches (e.g. the next elements can be compared before the current one is finished, if that's what the branch predictor says is likely).

Greetings,

Andres Freund
On Thu, Apr 23, 2020 at 10:55 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote: > >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> > >> On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: > >> >Over in "execExprInterp() questions / How to improve scalar array op > >> >expr eval?" [1] I'd mused about how we might be able to optimized > >> >scalar array ops with OR'd semantics. > >> > > >> >This patch implements a binary search for such expressions when the > >> >array argument is a constant so that we can avoid needing to teach > >> >expression execution to cache stable values or know when a param has > >> >changed. > >> > > >> >The speed-up for the target case can pretty impressive: in my > >> >admittedly contrived and relatively unscientific test with a query in > >> >the form: > >> > > >> >select count(*) from generate_series(1,100000) n(i) where i in (<1000 > >> >random integers in the series>) > >> > > >> >shows ~30ms for the patch versus ~640ms on master. > >> > > >> > >> Nice improvement, although 1000 items is probably a bit unusual. The > >> threshold used in the patch (9 elements) seems a bit too low - what > >> results have you seen with smaller arrays? > > > >At least in our systems we regularly work with 1000 batches of items, > >which means you get IN clauses of identifiers of that size. Admittedly > >the most common case sees those IN clauses as simple index scans > >(e.g., WHERE <primary key> IN (...)), but it's also common to have a > >broader query that merely filters additionally on something like "... > >AND <some foreign key> IN (...)" where it makes sense for the rest of > >the quals to take precedence in generating a reasonable plan. In that > >case, the IN becomes a regular filter, hence the idea behind the > >patch. > > > >Side note: I'd love for us to be able to treat "IN (VALUES)" the same > >way...but as noted in the other thread that's an extremely large > >amount of work, I think. But similarly you could use a hash here > >instead of a binary search...but this seems quite good. > > > >As to the choice of 9 elements: I just picked that as a starting > >point; Andres had previously commented off hand that at 8 elements > >serial scanning was faster, so I figured this was a reasonable > >starting point for discussion. > > > >Perhaps it would make sense to determine that minimum not as a pure > >constant but scaled based on how many rows the planner expects us to > >see? Of course that'd be a more invasive patch...so may or may not be > >as feasible as a reasonable default. > > > > Not sure. That seems a bit overcomplicated and I don't think it depends > on the number of rows the planner expects to see very much. I think we > usually assume the linear search is cheaper for small arrays and then at > some point the binary search starts winning The question is where this > "break even" point is. Well since it has to do preprocessing work (expanding the array and then sorting it), then the number of rows processed matters, right? For example, doing a linear search on 1000 items only once is going to be cheaper than preprocessing the array and then doing a binary search, but only a very large row count the binary search + preprocessing might very well win out for only a 10 element array. 
I'm not trying to argue for more work for myself here: I think the optimization is worth it on its own, and something like this could be a further improvement on its own. But it is interesting to think about. James
On Fri, Apr 24, 2020 at 09:38:54AM -0400, James Coleman wrote: >On Thu, Apr 23, 2020 at 10:55 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote: >> >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra >> ><tomas.vondra@2ndquadrant.com> wrote: >> >> >> >> On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: >> >> >Over in "execExprInterp() questions / How to improve scalar array op >> >> >expr eval?" [1] I'd mused about how we might be able to optimized >> >> >scalar array ops with OR'd semantics. >> >> > >> >> >This patch implements a binary search for such expressions when the >> >> >array argument is a constant so that we can avoid needing to teach >> >> >expression execution to cache stable values or know when a param has >> >> >changed. >> >> > >> >> >The speed-up for the target case can pretty impressive: in my >> >> >admittedly contrived and relatively unscientific test with a query in >> >> >the form: >> >> > >> >> >select count(*) from generate_series(1,100000) n(i) where i in (<1000 >> >> >random integers in the series>) >> >> > >> >> >shows ~30ms for the patch versus ~640ms on master. >> >> > >> >> >> >> Nice improvement, although 1000 items is probably a bit unusual. The >> >> threshold used in the patch (9 elements) seems a bit too low - what >> >> results have you seen with smaller arrays? >> > >> >At least in our systems we regularly work with 1000 batches of items, >> >which means you get IN clauses of identifiers of that size. Admittedly >> >the most common case sees those IN clauses as simple index scans >> >(e.g., WHERE <primary key> IN (...)), but it's also common to have a >> >broader query that merely filters additionally on something like "... >> >AND <some foreign key> IN (...)" where it makes sense for the rest of >> >the quals to take precedence in generating a reasonable plan. In that >> >case, the IN becomes a regular filter, hence the idea behind the >> >patch. >> > >> >Side note: I'd love for us to be able to treat "IN (VALUES)" the same >> >way...but as noted in the other thread that's an extremely large >> >amount of work, I think. But similarly you could use a hash here >> >instead of a binary search...but this seems quite good. >> > >> >As to the choice of 9 elements: I just picked that as a starting >> >point; Andres had previously commented off hand that at 8 elements >> >serial scanning was faster, so I figured this was a reasonable >> >starting point for discussion. >> > >> >Perhaps it would make sense to determine that minimum not as a pure >> >constant but scaled based on how many rows the planner expects us to >> >see? Of course that'd be a more invasive patch...so may or may not be >> >as feasible as a reasonable default. >> > >> >> Not sure. That seems a bit overcomplicated and I don't think it depends >> on the number of rows the planner expects to see very much. I think we >> usually assume the linear search is cheaper for small arrays and then at >> some point the binary search starts winning The question is where this >> "break even" point is. > >Well since it has to do preprocessing work (expanding the array and >then sorting it), then the number of rows processed matters, right? >For example, doing a linear search on 1000 items only once is going to >be cheaper than preprocessing the array and then doing a binary >search, but only a very large row count the binary search + >preprocessing might very well win out for only a 10 element array. > Hmmm, good point. 
Essentially the initialization (sorting of the array) has some cost, and the question is how much extra per-tuple cost this adds. It's probably not worth it for a single lookup, but for many lookups it's probably OK. Let's see if I can do the math right:

N - number of lookups
K - number of array elements

Cost to sort the array is

    O(K * log(K)) = C1 * K * log(K)

and the cost of a lookup is C2 * log(K), so with the extra cost amortized over N lookups, the total "per lookup" cost is

    C1 * K * log(K) / N + C2 * log(K) = log(K) * (C1 * K / N + C2)

We need to compare this to the O(K) cost of a simple linear search, and the question is at which point the linear search gets more expensive:

    C3 * K = log(K) * (C1 * K / N + C2)

I think we can assume that C3 is somewhere between 0.5 and 1, i.e. if there's a matching item we find it half-way through on average, and if there is not we have to walk the whole array. So let's say it's 1.

C1 and C2 are probably fairly low, I think - C1 is typically ~1.4 for random pivot choice IIRC, and C2 is probably similar. With both values being ~1.5 we get this:

    K = log(K) * (1.5 * K/N + 1.5)

For a fixed K, we get this formula for N:

    N = log(K) * 1.5 * K / (K - 1.5 * log(K))

and for a bunch of K values the results look like this:

         K | N
    -------|-------
         1 | 0
        10 | 5.27
       100 | 7.42
      1000 | 10.47
     10000 | 13.83
    100000 | 17.27

i.e. the binary search with 10k values starts winning over linear search with just ~13 lookups.

(Assuming I haven't made some silly mistake in the math ...)

Obviously, this only accounts for the cost of comparisons and neglects e.g. the indirect costs of less predictable memory access patterns mentioned by Andres in his response. But I think it still shows the number of lookups needed for the binary search to be a win is pretty low - at least for a reasonable number of values in the array. Maybe it's 20 and not 10, but I don't think that changes much.

The other question is if we can get N at all and how reliable the value is. We can probably get the number of rows, but that will ignore other conditions that may eliminate the row before the binary search.

>I'm not trying to argue for more work for myself here: I think the >optimization is worth it on its own, and something like this could be >a further improvement on its own. But it is interesting to think >about. >

I don't know. Clearly, if the user sends a query with 10k values and only does a single lookup, that won't win. And if we can reasonably and reliably protect against that, I wouldn't mind doing so, although it means a risk of not using the binary search in case of underestimates etc.

I don't have any hard data about this, but I think we can assume the number of rows processed by the clause is (much) higher than the number of keys in it. If you have a clause with 10k values, then you probably expect it to be applied to many rows, far more than the "break even" point of about 10-20 rows ...

So I wouldn't worry about this too much.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
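The table is easy to re-derive mechanically; this throwaway snippet just evaluates the final formula with the same assumed constants (C1 = C2 = 1.5, C3 = 1) and natural logarithms, which reproduces the numbers above. It is a sanity check of the arithmetic, nothing more.

#include <math.h>
#include <stdio.h>

/* Evaluate N = log(K) * 1.5 * K / (K - 1.5 * log(K)) for a few K values. */
int
main(void)
{
    double ks[] = {10, 100, 1000, 10000, 100000};

    for (int i = 0; i < 5; i++)
    {
        double K = ks[i];
        double N = log(K) * 1.5 * K / (K - 1.5 * log(K));

        printf("%8.0f | %5.2f\n", K, N);
    }

    return 0;
}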
On Thu, Apr 23, 2020 at 04:55:51PM +0200, Tomas Vondra wrote: >On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote: >>On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra >><tomas.vondra@2ndquadrant.com> wrote: >>> >>>On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: >>>>Over in "execExprInterp() questions / How to improve scalar array op >>>>expr eval?" [1] I'd mused about how we might be able to optimized >>>>scalar array ops with OR'd semantics. >>>> >>>>This patch implements a binary search for such expressions when the >>>>array argument is a constant so that we can avoid needing to teach >>>>expression execution to cache stable values or know when a param has >>>>changed. >>>> >>>>The speed-up for the target case can pretty impressive: in my >>>>admittedly contrived and relatively unscientific test with a query in >>>>the form: >>>> >>>>select count(*) from generate_series(1,100000) n(i) where i in (<1000 >>>>random integers in the series>) >>>> >>>>shows ~30ms for the patch versus ~640ms on master. >>>> >>> >>>Nice improvement, although 1000 items is probably a bit unusual. The >>>threshold used in the patch (9 elements) seems a bit too low - what >>>results have you seen with smaller arrays? >> >>At least in our systems we regularly work with 1000 batches of items, >>which means you get IN clauses of identifiers of that size. Admittedly >>the most common case sees those IN clauses as simple index scans >>(e.g., WHERE <primary key> IN (...)), but it's also common to have a >>broader query that merely filters additionally on something like "... >>AND <some foreign key> IN (...)" where it makes sense for the rest of >>the quals to take precedence in generating a reasonable plan. In that >>case, the IN becomes a regular filter, hence the idea behind the >>patch. >> >>Side note: I'd love for us to be able to treat "IN (VALUES)" the same >>way...but as noted in the other thread that's an extremely large >>amount of work, I think. But similarly you could use a hash here >>instead of a binary search...but this seems quite good. >> >>As to the choice of 9 elements: I just picked that as a starting >>point; Andres had previously commented off hand that at 8 elements >>serial scanning was faster, so I figured this was a reasonable >>starting point for discussion. >> >>Perhaps it would make sense to determine that minimum not as a pure >>constant but scaled based on how many rows the planner expects us to >>see? Of course that'd be a more invasive patch...so may or may not be >>as feasible as a reasonable default. >> > >Not sure. That seems a bit overcomplicated and I don't think it depends >on the number of rows the planner expects to see very much. I think we >usually assume the linear search is cheaper for small arrays and then at >some point the binary search starts winning The question is where this >"break even" point is. > >I think we usually use something like 64 or so in other places, but >maybe I'm wrong. The current limit 9 seems a bit too low, but I may be >wrong. Let's not obsess about this too much, let's do some experiments >and pick a value based on that. > > >>>Another idea - would a bloom filter be useful here, as a second >>>optimization? That is, for large arrays build s small bloom filter, >>>allowing us to skip even the binary search. >> >>That's an interesting idea. I actually haven't personally worked with >>bloom filters, so didn't think about that. >> >>Are you thinking that you'd also build the filter *and* presort the >>array? 
Or try to get away with using only the bloom filter and not >>expanding and sorting the array at all? >> > >Yeah, something like that. My intuition is the bloom filter is useful >only above some number of items, and the number is higher than for the >binary search. So we'd end up with two thresholds, first one enabling >binary search, the second one enabling bloom filter. > >Of course, the "unknown" variable here is how often we actually find the >value in the array. If 100% of the queries has a match, then the bloom >filter is a waste of time. If there are no matches, it can make a >significant difference. >

I did experiment with this a bit, both to get more familiar with this code and to see if the bloom filter might help. The short answer is that the bloom filter does not seem to help at all, so I wouldn't bother about it too much.

Attached is an updated patch series, the script I used to collect some performance measurements, and a spreadsheet with results. The patch series is broken into four parts:

0001 - the original patch with binary search
0002 - adds GUCs to enable bin search / tweak threshold
0003 - allows using a bloom filter + binary search
0004 - tries using murmurhash

The test script runs a wide range of queries with different numbers of lookups, keys in the array, and match probability (i.e. fraction of lookups that find a match) ranging from 1% to 100%. And of course, it runs this with the binsearch/bloom either enabled or disabled (the threshold was lowered to 1, so it's the on/off GUCs that determine whether the binsearch/bloom is used).

The results are summarized in the spreadsheet, demonstrating how useless the bloom filter is. There's not a single case where it would beat the binary search. I believe this is because the access to the bloom filter is random (determined by the hash function) and we don't save much compared to the log(K) lookups in the sorted array.

That makes sense - I think bloom filters are meant to be used in cases when the main data doesn't fit into memory, which is not the case here. But I wonder how this would change for cases with more expensive comparisons - this was using just integers, so maybe strings would result in different behavior.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
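For anyone who wants the shape of the 0003 idea without reading the patch: the check looks roughly like the sketch below - test a couple of hash-derived bits first and fall back to the binary search only on a "maybe". The filter size, the hash functions, and the reuse of the earlier saop_match() sketch are all placeholders, not what the attached patch actually does.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024             /* filter size; a placeholder value */

typedef struct BloomFilter
{
    uint8_t     bits[BLOOM_BITS / 8];
} BloomFilter;

/* Two cheap placeholder hash functions (a murmur-style mixer). */
static inline uint32_t
hash1(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6b;
    x ^= x >> 13;
    return x;
}

static inline uint32_t
hash2(uint32_t x)
{
    return hash1(x ^ 0xdeadbeef);
}

static void
bloom_init(BloomFilter *bf)
{
    memset(bf->bits, 0, sizeof(bf->bits));
}

static void
bloom_add(BloomFilter *bf, uint32_t key)
{
    uint32_t    h1 = hash1(key) % BLOOM_BITS;
    uint32_t    h2 = hash2(key) % BLOOM_BITS;

    bf->bits[h1 / 8] |= (uint8_t) (1 << (h1 % 8));
    bf->bits[h2 / 8] |= (uint8_t) (1 << (h2 % 8));
}

static bool
bloom_maybe_contains(const BloomFilter *bf, uint32_t key)
{
    uint32_t    h1 = hash1(key) % BLOOM_BITS;
    uint32_t    h2 = hash2(key) % BLOOM_BITS;

    return (bf->bits[h1 / 8] & (1 << (h1 % 8))) != 0 &&
           (bf->bits[h2 / 8] & (1 << (h2 % 8))) != 0;
}

/* Per-tuple check: bloom filter first, binary search only on a "maybe". */
static bool
saop_match_with_bloom(const BloomFilter *bf,
                      const int *sorted_elems, int nelems, int value)
{
    if (!bloom_maybe_contains(bf, (uint32_t) value))
        return false;               /* definitely not in the array */

    return saop_match(sorted_elems, nelems, value);     /* earlier sketch */
}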
On Fri, Apr 24, 2020 at 5:55 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Fri, Apr 24, 2020 at 09:38:54AM -0400, James Coleman wrote: > >On Thu, Apr 23, 2020 at 10:55 AM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> > >> On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote: > >> >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra > >> ><tomas.vondra@2ndquadrant.com> wrote: > >> >> > >> >> On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: > >> >> >Over in "execExprInterp() questions / How to improve scalar array op > >> >> >expr eval?" [1] I'd mused about how we might be able to optimized > >> >> >scalar array ops with OR'd semantics. > >> >> > > >> >> >This patch implements a binary search for such expressions when the > >> >> >array argument is a constant so that we can avoid needing to teach > >> >> >expression execution to cache stable values or know when a param has > >> >> >changed. > >> >> > > >> >> >The speed-up for the target case can pretty impressive: in my > >> >> >admittedly contrived and relatively unscientific test with a query in > >> >> >the form: > >> >> > > >> >> >select count(*) from generate_series(1,100000) n(i) where i in (<1000 > >> >> >random integers in the series>) > >> >> > > >> >> >shows ~30ms for the patch versus ~640ms on master. > >> >> > > >> >> > >> >> Nice improvement, although 1000 items is probably a bit unusual. The > >> >> threshold used in the patch (9 elements) seems a bit too low - what > >> >> results have you seen with smaller arrays? > >> > > >> >At least in our systems we regularly work with 1000 batches of items, > >> >which means you get IN clauses of identifiers of that size. Admittedly > >> >the most common case sees those IN clauses as simple index scans > >> >(e.g., WHERE <primary key> IN (...)), but it's also common to have a > >> >broader query that merely filters additionally on something like "... > >> >AND <some foreign key> IN (...)" where it makes sense for the rest of > >> >the quals to take precedence in generating a reasonable plan. In that > >> >case, the IN becomes a regular filter, hence the idea behind the > >> >patch. > >> > > >> >Side note: I'd love for us to be able to treat "IN (VALUES)" the same > >> >way...but as noted in the other thread that's an extremely large > >> >amount of work, I think. But similarly you could use a hash here > >> >instead of a binary search...but this seems quite good. > >> > > >> >As to the choice of 9 elements: I just picked that as a starting > >> >point; Andres had previously commented off hand that at 8 elements > >> >serial scanning was faster, so I figured this was a reasonable > >> >starting point for discussion. > >> > > >> >Perhaps it would make sense to determine that minimum not as a pure > >> >constant but scaled based on how many rows the planner expects us to > >> >see? Of course that'd be a more invasive patch...so may or may not be > >> >as feasible as a reasonable default. > >> > > >> > >> Not sure. That seems a bit overcomplicated and I don't think it depends > >> on the number of rows the planner expects to see very much. I think we > >> usually assume the linear search is cheaper for small arrays and then at > >> some point the binary search starts winning The question is where this > >> "break even" point is. > > > >Well since it has to do preprocessing work (expanding the array and > >then sorting it), then the number of rows processed matters, right? 
> >For example, doing a linear search on 1000 items only once is going to > >be cheaper than preprocessing the array and then doing a binary > >search, but only a very large row count the binary search + > >preprocessing might very well win out for only a 10 element array. > > > > Hmmm, good point. Essentially the initialization (sorting of the array) > has some cost, and the question is how much extra per-tuple cost this > adds. It's probably not worth it for a single lookup, but for many > lookups it's probably OK. Let's see if I can do the math right: > > N - number of lookups > K - number of array elements > > Cost to sort the array is > > O(K * log(K)) = C1 * K * log(K) > > and the cost of a lookup is C2 * log(K), so with the extra cost amortized > for N lookups, the total "per lookup" cost is > > C1 * K * log(K) / N + C2 * log(K) = log(K) * (C1 * K / N + C2) > > We need to compare this to the O(K) cost of simple linear search, and > the question is at which point the linear search gets more expensive: > > C3 * K = log(K) * (C1 * K / N + C2) > > I think we can assume that C3 is somewhere in between 0.5 and 1, i.e. if > there's a matching item we find it half-way through on average, and if > there is not we have to walk the whole array. So let's say it's 1. > > C1 and C2 are probably fairly low, I think - C1 is typically ~1.4 for > random pivot choice IIRC, and C2 is probably similar. With both values > being ~1.5 we get this: > > K = log(K) * (1.5 * K/N + 1.5) > > for a fixed K, we get this formula for N: > > N = log(K) * 1.5 * K / (K - 1.5 * log(K)) > > and for a bunch of K values the results look like this: > > K | N > -------|------- > 1 | 0 > 10 | 5.27 > 100 | 7.42 > 1000 | 10.47 > 10000 | 13.83 > 100000 | 17.27 > > i.e. the binary search with 10k values starts winning over linear search > with just ~13 lookups. > > (Assuming I haven't made some silly mistake in the math ...) > > Obviously, this only accounts for cost of comparisons and neglects e.g. > the indirect costs for less predictable memory access patterns mentioned > by Andres in his response. > > But I think it still shows the number of lookups needed for the binary > search to be a win is pretty low - at least for reasonable number of > values in the array. Maybe it's 20 and not 10, but I don't think that > changes much. > > The other question is if we can get N at all and how reliable the value > is. We can probably get the number of rows, but that will ignore other > conditions that may eliminate the row before the binary search. > > >I'm not trying to argue for more work for myself here: I think the > >optimization is worth it on its own, and something like this could be > >a further improvement on its own. But it is interesting to think > >about. > > > > I don't know. Clearly, if the user sends a query with 10k values and > only does a single lookup, that won't win. And if we can reasonably and > reliably protect against that, I wouldn't mind doing that, although it > means a risk of not using the bin search in case of underestimates etc. > > I don't have any hard data about this, but I think we can assume the > number of rows processed by the clause is (much) higher than the number > of keys in it. If you have a clause with 10k values, then you probably > expect it to be applied to many rows, far more than the "beak even" > point of about 10-20 rows ... > > So I wouldn't worry about this too much. Yeah. 
I think it becomes a lot more interesting in the future if/when we end up with a way to use this with params and not just constant arrays. Then the "group" size would matter a whole lot more. For now, the constant amount of overhead is quite small, so even if we only execute it once we won't make the query that much worse (or, at least, the total query time will still be very small). Also, because it's only applied to constants, there's a natural limit to how much overhead we're likely to introduce into a query. James
On Sat, Apr 25, 2020 at 12:21:06AM +0200, Tomas Vondra wrote: >On Thu, Apr 23, 2020 at 04:55:51PM +0200, Tomas Vondra wrote: >>On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote: >>>On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra >>><tomas.vondra@2ndquadrant.com> wrote: >>>> >>>>On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: >>>>>Over in "execExprInterp() questions / How to improve scalar array op >>>>>expr eval?" [1] I'd mused about how we might be able to optimized >>>>>scalar array ops with OR'd semantics. >>>>> >>>>>This patch implements a binary search for such expressions when the >>>>>array argument is a constant so that we can avoid needing to teach >>>>>expression execution to cache stable values or know when a param has >>>>>changed. >>>>> >>>>>The speed-up for the target case can pretty impressive: in my >>>>>admittedly contrived and relatively unscientific test with a query in >>>>>the form: >>>>> >>>>>select count(*) from generate_series(1,100000) n(i) where i in (<1000 >>>>>random integers in the series>) >>>>> >>>>>shows ~30ms for the patch versus ~640ms on master. >>>>> >>>> >>>>Nice improvement, although 1000 items is probably a bit unusual. The >>>>threshold used in the patch (9 elements) seems a bit too low - what >>>>results have you seen with smaller arrays? >>> >>>At least in our systems we regularly work with 1000 batches of items, >>>which means you get IN clauses of identifiers of that size. Admittedly >>>the most common case sees those IN clauses as simple index scans >>>(e.g., WHERE <primary key> IN (...)), but it's also common to have a >>>broader query that merely filters additionally on something like "... >>>AND <some foreign key> IN (...)" where it makes sense for the rest of >>>the quals to take precedence in generating a reasonable plan. In that >>>case, the IN becomes a regular filter, hence the idea behind the >>>patch. >>> >>>Side note: I'd love for us to be able to treat "IN (VALUES)" the same >>>way...but as noted in the other thread that's an extremely large >>>amount of work, I think. But similarly you could use a hash here >>>instead of a binary search...but this seems quite good. >>> >>>As to the choice of 9 elements: I just picked that as a starting >>>point; Andres had previously commented off hand that at 8 elements >>>serial scanning was faster, so I figured this was a reasonable >>>starting point for discussion. >>> >>>Perhaps it would make sense to determine that minimum not as a pure >>>constant but scaled based on how many rows the planner expects us to >>>see? Of course that'd be a more invasive patch...so may or may not be >>>as feasible as a reasonable default. >>> >> >>Not sure. That seems a bit overcomplicated and I don't think it depends >>on the number of rows the planner expects to see very much. I think we >>usually assume the linear search is cheaper for small arrays and then at >>some point the binary search starts winning The question is where this >>"break even" point is. >> >>I think we usually use something like 64 or so in other places, but >>maybe I'm wrong. The current limit 9 seems a bit too low, but I may be >>wrong. Let's not obsess about this too much, let's do some experiments >>and pick a value based on that. >> >> >>>>Another idea - would a bloom filter be useful here, as a second >>>>optimization? That is, for large arrays build s small bloom filter, >>>>allowing us to skip even the binary search. >>> >>>That's an interesting idea. 
I actually haven't personally worked with >>>bloom filters, so didn't think about that. >>> >>>Are you thinking that you'd also build the filter *and* presort the >>>array? Or try to get away with using only the bloom filter and not >>>expanding and sorting the array at all? >>> >> >>Yeah, something like that. My intuition is the bloom filter is useful >>only above some number of items, and the number is higher than for the >>binary search. So we'd end up with two thresholds, first one enabling >>binary search, the second one enabling bloom filter. >> >>Of course, the "unknown" variable here is how often we actually find the >>value in the array. If 100% of the queries has a match, then the bloom >>filter is a waste of time. If there are no matches, it can make a >>significant difference. >> > >I did experiment with this is a bit, both to get a bit more familiar >with this code and to see if the bloom filter might help. The short >answer is the bloom filter does not seem to help at all, so I wouldn't >bother about it too much. > >Attacched is an updated patch series and, script I used to collect some >performance measurements, and a spreadsheet with results. The patch >series is broken into four parts: > > 0001 - the original patch with binary search > 0002 - adds GUCs to enable bin search / tweak threshold > 0003 - allows to use bloom filter + binary search > 0004 - try using murmurhash > >The test script runs a wide range of queries with different number >of lookups, keys in the array, match probability (i.e. fraction of >lookups that find a match) ranging from 1% to 100%. And of course, it >runs this with the binsearch/bloom either enabled or disabled (the >threshold was lowered to 1, so it's the on/off GUCs that determine >whether the binsearch/bloom is used). > >The results are summarized in the spreadsheet, demonstrating how useless >the bloom filter is. There's not a single case where it would beat the >binary search. I believe this is because theaccess to bloom filter is >random (determined by the hash function) and we don't save much compared >to the log(K) lookups in the sorted array. > >That makes sense, I think the bloom filters are meant to be used in >cases when the main data don't fit into memory - which is not the case >here. But I wonder how would this change for cases with more expensive >comparisons - this was using just integers, so maybe strings would >result in different behavior. OK, I tried the same test with text columns (with md5 strings), and the results are about as I predicted - the bloom filter actually makes a difference in this case. Depending on the number of lookups and selectivity (i.e. how many lookups have a match in the array) it can mean additional speedup up to ~5x compared to binary search alone. For the case with 100% selectivity (i.e. all rows have a match) this can't really save any time - it's usually still much faster than master, but it's a bit slower than binary search. So I think this might be worth investigating further, once the simple binary search gets committed. We'll probably need to factor in the cost of the comparison (higher cost -> BF more useful), selectivity of the filter (fewer matches -> BF more useful) and number of lookups. This reminds me our attempts to add bloom filters to hash joins, which I think ran into mostly the same challenge of deciding when the bloom filter can be useful and is worth the extra work. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
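The "when is the filter worth it" decision discussed above could eventually be expressed as a small cost-style heuristic along these lines. Everything here - the function, the constants, the cost model - is invented purely to make the shape of the trade-off concrete; it is not anything from the attached patches.

#include <math.h>
#include <stdbool.h>

/*
 * Illustrative only: rough per-lookup costs for "binary search" versus
 * "bloom filter + binary search", ignoring false positives.  All constants
 * are invented placeholders, not measured values.
 */
static bool
bloom_filter_seems_useful(double nlookups,      /* expected lookups (rows) */
                          double nkeys,         /* array elements */
                          double match_rate,    /* fraction of lookups that match */
                          double cmp_cost)      /* cost of one comparison */
{
    double      probe_cost = 1.0;   /* assumed cost of hashing + bit tests */
    double      bsearch_cost = cmp_cost * log2(nkeys);
    double      plain = bsearch_cost;
    double      filtered = probe_cost + match_rate * bsearch_cost;
    double      build_cost = nkeys; /* roughly one insert per element */

    /* worth it if the per-lookup savings amortize building the filter */
    return nlookups * (plain - filtered) > build_cost;
}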
On Fri, Apr 24, 2020 at 09:22:34PM -0400, James Coleman wrote: >On Fri, Apr 24, 2020 at 5:55 PM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Fri, Apr 24, 2020 at 09:38:54AM -0400, James Coleman wrote: >> >On Thu, Apr 23, 2020 at 10:55 AM Tomas Vondra >> ><tomas.vondra@2ndquadrant.com> wrote: >> >> >> >> On Thu, Apr 23, 2020 at 09:02:26AM -0400, James Coleman wrote: >> >> >On Thu, Apr 23, 2020 at 8:47 AM Tomas Vondra >> >> ><tomas.vondra@2ndquadrant.com> wrote: >> >> >> >> >> >> On Mon, Apr 20, 2020 at 09:27:34PM -0400, James Coleman wrote: >> >> >> >Over in "execExprInterp() questions / How to improve scalar array op >> >> >> >expr eval?" [1] I'd mused about how we might be able to optimized >> >> >> >scalar array ops with OR'd semantics. >> >> >> > >> >> >> >This patch implements a binary search for such expressions when the >> >> >> >array argument is a constant so that we can avoid needing to teach >> >> >> >expression execution to cache stable values or know when a param has >> >> >> >changed. >> >> >> > >> >> >> >The speed-up for the target case can pretty impressive: in my >> >> >> >admittedly contrived and relatively unscientific test with a query in >> >> >> >the form: >> >> >> > >> >> >> >select count(*) from generate_series(1,100000) n(i) where i in (<1000 >> >> >> >random integers in the series>) >> >> >> > >> >> >> >shows ~30ms for the patch versus ~640ms on master. >> >> >> > >> >> >> >> >> >> Nice improvement, although 1000 items is probably a bit unusual. The >> >> >> threshold used in the patch (9 elements) seems a bit too low - what >> >> >> results have you seen with smaller arrays? >> >> > >> >> >At least in our systems we regularly work with 1000 batches of items, >> >> >which means you get IN clauses of identifiers of that size. Admittedly >> >> >the most common case sees those IN clauses as simple index scans >> >> >(e.g., WHERE <primary key> IN (...)), but it's also common to have a >> >> >broader query that merely filters additionally on something like "... >> >> >AND <some foreign key> IN (...)" where it makes sense for the rest of >> >> >the quals to take precedence in generating a reasonable plan. In that >> >> >case, the IN becomes a regular filter, hence the idea behind the >> >> >patch. >> >> > >> >> >Side note: I'd love for us to be able to treat "IN (VALUES)" the same >> >> >way...but as noted in the other thread that's an extremely large >> >> >amount of work, I think. But similarly you could use a hash here >> >> >instead of a binary search...but this seems quite good. >> >> > >> >> >As to the choice of 9 elements: I just picked that as a starting >> >> >point; Andres had previously commented off hand that at 8 elements >> >> >serial scanning was faster, so I figured this was a reasonable >> >> >starting point for discussion. >> >> > >> >> >Perhaps it would make sense to determine that minimum not as a pure >> >> >constant but scaled based on how many rows the planner expects us to >> >> >see? Of course that'd be a more invasive patch...so may or may not be >> >> >as feasible as a reasonable default. >> >> > >> >> >> >> Not sure. That seems a bit overcomplicated and I don't think it depends >> >> on the number of rows the planner expects to see very much. I think we >> >> usually assume the linear search is cheaper for small arrays and then at >> >> some point the binary search starts winning The question is where this >> >> "break even" point is. 
>> > >> >Well since it has to do preprocessing work (expanding the array and >> >then sorting it), then the number of rows processed matters, right? >> >For example, doing a linear search on 1000 items only once is going to >> >be cheaper than preprocessing the array and then doing a binary >> >search, but only a very large row count the binary search + >> >preprocessing might very well win out for only a 10 element array. >> > >> >> Hmmm, good point. Essentially the initialization (sorting of the array) >> has some cost, and the question is how much extra per-tuple cost this >> adds. It's probably not worth it for a single lookup, but for many >> lookups it's probably OK. Let's see if I can do the math right: >> >> N - number of lookups >> K - number of array elements >> >> Cost to sort the array is >> >> O(K * log(K)) = C1 * K * log(K) >> >> and the cost of a lookup is C2 * log(K), so with the extra cost amortized >> for N lookups, the total "per lookup" cost is >> >> C1 * K * log(K) / N + C2 * log(K) = log(K) * (C1 * K / N + C2) >> >> We need to compare this to the O(K) cost of simple linear search, and >> the question is at which point the linear search gets more expensive: >> >> C3 * K = log(K) * (C1 * K / N + C2) >> >> I think we can assume that C3 is somewhere in between 0.5 and 1, i.e. if >> there's a matching item we find it half-way through on average, and if >> there is not we have to walk the whole array. So let's say it's 1. >> >> C1 and C2 are probably fairly low, I think - C1 is typically ~1.4 for >> random pivot choice IIRC, and C2 is probably similar. With both values >> being ~1.5 we get this: >> >> K = log(K) * (1.5 * K/N + 1.5) >> >> for a fixed K, we get this formula for N: >> >> N = log(K) * 1.5 * K / (K - 1.5 * log(K)) >> >> and for a bunch of K values the results look like this: >> >> K | N >> -------|------- >> 1 | 0 >> 10 | 5.27 >> 100 | 7.42 >> 1000 | 10.47 >> 10000 | 13.83 >> 100000 | 17.27 >> >> i.e. the binary search with 10k values starts winning over linear search >> with just ~13 lookups. >> >> (Assuming I haven't made some silly mistake in the math ...) >> >> Obviously, this only accounts for cost of comparisons and neglects e.g. >> the indirect costs for less predictable memory access patterns mentioned >> by Andres in his response. >> >> But I think it still shows the number of lookups needed for the binary >> search to be a win is pretty low - at least for reasonable number of >> values in the array. Maybe it's 20 and not 10, but I don't think that >> changes much. >> >> The other question is if we can get N at all and how reliable the value >> is. We can probably get the number of rows, but that will ignore other >> conditions that may eliminate the row before the binary search. >> >> >I'm not trying to argue for more work for myself here: I think the >> >optimization is worth it on its own, and something like this could be >> >a further improvement on its own. But it is interesting to think >> >about. >> > >> >> I don't know. Clearly, if the user sends a query with 10k values and >> only does a single lookup, that won't win. And if we can reasonably and >> reliably protect against that, I wouldn't mind doing that, although it >> means a risk of not using the bin search in case of underestimates etc. >> >> I don't have any hard data about this, but I think we can assume the >> number of rows processed by the clause is (much) higher than the number >> of keys in it. 
If you have a clause with 10k values, then you probably >> expect it to be applied to many rows, far more than the "beak even" >> point of about 10-20 rows ... >> >> So I wouldn't worry about this too much. > >Yeah. I think it becomes a lot more interesting in the future if/when >we end up with a way to use this with params and not just constant >arrays. Then the "group" size would matter a whole lot more. >

True. That probably changes the calculations quite a bit.

>For now, the constant amount of overhead is quite small, so even if we >only execute it once we won't make the query that much worse (or, at >least, the total query time will still be very small). Also, because >it's only applied to constants, there's a natural limit to how much >overhead we're likely to introduce into a query. >

FWIW the results from the repeated test with both int and text columns that I shared in [1] also have data for smaller numbers of rows. I haven't tried very hard to minimize noise (the original goal was to test the speedup for large numbers of rows and large arrays, where this is not an issue). But I think it still shows that the threshold of ~10 elements is in the right ballpark. We might use a higher value to be a bit more defensive, but it's never going to be perfect for types with both cheap and more expensive comparisons.

One more note - shouldn't this also tweak cost_qual_eval_walker(), which computes the cost for ScalarArrayOpExpr? At the moment it does this:

    /*
     * Estimate that the operator will be applied to about half of the
     * array elements before the answer is determined.
     */

but that's appropriate for linear search.

[1] https://www.postgresql.org/message-id/20200425124024.hsv7z6bia752uymz%40development

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
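A sketch of the corresponding costing tweak (paraphrased, not the actual costsize.c code; per_element_cost and array_length stand in for whatever the real walker computes): once the binary-search path applies, log2(n) comparisons per row plus a one-time sort charge models the cost better than n/2 comparisons per row.

#include <math.h>
#include <stdbool.h>

/*
 * Paraphrased sketch, not actual costsize.c code: with a linear scan we
 * charge about half the array per row; with the sorted-array binary search
 * a log2(n) charge per row is a better model (the one-time sort would be
 * charged as startup cost, not shown here).
 */
static double
saop_per_tuple_cost(double per_element_cost, double array_length,
                    bool uses_binary_search)
{
    if (uses_binary_search && array_length > 1)
        return per_element_cost * ceil(log2(array_length));

    return per_element_cost * array_length * 0.5;
}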
On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > This reminds me our attempts to add bloom filters to hash joins, which I > think ran into mostly the same challenge of deciding when the bloom > filter can be useful and is worth the extra work. Speaking of that, it would be interesting to see how a test where you write the query as IN(VALUES(...)) instead of IN() compares. It would be interesting to know if the planner is able to make a more suitable choice and also to see how all the work over the years to improve Hash Joins compares to the bsearch with and without the bloom filter. David
On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > This reminds me our attempts to add bloom filters to hash joins, which I > > think ran into mostly the same challenge of deciding when the bloom > > filter can be useful and is worth the extra work. > > Speaking of that, it would be interesting to see how a test where you > write the query as IN(VALUES(...)) instead of IN() compares. It would > be interesting to know if the planner is able to make a more suitable > choice and also to see how all the work over the years to improve Hash > Joins compares to the bsearch with and without the bloom filter.

It would be interesting.

It also makes one wonder about optimizing these into hash joins...which I'd thought about over at [1]. I think it'd be a very significant effort though.

James

[1]: https://www.postgresql.org/message-id/CAAaqYe_zVVOURfdPbAhssijw7yV0uKi350gQ%3D_QGDz7R%3DHpGGQ%40mail.gmail.com
On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote: >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> > This reminds me our attempts to add bloom filters to hash joins, which I >> > think ran into mostly the same challenge of deciding when the bloom >> > filter can be useful and is worth the extra work. >> >> Speaking of that, it would be interesting to see how a test where you >> write the query as IN(VALUES(...)) instead of IN() compares. It would >> be interesting to know if the planner is able to make a more suitable >> choice and also to see how all the work over the years to improve Hash >> Joins compares to the bsearch with and without the bloom filter. > >It would be interesting. > >It also makes one wonder about optimizing these into to hash >joins...which I'd thought about over at [1]. I think it'd be a very >significant effort though. >

I modified the script to also do the join version of the query. I can only run it on my laptop at the moment, so the results may be a bit different from those I shared before, but it's interesting, I think.

In most cases it's comparable to the binsearch/bloom approach, and in some cases it actually beats them quite significantly. It seems to depend on how expensive the comparison is - for "int" the comparison is very cheap and there's almost no difference. For "text" the comparison is much more expensive, and there are significant speedups.

For example, for the test with 100k lookups in an array of 10k elements and a 10% match probability, the timings are these:

master:    62362 ms
binsearch:   201 ms
bloom:        65 ms
hashjoin:     36 ms

I do think the explanation is fairly simple - the bloom filter eliminates about 90% of the expensive comparisons, so it's 20ms plus some overhead to build and check the bits. The hash join probably eliminates a lot of the remaining comparisons, because the hash table is sized to have one tuple per bucket.

Note: I also don't claim the PoC has the most efficient bloom filter implementation possible. I'm sure it could be made faster.

Anyway, I'm not sure transforming this to a hash join is worth the effort - I agree that seems quite complex. But perhaps this suggests we should not be doing binary search and instead just build a simple hash table - that seems much simpler, and it'll probably give us about the same benefits.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote: > >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: > >> > >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> > This reminds me our attempts to add bloom filters to hash joins, which I > >> > think ran into mostly the same challenge of deciding when the bloom > >> > filter can be useful and is worth the extra work. > >> > >> Speaking of that, it would be interesting to see how a test where you > >> write the query as IN(VALUES(...)) instead of IN() compares. It would > >> be interesting to know if the planner is able to make a more suitable > >> choice and also to see how all the work over the years to improve Hash > >> Joins compares to the bsearch with and without the bloom filter. > > > >It would be interesting. > > > >It also makes one wonder about optimizing these into to hash > >joins...which I'd thought about over at [1]. I think it'd be a very > >significant effort though. > > > > I modified the script to also do the join version of the query. I can > only run it on my laptop at the moment, so the results may be a bit > different from those I shared before, but it's interesting I think. > > In most cases it's comparable to the binsearch/bloom approach, and in > some cases it actually beats them quite significantly. It seems to > depend on how expensive the comparison is - for "int" the comparison is > very cheap and there's almost no difference. For "text" the comparison > is much more expensive, and there are significant speedups. > > For example the test with 100k lookups in array of 10k elements and 10% > match probability, the timings are these > > master: 62362 ms > binsearch: 201 ms > bloom: 65 ms > hashjoin: 36 ms > > I do think the explanation is fairly simple - the bloom filter > eliminates about 90% of the expensive comparisons, so it's 20ms plus > some overhead to build and check the bits. The hash join probably > eliminates a lot of the remaining comparisons, because the hash table > is sized to have one tuple per bucket. > > Note: I also don't claim the PoC has the most efficient bloom filter > implementation possible. I'm sure it could be made faster. > > Anyway, I'm not sure transforming this to a hash join is worth the > effort - I agree that seems quite complex. But perhaps this suggest we > should not be doing binary search and instead just build a simple hash > table - that seems much simpler, and it'll probably give us about the > same benefits. That's actually what I originally thought about doing, but I chose binary search since it seemed a lot easier to get off the ground. If we instead build a hash is there anything else we need to be concerned about? For example, work mem? I suppose for the binary search we already have to expand the array, so perhaps it's not all that meaningful relative to that... I was looking earlier at what our standard hash implementation was, and it seemed less obvious what was needed to set that up (so binary search seemed a faster proof of concept). If you happen to have any pointers to similar usages I should look at, please let me know. James
On Sun, Apr 26, 2020 at 02:46:19PM -0400, James Coleman wrote: >On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote: >> >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: >> >> >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> > This reminds me our attempts to add bloom filters to hash joins, which I >> >> > think ran into mostly the same challenge of deciding when the bloom >> >> > filter can be useful and is worth the extra work. >> >> >> >> Speaking of that, it would be interesting to see how a test where you >> >> write the query as IN(VALUES(...)) instead of IN() compares. It would >> >> be interesting to know if the planner is able to make a more suitable >> >> choice and also to see how all the work over the years to improve Hash >> >> Joins compares to the bsearch with and without the bloom filter. >> > >> >It would be interesting. >> > >> >It also makes one wonder about optimizing these into to hash >> >joins...which I'd thought about over at [1]. I think it'd be a very >> >significant effort though. >> > >> >> I modified the script to also do the join version of the query. I can >> only run it on my laptop at the moment, so the results may be a bit >> different from those I shared before, but it's interesting I think. >> >> In most cases it's comparable to the binsearch/bloom approach, and in >> some cases it actually beats them quite significantly. It seems to >> depend on how expensive the comparison is - for "int" the comparison is >> very cheap and there's almost no difference. For "text" the comparison >> is much more expensive, and there are significant speedups. >> >> For example the test with 100k lookups in array of 10k elements and 10% >> match probability, the timings are these >> >> master: 62362 ms >> binsearch: 201 ms >> bloom: 65 ms >> hashjoin: 36 ms >> >> I do think the explanation is fairly simple - the bloom filter >> eliminates about 90% of the expensive comparisons, so it's 20ms plus >> some overhead to build and check the bits. The hash join probably >> eliminates a lot of the remaining comparisons, because the hash table >> is sized to have one tuple per bucket. >> >> Note: I also don't claim the PoC has the most efficient bloom filter >> implementation possible. I'm sure it could be made faster. >> >> Anyway, I'm not sure transforming this to a hash join is worth the >> effort - I agree that seems quite complex. But perhaps this suggest we >> should not be doing binary search and instead just build a simple hash >> table - that seems much simpler, and it'll probably give us about the >> same benefits. > >That's actually what I originally thought about doing, but I chose >binary search since it seemed a lot easier to get off the ground. > OK, that makes perfect sense. >If we instead build a hash is there anything else we need to be >concerned about? For example, work mem? I suppose for the binary >search we already have to expand the array, so perhaps it's not all >that meaningful relative to that... > I don't think we need to be particularly concerned about work_mem. We don't care about it now, and it's not clear to me what we could do about it - we already have the array in memory anyway, so it's a bit futile. Furthermore, if we need to care about it, it probably applies to the binary search too. 
>I was looking earlier at what our standard hash implementation was, >and it seemed less obvious what was needed to set that up (so binary >search seemed a faster proof of concept). If you happen to have any >pointers to similar usages I should look at, please let me know. > I think the hash join implementation is far too complicated. It has to care about work_mem, so it implements batching, etc. That's a lot of complexity we don't need here. IMO we could use either the usual dynahash, or maybe even the simpler simplehash. FWIW it'd be good to verify the numbers I shared, i.e. checking that the benchmarks make sense and running it independently. I'm not aware of any issues, but it was done late at night and only ran on my laptop. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Apr 26, 2020 at 4:49 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sun, Apr 26, 2020 at 02:46:19PM -0400, James Coleman wrote: > >On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> > >> On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote: > >> >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: > >> >> > >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> >> > This reminds me our attempts to add bloom filters to hash joins, which I > >> >> > think ran into mostly the same challenge of deciding when the bloom > >> >> > filter can be useful and is worth the extra work. > >> >> > >> >> Speaking of that, it would be interesting to see how a test where you > >> >> write the query as IN(VALUES(...)) instead of IN() compares. It would > >> >> be interesting to know if the planner is able to make a more suitable > >> >> choice and also to see how all the work over the years to improve Hash > >> >> Joins compares to the bsearch with and without the bloom filter. > >> > > >> >It would be interesting. > >> > > >> >It also makes one wonder about optimizing these into to hash > >> >joins...which I'd thought about over at [1]. I think it'd be a very > >> >significant effort though. > >> > > >> > >> I modified the script to also do the join version of the query. I can > >> only run it on my laptop at the moment, so the results may be a bit > >> different from those I shared before, but it's interesting I think. > >> > >> In most cases it's comparable to the binsearch/bloom approach, and in > >> some cases it actually beats them quite significantly. It seems to > >> depend on how expensive the comparison is - for "int" the comparison is > >> very cheap and there's almost no difference. For "text" the comparison > >> is much more expensive, and there are significant speedups. > >> > >> For example the test with 100k lookups in array of 10k elements and 10% > >> match probability, the timings are these > >> > >> master: 62362 ms > >> binsearch: 201 ms > >> bloom: 65 ms > >> hashjoin: 36 ms > >> > >> I do think the explanation is fairly simple - the bloom filter > >> eliminates about 90% of the expensive comparisons, so it's 20ms plus > >> some overhead to build and check the bits. The hash join probably > >> eliminates a lot of the remaining comparisons, because the hash table > >> is sized to have one tuple per bucket. > >> > >> Note: I also don't claim the PoC has the most efficient bloom filter > >> implementation possible. I'm sure it could be made faster. > >> > >> Anyway, I'm not sure transforming this to a hash join is worth the > >> effort - I agree that seems quite complex. But perhaps this suggest we > >> should not be doing binary search and instead just build a simple hash > >> table - that seems much simpler, and it'll probably give us about the > >> same benefits. > > > >That's actually what I originally thought about doing, but I chose > >binary search since it seemed a lot easier to get off the ground. > > > > OK, that makes perfect sense. > > >If we instead build a hash is there anything else we need to be > >concerned about? For example, work mem? I suppose for the binary > >search we already have to expand the array, so perhaps it's not all > >that meaningful relative to that... > > > > I don't think we need to be particularly concerned about work_mem. 
We > don't care about it now, and it's not clear to me what we could do about > it - we already have the array in memory anyway, so it's a bit futile. > Furthermore, if we need to care about it, it probably applies to the > binary search too. > > >I was looking earlier at what our standard hash implementation was, > >and it seemed less obvious what was needed to set that up (so binary > >search seemed a faster proof of concept). If you happen to have any > >pointers to similar usages I should look at, please let me know. > > > > I think the hash join implementation is far too complicated. It has to > care about work_mem, so it implements batching, etc. That's a lot of > complexity we don't need here. IMO we could use either the usual > dynahash, or maybe even the simpler simplehash. > > FWIW it'd be good to verify the numbers I shared, i.e. checking that the > benchmarks makes sense and running it independently. I'm not aware of > any issues but it was done late at night and only ran on my laptop. Some quick calculations (don't have the scripting in a form I can attach yet; using this as an opportunity to hack on a genericized performance testing framework of sorts) suggest your results are correct. I was also testing on my laptop, but I showed 1.) roughly equivalent results for IN (VALUES ...) and IN (<list>) for integers, but when I switch to (short; average 3 characters long) text values I show the hash join on VALUES is twice as fast as the binary search. Given that, I'm planning to implement this as a hash lookup and share a revised patch. James
On Sun, Apr 26, 2020 at 7:41 PM James Coleman <jtc331@gmail.com> wrote:
>
> On Sun, Apr 26, 2020 at 4:49 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Sun, Apr 26, 2020 at 02:46:19PM -0400, James Coleman wrote:
> > >On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra
> > ><tomas.vondra@2ndquadrant.com> wrote:
> > >>
> > >> On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote:
> > >> >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > >> >>
> > >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > >> >> > This reminds me our attempts to add bloom filters to hash joins, which I
> > >> >> > think ran into mostly the same challenge of deciding when the bloom
> > >> >> > filter can be useful and is worth the extra work.
> > >> >>
> > >> >> Speaking of that, it would be interesting to see how a test where you
> > >> >> write the query as IN(VALUES(...)) instead of IN() compares. It would
> > >> >> be interesting to know if the planner is able to make a more suitable
> > >> >> choice and also to see how all the work over the years to improve Hash
> > >> >> Joins compares to the bsearch with and without the bloom filter.
> > >> >
> > >> >It would be interesting.
> > >> >
> > >> >It also makes one wonder about optimizing these into to hash
> > >> >joins...which I'd thought about over at [1]. I think it'd be a very
> > >> >significant effort though.
> > >> >
> > >>
> > >> I modified the script to also do the join version of the query. I can
> > >> only run it on my laptop at the moment, so the results may be a bit
> > >> different from those I shared before, but it's interesting I think.
> > >>
> > >> In most cases it's comparable to the binsearch/bloom approach, and in
> > >> some cases it actually beats them quite significantly. It seems to
> > >> depend on how expensive the comparison is - for "int" the comparison is
> > >> very cheap and there's almost no difference. For "text" the comparison
> > >> is much more expensive, and there are significant speedups.
> > >>
> > >> For example the test with 100k lookups in array of 10k elements and 10%
> > >> match probability, the timings are these
> > >>
> > >> master: 62362 ms
> > >> binsearch: 201 ms
> > >> bloom: 65 ms
> > >> hashjoin: 36 ms
> > >>
> > >> I do think the explanation is fairly simple - the bloom filter
> > >> eliminates about 90% of the expensive comparisons, so it's 20ms plus
> > >> some overhead to build and check the bits. The hash join probably
> > >> eliminates a lot of the remaining comparisons, because the hash table
> > >> is sized to have one tuple per bucket.
> > >>
> > >> Note: I also don't claim the PoC has the most efficient bloom filter
> > >> implementation possible. I'm sure it could be made faster.
> > >>
> > >> Anyway, I'm not sure transforming this to a hash join is worth the
> > >> effort - I agree that seems quite complex. But perhaps this suggest we
> > >> should not be doing binary search and instead just build a simple hash
> > >> table - that seems much simpler, and it'll probably give us about the
> > >> same benefits.
> > >
> > >That's actually what I originally thought about doing, but I chose
> > >binary search since it seemed a lot easier to get off the ground.
> > >
> >
> > OK, that makes perfect sense.
> >
> > >If we instead build a hash is there anything else we need to be
> > >concerned about? For example, work mem? I suppose for the binary
> > >search we already have to expand the array, so perhaps it's not all
> > >that meaningful relative to that...
> > >
> >
> > I don't think we need to be particularly concerned about work_mem. We
> > don't care about it now, and it's not clear to me what we could do about
> > it - we already have the array in memory anyway, so it's a bit futile.
> > Furthermore, if we need to care about it, it probably applies to the
> > binary search too.
> >
> > >I was looking earlier at what our standard hash implementation was,
> > >and it seemed less obvious what was needed to set that up (so binary
> > >search seemed a faster proof of concept). If you happen to have any
> > >pointers to similar usages I should look at, please let me know.
> > >
> >
> > I think the hash join implementation is far too complicated. It has to
> > care about work_mem, so it implements batching, etc. That's a lot of
> > complexity we don't need here. IMO we could use either the usual
> > dynahash, or maybe even the simpler simplehash.
> >
> > FWIW it'd be good to verify the numbers I shared, i.e. checking that the
> > benchmarks makes sense and running it independently. I'm not aware of
> > any issues but it was done late at night and only ran on my laptop.
>
> Some quick calculations (don't have the scripting in a form I can
> attach yet; using this as an opportunity to hack on a genericized
> performance testing framework of sorts) suggest your results are
> correct. I was also testing on my laptop, but I showed 1.) roughly
> equivalent results for IN (VALUES ...) and IN (<list>) for integers,
> but when I switch to (short; average 3 characters long) text values I
> show the hash join on VALUES is twice as fast as the binary search.
>
> Given that, I'm planning to implement this as a hash lookup and share
> a revised patch.
While working on this I noticed that dynahash.c line 499 has this assertion:
Assert(info->entrysize >= info->keysize);
Do you by any chance know why the entry would need to be larger than the key? In this case I'm really treating the hash like a set (if there's a hash set implementation that doesn't store a value, then I'd be happy to use that instead), so I've configured the entry as sizeof(bool), which is obviously smaller than the key.
If it helps, that line was added by Tom in fba2a104c6d.
Thanks,
James
On Mon, 27 Apr 2020 at 15:12, James Coleman <jtc331@gmail.com> wrote: > While working on this I noticed that dynahash.c line 499 has this assertion: > > Assert(info->entrysize >= info->keysize); > > Do you by any chance know why the entry would need to be larger than the key? Larger or equal. They'd be equal if the key was the data, since you do need to store at least the key. Looking at the code for examples where dynahash is used in that situation, I see _hash_finish_split(). David
On Sun, Apr 26, 2020 at 11:44 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Mon, 27 Apr 2020 at 15:12, James Coleman <jtc331@gmail.com> wrote: > > While working on this I noticed that dynahash.c line 499 has this assertion: > > > > Assert(info->entrysize >= info->keysize); > > > > Do you by any chance know why the entry would need to be larger than the key? > > Larger or equal. They'd be equal if you the key was the data, since > you do need to store at least the key. Looking at the code for > examples where dynahash is used in that situation, I see > _hash_finish_split(). Ah, I was thinking of it as key and value being separate sizes added together rather than one including the other. Thanks, James
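To make the set-style dynahash usage concrete, here is a hedged sketch (not the patch itself) in which the entry is simply the key, which satisfies the entrysize >= keysize assertion. It assumes a by-value element type, since HASH_BLOBS hashes and compares the raw Datum bytes; by-reference types such as text would instead need the element type's own hash and match support functions. The variable and function names are placeholders:

/* Sketch only: a dynahash table used as a set of by-value Datums. */
#include "postgres.h"
#include "utils/hsearch.h"

static HTAB *
build_saop_element_set(Datum *elem_values, int nitems)
{
    HASHCTL     ctl;
    HTAB       *elements;
    bool        found;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Datum);
    ctl.entrysize = sizeof(Datum);  /* entry == key, i.e. set semantics */
    ctl.hcxt = CurrentMemoryContext;

    elements = hash_create("ScalarArrayOp elements", nitems, &ctl,
                           HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

    for (int i = 0; i < nitems; i++)
        hash_search(elements, &elem_values[i], HASH_ENTER, &found);

    return elements;
}

/* the per-row probe: found says whether the scalar matched a constant */
static bool
saop_element_set_contains(HTAB *elements, Datum value)
{
    bool        found;

    (void) hash_search(elements, &value, HASH_FIND, &found);
    return found;
}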
On Sun, Apr 26, 2020 at 7:41 PM James Coleman <jtc331@gmail.com> wrote: > > On Sun, Apr 26, 2020 at 4:49 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Sun, Apr 26, 2020 at 02:46:19PM -0400, James Coleman wrote: > > >On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra > > ><tomas.vondra@2ndquadrant.com> wrote: > > >> > > >> On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote: > > >> >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: > > >> >> > > >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > >> >> > This reminds me our attempts to add bloom filters to hash joins, which I > > >> >> > think ran into mostly the same challenge of deciding when the bloom > > >> >> > filter can be useful and is worth the extra work. > > >> >> > > >> >> Speaking of that, it would be interesting to see how a test where you > > >> >> write the query as IN(VALUES(...)) instead of IN() compares. It would > > >> >> be interesting to know if the planner is able to make a more suitable > > >> >> choice and also to see how all the work over the years to improve Hash > > >> >> Joins compares to the bsearch with and without the bloom filter. > > >> > > > >> >It would be interesting. > > >> > > > >> >It also makes one wonder about optimizing these into to hash > > >> >joins...which I'd thought about over at [1]. I think it'd be a very > > >> >significant effort though. > > >> > > > >> > > >> I modified the script to also do the join version of the query. I can > > >> only run it on my laptop at the moment, so the results may be a bit > > >> different from those I shared before, but it's interesting I think. > > >> > > >> In most cases it's comparable to the binsearch/bloom approach, and in > > >> some cases it actually beats them quite significantly. It seems to > > >> depend on how expensive the comparison is - for "int" the comparison is > > >> very cheap and there's almost no difference. For "text" the comparison > > >> is much more expensive, and there are significant speedups. > > >> > > >> For example the test with 100k lookups in array of 10k elements and 10% > > >> match probability, the timings are these > > >> > > >> master: 62362 ms > > >> binsearch: 201 ms > > >> bloom: 65 ms > > >> hashjoin: 36 ms > > >> > > >> I do think the explanation is fairly simple - the bloom filter > > >> eliminates about 90% of the expensive comparisons, so it's 20ms plus > > >> some overhead to build and check the bits. The hash join probably > > >> eliminates a lot of the remaining comparisons, because the hash table > > >> is sized to have one tuple per bucket. > > >> > > >> Note: I also don't claim the PoC has the most efficient bloom filter > > >> implementation possible. I'm sure it could be made faster. > > >> > > >> Anyway, I'm not sure transforming this to a hash join is worth the > > >> effort - I agree that seems quite complex. But perhaps this suggest we > > >> should not be doing binary search and instead just build a simple hash > > >> table - that seems much simpler, and it'll probably give us about the > > >> same benefits. > > > > > >That's actually what I originally thought about doing, but I chose > > >binary search since it seemed a lot easier to get off the ground. > > > > > > > OK, that makes perfect sense. > > > > >If we instead build a hash is there anything else we need to be > > >concerned about? For example, work mem? 
I suppose for the binary > > >search we already have to expand the array, so perhaps it's not all > > >that meaningful relative to that... > > > > > > > I don't think we need to be particularly concerned about work_mem. We > > don't care about it now, and it's not clear to me what we could do about > > it - we already have the array in memory anyway, so it's a bit futile. > > Furthermore, if we need to care about it, it probably applies to the > > binary search too. > > > > >I was looking earlier at what our standard hash implementation was, > > >and it seemed less obvious what was needed to set that up (so binary > > >search seemed a faster proof of concept). If you happen to have any > > >pointers to similar usages I should look at, please let me know. > > > > > > > I think the hash join implementation is far too complicated. It has to > > care about work_mem, so it implements batching, etc. That's a lot of > > complexity we don't need here. IMO we could use either the usual > > dynahash, or maybe even the simpler simplehash. > > > > FWIW it'd be good to verify the numbers I shared, i.e. checking that the > > benchmarks makes sense and running it independently. I'm not aware of > > any issues but it was done late at night and only ran on my laptop. > > Some quick calculations (don't have the scripting in a form I can > attach yet; using this as an opportunity to hack on a genericized > performance testing framework of sorts) suggest your results are > correct. I was also testing on my laptop, but I showed 1.) roughly > equivalent results for IN (VALUES ...) and IN (<list>) for integers, > but when I switch to (short; average 3 characters long) text values I > show the hash join on VALUES is twice as fast as the binary search. > > Given that, I'm planning to implement this as a hash lookup and share > a revised patch. I've attached a patch series as before, but with an additional patch that switches to using dynahash instead of binary search. Whereas before the benchmarking ended up with a trimodal distribution (i.e., master with IN <list>, patch with IN <list>, and either with IN VALUES), the hash implementation brings us back to an effectively bimodal distribution -- though the hash scalar array op expression implementation for text is about 5% faster than the hash join. Current outstanding thoughts (besides comment/naming cleanup): - The saop costing needs to be updated to match, as Tomas pointed out. - Should we be concerned about single execution cases? For example, is the regression of speed on a simple SELECT x IN something we should try to defeat by only kicking in the optimization if we execute in a loop at least twice? That might be of particular interest to pl/pgsql. - Should we have a test for an operator with a non-strict function? I'm not aware of any built-in ops that have that characteristic; would you suggest just creating a fake one for the test? James
Attachment
On Mon, Apr 27, 2020 at 09:04:09PM -0400, James Coleman wrote: >On Sun, Apr 26, 2020 at 7:41 PM James Coleman <jtc331@gmail.com> wrote: >> >> On Sun, Apr 26, 2020 at 4:49 PM Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >> > >> > On Sun, Apr 26, 2020 at 02:46:19PM -0400, James Coleman wrote: >> > >On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra >> > ><tomas.vondra@2ndquadrant.com> wrote: >> > >> >> > >> On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote: >> > >> >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: >> > >> >> >> > >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> > >> >> > This reminds me our attempts to add bloom filters to hash joins, which I >> > >> >> > think ran into mostly the same challenge of deciding when the bloom >> > >> >> > filter can be useful and is worth the extra work. >> > >> >> >> > >> >> Speaking of that, it would be interesting to see how a test where you >> > >> >> write the query as IN(VALUES(...)) instead of IN() compares. It would >> > >> >> be interesting to know if the planner is able to make a more suitable >> > >> >> choice and also to see how all the work over the years to improve Hash >> > >> >> Joins compares to the bsearch with and without the bloom filter. >> > >> > >> > >> >It would be interesting. >> > >> > >> > >> >It also makes one wonder about optimizing these into to hash >> > >> >joins...which I'd thought about over at [1]. I think it'd be a very >> > >> >significant effort though. >> > >> > >> > >> >> > >> I modified the script to also do the join version of the query. I can >> > >> only run it on my laptop at the moment, so the results may be a bit >> > >> different from those I shared before, but it's interesting I think. >> > >> >> > >> In most cases it's comparable to the binsearch/bloom approach, and in >> > >> some cases it actually beats them quite significantly. It seems to >> > >> depend on how expensive the comparison is - for "int" the comparison is >> > >> very cheap and there's almost no difference. For "text" the comparison >> > >> is much more expensive, and there are significant speedups. >> > >> >> > >> For example the test with 100k lookups in array of 10k elements and 10% >> > >> match probability, the timings are these >> > >> >> > >> master: 62362 ms >> > >> binsearch: 201 ms >> > >> bloom: 65 ms >> > >> hashjoin: 36 ms >> > >> >> > >> I do think the explanation is fairly simple - the bloom filter >> > >> eliminates about 90% of the expensive comparisons, so it's 20ms plus >> > >> some overhead to build and check the bits. The hash join probably >> > >> eliminates a lot of the remaining comparisons, because the hash table >> > >> is sized to have one tuple per bucket. >> > >> >> > >> Note: I also don't claim the PoC has the most efficient bloom filter >> > >> implementation possible. I'm sure it could be made faster. >> > >> >> > >> Anyway, I'm not sure transforming this to a hash join is worth the >> > >> effort - I agree that seems quite complex. But perhaps this suggest we >> > >> should not be doing binary search and instead just build a simple hash >> > >> table - that seems much simpler, and it'll probably give us about the >> > >> same benefits. >> > > >> > >That's actually what I originally thought about doing, but I chose >> > >binary search since it seemed a lot easier to get off the ground. >> > > >> > >> > OK, that makes perfect sense. 
>> > >> > >If we instead build a hash is there anything else we need to be >> > >concerned about? For example, work mem? I suppose for the binary >> > >search we already have to expand the array, so perhaps it's not all >> > >that meaningful relative to that... >> > > >> > >> > I don't think we need to be particularly concerned about work_mem. We >> > don't care about it now, and it's not clear to me what we could do about >> > it - we already have the array in memory anyway, so it's a bit futile. >> > Furthermore, if we need to care about it, it probably applies to the >> > binary search too. >> > >> > >I was looking earlier at what our standard hash implementation was, >> > >and it seemed less obvious what was needed to set that up (so binary >> > >search seemed a faster proof of concept). If you happen to have any >> > >pointers to similar usages I should look at, please let me know. >> > > >> > >> > I think the hash join implementation is far too complicated. It has to >> > care about work_mem, so it implements batching, etc. That's a lot of >> > complexity we don't need here. IMO we could use either the usual >> > dynahash, or maybe even the simpler simplehash. >> > >> > FWIW it'd be good to verify the numbers I shared, i.e. checking that the >> > benchmarks makes sense and running it independently. I'm not aware of >> > any issues but it was done late at night and only ran on my laptop. >> >> Some quick calculations (don't have the scripting in a form I can >> attach yet; using this as an opportunity to hack on a genericized >> performance testing framework of sorts) suggest your results are >> correct. I was also testing on my laptop, but I showed 1.) roughly >> equivalent results for IN (VALUES ...) and IN (<list>) for integers, >> but when I switch to (short; average 3 characters long) text values I >> show the hash join on VALUES is twice as fast as the binary search. >> >> Given that, I'm planning to implement this as a hash lookup and share >> a revised patch. > >I've attached a patch series as before, but with an additional patch >that switches to using dynahash instead of binary search. > OK. I can't take closer look at the moment, I'll check later. Any particular reasons to pick dynahash over simplehash? ISTM we're using simplehash elsewhere in the executor (grouping, tidbitmap, ...), while there are not many places using dynahash for simple short-lived hash tables. Of course, that alone is a weak reason to insist on using simplehash here, but I suppose there were reasons for not using dynahash and we'll end up facing the same issues here. >Whereas before the benchmarking ended up with a trimodal distribution >(i.e., master with IN <list>, patch with IN <list>, and either with IN >VALUES), the hash implementation brings us back to an effectively >bimodal distribution -- though the hash scalar array op expression >implementation for text is about 5% faster than the hash join. > Nice. I'm not surprised this is a bit faster than hash join, which has to worry about additional stuff. >Current outstanding thoughts (besides comment/naming cleanup): > >- The saop costing needs to be updated to match, as Tomas pointed out. > >- Should we be concerned about single execution cases? For example, is >the regression of speed on a simple SELECT x IN something we should >try to defeat by only kicking in the optimization if we execute in a >loop at least twice? That might be of particular interest to pl/pgsql. > I don't follow. How is this related to the number of executions and/or plpgsql? 
I suspect you might be talking about prepared statements, but surely the hash table is built for each execution anyway, even if the plan is reused, right? I think the issue we've been talking about is considering the number of lookups we expect to do in the array/hash table. But that has nothing to do with plpgsql and/or multiple executions ... >- Should we have a test for an operator with a non-strict function? >I'm not aware of any built-in ops that have that characteristic; would >you suggest just creating a fake one for the test? > Dunno, I haven't thought about this very much. In general I think the new code should simply behave the same as the current code, i.e. if it does not check for strictness, we don't need either. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
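For comparison, a sketch of what the simplehash route could look like. The saophash prefix, the entry struct, and the hash helper below are assumptions modeled on existing simplehash users (e.g. tidbitmap.c), not code from the patch; simplehash is generated from template macros, and the element struct has to carry the status byte the implementation manages:

/* Sketch only: a simplehash-generated set over uint32 keys. */
#include "postgres.h"
#include "common/hashfn.h"      /* murmurhash32; utils/hashutils.h on older branches */

typedef struct SaopSetEntry
{
    uint32      key;
    char        status;         /* required by simplehash.h */
} SaopSetEntry;

#define SH_PREFIX saophash
#define SH_ELEMENT_TYPE SaopSetEntry
#define SH_KEY_TYPE uint32
#define SH_KEY key
#define SH_HASH_KEY(tb, k) murmurhash32(k)
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

static bool
saop_set_probe(const uint32 *values, int nitems, uint32 probe)
{
    saophash_hash *tab = saophash_create(CurrentMemoryContext, nitems, NULL);
    bool        found;

    for (int i = 0; i < nitems; i++)
        saophash_insert(tab, values[i], &found);

    return saophash_lookup(tab, probe) != NULL;
}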
On Tue, Apr 28, 2020 at 8:25 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Apr 27, 2020 at 09:04:09PM -0400, James Coleman wrote: > >On Sun, Apr 26, 2020 at 7:41 PM James Coleman <jtc331@gmail.com> wrote: > >> > >> On Sun, Apr 26, 2020 at 4:49 PM Tomas Vondra > >> <tomas.vondra@2ndquadrant.com> wrote: > >> > > >> > On Sun, Apr 26, 2020 at 02:46:19PM -0400, James Coleman wrote: > >> > >On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra > >> > ><tomas.vondra@2ndquadrant.com> wrote: > >> > >> > >> > >> On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote: > >> > >> >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: > >> > >> >> > >> > >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > >> > >> >> > This reminds me our attempts to add bloom filters to hash joins, which I > >> > >> >> > think ran into mostly the same challenge of deciding when the bloom > >> > >> >> > filter can be useful and is worth the extra work. > >> > >> >> > >> > >> >> Speaking of that, it would be interesting to see how a test where you > >> > >> >> write the query as IN(VALUES(...)) instead of IN() compares. It would > >> > >> >> be interesting to know if the planner is able to make a more suitable > >> > >> >> choice and also to see how all the work over the years to improve Hash > >> > >> >> Joins compares to the bsearch with and without the bloom filter. > >> > >> > > >> > >> >It would be interesting. > >> > >> > > >> > >> >It also makes one wonder about optimizing these into to hash > >> > >> >joins...which I'd thought about over at [1]. I think it'd be a very > >> > >> >significant effort though. > >> > >> > > >> > >> > >> > >> I modified the script to also do the join version of the query. I can > >> > >> only run it on my laptop at the moment, so the results may be a bit > >> > >> different from those I shared before, but it's interesting I think. > >> > >> > >> > >> In most cases it's comparable to the binsearch/bloom approach, and in > >> > >> some cases it actually beats them quite significantly. It seems to > >> > >> depend on how expensive the comparison is - for "int" the comparison is > >> > >> very cheap and there's almost no difference. For "text" the comparison > >> > >> is much more expensive, and there are significant speedups. > >> > >> > >> > >> For example the test with 100k lookups in array of 10k elements and 10% > >> > >> match probability, the timings are these > >> > >> > >> > >> master: 62362 ms > >> > >> binsearch: 201 ms > >> > >> bloom: 65 ms > >> > >> hashjoin: 36 ms > >> > >> > >> > >> I do think the explanation is fairly simple - the bloom filter > >> > >> eliminates about 90% of the expensive comparisons, so it's 20ms plus > >> > >> some overhead to build and check the bits. The hash join probably > >> > >> eliminates a lot of the remaining comparisons, because the hash table > >> > >> is sized to have one tuple per bucket. > >> > >> > >> > >> Note: I also don't claim the PoC has the most efficient bloom filter > >> > >> implementation possible. I'm sure it could be made faster. > >> > >> > >> > >> Anyway, I'm not sure transforming this to a hash join is worth the > >> > >> effort - I agree that seems quite complex. But perhaps this suggest we > >> > >> should not be doing binary search and instead just build a simple hash > >> > >> table - that seems much simpler, and it'll probably give us about the > >> > >> same benefits. 
> >> > > > >> > >That's actually what I originally thought about doing, but I chose > >> > >binary search since it seemed a lot easier to get off the ground. > >> > > > >> > > >> > OK, that makes perfect sense. > >> > > >> > >If we instead build a hash is there anything else we need to be > >> > >concerned about? For example, work mem? I suppose for the binary > >> > >search we already have to expand the array, so perhaps it's not all > >> > >that meaningful relative to that... > >> > > > >> > > >> > I don't think we need to be particularly concerned about work_mem. We > >> > don't care about it now, and it's not clear to me what we could do about > >> > it - we already have the array in memory anyway, so it's a bit futile. > >> > Furthermore, if we need to care about it, it probably applies to the > >> > binary search too. > >> > > >> > >I was looking earlier at what our standard hash implementation was, > >> > >and it seemed less obvious what was needed to set that up (so binary > >> > >search seemed a faster proof of concept). If you happen to have any > >> > >pointers to similar usages I should look at, please let me know. > >> > > > >> > > >> > I think the hash join implementation is far too complicated. It has to > >> > care about work_mem, so it implements batching, etc. That's a lot of > >> > complexity we don't need here. IMO we could use either the usual > >> > dynahash, or maybe even the simpler simplehash. > >> > > >> > FWIW it'd be good to verify the numbers I shared, i.e. checking that the > >> > benchmarks makes sense and running it independently. I'm not aware of > >> > any issues but it was done late at night and only ran on my laptop. > >> > >> Some quick calculations (don't have the scripting in a form I can > >> attach yet; using this as an opportunity to hack on a genericized > >> performance testing framework of sorts) suggest your results are > >> correct. I was also testing on my laptop, but I showed 1.) roughly > >> equivalent results for IN (VALUES ...) and IN (<list>) for integers, > >> but when I switch to (short; average 3 characters long) text values I > >> show the hash join on VALUES is twice as fast as the binary search. > >> > >> Given that, I'm planning to implement this as a hash lookup and share > >> a revised patch. > > > >I've attached a patch series as before, but with an additional patch > >that switches to using dynahash instead of binary search. > > > > OK. I can't take closer look at the moment, I'll check later. > > Any particular reasons to pick dynahash over simplehash? ISTM we're > using simplehash elsewhere in the executor (grouping, tidbitmap, ...), > while there are not many places using dynahash for simple short-lived > hash tables. Of course, that alone is a weak reason to insist on using > simplehash here, but I suppose there were reasons for not using dynahash > and we'll end up facing the same issues here. No particular reason; it wasn't clear to me that there was a reason to prefer one or the other (and I'm not acquainted with the codebase enough to know the differences), so I chose dynahash because it was easier to find examples to follow for implementation. > >Whereas before the benchmarking ended up with a trimodal distribution > >(i.e., master with IN <list>, patch with IN <list>, and either with IN > >VALUES), the hash implementation brings us back to an effectively > >bimodal distribution -- though the hash scalar array op expression > >implementation for text is about 5% faster than the hash join. > > > > Nice. 
I'm not surprised this is a bit faster than hash join, which has > to worry about additional stuff. > > >Current outstanding thoughts (besides comment/naming cleanup): > > > >- The saop costing needs to be updated to match, as Tomas pointed out. > > > >- Should we be concerned about single execution cases? For example, is > >the regression of speed on a simple SELECT x IN something we should > >try to defeat by only kicking in the optimization if we execute in a > >loop at least twice? That might be of particular interest to pl/pgsql. > > > > I don't follow. How is this related to the number of executions and/or > plpgsql? I suspect you might be talking about prepared statements, but > surely the hash table is built for each execution anyway, even if the > plan is reused, right? > > I think the issue we've been talking about is considering the number of > lookups we expect to do in the array/hash table. But that has nothing to > do with plpgsql and/or multiple executions ... Suppose I do "SELECT 1 IN <100 item list>" (whether just as a standalone query or in pl/pgsql). Then it doesn't make sense to use the optimization, because it can't possibly win over a naive linear scan through the array since to build the hash we have to do that linear scan anyway (I suppose in theory the hashing function could be dramatically faster than the equality op, so maybe it could win overall, but it seems unlikely to me). I'm not so concerned about this in any query where we have a real FROM clause because even if we end up with only one row, the relative penalty is low, and the potential gain is very high. But simple expressions in pl/pgsql, for example, are a case where we can know for certain (correct me if I've wrong on this) that we'll only execute the expression once, which means there's probably always a penalty for choosing the implementation with setup costs over the default linear scan through the array. > >- Should we have a test for an operator with a non-strict function? > >I'm not aware of any built-in ops that have that characteristic; would > >you suggest just creating a fake one for the test? > > > > Dunno, I haven't thought about this very much. In general I think the > new code should simply behave the same as the current code, i.e. if it > does not check for strictness, we don't need either. Both the original implementation and this optimized version have the following short circuit condition: if (fcinfo->args[0].isnull && strictfunc) But I'm not sure there are any existing tests to show that the "&& strictfunc" is required. James
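One cheap way to sidestep the single-execution regression, short of a real costing model, would be to gate the hash build on the number of elements and (where known) the expected number of evaluations. The constant and function below are hypothetical illustrations, not part of the patch:

/*
 * Hypothetical heuristic: only pay the setup cost of hashing the constant
 * array when a linear scan is likely to lose. A single evaluation can never
 * win, since building the table is itself a full pass over the array.
 */
#define MIN_ELEMS_FOR_HASHED_SAOP 9     /* the starting point discussed upthread */

static bool
saop_should_build_hash(int nitems, double expected_evals)
{
    if (expected_evals <= 1.0)
        return false;

    return nitems >= MIN_ELEMS_FOR_HASHED_SAOP;
}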
On Tue, Apr 28, 2020 at 08:39:18AM -0400, James Coleman wrote: >On Tue, Apr 28, 2020 at 8:25 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Mon, Apr 27, 2020 at 09:04:09PM -0400, James Coleman wrote: >> >On Sun, Apr 26, 2020 at 7:41 PM James Coleman <jtc331@gmail.com> wrote: >> >> >> >> On Sun, Apr 26, 2020 at 4:49 PM Tomas Vondra >> >> <tomas.vondra@2ndquadrant.com> wrote: >> >> > >> >> > On Sun, Apr 26, 2020 at 02:46:19PM -0400, James Coleman wrote: >> >> > >On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra >> >> > ><tomas.vondra@2ndquadrant.com> wrote: >> >> > >> >> >> > >> On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote: >> >> > >> >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote: >> >> > >> >> >> >> > >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> >> > >> >> > This reminds me our attempts to add bloom filters to hash joins, which I >> >> > >> >> > think ran into mostly the same challenge of deciding when the bloom >> >> > >> >> > filter can be useful and is worth the extra work. >> >> > >> >> >> >> > >> >> Speaking of that, it would be interesting to see how a test where you >> >> > >> >> write the query as IN(VALUES(...)) instead of IN() compares. It would >> >> > >> >> be interesting to know if the planner is able to make a more suitable >> >> > >> >> choice and also to see how all the work over the years to improve Hash >> >> > >> >> Joins compares to the bsearch with and without the bloom filter. >> >> > >> > >> >> > >> >It would be interesting. >> >> > >> > >> >> > >> >It also makes one wonder about optimizing these into to hash >> >> > >> >joins...which I'd thought about over at [1]. I think it'd be a very >> >> > >> >significant effort though. >> >> > >> > >> >> > >> >> >> > >> I modified the script to also do the join version of the query. I can >> >> > >> only run it on my laptop at the moment, so the results may be a bit >> >> > >> different from those I shared before, but it's interesting I think. >> >> > >> >> >> > >> In most cases it's comparable to the binsearch/bloom approach, and in >> >> > >> some cases it actually beats them quite significantly. It seems to >> >> > >> depend on how expensive the comparison is - for "int" the comparison is >> >> > >> very cheap and there's almost no difference. For "text" the comparison >> >> > >> is much more expensive, and there are significant speedups. >> >> > >> >> >> > >> For example the test with 100k lookups in array of 10k elements and 10% >> >> > >> match probability, the timings are these >> >> > >> >> >> > >> master: 62362 ms >> >> > >> binsearch: 201 ms >> >> > >> bloom: 65 ms >> >> > >> hashjoin: 36 ms >> >> > >> >> >> > >> I do think the explanation is fairly simple - the bloom filter >> >> > >> eliminates about 90% of the expensive comparisons, so it's 20ms plus >> >> > >> some overhead to build and check the bits. The hash join probably >> >> > >> eliminates a lot of the remaining comparisons, because the hash table >> >> > >> is sized to have one tuple per bucket. >> >> > >> >> >> > >> Note: I also don't claim the PoC has the most efficient bloom filter >> >> > >> implementation possible. I'm sure it could be made faster. >> >> > >> >> >> > >> Anyway, I'm not sure transforming this to a hash join is worth the >> >> > >> effort - I agree that seems quite complex. 
But perhaps this suggest we >> >> > >> should not be doing binary search and instead just build a simple hash >> >> > >> table - that seems much simpler, and it'll probably give us about the >> >> > >> same benefits. >> >> > > >> >> > >That's actually what I originally thought about doing, but I chose >> >> > >binary search since it seemed a lot easier to get off the ground. >> >> > > >> >> > >> >> > OK, that makes perfect sense. >> >> > >> >> > >If we instead build a hash is there anything else we need to be >> >> > >concerned about? For example, work mem? I suppose for the binary >> >> > >search we already have to expand the array, so perhaps it's not all >> >> > >that meaningful relative to that... >> >> > > >> >> > >> >> > I don't think we need to be particularly concerned about work_mem. We >> >> > don't care about it now, and it's not clear to me what we could do about >> >> > it - we already have the array in memory anyway, so it's a bit futile. >> >> > Furthermore, if we need to care about it, it probably applies to the >> >> > binary search too. >> >> > >> >> > >I was looking earlier at what our standard hash implementation was, >> >> > >and it seemed less obvious what was needed to set that up (so binary >> >> > >search seemed a faster proof of concept). If you happen to have any >> >> > >pointers to similar usages I should look at, please let me know. >> >> > > >> >> > >> >> > I think the hash join implementation is far too complicated. It has to >> >> > care about work_mem, so it implements batching, etc. That's a lot of >> >> > complexity we don't need here. IMO we could use either the usual >> >> > dynahash, or maybe even the simpler simplehash. >> >> > >> >> > FWIW it'd be good to verify the numbers I shared, i.e. checking that the >> >> > benchmarks makes sense and running it independently. I'm not aware of >> >> > any issues but it was done late at night and only ran on my laptop. >> >> >> >> Some quick calculations (don't have the scripting in a form I can >> >> attach yet; using this as an opportunity to hack on a genericized >> >> performance testing framework of sorts) suggest your results are >> >> correct. I was also testing on my laptop, but I showed 1.) roughly >> >> equivalent results for IN (VALUES ...) and IN (<list>) for integers, >> >> but when I switch to (short; average 3 characters long) text values I >> >> show the hash join on VALUES is twice as fast as the binary search. >> >> >> >> Given that, I'm planning to implement this as a hash lookup and share >> >> a revised patch. >> > >> >I've attached a patch series as before, but with an additional patch >> >that switches to using dynahash instead of binary search. >> > >> >> OK. I can't take closer look at the moment, I'll check later. >> >> Any particular reasons to pick dynahash over simplehash? ISTM we're >> using simplehash elsewhere in the executor (grouping, tidbitmap, ...), >> while there are not many places using dynahash for simple short-lived >> hash tables. Of course, that alone is a weak reason to insist on using >> simplehash here, but I suppose there were reasons for not using dynahash >> and we'll end up facing the same issues here. > >No particular reason; it wasn't clear to me that there was a reason to >prefer one or the other (and I'm not acquainted with the codebase >enough to know the differences), so I chose dynahash because it was >easier to find examples to follow for implementation. > OK, understood. 
>> >Whereas before the benchmarking ended up with a trimodal distribution >> >(i.e., master with IN <list>, patch with IN <list>, and either with IN >> >VALUES), the hash implementation brings us back to an effectively >> >bimodal distribution -- though the hash scalar array op expression >> >implementation for text is about 5% faster than the hash join. >> > >> >> Nice. I'm not surprised this is a bit faster than hash join, which has >> to worry about additional stuff. >> >> >Current outstanding thoughts (besides comment/naming cleanup): >> > >> >- The saop costing needs to be updated to match, as Tomas pointed out. >> > >> >- Should we be concerned about single execution cases? For example, is >> >the regression of speed on a simple SELECT x IN something we should >> >try to defeat by only kicking in the optimization if we execute in a >> >loop at least twice? That might be of particular interest to pl/pgsql. >> > >> >> I don't follow. How is this related to the number of executions and/or >> plpgsql? I suspect you might be talking about prepared statements, but >> surely the hash table is built for each execution anyway, even if the >> plan is reused, right? >> >> I think the issue we've been talking about is considering the number of >> lookups we expect to do in the array/hash table. But that has nothing to >> do with plpgsql and/or multiple executions ... > >Suppose I do "SELECT 1 IN <100 item list>" (whether just as a >standalone query or in pl/pgsql). Then it doesn't make sense to use >the optimization, because it can't possibly win over a naive linear >scan through the array since to build the hash we have to do that >linear scan anyway (I suppose in theory the hashing function could be >dramatically faster than the equality op, so maybe it could win >overall, but it seems unlikely to me). True. I'm sure we could construct such cases, but this is hardly the only place where that would be an issue. We could create some rough costing model and make a decision based on that. >I'm not so concerned about this in any query where we have a real FROM >clause because even if we end up with only one row, the relative >penalty is low, and the potential gain is very high. But simple >expressions in pl/pgsql, for example, are a case where we can know for >certain (correct me if I've wrong on this) that we'll only execute the >expression once, which means there's probably always a penalty for >choosing the implementation with setup costs over the default linear >scan through the array. > What do you mean by "simple expressions"? I'm not plpgsql expert and I see it mostly as a way to glue together SQL queries, but yeah - if we know a given ScalarArrayOpExpr will only be executed once, then we can disable this optimization for now. >> >- Should we have a test for an operator with a non-strict function? >> >I'm not aware of any built-in ops that have that characteristic; >> >would you suggest just creating a fake one for the test? >> > >> >> Dunno, I haven't thought about this very much. In general I think the >> new code should simply behave the same as the current code, i.e. if >> it does not check for strictness, we don't need either. > >Both the original implementation and this optimized version have the >following short circuit condition: > >if (fcinfo->args[0].isnull && strictfunc) > >But I'm not sure there are any existing tests to show that the "&& >strictfunc" is required. > Ah! You mean a regression test, not a test in the "if condition" sense. 
I don't see a reason not to have such a test, although it's probably not something we should require from this patch. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 28, 2020 at 3:26 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Tue, Apr 28, 2020 at 08:39:18AM -0400, James Coleman wrote:
>On Tue, Apr 28, 2020 at 8:25 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Mon, Apr 27, 2020 at 09:04:09PM -0400, James Coleman wrote:
>> >On Sun, Apr 26, 2020 at 7:41 PM James Coleman <jtc331@gmail.com> wrote:
>> >>
>> >> On Sun, Apr 26, 2020 at 4:49 PM Tomas Vondra
>> >> <tomas.vondra@2ndquadrant.com> wrote:
>> >> >
>> >> > On Sun, Apr 26, 2020 at 02:46:19PM -0400, James Coleman wrote:
>> >> > >On Sat, Apr 25, 2020 at 8:31 PM Tomas Vondra
>> >> > ><tomas.vondra@2ndquadrant.com> wrote:
>> >> > >>
>> >> > >> On Sat, Apr 25, 2020 at 06:47:41PM -0400, James Coleman wrote:
>> >> > >> >On Sat, Apr 25, 2020 at 5:41 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> >> > >> >>
>> >> > >> >> On Sun, 26 Apr 2020 at 00:40, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> >> > >> >> > This reminds me our attempts to add bloom filters to hash joins, which I
>> >> > >> >> > think ran into mostly the same challenge of deciding when the bloom
>> >> > >> >> > filter can be useful and is worth the extra work.
>> >> > >> >>
>> >> > >> >> Speaking of that, it would be interesting to see how a test where you
>> >> > >> >> write the query as IN(VALUES(...)) instead of IN() compares. It would
>> >> > >> >> be interesting to know if the planner is able to make a more suitable
>> >> > >> >> choice and also to see how all the work over the years to improve Hash
>> >> > >> >> Joins compares to the bsearch with and without the bloom filter.
>> >> > >> >
>> >> > >> >It would be interesting.
>> >> > >> >
>> >> > >> >It also makes one wonder about optimizing these into to hash
>> >> > >> >joins...which I'd thought about over at [1]. I think it'd be a very
>> >> > >> >significant effort though.
>> >> > >> >
>> >> > >>
>> >> > >> I modified the script to also do the join version of the query. I can
>> >> > >> only run it on my laptop at the moment, so the results may be a bit
>> >> > >> different from those I shared before, but it's interesting I think.
>> >> > >>
>> >> > >> In most cases it's comparable to the binsearch/bloom approach, and in
>> >> > >> some cases it actually beats them quite significantly. It seems to
>> >> > >> depend on how expensive the comparison is - for "int" the comparison is
>> >> > >> very cheap and there's almost no difference. For "text" the comparison
>> >> > >> is much more expensive, and there are significant speedups.
>> >> > >>
>> >> > >> For example the test with 100k lookups in array of 10k elements and 10%
>> >> > >> match probability, the timings are these
>> >> > >>
>> >> > >> master: 62362 ms
>> >> > >> binsearch: 201 ms
>> >> > >> bloom: 65 ms
>> >> > >> hashjoin: 36 ms
>> >> > >>
>> >> > >> I do think the explanation is fairly simple - the bloom filter
>> >> > >> eliminates about 90% of the expensive comparisons, so it's 20ms plus
>> >> > >> some overhead to build and check the bits. The hash join probably
>> >> > >> eliminates a lot of the remaining comparisons, because the hash table
>> >> > >> is sized to have one tuple per bucket.
>> >> > >>
>> >> > >> Note: I also don't claim the PoC has the most efficient bloom filter
>> >> > >> implementation possible. I'm sure it could be made faster.
>> >> > >>
>> >> > >> Anyway, I'm not sure transforming this to a hash join is worth the
>> >> > >> effort - I agree that seems quite complex. But perhaps this suggest we
>> >> > >> should not be doing binary search and instead just build a simple hash
>> >> > >> table - that seems much simpler, and it'll probably give us about the
>> >> > >> same benefits.
>> >> > >
>> >> > >That's actually what I originally thought about doing, but I chose
>> >> > >binary search since it seemed a lot easier to get off the ground.
>> >> > >
>> >> >
>> >> > OK, that makes perfect sense.
>> >> >
>> >> > >If we instead build a hash is there anything else we need to be
>> >> > >concerned about? For example, work mem? I suppose for the binary
>> >> > >search we already have to expand the array, so perhaps it's not all
>> >> > >that meaningful relative to that...
>> >> > >
>> >> >
>> >> > I don't think we need to be particularly concerned about work_mem. We
>> >> > don't care about it now, and it's not clear to me what we could do about
>> >> > it - we already have the array in memory anyway, so it's a bit futile.
>> >> > Furthermore, if we need to care about it, it probably applies to the
>> >> > binary search too.
>> >> >
>> >> > >I was looking earlier at what our standard hash implementation was,
>> >> > >and it seemed less obvious what was needed to set that up (so binary
>> >> > >search seemed a faster proof of concept). If you happen to have any
>> >> > >pointers to similar usages I should look at, please let me know.
>> >> > >
>> >> >
>> >> > I think the hash join implementation is far too complicated. It has to
>> >> > care about work_mem, so it implements batching, etc. That's a lot of
>> >> > complexity we don't need here. IMO we could use either the usual
>> >> > dynahash, or maybe even the simpler simplehash.
>> >> >
>> >> > FWIW it'd be good to verify the numbers I shared, i.e. checking that the
>> >> > benchmarks makes sense and running it independently. I'm not aware of
>> >> > any issues but it was done late at night and only ran on my laptop.
>> >>
>> >> Some quick calculations (don't have the scripting in a form I can
>> >> attach yet; using this as an opportunity to hack on a genericized
>> >> performance testing framework of sorts) suggest your results are
>> >> correct. I was also testing on my laptop, but I showed 1.) roughly
>> >> equivalent results for IN (VALUES ...) and IN (<list>) for integers,
>> >> but when I switch to (short; average 3 characters long) text values I
>> >> show the hash join on VALUES is twice as fast as the binary search.
>> >>
>> >> Given that, I'm planning to implement this as a hash lookup and share
>> >> a revised patch.
>> >
>> >I've attached a patch series as before, but with an additional patch
>> >that switches to using dynahash instead of binary search.
>> >
>>
>> OK. I can't take closer look at the moment, I'll check later.
>>
>> Any particular reasons to pick dynahash over simplehash? ISTM we're
>> using simplehash elsewhere in the executor (grouping, tidbitmap, ...),
>> while there are not many places using dynahash for simple short-lived
>> hash tables. Of course, that alone is a weak reason to insist on using
>> simplehash here, but I suppose there were reasons for not using dynahash
>> and we'll end up facing the same issues here.
>
>No particular reason; it wasn't clear to me that there was a reason to
>prefer one or the other (and I'm not acquainted with the codebase
>enough to know the differences), so I chose dynahash because it was
>easier to find examples to follow for implementation.
>
OK, understood.
>> >Whereas before the benchmarking ended up with a trimodal distribution
>> >(i.e., master with IN <list>, patch with IN <list>, and either with IN
>> >VALUES), the hash implementation brings us back to an effectively
>> >bimodal distribution -- though the hash scalar array op expression
>> >implementation for text is about 5% faster than the hash join.
>> >
>>
>> Nice. I'm not surprised this is a bit faster than hash join, which has
>> to worry about additional stuff.
>>
>> >Current outstanding thoughts (besides comment/naming cleanup):
>> >
>> >- The saop costing needs to be updated to match, as Tomas pointed out.
>> >
>> >- Should we be concerned about single execution cases? For example, is
>> >the regression of speed on a simple SELECT x IN something we should
>> >try to defeat by only kicking in the optimization if we execute in a
>> >loop at least twice? That might be of particular interest to pl/pgsql.
>> >
>>
>> I don't follow. How is this related to the number of executions and/or
>> plpgsql? I suspect you might be talking about prepared statements, but
>> surely the hash table is built for each execution anyway, even if the
>> plan is reused, right?
>>
>> I think the issue we've been talking about is considering the number of
>> lookups we expect to do in the array/hash table. But that has nothing to
>> do with plpgsql and/or multiple executions ...
>
>Suppose I do "SELECT 1 IN <100 item list>" (whether just as a
>standalone query or in pl/pgsql). Then it doesn't make sense to use
>the optimization, because it can't possibly win over a naive linear
>scan through the array since to build the hash we have to do that
>linear scan anyway (I suppose in theory the hashing function could be
>dramatically faster than the equality op, so maybe it could win
>overall, but it seems unlikely to me).
True. I'm sure we could construct such cases, but this is hardly the
only place where that would be an issue. We could create some rough
costing model and make a decision based on that.
>I'm not so concerned about this in any query where we have a real FROM
>clause because even if we end up with only one row, the relative
>penalty is low, and the potential gain is very high. But simple
>expressions in pl/pgsql, for example, are a case where we can know for
>certain (correct me if I've wrong on this) that we'll only execute the
>expression once, which means there's probably always a penalty for
>choosing the implementation with setup costs over the default linear
>scan through the array.
>
What do you mean by "simple expressions"? I'm not plpgsql expert and I
see it mostly as a way to glue together SQL queries, but yeah - if we
know a given ScalarArrayOpExpr will only be executed once, then we can
disable this optimization for now.
a := a + 1
is translated to
SELECT $1 + 1, with the result saved into the variable a.
Queries like "SELECT $1 + 1" are simple expressions. They are evaluated directly at the executor level - they skip SPI.
A simple expression has no FROM clause and has to return just one row. I am not sure whether it is also required to return just one column.
I am not sure whether the executor knows that an expression is being executed as a simple expression, but it can probably be deduced from context.
Pavel
>> >- Should we have a test for an operator with a non-strict function?
>> >I'm not aware of any built-in ops that have that characteristic;
>> >would you suggest just creating a fake one for the test?
>> >
>>
>> Dunno, I haven't thought about this very much. In general I think the
>> new code should simply behave the same as the current code, i.e. if
>> it does not check for strictness, we don't need either.
>
>Both the original implementation and this optimized version have the
>following short circuit condition:
>
>if (fcinfo->args[0].isnull && strictfunc)
>
>But I'm not sure there are any existing tests to show that the "&&
>strictfunc" is required.
>
Ah! You mean a regression test, not a test in the "if condition" sense.
I don't see a reason not to have such test, although it's probably not
something we should require from this patch.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Apr 28, 2020 at 03:43:43PM +0200, Pavel Stehule wrote: >út 28. 4. 2020 v 15:26 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> >napsal: > >> ... >> >> >I'm not so concerned about this in any query where we have a real FROM >> >clause because even if we end up with only one row, the relative >> >penalty is low, and the potential gain is very high. But simple >> >expressions in pl/pgsql, for example, are a case where we can know for >> >certain (correct me if I've wrong on this) that we'll only execute the >> >expression once, which means there's probably always a penalty for >> >choosing the implementation with setup costs over the default linear >> >scan through the array. >> > >> >> What do you mean by "simple expressions"? I'm not plpgsql expert and I >> see it mostly as a way to glue together SQL queries, but yeah - if we >> know a given ScalarArrayOpExpr will only be executed once, then we can >> disable this optimization for now. >> > >a := a + 1 > >is translated to > >SELECT $1 + 1 and save result to var a > >The queries like this "SELECT $1 + 1" are simple expressions. They are >evaluated just on executor level - it skip SPI > >the simple expression has not FROM clause, and have to return just one row. >I am not sure if it is required, it has to return just one column. > >I am not sure if executor knows so expression is executed as simply >expressions. But probably it can be deduced from context > Not sure. The executor state is created by exec_eval_simple_expr, which calls ExecInitExprWithParams (and it's the only caller). And that in turn is the only place that leaves (state->parent == NULL). So maybe that's a way to identify simple (standalone) expressions? Otherwise we might add a new EEO_FLAG_* to identify these expressions explicitly. I wonder if it would be possible to identify cases when the expression is executed in a loop, e.g. like this: FOR i IN 1..1000 LOOP x := y IN (1, 2, ..., 999); END LOOP; in which case we only build the ScalarArrayOpExpr once, so maybe we could keep the hash table for all executions. But maybe that's not possible or maybe it's pointless for other reasons. It sure looks a bit like trying to build a query engine from FOR loop. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
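A minimal sketch of the standalone-expression check described above; the wrapping function and the threshold constant are hypothetical, while ExprState->parent and the exec_eval_simple_expr() / ExecInitExprWithParams() relationship are as discussed here:

    /*
     * Hypothetical helper: decide whether building a hash table for the
     * array is likely to pay off.  Per the discussion above, a NULL
     * parent currently only happens for expressions initialized via
     * ExecInitExprWithParams(), i.e. plpgsql simple expressions that are
     * evaluated outside any plan tree.
     */
    static bool
    saop_should_build_hash(ExprState *state, int nelements)
    {
        if (state->parent == NULL)
            return false;       /* standalone expression, likely run once */

        return nelements >= SAOP_HASH_MIN_ELEMENTS;     /* hypothetical threshold */
    }

An explicit EEO_FLAG_* bit, as suggested above, would express the same thing without relying on the parent pointer being NULL as a side effect.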
út 28. 4. 2020 v 16:48 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal:
On Tue, Apr 28, 2020 at 03:43:43PM +0200, Pavel Stehule wrote:
>út 28. 4. 2020 v 15:26 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com>
>napsal:
>
>> ...
>>
>> >I'm not so concerned about this in any query where we have a real FROM
>> >clause because even if we end up with only one row, the relative
>> >penalty is low, and the potential gain is very high. But simple
>> >expressions in pl/pgsql, for example, are a case where we can know for
>> >certain (correct me if I've wrong on this) that we'll only execute the
>> >expression once, which means there's probably always a penalty for
>> >choosing the implementation with setup costs over the default linear
>> >scan through the array.
>> >
>>
>> What do you mean by "simple expressions"? I'm not plpgsql expert and I
>> see it mostly as a way to glue together SQL queries, but yeah - if we
>> know a given ScalarArrayOpExpr will only be executed once, then we can
>> disable this optimization for now.
>>
>
>a := a + 1
>
>is translated to
>
>SELECT $1 + 1 and save result to var a
>
>The queries like this "SELECT $1 + 1" are simple expressions. They are
>evaluated just on executor level - it skip SPI
>
>the simple expression has not FROM clause, and have to return just one row.
>I am not sure if it is required, it has to return just one column.
>
>I am not sure if executor knows so expression is executed as simply
>expressions. But probably it can be deduced from context
>
Not sure. The executor state is created by exec_eval_simple_expr, which
calls ExecInitExprWithParams (and it's the only caller). And that in
turn is the only place that leaves (state->parent == NULL). So maybe
that's a way to identify simple (standalone) expressions? Otherwise we
might add a new EEO_FLAG_* to identify these expressions explicitly.
I wonder if it would be possible to identify cases when the expression
is executed in a loop, e.g. like this:
FOR i IN 1..1000 LOOP
x := y IN (1, 2, ..., 999);
END LOOP;
in which case we only build the ScalarArrayOpExpr once, so maybe we
could keep the hash table for all executions. But maybe that's not
possible or maybe it's pointless for other reasons. It sure looks a bit
like trying to build a query engine from FOR loop.
Theoretically it is possible, but not today. I don't think it is necessary, though - I cannot remember seeing this pattern in any plpgsql code, and I have never seen a request for this feature.
I don't think this is a task for the plpgsql engine. Maybe for the JIT someday.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
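Whichever way simple expressions end up being detected, the "build once, probe many times" behaviour under discussion amounts to creating the element table lazily on the first evaluation of the step and reusing it afterwards; a rough sketch, with the per-step cache field and the helper functions invented for illustration:

    /* Build the element table the first time this step is evaluated. */
    if (op->d.hashedscalararrayop.elements_tab == NULL)         /* hypothetical field */
        op->d.hashedscalararrayop.elements_tab =
            saop_build_element_table(op, econtext);             /* hypothetical helper */

    found = saop_element_table_contains(op->d.hashedscalararrayop.elements_tab,
                                        scalar_value);          /* hypothetical helper */

Within a single execution this already amortizes the setup cost across all rows; keeping the table alive across separate executions of a plpgsql loop would additionally require the expression state itself to be reused, which is what the FOR-loop question above is really about.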
On Tue, Apr 28, 2020 at 11:18 AM Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > > út 28. 4. 2020 v 16:48 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal: >> >> On Tue, Apr 28, 2020 at 03:43:43PM +0200, Pavel Stehule wrote: >> >út 28. 4. 2020 v 15:26 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> >> >napsal: >> > >> >> ... >> >> >> >> >I'm not so concerned about this in any query where we have a real FROM >> >> >clause because even if we end up with only one row, the relative >> >> >penalty is low, and the potential gain is very high. But simple >> >> >expressions in pl/pgsql, for example, are a case where we can know for >> >> >certain (correct me if I've wrong on this) that we'll only execute the >> >> >expression once, which means there's probably always a penalty for >> >> >choosing the implementation with setup costs over the default linear >> >> >scan through the array. >> >> > >> >> >> >> What do you mean by "simple expressions"? I'm not plpgsql expert and I >> >> see it mostly as a way to glue together SQL queries, but yeah - if we >> >> know a given ScalarArrayOpExpr will only be executed once, then we can >> >> disable this optimization for now. >> >> >> > >> >a := a + 1 >> > >> >is translated to >> > >> >SELECT $1 + 1 and save result to var a >> > >> >The queries like this "SELECT $1 + 1" are simple expressions. They are >> >evaluated just on executor level - it skip SPI >> > >> >the simple expression has not FROM clause, and have to return just one row. >> >I am not sure if it is required, it has to return just one column. Yes, this is what I meant by simple expressions. >> >I am not sure if executor knows so expression is executed as simply >> >expressions. But probably it can be deduced from context >> > >> >> Not sure. The executor state is created by exec_eval_simple_expr, which >> calls ExecInitExprWithParams (and it's the only caller). And that in >> turn is the only place that leaves (state->parent == NULL). So maybe >> that's a way to identify simple (standalone) expressions? Otherwise we >> might add a new EEO_FLAG_* to identify these expressions explicitly. I'll look into doing one of these. >> I wonder if it would be possible to identify cases when the expression >> is executed in a loop, e.g. like this: >> >> FOR i IN 1..1000 LOOP >> x := y IN (1, 2, ..., 999); >> END LOOP; >> >> in which case we only build the ScalarArrayOpExpr once, so maybe we >> could keep the hash table for all executions. But maybe that's not >> possible or maybe it's pointless for other reasons. It sure looks a bit >> like trying to build a query engine from FOR loop. > > > Theoretically it is possible, not now. But I don't think so it is necessary. I cannot to remember this pattern in any plpgsqlcode and I never seen any request on this feature. > > I don't think so this is task for plpgsql engine. Maybe for JIT sometimes. Agreed. I'd thought about this kind of scenario when I brought this up, but I think solving it would the responsibility of the pg/pgsql compiler rather than the expression evaluation code, because it'd have to recognize the situation and setup a shared expression evaluation context to be reused each time through the loop. James
út 28. 4. 2020 v 18:17 odesílatel James Coleman <jtc331@gmail.com> napsal:
On Tue, Apr 28, 2020 at 11:18 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
>
> út 28. 4. 2020 v 16:48 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal:
>>
>> On Tue, Apr 28, 2020 at 03:43:43PM +0200, Pavel Stehule wrote:
>> >út 28. 4. 2020 v 15:26 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> >napsal:
>> >
>> >> ...
>> >>
>> >> >I'm not so concerned about this in any query where we have a real FROM
>> >> >clause because even if we end up with only one row, the relative
>> >> >penalty is low, and the potential gain is very high. But simple
>> >> >expressions in pl/pgsql, for example, are a case where we can know for
>> >> >certain (correct me if I've wrong on this) that we'll only execute the
>> >> >expression once, which means there's probably always a penalty for
>> >> >choosing the implementation with setup costs over the default linear
>> >> >scan through the array.
>> >> >
>> >>
>> >> What do you mean by "simple expressions"? I'm not plpgsql expert and I
>> >> see it mostly as a way to glue together SQL queries, but yeah - if we
>> >> know a given ScalarArrayOpExpr will only be executed once, then we can
>> >> disable this optimization for now.
>> >>
>> >
>> >a := a + 1
>> >
>> >is translated to
>> >
>> >SELECT $1 + 1 and save result to var a
>> >
>> >The queries like this "SELECT $1 + 1" are simple expressions. They are
>> >evaluated just on executor level - it skip SPI
>> >
>> >the simple expression has not FROM clause, and have to return just one row.
>> >I am not sure if it is required, it has to return just one column.
Yes, this is what I meant by simple expressions.
>> >I am not sure if executor knows so expression is executed as simply
>> >expressions. But probably it can be deduced from context
>> >
>>
>> Not sure. The executor state is created by exec_eval_simple_expr, which
>> calls ExecInitExprWithParams (and it's the only caller). And that in
>> turn is the only place that leaves (state->parent == NULL). So maybe
>> that's a way to identify simple (standalone) expressions? Otherwise we
>> might add a new EEO_FLAG_* to identify these expressions explicitly.
I'll look into doing one of these.
>> I wonder if it would be possible to identify cases when the expression
>> is executed in a loop, e.g. like this:
>>
>> FOR i IN 1..1000 LOOP
>> x := y IN (1, 2, ..., 999);
>> END LOOP;
>>
>> in which case we only build the ScalarArrayOpExpr once, so maybe we
>> could keep the hash table for all executions. But maybe that's not
>> possible or maybe it's pointless for other reasons. It sure looks a bit
>> like trying to build a query engine from FOR loop.
>
>
> Theoretically it is possible, not now. But I don't think so it is necessary. I cannot to remember this pattern in any plpgsql code and I never seen any request on this feature.
>
> I don't think so this is task for plpgsql engine. Maybe for JIT sometimes.
Agreed. I'd thought about this kind of scenario when I brought this
up, but I think solving it would be the responsibility of the pl/pgsql
compiler rather than the expression evaluation code, because it'd have
to recognize the situation and set up a shared expression evaluation
context to be reused each time through the loop.
It would be nice if the new implementation were not slower than the old one in any environment or context (including plpgsql expressions).
Regards
Pavel
James
On Tue, Apr 28, 2020 at 1:40 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > > út 28. 4. 2020 v 18:17 odesílatel James Coleman <jtc331@gmail.com> napsal: >> >> On Tue, Apr 28, 2020 at 11:18 AM Pavel Stehule <pavel.stehule@gmail.com> wrote: >> > >> > >> > >> > út 28. 4. 2020 v 16:48 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal: >> >> >> >> On Tue, Apr 28, 2020 at 03:43:43PM +0200, Pavel Stehule wrote: >> >> >út 28. 4. 2020 v 15:26 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> >> >> >napsal: >> >> > >> >> >> ... >> >> >> >> >> >> >I'm not so concerned about this in any query where we have a real FROM >> >> >> >clause because even if we end up with only one row, the relative >> >> >> >penalty is low, and the potential gain is very high. But simple >> >> >> >expressions in pl/pgsql, for example, are a case where we can know for >> >> >> >certain (correct me if I've wrong on this) that we'll only execute the >> >> >> >expression once, which means there's probably always a penalty for >> >> >> >choosing the implementation with setup costs over the default linear >> >> >> >scan through the array. >> >> >> > >> >> >> >> >> >> What do you mean by "simple expressions"? I'm not plpgsql expert and I >> >> >> see it mostly as a way to glue together SQL queries, but yeah - if we >> >> >> know a given ScalarArrayOpExpr will only be executed once, then we can >> >> >> disable this optimization for now. >> >> >> >> >> > >> >> >a := a + 1 >> >> > >> >> >is translated to >> >> > >> >> >SELECT $1 + 1 and save result to var a >> >> > >> >> >The queries like this "SELECT $1 + 1" are simple expressions. They are >> >> >evaluated just on executor level - it skip SPI >> >> > >> >> >the simple expression has not FROM clause, and have to return just one row. >> >> >I am not sure if it is required, it has to return just one column. >> >> Yes, this is what I meant by simple expressions. >> >> >> >I am not sure if executor knows so expression is executed as simply >> >> >expressions. But probably it can be deduced from context >> >> > >> >> >> >> Not sure. The executor state is created by exec_eval_simple_expr, which >> >> calls ExecInitExprWithParams (and it's the only caller). And that in >> >> turn is the only place that leaves (state->parent == NULL). So maybe >> >> that's a way to identify simple (standalone) expressions? Otherwise we >> >> might add a new EEO_FLAG_* to identify these expressions explicitly. >> >> I'll look into doing one of these. >> >> >> I wonder if it would be possible to identify cases when the expression >> >> is executed in a loop, e.g. like this: >> >> >> >> FOR i IN 1..1000 LOOP >> >> x := y IN (1, 2, ..., 999); >> >> END LOOP; >> >> >> >> in which case we only build the ScalarArrayOpExpr once, so maybe we >> >> could keep the hash table for all executions. But maybe that's not >> >> possible or maybe it's pointless for other reasons. It sure looks a bit >> >> like trying to build a query engine from FOR loop. >> > >> > >> > Theoretically it is possible, not now. But I don't think so it is necessary. I cannot to remember this pattern in anyplpgsql code and I never seen any request on this feature. >> > >> > I don't think so this is task for plpgsql engine. Maybe for JIT sometimes. >> >> Agreed. 
I'd thought about this kind of scenario when I brought this >> up, but I think solving it would the responsibility of the pg/pgsql >> compiler rather than the expression evaluation code, because it'd have >> to recognize the situation and setup a shared expression evaluation >> context to be reused each time through the loop. > > > can be nice if new implementation was not slower then older in all environments and context (including plpgsql expressions) Agreed, which is why I'm going to look into preventing using the new code path for simple expressions. James
I cc'd Andres given his commit introduced simplehash, so I figured he'd probably have a few pointers on when each one might be useful. On Tue, Apr 28, 2020 at 8:39 AM James Coleman <jtc331@gmail.com> wrote: ... > > Any particular reasons to pick dynahash over simplehash? ISTM we're > > using simplehash elsewhere in the executor (grouping, tidbitmap, ...), > > while there are not many places using dynahash for simple short-lived > > hash tables. Of course, that alone is a weak reason to insist on using > > simplehash here, but I suppose there were reasons for not using dynahash > > and we'll end up facing the same issues here. > > No particular reason; it wasn't clear to me that there was a reason to > prefer one or the other (and I'm not acquainted with the codebase > enough to know the differences), so I chose dynahash because it was > easier to find examples to follow for implementation. Do you have any thoughts on what the trade-offs/use-cases etc. are for dynahash versus simple hash? From reading the commit message in b30d3ea824c it seems like simple hash is faster and optimized for CPU cache benefits. The comments at the top of simplehash.h also discourage it's use in non performance/space sensitive uses, but there isn't anything I can see that explicitly tries to discuss when dynahash is useful, etc. Given the performance notes in that commit message, I thinking switching to simple hash is worth it. But I also wonder if there might be some value in a README or comments addition that would be a guide to what the various hash implementations are useful for. If there's interest, I could try to type something short up so that we have something to make the code base a bit more discoverable. James
On Tue, Apr 28, 2020 at 06:22:20PM -0400, James Coleman wrote: >I cc'd Andres given his commit introduced simplehash, so I figured >he'd probably have a few pointers on when each one might be useful. > >On Tue, Apr 28, 2020 at 8:39 AM James Coleman <jtc331@gmail.com> wrote: >... >> > Any particular reasons to pick dynahash over simplehash? ISTM we're >> > using simplehash elsewhere in the executor (grouping, tidbitmap, ...), >> > while there are not many places using dynahash for simple short-lived >> > hash tables. Of course, that alone is a weak reason to insist on using >> > simplehash here, but I suppose there were reasons for not using dynahash >> > and we'll end up facing the same issues here. >> >> No particular reason; it wasn't clear to me that there was a reason to >> prefer one or the other (and I'm not acquainted with the codebase >> enough to know the differences), so I chose dynahash because it was >> easier to find examples to follow for implementation. > >Do you have any thoughts on what the trade-offs/use-cases etc. are for >dynahash versus simple hash? > >From reading the commit message in b30d3ea824c it seems like simple >hash is faster and optimized for CPU cache benefits. The comments at >the top of simplehash.h also discourage it's use in non >performance/space sensitive uses, but there isn't anything I can see >that explicitly tries to discuss when dynahash is useful, etc. > >Given the performance notes in that commit message, I thinking >switching to simple hash is worth it. > I recall doing some benchmarks for that patch, but it's so long I don't really remember the details. But in general, I agree simplehash is a bit more efficient in terms of CPU / caching. I think the changes required to switch from dynahash to simplehash are fairly small, so I think the best thing we can do is just try do some measurement and then decide. >But I also wonder if there might be some value in a README or comments >addition that would be a guide to what the various hash >implementations are useful for. If there's interest, I could try to >type something short up so that we have something to make the code >base a bit more discoverable. > I wouldn't object to that. Although maybe we should simply add some basic recommendations to the comments in dynahash/simplehash. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2020-04-28 18:22:20 -0400, James Coleman wrote: > I cc'd Andres given his commit introduced simplehash, so I figured > he'd probably have a few pointers on when each one might be useful. > [...] > Do you have any thoughts on what the trade-offs/use-cases etc. are for > dynahash versus simple hash? > > From reading the commit message in b30d3ea824c it seems like simple > hash is faster and optimized for CPU cache benefits. The comments at > the top of simplehash.h also discourage it's use in non > performance/space sensitive uses, but there isn't anything I can see > that explicitly tries to discuss when dynahash is useful, etc. Benefits of dynahash (chained hashtable): - supports partitioning, useful for shared memory accessed under locks - better performance for large entries, as they don't need to be moved around in case of hash conflicts - stable pointers to hash table entries Benefits of simplehash (open addressing hash table): - no indirect function calls, known structure sizes, due to "templated" code generation (these show up substantially in profiles for dynahash) - considerably faster for small entries due to previous point, and due open addressing hash tables having better cache behaviour than chained hashtables - once set-up the interface is type safe and easier to use - no overhead of a separate memory context etc > Given the performance notes in that commit message, I thinking > switching to simple hash is worth it. Seems plausible to me. > But I also wonder if there might be some value in a README or comments > addition that would be a guide to what the various hash > implementations are useful for. If there's interest, I could try to > type something short up so that we have something to make the code > base a bit more discoverable. That'd make sense to me. Greetings, Andres Freund
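As a concrete reference point for the dynahash-to-simplehash switch, a simplehash table is instantiated by defining the SH_* template parameters and then including simplehash.h; the prefix, entry layout, and helper functions below are made up for illustration and are not taken from the patch:

    /*
     * Hypothetical entry type: the array element Datum plus the status
     * byte that simplehash.h requires in every entry.
     */
    typedef struct saophash_entry
    {
        Datum   key;
        char    status;
    } saophash_entry;

    #define SH_PREFIX saophash
    #define SH_ELEMENT_TYPE saophash_entry
    #define SH_KEY_TYPE Datum
    #define SH_KEY key
    #define SH_HASH_KEY(tb, key) saop_hash_datum(tb, key)   /* hypothetical helper */
    #define SH_EQUAL(tb, a, b) saop_datum_eq(tb, a, b)      /* hypothetical helper */
    #define SH_SCOPE static inline
    #define SH_DECLARE
    #define SH_DEFINE
    #include "lib/simplehash.h"

This generates type-safe saophash_create(), saophash_insert(), and saophash_lookup() functions specialized for the entry type, which is the "templated code generation" benefit mentioned above.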
On Tue, Apr 28, 2020 at 7:05 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2020-04-28 18:22:20 -0400, James Coleman wrote: > > I cc'd Andres given his commit introduced simplehash, so I figured > > he'd probably have a few pointers on when each one might be useful. > > [...] > > Do you have any thoughts on what the trade-offs/use-cases etc. are for > > dynahash versus simple hash? > > > > From reading the commit message in b30d3ea824c it seems like simple > > hash is faster and optimized for CPU cache benefits. The comments at > > the top of simplehash.h also discourage it's use in non > > performance/space sensitive uses, but there isn't anything I can see > > that explicitly tries to discuss when dynahash is useful, etc. > > Benefits of dynahash (chained hashtable): > - supports partitioning, useful for shared memory accessed under locks > - better performance for large entries, as they don't need to be moved > around in case of hash conflicts > - stable pointers to hash table entries > > Benefits of simplehash (open addressing hash table): > - no indirect function calls, known structure sizes, due to "templated" > code generation (these show up substantially in profiles for dynahash) > - considerably faster for small entries due to previous point, and due > open addressing hash tables having better cache behaviour than chained > hashtables > - once set-up the interface is type safe and easier to use > - no overhead of a separate memory context etc > > > > Given the performance notes in that commit message, I thinking > > switching to simple hash is worth it. > > Seems plausible to me. > > > > But I also wonder if there might be some value in a README or comments > > addition that would be a guide to what the various hash > > implementations are useful for. If there's interest, I could try to > > type something short up so that we have something to make the code > > base a bit more discoverable. > > That'd make sense to me. Cool, I'll work on that as I have time then. One question: what is the reasoning behind having SH_STORE_HASH? The only things I could imagine would be a case where you have external pointers to some set of values or need to be able to use the hash for other reasons besides the hash table (and so can avoid calculating it twice), but maybe I'm missing something. Thanks, James
On Wed, Apr 29, 2020 at 10:26:12AM -0400, James Coleman wrote: >On Tue, Apr 28, 2020 at 7:05 PM Andres Freund <andres@anarazel.de> wrote: >> >> Hi, >> >> On 2020-04-28 18:22:20 -0400, James Coleman wrote: >> > I cc'd Andres given his commit introduced simplehash, so I figured >> > he'd probably have a few pointers on when each one might be useful. >> > [...] >> > Do you have any thoughts on what the trade-offs/use-cases etc. are for >> > dynahash versus simple hash? >> > >> > From reading the commit message in b30d3ea824c it seems like simple >> > hash is faster and optimized for CPU cache benefits. The comments at >> > the top of simplehash.h also discourage it's use in non >> > performance/space sensitive uses, but there isn't anything I can see >> > that explicitly tries to discuss when dynahash is useful, etc. >> >> Benefits of dynahash (chained hashtable): >> - supports partitioning, useful for shared memory accessed under locks >> - better performance for large entries, as they don't need to be moved >> around in case of hash conflicts >> - stable pointers to hash table entries >> >> Benefits of simplehash (open addressing hash table): >> - no indirect function calls, known structure sizes, due to "templated" >> code generation (these show up substantially in profiles for dynahash) >> - considerably faster for small entries due to previous point, and due >> open addressing hash tables having better cache behaviour than chained >> hashtables >> - once set-up the interface is type safe and easier to use >> - no overhead of a separate memory context etc >> >> >> > Given the performance notes in that commit message, I thinking >> > switching to simple hash is worth it. >> >> Seems plausible to me. >> >> >> > But I also wonder if there might be some value in a README or comments >> > addition that would be a guide to what the various hash >> > implementations are useful for. If there's interest, I could try to >> > type something short up so that we have something to make the code >> > base a bit more discoverable. >> >> That'd make sense to me. > >Cool, I'll work on that as I have time then. > >One question: what is the reasoning behind having SH_STORE_HASH? The >only things I could imagine would be a case where you have external >pointers to some set of values or need to be able to use the hash for >other reasons besides the hash table (and so can avoid calculating it >twice), but maybe I'm missing something. > I believe it's because computing the hash may be fairly expensive for some data types, in which case it may be better to just store it for future use. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 29, 2020 at 11:17 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Wed, Apr 29, 2020 at 10:26:12AM -0400, James Coleman wrote: > >On Tue, Apr 28, 2020 at 7:05 PM Andres Freund <andres@anarazel.de> wrote: > >> > >> Hi, > >> > >> On 2020-04-28 18:22:20 -0400, James Coleman wrote: > >> > I cc'd Andres given his commit introduced simplehash, so I figured > >> > he'd probably have a few pointers on when each one might be useful. > >> > [...] > >> > Do you have any thoughts on what the trade-offs/use-cases etc. are for > >> > dynahash versus simple hash? > >> > > >> > From reading the commit message in b30d3ea824c it seems like simple > >> > hash is faster and optimized for CPU cache benefits. The comments at > >> > the top of simplehash.h also discourage it's use in non > >> > performance/space sensitive uses, but there isn't anything I can see > >> > that explicitly tries to discuss when dynahash is useful, etc. > >> > >> Benefits of dynahash (chained hashtable): > >> - supports partitioning, useful for shared memory accessed under locks > >> - better performance for large entries, as they don't need to be moved > >> around in case of hash conflicts > >> - stable pointers to hash table entries > >> > >> Benefits of simplehash (open addressing hash table): > >> - no indirect function calls, known structure sizes, due to "templated" > >> code generation (these show up substantially in profiles for dynahash) > >> - considerably faster for small entries due to previous point, and due > >> open addressing hash tables having better cache behaviour than chained > >> hashtables > >> - once set-up the interface is type safe and easier to use > >> - no overhead of a separate memory context etc > >> > >> > >> > Given the performance notes in that commit message, I thinking > >> > switching to simple hash is worth it. > >> > >> Seems plausible to me. > >> > >> > >> > But I also wonder if there might be some value in a README or comments > >> > addition that would be a guide to what the various hash > >> > implementations are useful for. If there's interest, I could try to > >> > type something short up so that we have something to make the code > >> > base a bit more discoverable. > >> > >> That'd make sense to me. > > > >Cool, I'll work on that as I have time then. > > > >One question: what is the reasoning behind having SH_STORE_HASH? The > >only things I could imagine would be a case where you have external > >pointers to some set of values or need to be able to use the hash for > >other reasons besides the hash table (and so can avoid calculating it > >twice), but maybe I'm missing something. > > > > I believe it's because computing the hash may be fairly expensive for > some data types, in which case it may be better to just store it for > future use. But is it storing it for use primarily by the hash table implementation (i.e., does it need the hash stored this way to avoid repeated recalculation) or for caller's use? James
On Wed, Apr 29, 2020 at 11:34:24AM -0400, James Coleman wrote: >On Wed, Apr 29, 2020 at 11:17 AM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> On Wed, Apr 29, 2020 at 10:26:12AM -0400, James Coleman wrote: >> >On Tue, Apr 28, 2020 at 7:05 PM Andres Freund <andres@anarazel.de> wrote: >> >> >> >> Hi, >> >> >> >> On 2020-04-28 18:22:20 -0400, James Coleman wrote: >> >> > I cc'd Andres given his commit introduced simplehash, so I figured >> >> > he'd probably have a few pointers on when each one might be useful. >> >> > [...] >> >> > Do you have any thoughts on what the trade-offs/use-cases etc. are for >> >> > dynahash versus simple hash? >> >> > >> >> > From reading the commit message in b30d3ea824c it seems like simple >> >> > hash is faster and optimized for CPU cache benefits. The comments at >> >> > the top of simplehash.h also discourage it's use in non >> >> > performance/space sensitive uses, but there isn't anything I can see >> >> > that explicitly tries to discuss when dynahash is useful, etc. >> >> >> >> Benefits of dynahash (chained hashtable): >> >> - supports partitioning, useful for shared memory accessed under locks >> >> - better performance for large entries, as they don't need to be moved >> >> around in case of hash conflicts >> >> - stable pointers to hash table entries >> >> >> >> Benefits of simplehash (open addressing hash table): >> >> - no indirect function calls, known structure sizes, due to "templated" >> >> code generation (these show up substantially in profiles for dynahash) >> >> - considerably faster for small entries due to previous point, and due >> >> open addressing hash tables having better cache behaviour than chained >> >> hashtables >> >> - once set-up the interface is type safe and easier to use >> >> - no overhead of a separate memory context etc >> >> >> >> >> >> > Given the performance notes in that commit message, I thinking >> >> > switching to simple hash is worth it. >> >> >> >> Seems plausible to me. >> >> >> >> >> >> > But I also wonder if there might be some value in a README or comments >> >> > addition that would be a guide to what the various hash >> >> > implementations are useful for. If there's interest, I could try to >> >> > type something short up so that we have something to make the code >> >> > base a bit more discoverable. >> >> >> >> That'd make sense to me. >> > >> >Cool, I'll work on that as I have time then. >> > >> >One question: what is the reasoning behind having SH_STORE_HASH? The >> >only things I could imagine would be a case where you have external >> >pointers to some set of values or need to be able to use the hash for >> >other reasons besides the hash table (and so can avoid calculating it >> >twice), but maybe I'm missing something. >> > >> >> I believe it's because computing the hash may be fairly expensive for >> some data types, in which case it may be better to just store it for >> future use. > >But is it storing it for use primarily by the hash table >implementation (i.e., does it need the hash stored this way to avoid >repeated recalculation) or for caller's use? > For the hash table implementation, I think. Simplehash is using "robin hood" hashing, which needs to decide which entry to move in case of collision. AFAIC that requires the knowledge of hash value for both the new and existing entry, and having to compute that over and over would be fairly expensive. But that's my understanding, I might be wrong and it's useful for external use too. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
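For the record (and perhaps for the README idea above), caching the hash in each entry is opt-in via two further template parameters; the accessor below assumes the entry struct has been given a uint32 hash member to hold the cached value:

    /* Ask simplehash to store each entry's hash and tell it where to find it. */
    #define SH_STORE_HASH
    #define SH_GET_HASH(tb, a) ((a)->hash)

With those defined, growing the table and resolving collisions can reuse the stored hash instead of recomputing it from the key, which matters when the hash function itself is expensive.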
On Tue, Apr 28, 2020 at 8:25 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: ... > Any particular reasons to pick dynahash over simplehash? ISTM we're > using simplehash elsewhere in the executor (grouping, tidbitmap, ...), > while there are not many places using dynahash for simple short-lived > hash tables. Of course, that alone is a weak reason to insist on using > simplehash here, but I suppose there were reasons for not using dynahash > and we'll end up facing the same issues here. I've attached a patch series that includes switching to simplehash. Obviously we'd really just collapse all of these patches, but it's perhaps interesting to see the changes required to use each style (binary search, dynahash, simplehash). As before, there are clearly comments and naming things to be addressed, but the implementation should be reasonably clean. James
Attachment
On 01/05/2020 05:20, James Coleman wrote: > On Tue, Apr 28, 2020 at 8:25 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > ... >> Any particular reasons to pick dynahash over simplehash? ISTM we're >> using simplehash elsewhere in the executor (grouping, tidbitmap, ...), >> while there are not many places using dynahash for simple short-lived >> hash tables. Of course, that alone is a weak reason to insist on using >> simplehash here, but I suppose there were reasons for not using dynahash >> and we'll end up facing the same issues here. > > I've attached a patch series that includes switching to simplehash. > Obviously we'd really just collapse all of these patches, but it's > perhaps interesting to see the changes required to use each style > (binary search, dynahash, simplehash). > > As before, there are clearly comments and naming things to be > addressed, but the implementation should be reasonably clean. Looks good, aside from the cleanup work that you mentioned. There are a few more cases that I think you could easily handle with very little extra code: You could also apply the optimization for non-Const expressions, as long as they're stable (i.e. !contain_volatile_functions(expr)). Deconstructing the array Datum into a simple C array on first call would be a win even for very small arrays and for AND semantics, even if you don't use a hash table. - Heikki
On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 01/05/2020 05:20, James Coleman wrote: > > On Tue, Apr 28, 2020 at 8:25 AM Tomas Vondra > > <tomas.vondra@2ndquadrant.com> wrote: > > ... > >> Any particular reasons to pick dynahash over simplehash? ISTM we're > >> using simplehash elsewhere in the executor (grouping, tidbitmap, ...), > >> while there are not many places using dynahash for simple short-lived > >> hash tables. Of course, that alone is a weak reason to insist on using > >> simplehash here, but I suppose there were reasons for not using dynahash > >> and we'll end up facing the same issues here. > > > > I've attached a patch series that includes switching to simplehash. > > Obviously we'd really just collapse all of these patches, but it's > > perhaps interesting to see the changes required to use each style > > (binary search, dynahash, simplehash). > > > > As before, there are clearly comments and naming things to be > > addressed, but the implementation should be reasonably clean. > > Looks good, aside from the cleanup work that you mentioned. There are a > few more cases that I think you could easily handle with very little > extra code: > > You could also apply the optimization for non-Const expressions, as long > as they're stable (i.e. !contain_volatile_functions(expr)). Is that true? Don't we also have to worry about whether or not the value is stable (i.e., know when a param has changed)? There have been discussions about being able to cache stable subexpressions, and my understanding was that we'd need to have that infrastructure (along with the necessarily invalidation when the param changes) to be able to use this for non-Const expressions. > Deconstructing the array Datum into a simple C array on first call would > be a win even for very small arrays and for AND semantics, even if you > don't use a hash table. Because you wouldn't have to repeatedly detoast it? Or some other reason I'm not thinking of? My intuition would have been that (aside from detoasting if necessary) there wouldn't be much real overhead in, for example, an array storing integers. James
On 08/09/2020 22:25, James Coleman wrote:
> On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> You could also apply the optimization for non-Const expressions, as long
>> as they're stable (i.e. !contain_volatile_functions(expr)).
>
> Is that true? Don't we also have to worry about whether or not the
> value is stable (i.e., know when a param has changed)? There have been
> discussions about being able to cache stable subexpressions, and my
> understanding was that we'd need to have that infrastructure (along
> with the necessarily invalidation when the param changes) to be able
> to use this for non-Const expressions.

Yeah, you're right, you'd have to also check for PARAM_EXEC Params. And
Vars. I think the conditions are the same as those checked in
match_clause_to_partition_key() in partprune.c (it's a long function,
search for "if (!IsA(rightop, Const))"). Not sure it's worth the
trouble, then. But it would be nice to handle queries like "WHERE column
= ANY ($1)"

>> Deconstructing the array Datum into a simple C array on first call would
>> be a win even for very small arrays and for AND semantics, even if you
>> don't use a hash table.
>
> Because you wouldn't have to repeatedly detoast it? Or some other
> reason I'm not thinking of? My intuition would have been that (aside
> from detoasting if necessary) there wouldn't be much real overhead in,
> for example, an array storing integers.

Dealing with NULLs and different element sizes in the array is pretty
complicated. Looping through the array currently looks like this:

/* Loop over the array elements */
s = (char *) ARR_DATA_PTR(arr);
bitmap = ARR_NULLBITMAP(arr);
bitmask = 1;

for (int i = 0; i < nitems; i++)
{
    Datum       elt;
    Datum       thisresult;

    /* Get array element, checking for NULL */
    if (bitmap && (*bitmap & bitmask) == 0)
    {
        fcinfo->args[1].value = (Datum) 0;
        fcinfo->args[1].isnull = true;
    }
    else
    {
        elt = fetch_att(s, typbyval, typlen);
        s = att_addlength_pointer(s, typlen, s);
        s = (char *) att_align_nominal(s, typalign);
        fcinfo->args[1].value = elt;
        fcinfo->args[1].isnull = false;
    }

    [do stuff with Datum/isnull]

    /* advance bitmap pointer if any */
    if (bitmap)
    {
        bitmask <<= 1;
        if (bitmask == 0x100)
        {
            bitmap++;
            bitmask = 1;
        }
    }
}

Compared with just:

for (int i = 0; i < nitems; i++)
{
    Datum       elt = datums[i];

    [do stuff with the Datum]
}

I'm not sure how much difference that makes, but I presume it's not
zero, and it seems like an easy win when you have the code to deal with
the Datum array representation anyway.

- Heikki
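For comparison, the "deconstruct once" approach can lean on the existing deconstruct_array() helper, at the cost of allocating two parallel arrays up front; this sketch assumes typlen/typbyval/typalign have already been looked up for the element type, as in the loop above:

    Datum      *elem_values;
    bool       *elem_nulls;
    int         nelems;

    /* One-time pass: expand the array into parallel Datum/isnull arrays. */
    deconstruct_array(arr, ARR_ELEMTYPE(arr),
                      typlen, typbyval, typalign,
                      &elem_values, &elem_nulls, &nelems);

    for (int i = 0; i < nelems; i++)
    {
        fcinfo->args[1].value = elem_values[i];
        fcinfo->args[1].isnull = elem_nulls[i];

        /* [do stuff with Datum/isnull, as in the loop above] */
    }

All of the attlen/alignment bookkeeping then happens once in the expansion rather than once per element per evaluation, which is essentially what the simpler loop above assumes.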
On Tue, Sep 8, 2020 at 4:37 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 08/09/2020 22:25, James Coleman wrote: > > On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> > >> You could also apply the optimization for non-Const expressions, as long > >> as they're stable (i.e. !contain_volatile_functions(expr)). > > > > Is that true? Don't we also have to worry about whether or not the > > value is stable (i.e., know when a param has changed)? There have been > > discussions about being able to cache stable subexpressions, and my > > understanding was that we'd need to have that infrastructure (along > > with the necessarily invalidation when the param changes) to be able > > to use this for non-Const expressions. > > Yeah, you're right, you'd have to also check for PARAM_EXEC Params. And > Vars. I think the conditions are the same as those checked in > match_clause_to_partition_key() in partprune.c (it's a long function, > search for "if (!IsA(rightop, Const))"). Not sure it's worth the > trouble, then. But it would be nice to handle queries like "WHERE column > = ANY ($1)" If I'm understanding properly you're imagining something in the form of: with x as (select '{1,2,3,4,5,6,7,8,9,10}'::int[]) select * from t where i = any ((select * from x)::int[]); I've been playing around with this and currently have these checks: contain_var_clause((Node *) arrayarg) contain_volatile_functions((Node *) arrayarg) contain_exec_param((Node *) arrayarg, NIL) (the last one I had to modify the function to handle empty lists) If any are true, then have to disable the optimization. But for queries in the form above the test contain_exec_param((Node *) arrayarg, NIL) evaluates to true, even though we know from looking at the query that the array subexpression is stable for the length of the query. Am I misunderstanding what you're going for? Or is there a way to confirm the expr, although an exec param, won't change? Another interesting thing that this would imply is that we'd either have to: 1. Remove the array length check altogether, 2. Always use the hash when have a non-Const, but when a Const only if the array length check passes, or 3. Make this new expr op more fully featured by teaching it how to use either a straight loop through a deconstructed array or use the hash. That last option feeds into further discussion in the below: > >> Deconstructing the array Datum into a simple C array on first call would > >> be a win even for very small arrays and for AND semantics, even if you > >> don't use a hash table. > > > > Because you wouldn't have to repeatedly detoast it? Or some other > > reason I'm not thinking of? My intuition would have been that (aside > > from detoasting if necessary) there wouldn't be much real overhead in, > > for example, an array storing integers. > > Dealing with NULLs and different element sizes in the array is pretty > complicated. 
Looping through the array currently looks like this: > > /* Loop over the array elements */ > s = (char *) ARR_DATA_PTR(arr); > bitmap = ARR_NULLBITMAP(arr); > bitmask = 1; > > for (int i = 0; i < nitems; i++) > { > Datum elt; > Datum thisresult; > > /* Get array element, checking for NULL */ > if (bitmap && (*bitmap & bitmask) == 0) > { > fcinfo->args[1].value = (Datum) 0; > fcinfo->args[1].isnull = true; > } > else > { > elt = fetch_att(s, typbyval, typlen); > s = att_addlength_pointer(s, typlen, s); > s = (char *) att_align_nominal(s, typalign); > fcinfo->args[1].value = elt; > fcinfo->args[1].isnull = false; > } > > [do stuff with Datum/isnull] > > /* advance bitmap pointer if any */ > if (bitmap) > { > bitmask <<= 1; > if (bitmask == 0x100) > { > bitmap++; > bitmask = 1; > } > } > } > > Compared with just: > > for (int i = 0; i < nitems; i++) > { > Datum elt = datums[i]; > > [do stuff with the Datum] > } > > I'm not sure how much difference that makes, but I presume it's not > zero, and it seems like an easy win when you have the code to deal with > the Datum array representation anyway. Doing this would necessitate option 3 above: we'd have to have this new expr op be able both to use a hash or alternatively do a normal loop. Being able to use this in more cases than just a Const array expr is certainly interesting, but I'm not sure yet about the feasibility or desirability of that at this point given the above restrictions. One other point in favor of the additional complexity here is that it's likely that the above described runtime switching between hash and loop would be necessary (for this optimization to come into play) if caching of stable subexpressions ever lands. I have some interest in working on that...but it's also a large project. James
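Restating the three checks from the message above as a single gate (contain_exec_param() here is the modified version that accepts an empty list, as described; the variable names are illustrative):

    /*
     * The hashed path is only safe when the array argument cannot change
     * between evaluations of this expression.
     */
    bool        array_arg_is_stable =
        !contain_var_clause((Node *) arrayarg) &&
        !contain_volatile_functions((Node *) arrayarg) &&
        !contain_exec_param((Node *) arrayarg, NIL);

    if (!array_arg_is_stable)
    {
        /* fall back to the existing linear scan through the array */
    }

As noted above, this still rejects cases like "i = any ((select * from x)::int[])", because the subplan output arrives as an exec Param even though it is in fact stable for the life of the query.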
On Fri, Sep 11, 2020 at 5:11 PM James Coleman <jtc331@gmail.com> wrote: > > On Tue, Sep 8, 2020 at 4:37 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > > > On 08/09/2020 22:25, James Coleman wrote: > > > On Wed, Aug 19, 2020 at 3:16 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > >> > > >> You could also apply the optimization for non-Const expressions, as long > > >> as they're stable (i.e. !contain_volatile_functions(expr)). > > > > > > Is that true? Don't we also have to worry about whether or not the > > > value is stable (i.e., know when a param has changed)? There have been > > > discussions about being able to cache stable subexpressions, and my > > > understanding was that we'd need to have that infrastructure (along > > > with the necessarily invalidation when the param changes) to be able > > > to use this for non-Const expressions. > > > > Yeah, you're right, you'd have to also check for PARAM_EXEC Params. And > > Vars. I think the conditions are the same as those checked in > > match_clause_to_partition_key() in partprune.c (it's a long function, > > search for "if (!IsA(rightop, Const))"). Not sure it's worth the > > trouble, then. But it would be nice to handle queries like "WHERE column > > = ANY ($1)" > > If I'm understanding properly you're imagining something in the form of: > > with x as (select '{1,2,3,4,5,6,7,8,9,10}'::int[]) > select * from t where i = any ((select * from x)::int[]); > > I've been playing around with this and currently have these checks: > > contain_var_clause((Node *) arrayarg) > contain_volatile_functions((Node *) arrayarg) > contain_exec_param((Node *) arrayarg, NIL) > > (the last one I had to modify the function to handle empty lists) > > If any are true, then have to disable the optimization. But for > queries in the form above the test contain_exec_param((Node *) > arrayarg, NIL) evaluates to true, even though we know from looking at > the query that the array subexpression is stable for the length of the > query. > > Am I misunderstanding what you're going for? Or is there a way to > confirm the expr, although an exec param, won't change? > > Another interesting thing that this would imply is that we'd either have to: > > 1. Remove the array length check altogether, > 2. Always use the hash when have a non-Const, but when a Const only if > the array length check passes, or > 3. Make this new expr op more fully featured by teaching it how to use > either a straight loop through a deconstructed array or use the hash. > > That last option feeds into further discussion in the below: > > > >> Deconstructing the array Datum into a simple C array on first call would > > >> be a win even for very small arrays and for AND semantics, even if you > > >> don't use a hash table. > > > > > > Because you wouldn't have to repeatedly detoast it? Or some other > > > reason I'm not thinking of? My intuition would have been that (aside > > > from detoasting if necessary) there wouldn't be much real overhead in, > > > for example, an array storing integers. > > > > Dealing with NULLs and different element sizes in the array is pretty > > complicated. 
Looping through the array currently looks like this: > > > > /* Loop over the array elements */ > > s = (char *) ARR_DATA_PTR(arr); > > bitmap = ARR_NULLBITMAP(arr); > > bitmask = 1; > > > > for (int i = 0; i < nitems; i++) > > { > > Datum elt; > > Datum thisresult; > > > > /* Get array element, checking for NULL */ > > if (bitmap && (*bitmap & bitmask) == 0) > > { > > fcinfo->args[1].value = (Datum) 0; > > fcinfo->args[1].isnull = true; > > } > > else > > { > > elt = fetch_att(s, typbyval, typlen); > > s = att_addlength_pointer(s, typlen, s); > > s = (char *) att_align_nominal(s, typalign); > > fcinfo->args[1].value = elt; > > fcinfo->args[1].isnull = false; > > } > > > > [do stuff with Datum/isnull] > > > > /* advance bitmap pointer if any */ > > if (bitmap) > > { > > bitmask <<= 1; > > if (bitmask == 0x100) > > { > > bitmap++; > > bitmask = 1; > > } > > } > > } > > > > Compared with just: > > > > for (int i = 0; i < nitems; i++) > > { > > Datum elt = datums[i]; > > > > [do stuff with the Datum] > > } > > > > I'm not sure how much difference that makes, but I presume it's not > > zero, and it seems like an easy win when you have the code to deal with > > the Datum array representation anyway. > > Doing this would necessitate option 3 above: we'd have to have this > new expr op be able both to use a hash or alternatively do a normal > loop. > > Being able to use this in more cases than just a Const array expr is > certainly interesting, but I'm not sure yet about the feasibility or > desirability of that at this point given the above restrictions. > > One other point in favor of the additional complexity here is that > it's likely that the above described runtime switching between hash > and loop would be necessary (for this optimization to come into play) > if caching of stable subexpressions ever lands. I have some interest > in working on that...but it's also a large project. I've attached a cleaned up patch. Last CF it was listed in is https://commitfest.postgresql.org/29/2542/ -- what's the appropriate step to take here given it's an already existing patch, but not yet moved into recent CFs? Thanks, James
Attachment
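For concreteness, the two query shapes being contrasted in the preceding message can be sketched like this (the table name and the values are made up for the illustration; only the second shape needs the stable-subexpression machinery discussed above):

CREATE TABLE t (i int);

-- Const array argument: the full list of values is visible at plan time,
-- so it can be pre-sorted (or, later in this thread, hashed) up front.
SELECT * FROM t WHERE i = ANY ('{1,2,3,4,5,6,7,8,9,10}'::int[]);

-- Array supplied as a parameter: its contents are only known at execution
-- time, even though they are stable for the length of the query.
PREPARE q(int[]) AS SELECT * FROM t WHERE i = ANY ($1);
EXECUTE q('{1,2,3,4,5,6,7,8,9,10}');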
On Sat, 20 Mar 2021 at 09:41, James Coleman <jtc331@gmail.com> wrote: > I've attached a cleaned up patch. Last CF it was listed in is > https://commitfest.postgresql.org/29/2542/ -- what's the appropriate > step to take here given it's an already existing patch, but not yet > moved into recent CFs? I had a look at this patch. I like the idea of using a simplehash.h hash table to hash the constant values so that repeat lookups can be performed much more quickly, however, I'm a bit concerned that there are quite a few places in the code where we often just execute a ScalarArrayOpExpr once and I'm a bit worried that we'll slow down expression evaluation of those cases. The two cases that I have in mind are: 1. eval_const_expressions() where we use the executor to evaluate the ScalarArrayOpExpr to see if the result is Const. 2. CHECK constraints with IN clauses and single-row INSERTs. I tried to benchmark both of these but I'm struggling to get stable enough performance for #2, even with fsync=off. Sometimes I'm getting results 2.5x slower than other runs. For benchmarking #1 I'm also not too sure I'm getting stable enough results for them to mean anything. I was running: create table a (a int); bench.sql: explain select * from a where a in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16); drowley@amd3990x:~$ pgbench -n -T 60 -j 64 -c 64 -f bench.sql -P 10 postgres Master (6d41dd045): tps = 992586.991045 (without initial connection time) tps = 987964.990483 (without initial connection time) tps = 994309.670918 (without initial connection time) Master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch tps = 956022.557626 (without initial connection time) tps = 963043.352823 (without initial connection time) tps = 968582.600100 (without initial connection time) This puts the patched version about 3% slower. I'm not sure how much of that is changes in the binary and noise and how much is the needless hashtable build done for eval_const_expressions(). I wondered if we should make it the query planner's job of deciding if the ScalarArrayOpExpr should be hashed or not. I ended up with the attached rough-cut patch that introduces HashedScalarArrayOpExpr and has the query planner decide if it's going to replace ScalarArrayOpExpr with these HashedScalarArrayOpExpr during preprocess_expression(). I do think that we might want to consider being a bit selective about when we do these replacements. It seems likely that we'd want to do this for EXPRKIND_QUAL and maybe EXPRKIND_TARGET, but I imagine that converting ScalarArrayOpExpr to HashedScalarArrayOpExpr for EXPRKIND_VALUES would be a waste of time since those will just be executed once. I tried the same above test with the v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch plus the attached rough-cut patch and got: master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch + v5-0002-Rough-cut-patch-for-HashedScalarArrayOpExpr.patch tps = 1167969.983173 (without initial connection time) tps = 1199636.793314 (without initial connection time) tps = 1190690.939963 (without initial connection time) I can't really explain why this became faster. I was expecting it just to reduce that slowdown of the v4 patch a little. I don't really see any reason why it would become faster. It's almost 20% faster which seems like too much to just be fluctuations in code alignment in the binary. The attached patch is still missing the required changes to llvmjit_expr.c. I think that was also missing from the original patch too, however. 
Also, I added HashedScalarArrayOpExpr to plannodes.h. All other Expr type nodes are in primnodes.h. However, I put HashedScalarArrayOpExpr in plannodes.h because the parser does not generate this and it's not going to be stored in the catalogue files anywhere. I'm not so sure inventing a new Expr type node that only can be generated by the planner is a good thing to do. Anyway, wondering what you think of the idea of allowing the planner to choose if it's going to hash or not? It might also be good if someone else can check if they can get a bit more stable performance results from benchmarking the patches. (Also attached your v4 patch again just so anyone following along at home does not need to hunt around for the correct set of patches to apply to test this) David
Attachment
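The second "execute once" case mentioned above (a CHECK constraint containing an IN clause, evaluated for a single-row INSERT) looks like this in SQL; the table and values are invented for the illustration:

CREATE TABLE orders (
    status text CHECK (status IN ('new', 'paid', 'shipped', 'cancelled'))
);

-- The IN list in the constraint is evaluated exactly once here, so any
-- up-front hash table build would be pure overhead.
INSERT INTO orders VALUES ('new');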
On Mon, Apr 5, 2021 at 11:58 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Sat, 20 Mar 2021 at 09:41, James Coleman <jtc331@gmail.com> wrote: > > I've attached a cleaned up patch. Last CF it was listed in is > > https://commitfest.postgresql.org/29/2542/ -- what's the appropriate > > step to take here given it's an already existing patch, but not yet > > moved into recent CFs? > > I had a look at this patch. I like the idea of using a simplehash.h > hash table to hash the constant values so that repeat lookups can be > performed much more quickly, however, I'm a bit concerned that there > are quite a few places in the code where we often just execute a > ScalarArrayOpExpr once and I'm a bit worried that we'll slow down > expression evaluation of those cases. > > The two cases that I have in mind are: > > 1. eval_const_expressions() where we use the executor to evaluate the > ScalarArrayOpExpr to see if the result is Const. > 2. CHECK constraints with IN clauses and single-row INSERTs. This is a good point I hadn't considered; now that you mention it, I think another case would be expression evaluation in pl/pgsql. > I tried to benchmark both of these but I'm struggling to get stable > enough performance for #2, even with fsync=off. Sometimes I'm getting > results 2.5x slower than other runs. > > For benchmarking #1 I'm also not too sure I'm getting stable enough > results for them to mean anything. > > I was running: > > create table a (a int); > > bench.sql: explain select * from a where a > in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16); > > drowley@amd3990x:~$ pgbench -n -T 60 -j 64 -c 64 -f bench.sql -P 10 postgres > Master (6d41dd045): > tps = 992586.991045 (without initial connection time) > tps = 987964.990483 (without initial connection time) > tps = 994309.670918 (without initial connection time) > > Master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch > tps = 956022.557626 (without initial connection time) > tps = 963043.352823 (without initial connection time) > tps = 968582.600100 (without initial connection time) > > This puts the patched version about 3% slower. I'm not sure how much > of that is changes in the binary and noise and how much is the > needless hashtable build done for eval_const_expressions(). > > I wondered if we should make it the query planner's job of deciding if > the ScalarArrayOpExpr should be hashed or not. I ended up with the > attached rough-cut patch that introduces HashedScalarArrayOpExpr and > has the query planner decide if it's going to replace > ScalarArrayOpExpr with these HashedScalarArrayOpExpr during > preprocess_expression(). I do think that we might want to consider > being a bit selective about when we do these replacements. It seems > likely that we'd want to do this for EXPRKIND_QUAL and maybe > EXPRKIND_TARGET, but I imagine that converting ScalarArrayOpExpr to > HashedScalarArrayOpExpr for EXPRKIND_VALUES would be a waste of time > since those will just be executed once. In theory we might want to cost them differently as well, though I'm slightly hesitant to do so at this point to avoid causing plan changes (I'm not sure how we would balance that concern with the potential that the best plan isn't chosen). 
> I tried the same above test with the > v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch plus the > attached rough-cut patch and got: > > master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch > + v5-0002-Rough-cut-patch-for-HashedScalarArrayOpExpr.patch > tps = 1167969.983173 (without initial connection time) > tps = 1199636.793314 (without initial connection time) > tps = 1190690.939963 (without initial connection time) > > I can't really explain why this became faster. I was expecting it just > to reduce that slowdown of the v4 patch a little. I don't really see > any reason why it would become faster. It's almost 20% faster which > seems like too much to just be fluctuations in code alignment in the > binary. I'm not at a place where I can do good perf testing right now (just on my laptop for the moment), unfortunately, so I can't confirm one way or the other. > The attached patch is still missing the required changes to > llvmjit_expr.c. I think that was also missing from the original patch > too, however. Ah, I didn't realize that needed to be changed as well. I'll take a look at that. > Also, I added HashedScalarArrayOpExpr to plannodes.h. All other Expr > type nodes are in primnodes.h. However, I put HashedScalarArrayOpExpr > in plannodes.h because the parser does not generate this and it's not > going to be stored in the catalogue files anywhere. I'm not so sure > inventing a new Expr type node that only can be generated by the > planner is a good thing to do. I don't know what the positives and negatives are of this. > Anyway, wondering what you think of the idea of allowing the planner > to choose if it's going to hash or not? In general I think it's very reasonable. I kinda wonder if HashedScalarArrayOpExpr should have the ScalarArrayOpExp inlined instead of maintaining a pointer, but it's not a big deal to me either way. It certainly adds additional code, but probably also makes the execExpr code clearer. > It might also be good if someone else can check if they can get a bit > more stable performance results from benchmarking the patches. > > (Also attached your v4 patch again just so anyone following along at > home does not need to hunt around for the correct set of patches to > apply to test this) A few other comments: - It looks like several of the "result is always InvalidOid" changes should get committed separately (and now)? - Two comment tweaks: + * 1. The 2nd argument of the array does not contain any Vars, Params or s/array/array op/ + * worthwhile using the hashed version of ScalarArrayOpExprs rather than s/ScalarArrayOpExprs/ScalarArrayOpExpr/ - Using op_hashjoinable is an improvement over my initial patch. - I like the name change to put HASHED/Hashed first. Thanks, James
On 4/6/21 5:58 AM, David Rowley wrote: > On Sat, 20 Mar 2021 at 09:41, James Coleman <jtc331@gmail.com> wrote: >> I've attached a cleaned up patch. Last CF it was listed in is >> https://commitfest.postgresql.org/29/2542/ -- what's the appropriate >> step to take here given it's an already existing patch, but not yet >> moved into recent CFs? > > I had a look at this patch. I like the idea of using a simplehash.h > hash table to hash the constant values so that repeat lookups can be > performed much more quickly, however, I'm a bit concerned that there > are quite a few places in the code where we often just execute a > ScalarArrayOpExpr once and I'm a bit worried that we'll slow down > expression evaluation of those cases. > > The two cases that I have in mind are: > > 1. eval_const_expressions() where we use the executor to evaluate the > ScalarArrayOpExpr to see if the result is Const. > 2. CHECK constraints with IN clauses and single-row INSERTs. > > I tried to benchmark both of these but I'm struggling to get stable > enough performance for #2, even with fsync=off. Sometimes I'm getting > results 2.5x slower than other runs. > > For benchmarking #1 I'm also not too sure I'm getting stable enough > results for them to mean anything. > > I was running: > > create table a (a int); > > bench.sql: explain select * from a where a > in(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16); > > drowley@amd3990x:~$ pgbench -n -T 60 -j 64 -c 64 -f bench.sql -P 10 postgres > Master (6d41dd045): > tps = 992586.991045 (without initial connection time) > tps = 987964.990483 (without initial connection time) > tps = 994309.670918 (without initial connection time) > > Master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch > tps = 956022.557626 (without initial connection time) > tps = 963043.352823 (without initial connection time) > tps = 968582.600100 (without initial connection time) > > This puts the patched version about 3% slower. I'm not sure how much > of that is changes in the binary and noise and how much is the > needless hashtable build done for eval_const_expressions(). > > I wondered if we should make it the query planner's job of deciding if > the ScalarArrayOpExpr should be hashed or not. I ended up with the > attached rough-cut patch that introduces HashedScalarArrayOpExpr and > has the query planner decide if it's going to replace > ScalarArrayOpExpr with these HashedScalarArrayOpExpr during > preprocess_expression(). I do think that we might want to consider > being a bit selective about when we do these replacements. It seems > likely that we'd want to do this for EXPRKIND_QUAL and maybe > EXPRKIND_TARGET, but I imagine that converting ScalarArrayOpExpr to > HashedScalarArrayOpExpr for EXPRKIND_VALUES would be a waste of time > since those will just be executed once. > > I tried the same above test with the > v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch plus the > attached rough-cut patch and got: > > master + v4-0001-Hash-lookup-const-arrays-in-OR-d-ScalarArrayOps.patch > + v5-0002-Rough-cut-patch-for-HashedScalarArrayOpExpr.patch > tps = 1167969.983173 (without initial connection time) > tps = 1199636.793314 (without initial connection time) > tps = 1190690.939963 (without initial connection time) > > I can't really explain why this became faster. I was expecting it just > to reduce that slowdown of the v4 patch a little. I don't really see > any reason why it would become faster. 
It's almost 20% faster which > seems like too much to just be fluctuations in code alignment in the > binary. > Interesting. I tried this on the "small" machine I use for benchmarking, with the same SQL script you used, and also with IN() containing 10 and 100 values - so less/more than your script, which used 16 values. I only ran that with a single client, the machine only has 4 cores and this should not be related to concurrency, so 1 client seems fine. The average of 10 runs, 15 seconds each look like this: simple prepared 10/s 10/p 100/s 100/p ------------------------------------------------------------- master 21847 59476 23343 59380 11757 56488 v4 21546 57757 22864 57704 11572 57350 v4+v5 23374 56089 24410 56140 14765 55302 The first two columns are your bench.sql, with -M simple or prepared. The other columns are 10 or 100 values, /s is simple, /p is prepared. Compared to master: simple prepared 10/s 10/p 100/s 100/p ------------------------------------------------------------- v4 98.62% 97.11% 97.95% 97.18% 98.43% 101.52% v4+v5 106.99% 94.31% 104.57% 94.54% 125.59% 97.90% That seems to mostly match your observation - there's a small performance hit (~2%), although that might be due to changes in the layout of the binary. And v4+v5 improves that a bit (even compared to master), although I don't see the same 20% speedup. I see +25% improvement, but only with 100 values. It's a bit strange that in prepared mode, the v5 actually hurts the performance a bit. That being said, this is a pretty extreme test case. I'm pretty sure that once the table is not empty, the results will probably show a clear improvement. I'll collect some of those results. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
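The exact scripts for these runs aren't included above, so the details below are assumptions, but the described setup presumably boils down to something like this for the 1000-row case (the 0/1/5-row variants just change the INSERT):

CREATE TABLE t_int  (a int);
CREATE TABLE t_text (a text);
INSERT INTO t_int  SELECT g FROM generate_series(1, 1000) g;
INSERT INTO t_text SELECT g::text FROM generate_series(1, 1000) g;

-- benchmarked statement, 16-value integer variant:
SELECT * FROM t_int
WHERE a IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);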
On Thu, 8 Apr 2021 at 05:54, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > I only ran that with a single client, the machine only has 4 cores and > this should not be related to concurrency, so 1 client seems fine. The > average of 10 runs, 15 seconds each look like this: Thanks for running these tests. The reason I added so much concurrency was that this AMD machine has some weird behaviour in regards to power management. Every bios update I seem to get changes the power management but it still is very unstable despite me having it on the most optimal settings in the bios. Working it a bit harder seems to help make it realise that there might be some urgency. > simple prepared 10/s 10/p 100/s 100/p > ------------------------------------------------------------- > master 21847 59476 23343 59380 11757 56488 > v4 21546 57757 22864 57704 11572 57350 > v4+v5 23374 56089 24410 56140 14765 55302 > > The first two columns are your bench.sql, with -M simple or prepared. > The other columns are 10 or 100 values, /s is simple, /p is prepared. > > Compared to master: > > simple prepared 10/s 10/p 100/s 100/p > ------------------------------------------------------------- > v4 98.62% 97.11% 97.95% 97.18% 98.43% 101.52% > v4+v5 106.99% 94.31% 104.57% 94.54% 125.59% 97.90% > > That seems to mostly match your observation - there's a small > performance hit (~2%), although that might be due to changes in the > layout of the binary. And v4+v5 improves that a bit (even compared to > master), although I don't see the same 20% speedup. I've spent more time hacking at this patch. I had a bit of a change of heart earlier about having this new HashedScalarArrayOpExpr node type. There were more places that I imagined that I needed to add handling for it. For example, partprune.c needed to know about it to allow partition pruning on them. While supporting that is just a few lines to make a recursive call passing in the underlying ScalarArrayOpExpr, I just didn't like the idea. Instead, I think it'll be better just to add a new field to ScalarArrayOpExpr and have the planner set that to tell the executor that it should use a hash table to perform the lookups rather than a linear search. This can just be the hash function oid, which also saves the executor from having to look that up. After quite a bit of hacking, I've ended up with the attached. I added the required JIT code to teach the jit code about EEOP_HASHED_SCALARARRAYOP. I also wrote the missing regression test for non-strict equality ops and moved the declaration for the simplehash.h code into execExprInterp.c and forward declared ScalarArrayOpExprHashTable in exprExpr.h. I also rewrote a large number of comments and fixed a few things like missing permission checks for the hash function. I've not done any further performance tests yet but will start those now. David
Attachment
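The "hash function oid" being recorded is the hash opclass support function for the element type (the planner-side lookup referred to later in this thread as get_op_hash_functions()). The same information is visible from the catalogs; a query like the following, with int4 chosen as an example and no patch-specific objects involved, shows which function that is:

SELECT p.amproc AS hash_support_func, p.amproc::oid AS func_oid
FROM pg_amproc p
     JOIN pg_opfamily f ON f.oid = p.amprocfamily
     JOIN pg_am a ON a.oid = f.opfmethod
WHERE a.amname = 'hash'
  AND p.amproclefttype = 'int4'::regtype
  AND p.amprocrighttype = 'int4'::regtype
  AND p.amprocnum = 1;   -- support proc 1 is the standard 32-bit hash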
On Thu, 8 Apr 2021 at 18:50, David Rowley <dgrowleyml@gmail.com> wrote: > I've not done any further performance tests yet but will start those now. I ran a set of tests on this: select * from a where a in( < 1 to 10 > ); and select * from a where a in( < 1 to 100 > ); the table "a" is just an empty table with a single int column. I ran "pgbench -T 15 -c 16 -t 16" ten times each and the resulting tps is averaged over the 10 runs. With 10 items in the IN clause: master: 99887.9098314 tps patched: 103235.7616416 tps (3.35% faster) With 100 items: master: 62442.4838792 tps patched:62275.4955754 tps (0.27% slower) These tests are just designed to test the overhead of the additional planning and expression initialisation. Testing the actual performance of the patch vs master with large IN lists shows the expected significant speedups. These results show that there's not much in the way of a measurable slowdown in planning or executor startup from the additional code which decides if we should hash the ScalarArrayOpExpr. I think the changes in the patch are fairly isolated and the test coverage is now pretty good. I'm planning on looking at the patch again now and will consider pushing it for PG14. David
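For anyone reproducing these overhead tests, the longer IN lists don't have to be typed out by hand. A statement like the one below generates the benchmark query text (it assumes the empty table "a" from the test above; in psql the output can be executed directly with \gexec):

SELECT format('SELECT * FROM a WHERE a IN (%s);',
              string_agg(g::text, ','))
FROM generate_series(1, 100) AS g;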
On Thu, 8 Apr 2021 at 22:54, David Rowley <dgrowleyml@gmail.com> wrote: > > I think the changes in the patch are fairly isolated and the test > coverage is now pretty good. I'm planning on looking at the patch > again now and will consider pushing it for PG14. I pushed this with some minor cleanup from the v6 patch I posted earlier. David
On Thu, Apr 8, 2021 at 8:01 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 8 Apr 2021 at 22:54, David Rowley <dgrowleyml@gmail.com> wrote: > > > > I think the changes in the patch are fairly isolated and the test > > coverage is now pretty good. I'm planning on looking at the patch > > again now and will consider pushing it for PG14. > > I push this with some minor cleanup from the v6 patch I posted earlier. > > David Thank you! I assume proper procedure for the CF entry is to move it into the current CF and then mark it as committed, however I don't know how (or don't have permissions?) to move it into the current CF. How does one go about doing that? Here's the entry: https://commitfest.postgresql.org/29/2542/ Thanks, James
On 2021-Apr-08, James Coleman wrote: > I assume proper procedure for the CF entry is to move it into the > current CF and then mark it as committed, however I don't know how (or > don't have permissions?) to move it into the current CF. How does one > go about doing that? > > Here's the entry: https://commitfest.postgresql.org/29/2542/ Done, thanks. -- Álvaro Herrera 39°49'30"S 73°17'W
On Thu, Apr 8, 2021 at 1:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2021-Apr-08, James Coleman wrote: > > > I assume proper procedure for the CF entry is to move it into the > > current CF and then mark it as committed, however I don't know how (or > > don't have permissions?) to move it into the current CF. How does one > > go about doing that? > > > > Here's the entry: https://commitfest.postgresql.org/29/2542/ > > Done, thanks. > > -- > Álvaro Herrera 39°49'30"S 73°17'W Thanks. Is that something I should be able to do myself (should I be asking someone for getting privileges in the app to do so)? I'm not sure what the project policy is on that. James
On 2021-Apr-08, James Coleman wrote: > On Thu, Apr 8, 2021 at 1:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2021-Apr-08, James Coleman wrote: > > > > > I assume proper procedure for the CF entry is to move it into the > > > current CF and then mark it as committed, however I don't know how (or > > > don't have permissions?) to move it into the current CF. How does one > > > go about doing that? > > > > > > Here's the entry: https://commitfest.postgresql.org/29/2542/ > > > > Done, thanks. > Thanks. Is that something I should be able to do myself No, sorry. -- Álvaro Herrera Valdivia, Chile "La vida es para el que se aventura"
On 4/8/21 2:00 PM, David Rowley wrote: > On Thu, 8 Apr 2021 at 22:54, David Rowley <dgrowleyml@gmail.com> wrote: >> >> I think the changes in the patch are fairly isolated and the test >> coverage is now pretty good. I'm planning on looking at the patch >> again now and will consider pushing it for PG14. > > I push this with some minor cleanup from the v6 patch I posted earlier. > I ran the same set of benchmarks on the v6, which I think should be mostly identical to what was committed. I extended the test a bit to test table with 0, 1, 5 and 1000 rows, and also with int and text values, to see how it works with more expensive comparators. I built binaries with gcc 9.2.0 and clang 11.0.0, the full results are attached. There's a bit of difference between gcc and clang, but the general behavior is about the same, so I'll only present gcc results to keep this simple. I'll only throughput comparison to master, so >1.0 means good, <1.0 means bad. If you're interested in actual tps, see the full results. For the v5 patch (actually v4-0001 + v5-0002) and v6, the results are: integer column / v5 =================== rows 10/p 100/p 16/p 10/s 100/s 16/s ----------------------------------------------------------- 0 97% 97% 97% 107% 126% 108% 1 95% 82% 94% 108% 132% 110% 5 95% 83% 95% 108% 132% 110% 1000 129% 481% 171% 131% 382% 165% integer column / v6 =================== rows 10/p 100/p 16/p 10/s 100/s 16/s ----------------------------------------------------------- 0 97% 97% 97% 98% 98% 98% 1 96% 84% 95% 97% 97% 98% 5 97% 85% 96% 98% 97% 97% 1000 129% 489% 172% 128% 330% 162% text column / v5 ================ rows 10/p 100/p 16/p 10/s 100/s 16/s ----------------------------------------------------------- 0 100% 100% 100% 106% 119% 108% 1 96% 81% 95% 107% 120% 109% 5 97% 82% 96% 107% 121% 109% 1000 291% 1622% 402% 255% 1092% 337% text column / v6 ================ rows 10/p 100/p 16/p 10/s 100/s 16/s ----------------------------------------------------------- 0 101% 101% 101% 98% 99% 99% 1 98% 82% 96% 98% 96% 97% 5 100% 84% 98% 98% 96% 98% 1000 297% 1645% 408% 255% 1000% 336% Overall, the behavior for integer and text columns is the same, for both patches. There's a couple interesting observations: 1) For the "simple" query mode, v5 helped quite a bit (20-30% speedup), but v6 does not seem to help at all - it's either same or slower than unpatched master. I wonder why is that, and if we could get some of the speedup with v6? At first I thought that maybe v5 is not building the hash table in cases where v6 does, but that shouldn't be faster than master. 2) For the "prepared" mode, there's a clear performance hit the longer the array is (for both v5 and v6). For 100 elements it's about 15%, which is not great. I think the reason is fairly simple - building the hash table is not free, and with few rows it's not worth it - it'd be faster to just search the array directly. Unfortunately, the logic that makes the decision to switch to hashing only looks at the array length only, and ignores the number of rows entirely. So I think if we want to address this, convert_saop_to_hashed_saop needs to compare has_build_cost + nrows * hash_lookup_cost and nrows * linear_lookup_cost to make reasonable decision. I was thinking that maybe we can ignore this, because people probably have much larger tables in practice. But I'm not sure that's really true, because there may be other quals and it's possible the preceding ones are quite selective, filtering most of the rows. 
I'm not sure how much of the necessary information we have available in convert_saop_to_hashed_saop_walker, though :-( I suppose we know the number of input rows for that plan node, not sure about selectivity of the other quals, though. It's also a bit strange that we get speedup for "simple" protocol, while for "prepared" it gets slower. That seems counter-intuitive, because why should we see opposite outcomes in those cases? I'd assume that we'll see either speedup or slowdown in both cases, with the relative change being more significant in the "prepared" mode. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Fri, 9 Apr 2021 at 09:32, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > I ran the same set of benchmarks on the v6, which I think should be > mostly identical to what was committed. I extended the test a bit to > test table with 0, 1, 5 and 1000 rows, and also with int and text > values, to see how it works with more expensive comparators. > > I built binaries with gcc 9.2.0 and clang 11.0.0, the full results are > attached. There's a bit of difference between gcc and clang, but the > general behavior is about the same, so I'll only present gcc results to > keep this simple. I'll only throughput comparison to master, so >1.0 > means good, <1.0 means bad. If you're interested in actual tps, see the > full results. > > For the v5 patch (actually v4-0001 + v5-0002) and v6, the results are: > > integer column / v5 > =================== > > rows 10/p 100/p 16/p 10/s 100/s 16/s > ----------------------------------------------------------- > 0 97% 97% 97% 107% 126% 108% > 1 95% 82% 94% 108% 132% 110% > 5 95% 83% 95% 108% 132% 110% > 1000 129% 481% 171% 131% 382% 165% I think we should likely ignore the v5 patch now. The reason is that it was pretty much unfinished and there were many places that I'd not yet added support for the HashedScalarArrayOpExpr node type yet. This could cause these nodes to be skipped during node mutations or node walking which would certainly make planning faster, just not in a way that's correct. > integer column / v6 > =================== > > rows 10/p 100/p 16/p 10/s 100/s 16/s > ----------------------------------------------------------- > 0 97% 97% 97% 98% 98% 98% > 1 96% 84% 95% 97% 97% 98% > 5 97% 85% 96% 98% 97% 97% > 1000 129% 489% 172% 128% 330% 162% This is a really informative set of results. I can only guess that the slowdown of the 100/prepared query is down to building the hash table. I think that because the 0 rows test does not show the slowdown and we only build the table when evaluating for the first time. There's a slightly larger hit on 1 row vs 5 rows, which makes sense since the rewards of the hash lookup start paying off more with more rows. Looking at your tps numbers, I think I can see why we get the drop in performance with prepared statements but not simple statements. This seems to just be down to the fact that the planning time dominates in the simple statement case. For example, the "1 row" test for 100/s for v6 is 10023.3 tps, whereas the 100/p result is 44093.8 tps. With master, prepared gets 52400.0 tps. So we could say the hash table build costs us 8306.2 tps, or 3.59 microseconds per execution, per: postgres=# select 1000000 / 52400.0 - 1000000 / 44093.8; ?column? --------------------- -3.5949559161508538 (1 row) If we look at the tps for the simple query version of the same test. Master did 10309.6 tps, v6 did 10023.3 tps. If we apply that 3.59 microsecond slowdown to master's tps, then we get pretty close to within 1% of the v6 tps: postgres=# select 1000000 / (1000000 / 10309.6 + 3.59); ?column? ----------------------- 9941.6451581291294165 (1 row) > text column / v6 > ================ > > rows 10/p 100/p 16/p 10/s 100/s 16/s > ----------------------------------------------------------- > 0 101% 101% 101% 98% 99% 99% > 1 98% 82% 96% 98% 96% 97% > 5 100% 84% 98% 98% 96% 98% > 1000 297% 1645% 408% 255% 1000% 336% > > > Overall, the behavior for integer and text columns is the same, for both > patches. 
There's a couple interesting observations: > > 1) For the "simple" query mode, v5 helped quite a bit (20-30% speedup), > but v6 does not seem to help at all - it's either same or slower than > unpatched master. I think that's related to the fact that I didn't finish adding HashedScalarArrayOpExpr processing to all places that needed it. > I wonder why is that, and if we could get some of the speedup with v6? > At first I thought that maybe v5 is not building the hash table in cases > where v6 does, but that shouldn't be faster than master. I don't think v5 and v6 really do anything much differently in the executor. The only difference is really during ExecInitExprRec() when we initialize the expression. With v5 we had a case T_HashedScalarArrayOpExpr: to handle the new node type, but in v6 we have if (OidIsValid(opexpr->hashfuncid)). Oh, wait. I did add the missing permissions check on the hash function, so that will account for something. As far as I can see, that's required. > 2) For the "prepared" mode, there's a clear performance hit the longer > the array is (for both v5 and v6). For 100 elements it's about 15%, > which is not great. > > I think the reason is fairly simple - building the hash table is not > free, and with few rows it's not worth it - it'd be faster to just > search the array directly. Unfortunately, the logic that makes the > decision to switch to hashing only looks at the array length only, and > ignores the number of rows entirely. So I think if we want to address > this, convert_saop_to_hashed_saop needs to compare > > has_build_cost + nrows * hash_lookup_cost > > and > > nrows * linear_lookup_cost > > to make reasonable decision. I thought about that but I was really worried that the performance of ScalarArrayOpExpr would just become too annoyingly unpredictable. You know fairly well that we can often get massive row underestimations in the planner. (I guess you worked on ext stats mainly because of that) The problem I want to avoid is the ones where we get a big row underestimation but don't really get a bad plan as a result. For example a query like: SELECT * FROM big_table WHERE col1 = ... AND col2 = ... AND col3 = ... AND col4 IN( ... big list of values ...); If col1, col2 and col3 are highly correlated but individually fairly selective, then we could massively underestimate how many rows the IN clause will see (assuming no rearranging was done here). I'm not completely opposed to the idea of taking the estimated rows into account during planning. It might just mean having to move the convert_saop_to_hashed_saop() call somewhere else. I imagine that's fairly trivial to do. I just have concerns about doing so. > I was thinking that maybe we can ignore this, because people probably > have much larger tables in practice. But I'm not sure that's really > true, because there may be other quals and it's possible the preceding > ones are quite selective, filtering most of the rows. > > I'm not sure how much of the necessary information we have available in > convert_saop_to_hashed_saop_walker, though :-( I suppose we know the > number of input rows for that plan node, not sure about selectivity of > the other quals, though. > > It's also a bit strange that we get speedup for "simple" protocol, while > for "prepared" it gets slower. That seems counter-intuitive, because why > should we see opposite outcomes in those cases? I'd assume that we'll > see either speedup or slowdown in both cases, with the relative change > being more significant in the "prepared" mode. 
I hope my theory above about the planner time dominating the overall time shows why that is. Another way to look at these result is by taking your tps value and calculating how long it takes to do N number of transactions then totalling up the time it takes. If I do that to calculate how long it took each test to perform 1000 transactions and sum each test grouping by mode and version with rollup on mode, I get: time values are in seconds: pg-master 28.07 prepared 11.28 simple 16.79 pg-v6 15.86 prepared 5.23 simple 10.63 So, the overall result when applying the total time is 177%. David
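The totals above ("how long it took each test to perform 1000 transactions", summed and grouped by version and mode with a rollup on mode) can be expressed directly in SQL over the raw numbers; the bench_results table and its columns are invented here purely for the illustration:

-- assumes bench_results(version text, mode text, tps numeric)
SELECT version, mode,
       round(sum(1000.0 / tps), 2) AS seconds_per_1000_xacts
FROM bench_results
GROUP BY version, ROLLUP (mode)
ORDER BY version, mode;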
On 4/9/21 1:21 AM, David Rowley wrote: > On Fri, 9 Apr 2021 at 09:32, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: >> >> I ran the same set of benchmarks on the v6, which I think should be >> mostly identical to what was committed. I extended the test a bit to >> test table with 0, 1, 5 and 1000 rows, and also with int and text >> values, to see how it works with more expensive comparators. >> >> I built binaries with gcc 9.2.0 and clang 11.0.0, the full results are >> attached. There's a bit of difference between gcc and clang, but the >> general behavior is about the same, so I'll only present gcc results to >> keep this simple. I'll only throughput comparison to master, so >1.0 >> means good, <1.0 means bad. If you're interested in actual tps, see the >> full results. >> >> For the v5 patch (actually v4-0001 + v5-0002) and v6, the results are: >> >> integer column / v5 >> =================== >> >> rows 10/p 100/p 16/p 10/s 100/s 16/s >> ----------------------------------------------------------- >> 0 97% 97% 97% 107% 126% 108% >> 1 95% 82% 94% 108% 132% 110% >> 5 95% 83% 95% 108% 132% 110% >> 1000 129% 481% 171% 131% 382% 165% > > I think we should likely ignore the v5 patch now. The reason is that > it was pretty much unfinished and there were many places that I'd not > yet added support for the HashedScalarArrayOpExpr node type yet. This > could cause these nodes to be skipped during node mutations or node > walking which would certainly make planning faster, just not in a way > that's correct. > >> integer column / v6 >> =================== >> >> rows 10/p 100/p 16/p 10/s 100/s 16/s >> ----------------------------------------------------------- >> 0 97% 97% 97% 98% 98% 98% >> 1 96% 84% 95% 97% 97% 98% >> 5 97% 85% 96% 98% 97% 97% >> 1000 129% 489% 172% 128% 330% 162% > > This is a really informative set of results. I can only guess that the > slowdown of the 100/prepared query is down to building the hash table. > I think that because the 0 rows test does not show the slowdown and we > only build the table when evaluating for the first time. There's a > slightly larger hit on 1 row vs 5 rows, which makes sense since the > rewards of the hash lookup start paying off more with more rows. > Agreed. I think that's essentially what I wrote too. > Looking at your tps numbers, I think I can see why we get the drop in > performance with prepared statements but not simple statements. This > seems to just be down to the fact that the planning time dominates in > the simple statement case. For example, the "1 row" test for 100/s > for v6 is 10023.3 tps, whereas the 100/p result is 44093.8 tps. With > master, prepared gets 52400.0 tps. So we could say the hash table > build costs us 8306.2 tps, or 3.59 microseconds per execution, per: > > postgres=# select 1000000 / 52400.0 - 1000000 / 44093.8; > ?column? > --------------------- > -3.5949559161508538 > (1 row) > > If we look at the tps for the simple query version of the same test. > Master did 10309.6 tps, v6 did 10023.3 tps. If we apply that 3.59 > microsecond slowdown to master's tps, then we get pretty close to > within 1% of the v6 tps: > > postgres=# select 1000000 / (1000000 / 10309.6 + 3.59); > ?column? > ----------------------- > 9941.6451581291294165 > (1 row) > Right, that makes perfect sense. 
>> text column / v6 >> ================ >> >> rows 10/p 100/p 16/p 10/s 100/s 16/s >> ----------------------------------------------------------- >> 0 101% 101% 101% 98% 99% 99% >> 1 98% 82% 96% 98% 96% 97% >> 5 100% 84% 98% 98% 96% 98% >> 1000 297% 1645% 408% 255% 1000% 336% >> >> >> Overall, the behavior for integer and text columns is the same, for both >> patches. There's a couple interesting observations: >> >> 1) For the "simple" query mode, v5 helped quite a bit (20-30% speedup), >> but v6 does not seem to help at all - it's either same or slower than >> unpatched master. > > I think that's related to the fact that I didn't finish adding > HashedScalarArrayOpExpr processing to all places that needed it. > Not sure I understand. Shouldn't that effectively default to the unpatched behavior? How could that result in code that is *faster* than master? >> I wonder why is that, and if we could get some of the speedup with v6? >> At first I thought that maybe v5 is not building the hash table in cases >> where v6 does, but that shouldn't be faster than master. > > I don't think v5 and v6 really do anything much differently in the > executor. The only difference is really during ExecInitExprRec() when > we initialize the expression. With v5 we had a case > T_HashedScalarArrayOpExpr: to handle the new node type, but in v6 we > have if (OidIsValid(opexpr->hashfuncid)). Oh, wait. I did add the > missing permissions check on the hash function, so that will account > for something. As far as I can see, that's required. > I think the check is required, but I don't think that should be so very expensive - it's likely one of many other permission checks. But I think the puzzle is not so much about v5 vs v6, but more about v5 vs. master. I still don't understand how v5 managed to be faster than master, but maybe I'm missing something. >> 2) For the "prepared" mode, there's a clear performance hit the longer >> the array is (for both v5 and v6). For 100 elements it's about 15%, >> which is not great. >> >> I think the reason is fairly simple - building the hash table is not >> free, and with few rows it's not worth it - it'd be faster to just >> search the array directly. Unfortunately, the logic that makes the >> decision to switch to hashing only looks at the array length only, and >> ignores the number of rows entirely. So I think if we want to address >> this, convert_saop_to_hashed_saop needs to compare >> >> has_build_cost + nrows * hash_lookup_cost >> >> and >> >> nrows * linear_lookup_cost >> >> to make reasonable decision. > > I thought about that but I was really worried that the performance of > ScalarArrayOpExpr would just become too annoyingly unpredictable. You > know fairly well that we can often get massive row underestimations in > the planner. (I guess you worked on ext stats mainly because of that) > The problem I want to avoid is the ones where we get a big row > underestimation but don't really get a bad plan as a result. For > example a query like: > > SELECT * FROM big_table WHERE col1 = ... AND col2 = ... AND col3 = ... > AND col4 IN( ... big list of values ...); > > If col1, col2 and col3 are highly correlated but individually fairly > selective, then we could massively underestimate how many rows the IN > clause will see (assuming no rearranging was done here). > > I'm not completely opposed to the idea of taking the estimated rows > into account during planning. It might just mean having to move the > convert_saop_to_hashed_saop() call somewhere else.
I imagine that's > fairly trivial to do. I just have concerns about doing so. > Hmm, yeah. I understand your concerns, but the way it works now it kinda penalizes correct estimates. Imagine you have workload with simple OLTP queries, the queries always hit only a couple rows - but we make it run 15% slower because we're afraid there might be under-estimate. It's sensible to make the planning resilient to under-estimates, but I'm not sure just ignoring the cardinality estimate with the justification it might be wrong is good strategy. In a way, all the other planning decisions assume it's correct, so why should this be any different? We're not using seqscan exclusively just because the selectivity might be wrong, making index scan ineffective, for example. Maybe the right solution is to rely on the estimates, but then also enable the hashing if we significantly cross the threshold during execution. So for example we might get estimate 10 rows, and calculate that the hashing would start winning at 100 rows, so we start without hashing. But then at execution if we get 200 rows, we build the hash table and start using it. Yes, there's a risk that there are only 200 rows, and the time spent building the hash table is wasted. But it's much more likely that there are many more rows. >> I was thinking that maybe we can ignore this, because people probably >> have much larger tables in practice. But I'm not sure that's really >> true, because there may be other quals and it's possible the preceding >> ones are quite selective, filtering most of the rows. >> >> I'm not sure how much of the necessary information we have available in >> convert_saop_to_hashed_saop_walker, though :-( I suppose we know the >> number of input rows for that plan node, not sure about selectivity of >> the other quals, though. >> >> It's also a bit strange that we get speedup for "simple" protocol, while >> for "prepared" it gets slower. That seems counter-intuitive, because why >> should we see opposite outcomes in those cases? I'd assume that we'll >> see either speedup or slowdown in both cases, with the relative change >> being more significant in the "prepared" mode. > > I hope my theory above about the planner time dominating the overall > time shows why that is. > I think it does explain the v6 behavior, where prepared gets slower while simple is about the same (with low row counts). But I'm not sure I understand the v5, where simple got faster than master. > Another way to look at these result is by taking your tps value and > calculating how long it takes to do N number of transactions then > totalling up the time it takes. If I do that to calculate how long it > took each test to perform 1000 transactions and sum each test grouping > by mode and version with rollup on mode, I get: > > time values are in seconds: > > pg-master 28.07 > prepared 11.28 > simple 16.79 > > pg-v6 15.86 > prepared 5.23 > simple 10.63 > > So, the overall result when applying the total time is 177%. > Not sure how you got those numbers, or how it explains the results. E.g. on v5, the results for 100 int values / 1 row look like this: 100/p 100/s master 52400 10310 v5 43446 13610 I understand why the prepared mode got slower. I don't understand how the simple mode got faster. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
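As an aside, the correlated-columns under-estimation in David's big_table example is the sort of thing the extended statistics mentioned above are meant to mitigate; it doesn't settle whether the hashing decision should trust estimates, but for completeness (column and table names are taken from that example, the statistics name is made up):

CREATE STATISTICS big_table_cols_dep (dependencies)
    ON col1, col2, col3 FROM big_table;
ANALYZE big_table;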
On Sat, 10 Apr 2021 at 10:32, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > But I think the puzzle is not so much about v5 vs v6, but more about v5 > vs. master. I still don't understand how v5 managed to be faster than > master, but maybe I'm missing something. Well, v5 wrapped ScalarArrayOpExpr inside HashedScalarArrayOpExpr, but didn't add cases for HashedScalarArrayOpExpr in all locations where it should have. For example, fix_expr_common() has a case for ScalarArrayOpExpr, but if we gave it a HashedScalarArrayOpExpr it would have very little work to do. I've not gone and proved that's the exact reason why the planner became faster, but I really don't see any other reason. > Maybe the right solution is to rely on the estimates, but then also > enable the hashing if we significantly cross the threshold during > execution. So for example we might get estimate 10 rows, and calculate > that the hashing would start winning at 100 rows, so we start without > hashing. But then at execution if we get 200 rows, we build the hash > table and start using it. To do that, we'd need to store the number of evaluations of the function somewhere. I'm not really sure that would be a good idea as I imagine we'd need to store that in ExprEvalStep. I imagine if that was a good idea then we'd have done the same for JIT. > On 4/9/21 1:21 AM, David Rowley wrote: > > time values are in seconds: > > > > pg-master 28.07 > > prepared 11.28 > > simple 16.79 > > > > pg-v6 15.86 > > prepared 5.23 > > simple 10.63 > > > > So, the overall result when applying the total time is 177%. > > > > Not sure how you got those numbers, or how it explains the results. I got it by doing "1 / tps * 1000" to get the time it would take to execute 1000 transactions of each of your tests. I then grouped by patch, mode and took the sum of the calculated number. My point was that overall the patch is significantly faster. I was trying to highlight that the 0 and 1 row test take up very little time and the overhead of building the hash table is only showing up because the query executes so quickly. FWIW, I think executing a large IN clause on a table that has 0 rows is likely not that interesting a case to optimise for. That's not the same as a query that just returns 0 rows due to filtering out hundreds or thousands of rows during execution. The overhead of building the hash table is not going to show up very easily in that sort of case. > I understand why the prepared mode got slower. I don't understand how > the simple mode got faster. I very much imagine v5 was faster at planning due to the unfinished nature of the patch. I'd not added support for HashedScalarArrayOpExpr in all the places I should have. That would result in the planner skipping lots of work that it needs to do. The way I got it to work was to add it, then just add enough cases in the planner to handle HashedScalarArrayOpExpr so I didn't get any errors. I stopped after that just to show the idea. Lack of errors does not mean it was correct. At least setrefs.c was not properly handling HashedScalarArrayOpExpr. I really think it would be best if we just ignore the performance of v5. Looking at the performance of a patch that was incorrectly skipping a bunch of required work does not seem that fair. David
On 4/11/21 12:03 AM, David Rowley wrote: > On Sat, 10 Apr 2021 at 10:32, Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> But I think the puzzle is not so much about v5 vs v6, but more about v5 >> vs. master. I still don't understand how v5 managed to be faster than >> master, but maybe I'm missing something. > > Well, v5 wrapped ScalarArrayOpExpr inside HashedScalarArrayOpExpr, but > didn't add cases for HashedScalarArrayOpExpr in all locations where it > should have. For example, fix_expr_common() has a case for > ScalarArrayOpExpr, but if we gave it a HashedScalarArrayOpExpr it > would have very little work to do. > > I've not gone and proved that's the exact reason why the planner > became faster, but I really don't see any other reason. > >> Maybe the right solution is to rely on the estimates, but then also >> enable the hashing if we significantly cross the threshold during >> execution. So for example we might get estimate 10 rows, and calculate >> that the hashing would start winning at 100 rows, so we start without >> hashing. But then at execution if we get 200 rows, we build the hash >> table and start using it. > > To do that, we'd need to store the number of evaluations of the > function somewhere. I'm not really sure that would be a good idea as I > imagine we'd need to store that in ExprEvalStep. I imagine if that > was a good idea then we'd have done the same for JIT. > Sure, we'd need to track the number of lookups, but I'd imagine that's fairly cheap and we can stop once we switch to hash mode. I'm not sure "JIT does not do that" is really a proof it's a bad idea. My guess is it wasn't considered back then, and the current heuristics is the simplest possible. So maybe it's the other way and we should consider to do the same thing for JIT? FWIW if we look at what JIT does, it'd argue it supports the approach to trust the estimates. Because if we under-estimate stuff, the cost won't exceed the "jit_above_cost" threshold, and we won't use JIT. > >> On 4/9/21 1:21 AM, David Rowley wrote: >>> time values are in seconds: >>> >>> pg-master 28.07 >>> prepared 11.28 >>> simple 16.79 >>> >>> pg-v6 15.86 >>> prepared 5.23 >>> simple 10.63 >>> >>> So, the overall result when applying the total time is 177%. >>> >> >> Not sure how you got those numbers, or how it explains the results. > > I got it by doing "1 / tps * 1000" to get the time it would take to > execute 1000 transactions of each of your tests. I then grouped by > patch, mode and took the sum of the calculated number. My point was > that overall the patch is significantly faster. I was trying to > highlight that the 0 and 1 row test take up very little time and the > overhead of building the hash table is only showing up because the > query executes so quickly. > Ah, I see. TBH I don't think combining the results gives us a very meaningful value - those cases were quite arbitrary, but summing them together like this assumes the workload has about 25% of each. But if your workload is exclusively 0/1/5 rows it's going to be hit. > FWIW, I think executing a large IN clause on a table that has 0 rows > is likely not that interesting a case to optimise for. That's not the > same as a query that just returns 0 rows due to filtering out hundreds > or thousands of rows during execution. The overhead of building the > hash table is not going to show up very easily in that sort of case. > Yeah, it's probably true that queries with long IN lists are probably dealing with many input rows. 
And you're right we don't really care about how many rows are ultimately produced by the query (or even the step with the IN list) - if we spent a lot of time to filter the rows before applying the IN list, the time to initialize the hash table is probably just noise. I wonder what's the relationship between the length of the IN list and the minimum number of rows needed for the hash to start winning. >> I understand why the prepared mode got slower. I don't understand how >> the simple mode got faster. > > I very much imagine v5 was faster at planning due to the unfinished > nature of the patch. I'd not added support for HashedScalarArrayOpExpr > in all the places I should have. That would result in the planner > skipping lots of work that it needs to do. The way I got it to work > was to add it, then just add enough cases in the planner to handle > HashedScalarArrayOpExpr so I didn't get any errors. I stopped after > that just to show the idea. Lack of errors does not mean it was > correct. At least setrefs.c was not properly handling > HashedScalarArrayOpExpr. > > I really think it would be best if we just ignore the performance of > v5. Looking at the performance of a patch that was incorrectly > skipping a bunch of required work does not seem that fair. > Aha! I was assuming v5 was correct, but if that assumption is incorrect then the whole "v5 speedup" is just an illusion, and you're right we should simply ignore that. Thanks for the explanation! regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, 11 Apr 2021 at 10:38, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > I wonder what's the relationship between the length of the IN list and > the minimum number of rows needed for the hash to start winning. I made the attached spreadsheet which demonstrates the crossover point using the costs that I coded into cost_qual_eval_walker(). It basically shows, for large arrays, that there are fairly significant benefits to hashing with as few as 2 lookups, and that not hashing wins only when there is a single lookup. However, the cost model does not account for allocating memory for the hash table, which is far from free. You can adjust the number of items in the IN clause by changing the value in cell B1. The values in B2 and B3 are what I saw the planner set when I tested with both INT and TEXT types. David
Attachment
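The spreadsheet itself isn't reproduced here, but the crossover it demonstrates follows from a simple model: if building the hash table costs about n units for an n-element array, a hash probe about 1 unit, and a linear search about n/2 units per lookup, then hashing wins once the number of lookups exceeds n / (n/2 - 1), which approaches 2 as n grows, consistent with the observation above. The unit costs here are illustrative only; the planner's actual constants live in cost_qual_eval_walker(). A quick check of the ratio:

SELECT n, round(n / (n / 2.0 - 1), 2) AS breakeven_lookups
FROM (VALUES (10), (100), (1000)) AS t(n);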
On Fri, 9 Apr 2021 at 00:00, David Rowley <dgrowleyml@gmail.com> wrote: > I push this with some minor cleanup from the v6 patch I posted earlier. I realised when working on something unrelated last night that we can also do hash lookups for NOT IN too. We'd just need to check if the operator's negator operator is hashable. No new fields would need to be added to ScalarArrayOpExpr. We'd just set the hashfuncid to the correct value and then set the opfuncid to the negator function. In the executor, we'd know to check if the value is in the table or not in the table based on the useOr value. I'm not really sure whether lack of NOT IN support is going to be a source of bug reports for PG14 or not. If it was, then it might be worth doing something about that for PG14. Otherwise, we can just leave it for future work for PG15 and beyond. I personally don't have any strong feelings either way, but I'm leaning towards just writing a patch and thinking of pushing it sometime after we branch for PG15. I've included the RMT, just in case they want to voice an opinion on that. David
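The "negator is hashable" condition is visible in the catalogs. For the int4 <> operator, for example, the negator is = and it is marked hashable, which is what would allow the NOT IN form to use the hashed path:

SELECT o.oprname AS op, n.oprname AS negator, n.oprcanhash
FROM pg_operator o
     JOIN pg_operator n ON n.oid = o.oprnegate
WHERE o.oprname = '<>'
  AND o.oprleft = 'int4'::regtype
  AND o.oprright = 'int4'::regtype;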
David Rowley <dgrowleyml@gmail.com> writes: > I realised when working on something unrelated last night that we can > also do hash lookups for NOT IN too. ... and still get the behavior right for nulls? regards, tom lane
On Tue, 13 Apr 2021 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > I realised when working on something unrelated last night that we can > > also do hash lookups for NOT IN too. > > ... and still get the behavior right for nulls? Yeah, it will. There are already some special cases for NULLs in the IN version. Those would need to be adapted for NOT IN. David
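For reference, these are the NULL corner cases that any hashed path has to preserve; a quick sanity check written for illustration rather than taken from the regression tests:

    SELECT 1 IN (1, NULL)     AS in_match,       -- true: a match is found
           1 IN (2, NULL)     AS in_unknown,     -- null: no match, but a NULL is present
           1 NOT IN (2, 3)    AS not_in_true,    -- true: provably absent
           1 NOT IN (2, NULL) AS not_in_unknown; -- null: the NULL could have been 1

In particular, a hashed NOT IN cannot simply return true on a missed probe when the array contains a NULL, or when the probe value itself is NULL.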
On Mon, Apr 12, 2021 at 7:49 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 13 Apr 2021 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > David Rowley <dgrowleyml@gmail.com> writes: > > > I realised when working on something unrelated last night that we can > > > also do hash lookups for NOT IN too. > > > > ... and still get the behavior right for nulls? > > Yeah, it will. There are already some special cases for NULLs in the > IN version. Those would need to be adapted for NOT IN. I hadn't thought about using the negator operator directly that way when I initially wrote the patch. But also I didn't think a whole lot about the NOT IN case at all -- and there's no mention of such that I see in this thread or the precursor thread. It's pretty obvious that it wasn't part of my immediate need, but obviously it'd be nice to have the consistency. All that to say this: my vote would be to put it into PG15 also. James
On Mon, Apr 12, 2021 at 10:07 PM James Coleman <jtc331@gmail.com> wrote: > > On Mon, Apr 12, 2021 at 7:49 PM David Rowley <dgrowleyml@gmail.com> wrote: > > > > On Tue, 13 Apr 2021 at 11:42, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > David Rowley <dgrowleyml@gmail.com> writes: > > > > I realised when working on something unrelated last night that we can > > > > also do hash lookups for NOT IN too. > > > > > > ... and still get the behavior right for nulls? > > > > Yeah, it will. There are already some special cases for NULLs in the > > IN version. Those would need to be adapted for NOT IN. > > I hadn't thought about using the negator operator directly that way > when I initially wrote the patch. > > But also I didn't think a whole lot about the NOT IN case at all -- > and there's no mention of such that I see in this thread or the > precursor thread. It's pretty obvious that it wasn't part of my > immediate need, but obviously it'd be nice to have the consistency. > > All that to say this: my vote would be to put it into PG15 also. ...and here's a draft patch. I can take this to a new thread if you'd prefer; the one here already got committed, on the other hand this is pretty strongly linked to this discussion, so I figured it made sense to post it here. James
Attachment
On Wed, 14 Apr 2021 at 05:40, James Coleman <jtc331@gmail.com> wrote: > ...and here's a draft patch. I can take this to a new thread if you'd > prefer; the one here already got committed, on the other hand this is > pretty strongly linked to this discussion, so I figured it made sense > to post it here. I only glanced at this when you sent it and I was confused about how it works. The patch didn't look like how I imagined it should and I couldn't see how the executor part worked without any changes. Anyway, I decided to clear up my confusion tonight and apply the patch to figure all this out... unfortunately, I see why I was confused now. It actually does not work at all :-( You're still passing the <> operator to get_op_hash_functions(), which of course is not hashable, so we just never do hashing for NOT IN. All your tests pass just fine because the standard non-hashed code path is used. My idea was that you'd not add any fields to ScalarArrayOpExpr and for soaps with useOr == false, check if the negator of the operator is hashable. If so set the opfuncid to the negator operator's function. I'm a bit undecided if it's safe to set the opfuncid to the negator function. If anything were to set that again based on the opno then it would likely set it to the wrong thing. We can't go changing the opno either because EXPLAIN would display the wrong thing. Anyway, I've attached what I ended up with after spending a few hours looking at this. I pretty much used all your tests as is with the exception of removing one that looked duplicated. David
Attachment
On Sat, Apr 24, 2021 at 6:25 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 14 Apr 2021 at 05:40, James Coleman <jtc331@gmail.com> wrote: > > ...and here's a draft patch. I can take this to a new thread if you'd > > prefer; the one here already got committed, on the other hand this is > > pretty strongly linked to this discussion, so I figured it made sense > > to post it here. > > I only glanced at this when you sent it and I was confused about how > it works. The patch didn't look like how I imagined it should and I > couldn't see how the executor part worked without any changes. > > Anyway, I decided to clear up my confusion tonight and apply the patch > to figure all this out... unfortunately, I see why I was confused > now. It actually does not work at all :-( > > You're still passing the <> operator to get_op_hash_functions(), which > of course is not hashable, so we just never do hashing for NOT IN. > > All your tests pass just fine because the standard non-hashed code path is used. I was surprised when it "just worked" too; I should have stopped to verify the path was being taken. Egg on my face for not doing so. :( > My idea was that you'd not add any fields to ScalarArrayOpExpr and for > soaps with useOr == false, check if the negator of the operator is > hashable. If so set the opfuncid to the negator operator's function. > > I'm a bit undecided if it's safe to set the opfuncid to the negator > function. If anything were to set that again based on the opno then > it would likely set it to the wrong thing. We can't go changing the > opno either because EXPLAIN would display the wrong thing. I don't personally see a reason why this is a problem. But I also don't know that I have enough knowledge of the codebase to say that definitively. > Anyway, I've attached what I ended up with after spending a few hours > looking at this. Overall I like this approach. One thing I think we could clean up: + bool useOr; /* use OR or AND semantics? */ ... + /* useOr == true means an IN clause, useOr == false is NOT IN */ I'm wondering if the intersection of these two lines implies that useOr isn't quite the right name here. Perhaps something like "negated"? On the other hand (to make the counterargument) useOr would keep it consistent with the other ones. The other thing I got to thinking about was = ALL. It doesn't get turned into a hash op because the negator of = isn't hashable. I think it's correct that that's the determining factor, because I can't imagine what it would mean to hash <>. But...I wanted to confirm I wasn't missing something. We don't have explicit tests for that case, but I'm not sure it's necessary either. > I pretty much used all your tests as is with the exception of removing > one that looked duplicated. Sounds good. James
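To spell out the = ALL case being asked about, here is what it evaluates to over a Const array (an illustration only, not from the patch's tests):

    SELECT 1 = ALL ('{1,1,1}'::int[])  AS all_same,   -- true: every element matches
           1 = ALL ('{1,2}'::int[])    AS mixed,      -- false: some element differs
           1 = ALL ('{1,NULL}'::int[]) AS with_null;  -- null: true AND unknown

Since = ALL over a Const array can only be true when every element is the same non-NULL value, there is little to gain from hashing it.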
On Sat, 8 May 2021 at 09:15, James Coleman <jtc331@gmail.com> wrote: > > On Sat, Apr 24, 2021 at 6:25 AM David Rowley <dgrowleyml@gmail.com> wrote: > > I'm a bit undecided if it's safe to set the opfuncid to the negator > > function. If anything were to set that again based on the opno then > > it would likely set it to the wrong thing. We can't go changing the > > opno either because EXPLAIN would display the wrong thing. > > I don't personally see a reason why this is a problem. But I also > don't know that I have enough knowledge of the codebase to say that > definitively. The reason for my concern is that if the opfuncid is set to InvalidOid, set_sa_opfuncid() always sets the ScalarArrayOpExpr's opfuncid to get_opcode(opexpr->opno). I'm effectively setting the opfuncid to get_opcode(get_negator(opexpr->opno)); if anything were to reset the ScalarArrayOpExpr's opfuncid to InvalidOid, then set_sa_opfuncid() would repopulate it with the wrong value. Maybe the solution there is to teach set_sa_opfuncid() about our hashing NOT IN trick and have it check if (!opexpr->useOr && OidIsValid(opexpr->hashfuncid)) and, if that's true, do opexpr->opfuncid = get_opcode(get_negator(opexpr->opno)). Then we could just not bother setting opfuncid in convert_saop_to_hashed_saop_walker(). > > Anyway, I've attached what I ended up with after spending a few hours > > looking at this. > > Overall I like this approach. > > One thing I think we could clean up: > > + bool useOr; /* use OR or AND semantics? */ > ... > + /* useOr == true means an IN clause, useOr == false is NOT IN */ > > I'm wondering if the intersection of these two lines implies that > useOr isn't quite the right name here. Perhaps something like > "negated"? I'm not sure I want to go changing that. The whole IN() / NOT IN() behaviour regarding NULLs all seems pretty weird until you mentally replace a IN (1,2,3) with a = 1 OR a = 2 OR a = 3. And for the a NOT IN(1,2,3) case, a <> 1 AND a <> 2 AND a <> 3. People can make a bit more sense of the weirdness of NULLs with NOT IN when they mentally convert their expression like that. I think having that in code is useful too. Any optimisations that are added must match those semantics. > The other thing I got to thinking about was = ALL. It doesn't get > turned into a hash op because the negator of = isn't hashable. I think > it's correct that that's the determining factor, because I can't > imagine what it would mean to hash <>. But...I wanted to confirm I > wasn't missing something. We don't have explicit tests for that case, > but I'm not sure it's necessary either. It's important to think of other cases; I just don't think there's any need to do anything for that one. Remember that we have the restriction of requiring a set of Consts, so for that case to be met, someone would have to write something like: col = ALL('{1,1,1,1,1,1,1,1}'::int[]); I think if anyone comes along complaining that a query containing that is not as fast as they'd like then we might tell them that they should just use: col = 1. A sanity check might not go amiss either. David
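The mental model described above is easy to check directly; a small illustration (not from the thread) showing that the array form and the expanded AND form agree, NULLs included:

    SELECT x,
           x NOT IN (1, 2, 3)           AS not_in_form,
           x <> 1 AND x <> 2 AND x <> 3 AS expanded_form
    FROM (VALUES (1), (4), (NULL::int)) AS v(x);

Both columns come out false for x = 1, true for x = 4, and NULL for x = NULL, and that is the truth table any hashed implementation has to reproduce.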
David Rowley <dgrowleyml@gmail.com> writes: > On Sat, 8 May 2021 at 09:15, James Coleman <jtc331@gmail.com> wrote: >> On Sat, Apr 24, 2021 at 6:25 AM David Rowley <dgrowleyml@gmail.com> wrote: >>> I'm a bit undecided if it's safe to set the opfuncid to the negator >>> function. If anything were to set that again based on the opno then >>> it would likely set it to the wrong thing. We can't go changing the >>> opno either because EXPLAIN would display the wrong thing. >> I don't personally see a reason why this is a problem. But I also >> don't know that I have enough knowledge of the codebase to say that >> definitively. > The reason for my concern is that if the opfuncid is set to > InvalidOid, set_sa_opfuncid() always sets the ScalarArrayOpExpr's > opfuncid to get_opcode(opexpr->opno). I will personally veto any design that involves setting opfuncid to something that doesn't match the opno. That's just horrid, and it will break something somewhere, either immediately or down the road. I don't immediately see why you can't add an "invert" boolean flag to ScalarArrayOpExpr and let the executor machinery deal with this. That'd have the advantage of not having to depend on there being a negator. regards, tom lane
On Fri, May 7, 2021 at 9:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > On Sat, 8 May 2021 at 09:15, James Coleman <jtc331@gmail.com> wrote: > >> On Sat, Apr 24, 2021 at 6:25 AM David Rowley <dgrowleyml@gmail.com> wrote: > >>> I'm a bit undecided if it's safe to set the opfuncid to the negator > >>> function. If anything were to set that again based on the opno then > >>> it would likely set it to the wrong thing. We can't go changing the > >>> opno either because EXPLAIN would display the wrong thing. > > >> I don't personally see a reason why this is a problem. But I also > >> don't know that I have enough knowledge of the codebase to say that > >> definitively. > > > The reason for my concern is that if the opfuncid is set to > > InvalidOid, set_sa_opfuncid() always sets the ScalarArrayOpExpr's > > opfuncid to get_opcode(opexpr->opno). > > I will personally veto any design that involves setting opfuncid to > something that doesn't match the opno. That's just horrid, and it > will break something somewhere, either immediately or down the road. This is the kind of "project design" style/policy knowledge I don't have. Thanks. > I don't immediately see why you can't add an "invert" boolean flag to > ScalarArrayOpExpr and let the executor machinery deal with this. That'd > have the advantage of not having to depend on there being a negator. Don't we need to have a negator to be able to look up the proper hash function? At least somewhere in the process you'd have to convert from looking up the <> op to looking up the = op and then setting the "invert" flag. James
On Fri, May 7, 2021 at 8:38 PM David Rowley <dgrowleyml@gmail.com> wrote: > > It's important to think of other cases, I just don't think there's any > need to do anything for that one. Remember that we have the > restriction of requiring a set of Consts, so for that case to be met, > someone would have to write something like: col = > ALL('{1,1,1,1,1,1,1,1}'::int[]); I think if anyone comes along > complaining that a query containing that is not as fast as they'd like > then we might tell them that they should just use: col = 1. A sanity > checkup might not go amiss either. I wasn't concerned with trying to optimize this case (I don't think we can anyway, at least not without adding new work, like de-duplicating the array first). Though I do hope that someday I'll/we'll get around to getting the stable subexpressions caching patch finished, and then this will be able to work for more than constant arrays. I just wanted to confirm we'd thought through the cases we can't handle to ensure we're not accidentally covering them. James
On Sat, 8 May 2021 at 13:37, James Coleman <jtc331@gmail.com> wrote: > > On Fri, May 7, 2021 at 9:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I don't immediately see why you can't add an "invert" boolean flag to > > ScalarArrayOpExpr and let the executor machinery deal with this. That'd > > have the advantage of not having to depend on there being a negator. > > Don't we need to have a negator to be able to look up the proper hash > function? At least somewhere in the process you'd have to convert from > looking up the <> op to looking up the = op and then setting the > "invert" flag. Yeah, we *do* need to ensure there's a negator in the planner, as we need to use it during hash probes. It's no good checking whether the hash bucket we landed on matches using the <> operator's function. We won't find many matches that way! I'm not opposed to adding some new field if that's what it takes. I'd imagine the new field will be something like negfuncid, which will be InvalidOid unless the hash function is set and useOr == false. David
On Sat, 8 May 2021 at 14:04, David Rowley <dgrowleyml@gmail.com> wrote: > I'm not opposed to adding some new field if that's what it takes. I'd > imagine the new field will be something like negfuncid which will be > InvalidOid unless the hash function is set and useOr == false Just while this is still swapped into main memory, I've attached a patch that adds a new field to ScalarArrayOpExpr rather than repurposing the existing field. David
Attachment
On Fri, May 7, 2021 at 9:50 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Sat, 8 May 2021 at 14:04, David Rowley <dgrowleyml@gmail.com> wrote:
> > I'm not opposed to adding some new field if that's what it takes. I'd
> > imagine the new field will be something like negfuncid which will be
> > InvalidOid unless the hash function is set and useOr == false
>
> Just while this is still swapped into main memory, I've attached a
> patch that adds a new field to ScalarArrayOpExpr rather than
> repurposing the existing field.
>
> David

Hi,

+ if (!OidIsValid(saop->negfuncid))
+ record_plan_function_dependency(root, saop->hashfuncid);

Is there a typo in the second line ? (root, saop->negfuncid)

Cheers
On Sat, 8 May 2021 at 20:17, Zhihong Yu <zyu@yugabyte.com> wrote: > + if (!OidIsValid(saop->negfuncid)) > + record_plan_function_dependency(root, saop->hashfuncid); > > Is there a typo in the second line ? (root, saop->negfuncid) Yeah, that's a mistake. Thanks for checking it. David
On Sat, 8 May 2021 at 20:29, David Rowley <dgrowleyml@gmail.com> wrote: > > On Sat, 8 May 2021 at 20:17, Zhihong Yu <zyu@yugabyte.com> wrote: > > + if (!OidIsValid(saop->negfuncid)) > > + record_plan_function_dependency(root, saop->hashfuncid); > > > > Is there a typo in the second line ? (root, saop->negfuncid) > > Yeah, that's a mistake. Thanks for checking it. I've attached a patch which fixes the mistake mentioned above. Also, I've dropped the RMT from the thread. I only introduced them when I wanted some input on whether hashing NOT IN should be included in PG14. Nobody seems to think that should be done. David
Attachment
I've been looking at the NOT IN hashing patch again and after making a few minor tweaks I think it's pretty much ready to go. If anyone feels differently, please let me know in the next couple of days. Otherwise, I plan on taking a final look and pushing it soon. David
Attachment
On Tue, 6 Jul 2021 at 22:39, David Rowley <dgrowleyml@gmail.com> wrote: > If anyone feels differently, please let me know in the next couple of > days. Otherwise, I plan on taking a final look and pushing it soon. After doing some very minor adjustments, I pushed this. (29f45e299). Thanks to James and Zhihong for reviewing. David