Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays - Mailing list pgsql-hackers

From James Coleman
Subject Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date
Msg-id CAAaqYe-4m9FYT5uHLb7LN=b6CFDmXBZqdmgfyHySP-wgcEL0VQ@mail.gmail.com
Whole thread Raw
In response to Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Sat, Apr 24, 2021 at 6:25 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 14 Apr 2021 at 05:40, James Coleman <jtc331@gmail.com> wrote:
> > ...and here's a draft patch. I can take this to a new thread if you'd
> > prefer; the one here already got committed, on the other hand this is
> > pretty strongly linked to this discussion, so I figured it made sense
> > to post it here.
>
> I only glanced at this when you sent it and I was confused about how
> it works. The patch didn't look like how I imagined it should and I
> couldn't see how the executor part worked without any changes.
>
> Anyway, I decided to clear up my confusion tonight and apply the patch
> to figure all this out...  unfortunately, I see why I was confused
> now. It actually does not work at all :-(
>
> You're still passing the <> operator to get_op_hash_functions(), which
> of course is not hashable, so we just never do hashing for NOT IN.
>
> All your tests pass just fine because the standard non-hashed code path is used.

I was surprised when it "just worked" too; I should have stopped to
verify the path was being taken. Egg on my face for not doing so. :(

> My idea was that you'd not add any fields to ScalarArrayOpExpr and for
> soaps with useOr == false, check if the negator of the operator is
> hashable. If so set the opfuncid to the negator operator's function.
>
> I'm a bit undecided if it's safe to set the opfuncid to the negator
> function.  If anything were to set that again based on the opno then
> it would likely set it to the wrong thing. We can't go changing the
> opno either because EXPLAIN would display the wrong thing.

I don't personally see a reason why this is a problem. But I also
don't know that I have enough knowledge of the codebase to say that
definitively.

> Anyway, I've attached what I ended up with after spending a few hours
> looking at this.

Overall I like this approach.

One thing I think we could clean up:

+ bool            useOr;  /* use OR or AND semantics? */
...
+ /* useOr == true means an IN clause, useOr == false is NOT IN */

I'm wondering if the intersection of these two lines implies that
useOr isn't quite the right name here. Perhaps something like
"negated"?

On the other hand (to make the counterargument) useOr would keep it
consistent with the other ones.

The other thing I got to thinking about was = ALL. It doesn't get
turned into a hash op because the negator of = isn't hashable. I think
it's correct that that's the determining factor, because I can't
imagine what it would mean to hash <>. But...I wanted to confirm I
wasn't missing something. We don't have explicit tests for that case,
but I'm not sure it's necessary either.

> I pretty much used all your tests as is with the exception of removing
> one that looked duplicated.

Sounds good.

James



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: plan with result cache is very slow when work_mem is not enough
Next
From: Tomas Vondra
Date:
Subject: Re: plan with result cache is very slow when work_mem is not enough