Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id CA+TgmoYjBgPgNCuGhMmFp_4HdaZDiEBRFzDsW9p92DK8sie_Fg@mail.gmail.com
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers
On Mon, Jun 24, 2024 at 1:47 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I agree, with the proviso that "avoid gratuitous failures" should
> include cases where a query that got the optimization suddenly fails
> to get the optimization, due only to some very innocuous looking
> change. Such as a change from using a constant 1_000_000_000 to a
> constant 5_000_000_000 in the query text. That is a POLA violation.

Nope, I don't agree with that at all. If you imagine that we can
either have the optimization apply to one of those cases on the other,
or on the other hand we can have some cases that outright fail, I
think it's entirely clear that the former is better.

> Maybe it doesn't. My point was only that the B-Tree code doesn't
> necessarily need to use just one rhs type for the same column input
> opclass. The definition of SOAP works (or could work) in basically the
> same way, provided the "OR condition" were provably disjunct. We could
> for example mix different operators for the same nbtree scan key (with
> some work in nbtutils.c), just as we could support "where mycol =5 OR
> mycol IS NULL" with much effort.
>
> BTW, did you know MySQL has long supported the latter? It has a <=>
> operator, which is basically a non-standard spelling of IS NOT
> DISTINCT FROM. Importantly, it is indexable, whereas right now
> Postgres doesn't support indexing IS NOT DISTINCT FROM. If you're
> interested in working on this problem within the scope of this patch,
> or some follow-up patch, I can take care of the nbtree side of things.

I was assuming this patch shouldn't be changing the way indexes work
at all, just making use of the facilities that we have today. More
could be done, but that might make it harder to get anything
committed.

Before we get too deep into arguing about hypotheticals, I don't think
there's any problem here that we can't solve with the infrastructure
we already have. For instance, consider this:

robert.haas=# explain select * from foo where a in (1, 1000000000000000);
                        QUERY PLAN
-----------------------------------------------------------
 Seq Scan on foo1 foo  (cost=0.00..25.88 rows=13 width=36)
   Filter: (a = ANY ('{1,1000000000000000}'::bigint[]))
(2 rows)

I don't know exactly what's happening here, but it seems very similar
to what we need to have happen for this patch to work. pg_typeof(1) is
integer, and pg_typeof(1000000000000000) is bigint, and we're able to
figure out that it's OK to put both of those in an array of a single
type and without having any type conversion failures. If you replace
1000000000000000 with 2, then the array ends up being of type
integer[] rather than type bigint[], so. clearly the system is able to
reason its way through these kinds of scenarios already.

It's even possible, in my mind at least, that the patch is already
doing exactly the right things here. Even if it isn't, the problem
doesn't seem to be fundamental, because if this example can work (and
it does) then what the patch is trying to do should be workable, too.
We just have to make sure we're plugging all the pieces properly
together, and that we have comments adequately explain what is
happening and test cases that verify it. My feeling is that the patch
doesn't meet that standard today, but I think that just means it needs
some more work. I'm not arguing we have to throw the whole thing out,
or invent a lot of new infrastructure, or anything like that.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Stan Hu
Date:
Subject: PostgreSQL does not compile on macOS SDK 15.0
Next
From: Robert Haas
Date:
Subject: Re: improve predefined roles documentation