Thread: Re: Self contradictory examining on rel's baserestrictinfo

Re: Self contradictory examining on rel's baserestrictinfo

From
Robert Haas
Date:
On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote:
> 1. Background
>     A few months ago, when i read source codes of B-tree in routine
>     _bt_preprocess_keys, i found that there are more contradictory
>     checking case we can add. I sent email to pgsql-hackers and
>     then community contributor replied me and told me someone had
>     already proposed this question. Thanks for taking the time
>     to address my question. After serveral conversations, i found
>     that we can do something more. We can place these jobs at planning time.

When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".

I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:

robert.haas=# explain select * from foo where a < 1 and a > 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a < 1 and a = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a <> 1 and a = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.

>     7) Scalar array comparison expression
>     First we need to deconstruct the const array, figure out the null and non-null
>     elements.
>     If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
>     ALL(array[56, null])), it's contradictory.

True, but that already seems to be working:

robert.haas=# explain select * from foo where a <= all(array[56, null]);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

From
Peter Geoghegan
Date:
On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
> There are cases where we don't already draw the necessary conclusions,
> such as a>1 and a>2, which could be simplified to a>2. But those cases
> aren't necessarily that common.

Actually, we do use the more restrictive operator with cases like "a>1
and a>2" -- but only in contexts that happen to involve a B-Tree index
scan, where _bt_preprocess_keys gets called. So it's a bit hit or
miss.

Tom has in the past expressed an interesting in moving the stuff in
_bt_preprocess_keys into the planner:

https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us

Note that almost nothing in _bt_preprocess_keys is particularly
related to nbtree itself, except perhaps for the stuff that marks scan
keys required. The fact that "WHERE a IN (1, 2, 3) AND a < 2" can
actually be simplified to "WHERE a = 1" has exactly nothing to do with
B-Tree index scans, but we'll only do this simplification in the
context of a B-Tree index scan. There is also some redundancy between
_bt_preprocess_keys and the planner, which is less than ideal.

--
Peter Geoghegan



Re: Self contradictory examining on rel's baserestrictinfo

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> There are cases where we don't already draw the necessary conclusions,
>> such as a>1 and a>2, which could be simplified to a>2. But those cases
>> aren't necessarily that common.

> Actually, we do use the more restrictive operator with cases like "a>1
> and a>2" -- but only in contexts that happen to involve a B-Tree index
> scan, where _bt_preprocess_keys gets called. So it's a bit hit or
> miss.

Worth noting also is that _bt_preprocess_keys can detect cases that
the planner cannot because the relevant values are not constants.

I'm a little skeptical that we should expend a lot more effort on
the sorts of cases discussed here.  Basically, this sort of patch
improves matters for people who write crummy queries while penalizing
everybody else.  We need to be very careful about adding cycles to
planner runtime in pursuit of optimizations that are only rarely
successful.

I'm not trying to say that we should reject all of this out-of-hand,
but I want to see some analysis of how much each check costs and
how likely it is to win in real-world queries.

BTW, it's also worth pointing out the constraint_exclusion GUC.
If that's turned up to "on" from its default setting of "partition",
it authorizes the planner to spend more effort looking for
contradictory quals.  It might be reasonable to gate some of these
checks with that.  (I also can't help wondering if any of them are
already implemented but are hidden behind that.  I think Robert
must have had constraint_exclusion = on for his examples a
couple messages back.)

            regards, tom lane



Re: Self contradictory examining on rel's baserestrictinfo

From
Peter Geoghegan
Date:
On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm a little skeptical that we should expend a lot more effort on
> the sorts of cases discussed here.  Basically, this sort of patch
> improves matters for people who write crummy queries while penalizing
> everybody else.

I think that it's more complicated than that. Rather than explain what
I mean in general terms, I'll give you a specific example of the kind
of thing that seems quite interesting to me:

It would be fairly easy to teach nbtree about a new kind of
ScalarArrayOp that worked just like a conventional SAOP, but also
returned tuples matching "IS NULL" (IS NULL uses the equals strategy
internally already, so it'd be almost the same as treating NULL as
just another array element). This could have significant advantages
over what's even possible right now, particularly in cases involving
ORDER BY ... LIMIT.

I suppose that we'd have to invent some kind of new syntax for this.
But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? Theoretically
this would still be a case that amounted to improving matters for
badly written queries...but not really (we can hardly expect many
users to adopt our esoteric non-standard syntax).

In fact, you could make a similar argument for rewriting IN() into a
"= ANY()" SOAP (in the way that we always have).

> We need to be very careful about adding cycles to
> planner runtime in pursuit of optimizations that are only rarely
> successful.

I agree.

--
Peter Geoghegan



Re: Self contradictory examining on rel's baserestrictinfo

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> It would be fairly easy to teach nbtree about a new kind of
> ScalarArrayOp that worked just like a conventional SAOP, but also
> returned tuples matching "IS NULL" (IS NULL uses the equals strategy
> internally already, so it'd be almost the same as treating NULL as
> just another array element). This could have significant advantages
> over what's even possible right now, particularly in cases involving
> ORDER BY ... LIMIT.

> I suppose that we'd have to invent some kind of new syntax for this.
> But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
> OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"?

I'd be a strong -1 for inventing new syntax for that.  However, if
that's actually a common query pattern (I'm not convinced of it)
we could certainly build something to recognize that usage and
transform it into a suitable executable structure.

However, I don't see the connection between that and the current
thread.  That pattern is not self-contradictory.  My doubts
about its usefulness have more to do with it being evidence of
the SQL anti-pattern of treating NULL as a normal data value.
But once you've made that choice in your data representation,
you're going to end up with queries like this, and there's
nothing you could do to write them "better".

            regards, tom lane



Re: Self contradictory examining on rel's baserestrictinfo

From
Peter Geoghegan
Date:
On Mon, Nov 25, 2024 at 6:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > I suppose that we'd have to invent some kind of new syntax for this.
> > But wouldn't it also make sense if it worked with "WHERE a IN (1, 2)
> > OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"?
>
> I'd be a strong -1 for inventing new syntax for that.  However, if
> that's actually a common query pattern (I'm not convinced of it)
> we could certainly build something to recognize that usage and
> transform it into a suitable executable structure.

My basic point is this: SQL is supposed to be declarative -- in theory
it isn't supposed to matter how you write the query.

I don't see any hard distinction between the sorts of transformations
that the OP is talking about, and what you're describing here. There's
quite a lot of gray area, at least.

> However, I don't see the connection between that and the current
> thread.  That pattern is not self-contradictory.  My doubts
> about its usefulness have more to do with it being evidence of
> the SQL anti-pattern of treating NULL as a normal data value.

It's just an example, chosen to be easy to explain. I guess you didn't
like that example, so I'll go with another:

What about the "general OR optimization" stuff described by the MDAM
paper? The redundant and contradictory qual stuff is needed there, to
make sure that the final index scan access path (consisting of a
series of disjunctive index scans in key space order) will reliably
avoid returning duplicate rows.

(FWIW I think that the OP took some cues from the MDAM paper, since
they talked about doing things like transforming SQL Standard row
value constructor expressions into disjuncts.)

I don't have a concrete proposal here, or anything like it. I just
want to express that I believe that there's a lack of joined-up
thinking about nbtree preprocessing and the optimizer (not for the
first time). We *are* missing a few interesting tricks here.

--
Peter Geoghegan



Re: Self contradictory examining on rel's baserestrictinfo

From
Robert Haas
Date:
On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, it's also worth pointing out the constraint_exclusion GUC.
> If that's turned up to "on" from its default setting of "partition",
> it authorizes the planner to spend more effort looking for
> contradictory quals.  It might be reasonable to gate some of these
> checks with that.  (I also can't help wondering if any of them are
> already implemented but are hidden behind that.  I think Robert
> must have had constraint_exclusion = on for his examples a
> couple messages back.)

I definitely didn't change the value of that setting. It's possible I
was running on an older branch, but I don't think it was anything
ancient.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ...  I think Robert
>> must have had constraint_exclusion = on for his examples a
>> couple messages back.)

> I definitely didn't change the value of that setting. It's possible I
> was running on an older branch, but I don't think it was anything
> ancient.

Hmm, constraint_exclusion has defaulted to "partition" for many
years now.  But when I tried to reproduce your examples at [1]:

regression=# create table foo (a int);
CREATE TABLE
regression=# explain select * from foo where a < 1 and a > 1;
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on foo  (cost=0.00..48.25 rows=13 width=4)
   Filter: ((a < 1) AND (a > 1))
(2 rows)

regression=# show constraint_exclusion;
 constraint_exclusion
----------------------
 partition
(1 row)

regression=# set constraint_exclusion = on;
SET
regression=# explain select * from foo where a < 1 and a > 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

and similarly for some of the other examples (I didn't try
every one).  Maybe your test table was partitioned??

            regards, tom lane

[1] https://www.postgresql.org/message-id/CA%2BTgmoZqiCwHbZczXXLjucfuHi%3D7EahSyzEj5yrqYKMQ0QOL9Q%40mail.gmail.com



Re: Self contradictory examining on rel's baserestrictinfo

From
Robert Haas
Date:
On Tue, Nov 26, 2024 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe your test table was partitioned??

Ah, yes, it was.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

From
ro b
Date:
Thank you for your advice and guidance.
I didn't know the constraint_exclusion switch existed.
As your advice i need to make what i am doing to be clear.

> There are cases where we don't already draw the necessary conclusions,
> such as a>1 and a>2, which could be simplified to a>2. But those cases
> aren't necessarily that common.

The path i committed not just test contradictory but also do the 
simplification.  The simplification is limited in the BTREE.
Could you interpret the case in a little more detail.

Best regards


From: Robert Haas <robertmhaas@gmail.com>
Sent: Tuesday, November 26, 2024 04:55
To: ro b <bigbro_wq@hotmail.com>
Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
 
On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote:
> 1. Background
>     A few months ago, when i read source codes of B-tree in routine
>     _bt_preprocess_keys, i found that there are more contradictory
>     checking case we can add. I sent email to pgsql-hackers and
>     then community contributor replied me and told me someone had
>     already proposed this question. Thanks for taking the time
>     to address my question. After serveral conversations, i found
>     that we can do something more. We can place these jobs at planning time.

When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".

I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:

robert.haas=# explain select * from foo where a < 1 and a > 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a < 1 and a = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a <> 1 and a = 1;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.

>     7) Scalar array comparison expression
>     First we need to deconstruct the const array, figure out the null and non-null
>     elements.
>     If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
>     ALL(array[56, null])), it's contradictory.

True, but that already seems to be working:

robert.haas=# explain select * from foo where a <= all(array[56, null]);
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Self contradictory examining on rel's baserestrictinfo

From
Robert Haas
Date:
On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote:
> The path i committed not just test contradictory but also do the
> simplification.  The simplification is limited in the BTREE.
> Could you interpret the case in a little more detail.

I am not able to understand this paragraph.

Regretfully,

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Self contradictory examining on rel's baserestrictinfo

From
ro b
Date:
I mean why can't draw the conclusions 
like the case a>1 and a>2 which can be simplified 
to a>2 if the operator > is btree opclass. 


Best regards


From: Robert Haas <robertmhaas@gmail.com>
Sent: Wednesday, November 27, 2024 22:52
To: ro b <bigbro_wq@hotmail.com>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
 
On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote:
> The path i committed not just test contradictory but also do the
> simplification.  The simplification is limited in the BTREE.
> Could you interpret the case in a little more detail.

I am not able to understand this paragraph.

Regretfully,

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Self contradictory examining on rel's baserestrictinfo

From
Robert Haas
Date:
On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq@hotmail.com> wrote:
> I mean why can't draw the conclusions
> like the case a>1 and a>2 which can be simplified
> to a>2 if the operator > is btree opclass.

This question has already been discussed in some detail on this email
thread. First, we sometimes do detect it, as discussed by Peter.
Second, it is an unusual case and so not worth spending a lot of CPU
cycles to detect, as discussed by Tom.

We might still accept a patch to improve things in this area. For
example, if somebody could find a way to change the code so that we
are able to detect more cases without needing to spend more CPU
cycles, I think everyone would like that. However, that may be a
tricky goal to accomplish.

--
Robert Haas
EDB: http://www.enterprisedb.com