Thread: Re: Self contradictory examining on rel's baserestrictinfo
On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote: > 1. Background > A few months ago, when i read source codes of B-tree in routine > _bt_preprocess_keys, i found that there are more contradictory > checking case we can add. I sent email to pgsql-hackers and > then community contributor replied me and told me someone had > already proposed this question. Thanks for taking the time > to address my question. After serveral conversations, i found > that we can do something more. We can place these jobs at planning time. When you're referring to things that happened in the past, you should provide links to specific messages and names of specific contributors. It will be difficult for anyone to find the previous discussion based on your description of "a few months ago" and a "community contributor". I'm a little confused because it seems like you think we don't do any of this kind of thing already. But: robert.haas=# explain select * from foo where a < 1 and a > 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) robert.haas=# explain select * from foo where a < 1 and a = 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) robert.haas=# explain select * from foo where a <> 1 and a = 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) robert.haas=# explain select * from foo where a in (1,2,3) and a is null; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) There are cases where we don't already draw the necessary conclusions, such as a>1 and a>2, which could be simplified to a>2. But those cases aren't necessarily that common. > 7) Scalar array comparison expression > First we need to deconstruct the const array, figure out the null and non-null > elements. > If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <= > ALL(array[56, null])), it's contradictory. True, but that already seems to be working: robert.haas=# explain select * from foo where a <= all(array[56, null]); QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) I'm not saying there is no room for improvement here, but I think you will need to (1) be more clear about exactly which cases we are already handling vs. which ones you want to handle, (2) possibly split the patch into smaller patches each of which handles one specific case instead of bundling many improvements together, and (3) improve the comments and commit message in the patch(es). -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote: > There are cases where we don't already draw the necessary conclusions, > such as a>1 and a>2, which could be simplified to a>2. But those cases > aren't necessarily that common. Actually, we do use the more restrictive operator with cases like "a>1 and a>2" -- but only in contexts that happen to involve a B-Tree index scan, where _bt_preprocess_keys gets called. So it's a bit hit or miss. Tom has in the past expressed an interesting in moving the stuff in _bt_preprocess_keys into the planner: https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us Note that almost nothing in _bt_preprocess_keys is particularly related to nbtree itself, except perhaps for the stuff that marks scan keys required. The fact that "WHERE a IN (1, 2, 3) AND a < 2" can actually be simplified to "WHERE a = 1" has exactly nothing to do with B-Tree index scans, but we'll only do this simplification in the context of a B-Tree index scan. There is also some redundancy between _bt_preprocess_keys and the planner, which is less than ideal. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Mon, Nov 25, 2024 at 3:55 PM Robert Haas <robertmhaas@gmail.com> wrote: >> There are cases where we don't already draw the necessary conclusions, >> such as a>1 and a>2, which could be simplified to a>2. But those cases >> aren't necessarily that common. > Actually, we do use the more restrictive operator with cases like "a>1 > and a>2" -- but only in contexts that happen to involve a B-Tree index > scan, where _bt_preprocess_keys gets called. So it's a bit hit or > miss. Worth noting also is that _bt_preprocess_keys can detect cases that the planner cannot because the relevant values are not constants. I'm a little skeptical that we should expend a lot more effort on the sorts of cases discussed here. Basically, this sort of patch improves matters for people who write crummy queries while penalizing everybody else. We need to be very careful about adding cycles to planner runtime in pursuit of optimizations that are only rarely successful. I'm not trying to say that we should reject all of this out-of-hand, but I want to see some analysis of how much each check costs and how likely it is to win in real-world queries. BTW, it's also worth pointing out the constraint_exclusion GUC. If that's turned up to "on" from its default setting of "partition", it authorizes the planner to spend more effort looking for contradictory quals. It might be reasonable to gate some of these checks with that. (I also can't help wondering if any of them are already implemented but are hidden behind that. I think Robert must have had constraint_exclusion = on for his examples a couple messages back.) regards, tom lane
On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm a little skeptical that we should expend a lot more effort on > the sorts of cases discussed here. Basically, this sort of patch > improves matters for people who write crummy queries while penalizing > everybody else. I think that it's more complicated than that. Rather than explain what I mean in general terms, I'll give you a specific example of the kind of thing that seems quite interesting to me: It would be fairly easy to teach nbtree about a new kind of ScalarArrayOp that worked just like a conventional SAOP, but also returned tuples matching "IS NULL" (IS NULL uses the equals strategy internally already, so it'd be almost the same as treating NULL as just another array element). This could have significant advantages over what's even possible right now, particularly in cases involving ORDER BY ... LIMIT. I suppose that we'd have to invent some kind of new syntax for this. But wouldn't it also make sense if it worked with "WHERE a IN (1, 2) OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? Theoretically this would still be a case that amounted to improving matters for badly written queries...but not really (we can hardly expect many users to adopt our esoteric non-standard syntax). In fact, you could make a similar argument for rewriting IN() into a "= ANY()" SOAP (in the way that we always have). > We need to be very careful about adding cycles to > planner runtime in pursuit of optimizations that are only rarely > successful. I agree. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > It would be fairly easy to teach nbtree about a new kind of > ScalarArrayOp that worked just like a conventional SAOP, but also > returned tuples matching "IS NULL" (IS NULL uses the equals strategy > internally already, so it'd be almost the same as treating NULL as > just another array element). This could have significant advantages > over what's even possible right now, particularly in cases involving > ORDER BY ... LIMIT. > I suppose that we'd have to invent some kind of new syntax for this. > But wouldn't it also make sense if it worked with "WHERE a IN (1, 2) > OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? I'd be a strong -1 for inventing new syntax for that. However, if that's actually a common query pattern (I'm not convinced of it) we could certainly build something to recognize that usage and transform it into a suitable executable structure. However, I don't see the connection between that and the current thread. That pattern is not self-contradictory. My doubts about its usefulness have more to do with it being evidence of the SQL anti-pattern of treating NULL as a normal data value. But once you've made that choice in your data representation, you're going to end up with queries like this, and there's nothing you could do to write them "better". regards, tom lane
On Mon, Nov 25, 2024 at 6:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > I suppose that we'd have to invent some kind of new syntax for this. > > But wouldn't it also make sense if it worked with "WHERE a IN (1, 2) > > OR a IS NULL"? Or even with "WHERE a = 1 OR a IS NULL"? > > I'd be a strong -1 for inventing new syntax for that. However, if > that's actually a common query pattern (I'm not convinced of it) > we could certainly build something to recognize that usage and > transform it into a suitable executable structure. My basic point is this: SQL is supposed to be declarative -- in theory it isn't supposed to matter how you write the query. I don't see any hard distinction between the sorts of transformations that the OP is talking about, and what you're describing here. There's quite a lot of gray area, at least. > However, I don't see the connection between that and the current > thread. That pattern is not self-contradictory. My doubts > about its usefulness have more to do with it being evidence of > the SQL anti-pattern of treating NULL as a normal data value. It's just an example, chosen to be easy to explain. I guess you didn't like that example, so I'll go with another: What about the "general OR optimization" stuff described by the MDAM paper? The redundant and contradictory qual stuff is needed there, to make sure that the final index scan access path (consisting of a series of disjunctive index scans in key space order) will reliably avoid returning duplicate rows. (FWIW I think that the OP took some cues from the MDAM paper, since they talked about doing things like transforming SQL Standard row value constructor expressions into disjuncts.) I don't have a concrete proposal here, or anything like it. I just want to express that I believe that there's a lack of joined-up thinking about nbtree preprocessing and the optimizer (not for the first time). We *are* missing a few interesting tricks here. -- Peter Geoghegan
On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > BTW, it's also worth pointing out the constraint_exclusion GUC. > If that's turned up to "on" from its default setting of "partition", > it authorizes the planner to spend more effort looking for > contradictory quals. It might be reasonable to gate some of these > checks with that. (I also can't help wondering if any of them are > already implemented but are hidden behind that. I think Robert > must have had constraint_exclusion = on for his examples a > couple messages back.) I definitely didn't change the value of that setting. It's possible I was running on an older branch, but I don't think it was anything ancient. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Nov 25, 2024 at 4:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... I think Robert >> must have had constraint_exclusion = on for his examples a >> couple messages back.) > I definitely didn't change the value of that setting. It's possible I > was running on an older branch, but I don't think it was anything > ancient. Hmm, constraint_exclusion has defaulted to "partition" for many years now. But when I tried to reproduce your examples at [1]: regression=# create table foo (a int); CREATE TABLE regression=# explain select * from foo where a < 1 and a > 1; QUERY PLAN ----------------------------------------------------- Seq Scan on foo (cost=0.00..48.25 rows=13 width=4) Filter: ((a < 1) AND (a > 1)) (2 rows) regression=# show constraint_exclusion; constraint_exclusion ---------------------- partition (1 row) regression=# set constraint_exclusion = on; SET regression=# explain select * from foo where a < 1 and a > 1; QUERY PLAN ------------------------------------------ Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false (2 rows) and similarly for some of the other examples (I didn't try every one). Maybe your test table was partitioned?? regards, tom lane [1] https://www.postgresql.org/message-id/CA%2BTgmoZqiCwHbZczXXLjucfuHi%3D7EahSyzEj5yrqYKMQ0QOL9Q%40mail.gmail.com
On Tue, Nov 26, 2024 at 12:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maybe your test table was partitioned?? Ah, yes, it was. -- Robert Haas EDB: http://www.enterprisedb.com
Thank you for your advice and guidance.
I didn't know the constraint_exclusion switch existed.
As your advice i need to make what i am doing to be clear.
> There are cases where we don't already draw the necessary conclusions,
> such as a>1 and a>2, which could be simplified to a>2. But those cases
> aren't necessarily that common.
The path i committed not just test contradictory but also do the
simplification. The simplification is limited in the BTREE.
simplification. The simplification is limited in the BTREE.
Could you interpret the case in a little more detail.
Best regards
From: Robert Haas <robertmhaas@gmail.com>
Sent: Tuesday, November 26, 2024 04:55
To: ro b <bigbro_wq@hotmail.com>
Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
Sent: Tuesday, November 26, 2024 04:55
To: ro b <bigbro_wq@hotmail.com>
Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
On Mon, Nov 25, 2024 at 3:58 AM ro b <bigbro_wq@hotmail.com> wrote:
> 1. Background
> A few months ago, when i read source codes of B-tree in routine
> _bt_preprocess_keys, i found that there are more contradictory
> checking case we can add. I sent email to pgsql-hackers and
> then community contributor replied me and told me someone had
> already proposed this question. Thanks for taking the time
> to address my question. After serveral conversations, i found
> that we can do something more. We can place these jobs at planning time.
When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".
I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:
robert.haas=# explain select * from foo where a < 1 and a > 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a < 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a <> 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.
> 7) Scalar array comparison expression
> First we need to deconstruct the const array, figure out the null and non-null
> elements.
> If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
> ALL(array[56, null])), it's contradictory.
True, but that already seems to be working:
robert.haas=# explain select * from foo where a <= all(array[56, null]);
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).
--
Robert Haas
EDB: http://www.enterprisedb.com
> 1. Background
> A few months ago, when i read source codes of B-tree in routine
> _bt_preprocess_keys, i found that there are more contradictory
> checking case we can add. I sent email to pgsql-hackers and
> then community contributor replied me and told me someone had
> already proposed this question. Thanks for taking the time
> to address my question. After serveral conversations, i found
> that we can do something more. We can place these jobs at planning time.
When you're referring to things that happened in the past, you should
provide links to specific messages and names of specific contributors.
It will be difficult for anyone to find the previous discussion based
on your description of "a few months ago" and a "community
contributor".
I'm a little confused because it seems like you think we don't do any
of this kind of thing already. But:
robert.haas=# explain select * from foo where a < 1 and a > 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a < 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a <> 1 and a = 1;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
robert.haas=# explain select * from foo where a in (1,2,3) and a is null;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
There are cases where we don't already draw the necessary conclusions,
such as a>1 and a>2, which could be simplified to a>2. But those cases
aren't necessarily that common.
> 7) Scalar array comparison expression
> First we need to deconstruct the const array, figure out the null and non-null
> elements.
> If ALL flag is set and the Const contain NULL. we will get nothing (eg. x <=
> ALL(array[56, null])), it's contradictory.
True, but that already seems to be working:
robert.haas=# explain select * from foo where a <= all(array[56, null]);
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
One-Time Filter: false
(2 rows)
I'm not saying there is no room for improvement here, but I think you
will need to (1) be more clear about exactly which cases we are
already handling vs. which ones you want to handle, (2) possibly split
the patch into smaller patches each of which handles one specific case
instead of bundling many improvements together, and (3) improve the
comments and commit message in the patch(es).
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote: > The path i committed not just test contradictory but also do the > simplification. The simplification is limited in the BTREE. > Could you interpret the case in a little more detail. I am not able to understand this paragraph. Regretfully, -- Robert Haas EDB: http://www.enterprisedb.com
I mean why can't draw the conclusions
like the case a>1 and a>2 which can be simplified
to a>2 if the operator > is btree opclass.
Best regards
From: Robert Haas <robertmhaas@gmail.com>
Sent: Wednesday, November 27, 2024 22:52
To: ro b <bigbro_wq@hotmail.com>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
Sent: Wednesday, November 27, 2024 22:52
To: ro b <bigbro_wq@hotmail.com>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
On Wed, Nov 27, 2024 at 9:30 AM ro b <bigbro_wq@hotmail.com> wrote:
> The path i committed not just test contradictory but also do the
> simplification. The simplification is limited in the BTREE.
> Could you interpret the case in a little more detail.
I am not able to understand this paragraph.
Regretfully,
--
Robert Haas
EDB: http://www.enterprisedb.com
> The path i committed not just test contradictory but also do the
> simplification. The simplification is limited in the BTREE.
> Could you interpret the case in a little more detail.
I am not able to understand this paragraph.
Regretfully,
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq@hotmail.com> wrote: > I mean why can't draw the conclusions > like the case a>1 and a>2 which can be simplified > to a>2 if the operator > is btree opclass. This question has already been discussed in some detail on this email thread. First, we sometimes do detect it, as discussed by Peter. Second, it is an unusual case and so not worth spending a lot of CPU cycles to detect, as discussed by Tom. We might still accept a patch to improve things in this area. For example, if somebody could find a way to change the code so that we are able to detect more cases without needing to spend more CPU cycles, I think everyone would like that. However, that may be a tricky goal to accomplish. -- Robert Haas EDB: http://www.enterprisedb.com