Thread: Unexpected (wrong?) result querying boolean partitioned table with NULL partition
Unexpected (wrong?) result querying boolean partitioned table with NULL partition
From
David Kimura
Date:
Hi Hackers, Is it fair to assume that, given the same data, a partitioned table should return the same results as a non-partitioned table? If that's true, then I think I may have stumbled across a case of wrong results on boolean partitioned tables. In following example, I think we incorrectly skip the default partition scan: CREATE TABLE boolpart (a bool) PARTITION BY LIST (a); CREATE TABLE boolpart_default PARTITION OF boolpart default; CREATE TABLE boolpart_t PARTITION OF boolpart FOR VALUES IN ('true'); CREATE TABLE boolpart_f PARTITION OF boolpart FOR VALUES IN ('false'); INSERT INTO boolpart VALUES (true), (false), (null); EXPLAIN SELECT * FROM boolpart WHERE a IS NOT true; QUERY PLAN ----------------------------------------------------------------------- Seq Scan on boolpart_f boolpart (cost=0.00..38.10 rows=1405 width=1) Filter: (a IS NOT TRUE) (2 rows) SELECT * FROM boolpart WHERE a IS NOT true; a --- f (1 row) Compare that to the result of a non-partitioned table: CREATE TABLE booltab (a bool); INSERT INTO booltab VALUES (true), (false), (null); EXPLAIN SELECT * FROM booltab WHERE a IS NOT true; QUERY PLAN ----------------------------------------------------------- Seq Scan on booltab (cost=0.00..38.10 rows=1405 width=1) Filter: (a IS NOT TRUE) (2 rows) SELECT * FROM booltab WHERE a IS NOT true; a --- f (2 rows) I think the issue has to do with assumptions made about boolean test IS NOT inequality logic which is different from inequality of other operators. Specifically, "true IS NOT NULL" is not the same as "true<>NULL". In partition pruning, match_boolean_partition_clause() tries to match partkey with clause and outputs PARTCLAUSE_MATCH_CLAUSE and an outconst TRUE for (IS_TRUE or IS_NOT_FALSE) and inversely FALSE for (IS_FALSE or IS_NOT_TRUE). However, I don't think this gradularity is sufficient for "IS NOT" logic when a NULL value partition is present. One idea is to use the negation operator for IS_NOT_(true|false) (i.e. BooleanNotEqualOperator instead of BooleanEqualOperator). But besides presumably being a more expensive operation, not equal is not part of the btree opfamily for bool_ops. So, seems like that won't really fit into the current partition pruning framework. Then I realized that the issue is just about adding the default or null partition in these very particular scenarios. And struct PartitionBoundInfoData already holds that information. So if we can identify these scenarios and pass that information into get_matching_partitions() then we can add the necessary partitions. Attached is a very rough sketch of that idea. Thoughts? Does this seem like a legit issue? And if so, do either of the proposed solutions seem reasonable? Thanks, David
Attachment
Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition
From
David Rowley
Date:
On Wed, 12 Apr 2023 at 22:13, David Kimura <david.g.kimura@gmail.com> wrote: > Is it fair to assume that, given the same data, a partitioned table should > return the same results as a non-partitioned table? Yes, and also the same as when enable_partition_pruning is set to off. > CREATE TABLE boolpart (a bool) PARTITION BY LIST (a); > CREATE TABLE boolpart_default PARTITION OF boolpart default; > CREATE TABLE boolpart_t PARTITION OF boolpart FOR VALUES IN ('true'); > CREATE TABLE boolpart_f PARTITION OF boolpart FOR VALUES IN ('false'); > INSERT INTO boolpart VALUES (true), (false), (null); > > EXPLAIN SELECT * FROM boolpart WHERE a IS NOT true; > QUERY PLAN > ----------------------------------------------------------------------- > Seq Scan on boolpart_f boolpart (cost=0.00..38.10 rows=1405 width=1) > Filter: (a IS NOT TRUE) > (2 rows) > > SELECT * FROM boolpart WHERE a IS NOT true; > a > --- > f > (1 row) > > Compare that to the result of a non-partitioned table: > > CREATE TABLE booltab (a bool); > INSERT INTO booltab VALUES (true), (false), (null); > > EXPLAIN SELECT * FROM booltab WHERE a IS NOT true; > QUERY PLAN > ----------------------------------------------------------- > Seq Scan on booltab (cost=0.00..38.10 rows=1405 width=1) > Filter: (a IS NOT TRUE) > (2 rows) > > SELECT * FROM booltab WHERE a IS NOT true; > a > --- > f Ouch. That's certainly not correct. > I think the issue has to do with assumptions made about boolean test IS NOT > inequality logic which is different from inequality of other operators. > Specifically, "true IS NOT NULL" is not the same as "true<>NULL". Yeah, that's wrong. > One idea is to use the negation operator for IS_NOT_(true|false) (i.e. > BooleanNotEqualOperator instead of BooleanEqualOperator). But besides > presumably being a more expensive operation, not equal is not part of the btree > opfamily for bool_ops. So, seems like that won't really fit into the current > partition pruning framework. There's already code to effectively handle <> operators. Just the PartClauseInfo.op_is_ne needs to be set to true. get_matching_list_bounds() then handles that by taking the inverse of the partitions matching the equality operator. Effectively, I think that's the attached patch. There seems to be a bunch of tests checking this already, all of them assuming the incorrect plans. David
Attachment
Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition
From
David Kimura
Date:
On Wed, Apr 12, 2023 at 4:13 AM David Rowley <dgrowleyml@gmail.com> wrote: > On Wed, 12 Apr 2023 at 22:13, David Kimura <david.g.kimura@gmail.com> wrote: > > Is it fair to assume that, given the same data, a partitioned table should > > return the same results as a non-partitioned table? > > Yes, and also the same as when enable_partition_pruning is set to off. Thanks for making me aware of that GUC. > > One idea is to use the negation operator for IS_NOT_(true|false) (i.e. > > BooleanNotEqualOperator instead of BooleanEqualOperator). But besides > > presumably being a more expensive operation, not equal is not part of the btree > > opfamily for bool_ops. So, seems like that won't really fit into the current > > partition pruning framework. > > There's already code to effectively handle <> operators. Just the > PartClauseInfo.op_is_ne needs to be set to true. > get_matching_list_bounds() then handles that by taking the inverse of > the partitions matching the equality operator. Ah, I missed that when I first tried to implement that approach. Indeed, this seems cleaner. Also, the domain space for boolean partitions is very small, so any added cost for searching not equal seems negligible. Thanks, David
Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition
From
Richard Guo
Date:
On Wed, Apr 12, 2023 at 7:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.
Effectively, I think that's the attached patch.
I think there is a thinko here.
+ switch (btest->booltesttype)
+ {
+ case IS_NOT_TRUE:
+ *noteq = true;
+ /* fall through */
+ case IS_TRUE:
+ *outconst = (Expr *) makeBoolConst(true, false);
+ break;
+ case IS_NOT_FALSE:
+ *noteq = true;
+ /* fall through */
+ case IS_FALSE:
+ *outconst = (Expr *) makeBoolConst(false, false);
+ break;
+ default:
+ Assert(false); /* hmm? */
+ return PARTCLAUSE_UNSUPPORTED;
+ }
The *outconst should be set to true in case IS_NOT_FALSE and set to
false in case IS_NOT_TRUE, something like:
switch (btest->booltesttype)
{
- case IS_NOT_TRUE:
+ case IS_NOT_FALSE:
*noteq = true;
/* fall through */
case IS_TRUE:
*outconst = (Expr *) makeBoolConst(true, false);
break;
- case IS_NOT_FALSE:
+ case IS_NOT_TRUE:
*noteq = true;
/* fall through */
case IS_FALSE:
Thanks
Richard
+ switch (btest->booltesttype)
+ {
+ case IS_NOT_TRUE:
+ *noteq = true;
+ /* fall through */
+ case IS_TRUE:
+ *outconst = (Expr *) makeBoolConst(true, false);
+ break;
+ case IS_NOT_FALSE:
+ *noteq = true;
+ /* fall through */
+ case IS_FALSE:
+ *outconst = (Expr *) makeBoolConst(false, false);
+ break;
+ default:
+ Assert(false); /* hmm? */
+ return PARTCLAUSE_UNSUPPORTED;
+ }
The *outconst should be set to true in case IS_NOT_FALSE and set to
false in case IS_NOT_TRUE, something like:
switch (btest->booltesttype)
{
- case IS_NOT_TRUE:
+ case IS_NOT_FALSE:
*noteq = true;
/* fall through */
case IS_TRUE:
*outconst = (Expr *) makeBoolConst(true, false);
break;
- case IS_NOT_FALSE:
+ case IS_NOT_TRUE:
*noteq = true;
/* fall through */
case IS_FALSE:
Thanks
Richard
Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition
From
Richard Guo
Date:
On Thu, Apr 13, 2023 at 10:39 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Apr 12, 2023 at 7:13 PM David Rowley <dgrowleyml@gmail.com> wrote:There's already code to effectively handle <> operators. Just the
PartClauseInfo.op_is_ne needs to be set to true.
get_matching_list_bounds() then handles that by taking the inverse of
the partitions matching the equality operator.
Effectively, I think that's the attached patch.I think there is a thinko here.
Sorry. It's my thinko. In cases IS_NOT_TRUE and IS_NOT_FALSE the
op_is_ne is set to true. So the logic in origin patch is right.
BTW, I wonder if we should elog an Error here.
default:
- Assert(false); /* hmm? */
- return PARTCLAUSE_UNSUPPORTED;
+ elog(ERROR, "unrecognized booltesttype: %d",
+ (int) btest->booltesttype);
+ break;
Otherwise the patch looks good to me.
Thanks
Richard
op_is_ne is set to true. So the logic in origin patch is right.
BTW, I wonder if we should elog an Error here.
default:
- Assert(false); /* hmm? */
- return PARTCLAUSE_UNSUPPORTED;
+ elog(ERROR, "unrecognized booltesttype: %d",
+ (int) btest->booltesttype);
+ break;
Otherwise the patch looks good to me.
Thanks
Richard
Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition
From
David Rowley
Date:
On Thu, 13 Apr 2023 at 15:30, Richard Guo <guofenglinux@gmail.com> wrote: > BTW, I wonder if we should elog an Error here. > > default: > - Assert(false); /* hmm? */ > - return PARTCLAUSE_UNSUPPORTED; > + elog(ERROR, "unrecognized booltesttype: %d", > + (int) btest->booltesttype); > + break; I wondered about that, hence my not-so-commitable comment left in there. My last thoughts were that maybe we should just move the IS_UNKNOWN and IS_NOT_UNKNOWN down into the switch and let -Wall let us know if something is missing. It hardly seems worth keeping the slightly earlier exit for those two cases. That just amounts to the RelabelType check and is this the partition key. I doubt IS[_NOT]_UNKNOWN is common enough for us to warrant contorting the code to make it a few dozen nanoseconds faster. Having smaller code is probably more of a win, which we'd get if we didn't add the ERROR you propose. > Otherwise the patch looks good to me. Thanks for having a look. David
Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition
From
David Kimura
Date:
On Wed, Apr 12, 2023 at 4:13 AM David Rowley <dgrowleyml@gmail.com> wrote: > > There seems to be a bunch of tests checking this already, all of them > assuming the incorrect plans. Given that the plan alone wasn't sufficient to catch this error previously, would it be worthwhile to add some data to the tests to make it abundantly obvious? I had noticed that the default partition seems to be an edge case in the code. Perhaps it's overkill, but would it be worth adding a test where the NULL partition is not the default? Thanks, David
Re: Unexpected (wrong?) result querying boolean partitioned table with NULL partition
From
David Rowley
Date:
On Thu, 13 Apr 2023 at 15:45, David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 13 Apr 2023 at 15:30, Richard Guo <guofenglinux@gmail.com> wrote: > > BTW, I wonder if we should elog an Error here. > > > > default: > > - Assert(false); /* hmm? */ > > - return PARTCLAUSE_UNSUPPORTED; > > + elog(ERROR, "unrecognized booltesttype: %d", > > + (int) btest->booltesttype); > > + break; > > I wondered about that, hence my not-so-commitable comment left in there. > > My last thoughts were that maybe we should just move the IS_UNKNOWN > and IS_NOT_UNKNOWN down into the switch and let -Wall let us know if > something is missing. > > It hardly seems worth keeping the slightly earlier exit for those two > cases. That just amounts to the RelabelType check and is this the > partition key. I doubt IS[_NOT]_UNKNOWN is common enough for us to > warrant contorting the code to make it a few dozen nanoseconds faster. > Having smaller code is probably more of a win, which we'd get if we > didn't add the ERROR you propose. After having looked at the code in more detail, I don't think it's a good idea to move the IS_UNKNOWN and IS_NOT_UNKNOWN down into the switch. Having them tested early means we can return PARTCLAUSE_UNSUPPORTED even when the clause does not match the current partition key. If we moved those into the switch statement, then if the qual didn't match to the partition key, then we'd return PARTCLAUSE_NOMATCH and we'd maybe waste further effort later trying to match the same qual to some other partition key. All I ended up doing was removing the Assert(). I don't really see the need to add an ERROR. It's not like any other value would cause the code to misbehave. We'll just return PARTCLAUSE_UNSUPPORTED and no pruning would get done for that qual. I also struggle to imagine what possible other values we could ever add to BoolTestType. After looking a bit deeper and testing a bit more, I found another bug in match_boolean_partition_clause() around the equal(negate_clause((Node *) leftop), partkey). The code there just always set *outconst to a false Const regardless of is_not_clause. I see the code coverage tool shows that line as untested, so I fixed the bug and wrote some tests to exercise the code. As David Kimura suggested, I also added some data to the tables in question and repeated the same queries again without the EXPLAIN. I generated the expected output with enable_partition_pruning = off then put it back on again and saw that the same results are shown. I considered writing a plpgsql function that we can pass a table name and a query and it goes and makes a temp table, populates it with the query with enable_partition_pruning = off then tries again with pruning on and verifies the results are the same as what's stored in the temp table. I'll maybe go and do that for master only, it's just a bit more than what I wanted to do in the back branches. I've pushed the fix now. Thanks for the report about this, David, and thank you both for the reviews. David David