Thread: BUG #16500: SQL Abend. select multi_key_columns_range_partition_table
BUG #16500: SQL Abend. select multi_key_columns_range_partition_table
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16500 Logged by: Kobayashi Hisanori Email address: hisanori.kobayashi.bp@nttdata.com PostgreSQL version: 12.2 Operating system: Windows10 Description: I found that abend(crash) sql. select ... from multi_key_columns_range_partition_table where key_col1 <= xxx and key_col2 = yyy ; Same for 12.2 and 12.3 I am not good at English, so I will send a reproduction script. ------------------------------ -- ***** execute from psql -- -- ***** Create Test Table drop table if exists pt_test01 ; create table pt_test01 ( kbn smallint not null , nen char(4) not null , mm char(2) not null , cd char(3) not null , val00 numeric(15, 3) , usr varchar(10) ) partition by range(kbn, nen) with (oids=false) ; create table pt_test01_1_2019 partition of pt_test01 for values from (1, '2019') to (1, '2020') ; create table pt_test01_1_2020 partition of pt_test01 for values from (1, '2020') to (1, '2021') ; create table pt_test01_2_2019 partition of pt_test01 for values from (2, '2019') to (2, '2020') ; create table pt_test01_2_2020 partition of pt_test01 for values from (2, '2020') to (2, '2021') ; -- -- ***** Test SQL select * from pt_test01 where kbn <= 1 and nen = '2020' ; -- -- ***** Result:ABEND/Connection Closed -- ***** Message(Japanese) -- サーバとの接続が想定外にクローズされました -- おそらく要求の処理前または処理中にサーバが異常終了 -- したことを意味しています。 -- サーバへの接続が失われました。リセットしています: 失敗。 -- ***** Google translation -- The connection with the server was closed unexpectedly -- Server crashed, probably before or during processing of the request -- It means that you have done it. -- The connection to the server has been lost. Resetting: failed. ------------------------------
On Thu, Jun 18, 2020 at 6:49 PM PG Bug reporting form <noreply@postgresql.org> wrote: > I will send a reproduction script. > drop table if exists pt_test01 ; > create table pt_test01 > ( kbn smallint not null > , nen char(4) not null > , mm char(2) not null > , cd char(3) not null > , val00 numeric(15, 3) > , usr varchar(10) > ) > partition by range(kbn, nen) with (oids=false) > ; > create table pt_test01_1_2019 partition of pt_test01 for values from (1, > '2019') to (1, '2020') ; > create table pt_test01_1_2020 partition of pt_test01 for values from (1, > '2020') to (1, '2021') ; > create table pt_test01_2_2019 partition of pt_test01 for values from (2, > '2019') to (2, '2020') ; > create table pt_test01_2_2020 partition of pt_test01 for values from (2, > '2020') to (2, '2021') ; > select * from pt_test01 where kbn <= 1 and nen = '2020' ; This caused a server crash in my environment as well. Maybe I'm missing something, but ISTM that it's odd to allow partitions with empty range bounds. I'll dig into this (and another one Kobayashi-san reported). Best regards, Etsuro Fujita
> On Thu, Jun 18, 2020 at 10:05:12PM +0900, Etsuro Fujita wrote: > On Thu, Jun 18, 2020 at 6:49 PM PG Bug reporting form > <noreply@postgresql.org> wrote: > > I will send a reproduction script. > > > drop table if exists pt_test01 ; > > create table pt_test01 > > ( kbn smallint not null > > , nen char(4) not null > > , mm char(2) not null > > , cd char(3) not null > > , val00 numeric(15, 3) > > , usr varchar(10) > > ) > > partition by range(kbn, nen) with (oids=false) > > ; > > create table pt_test01_1_2019 partition of pt_test01 for values from (1, > > '2019') to (1, '2020') ; > > create table pt_test01_1_2020 partition of pt_test01 for values from (1, > > '2020') to (1, '2021') ; > > create table pt_test01_2_2019 partition of pt_test01 for values from (2, > > '2019') to (2, '2020') ; > > create table pt_test01_2_2020 partition of pt_test01 for values from (2, > > '2020') to (2, '2021') ; > > > select * from pt_test01 where kbn <= 1 and nen = '2020' ; > > This caused a server crash in my environment as well. Maybe I'm > missing something, but ISTM that it's odd to allow partitions with > empty range bounds. I'll dig into this (and another one Kobayashi-san > reported). Yep, I also can reproduce it easily. Can't say I'm familiar with this code, but after looking through it a bit it seems something strange is going on with a pruning step generated using prefix. Looks like it has only expressions for the second attribute of the partition key, but tries to compare with the full boundinfo. Without this pruning step the query above doesn't crash for me.
> On Thu, Jun 18, 2020 at 06:44:24PM +0200, Dmitry Dolgov wrote: > > Yep, I also can reproduce it easily. Can't say I'm familiar with this > code, but after looking through it a bit it seems something strange is > going on with a pruning step generated using prefix. Looks like it has > only expressions for the second attribute of the partition key, but > tries to compare with the full boundinfo. Without this pruning step the > query above doesn't crash for me. After some more investigation it seems that the issue is an empty prefix in the code mentioned in my previous email. When it's constructed BTGreaterStrategyNumber is excluded, which makes the logic somehow dependend on the order or clauses (e.g. the original reported case would be fine for a table with those two colums in other order), it looks strange for me. Not sure if the attached fix is correct, but it allows to avoid empty prefix, avoiding the crash, and passes the tests.
Attachment
On Wed, Jul 1, 2020 at 9:53 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > On Thu, Jun 18, 2020 at 06:44:24PM +0200, Dmitry Dolgov wrote: > > Yep, I also can reproduce it easily. Can't say I'm familiar with this > > code, but after looking through it a bit it seems something strange is > > going on with a pruning step generated using prefix. Looks like it has > > only expressions for the second attribute of the partition key, but > > tries to compare with the full boundinfo. Without this pruning step the > > query above doesn't crash for me. > > After some more investigation it seems that the issue is an empty prefix > in the code mentioned in my previous email. When it's constructed > BTGreaterStrategyNumber is excluded, which makes the logic somehow > dependend on the order or clauses (e.g. the original reported case would > be fine for a table with those two colums in other order), it looks > strange for me. Not sure if the attached fix is correct, but it allows > to avoid empty prefix, avoiding the crash, and passes the tests. Thanks for the patch, Dmitry! It applies cleanly and make check passes successfully. I'l look at the patch more closely tomorrow. Sorry for the late response. I was really busy with other jobs. Best regards, Etsuro Fujita
On Thu, Jul 2, 2020 at 8:52 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > I'll look at the patch more closely tomorrow. I spent some time reviewing the patch, and noticed that the changes to gen_prune_steps_from_opexps() break the logic in get_matching_range_bounds(), causing another issue. Here is an example: postgres=# create table prefix_test (a int, b varchar) partition by range (a, b); postgres=# create table prefix_test1 partition of prefix_test for values from (1, 'a') to (1, 'b'); postgres=# create table prefix_test2 partition of prefix_test for values from (2, 'a') to (2, 'b'); postgres=# set enable_partition_pruning to on; postgres=# explain select * from prefix_test where a <= 2 and b = 'a'; QUERY PLAN -------------------------------------------------------------------------- Seq Scan on prefix_test2 prefix_test (cost=0.00..29.05 rows=2 width=36) Filter: ((a <= 2) AND ((b)::text = 'a'::text)) (2 rows) Will do a bit more investigation about this. Best regards, Etsuro Fujita
On Sat, Jul 4, 2020 at 2:35 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > I spent some time reviewing the patch, and noticed that the changes to > gen_prune_steps_from_opexps() break the logic in > get_matching_range_bounds(), causing another issue. Here is an > example: > > postgres=# create table prefix_test (a int, b varchar) partition by > range (a, b); > postgres=# create table prefix_test1 partition of prefix_test for > values from (1, 'a') to (1, 'b'); > postgres=# create table prefix_test2 partition of prefix_test for > values from (2, 'a') to (2, 'b'); > postgres=# set enable_partition_pruning to on; > > postgres=# explain select * from prefix_test where a <= 2 and b = 'a'; > QUERY PLAN > -------------------------------------------------------------------------- > Seq Scan on prefix_test2 prefix_test (cost=0.00..29.05 rows=2 width=36) > Filter: ((a <= 2) AND ((b)::text = 'a'::text)) > (2 rows) > > Will do a bit more investigation about this. I think get_matching_range_bounds() assumes that if opstrategy (ie, the strategy of the last expression of the given lookup key) is the = strategy, all the expressions of the lookup key have the = strategy. I’m not sure we can extend that function to support cases where preceding expressions have strategies other than the = strategy like a <= 2 and b = ’a’ in the above (or a <= 1 and b = ’a’ in your patch), but if so, doing that seems to me more like an improvement than a fix. I think a simple fix for this issue would be to just give up on generating pruning steps if prefix contains no clauses, like the attached. Best regards, Etsuro Fujita
Attachment
> On Sun, Jul 05, 2020 at 04:45:40PM +0900, Etsuro Fujita wrote: > On Sat, Jul 4, 2020 at 2:35 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > I spent some time reviewing the patch, and noticed that the changes to > > gen_prune_steps_from_opexps() break the logic in > > get_matching_range_bounds(), causing another issue. Here is an > > example: > > > > postgres=# create table prefix_test (a int, b varchar) partition by > > range (a, b); > > postgres=# create table prefix_test1 partition of prefix_test for > > values from (1, 'a') to (1, 'b'); > > postgres=# create table prefix_test2 partition of prefix_test for > > values from (2, 'a') to (2, 'b'); > > postgres=# set enable_partition_pruning to on; > > > > postgres=# explain select * from prefix_test where a <= 2 and b = 'a'; > > QUERY PLAN > > -------------------------------------------------------------------------- > > Seq Scan on prefix_test2 prefix_test (cost=0.00..29.05 rows=2 width=36) > > Filter: ((a <= 2) AND ((b)::text = 'a'::text)) > > (2 rows) > > > > Will do a bit more investigation about this. > > I think get_matching_range_bounds() assumes that if opstrategy (ie, > the strategy of the last expression of the given lookup key) is the = > strategy, all the expressions of the lookup key have the = strategy. > I’m not sure we can extend that function to support cases where > preceding expressions have strategies other than the = strategy like a > <= 2 and b = ’a’ in the above (or a <= 1 and b = ’a’ in your patch), > but if so, doing that seems to me more like an improvement than a fix. > I think a simple fix for this issue would be to just give up on > generating pruning steps if prefix contains no clauses, like the > attached. Yes, I agree. Not generating any pruning steps if prefix has no clauses was my first idea, but looking at attached patch I've apparently missed one part in the implementation and was under the false impression it wouldn't work.
On Mon, Jul 6, 2020 at 1:55 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > On Sun, Jul 05, 2020 at 04:45:40PM +0900, Etsuro Fujita wrote: > > I think a simple fix for this issue would be to just give up on > > generating pruning steps if prefix contains no clauses, like the > > attached. > > Yes, I agree. Not generating any pruning steps if prefix has no clauses > was my first idea, but looking at attached patch I've apparently missed > one part in the implementation and was under the false impression it > wouldn't work. I think the fix I proposed handles both cases reported in BUG #16500 and BUG #16501, but I noticed there is a case that the fix doesn’t cover. Here is an example: postgres=# create table rp_prefix_test2 (a int, b int, c int) partition by range (a, b, c); postgres=# create table rp_prefix_test2_p1 partition of rp_prefix_test2 for values from (1, 1, 0) to (1, 1, 10); postgres=# create table rp_prefix_test2_p2 partition of rp_prefix_test2 for values from (2, 2, 0) to (2, 2, 10); postgres=# select * from rp_prefix_test2 where a <= 1 and b = 1 and c >= 0; Even with the fix, this fails an assertion defined in get_steps_using_prefix(). The reason for that is: that function assumes that the passed-in prefix contains at least one clause for each earlier partition key, but the caller (i.e., gen_prune_steps_from_opexps()) doesn’t take it into account. In the example, get_steps_using_prefix() is called with c >= 0 plus b = 1 as a prefix, which doesn’t contain a clause for the first partition key "a", breaking that assumption, which leads to the error. To fix, I modified the previous patch further so that get_steps_using_prefix() is called only when prefix created contains clauses for each earlier partition key. While at it, I simplified the preprocessing to sort out btree clauses, because the preprocessing would not be useful anymore, and because keeping the preprocessing as-is would only reduce the chance of getting more partition pruning steps. Attached is an updated version of the patch. Maybe I'm missing something, but I suspect that hash partitioning would also have a similar issue, but I didn’t look at it closely. Will do. Thanks for reviewing! Best regards, Etsuro Fujita
Attachment
On Wed, Jul 8, 2020 at 3:15 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > Maybe I'm missing something, but I > suspect that hash partitioning would also have a similar issue, Here is an example causing the same assertion failure: create table hp_prefix_test (a int, b int, c int, d int) partition by hash (a part_test_int4_ops, b part_test_int4_ops, c part_test_int4_ops, d part_test_int4_ops); create table hp_prefix_test_p1 partition of hp_prefix_test for values with (modulus 2, remainder 0); create table hp_prefix_test_p2 partition of hp_prefix_test for values with (modulus 2, remainder 1); explain (costs off) select * from hp_prefix_test where a = 1 and b is null and c = 1 and d = 1; where part_test_int4_ops is borrowed from insert.sql. For hash partitioning, I think prefix is allowed to contain no clauses for all/any of earlier partition keys unlike range partitioning, so I modified the assertion test to avoid the failure. While working on it, I noticed there is yet another issue in generating pruning steps. This is the comment for get_steps_using_prefix(): * To generate steps, step_lastexpr and step_lastcmpfn are appended to * expressions and cmpfns, respectively, extracted from the clauses in * 'prefix'. Actually, since 'prefix' may contain multiple clauses for the * same partition key column, we must generate steps for various combinations * of the clauses of different keys. But part of that function assumes that prefix contains at most one clause for each of middle partition keys, which causes the same assertion failure when there are multiple clauses for the middle partition keys in prefix. Here is an example using range partitioning causing the failure: create table rp_prefix_test3 (a int, b int, c int, d int) partition by range(a, b, c, d); create table rp_prefix_test3_p1 partition of rp_prefix_test3 for values from (1, 1, 1, 0) to (1, 1, 1, 10); create table rp_prefix_test3_p2 partition of rp_prefix_test3 for values from (2, 2, 2, 0) to (2, 2, 2, 10); explain (costs off) select * from rp_prefix_test3 where a >= 1 and b >= 1 and b >= 2 and c >= 2 and d >= 0; To fix, I modified that function (precisely, get_steps_using_prefix_recurse()) to allow the middle partition keys also to have multiple clauses in prefix. Attached is an updated version of the patch. Best regards, Etsuro Fujita
Attachment
On Sun, Jul 12, 2020 at 9:55 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > Attached is an updated version of the patch. Here is a new version of the patch. Changes: * Fix a typo in a comment in get_steps_using_prefix_recurse() * Remove redundant regression test cases * Add a regression test comment * Add the commit message If there are no objections, I will commit this version. Best regards, Etsuro Fujita
Attachment
> On Wed, Jul 15, 2020 at 04:35:52PM +0900, Etsuro Fujita wrote: > On Sun, Jul 12, 2020 at 9:55 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > Attached is an updated version of the patch. > > Here is a new version of the patch. > > Changes: > * Fix a typo in a comment in get_steps_using_prefix_recurse() > * Remove redundant regression test cases > * Add a regression test comment > * Add the commit message > > If there are no objections, I will commit this version. Thanks for the patch! Sorry, forgot to follow up in time. To the best of my knowledge the patch looks good, I have just a couple of questions: + List *step_exprs1, + *step_cmpfns1; pc = lfirst(lc); if (pc->keyno == cur_keyno) { - /* clean up before starting a new recursion cycle. */ - if (cur_keyno == 0) - { - list_free(step_exprs); - list_free(step_cmpfns); - step_exprs = list_make1(pc->expr); - step_cmpfns = list_make1_oid(pc->cmpfn); - } - else - { - step_exprs = lappend(step_exprs, pc->expr); - step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn); - } + /* Leave the original step_exprs unmodified. */ + step_exprs1 = list_copy(step_exprs); + step_exprs1 = lappend(step_exprs1, pc->expr); + + /* Leave the original step_cmpfns unmodified. */ + step_cmpfns1 = list_copy(step_cmpfns); + step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn); Does this also belong to some of mentioned fixes? I'm curious if there could be memory concumption concerns, since it's done within a recursive function. - Assert(list_length(step_exprs) == cur_keyno); + Assert(list_length(step_exprs) == cur_keyno || + (context->rel->part_scheme->strategy == + PARTITION_STRATEGY_HASH && + step_opstrategy == HTEqualStrategyNumber && + !bms_is_empty(step_nullkeys) && + bms_num_members(step_nullkeys) + list_length(step_exprs) + 2 == + context->rel->part_scheme->partnatts)); I see the explanation in the commit message, but this assert condition is quite complex, maybe worth couple of commentary lines right there?
On Wed, Jul 15, 2020 at 7:00 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > I have just a couple of questions: > > + List *step_exprs1, > + *step_cmpfns1; > > pc = lfirst(lc); > if (pc->keyno == cur_keyno) > { > - /* clean up before starting a new recursion cycle. */ > - if (cur_keyno == 0) > - { > - list_free(step_exprs); > - list_free(step_cmpfns); > - step_exprs = list_make1(pc->expr); > - step_cmpfns = list_make1_oid(pc->cmpfn); > - } > - else > - { > - step_exprs = lappend(step_exprs, pc->expr); > - step_cmpfns = lappend_oid(step_cmpfns, pc->cmpfn); > - } > + /* Leave the original step_exprs unmodified. */ > + step_exprs1 = list_copy(step_exprs); > + step_exprs1 = lappend(step_exprs1, pc->expr); > + > + /* Leave the original step_cmpfns unmodified. */ > + step_cmpfns1 = list_copy(step_cmpfns); > + step_cmpfns1 = lappend_oid(step_cmpfns1, pc->cmpfn); > > Does this also belong to some of mentioned fixes? Yes, the above change is made to address this issue mentioned in the commit message in the attached, which is an updated version of the patch: * The list to pass to get_steps_using_prefix() is allowed to contain multiple clauses for the same partition key, as described in the comment for that function, but that function actually assumed that the list contained just a single clause for each of middle partition keys, which led to an assertion failure when the list contained multiple clauses for such partition keys. Update that function to match the comment. > I'm curious if there > could be memory concumption concerns, since it's done within a recursive > function. I didn't think so because I thought that in practice, the number of partition keys, the number of clauses for the same partition key, and thus the number of combinations of clauses for partition keys are small, but yes, it's a good thing to save memory, so I added the list_free() calls to the above change. > - Assert(list_length(step_exprs) == cur_keyno); > + Assert(list_length(step_exprs) == cur_keyno || > + (context->rel->part_scheme->strategy == > + PARTITION_STRATEGY_HASH && > + step_opstrategy == HTEqualStrategyNumber && > + !bms_is_empty(step_nullkeys) && > + bms_num_members(step_nullkeys) + list_length(step_exprs) + 2 == > + context->rel->part_scheme->partnatts)); > > I see the explanation in the commit message, but this assert condition > is quite complex, maybe worth couple of commentary lines right there? OK, to improve the readability, I split the assertion test into three, and added comments. One thing I noticed related to the assertion test is that there are still cases where we fail to pass the "bms_num_members(step_nullkeys) + list_length(step_exprs) + 2 == context->rel->part_scheme->partnatts" test. Here is an example (Here, I use a custom operator === to prevent the equivclass.c machinery from reordering clauses. See the attached for the definition of it): create table hp_contradict_test (a int, b int) partition by hash (a part_test_int4_ops2, b part_test_int4_ops2); create table hp_contradict_test_p1 partition of hp_contradict_test for values with (modulus 2, remainder 0); create table hp_contradict_test_p2 partition of hp_contradict_test for values with (modulus 2, remainder 1); explain (costs off) select * from hp_contradict_test where a is null and a === 1 and b === 1; QUERY PLAN -------------------------- Result One-Time Filter: false (2 rows) This works fine, BUT: explain (costs off) select * from hp_contradict_test where a === 1 and b === 1 and a is null; this causes the failure to pass the mentioned test because for the latter query, we fail to detect self-contradiction from "a === 1" and "a is null", and then perform get_steps_using_prefix() with prefix containing "a === 1" and step_nullkeys containing the partition key "a", causing the failure. Fortunately, that doesn't cause any issue on a normal build, as the self-contradiction is detected later in the query processing. I fixed this issue as well in the attached, by adding to get_partprune_steps() a little bit of code to detect the self-contradiction. Thanks for the review! Sorry for the late response. Best regards, Etsuro Fujita
Attachment
On Fri, Jul 24, 2020 at 11:45 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > which is an updated version of the > patch: There seems to be no objections from Dmitry (or anyone else), so I committed the patch after tweaking/fixing some comments and fixing typos in the commit message. Best regards, Etsuro Fujita