Thread: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table
BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 15672 Logged by: Jianing Yang Email address: jianingy.yang@gmail.com PostgreSQL version: 11.2 Operating system: Ubuntu 18.10 Description: Reproduce steps: 1. create a partition table with the following constraints a. with a unique key on partition key and a varchar type field b. using hash partition 2. alter the length of the varchar type field 3. drop the partition table in a transaction 4. crash Screenshot: → pgcli -h localhost -p 5555 -U postgres -W Server: PostgreSQL 11.2 Version: 2.0.1 Chat: https://gitter.im/dbcli/pgcli Mail: https://groups.google.com/forum/#!forum/pgcli Home: http://pgcli.com postgres@localhost:postgres> show server_version; +-------------------------------+ | server_version | |-------------------------------| | 11.2 (Debian 11.2-1.pgdg90+1) | +-------------------------------+ SHOW Time: 0.011s postgres@localhost:postgres> create table users(user_id int, name varchar (64), unique (user_id, name)) partition by hash(user_id); CREATE TABLE Time: 0.004s postgres@localhost:postgres> create table users_000 partition of users for values with (modulus 2, remainder 0); CREATE TABLE Time: 0.012s postgres@localhost:postgres> create table users_001 partition of users for values with (modulus 2, remainder 1); CREATE TABLE Time: 0.012s postgres@localhost:postgres> alter table users alter column name type varchar(127); You're about to run a destructive command. Do you want to proceed? (y/n): y Your call! ALTER TABLE Time: 0.007s postgres@localhost:postgres> \d users; +----------+------------------------+-------------+ | Column | Type | Modifiers | |----------+------------------------+-------------| | user_id | integer | | | name | character varying(127) | | +----------+------------------------+-------------+ Indexes: "users_user_id_name_key" UNIQUE CONSTRAINT, btree (user_id, name) Partition key: HASH (user_id) Number of partitions 2: (Use \d+ to list them.) Time: 0.012s postgres@localhost:postgres> begin; BEGIN Time: 0.001s postgres@localhost:postgres> drop table users; You're about to run a destructive command. Do you want to proceed? (y/n): y Your call! DROP TABLE Time: 0.002s postgres@localhost:postgres> commit; Connection reset. Reconnect (Y/n): Server Log: 2019-03-06 14:59:26.101 UTC [61] ERROR: SMgrRelation hashtable corrupted 2019-03-06 14:59:26.101 UTC [61] STATEMENT: commit 2019-03-06 14:59:26.101 UTC [61] WARNING: AbortTransaction while in COMMIT state 2019-03-06 14:59:26.101 UTC [61] PANIC: cannot abort transaction 573, it was already committed 2019-03-06 14:59:26.178 UTC [1] LOG: server process (PID 61) was terminated by signal 6: Aborted 2019-03-06 14:59:26.178 UTC [1] DETAIL: Failed process was running: commit 2019-03-06 14:59:26.178 UTC [1] LOG: terminating any other active server processes 2019-03-06 14:59:26.178 UTC [58] WARNING: terminating connection because of crash of another server process 2019-03-06 14:59:26.178 UTC [58] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2019-03-06 14:59:26.178 UTC [58] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2019-03-06 14:59:26.179 UTC [1] LOG: all server processes terminated; reinitializing 2019-03-06 14:59:26.186 UTC [68] LOG: database system was interrupted; last known up at 2019-03-06 14:58:30 UTC 2019-03-06 14:59:26.212 UTC [68] LOG: database system was not properly shut down; automatic recovery in progress 2019-03-06 14:59:26.214 UTC [68] LOG: redo starts at 0/1650278 2019-03-06 14:59:26.215 UTC [68] FATAL: SMgrRelation hashtable corrupted 2019-03-06 14:59:26.215 UTC [68] CONTEXT: WAL redo at 0/16768C8 for Transaction/COMMIT: 2019-03-06 14:59:26.099739+00; rels: base/13067/16389 base/13067/16387 base/13067/16394 base/13067/16387; inval msgs: catcache 41 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 50 catcache 49 catcache 19 catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 50 catcache 49 catcache 74 catcache 73 catcache 74 catcache 73 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 50 catcache 49 catcache 19 catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 50 catcache 49 catcache 74 catcache 73 catcache 74 catcache 73 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 7 catcache 6 catcache 50 catcache 49 catcache 19 catcache 32 catcache 7 catcache 6 catcache 7 catcache 6 catcache 50 catcache 49 catcache 74 catcache 73 catcache 74 catcache 73 relcache 16384 snapshot 2608 snapshot 2608 relcache 16399 relcache 16384 snapshot 2608 snapshot 2608 snapshot 2608 relcache 16389 relcache 16384 snapshot 2608 snapshot 2608 relcache 16401 relcache 16389 snapshot 2608 snapshot 2608 snapshot 2608 relcache 16394 relcache 16384 snapshot 2608 snapshot 2608 relcache 16403 relcache 16394 snapshot 2608 snapshot 2608 snapshot 2608 2019-03-06 14:59:26.216 UTC [1] LOG: startup process (PID 68) exited with exit code 1 2019-03-06 14:59:26.216 UTC [1] LOG: aborting startup due to startup process failure 2019-03-06 14:59:26.217 UTC [1] LOG: database system is shut down Affected Server Version: 11.1 11.2
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Michael Paquier
Date:
On Wed, Mar 06, 2019 at 03:06:53PM +0000, PG Bug reporting form wrote: > 1. create a partition table with the following constraints > a. with a unique key on partition key and a varchar type field > b. using hash partition > 2. alter the length of the varchar type field > 3. drop the partition table in a transaction > 4. crash I can reproduce the failure easily, not on HEAD but with REL_11_STABLE: (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f585729b535 in __GI_abort () at abort.c:79 #2 0x000055eef597e60a in errfinish (dummy=0) at elog.c:555 #3 0x000055eef5980c50 in elog_finish (elevel=22, fmt=0x55eef5a41408 "cannot abort transaction %u, it was already committed") at elog.c:1376 #4 0x000055eef5479647 in RecordTransactionAbort (isSubXact=false) at xact.c:1580 #5 0x000055eef547a6c0 in AbortTransaction () at xact.c:2602 #6 0x000055eef547aef4 in AbortCurrentTransaction () at xact.c:3104 That's worth an investigation, SMgrRelationHash is getting messed up which causes the transaction commit to fail where it should not. -- Michael
Attachment
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
Hi, On 2019/03/07 9:03, Michael Paquier wrote: > On Wed, Mar 06, 2019 at 03:06:53PM +0000, PG Bug reporting form wrote: >> 1. create a partition table with the following constraints >> a. with a unique key on partition key and a varchar type field >> b. using hash partition >> 2. alter the length of the varchar type field >> 3. drop the partition table in a transaction >> 4. crash > > I can reproduce the failure easily, not on HEAD but with > REL_11_STABLE: Same here. I could reproduce it with 11.0. > (gdb) bt > #0 __GI_raise (sig=sig@entry=6) at > ../sysdeps/unix/sysv/linux/raise.c:50 > #1 0x00007f585729b535 in __GI_abort () at abort.c:79 > #2 0x000055eef597e60a in errfinish (dummy=0) at elog.c:555 > #3 0x000055eef5980c50 in elog_finish (elevel=22, fmt=0x55eef5a41408 > "cannot abort transaction %u, it was already committed") at > elog.c:1376 > #4 0x000055eef5479647 in RecordTransactionAbort (isSubXact=false) at > xact.c:1580 > #5 0x000055eef547a6c0 in AbortTransaction () at xact.c:2602 > #6 0x000055eef547aef4 in AbortCurrentTransaction () at xact.c:3104 > > That's worth an investigation, SMgrRelationHash is getting messed up > which causes the transaction commit to fail where it should not. Looking at what was causing the SMgrRelationHash corruption, it seems there were entries with same/duplicated relnode value in pendingDeletes list. The problem start when ALTER TABLE users ALTER COLUMN is executed. create table users(user_id int, name varchar(64), unique (user_id, name)) partition by list(user_id); create table users_000 partition of users for values in(0); create table users_001 partition of users for values in(1); select relname, relfilenode from pg_class where relname like 'users%'; relname │ relfilenode ────────────────────────────┼───────────── users │ 16441 users_000 │ 16446 users_000_user_id_name_key │ 16449 users_001 │ 16451 users_001_user_id_name_key │ 16454 users_user_id_name_key │ 16444 (6 rows) alter table users alter column name type varchar(127); select relname, relfilenode from pg_class where relname like 'users%'; relname │ relfilenode ────────────────────────────┼───────────── users │ 16441 users_000 │ 16446 users_000_user_id_name_key │ 16444 <=== duplicated users_001 │ 16451 users_001_user_id_name_key │ 16444 <=== duplicated users_user_id_name_key │ 16444 <=== duplicated (6 rows) Ran out off time... Thanks, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On Thu, Mar 7, 2019 at 11:17 AM Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > The problem start when ALTER TABLE users ALTER COLUMN is executed. > create table users(user_id int, name varchar(64), unique (user_id, name)) > partition by list(user_id); > > create table users_000 partition of users for values in(0); > create table users_001 partition of users for values in(1); > select relname, relfilenode from pg_class where relname like 'users%'; > relname │ relfilenode > ────────────────────────────┼───────────── > users │ 16441 > users_000 │ 16446 > users_000_user_id_name_key │ 16449 > users_001 │ 16451 > users_001_user_id_name_key │ 16454 > users_user_id_name_key │ 16444 > (6 rows) > > alter table users alter column name type varchar(127); > select relname, relfilenode from pg_class where relname like 'users%'; > relname │ relfilenode > ────────────────────────────┼───────────── > users │ 16441 > users_000 │ 16446 > users_000_user_id_name_key │ 16444 <=== duplicated > users_001 │ 16451 > users_001_user_id_name_key │ 16444 <=== duplicated > users_user_id_name_key │ 16444 <=== duplicated > (6 rows) I checked why users_000's and user_0001's indexes end up reusing users_user_id_name_key's relfilenode. At the surface, it's because DefineIndex(<parent's-index-to-be-recreated>) is carrying oldNode = <old-parents-index's-relfilenode> in IndexStmt, which is recursively passed down to DefineIndex(<child-indexes-to-be-recreated>). This DefineIndex() chain is running due to ATPostAlterTypeCleanup() on the parent rel. This surface problem may be solved in DefineIndex() by just resetting oldNode in each child IndexStmt before recursing, but that means child indexes are recreated with new relfilenodes. That solves the immediate problem of relfilenodes being wrongly duplicated, that's leading to madness such as SMgrRelationHash corruption being seen in the original bug report. But, the root problem seems to be that ATPostAlterTypeCleanup() on child tables isn't setting up their own DefineIndex(<child-index-to-be-rewritten>) step. That's because the parent's ATPostAlterTypeCleanup() dropped child copies of the UNIQUE constraint due to dependencies (+ CCI). So, ATExecAlterColumnType() on child relations isn't able to find the constraint on the individual child relations to turn into their own DefineIndex(<child-index-to-be-rewritten>). If we manage to handle each relation's ATPostAlterTypeCleanup() independently, child's recreated indexes will be able to reuse their old relfilenodes and everything will be fine. But maybe that will require significant overhaul of how this post-alter-type-cleanup occurs? Thanks, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/03/07 20:36, Amit Langote wrote: > On Thu, Mar 7, 2019 at 11:17 AM Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> The problem start when ALTER TABLE users ALTER COLUMN is executed. >> create table users(user_id int, name varchar(64), unique (user_id, name)) >> partition by list(user_id); >> >> create table users_000 partition of users for values in(0); >> create table users_001 partition of users for values in(1); >> select relname, relfilenode from pg_class where relname like 'users%'; >> relname │ relfilenode >> ────────────────────────────┼───────────── >> users │ 16441 >> users_000 │ 16446 >> users_000_user_id_name_key │ 16449 >> users_001 │ 16451 >> users_001_user_id_name_key │ 16454 >> users_user_id_name_key │ 16444 >> (6 rows) >> >> alter table users alter column name type varchar(127); >> select relname, relfilenode from pg_class where relname like 'users%'; >> relname │ relfilenode >> ────────────────────────────┼───────────── >> users │ 16441 >> users_000 │ 16446 >> users_000_user_id_name_key │ 16444 <=== duplicated >> users_001 │ 16451 >> users_001_user_id_name_key │ 16444 <=== duplicated >> users_user_id_name_key │ 16444 <=== duplicated >> (6 rows) > > I checked why users_000's and user_0001's indexes end up reusing > users_user_id_name_key's relfilenode. At the surface, it's because > DefineIndex(<parent's-index-to-be-recreated>) is carrying oldNode = > <old-parents-index's-relfilenode> in IndexStmt, which is recursively > passed down to DefineIndex(<child-indexes-to-be-recreated>). This > DefineIndex() chain is running due to ATPostAlterTypeCleanup() on the > parent rel. This surface problem may be solved in DefineIndex() by > just resetting oldNode in each child IndexStmt before recursing, but > that means child indexes are recreated with new relfilenodes. That > solves the immediate problem of relfilenodes being wrongly duplicated, > that's leading to madness such as SMgrRelationHash corruption being > seen in the original bug report. This doesn't happen in HEAD, because in HEAD we got 807ae415c5, which changed things so that partitioned relations always have their relfilenode set to 0. So, there's no question of parent's relfilenode being passed to children and hence being duplicated. But that also means child indexes are unnecessarily rewritten, that is, with new relfilenodes. > But, the root problem seems to be that ATPostAlterTypeCleanup() on > child tables isn't setting up their own > DefineIndex(<child-index-to-be-rewritten>) step. That's because the > parent's ATPostAlterTypeCleanup() dropped child copies of the UNIQUE > constraint due to dependencies (+ CCI). So, ATExecAlterColumnType() > on child relations isn't able to find the constraint on the individual > child relations to turn into their own > DefineIndex(<child-index-to-be-rewritten>). If we manage to handle > each relation's ATPostAlterTypeCleanup() independently, child's > recreated indexes will be able to reuse their old relfilenodes and > everything will be fine. But maybe that will require significant > overhaul of how this post-alter-type-cleanup occurs? We could try to solve this dividing ATPostAlterTypeCleanup processing into two functions: 1 The first one runs right after ATExecAlterColumnType() is finished for a given table (like it does today), which then runs ATPostAlterTypeParse to generate commands for constraints and/or indexes to re-add. This part won't drop the old constraints/indexes just yet, so child constraints/indexes will remain for ATExecAlterColumnType to see when executed for the children. 2. Dropping the old constraints/indexes is the responsibility of the 2nd part, which runs just before executing ATExecReAddIndex or ATExecAddConstraint (with is_readd = true), so that the new constraints don't collide with the existing ones. This arrangement allows step 1 to generate the commands to recreate even the child indexes such that the old relfilenode can be be preserved by setting IndexStmt.oldNode. Attached patch is a very rough sketch, which fails some regression tests, but I ran out of time today. Thanks, Amit
Attachment
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Michael Paquier
Date:
On Wed, Mar 27, 2019 at 11:56:20AM +0900, Michael Paquier wrote: > Adding it to the section for older bugs sounds fine to me. Thanks for > doing so. I have begun looking at this issue. Hopefully I'll be able to provide an update soon. -- Michael
Attachment
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Michael Paquier
Date:
On Wed, Mar 27, 2019 at 11:56:20AM +0900, Michael Paquier wrote: > Adding it to the section for older bugs sounds fine to me. Thanks for > doing so. I have begun looking at this issue. Hopefully I'll be able to provide an update soon. -- Michael
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/11 15:57, Michael Paquier wrote: > On Wed, Mar 27, 2019 at 11:56:20AM +0900, Michael Paquier wrote: >> Adding it to the section for older bugs sounds fine to me. Thanks for >> doing so. > > I have begun looking at this issue. Hopefully I'll be able to provide > an update soon. Great, thanks. Regards, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/11 15:57, Michael Paquier wrote: > On Wed, Mar 27, 2019 at 11:56:20AM +0900, Michael Paquier wrote: >> Adding it to the section for older bugs sounds fine to me. Thanks for >> doing so. > > I have begun looking at this issue. Hopefully I'll be able to provide > an update soon. Great, thanks. Regards, Amit
ISTM we could work around the problem with the attached, which I think is a good change independently of anything else. There is still an issue, which manifests in both 11 and HEAD, namely that the code also propagates the parent index's comment to any child indexes. You can see that with this extended test case: create table users(user_id int, name varchar(64), unique (user_id, name)) partition by hash(user_id); comment on index users_user_id_name_key is 'parent index'; create table users_000 partition of users for values with (modulus 2, remainder 0); create table users_001 partition of users for values with (modulus 2, remainder 1); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'users%'; alter table users alter column name type varchar(127); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'users%'; which gives me (in 11, with this patch) ... relname | relfilenode | obj_description ----------------------------+-------------+----------------- users | 89389 | users_000 | 89394 | users_000_user_id_name_key | 89397 | users_001 | 89399 | users_001_user_id_name_key | 89402 | users_user_id_name_key | 89392 | parent index (6 rows) ALTER TABLE relname | relfilenode | obj_description ----------------------------+-------------+----------------- users | 89389 | users_000 | 89394 | users_000_user_id_name_key | 89406 | parent index users_001 | 89399 | users_001_user_id_name_key | 89408 | parent index users_user_id_name_key | 89404 | parent index (6 rows) However, I doubt that that's bad enough to justify a major rewrite of the ALTER TABLE code in 11 ... and maybe not in HEAD either; I wouldn't be too unhappy to leave it to v13. regards, tom lane diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index f8ee4b0..c5c543f 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -252,8 +252,17 @@ CheckIndexCompatible(Oid oldId, if (!ret) return false; - /* For polymorphic opcintype, column type changes break compatibility. */ + /* We now need to actually look at the index rel. */ irel = index_open(oldId, AccessShareLock); /* caller probably has a lock */ + + /* If it's a partitioned index, there is no storage to share. */ + if (irel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + { + index_close(irel, NoLock); + return false; + } + + /* For polymorphic opcintype, column type changes break compatibility. */ for (i = 0; i < old_natts; i++) { if (IsPolymorphicType(get_opclass_input_type(classObjectId[i])) &&
ISTM we could work around the problem with the attached, which I think is a good change independently of anything else. There is still an issue, which manifests in both 11 and HEAD, namely that the code also propagates the parent index's comment to any child indexes. You can see that with this extended test case: create table users(user_id int, name varchar(64), unique (user_id, name)) partition by hash(user_id); comment on index users_user_id_name_key is 'parent index'; create table users_000 partition of users for values with (modulus 2, remainder 0); create table users_001 partition of users for values with (modulus 2, remainder 1); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'users%'; alter table users alter column name type varchar(127); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'users%'; which gives me (in 11, with this patch) ... relname | relfilenode | obj_description ----------------------------+-------------+----------------- users | 89389 | users_000 | 89394 | users_000_user_id_name_key | 89397 | users_001 | 89399 | users_001_user_id_name_key | 89402 | users_user_id_name_key | 89392 | parent index (6 rows) ALTER TABLE relname | relfilenode | obj_description ----------------------------+-------------+----------------- users | 89389 | users_000 | 89394 | users_000_user_id_name_key | 89406 | parent index users_001 | 89399 | users_001_user_id_name_key | 89408 | parent index users_user_id_name_key | 89404 | parent index (6 rows) However, I doubt that that's bad enough to justify a major rewrite of the ALTER TABLE code in 11 ... and maybe not in HEAD either; I wouldn't be too unhappy to leave it to v13. regards, tom lane diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index f8ee4b0..c5c543f 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -252,8 +252,17 @@ CheckIndexCompatible(Oid oldId, if (!ret) return false; - /* For polymorphic opcintype, column type changes break compatibility. */ + /* We now need to actually look at the index rel. */ irel = index_open(oldId, AccessShareLock); /* caller probably has a lock */ + + /* If it's a partitioned index, there is no storage to share. */ + if (irel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) + { + index_close(irel, NoLock); + return false; + } + + /* For polymorphic opcintype, column type changes break compatibility. */ for (i = 0; i < old_natts; i++) { if (IsPolymorphicType(get_opclass_input_type(classObjectId[i])) &&
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
Thanks for looking at this. On 2019/04/24 7:03, Tom Lane wrote: > ISTM we could work around the problem with the attached, which I think > is a good change independently of anything else. Agreed. > There is still an issue, which manifests in both 11 and HEAD, namely > that the code also propagates the parent index's comment to any child > indexes. You can see that with this extended test case: > > create table users(user_id int, name varchar(64), unique (user_id, name)) partition by hash(user_id); > comment on index users_user_id_name_key is 'parent index'; > create table users_000 partition of users for values with (modulus 2, remainder 0); > create table users_001 partition of users for values with (modulus 2, remainder 1); > > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'users%'; > alter table users alter column name type varchar(127); > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'users%'; > > which gives me (in 11, with this patch) > > ... > relname | relfilenode | obj_description > ----------------------------+-------------+----------------- > users | 89389 | > users_000 | 89394 | > users_000_user_id_name_key | 89397 | > users_001 | 89399 | > users_001_user_id_name_key | 89402 | > users_user_id_name_key | 89392 | parent index > (6 rows) > > ALTER TABLE > relname | relfilenode | obj_description > ----------------------------+-------------+----------------- > users | 89389 | > users_000 | 89394 | > users_000_user_id_name_key | 89406 | parent index > users_001 | 89399 | > users_001_user_id_name_key | 89408 | parent index > users_user_id_name_key | 89404 | parent index > (6 rows) This may be seen as slightly worse if the child indexes had their own comments, which would get overwritten by the parent's. create table pp (a int, b text, unique (a, b)) partition by list (a); create table pp1 partition of pp for values in (1); create table pp2 partition of pp for values in (2); comment on index pp_a_b_key is 'parent index'; comment on index pp1_a_b_key is 'child index 1'; comment on index pp2_a_b_key is 'child index 2'; select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp │ 16420 │ pp1 │ 16425 │ pp1_a_b_key │ 16428 │ child index 1 pp2 │ 16433 │ pp2_a_b_key │ 16436 │ child index 2 pp_a_b_key │ 16423 │ parent index (6 rows) alter table pp alter b type varchar(128); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp │ 16420 │ pp1 │ 16447 │ pp1_a_b_key │ 16450 │ parent index pp2 │ 16451 │ pp2_a_b_key │ 16454 │ parent index pp_a_b_key │ 16441 │ parent index (6 rows) > However, I doubt that that's bad enough to justify a major rewrite > of the ALTER TABLE code in 11 ... and maybe not in HEAD either; > I wouldn't be too unhappy to leave it to v13. Yeah, it's probably decent amount of code churn to undertake as a bug-fix. Thanks, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
Thanks for looking at this. On 2019/04/24 7:03, Tom Lane wrote: > ISTM we could work around the problem with the attached, which I think > is a good change independently of anything else. Agreed. > There is still an issue, which manifests in both 11 and HEAD, namely > that the code also propagates the parent index's comment to any child > indexes. You can see that with this extended test case: > > create table users(user_id int, name varchar(64), unique (user_id, name)) partition by hash(user_id); > comment on index users_user_id_name_key is 'parent index'; > create table users_000 partition of users for values with (modulus 2, remainder 0); > create table users_001 partition of users for values with (modulus 2, remainder 1); > > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'users%'; > alter table users alter column name type varchar(127); > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'users%'; > > which gives me (in 11, with this patch) > > ... > relname | relfilenode | obj_description > ----------------------------+-------------+----------------- > users | 89389 | > users_000 | 89394 | > users_000_user_id_name_key | 89397 | > users_001 | 89399 | > users_001_user_id_name_key | 89402 | > users_user_id_name_key | 89392 | parent index > (6 rows) > > ALTER TABLE > relname | relfilenode | obj_description > ----------------------------+-------------+----------------- > users | 89389 | > users_000 | 89394 | > users_000_user_id_name_key | 89406 | parent index > users_001 | 89399 | > users_001_user_id_name_key | 89408 | parent index > users_user_id_name_key | 89404 | parent index > (6 rows) This may be seen as slightly worse if the child indexes had their own comments, which would get overwritten by the parent's. create table pp (a int, b text, unique (a, b)) partition by list (a); create table pp1 partition of pp for values in (1); create table pp2 partition of pp for values in (2); comment on index pp_a_b_key is 'parent index'; comment on index pp1_a_b_key is 'child index 1'; comment on index pp2_a_b_key is 'child index 2'; select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp │ 16420 │ pp1 │ 16425 │ pp1_a_b_key │ 16428 │ child index 1 pp2 │ 16433 │ pp2_a_b_key │ 16436 │ child index 2 pp_a_b_key │ 16423 │ parent index (6 rows) alter table pp alter b type varchar(128); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp │ 16420 │ pp1 │ 16447 │ pp1_a_b_key │ 16450 │ parent index pp2 │ 16451 │ pp2_a_b_key │ 16454 │ parent index pp_a_b_key │ 16441 │ parent index (6 rows) > However, I doubt that that's bad enough to justify a major rewrite > of the ALTER TABLE code in 11 ... and maybe not in HEAD either; > I wouldn't be too unhappy to leave it to v13. Yeah, it's probably decent amount of code churn to undertake as a bug-fix. Thanks, Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2019/04/24 7:03, Tom Lane wrote: >> ISTM we could work around the problem with the attached, which I think >> is a good change independently of anything else. > Agreed. After thinking about it more, it seems like a bad idea to put the check in CheckIndexCompatible; that could interfere with any other use of the function to match up potential child indexes. (I wonder why this test isn't the same as what DefineIndex uses to spot potential child indexes, BTW --- it uses a completely separate function CompareIndexInfo, which seems both wasteful and trouble waiting to happen.) So I think we should test the relkind in TryReuseIndex, instead. I think it would be a good idea to *also* do what you suggested upthread and have DefineIndex clear the field when cloning an IndexStmt to create a child; no good could possibly come of passing that down when we intend to create a new index. In short, I think the right short-term fix is as attached (plus a new regression test case based on the submitted example). Longer term, it's clearly bad that we fail to reuse child indexes in this scenario; the point about mangled comments is minor by comparison. I'm inclined to think that what we want to do is *not* recurse when creating the parent index, but to initially make it NOT VALID, and then do ALTER ATTACH PARTITION with each re-used child index. This would successfully reproduce the previous state in case only some of the partitions have attached indexes, which I don't think works correctly right now. BTW, I hadn't ever looked closely at what the index reuse code does, and now that I have, my heart just sinks. I think that logic needs to be nuked from orbit. RelationPreserveStorage was never meant to be abused in this way; I invite you to contemplate whether it's not a problem that it doesn't check the backend and nestLevel fields of PendingRelDelete entries before killing them. (In its originally-designed use for mapped rels at transaction end, that wasn't a problem, but I'm afraid that it may be one now.) The right way to do this IMO would be something closer to the heap-swap logic in cluster.c, where we exchange the relfilenodes of two live indexes, rather than what is happening now. Or for that matter, do we really need to delete the old indexes at all? None of that looks very practical for v12, let alone back-patching to v11, though. regards, tom lane diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index a1c91b5..002a8b8 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -1125,6 +1125,18 @@ DefineIndex(Oid relationId, ListCell *lc; /* + * We can't use the same index name for the child index, + * so clear idxname to let the recursive invocation choose + * a new name. Likewise, the existing target relation + * field is wrong, and if indexOid or oldNode are set, + * they mustn't be applied to the child either. + */ + childStmt->idxname = NULL; + childStmt->relation = NULL; + childStmt->indexOid = InvalidOid; + childStmt->oldNode = InvalidOid; + + /* * Adjust any Vars (both in expressions and in the index's * WHERE clause) to match the partition's column numbering * in case it's different from the parent's. @@ -1155,8 +1167,6 @@ DefineIndex(Oid relationId, if (found_whole_row) elog(ERROR, "cannot convert whole-row table reference"); - childStmt->idxname = NULL; - childStmt->relation = NULL; DefineIndex(childRelid, childStmt, InvalidOid, /* no predefined OID */ indexRelationId, /* this is our child */ diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index aa7328e..66f863f 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -11454,7 +11454,9 @@ TryReuseIndex(Oid oldId, IndexStmt *stmt) { Relation irel = index_open(oldId, NoLock); - stmt->oldNode = irel->rd_node.relNode; + /* If it's a partitioned index, there is no storage to share. */ + if (irel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX) + stmt->oldNode = irel->rd_node.relNode; index_close(irel, NoLock); } } diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index f8ee4b0..4b7520b 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -1001,6 +1001,19 @@ DefineIndex(Oid relationId, ListCell *lc; /* + * We can't use the same index name for the child index, + * so clear idxname to let the recursive invocation choose + * a new name. Likewise, the existing target relation + * field is wrong, and if indexOid or oldNode are set, + * they mustn't be applied to the child either. + */ + childStmt->idxname = NULL; + childStmt->relation = NULL; + childStmt->relationId = childRelid; + childStmt->indexOid = InvalidOid; + childStmt->oldNode = InvalidOid; + + /* * Adjust any Vars (both in expressions and in the index's * WHERE clause) to match the partition's column numbering * in case it's different from the parent's. @@ -1031,8 +1044,6 @@ DefineIndex(Oid relationId, if (found_whole_row) elog(ERROR, "cannot convert whole-row table reference"); - childStmt->idxname = NULL; - childStmt->relationId = childRelid; DefineIndex(childRelid, childStmt, InvalidOid, /* no predefined OID */ indexRelationId, /* this is our child */ diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 65ede33..865fc42 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -10696,7 +10696,9 @@ TryReuseIndex(Oid oldId, IndexStmt *stmt) { Relation irel = index_open(oldId, NoLock); - stmt->oldNode = irel->rd_node.relNode; + /* If it's a partitioned index, there is no storage to share. */ + if (irel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX) + stmt->oldNode = irel->rd_node.relNode; index_close(irel, NoLock); } }
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2019/04/24 7:03, Tom Lane wrote: >> ISTM we could work around the problem with the attached, which I think >> is a good change independently of anything else. > Agreed. After thinking about it more, it seems like a bad idea to put the check in CheckIndexCompatible; that could interfere with any other use of the function to match up potential child indexes. (I wonder why this test isn't the same as what DefineIndex uses to spot potential child indexes, BTW --- it uses a completely separate function CompareIndexInfo, which seems both wasteful and trouble waiting to happen.) So I think we should test the relkind in TryReuseIndex, instead. I think it would be a good idea to *also* do what you suggested upthread and have DefineIndex clear the field when cloning an IndexStmt to create a child; no good could possibly come of passing that down when we intend to create a new index. In short, I think the right short-term fix is as attached (plus a new regression test case based on the submitted example). Longer term, it's clearly bad that we fail to reuse child indexes in this scenario; the point about mangled comments is minor by comparison. I'm inclined to think that what we want to do is *not* recurse when creating the parent index, but to initially make it NOT VALID, and then do ALTER ATTACH PARTITION with each re-used child index. This would successfully reproduce the previous state in case only some of the partitions have attached indexes, which I don't think works correctly right now. BTW, I hadn't ever looked closely at what the index reuse code does, and now that I have, my heart just sinks. I think that logic needs to be nuked from orbit. RelationPreserveStorage was never meant to be abused in this way; I invite you to contemplate whether it's not a problem that it doesn't check the backend and nestLevel fields of PendingRelDelete entries before killing them. (In its originally-designed use for mapped rels at transaction end, that wasn't a problem, but I'm afraid that it may be one now.) The right way to do this IMO would be something closer to the heap-swap logic in cluster.c, where we exchange the relfilenodes of two live indexes, rather than what is happening now. Or for that matter, do we really need to delete the old indexes at all? None of that looks very practical for v12, let alone back-patching to v11, though. regards, tom lane diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index a1c91b5..002a8b8 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -1125,6 +1125,18 @@ DefineIndex(Oid relationId, ListCell *lc; /* + * We can't use the same index name for the child index, + * so clear idxname to let the recursive invocation choose + * a new name. Likewise, the existing target relation + * field is wrong, and if indexOid or oldNode are set, + * they mustn't be applied to the child either. + */ + childStmt->idxname = NULL; + childStmt->relation = NULL; + childStmt->indexOid = InvalidOid; + childStmt->oldNode = InvalidOid; + + /* * Adjust any Vars (both in expressions and in the index's * WHERE clause) to match the partition's column numbering * in case it's different from the parent's. @@ -1155,8 +1167,6 @@ DefineIndex(Oid relationId, if (found_whole_row) elog(ERROR, "cannot convert whole-row table reference"); - childStmt->idxname = NULL; - childStmt->relation = NULL; DefineIndex(childRelid, childStmt, InvalidOid, /* no predefined OID */ indexRelationId, /* this is our child */ diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index aa7328e..66f863f 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -11454,7 +11454,9 @@ TryReuseIndex(Oid oldId, IndexStmt *stmt) { Relation irel = index_open(oldId, NoLock); - stmt->oldNode = irel->rd_node.relNode; + /* If it's a partitioned index, there is no storage to share. */ + if (irel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX) + stmt->oldNode = irel->rd_node.relNode; index_close(irel, NoLock); } } diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index f8ee4b0..4b7520b 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -1001,6 +1001,19 @@ DefineIndex(Oid relationId, ListCell *lc; /* + * We can't use the same index name for the child index, + * so clear idxname to let the recursive invocation choose + * a new name. Likewise, the existing target relation + * field is wrong, and if indexOid or oldNode are set, + * they mustn't be applied to the child either. + */ + childStmt->idxname = NULL; + childStmt->relation = NULL; + childStmt->relationId = childRelid; + childStmt->indexOid = InvalidOid; + childStmt->oldNode = InvalidOid; + + /* * Adjust any Vars (both in expressions and in the index's * WHERE clause) to match the partition's column numbering * in case it's different from the parent's. @@ -1031,8 +1044,6 @@ DefineIndex(Oid relationId, if (found_whole_row) elog(ERROR, "cannot convert whole-row table reference"); - childStmt->idxname = NULL; - childStmt->relationId = childRelid; DefineIndex(childRelid, childStmt, InvalidOid, /* no predefined OID */ indexRelationId, /* this is our child */ diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 65ede33..865fc42 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -10696,7 +10696,9 @@ TryReuseIndex(Oid oldId, IndexStmt *stmt) { Relation irel = index_open(oldId, NoLock); - stmt->oldNode = irel->rd_node.relNode; + /* If it's a partitioned index, there is no storage to share. */ + if (irel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX) + stmt->oldNode = irel->rd_node.relNode; index_close(irel, NoLock); } }
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/25 8:27, Tom Lane wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> On 2019/04/24 7:03, Tom Lane wrote: >>> ISTM we could work around the problem with the attached, which I think >>> is a good change independently of anything else. > >> Agreed. > > After thinking about it more, it seems like a bad idea to put the check > in CheckIndexCompatible; that could interfere with any other use of the > function to match up potential child indexes. CheckIndexCompatible() seems to be invented solely for the purposes of post-ALTER-COLUMN-TYPE, which is what I get from the header comment of that function: * This is tailored to the needs of ALTER TABLE ALTER TYPE, which recreates * any indexes that depended on a changing column from their pg_get_indexdef * or pg_get_constraintdef definitions. We omit some of the sanity checks of * DefineIndex. We assume that the old and new indexes have the same number * of columns and that if one has an expression column or predicate, both do. * Errors arising from the attribute list still apply. Given that, it may have been better to keep the function local to tablecmds.c to avoid potential misuse; I see no other callers of this function beside TryReuseIndex(), which in turn is only used by post-ALTER-COLUMN-TYPE processing. > (I wonder why this test > isn't the same as what DefineIndex uses to spot potential child indexes, > BTW --- it uses a completely separate function CompareIndexInfo, which > seems both wasteful and trouble waiting to happen.) I don't think that CheckIndexCompatible's job is to spot child indexes compatible with a given parent index; that's CompareIndexInfo's job. The latter was invented for the partition index DDL code, complete with provisions to do mapping between parent and child attributes before matching if their TupleDescs are different. CheckIndexCompatible, as mentioned above, is to do the errands of ATPostAlterTypeParse(). /* * CheckIndexCompatible * Determine whether an existing index definition is compatible with a * prospective index definition, such that the existing index storage * could become the storage of the new index, avoiding a rebuild. So, maybe a bigger (different?) charter than CompareIndexInfo's, or so I think. Although, they may be able share the code. > So I think we should test the relkind in TryReuseIndex, instead. Which would work too. Or we could just not call TryReuseIndex() if the the index in question is partitioned. > I think it would be a good idea to *also* do what you suggested > upthread and have DefineIndex clear the field when cloning an > IndexStmt to create a child; no good could possibly come of > passing that down when we intend to create a new index. Note that oldNode is only set by TryReuseIndex() today. Being careful in DefineIndex might be a good idea anyway. > In short, I think the right short-term fix is as attached (plus > a new regression test case based on the submitted example). Sounds good. > Longer term, it's clearly bad that we fail to reuse child indexes > in this scenario; the point about mangled comments is minor by > comparison. Agreed. > I'm inclined to think that what we want to do is > *not* recurse when creating the parent index, but to initially > make it NOT VALID, and then do ALTER ATTACH PARTITION with each > re-used child index. This would successfully reproduce the > previous state in case only some of the partitions have attached > indexes, which I don't think works correctly right now. Well, an index in the parent must be present in all partitions, so I don't understand which case you're thinking of. > BTW, I hadn't ever looked closely at what the index reuse code > does, and now that I have, my heart just sinks. I think that > logic needs to be nuked from orbit. RelationPreserveStorage was > never meant to be abused in this way; I invite you to contemplate > whether it's not a problem that it doesn't check the backend and > nestLevel fields of PendingRelDelete entries before killing them. > (In its originally-designed use for mapped rels at transaction end, > that wasn't a problem, but I'm afraid that it may be one now.) > > The right way to do this IMO would be something closer to the > heap-swap logic in cluster.c, where we exchange the relfilenodes > of two live indexes, rather than what is happening now. Or for > that matter, do we really need to delete the old indexes at all? Yeah, we wouldn't need TryReuseIndex and subsequent RelationPreserveStorage if we hadn't dropped the old indexes to begin with. Instead, in ATPostAlterTypeParse, check if the index after ALTER TYPE is still incompatible (CheckIndexCompatible) and if it is, don't add a new AT_ReAddIndex command. If it's not, *then* drop the index, and recreate the index from scratch using an IndexStmt generated from the old index definition. I guess We can get rid of IndexStmt.oldNode too. > None of that looks very practical for v12, let alone back-patching > to v11, though. Yes. Thanks, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/25 8:27, Tom Lane wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> On 2019/04/24 7:03, Tom Lane wrote: >>> ISTM we could work around the problem with the attached, which I think >>> is a good change independently of anything else. > >> Agreed. > > After thinking about it more, it seems like a bad idea to put the check > in CheckIndexCompatible; that could interfere with any other use of the > function to match up potential child indexes. CheckIndexCompatible() seems to be invented solely for the purposes of post-ALTER-COLUMN-TYPE, which is what I get from the header comment of that function: * This is tailored to the needs of ALTER TABLE ALTER TYPE, which recreates * any indexes that depended on a changing column from their pg_get_indexdef * or pg_get_constraintdef definitions. We omit some of the sanity checks of * DefineIndex. We assume that the old and new indexes have the same number * of columns and that if one has an expression column or predicate, both do. * Errors arising from the attribute list still apply. Given that, it may have been better to keep the function local to tablecmds.c to avoid potential misuse; I see no other callers of this function beside TryReuseIndex(), which in turn is only used by post-ALTER-COLUMN-TYPE processing. > (I wonder why this test > isn't the same as what DefineIndex uses to spot potential child indexes, > BTW --- it uses a completely separate function CompareIndexInfo, which > seems both wasteful and trouble waiting to happen.) I don't think that CheckIndexCompatible's job is to spot child indexes compatible with a given parent index; that's CompareIndexInfo's job. The latter was invented for the partition index DDL code, complete with provisions to do mapping between parent and child attributes before matching if their TupleDescs are different. CheckIndexCompatible, as mentioned above, is to do the errands of ATPostAlterTypeParse(). /* * CheckIndexCompatible * Determine whether an existing index definition is compatible with a * prospective index definition, such that the existing index storage * could become the storage of the new index, avoiding a rebuild. So, maybe a bigger (different?) charter than CompareIndexInfo's, or so I think. Although, they may be able share the code. > So I think we should test the relkind in TryReuseIndex, instead. Which would work too. Or we could just not call TryReuseIndex() if the the index in question is partitioned. > I think it would be a good idea to *also* do what you suggested > upthread and have DefineIndex clear the field when cloning an > IndexStmt to create a child; no good could possibly come of > passing that down when we intend to create a new index. Note that oldNode is only set by TryReuseIndex() today. Being careful in DefineIndex might be a good idea anyway. > In short, I think the right short-term fix is as attached (plus > a new regression test case based on the submitted example). Sounds good. > Longer term, it's clearly bad that we fail to reuse child indexes > in this scenario; the point about mangled comments is minor by > comparison. Agreed. > I'm inclined to think that what we want to do is > *not* recurse when creating the parent index, but to initially > make it NOT VALID, and then do ALTER ATTACH PARTITION with each > re-used child index. This would successfully reproduce the > previous state in case only some of the partitions have attached > indexes, which I don't think works correctly right now. Well, an index in the parent must be present in all partitions, so I don't understand which case you're thinking of. > BTW, I hadn't ever looked closely at what the index reuse code > does, and now that I have, my heart just sinks. I think that > logic needs to be nuked from orbit. RelationPreserveStorage was > never meant to be abused in this way; I invite you to contemplate > whether it's not a problem that it doesn't check the backend and > nestLevel fields of PendingRelDelete entries before killing them. > (In its originally-designed use for mapped rels at transaction end, > that wasn't a problem, but I'm afraid that it may be one now.) > > The right way to do this IMO would be something closer to the > heap-swap logic in cluster.c, where we exchange the relfilenodes > of two live indexes, rather than what is happening now. Or for > that matter, do we really need to delete the old indexes at all? Yeah, we wouldn't need TryReuseIndex and subsequent RelationPreserveStorage if we hadn't dropped the old indexes to begin with. Instead, in ATPostAlterTypeParse, check if the index after ALTER TYPE is still incompatible (CheckIndexCompatible) and if it is, don't add a new AT_ReAddIndex command. If it's not, *then* drop the index, and recreate the index from scratch using an IndexStmt generated from the old index definition. I guess We can get rid of IndexStmt.oldNode too. > None of that looks very practical for v12, let alone back-patching > to v11, though. Yes. Thanks, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/25 11:21, Amit Langote wrote: > On 2019/04/25 8:27, Tom Lane wrote: >> BTW, I hadn't ever looked closely at what the index reuse code >> does, and now that I have, my heart just sinks. I think that >> logic needs to be nuked from orbit. RelationPreserveStorage was >> never meant to be abused in this way; I invite you to contemplate >> whether it's not a problem that it doesn't check the backend and >> nestLevel fields of PendingRelDelete entries before killing them. >> (In its originally-designed use for mapped rels at transaction end, >> that wasn't a problem, but I'm afraid that it may be one now.) >> >> The right way to do this IMO would be something closer to the >> heap-swap logic in cluster.c, where we exchange the relfilenodes >> of two live indexes, rather than what is happening now. Or for >> that matter, do we really need to delete the old indexes at all? > > Yeah, we wouldn't need TryReuseIndex and subsequent > RelationPreserveStorage if we hadn't dropped the old indexes to begin > with. Instead, in ATPostAlterTypeParse, check if the index after ALTER > TYPE is still incompatible (CheckIndexCompatible) and if it is, don't add > a new AT_ReAddIndex command. If it's not, *then* drop the index, and > recreate the index from scratch using an IndexStmt generated from the old > index definition. I guess We can get rid of IndexStmt.oldNode too. Thinking on this more and growing confident that we could indeed avoid drop index + recreate-it-while-preserving-storage, instead by just not doing anything when CheckIndexCompatible says the old index will be fine despite ALTER TYPE, but only if the table is not rewritten. I gave this a try and came up with the attached patch. It fixes the bug related to partitioned indexes (the originally reported one) and then some. Basically, I aimed to rewrite the code in ATPostAlterTypeCleanup and ATPostAlterTypeParse such that we no longer have to rely on an implementation based on setting "oldNode" to preserve old indexes. With the attached, for the cases in which the table won't be rewritten and hence the indexes not rebuilt, ATPostAlterTypeParse() simply won't queue a AT_ReAddIndex command to rebuild index while preserving the storage. That means both ATAddIndex() and DefineIndex can be freed of the duty of looking out for the "oldNode" case, because that case no longer exists. Another main change is that inherited (!conislocal) constraints are now recognized by ATPostAlterTypeParse directly, instead of ATPostAlterTypeCleanup checking for them and skipping ATPostAlterTypeParse() as a whole for such constraints. For one, I had to make that change to make the above-described approach work. Also, doing that allowed to fix another bug whereby the comments of child constraints would go away when they're reconstructed. Notice what happens on un-patched PG 11: create table pp (a int, b text, unique (a, b), c varchar(64)) partition by list (a); create table pp1 partition of pp for values in (1); create table pp2 partition of pp for values in (2); alter table pp add constraint c_chk check (c <> ''); comment on constraint c_chk ON pp is 'parent check constraint'; comment on constraint c_chk ON pp1 is 'child check constraint 1'; comment on constraint c_chk ON pp2 is 'child check constraint 2'; select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk'; conname │ obj_description ─────────┼────────────────────────── c_chk │ parent check constraint c_chk │ child check constraint 1 c_chk │ child check constraint 2 (3 rows) alter table pp alter c type varchar(64); select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk'; conname │ obj_description ─────────┼───────────────────────── c_chk │ parent check constraint c_chk │ c_chk │ (3 rows) The patch fixes that with some surgery of RebuildConstraintComment combined with aforementioned changes. With the patch: alter table pp alter c type varchar(64); select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk'; conname │ obj_description ─────────┼────────────────────────── c_chk │ parent check constraint c_chk │ child check constraint 1 c_chk │ child check constraint 2 (3 rows) alter table pp alter c type varchar(128); select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk'; conname │ obj_description ─────────┼────────────────────────── c_chk │ parent check constraint c_chk │ child check constraint 1 c_chk │ child check constraint 2 (3 rows) Also for index comments, but only in the case when indexes are not rebuilt. comment on index pp_a_b_key is 'parent index'; comment on index pp1_a_b_key is 'child index 1'; comment on index pp2_a_b_key is 'child index 2'; select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%key'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp1_a_b_key │ 17280 │ child index 1 pp2_a_b_key │ 17284 │ child index 2 pp_a_b_key │ 17271 │ parent index (3 rows) -- no rewrite, indexes untouched, comments preserved alter table pp alter b type varchar(128); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%key'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp1_a_b_key │ 17280 │ child index 1 pp2_a_b_key │ 17284 │ child index 2 pp_a_b_key │ 17271 │ parent index (3 rows) -- table rewritten, indexes rebuild, child indexes' comments gone alter table pp alter b type varchar(64); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%key'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp1_a_b_key │ 17294 │ pp2_a_b_key │ 17298 │ pp_a_b_key │ 17285 │ parent index (3 rows) I've also added tests for both the originally reported bug and the comment ones. The patch applies to PG 11. Thanks, Amit
Attachment
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/25 11:21, Amit Langote wrote: > On 2019/04/25 8:27, Tom Lane wrote: >> BTW, I hadn't ever looked closely at what the index reuse code >> does, and now that I have, my heart just sinks. I think that >> logic needs to be nuked from orbit. RelationPreserveStorage was >> never meant to be abused in this way; I invite you to contemplate >> whether it's not a problem that it doesn't check the backend and >> nestLevel fields of PendingRelDelete entries before killing them. >> (In its originally-designed use for mapped rels at transaction end, >> that wasn't a problem, but I'm afraid that it may be one now.) >> >> The right way to do this IMO would be something closer to the >> heap-swap logic in cluster.c, where we exchange the relfilenodes >> of two live indexes, rather than what is happening now. Or for >> that matter, do we really need to delete the old indexes at all? > > Yeah, we wouldn't need TryReuseIndex and subsequent > RelationPreserveStorage if we hadn't dropped the old indexes to begin > with. Instead, in ATPostAlterTypeParse, check if the index after ALTER > TYPE is still incompatible (CheckIndexCompatible) and if it is, don't add > a new AT_ReAddIndex command. If it's not, *then* drop the index, and > recreate the index from scratch using an IndexStmt generated from the old > index definition. I guess We can get rid of IndexStmt.oldNode too. Thinking on this more and growing confident that we could indeed avoid drop index + recreate-it-while-preserving-storage, instead by just not doing anything when CheckIndexCompatible says the old index will be fine despite ALTER TYPE, but only if the table is not rewritten. I gave this a try and came up with the attached patch. It fixes the bug related to partitioned indexes (the originally reported one) and then some. Basically, I aimed to rewrite the code in ATPostAlterTypeCleanup and ATPostAlterTypeParse such that we no longer have to rely on an implementation based on setting "oldNode" to preserve old indexes. With the attached, for the cases in which the table won't be rewritten and hence the indexes not rebuilt, ATPostAlterTypeParse() simply won't queue a AT_ReAddIndex command to rebuild index while preserving the storage. That means both ATAddIndex() and DefineIndex can be freed of the duty of looking out for the "oldNode" case, because that case no longer exists. Another main change is that inherited (!conislocal) constraints are now recognized by ATPostAlterTypeParse directly, instead of ATPostAlterTypeCleanup checking for them and skipping ATPostAlterTypeParse() as a whole for such constraints. For one, I had to make that change to make the above-described approach work. Also, doing that allowed to fix another bug whereby the comments of child constraints would go away when they're reconstructed. Notice what happens on un-patched PG 11: create table pp (a int, b text, unique (a, b), c varchar(64)) partition by list (a); create table pp1 partition of pp for values in (1); create table pp2 partition of pp for values in (2); alter table pp add constraint c_chk check (c <> ''); comment on constraint c_chk ON pp is 'parent check constraint'; comment on constraint c_chk ON pp1 is 'child check constraint 1'; comment on constraint c_chk ON pp2 is 'child check constraint 2'; select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk'; conname │ obj_description ─────────┼────────────────────────── c_chk │ parent check constraint c_chk │ child check constraint 1 c_chk │ child check constraint 2 (3 rows) alter table pp alter c type varchar(64); select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk'; conname │ obj_description ─────────┼───────────────────────── c_chk │ parent check constraint c_chk │ c_chk │ (3 rows) The patch fixes that with some surgery of RebuildConstraintComment combined with aforementioned changes. With the patch: alter table pp alter c type varchar(64); select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk'; conname │ obj_description ─────────┼────────────────────────── c_chk │ parent check constraint c_chk │ child check constraint 1 c_chk │ child check constraint 2 (3 rows) alter table pp alter c type varchar(128); select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk'; conname │ obj_description ─────────┼────────────────────────── c_chk │ parent check constraint c_chk │ child check constraint 1 c_chk │ child check constraint 2 (3 rows) Also for index comments, but only in the case when indexes are not rebuilt. comment on index pp_a_b_key is 'parent index'; comment on index pp1_a_b_key is 'child index 1'; comment on index pp2_a_b_key is 'child index 2'; select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%key'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp1_a_b_key │ 17280 │ child index 1 pp2_a_b_key │ 17284 │ child index 2 pp_a_b_key │ 17271 │ parent index (3 rows) -- no rewrite, indexes untouched, comments preserved alter table pp alter b type varchar(128); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%key'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp1_a_b_key │ 17280 │ child index 1 pp2_a_b_key │ 17284 │ child index 2 pp_a_b_key │ 17271 │ parent index (3 rows) -- table rewritten, indexes rebuild, child indexes' comments gone alter table pp alter b type varchar(64); select relname, relfilenode, obj_description(oid,'pg_class') from pg_class where relname like 'pp%key'; relname │ relfilenode │ obj_description ─────────────┼─────────────┼───────────────── pp1_a_b_key │ 17294 │ pp2_a_b_key │ 17298 │ pp_a_b_key │ 17285 │ parent index (3 rows) I've also added tests for both the originally reported bug and the comment ones. The patch applies to PG 11. Thanks, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Alvaro Herrera
Date:
Haven't read the patch, but I tried applying it on top of my tablespace fixing patch ... and my first report is that this query in regress fails (three times): select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk' order by 1, 2; conname | obj_description ---------+--------------------------------------- + c_chk | alttype_cleanup_idx check constraint c_chk | alttype_cleanup_idx1 check constraint c_chk | alttype_cleanup_idx2 check constraint - c_chk | alttype_cleanup_idx check constraint (3 rows) I think you should use 'ORDER BY 2 COLLATE "C"' to avoid the problem. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Alvaro Herrera
Date:
Haven't read the patch, but I tried applying it on top of my tablespace fixing patch ... and my first report is that this query in regress fails (three times): select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk' order by 1, 2; conname | obj_description ---------+--------------------------------------- + c_chk | alttype_cleanup_idx check constraint c_chk | alttype_cleanup_idx1 check constraint c_chk | alttype_cleanup_idx2 check constraint - c_chk | alttype_cleanup_idx check constraint (3 rows) I think you should use 'ORDER BY 2 COLLATE "C"' to avoid the problem. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On Thu, Apr 25, 2019 at 10:46 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Haven't read the patch, but I tried applying it on top of my tablespace > fixing patch ... and my first report is that this query in regress fails > (three times): > > select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk' order by 1, 2; > conname | obj_description > ---------+--------------------------------------- > + c_chk | alttype_cleanup_idx check constraint > c_chk | alttype_cleanup_idx1 check constraint > c_chk | alttype_cleanup_idx2 check constraint > - c_chk | alttype_cleanup_idx check constraint > (3 rows) > > I think you should use 'ORDER BY 2 COLLATE "C"' to avoid the problem. Oops, will do. Thanks. Regards, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On Thu, Apr 25, 2019 at 10:46 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Haven't read the patch, but I tried applying it on top of my tablespace > fixing patch ... and my first report is that this query in regress fails > (three times): > > select conname, obj_description(oid, 'pg_constraint') from pg_constraint where conname = 'c_chk' order by 1, 2; > conname | obj_description > ---------+--------------------------------------- > + c_chk | alttype_cleanup_idx check constraint > c_chk | alttype_cleanup_idx1 check constraint > c_chk | alttype_cleanup_idx2 check constraint > - c_chk | alttype_cleanup_idx check constraint > (3 rows) > > I think you should use 'ORDER BY 2 COLLATE "C"' to avoid the problem. Oops, will do. Thanks. Regards, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/25 19:02, Amit Langote wrote: > On 2019/04/25 11:21, Amit Langote wrote: >> On 2019/04/25 8:27, Tom Lane wrote: >>> BTW, I hadn't ever looked closely at what the index reuse code >>> does, and now that I have, my heart just sinks. I think that >>> logic needs to be nuked from orbit. RelationPreserveStorage was >>> never meant to be abused in this way; I invite you to contemplate >>> whether it's not a problem that it doesn't check the backend and >>> nestLevel fields of PendingRelDelete entries before killing them. >>> (In its originally-designed use for mapped rels at transaction end, >>> that wasn't a problem, but I'm afraid that it may be one now.) >>> >>> The right way to do this IMO would be something closer to the >>> heap-swap logic in cluster.c, where we exchange the relfilenodes >>> of two live indexes, rather than what is happening now. Or for >>> that matter, do we really need to delete the old indexes at all? >> >> Yeah, we wouldn't need TryReuseIndex and subsequent >> RelationPreserveStorage if we hadn't dropped the old indexes to begin >> with. Instead, in ATPostAlterTypeParse, check if the index after ALTER >> TYPE is still incompatible (CheckIndexCompatible) and if it is, don't add >> a new AT_ReAddIndex command. If it's not, *then* drop the index, and >> recreate the index from scratch using an IndexStmt generated from the old >> index definition. I guess We can get rid of IndexStmt.oldNode too. > > Thinking on this more and growing confident that we could indeed avoid > drop index + recreate-it-while-preserving-storage, instead by just not > doing anything when CheckIndexCompatible says the old index will be fine > despite ALTER TYPE, but only if the table is not rewritten. I gave this a > try and came up with the attached patch. It fixes the bug related to > partitioned indexes (the originally reported one) and then some. > > Basically, I aimed to rewrite the code in ATPostAlterTypeCleanup and > ATPostAlterTypeParse such that we no longer have to rely on an > implementation based on setting "oldNode" to preserve old indexes. With > the attached, for the cases in which the table won't be rewritten and > hence the indexes not rebuilt, ATPostAlterTypeParse() simply won't queue a > AT_ReAddIndex command to rebuild index while preserving the storage. That > means both ATAddIndex() and DefineIndex can be freed of the duty of > looking out for the "oldNode" case, because that case no longer exists. > > Another main change is that inherited (!conislocal) constraints are now > recognized by ATPostAlterTypeParse directly, instead of > ATPostAlterTypeCleanup checking for them and skipping > ATPostAlterTypeParse() as a whole for such constraints. For one, I had to > make that change to make the above-described approach work. Also, doing > that allowed to fix another bug whereby the comments of child constraints > would go away when they're reconstructed. Notice what happens on > un-patched PG 11: > > create table pp (a int, b text, unique (a, b), c varchar(64)) partition by > list (a); > create table pp1 partition of pp for values in (1); > create table pp2 partition of pp for values in (2); > alter table pp add constraint c_chk check (c <> ''); > comment on constraint c_chk ON pp is 'parent check constraint'; > comment on constraint c_chk ON pp1 is 'child check constraint 1'; > comment on constraint c_chk ON pp2 is 'child check constraint 2'; > select conname, obj_description(oid, 'pg_constraint') from pg_constraint > where conname = 'c_chk'; > conname │ obj_description > ─────────┼────────────────────────── > c_chk │ parent check constraint > c_chk │ child check constraint 1 > c_chk │ child check constraint 2 > (3 rows) > > alter table pp alter c type varchar(64); > > select conname, obj_description(oid, 'pg_constraint') from pg_constraint > where conname = 'c_chk'; > conname │ obj_description > ─────────┼───────────────────────── > c_chk │ parent check constraint > c_chk │ > c_chk │ > (3 rows) > > The patch fixes that with some surgery of RebuildConstraintComment > combined with aforementioned changes. With the patch: > > alter table pp alter c type varchar(64); > > select conname, obj_description(oid, 'pg_constraint') from pg_constraint > where conname = 'c_chk'; > conname │ obj_description > ─────────┼────────────────────────── > c_chk │ parent check constraint > c_chk │ child check constraint 1 > c_chk │ child check constraint 2 > (3 rows) > > alter table pp alter c type varchar(128); > > select conname, obj_description(oid, 'pg_constraint') from pg_constraint > where conname = 'c_chk'; > conname │ obj_description > ─────────┼────────────────────────── > c_chk │ parent check constraint > c_chk │ child check constraint 1 > c_chk │ child check constraint 2 > (3 rows) > > Also for index comments, but only in the case when indexes are not rebuilt. > > comment on index pp_a_b_key is 'parent index'; > comment on index pp1_a_b_key is 'child index 1'; > comment on index pp2_a_b_key is 'child index 2'; > > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class > where relname like 'pp%key'; > relname │ relfilenode │ obj_description > ─────────────┼─────────────┼───────────────── > pp1_a_b_key │ 17280 │ child index 1 > pp2_a_b_key │ 17284 │ child index 2 > pp_a_b_key │ 17271 │ parent index > (3 rows) > > -- no rewrite, indexes untouched, comments preserved > alter table pp alter b type varchar(128); > > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class > where relname like 'pp%key'; > relname │ relfilenode │ obj_description > ─────────────┼─────────────┼───────────────── > pp1_a_b_key │ 17280 │ child index 1 > pp2_a_b_key │ 17284 │ child index 2 > pp_a_b_key │ 17271 │ parent index > (3 rows) > > -- table rewritten, indexes rebuild, child indexes' comments gone > alter table pp alter b type varchar(64); > > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class > where relname like 'pp%key'; > relname │ relfilenode │ obj_description > ─────────────┼─────────────┼───────────────── > pp1_a_b_key │ 17294 │ > pp2_a_b_key │ 17298 │ > pp_a_b_key │ 17285 │ parent index > (3 rows) > > > I've also added tests for both the originally reported bug and the comment > ones. > > The patch applies to PG 11. Per Alvaro's report, regression tests added weren't portable. Fixed that in the attached updated patch. Thanks, Amit
Attachment
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/25 19:02, Amit Langote wrote: > On 2019/04/25 11:21, Amit Langote wrote: >> On 2019/04/25 8:27, Tom Lane wrote: >>> BTW, I hadn't ever looked closely at what the index reuse code >>> does, and now that I have, my heart just sinks. I think that >>> logic needs to be nuked from orbit. RelationPreserveStorage was >>> never meant to be abused in this way; I invite you to contemplate >>> whether it's not a problem that it doesn't check the backend and >>> nestLevel fields of PendingRelDelete entries before killing them. >>> (In its originally-designed use for mapped rels at transaction end, >>> that wasn't a problem, but I'm afraid that it may be one now.) >>> >>> The right way to do this IMO would be something closer to the >>> heap-swap logic in cluster.c, where we exchange the relfilenodes >>> of two live indexes, rather than what is happening now. Or for >>> that matter, do we really need to delete the old indexes at all? >> >> Yeah, we wouldn't need TryReuseIndex and subsequent >> RelationPreserveStorage if we hadn't dropped the old indexes to begin >> with. Instead, in ATPostAlterTypeParse, check if the index after ALTER >> TYPE is still incompatible (CheckIndexCompatible) and if it is, don't add >> a new AT_ReAddIndex command. If it's not, *then* drop the index, and >> recreate the index from scratch using an IndexStmt generated from the old >> index definition. I guess We can get rid of IndexStmt.oldNode too. > > Thinking on this more and growing confident that we could indeed avoid > drop index + recreate-it-while-preserving-storage, instead by just not > doing anything when CheckIndexCompatible says the old index will be fine > despite ALTER TYPE, but only if the table is not rewritten. I gave this a > try and came up with the attached patch. It fixes the bug related to > partitioned indexes (the originally reported one) and then some. > > Basically, I aimed to rewrite the code in ATPostAlterTypeCleanup and > ATPostAlterTypeParse such that we no longer have to rely on an > implementation based on setting "oldNode" to preserve old indexes. With > the attached, for the cases in which the table won't be rewritten and > hence the indexes not rebuilt, ATPostAlterTypeParse() simply won't queue a > AT_ReAddIndex command to rebuild index while preserving the storage. That > means both ATAddIndex() and DefineIndex can be freed of the duty of > looking out for the "oldNode" case, because that case no longer exists. > > Another main change is that inherited (!conislocal) constraints are now > recognized by ATPostAlterTypeParse directly, instead of > ATPostAlterTypeCleanup checking for them and skipping > ATPostAlterTypeParse() as a whole for such constraints. For one, I had to > make that change to make the above-described approach work. Also, doing > that allowed to fix another bug whereby the comments of child constraints > would go away when they're reconstructed. Notice what happens on > un-patched PG 11: > > create table pp (a int, b text, unique (a, b), c varchar(64)) partition by > list (a); > create table pp1 partition of pp for values in (1); > create table pp2 partition of pp for values in (2); > alter table pp add constraint c_chk check (c <> ''); > comment on constraint c_chk ON pp is 'parent check constraint'; > comment on constraint c_chk ON pp1 is 'child check constraint 1'; > comment on constraint c_chk ON pp2 is 'child check constraint 2'; > select conname, obj_description(oid, 'pg_constraint') from pg_constraint > where conname = 'c_chk'; > conname │ obj_description > ─────────┼────────────────────────── > c_chk │ parent check constraint > c_chk │ child check constraint 1 > c_chk │ child check constraint 2 > (3 rows) > > alter table pp alter c type varchar(64); > > select conname, obj_description(oid, 'pg_constraint') from pg_constraint > where conname = 'c_chk'; > conname │ obj_description > ─────────┼───────────────────────── > c_chk │ parent check constraint > c_chk │ > c_chk │ > (3 rows) > > The patch fixes that with some surgery of RebuildConstraintComment > combined with aforementioned changes. With the patch: > > alter table pp alter c type varchar(64); > > select conname, obj_description(oid, 'pg_constraint') from pg_constraint > where conname = 'c_chk'; > conname │ obj_description > ─────────┼────────────────────────── > c_chk │ parent check constraint > c_chk │ child check constraint 1 > c_chk │ child check constraint 2 > (3 rows) > > alter table pp alter c type varchar(128); > > select conname, obj_description(oid, 'pg_constraint') from pg_constraint > where conname = 'c_chk'; > conname │ obj_description > ─────────┼────────────────────────── > c_chk │ parent check constraint > c_chk │ child check constraint 1 > c_chk │ child check constraint 2 > (3 rows) > > Also for index comments, but only in the case when indexes are not rebuilt. > > comment on index pp_a_b_key is 'parent index'; > comment on index pp1_a_b_key is 'child index 1'; > comment on index pp2_a_b_key is 'child index 2'; > > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class > where relname like 'pp%key'; > relname │ relfilenode │ obj_description > ─────────────┼─────────────┼───────────────── > pp1_a_b_key │ 17280 │ child index 1 > pp2_a_b_key │ 17284 │ child index 2 > pp_a_b_key │ 17271 │ parent index > (3 rows) > > -- no rewrite, indexes untouched, comments preserved > alter table pp alter b type varchar(128); > > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class > where relname like 'pp%key'; > relname │ relfilenode │ obj_description > ─────────────┼─────────────┼───────────────── > pp1_a_b_key │ 17280 │ child index 1 > pp2_a_b_key │ 17284 │ child index 2 > pp_a_b_key │ 17271 │ parent index > (3 rows) > > -- table rewritten, indexes rebuild, child indexes' comments gone > alter table pp alter b type varchar(64); > > select relname, relfilenode, obj_description(oid,'pg_class') from pg_class > where relname like 'pp%key'; > relname │ relfilenode │ obj_description > ─────────────┼─────────────┼───────────────── > pp1_a_b_key │ 17294 │ > pp2_a_b_key │ 17298 │ > pp_a_b_key │ 17285 │ parent index > (3 rows) > > > I've also added tests for both the originally reported bug and the comment > ones. > > The patch applies to PG 11. Per Alvaro's report, regression tests added weren't portable. Fixed that in the attached updated patch. Thanks, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Alvaro Herrera
Date:
Hi, Please trim the quoted text in your reply. On 2019-Apr-26, Amit Langote wrote: > Per Alvaro's report, regression tests added weren't portable. Fixed that > in the attached updated patch. Um, this one doesn't apply because of yesterday's 87259588d0ab. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Alvaro Herrera
Date:
Hi, Please trim the quoted text in your reply. On 2019-Apr-26, Amit Langote wrote: > Per Alvaro's report, regression tests added weren't portable. Fixed that > in the attached updated patch. Um, this one doesn't apply because of yesterday's 87259588d0ab. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Um, this one doesn't apply because of yesterday's 87259588d0ab. Before we spend too much time on minutiae, we should ask ourselves whether this patch is even going in the right direction. I'm not sure. One point is that if we simply adopt the old index as-is, we won't see any updates in its metadata. An example is that if we have an index on a varchar(10) column, and we alter the column to varchar(12), the current behavior is to generate a new index that agrees with that: regression=# create table pp(f1 varchar(10) unique); CREATE TABLE regression=# \d pp_f1_key Index "public.pp_f1_key" Column | Type | Key? | Definition --------+-----------------------+------+------------ f1 | character varying(10) | yes | f1 unique, btree, for table "public.pp" regression=# alter table pp alter column f1 type varchar(12); ALTER TABLE regression=# \d pp_f1_key Index "public.pp_f1_key" Column | Type | Key? | Definition --------+-----------------------+------+------------ f1 | character varying(12) | yes | f1 unique, btree, for table "public.pp" With this patch, I believe, the index column will still claim to be varchar(10). Is that OK? It might not actually break anything right now, but at the very least it's likely to be confusing. Also, it'd essentially render the declared types/typmods of index columns untrustworthy, which seems like something that would come back to bite us. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Um, this one doesn't apply because of yesterday's 87259588d0ab. Before we spend too much time on minutiae, we should ask ourselves whether this patch is even going in the right direction. I'm not sure. One point is that if we simply adopt the old index as-is, we won't see any updates in its metadata. An example is that if we have an index on a varchar(10) column, and we alter the column to varchar(12), the current behavior is to generate a new index that agrees with that: regression=# create table pp(f1 varchar(10) unique); CREATE TABLE regression=# \d pp_f1_key Index "public.pp_f1_key" Column | Type | Key? | Definition --------+-----------------------+------+------------ f1 | character varying(10) | yes | f1 unique, btree, for table "public.pp" regression=# alter table pp alter column f1 type varchar(12); ALTER TABLE regression=# \d pp_f1_key Index "public.pp_f1_key" Column | Type | Key? | Definition --------+-----------------------+------+------------ f1 | character varying(12) | yes | f1 unique, btree, for table "public.pp" With this patch, I believe, the index column will still claim to be varchar(10). Is that OK? It might not actually break anything right now, but at the very least it's likely to be confusing. Also, it'd essentially render the declared types/typmods of index columns untrustworthy, which seems like something that would come back to bite us. regards, tom lane
I went ahead and pushed the stopgap patches, along with regression tests based on yours. The tests show the current (i.e. wrong) behavior for index comment and relfilenode reuse. I think that whenever we fix that, we can just adjust the expected output instead of adding more tests. regards, tom lane
I went ahead and pushed the stopgap patches, along with regression tests based on yours. The tests show the current (i.e. wrong) behavior for index comment and relfilenode reuse. I think that whenever we fix that, we can just adjust the expected output instead of adding more tests. regards, tom lane
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/27 3:57, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> Um, this one doesn't apply because of yesterday's 87259588d0ab. > > Before we spend too much time on minutiae, we should ask ourselves whether > this patch is even going in the right direction. I'm not sure. > > One point is that if we simply adopt the old index as-is, we won't see > any updates in its metadata. An example is that if we have an index > on a varchar(10) column, and we alter the column to varchar(12), > the current behavior is to generate a new index that agrees with that: > > regression=# create table pp(f1 varchar(10) unique); > CREATE TABLE > regression=# \d pp_f1_key > Index "public.pp_f1_key" > Column | Type | Key? | Definition > --------+-----------------------+------+------------ > f1 | character varying(10) | yes | f1 > unique, btree, for table "public.pp" > > regression=# alter table pp alter column f1 type varchar(12); > ALTER TABLE > regression=# \d pp_f1_key > Index "public.pp_f1_key" > Column | Type | Key? | Definition > --------+-----------------------+------+------------ > f1 | character varying(12) | yes | f1 > unique, btree, for table "public.pp" > > With this patch, I believe, the index column will still claim to be > varchar(10). You're right, that's what happens. > Is that OK? It might not actually break anything > right now, but at the very least it's likely to be confusing. > Also, it'd essentially render the declared types/typmods of index > columns untrustworthy, which seems like something that would come > back to bite us. That's definitely misleading. Still, I think it'd be nice if we didn't have to do full-blown DefineIndex() in this case if only to update the pg_attribute tuples of the index relation. Maybe we could update them directly in the ALTER COLUMN TYPE's code path? Thanks, Amit
Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping apartition table
From
Amit Langote
Date:
On 2019/04/27 3:57, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> Um, this one doesn't apply because of yesterday's 87259588d0ab. > > Before we spend too much time on minutiae, we should ask ourselves whether > this patch is even going in the right direction. I'm not sure. > > One point is that if we simply adopt the old index as-is, we won't see > any updates in its metadata. An example is that if we have an index > on a varchar(10) column, and we alter the column to varchar(12), > the current behavior is to generate a new index that agrees with that: > > regression=# create table pp(f1 varchar(10) unique); > CREATE TABLE > regression=# \d pp_f1_key > Index "public.pp_f1_key" > Column | Type | Key? | Definition > --------+-----------------------+------+------------ > f1 | character varying(10) | yes | f1 > unique, btree, for table "public.pp" > > regression=# alter table pp alter column f1 type varchar(12); > ALTER TABLE > regression=# \d pp_f1_key > Index "public.pp_f1_key" > Column | Type | Key? | Definition > --------+-----------------------+------+------------ > f1 | character varying(12) | yes | f1 > unique, btree, for table "public.pp" > > With this patch, I believe, the index column will still claim to be > varchar(10). You're right, that's what happens. > Is that OK? It might not actually break anything > right now, but at the very least it's likely to be confusing. > Also, it'd essentially render the declared types/typmods of index > columns untrustworthy, which seems like something that would come > back to bite us. That's definitely misleading. Still, I think it'd be nice if we didn't have to do full-blown DefineIndex() in this case if only to update the pg_attribute tuples of the index relation. Maybe we could update them directly in the ALTER COLUMN TYPE's code path? Thanks, Amit