Thread: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction
BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17607 Logged by: Thomas Mc Kay Email address: thomas.d.mckay@gmail.com PostgreSQL version: 13.8 Operating system: Debian in docker Description: Hello, I've encountered a bug while doing some SQL migrations on a PostgreSQL 13.8 server. My initial use-case is a Django app running migrations, a test setup and a test, but that code was more than 10K lines of DDL, so I've reduced the code to a minimal reproducible example (as best I could). I also tested the same script in 14.0 and 14.5 and in both, the bug is fixed. I haven't seen anything related to it in the changelog for 14.0, so I don't know whether the bug was fixed purposefully and forgotten in the changelog or if it was fixed by accident, maybe as a corollary of another bugfix/feature. I also don't know if this is something the devs would want to fix in the 13.x line. Either way, I figured it might make a good regression test case for 14.x. Environment (Docker): - PostgreSQL 13.8 (Debian 13.8-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit Test code: ``` -- Migration 1 CREATE TABLE "tbl" ("id" serial NOT NULL PRIMARY KEY); -- Migration 2 ALTER TABLE "tbl" ADD COLUMN "value" int DEFAULT 0 NOT NULL; CREATE FUNCTION on_tbl_parent_id_change_fn() RETURNS TRIGGER AS $on_tbl_parent_id_change$ BEGIN RAISE EXCEPTION 'BOOM !'; END $on_tbl_parent_id_change$ LANGUAGE PLpgSQL VOLATILE; CREATE TRIGGER on_tbl_parent_id_update AFTER UPDATE ON tbl REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows FOR EACH STATEMENT EXECUTE PROCEDURE on_tbl_parent_id_change_fn(); -- Test setup INSERT INTO "tbl" ("value") VALUES (2); -- Test BEGIN; SAVEPOINT svp; UPDATE tbl SET value=1; -- Fails here ROLLBACK TO SAVEPOINT svp; ROLLBACK; ``` Output (verbose): ``` -- CREATE TABLE -- ALTER TABLE -- CREATE FUNCTION -- CREATE TRIGGER -- INSERT 0 1 -- BEGIN -- SAVEPOINT -- WARNING: 01000: AbortSubTransaction while in ABORT state -- LOCATION: AbortSubTransaction, xact.c:4999 -- WARNING: 01000: AbortSubTransaction while in ABORT state -- LOCATION: AbortSubTransaction, xact.c:4999 -- WARNING: 01000: AbortSubTransaction while in ABORT state -- LOCATION: AbortSubTransaction, xact.c:4999 -- ERROR: P0001: BOOM ! -- CONTEXT: PL/pgSQL function on_tbl_parent_id_change_fn() line 3 at RAISE -- LOCATION: exec_stmt_raise, pl_exec.c:3854 -- ERROR: XX000: tupdesc reference 0x7fc02873fbc8 is not owned by resource owner SubTransaction -- LOCATION: ResourceOwnerForgetTupleDesc, resowner.c:1188 -- ERROR: XX000: tupdesc reference 0x7fc02873fbc8 is not owned by resource owner SubTransaction -- LOCATION: ResourceOwnerForgetTupleDesc, resowner.c:1188 -- ERROR: XX000: tupdesc reference 0x7fc02873fbc8 is not owned by resource owner SubTransaction -- LOCATION: ResourceOwnerForgetTupleDesc, resowner.c:1188 -- ERROR: XX000: tupdesc reference 0x7fc02873fbc8 is not owned by resource owner SubTransaction -- LOCATION: ResourceOwnerForgetTupleDesc, resowner.c:1188 -- PANIC: XX000: ERRORDATA_STACK_SIZE exceeded -- LOCATION: errstart, elog.c:359 -- server closed the connection unexpectedly -- This probably means the server terminated abnormally -- before or while processing the request. -- The connection to the server was lost. Attempting reset: Failed. -- You are currently not connected to a database. -- You are currently not connected to a database. ``` Expected output: ``` CREATE TABLE ALTER TABLE CREATE FUNCTION CREATE TRIGGER INSERT 0 1 BEGIN SAVEPOINT ERROR: BOOM ! CONTEXT: PL/pgSQL function on_tbl_parent_id_change_fn() line 3 at RAISE ROLLBACK ROLLBACK ``` The bug disappears if any of the following is true: - value is added to the table definition and not with ALTER TABLE. - value is declared without a default. - `REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows` is removed from the trigger. - a `VACUUM FULL tbl` is done between the trigger creation and the insert. I don't know what to make of it, though. Cheers, Thomas
Re: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > I've encountered a bug while doing some SQL migrations on a PostgreSQL 13.8 > server. My initial use-case is a Django app running migrations, a test setup > and a test, but that code was more than 10K lines of DDL, so I've reduced > the code to a minimal reproducible example (as best I could). Thanks for the minimal reproducer! I confirm that this problem is visible in the v12 and v13 branches, but not before or after. On the master branch, it was introduced at Author: Alvaro Herrera <alvherre@alvh.no-ip.org> Branch: master Release: REL_14_BR [25936fd46] 2021-02-27 18:09:15 -0300 Fix use-after-free bug with AfterTriggersTableData.storeslot and repaired, seemingly accidentally, by Author: Tom Lane <tgl@sss.pgh.pa.us> Branch: master Release: REL_14_BR [c5b7ba4e6] 2021-04-06 15:57:11 -0400 Postpone some stuff out of ExecInitModifyTable. 25936fd46 was back-patched into v12 and v13, which is why they show the bug. c5b7ba4e6 was not back-patched, both because it'd be way too invasive and because we weren't aware that it was fixing anything. I'm not really sure that it *did* fix anything --- perhaps there are variants of this example that still fail? cc'ing Alvaro for comment. regards, tom lane
Re: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction
From
Tom Lane
Date:
I wrote: > PG Bug reporting form <noreply@postgresql.org> writes: >> I've encountered a bug while doing some SQL migrations on a PostgreSQL 13.8 >> server. My initial use-case is a Django app running migrations, a test setup >> and a test, but that code was more than 10K lines of DDL, so I've reduced >> the code to a minimal reproducible example (as best I could). > Thanks for the minimal reproducer! I confirm that this problem is visible > in the v12 and v13 branches, but not before or after. I got a chance to look at this today, and what I find is that we're making a tuple slot that will be freed at end of subtransaction, but it has a refcount on a refcounted tuple descriptor that belongs to the Portal's resowner. While trying to free the slot, we get a complaint because the subtransaction's resowner isn't holding the appropriate tupdesc reference. It seems to me there are probably other hazards here, because the tupdesc could possibly also go away before the slot does. I think what we ought to do is copy the tupdesc, so that we have a non-refcounted descriptor that we know has exactly the right lifespan. As attached. Not sure about whether to bother with a test case. The submitted reproducer doesn't work in >= v14, and even in v12/v13 it seems awfully corner-case-ish. (I find for instance that you need the ALTER TABLE step rather than just making the table in one step, and I'm a tad baffled as to why.) In any case, I've got zero faith that there are not other ways to trigger the same problem, so I think we need to apply this up to HEAD. regards, tom lane diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 59b38d00ed..b9a012512d 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -4340,11 +4340,13 @@ GetAfterTriggersStoreSlot(AfterTriggersTableData *table, MemoryContext oldcxt; /* - * We only need this slot only until AfterTriggerEndQuery, but making - * it last till end-of-subxact is good enough. It'll be freed by - * AfterTriggerFreeQuery(). + * We need this slot only until AfterTriggerEndQuery, but making it + * last till end-of-subxact is good enough. It'll be freed by + * AfterTriggerFreeQuery(). However, the passed-in tupdesc might have + * a different lifespan, so we'd better make a copy of that. */ oldcxt = MemoryContextSwitchTo(CurTransactionContext); + tupdesc = CreateTupleDescCopy(tupdesc); table->storeslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual); MemoryContextSwitchTo(oldcxt); }
Re: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction
From
Tom Lane
Date:
I wrote: > It seems to me there are probably other hazards here, because the tupdesc > could possibly also go away before the slot does. I think what we ought > to do is copy the tupdesc, so that we have a non-refcounted descriptor > that we know has exactly the right lifespan. As attached. Actually, after looking around some more, I realize that there is a second mistake in 25936fd46: it ignored the comment on AfterTriggerFreeQuery that * Note: it's important for this to be safe if interrupted by an error * and then called again for the same query level. This is the reason why we end up in a recursive error and PANIC: we keep trying to free the tupdesc again after the previous error. If I fix that but omit the CreateTupleDescCopy step, then the reproducer behaves much more sanely: psql:bug17607.sql:29: WARNING: AbortSubTransaction while in ABORT state psql:bug17607.sql:29: ERROR: BOOM ! CONTEXT: PL/pgSQL function on_tbl_parent_id_change_fn() line 3 at RAISE ERROR: tupdesc reference 0x7fdef236a1b8 is not owned by resource owner SubTransaction ROLLBACK ROLLBACK So what we actually need here is more like the attached. regards, tom lane diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 59b38d00ed..1b1193997b 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -4340,11 +4340,13 @@ GetAfterTriggersStoreSlot(AfterTriggersTableData *table, MemoryContext oldcxt; /* - * We only need this slot only until AfterTriggerEndQuery, but making - * it last till end-of-subxact is good enough. It'll be freed by - * AfterTriggerFreeQuery(). + * We need this slot only until AfterTriggerEndQuery, but making it + * last till end-of-subxact is good enough. It'll be freed by + * AfterTriggerFreeQuery(). However, the passed-in tupdesc might have + * a different lifespan, so we'd better make a copy of that. */ oldcxt = MemoryContextSwitchTo(CurTransactionContext); + tupdesc = CreateTupleDescCopy(tupdesc); table->storeslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual); MemoryContextSwitchTo(oldcxt); } @@ -4640,7 +4642,12 @@ AfterTriggerFreeQuery(AfterTriggersQueryData *qs) if (ts) tuplestore_end(ts); if (table->storeslot) - ExecDropSingleTupleTableSlot(table->storeslot); + { + TupleTableSlot *slot = table->storeslot; + + table->storeslot = NULL; + ExecDropSingleTupleTableSlot(slot); + } } /*
Re: BUG #17607: Server process crashes when PLpgSQL function raises error in subtransaction
From
Tom Lane
Date:
... and after yet more poking at it, I see why c5b7ba4e6 masked the problem: it added an optimization such that we don't use the storeslot at all unless a tuple mapping conversion is required. That led me to the test case shown in the attached, which exhibits this symptom in all branches since v12. I had been planning to let this wait until after 15rc1 wrap, but now that I have evidence that the bug is still live in v15, I'm going to go ahead and push. Both components of the fix seem like extremely safe changes, even for the day before wrap. regards, tom lane diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 0ec8d84a1b..0fcf090f22 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -4775,11 +4775,13 @@ GetAfterTriggersStoreSlot(AfterTriggersTableData *table, MemoryContext oldcxt; /* - * We only need this slot only until AfterTriggerEndQuery, but making - * it last till end-of-subxact is good enough. It'll be freed by - * AfterTriggerFreeQuery(). + * We need this slot only until AfterTriggerEndQuery, but making it + * last till end-of-subxact is good enough. It'll be freed by + * AfterTriggerFreeQuery(). However, the passed-in tupdesc might have + * a different lifespan, so we'd better make a copy of that. */ oldcxt = MemoryContextSwitchTo(CurTransactionContext); + tupdesc = CreateTupleDescCopy(tupdesc); table->storeslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual); MemoryContextSwitchTo(oldcxt); } @@ -5098,7 +5100,12 @@ AfterTriggerFreeQuery(AfterTriggersQueryData *qs) if (ts) tuplestore_end(ts); if (table->storeslot) - ExecDropSingleTupleTableSlot(table->storeslot); + { + TupleTableSlot *slot = table->storeslot; + + table->storeslot = NULL; + ExecDropSingleTupleTableSlot(slot); + } } /* diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out index 641f5e67c4..8b8eadd181 100644 --- a/src/test/regress/expected/triggers.out +++ b/src/test/regress/expected/triggers.out @@ -3468,7 +3468,7 @@ insert into convslot_test_parent(col1) values ('1'); insert into convslot_test_child(col1) values ('1'); insert into convslot_test_parent(col1) values ('3'); insert into convslot_test_child(col1) values ('3'); -create or replace function trigger_function1() +create function convslot_trig1() returns trigger language plpgsql AS $$ @@ -3478,7 +3478,7 @@ raise notice 'trigger = %, old_table = %', (select string_agg(old_table::text, ', ' order by col1) from old_table); return null; end; $$; -create or replace function trigger_function2() +create function convslot_trig2() returns trigger language plpgsql AS $$ @@ -3490,10 +3490,10 @@ return null; end; $$; create trigger but_trigger after update on convslot_test_child referencing new table as new_table -for each statement execute function trigger_function2(); +for each statement execute function convslot_trig2(); update convslot_test_parent set col1 = col1 || '1'; NOTICE: trigger = but_trigger, new table = (11,tutu), (31,tutu) -create or replace function trigger_function3() +create function convslot_trig3() returns trigger language plpgsql AS $$ @@ -3506,16 +3506,42 @@ return null; end; $$; create trigger but_trigger2 after update on convslot_test_child referencing old table as old_table new table as new_table -for each statement execute function trigger_function3(); +for each statement execute function convslot_trig3(); update convslot_test_parent set col1 = col1 || '1'; NOTICE: trigger = but_trigger, new table = (111,tutu), (311,tutu) NOTICE: trigger = but_trigger2, old_table = (11,tutu), (31,tutu), new table = (111,tutu), (311,tutu) create trigger bdt_trigger after delete on convslot_test_child referencing old table as old_table -for each statement execute function trigger_function1(); +for each statement execute function convslot_trig1(); delete from convslot_test_parent; NOTICE: trigger = bdt_trigger, old_table = (111,tutu), (311,tutu) drop table convslot_test_child, convslot_test_parent; +drop function convslot_trig1(); +drop function convslot_trig2(); +drop function convslot_trig3(); +-- Bug #17607: variant of above in which trigger function raises an error; +-- we don't see any ill effects unless trigger tuple requires mapping +create table convslot_test_parent (id int primary key, val int) +partition by range (id); +create table convslot_test_part (val int, id int not null); +alter table convslot_test_parent + attach partition convslot_test_part for values from (1) to (1000); +create function convslot_trig4() returns trigger as +$$begin raise exception 'BOOM!'; end$$ language plpgsql; +create trigger convslot_test_parent_update + after update on convslot_test_parent + referencing old table as old_rows new table as new_rows + for each statement execute procedure convslot_trig4(); +insert into convslot_test_parent (id, val) values (1, 2); +begin; +savepoint svp; +update convslot_test_parent set val = 3; -- error expected +ERROR: BOOM! +CONTEXT: PL/pgSQL function convslot_trig4() line 1 at RAISE +rollback to savepoint svp; +rollback; +drop table convslot_test_parent; +drop function convslot_trig4(); -- Test trigger renaming on partitioned tables create table grandparent (id int, primary key (id)) partition by range (id); create table middle partition of grandparent for values from (1) to (10) diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql index 9d21bbe2a9..4d8504fb24 100644 --- a/src/test/regress/sql/triggers.sql +++ b/src/test/regress/sql/triggers.sql @@ -2615,7 +2615,7 @@ insert into convslot_test_child(col1) values ('1'); insert into convslot_test_parent(col1) values ('3'); insert into convslot_test_child(col1) values ('3'); -create or replace function trigger_function1() +create function convslot_trig1() returns trigger language plpgsql AS $$ @@ -2626,7 +2626,7 @@ raise notice 'trigger = %, old_table = %', return null; end; $$; -create or replace function trigger_function2() +create function convslot_trig2() returns trigger language plpgsql AS $$ @@ -2639,11 +2639,11 @@ end; $$; create trigger but_trigger after update on convslot_test_child referencing new table as new_table -for each statement execute function trigger_function2(); +for each statement execute function convslot_trig2(); update convslot_test_parent set col1 = col1 || '1'; -create or replace function trigger_function3() +create function convslot_trig3() returns trigger language plpgsql AS $$ @@ -2657,15 +2657,48 @@ end; $$; create trigger but_trigger2 after update on convslot_test_child referencing old table as old_table new table as new_table -for each statement execute function trigger_function3(); +for each statement execute function convslot_trig3(); update convslot_test_parent set col1 = col1 || '1'; create trigger bdt_trigger after delete on convslot_test_child referencing old table as old_table -for each statement execute function trigger_function1(); +for each statement execute function convslot_trig1(); delete from convslot_test_parent; drop table convslot_test_child, convslot_test_parent; +drop function convslot_trig1(); +drop function convslot_trig2(); +drop function convslot_trig3(); + +-- Bug #17607: variant of above in which trigger function raises an error; +-- we don't see any ill effects unless trigger tuple requires mapping + +create table convslot_test_parent (id int primary key, val int) +partition by range (id); + +create table convslot_test_part (val int, id int not null); + +alter table convslot_test_parent + attach partition convslot_test_part for values from (1) to (1000); + +create function convslot_trig4() returns trigger as +$$begin raise exception 'BOOM!'; end$$ language plpgsql; + +create trigger convslot_test_parent_update + after update on convslot_test_parent + referencing old table as old_rows new table as new_rows + for each statement execute procedure convslot_trig4(); + +insert into convslot_test_parent (id, val) values (1, 2); + +begin; +savepoint svp; +update convslot_test_parent set val = 3; -- error expected +rollback to savepoint svp; +rollback; + +drop table convslot_test_parent; +drop function convslot_trig4(); -- Test trigger renaming on partitioned tables create table grandparent (id int, primary key (id)) partition by range (id);