Thread: BUG #16577: Segfault on altering a table located in a dropped tablespace
BUG #16577: Segfault on altering a table located in a dropped tablespace
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16577 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 13beta2 Operating system: Ubuntu 20.04 Description: When executing the following query (a modified excerpt from the tablespace regression test): CREATE TABLESPACE regress_tblspace LOCATION '@testtablespace@'; CREATE TABLE test_default_tab_p(id bigint, val bigint) PARTITION BY LIST (id) TABLESPACE regress_tblspace; CREATE INDEX test_index2 on test_default_tab_p (val) TABLESPACE regress_tblspace; DROP TABLESPACE regress_tblspace; \d+ test_default_tab_p; ALTER TABLE test_default_tab_p ALTER val TYPE bigint; I get a segfault with the stacktrace: Core was generated by `postgres: law regression [local] ALTER TABLE '. Program terminated with signal SIGSEGV, Segmentation fault. #0 quote_identifier (ident=0x0) at ruleutils.c:10754 10754 safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_'); (gdb) bt #0 quote_identifier (ident=0x0) at ruleutils.c:10754 #1 0x000055cd6c3798d3 in pg_get_indexdef_worker (indexrelid=indexrelid@entry=16389, colno=colno@entry=0, excludeOps=excludeOps@entry=0x0, attrsOnly=attrsOnly@entry=false, keysOnly=keysOnly@entry=false, showTblSpc=showTblSpc@entry=true, inherits=true, prettyFlags=0, missing_ok=false) at ruleutils.c:1460 #2 0x000055cd6c379ab1 in pg_get_indexdef_string (indexrelid=indexrelid@entry=16389) at ruleutils.c:1161 #3 0x000055cd6c0ca0c1 in RememberIndexForRebuilding (indoid=16389, tab=tab@entry=0x55cd6c89df20) at tablecmds.c:11718 #4 0x000055cd6c0cd764 in ATExecAlterColumnType (tab=tab@entry=0x55cd6c89df20, rel=rel@entry=0x7f0e3cef3570, cmd=<optimized out>, lockmode=lockmode@entry=8) at tablecmds.c:11280 #5 0x000055cd6c0dece5 in ATExecCmd (wqueue=wqueue@entry=0x7ffe0dcc7a30, tab=tab@entry=0x55cd6c89df20, rel=rel@entry=0x7f0e3cef3570, cmd=<optimized out>, lockmode=lockmode@entry=8, cur_pass=cur_pass@entry=1, context=0x7ffe0dcc7b40) at tablecmds.c:4523 #6 0x000055cd6c0df155 in ATRewriteCatalogs (wqueue=wqueue@entry=0x7ffe0dcc7a30, lockmode=lockmode@entry=8, context=context@entry=0x7ffe0dcc7b40) at ../../../src/include/nodes/nodes.h:594 #7 0x000055cd6c0df3a0 in ATController (parsetree=parsetree@entry=0x55cd6c712040, rel=rel@entry=0x7f0e3cef3570, cmds=0x55cd6c712008, recurse=true, lockmode=lockmode@entry=8, context=context@entry=0x7ffe0dcc7b40) at tablecmds.c:3971 #8 0x000055cd6c0df42a in AlterTable (stmt=stmt@entry=0x55cd6c712040, lockmode=lockmode@entry=8, context=context@entry=0x7ffe0dcc7b40) at tablecmds.c:3627 #9 0x000055cd6c2af6f8 in ProcessUtilitySlow (pstate=pstate@entry=0x55cd6c89ddb0, pstmt=pstmt@entry=0x55cd6c7120e8, queryString=queryString@entry=0x55cd6c711350 "ALTER TABLE test_default_tab_p ALTER val TYPE bigint;", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x55cd6c712388, qc=0x7ffe0dcc8060) at utility.c:1269 #10 0x000055cd6c2af1f2 in standard_ProcessUtility (pstmt=0x55cd6c7120e8, queryString=0x55cd6c711350 "ALTER TABLE test_default_tab_p ALTER val TYPE bigint;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55cd6c712388, qc=0x7ffe0dcc8060) at utility.c:1069 #11 0x000055cd6c2af2d1 in ProcessUtility (pstmt=pstmt@entry=0x55cd6c7120e8, queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x55cd6c712388, qc=0x7ffe0dcc8060) at utility.c:524 #12 0x000055cd6c2ab73c in PortalRunUtility (portal=portal@entry=0x55cd6c7747f0, pstmt=pstmt@entry=0x55cd6c7120e8, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x55cd6c712388, qc=qc@entry=0x7ffe0dcc8060) at pquery.c:1157 #13 0x000055cd6c2ac270 in PortalRunMulti (portal=portal@entry=0x55cd6c7747f0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x55cd6c712388, altdest=altdest@entry=0x55cd6c712388, qc=qc@entry=0x7ffe0dcc8060) at pquery.c:1303 #14 0x000055cd6c2acf52 in PortalRun (portal=portal@entry=0x55cd6c7747f0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55cd6c712388, altdest=altdest@entry=0x55cd6c712388, qc=0x7ffe0dcc8060) at pquery.c:779 #15 0x000055cd6c2a93bc in exec_simple_query ( query_string=query_string@entry=0x55cd6c711350 "ALTER TABLE test_default_tab_p ALTER val TYPE bigint;") at postgres.c:1239 #16 0x000055cd6c2ab2d8 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x55cd6c73ca88, dbname=<optimized out>, username=<optimized out>) at postgres.c:4315 #17 0x000055cd6c216e67 in BackendRun (port=port@entry=0x55cd6c735380) at postmaster.c:4523 #18 0x000055cd6c219fdd in BackendStartup (port=port@entry=0x55cd6c735380) at postmaster.c:4215 #19 0x000055cd6c21a224 in ServerLoop () at postmaster.c:1727 #20 0x000055cd6c21b74d in PostmasterMain (argc=8, argv=<optimized out>) at postmaster.c:1400 #21 0x000055cd6c164bed in main (argc=8, argv=0x55cd6c70b9e0) at main.c:210 Best regards, Alexander
Re: BUG #16577: Segfault on altering a table located in a dropped tablespace
From
Michael Paquier
Date:
On Sun, Aug 09, 2020 at 11:00:01AM +0000, PG Bug reporting form wrote: > When executing the following query (a modified excerpt from the tablespace > regression test): > CREATE TABLESPACE regress_tblspace LOCATION '@testtablespace@'; > CREATE TABLE test_default_tab_p(id bigint, val bigint) > PARTITION BY LIST (id) TABLESPACE regress_tblspace; > CREATE INDEX test_index2 on test_default_tab_p (val) TABLESPACE > regress_tblspace; > DROP TABLESPACE regress_tblspace; > \d+ test_default_tab_p; > ALTER TABLE test_default_tab_p ALTER val TYPE bigint; Thanks Alexander for the report. Interesting case indeed. For a normal table we would complain that the tablespace is not empty when attempting to drop the tablespace. But here we have only one partitioned table still holding references to the tablespace. Things get even more spicy with stuff like that, once you try to play with the orphaned tablespace reference: CREATE TABLESPACE popo location '/tmp/popo'; CREATE TABLE parent_tab (a int) partition by list (a) tablespace popo; DROP TABLESPACE popo; CREATE TABLE child_tab partition of parent_tab for values in (1); ERROR: 58P01: could not create directory "pg_tblspc/24587/PG_12_201909212/16384": No such file or directory LOCATION: TablespaceCreateDbspace, tablespace.c:161 The issue with indexes is present since 11, but we have more as reltablespace gets also set for partitioned tables since 12. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Thanks Alexander for the report. Interesting case indeed. > For a normal table we would complain that the tablespace is not empty > when attempting to drop the tablespace. But here we have only one > partitioned table still holding references to the tablespace. Yeah, this seems like a mess. DROP TABLESPACE supposes that it can drop the tablespace if there are no physical files in it, and it's really hard to see how it could test any more carefully given that it cannot see what is in pg_class of other databases. Offhand it seems like we could either 1. Start creating an empty physical file for each partitioned table or index. 2. Forget the idea that a partitioned table/index has an associated tablespace. Neither of these are terribly attractive. But I notice that we already backed off the idea that this is a thing to some extent: regression=# CREATE TABLE test_default_tab_p(id bigint, val bigint) PARTITION BY LIST (id) TABLESPACE pg_default; ERROR: cannot specify default tablespace for partitioned relations I'm a bit inclined to think that this "feature" is sufficiently broken that we should just drop it. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Aug-09, Tom Lane wrote: >> Michael Paquier <michael@paquier.xyz> writes: >>> For a normal table we would complain that the tablespace is not empty >>> when attempting to drop the tablespace. But here we have only one >>> partitioned table still holding references to the tablespace. > Ah, so it turns out that the physical files were necessary after all. > Maybe the solution to this problem is indeed to have them. It means > partly reverting this commit: I think actually the hardest part will be figuring out what is the conversion path, e.g. will pg_upgrade have to know about this. One point that strikes me is that that will put us in a place where "does this relation have storage" is not a binary choice. The possible answers will be "yes", "no", or "has an empty stub file". If we try to take shortcuts rather than handling that honestly, we'll be in for more bugs. > As for the crash at hand, it seems it can be solved easily by making > ruleutils avoid trying to dereference a null pointer, as in the attached > patch. Meh. I have a feeling that that's just the tip of the iceberg of things that will go wrong in this scenario. I'm not sure how much band-aid code we want to expend on the case --- after all, tablespace create/drop is a superuser-only activity, so people aren't doing it carelessly (one hopes). regards, tom lane
Re: BUG #16577: Segfault on altering a table located in a dropped tablespace
From
Tomas Vondra
Date:
On Tue, Sep 08, 2020 at 06:37:29PM -0300, Alvaro Herrera wrote: >On 2020-Sep-08, Michael Paquier wrote: > >> On Tue, Aug 11, 2020 at 04:04:07PM +0900, Michael Paquier wrote: >> > Hmm. Creating a file for partitioned table would be a completely new >> > thing as well. heap_create() has never created a file for partitioned >> > tables since 10 so this could open to a new class of bugs. >> >> This thread has stalled for a couple of weeks now, and I would tend to >> take the path where we'd basically revert 8725958 and ca41030. That's >> too late for v13 to do anything about that. But not for 14. Any >> opinions? > >Well, naturally I oppose this idea. > Would it actually solve the issue? ISTM we'd still have to expect cases with partitioned tables without storage, so presumably we'd have to do something else ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16577: Segfault on altering a table located in a dropped tablespace
From
Michael Paquier
Date:
On Wed, Sep 09, 2020 at 01:55:53AM +0200, Tomas Vondra wrote: > Would it actually solve the issue? ISTM we'd still have to expect cases > with partitioned tables without storage, so presumably we'd have to do > something else ... I am not sure what you mean here. If we don't keep anymore references to tablespace OIDs in pg_class for partitioned tables, meaning that we don't leave anything dangling if the tablespace is dropped without any files in its location, how could that be a problem? -- Michael
Attachment
Re: BUG #16577: Segfault on altering a table located in a dropped tablespace
From
Alvaro Herrera
Date:
On 2020-Sep-09, Tomas Vondra wrote: > On Tue, Sep 08, 2020 at 06:37:29PM -0300, Alvaro Herrera wrote: > > On 2020-Sep-08, Michael Paquier wrote: > > > > > On Tue, Aug 11, 2020 at 04:04:07PM +0900, Michael Paquier wrote: > > > > Hmm. Creating a file for partitioned table would be a completely new > > > > thing as well. heap_create() has never created a file for partitioned > > > > tables since 10 so this could open to a new class of bugs. > > > > > > This thread has stalled for a couple of weeks now, and I would tend to > > > take the path where we'd basically revert 8725958 and ca41030. That's > > > too late for v13 to do anything about that. But not for 14. Any > > > opinions? > > > > Well, naturally I oppose this idea. > > Would it actually solve the issue? ISTM we'd still have to expect cases > with partitioned tables without storage, so presumably we'd have to do > something else ... It just dawned on me that a way to fix this is to use a pg_shdepend entry to protect the tablespace from being dropped.
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2020-Oct-28, Michael Paquier wrote: >> Haven't thought of that approach, good idea! That would not be >> backpatchable but that would be a solution that does not require >> creating files where we don't need them. Did you begin to look at >> that? > I haven't started on this one yet, but I intend to do so shortly. > Strictly speaking, we can still introduce a new category of pg_shdepend > entries in back branches; it won't break anything that works today. Yeah, as long as the patched version won't actively fail when those pg_shdepend entries are missing, I don't think a backpatch is too hazardous. It might be worth checking that the extra entries don't create huge problems if one does downgrade after some of them exist --- but my feeling for how that mechanism works is that it'd Just Work, and indeed provide the missing DROP protection even without explicit action by the backend. I would not be too excited about offering instructions for people to manually add/remove the dependency entries. The amount of value added, versus the risks of bollixing things completely, doesn't sound like a good tradeoff. regards, tom lane
Re: BUG #16577: Segfault on altering a table located in a dropped tablespace
From
Alvaro Herrera
Date:
On 2020-Oct-15, Alvaro Herrera wrote: > It just dawned on me that a way to fix this is to use a pg_shdepend > entry to protect the tablespace from being dropped. Here's a proposed patch for this. -- Álvaro Herrera