DROP TABLESPACE causes panic during recovery - Mailing list pgsql-hackers

From Tom Lane
Subject DROP TABLESPACE causes panic during recovery
Msg-id 4687.1091644796@sss.pgh.pa.us
List pgsql-hackers
In CVS tip, try running the regression tests against an installed
postmaster (i.e., make installcheck); then as soon as the tests are
done, kill -9 the bgwriter process to force a database restart.
Most of the time you'll get a PANIC during recovery:

LOG:  background writer process (PID 2493) was terminated by signal 9
LOG:  server process (PID 2493) was terminated by signal 9
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2004-08-04 14:26:23 EDT
LOG:  checkpoint record is at 0/4C1CA28
LOG:  redo record is at 0/4BFD510; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 11269; next OID: 294376
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/4BFD510
PANIC:  could not create directory "/home/postgres/testversion/data/pg_tblspc/301180/163304": No such file or directory
LOG:  startup process (PID 4560) was terminated by signal 6
LOG:  aborting startup due to startup process failure

The panic is here:

(gdb) bt
#0  0xc0141220 in ?? () from /usr/lib/libc.1
#1  0xc00aa7ec in ?? () from /usr/lib/libc.1
#2  0xc008c2b8 in ?? () from /usr/lib/libc.1
#3  0xc0086d9c in ?? () from /usr/lib/libc.1
#4  0x2c6080 in errfinish (dummy=1) at elog.c:454
#5  0x185984 in TablespaceCreateDbspace (spcNode=1074100592, dbNode=0, isRedo=1 '\001') at tablespace.c:140
#6  0x23c90c in smgrcreate (reln=0x400a1d80, isTemp=0 '\000', isRedo=1 '\001') at smgr.c:327
#7  0x23d6cc in smgr_redo (lsn={xlogid = 0, xrecoff = 86455912}, record=0x40067be8) at smgr.c:876
#8  0x115714 in StartupXLOG () at xlog.c:4229
#9  0x11dc5c in BootstrapMain (argc=4, argv=0x7b03b630) at bootstrap.c:426
#10 0x20b7dc in StartChildProcess (xlop=2) at postmaster.c:3233

and of course the problem is that log replay is not prepared to cope
with a reference to a table that's in a tablespace that no longer
exists.  The regression tests trigger the problem because they do a
DROP TABLESPACE near the end.
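
To illustrate the failure mode outside the backend: creating a per-database
directory under a tablespace directory that is already gone fails with ENOENT,
which is the "No such file or directory" in the PANIC above. This is only a
minimal sketch; try_mkdir is a hypothetical helper, not a PostgreSQL function.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not backend code): try to create one directory.
 * Returns 0 on success, otherwise the errno value reported by mkdir().
 */
static int
try_mkdir(const char *path)
{
    if (mkdir(path, 0700) == 0)
        return 0;
    return errno;
}
```

Called on a path such as pg_tblspc/301180/163304 after 301180 has been removed,
this returns ENOENT because mkdir() does not create missing parent directories.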

This is impossible to fix nicely because the information to reconstruct
the tablespace is simply not available.  We could make an ordinary
directory (not a symlink) under pg_tblspc and then limp along in the
expectation that it would get removed before we finish replay.  Or we
could just skip logged operations on files within the tablespace, but
that feels pretty uncomfortable to me --- it amounts to deliberately
discarding data ...
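
A rough sketch of the first option, assuming plain POSIX mkdir() and a
hypothetical helper name (create_dbspace_for_redo); this is not the actual
TablespaceCreateDbspace code, just the shape of the fallback: if the
per-database directory cannot be created because its parent is missing, make
the parent as an ordinary directory and retry.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical redo-time helper (an assumption, not PostgreSQL source).
 * Creates "<tblspc_dir>/<db_dir>". If that fails with ENOENT, the
 * tablespace directory is presumed to have been dropped later in the log,
 * so an ordinary directory (not a symlink) is created in its place and the
 * creation is retried. Only one missing level is filled in.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
create_dbspace_for_redo(const char *tblspc_dir, const char *db_dir)
{
    char path[1024];
    int  n = snprintf(path, sizeof(path), "%s/%s", tblspc_dir, db_dir);

    if (n < 0 || (size_t) n >= sizeof(path))
    {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Normal case: parent exists, or the directory is already there. */
    if (mkdir(path, 0700) == 0 || errno == EEXIST)
        return 0;

    /* Anything other than a missing parent is a real error. */
    if (errno != ENOENT)
        return -1;

    /* Placeholder for the dropped tablespace: a plain directory. */
    if (mkdir(tblspc_dir, 0700) != 0 && errno != EEXIST)
        return -1;

    /* Retry now that the parent exists. */
    if (mkdir(path, 0700) != 0 && errno != EEXIST)
        return -1;

    return 0;
}
```

The placeholder would still have to be cleaned up when replay reaches the
DROP TABLESPACE record; the sketch does not attempt that part.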

Any thoughts?
        regards, tom lane

