Thread: DROP TABLESPACE causes panic during recovery

DROP TABLESPACE causes panic during recovery

From

Tom Lane

Date:

04 August 2004, 15:40:11

In CVS tip, try running the regression tests against an installed
postmaster (ie, make installcheck); then as soon as the tests are
done, kill -9 the bgwriter process to force a database restart.
Most of the time you'll get a PANIC during recovery:

LOG:  background writer process (PID 2493) was terminated by signal 9
LOG:  server process (PID 2493) was terminated by signal 9
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2004-08-04 14:26:23 EDT
LOG:  checkpoint record is at 0/4C1CA28
LOG:  redo record is at 0/4BFD510; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 11269; next OID: 294376
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/4BFD510
PANIC:  could not create directory "/home/postgres/testversion/data/pg_tblspc/301180/163304": No such file or directory
LOG:  startup process (PID 4560) was terminated by signal 6
LOG:  aborting startup due to startup process failure

The panic is here:

(gdb) bt
#0  0xc0141220 in ?? () from /usr/lib/libc.1
#1  0xc00aa7ec in ?? () from /usr/lib/libc.1
#2  0xc008c2b8 in ?? () from /usr/lib/libc.1
#3  0xc0086d9c in ?? () from /usr/lib/libc.1
#4  0x2c6080 in errfinish (dummy=1) at elog.c:454
#5  0x185984 in TablespaceCreateDbspace (spcNode=1074100592, dbNode=0,   isRedo=1 '\001') at tablespace.c:140
#6  0x23c90c in smgrcreate (reln=0x400a1d80, isTemp=0 '\000', isRedo=1 '\001')   at smgr.c:327
#7  0x23d6cc in smgr_redo (lsn={xlogid = 0, xrecoff = 86455912},   record=0x40067be8) at smgr.c:876
#8  0x115714 in StartupXLOG () at xlog.c:4229
#9  0x11dc5c in BootstrapMain (argc=4, argv=0x7b03b630) at bootstrap.c:426
#10 0x20b7dc in StartChildProcess (xlop=2) at postmaster.c:3233

and of course the problem is that log replay is not prepared to cope
with a reference to a table that's in a tablespace that no longer
exists.  The regression tests trigger the problem because they do a
DROP TABLESPACE near the end.
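
The failure mode in the backtrace can be reproduced outside PostgreSQL. The following is a minimal Python sketch, not PostgreSQL code: the helper name create_dbspace and the temporary directory layout are assumptions made for illustration. It shows that a single non-recursive mkdir of pg_tblspc/<tablespace OID>/<database OID> fails with ENOENT ("No such file or directory") once the tablespace entry itself is gone.

```python
import errno
import os
import tempfile


def create_dbspace(pgdata, spc_oid, db_oid):
    """Hypothetical stand-in for the per-database directory creation step.

    Uses one non-recursive mkdir, so a missing parent (the tablespace
    entry under pg_tblspc) makes the call fail with ENOENT.
    """
    path = os.path.join(pgdata, "pg_tblspc", str(spc_oid), str(db_oid))
    os.mkdir(path)
    return path


def demo_missing_tablespace():
    """Return the errno seen when the tablespace entry no longer exists."""
    with tempfile.TemporaryDirectory() as pgdata:
        os.mkdir(os.path.join(pgdata, "pg_tblspc"))
        # No pg_tblspc/301180 entry: it was removed by DROP TABLESPACE.
        try:
            create_dbspace(pgdata, 301180, 163304)
        except OSError as exc:
            return exc.errno
    return None
```

The OIDs are the ones from the PANIC message; everything else is made up for the demonstration.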

This is impossible to fix nicely because the information to reconstruct
the tablespace is simply not available.  We could make an ordinary
directory (not a symlink) under pg_tblspc and then limp along in the
expectation that it would get removed before we finish replay.  Or we
could just skip logged operations on files within the tablespace, but
that feels pretty uncomfortable to me --- it amounts to deliberately
discarding data ...

Any thoughts?
        regards, tom lane
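
The first option in the message above (an ordinary directory under pg_tblspc that is expected to vanish again before replay ends) might look roughly like this. It is a sketch under stated assumptions: ensure_tablespace_dir is a hypothetical helper, and the removal at the end merely imitates what replaying the later DROP TABLESPACE would do.

```python
import os
import tempfile


def ensure_tablespace_dir(pgdata, spc_oid):
    """Create pg_tblspc/<spc_oid> as a plain directory if nothing is there.

    Returns True when a placeholder had to be made (hypothetical helper).
    """
    spc_path = os.path.join(pgdata, "pg_tblspc", str(spc_oid))
    if os.path.lexists(spc_path):
        return False
    os.mkdir(spc_path)
    return True


def demo_placeholder():
    with tempfile.TemporaryDirectory() as pgdata:
        tblspc = os.path.join(pgdata, "pg_tblspc")
        os.mkdir(tblspc)
        made = ensure_tablespace_dir(pgdata, 301180)
        db_path = os.path.join(tblspc, "301180", "163304")
        os.mkdir(db_path)  # succeeds now that the parent exists
        # Imitate replay of the later drops: the placeholder goes away again.
        os.rmdir(db_path)
        os.rmdir(os.path.join(tblspc, "301180"))
        return made, os.listdir(tblspc)
```

The placeholder lives on whatever filesystem holds the data directory, not on the original tablespace's storage, which is why this is described as limping along.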

Re: DROP TABLESPACE causes panic during recovery

From

Kevin Brown

Date:

04 August 2004, 21:56:07

Tom Lane wrote:
> In CVS tip, try running the regression tests against an installed
> postmaster (ie, make installcheck); then as soon as the tests are
> done, kill -9 the bgwriter process to force a database restart.
> Most of the time you'll get a PANIC during recovery:

[...]

> This is impossible to fix nicely because the information to reconstruct
> the tablespace is simply not available.  We could make an ordinary
> directory (not a symlink) under pg_tblspc and then limp along in the
> expectation that it would get removed before we finish replay.  Or we
> could just skip logged operations on files within the tablespace, but
> that feels pretty uncomfortable to me --- it amounts to deliberately
> discarding data ...
> 
> Any thoughts?

How is a dropped table handled by the recovery code?  Doesn't it present
the same sort of issues (though on a smaller scale)?



-- 
Kevin Brown                          kevin@sysexperts.com

Re: DROP TABLESPACE causes panic during recovery

From

Tom Lane

Date:

04 August 2004, 23:48:33

Kevin Brown <kevin@sysexperts.com> writes:
> Tom Lane wrote:
>> This is impossible to fix nicely because the information to reconstruct
>> the tablespace is simply not available.  We could make an ordinary
>> directory (not a symlink) under pg_tblspc and then limp along in the
>> expectation that it would get removed before we finish replay.  Or we
>> could just skip logged operations on files within the tablespace, but
>> that feels pretty uncomfortable to me --- it amounts to deliberately
>> discarding data ...

> How is a dropped table handled by the recovery code?  Doesn't it present
> the same sort of issues (though on a smaller scale)?

Not really.  If the replay code encounters an update to a table file
that's not there, it simply creates the file and plows ahead.  The thing
that I'm stuck on about tablespaces is that if the symlink in
$PGDATA/pg_tblspc isn't there, there's no evident way to recreate it
correctly --- we have no idea where it was supposed to point.
        regards, tom lane
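
The "create the file and plow ahead" behaviour described above can be sketched as follows. This is an illustration only: redo_write_block and the 8192-byte block size are assumptions, not the actual storage manager code.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # illustrative page size, assumed for this sketch


def redo_write_block(path, block_no, data):
    """Write one block during replay, creating the file if it is missing."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.lseek(fd, block_no * BLOCK_SIZE, os.SEEK_SET)
        os.write(fd, data)
    finally:
        os.close(fd)


def demo_redo_missing_file():
    """Replay a write to block 2 of a file that does not exist yet."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "16384")  # hypothetical relation file name
        redo_write_block(path, 2, b"x" * BLOCK_SIZE)
        return os.path.getsize(path)
```

A missing file can be recreated because its name alone says where it belongs. A missing symlink cannot, because its target is exactly the information that was lost.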

Re: DROP TABLESPACE causes panic during recovery

From

Gavin Sherry

Date:

05 August 2004, 00:06:04

On Wed, 4 Aug 2004, Tom Lane wrote:

> Kevin Brown <kevin@sysexperts.com> writes:
> > Tom Lane wrote:
> >> This is impossible to fix nicely because the information to reconstruct
> >> the tablespace is simply not available.  We could make an ordinary
> >> directory (not a symlink) under pg_tblspc and then limp along in the
> >> expectation that it would get removed before we finish replay.  Or we
> >> could just skip logged operations on files within the tablespace, but
> >> that feels pretty uncomfortable to me --- it amounts to deliberately
> >> discarding data ...
>
> > How is a dropped table handled by the recovery code?  Doesn't it present
> > the same sort of issues (though on a smaller scale)?
>
> Not really.  If the replay code encounters an update to a table file
> that's not there, it simply creates the file and plows ahead.  The thing
> that I'm stuck on about tablespaces is that if the symlink in
> $PGDATA/pg_tblspc isn't there, there's no evident way to recreate it
> correctly --- we have no idea where it was supposed to point.

I don't think we have any choice but to log the symlink creation. Will
this solve the problem?

Gavin

Re: DROP TABLESPACE causes panic during recovery

From

Tom Lane

Date:

05 August 2004, 00:10:13

Gavin Sherry <swm@linuxworld.com.au> writes:
> On Wed, 4 Aug 2004, Tom Lane wrote:
>> Not really.  If the replay code encounters an update to a table file
>> that's not there, it simply creates the file and plows ahead.  The thing
>> that I'm stuck on about tablespaces is that if the symlink in
>> $PGDATA/pg_tblspc isn't there, there's no evident way to recreate it
>> correctly --- we have no idea where it was supposed to point.

> I don't think we have any choice but to log the symlink creation. Will
> this solve the problem?

We do need to do that, but it will *not* solve this problem.  The
scenario that causes the problem is

    CREATE TABLESPACE
    ...
    much time passes
    ...
    CHECKPOINT
    ...
    modify tables in tablespace
    drop tables in tablespace
    DROP TABLESPACE
    ...
    system crash

Now the system needs to replay from the last checkpoint.  It's going to
hit updates to tables that aren't there anymore in a tablespace that's
not there anymore.  There will not be anything in the replayed part of
the log that will give a clue where that tablespace was physically.
        regards, tom lane
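
To make the scenario concrete, here is a small simulation. It assumes, hypothetically, that CREATE TABLESPACE is already logged along with its location; the record layout and the path /mnt/fast/ts1 are invented for the example. Even then, replay starting at the last checkpoint never sees that record.

```python
# Hypothetical, simplified WAL: (record type, tablespace OID, detail).
# The location string is made up for illustration.
WAL = [
    ("CREATE_TABLESPACE", 301180, "/mnt/fast/ts1"),
    ("CHECKPOINT", None, None),
    ("UPDATE", 301180, "table 163310"),
    ("DROP_TABLE", 301180, "table 163310"),
    ("DROP_TABLESPACE", 301180, None),
]


def records_to_replay(wal):
    """Return the records that follow the last checkpoint."""
    last = max(i for i, rec in enumerate(wal) if rec[0] == "CHECKPOINT")
    return wal[last + 1:]


def known_location(records, spc_oid):
    """Location from a CREATE_TABLESPACE record in the given range, if any."""
    for kind, oid, detail in records:
        if kind == "CREATE_TABLESPACE" and oid == spc_oid:
            return detail
    return None
```

Searching only the replayed range yields nothing for tablespace 301180, while the full log would have had the answer.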

Re: DROP TABLESPACE causes panic during recovery

From

Gavin Sherry

Date:

05 August 2004, 00:24:38

On Wed, 4 Aug 2004, Tom Lane wrote:

> Gavin Sherry <swm@linuxworld.com.au> writes:
> > On Wed, 4 Aug 2004, Tom Lane wrote:
> >> Not really.  If the replay code encounters an update to a table file
> >> that's not there, it simply creates the file and plows ahead.  The thing
> >> that I'm stuck on about tablespaces is that if the symlink in
> >> $PGDATA/pg_tblspc isn't there, there's no evident way to recreate it
> >> correctly --- we have no idea where it was supposed to point.
>
> > I don't think we have any choice but to log the symlink creation. Will
> > this solve the problem?
>
> We do need to do that, but it will *not* solve this problem.  The
> scenario that causes the problem is
>
>     CREATE TABLESPACE
>     ...
>     much time passes
>     ...
>     CHECKPOINT
>     ...
>     modify tables in tablespace
>     drop tables in tablespace
>     DROP TABLESPACE
>     ...
>     system crash
>
> Now the system needs to replay from the last checkpoint.  It's going to
> hit updates to tables that aren't there anymore in a tablespace that's
> not there anymore.  There will not be anything in the replayed part of
> the log that will give a clue where that tablespace was physically.

Ahh, yes of course.

Seems like the best way would be to create the path under pg_tblspc as
directories and plough ahead, like you said. The only alternative that
comes to mind is that we could keep all the directory structure and
symlinks around until the next checkpoint. But that would be messy and may
well not solve the problem anyway for things like PITR.

Gavin

Re: DROP TABLESPACE causes panic during recovery

From

Christopher Kings-Lynne

Date:

05 August 2004, 00:40:24

> We do need to do that, but it will *not* solve this problem.  The
> scenario that causes the problem is
> 
>     CREATE TABLESPACE
>     ...
>     much time passes
>     ...
>     CHECKPOINT
>     ...
>     modify tables in tablespace
>     drop tables in tablespace
>     DROP TABLESPACE
>     ...
>     system crash
> 
> Now the system needs to replay from the last checkpoint.  It's going to
> hit updates to tables that aren't there anymore in a tablespace that's
> not there anymore.  There will not be anything in the replayed part of
> the log that will give a clue where that tablespace was physically.

Maybe we need to create a new system tablespace: pg_recovery

Then when this situation occurs, if the tablespace cannot be located, we 
recreate the objects in the system 'pg_recovery' tablespace or something.

I dunno :)

Chris
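
One way to picture the pg_recovery idea is a path lookup with a fallback. This is only a sketch of the suggestion: resolve_tablespace_dir and the layout of the pg_recovery directory are assumptions, and nothing like this is claimed to exist in PostgreSQL.

```python
import os
import tempfile


def resolve_tablespace_dir(pgdata, spc_oid, fallback="pg_recovery"):
    """Return the directory to replay into.

    Uses the tablespace entry if it exists, otherwise a per-tablespace
    subdirectory of the (hypothetical) fallback area.
    """
    spc_path = os.path.join(pgdata, "pg_tblspc", str(spc_oid))
    if os.path.isdir(spc_path):
        return spc_path
    rescue = os.path.join(pgdata, fallback, str(spc_oid))
    os.makedirs(rescue, exist_ok=True)
    return rescue


def demo_fallback():
    with tempfile.TemporaryDirectory() as pgdata:
        os.makedirs(os.path.join(pgdata, "pg_tblspc", "1663"))
        present = resolve_tablespace_dir(pgdata, 1663)
        missing = resolve_tablespace_dir(pgdata, 301180)
        return (os.path.relpath(present, pgdata),
                os.path.relpath(missing, pgdata),
                os.path.isdir(missing))
```

Data replayed into the fallback area survives recovery, so something or someone has to clean it up afterwards.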

Re: DROP TABLESPACE causes panic during recovery

From

"Andrew Dunstan"

Date:

05 August 2004, 00:48:28

Tom Lane said:
>The
> scenario that causes the problem is
>
>     CREATE TABLESPACE
>     ...
>     much time passes
>     ...
>     CHECKPOINT
>     ...
>     modify tables in tablespace
>     drop tables in tablespace
>     DROP TABLESPACE
>     ...
>     system crash
>
> Now the system needs to replay from the last checkpoint.  It's going to
> hit updates to tables that aren't there anymore in a tablespace that's
> not there anymore.  There will not be anything in the replayed part of
> the log that will give a clue where that tablespace was physically.
>

Could we create the tables in the default tablespace? Or create a dummy
tablespace (since it's not there we expect it to be removed anyway, don't
we?) I guess the big danger would be running out of disk space, but maybe
that is a lower risk than this one.

cheers

andrew

Re: DROP TABLESPACE causes panic during recovery

From

Greg Stark

Date:

05 August 2004, 00:59:30

Gavin Sherry <swm@linuxworld.com.au> writes:

> >     CREATE TABLESPACE
> >     ...
> >     much time passes
> >     ...
> >     CHECKPOINT
> >     ...
> >     modify tables in tablespace
> >     drop tables in tablespace
> >     DROP TABLESPACE
> >     ...
> >     system crash

What happens here if no table spaces are involved?

It just creates bogus tables with partial data counting on the restore to see
the drop table command later and delete the corrupt tables?

Does that pose any danger with PITR? The scenario above seems ok since if the
PITR starting point is after the drop table/tablespace then presumably the
recovery target has to be after that as well? Is there any other scenario
where the partial data files could escape the recovery process?

-- 
greg

Re: DROP TABLESPACE causes panic during recovery

From

Bruce Momjian

Date:

05 August 2004, 01:05:28

Andrew Dunstan wrote:
> Tom Lane said:
> >The
> > scenario that causes the problem is
> >
> >     CREATE TABLESPACE
> >     ...
> >     much time passes
> >     ...
> >     CHECKPOINT
> >     ...
> >     modify tables in tablespace
> >     drop tables in tablespace
> >     DROP TABLESPACE
> >     ...
> >     system crash
> >
> > Now the system needs to replay from the last checkpoint.  It's going to
> > hit updates to tables that aren't there anymore in a tablespace that's
> > not there anymore.  There will not be anything in the replayed part of
> > the log that will give a clue where that tablespace was physically.
> >
> 
> Could we create the tables in the default tablespace? Or create a dummy
> tablespace (since it's not there we expect it to be removed anyway, don't
> we?) I guess the big danger would be running out of disk space, but maybe
> that is a lower risk than this one.

Uh, why is the symlink not going to be there already?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: DROP TABLESPACE causes panic during recovery

From

Tom Lane

Date:

05 August 2004, 01:12:42

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Uh, why is the symlink not going to be there already?

Because we removed it at the DROP TABLESPACE.
        regards, tom lane

Re: DROP TABLESPACE causes panic during recovery

From

Christopher Kings-Lynne

Date:

05 August 2004, 01:39:36

>>Uh, why is the symlink not going to be there already?
> 
> 
> Because we removed it at the DROP TABLESPACE.

Maybe we could avoid removing it until the next checkpoint?  Or is that 
not enough.  Maybe it could stay there forever :/

Chris

Re: DROP TABLESPACE causes panic during recovery

From

Tom Lane

Date:

05 August 2004, 01:59:32

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Maybe we could avoid removing it until the next checkpoint?  Or is that 
> not enough.  Maybe it could stay there forever :/

Part of the problem here is that this code has to serve several
purposes.  We have different scenarios to worry about:
* crash recovery from the most recent checkpoint
* PITR replay over a long interval (many checkpoints)
* recovery in the face of a partially corrupt filesystem

It's the last one that is mostly bothering me at the moment.  I don't
want us to throw away data simply because the filesystem forgot an
inode.  Yeah, we might not have enough data in the WAL log to completely
reconstruct a table, but we should push out what we do have, *not* toss
it into the bit bucket.

In the first case (straight crash recovery) I think it is true that any
reference to a missing file is a reference to a file that will get
deleted before recovery finishes.  But I don't think that holds for PITR
(we might be asked to stop short of where the table gets deleted) nor
for the case where there's been filesystem damage.
        regards, tom lane
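
The difference between straight crash recovery and PITR with an early stop can be simulated in a few lines. The log contents, LSN values and the replay function below are hypothetical; the point is only that files recreated during replay disappear again when the whole log is replayed, but not necessarily when replay stops short.

```python
# Hypothetical log after the checkpoint: (LSN, action, file).
LOG = [
    (10, "write", "ts/163310"),
    (20, "write", "ts/163311"),
    (30, "drop", "ts/163310"),
    (40, "drop", "ts/163311"),
]


def replay(log, existing, stop_lsn=None):
    """Replay records up to stop_lsn (inclusive).

    Returns the files that had to be recreated during replay and still
    exist when replay ends.
    """
    files = set(existing)
    recreated = set()
    for lsn, action, name in log:
        if stop_lsn is not None and lsn > stop_lsn:
            break
        if action == "write":
            if name not in files:
                recreated.add(name)
            files.add(name)
        elif action == "drop":
            files.discard(name)
            recreated.discard(name)
    return recreated
```

Replaying everything leaves no recreated files behind. Stopping at LSN 30 leaves ts/163311, a partially reconstructed file whose drop was never replayed.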

Re: DROP TABLESPACE causes panic during recovery

From

Kevin Brown

Date:

08 August 2004, 16:10:18

Tom Lane wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > Maybe we could avoid removing it until the next checkpoint?  Or is that 
> > not enough.  Maybe it could stay there forever :/
> 
> Part of the problem here is that this code has to serve several
> purposes.  We have different scenarios to worry about:
> 
>     * crash recovery from the most recent checkpoint
> 
>     * PITR replay over a long interval (many checkpoints)
> 
>     * recovery in the face of a partially corrupt filesystem
> 
> It's the last one that is mostly bothering me at the moment.  I don't
> want us to throw away data simply because the filesystem forgot an
> inode.  Yeah, we might not have enough data in the WAL log to completely
> reconstruct a table, but we should push out what we do have, *not* toss
> it into the bit bucket.

I like the idea tossed out by one of the others the most: create a
"recovery" system tablespace, and use it to resolve issues like this.

The question is: what do you do with the tables in that tablespace once
recovery is complete?  Leave them there?  That's certainly a possibility
(in fact, it seems the best option, especially now that we're doing
PITR), but it means that the DBA would have to periodically clean up that
tablespace so that it doesn't run out of space during a later recovery.
Actually, it seems to me to be the only option that isn't the equivalent
of throwing away the data...

> In the first case (straight crash recovery) I think it is true that any
> reference to a missing file is a reference to a file that will get
> deleted before recovery finishes.  But I don't think that holds for PITR
> (we might be asked to stop short of where the table gets deleted) nor
> for the case where there's been filesystem damage.

But doesn't PITR assume that a full filesystem-level restore of the
database as it was prior to the events in the first event log being
replayed has been done?  In that event, wouldn't the PITR process Just
Work?

-- 
Kevin Brown                          kevin@sysexperts.com

Re: DROP TABLESPACE causes panic during recovery

From

Bruce Momjian

Date:

13 August 2004, 01:02:17

Did we resolve this?

---------------------------------------------------------------------------

Tom Lane wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > Maybe we could avoid removing it until the next checkpoint?  Or is that 
> > not enough.  Maybe it could stay there forever :/
> 
> Part of the problem here is that this code has to serve several
> purposes.  We have different scenarios to worry about:
> 
>     * crash recovery from the most recent checkpoint
> 
>     * PITR replay over a long interval (many checkpoints)
> 
>     * recovery in the face of a partially corrupt filesystem
> 
> It's the last one that is mostly bothering me at the moment.  I don't
> want us to throw away data simply because the filesystem forgot an
> inode.  Yeah, we might not have enough data in the WAL log to completely
> reconstruct a table, but we should push out what we do have, *not* toss
> it into the bit bucket.
> 
> In the first case (straight crash recovery) I think it is true that any
> reference to a missing file is a reference to a file that will get
> deleted before recovery finishes.  But I don't think that holds for PITR
> (we might be asked to stop short of where the table gets deleted) nor
> for the case where there's been filesystem damage.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
> 


Re: DROP TABLESPACE causes panic during recovery

From

Tom Lane

Date:

13 August 2004, 01:12:12

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Did we resolve this?

No, it's an open issue.
        regards, tom lane

Re: DROP TABLESPACE causes panic during recovery

From

Bruce Momjian

Date:

14 August 2004, 21:57:04

Added to open items:
* fix recovery of DROP TABLESPACE after checkpoint


---------------------------------------------------------------------------

Tom Lane wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > Maybe we could avoid removing it until the next checkpoint?  Or is that 
> > not enough.  Maybe it could stay there forever :/
> 
> Part of the problem here is that this code has to serve several
> purposes.  We have different scenarios to worry about:
> 
>     * crash recovery from the most recent checkpoint
> 
>     * PITR replay over a long interval (many checkpoints)
> 
>     * recovery in the face of a partially corrupt filesystem
> 
> It's the last one that is mostly bothering me at the moment.  I don't
> want us to throw away data simply because the filesystem forgot an
> inode.  Yeah, we might not have enough data in the WAL log to completely
> reconstruct a table, but we should push out what we do have, *not* toss
> it into the bit bucket.
> 
> In the first case (straight crash recovery) I think it is true that any
> reference to a missing file is a reference to a file that will get
> deleted before recovery finishes.  But I don't think that holds for PITR
> (we might be asked to stop short of where the table gets deleted) nor
> for the case where there's been filesystem damage.
> 
>             regards, tom lane
> 

Re: DROP TABLESPACE causes panic during recovery

From

Bruce Momjian

Date:

06 October 2004, 18:34:27

Is this fixed?

---------------------------------------------------------------------------

Tom Lane wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > Maybe we could avoid removing it until the next checkpoint?  Or is that 
> > not enough.  Maybe it could stay there forever :/
> 
> Part of the problem here is that this code has to serve several
> purposes.  We have different scenarios to worry about:
> 
>     * crash recovery from the most recent checkpoint
> 
>     * PITR replay over a long interval (many checkpoints)
> 
>     * recovery in the face of a partially corrupt filesystem
> 
> It's the last one that is mostly bothering me at the moment.  I don't
> want us to throw away data simply because the filesystem forgot an
> inode.  Yeah, we might not have enough data in the WAL log to completely
> reconstruct a table, but we should push out what we do have, *not* toss
> it into the bit bucket.
> 
> In the first case (straight crash recovery) I think it is true that any
> reference to a missing file is a reference to a file that will get
> deleted before recovery finishes.  But I don't think that holds for PITR
> (we might be asked to stop short of where the table gets deleted) nor
> for the case where there's been filesystem damage.
> 
>             regards, tom lane
> 

Re: DROP TABLESPACE causes panic during recovery

From

Tom Lane

Date:

06 October 2004, 20:16:53

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Is this fixed?

Yes.
        regards, tom lane