Unlogged tables can vanish after a crash - Mailing list pgsql-hackers

From Albe Laurenz
Subject Unlogged tables can vanish after a crash
Date
Msg-id A737B7A37273E048B164557ADEF4A58B17D9FC1B@ntex2010a.host.magwien.gv.at
Responses Re: Unlogged tables can vanish after a crash  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
I observed an interesting (and I think buggy) behaviour today after one of
our clusters crashed due to an "out of space" condition in the data directory.

Five databases in that cluster each have one unlogged table.

The log reads as follows:

PANIC  could not write to file "pg_xlog/xlogtemp.1820": No space left on device
...
LOG    terminating any other active server processes
...
LOG    all server processes terminated; reinitializing
LOG    database system was interrupted; last known up at 2014-11-18 18:04:28 CET
LOG    database system was not properly shut down; automatic recovery in progress
LOG    redo starts at C9/50403B20
LOG    redo done at C9/5AFFFF98
LOG    checkpoint starting: end-of-recovery immediate
LOG    checkpoint complete: ...
LOG    autovacuum launcher started
LOG    database system is ready to accept connections
...
PANIC  could not write to file "pg_xlog/xlogtemp.4417": No space left on device
...
LOG    terminating any other active server processes
...
LOG    all server processes terminated; reinitializing
LOG    database system was interrupted; last known up at 2014-11-18 18:04:38 CET
LOG    database system was not properly shut down; automatic recovery in progress
LOG    redo starts at C9/5B000070
LOG    redo done at C9/5FFFE4E0
LOG    checkpoint starting: end-of-recovery immediate
LOG    checkpoint complete: ...
FATAL  could not write to file "pg_xlog/xlogtemp.4442": No space left on device
LOG    startup process (PID 4442) exited with exit code 1
LOG    aborting startup due to startup process failure

After the disk space problem was resolved, the cluster was restarted.
The log reads as follows:

LOG    ending log output to stderr  Future log output will go to log destination "csvlog".
LOG    database system was shut down at 2014-11-18 18:05:03 CET
LOG    autovacuum launcher started
LOG    database system is ready to accept connections


So no crash recovery was performed, probably because the startup process
failed *after* it completed the end-of-recovery checkpoint.

Now the main fork files for all five unlogged tables are gone; the init fork files
are still there.
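
To make that check repeatable, here is a minimal sketch in C that, given a directory listing, counts init forks whose main fork is missing. It relies only on the file naming convention ("<relfilenode>" for the main fork, "<relfilenode>_init" for the init fork); the function names are hypothetical and not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INIT_SUFFIX     "_init"
#define INIT_SUFFIX_LEN 5

/* True if the file name looks like an init fork ("<relfilenode>_init"). */
static bool is_init_fork(const char *name)
{
    size_t len = strlen(name);

    return len > INIT_SUFFIX_LEN &&
           strcmp(name + len - INIT_SUFFIX_LEN, INIT_SUFFIX) == 0;
}

/*
 * Hypothetical helper: count the init forks in names[0..n-1] whose main
 * fork (the same name without the "_init" suffix) is absent from the
 * listing.
 */
size_t count_missing_main_forks(const char *const names[], size_t n)
{
    size_t missing = 0;

    assert(names != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (!is_init_fork(names[i]))
            continue;

        size_t base_len = strlen(names[i]) - INIT_SUFFIX_LEN;
        bool   found = false;

        /* Look for an entry that is exactly the name minus "_init". */
        for (size_t j = 0; j < n && !found; j++)
            found = strlen(names[j]) == base_len &&
                    strncmp(names[j], names[i], base_len) == 0;

        if (!found)
            missing++;
    }
    return missing;
}
```

Fed the file names of one database directory, a nonzero result would indicate the state described above.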

Obviously the main forks were removed during recovery, but the startup process
died before it could recreate them from the init forks:
    /*
     * Preallocate additional log files, if wanted.
     */
    PreallocXlogFiles(EndOfLog);

    /*
     * Reset initial contents of unlogged relations.  This has to be done
     * AFTER recovery is complete so that any unlogged relations created
     * during recovery also get picked up.
     */
    if (InRecovery)
        ResetUnloggedRelations(UNLOGGED_RELATION_INIT);

It seems to me that the right fix would be to recreate the unlogged
relations *before* the checkpoint.
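
To illustrate why the ordering matters, here is a toy model in C. It is a sketch, not PostgreSQL code: all names are hypothetical, and it assumes (as the second log suggests) that the end-of-recovery checkpoint leaves the cluster marked as cleanly shut down, so that the next start skips recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one unlogged relation's files. */
typedef struct
{
    bool init_fork;             /* "<relfilenode>_init" exists */
    bool main_fork;             /* "<relfilenode>" exists */
} ToyUnloggedRel;

/* Toy model of the cluster state that matters here. */
typedef struct
{
    ToyUnloggedRel rel;
    bool cleanly_shut_down;     /* what the control file claims */
} ToyCluster;

typedef enum
{
    RESET_AFTER_CHECKPOINT,     /* the order observed in the report */
    RESET_BEFORE_CHECKPOINT     /* the proposed order */
} ToyOrder;

/* Recreate the main fork from the init fork, if there is one. */
static void toy_reset_init(ToyUnloggedRel *rel)
{
    assert(rel != NULL);
    if (rel->init_fork)
        rel->main_fork = true;
}

/*
 * Simulate one server start.  If fail_after_checkpoint is set, the
 * startup process dies right after the end-of-recovery checkpoint
 * (think: out of disk space while preallocating log files).
 * Returns true if startup completed.
 */
bool toy_startup(ToyCluster *c, ToyOrder order, bool fail_after_checkpoint)
{
    bool in_recovery = !c->cleanly_shut_down;

    if (in_recovery)
    {
        /* Recovery begins by removing main forks that have an init fork. */
        if (c->rel.init_fork)
            c->rel.main_fork = false;

        /* ... redo would happen here ... */

        if (order == RESET_BEFORE_CHECKPOINT)
            toy_reset_init(&c->rel);

        /* End-of-recovery checkpoint: state now looks like a clean shutdown. */
        c->cleanly_shut_down = true;
    }

    if (fail_after_checkpoint)
        return false;

    if (in_recovery && order == RESET_AFTER_CHECKPOINT)
        toy_reset_init(&c->rel);

    c->cleanly_shut_down = false;   /* server is up and running */
    return true;
}
```

In this model, a crashed cluster that goes through a failed start followed by a successful one ends up with only the init fork under RESET_AFTER_CHECKPOINT, but with the main fork recreated under RESET_BEFORE_CHECKPOINT.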

Yours,
Laurenz Albe
