Thread: Unlogged tables can vanish after a crash

Unlogged tables can vanish after a crash

From: Albe Laurenz
Date: 2014-11-19 11:26:56 +0000

I observed an interesting (and I think buggy) behaviour today after one of
our clusters crashed due to an "out of space" condition in the data directory.

Five databases in that cluster each have one unlogged table.

The log reads as follows:

PANIC  could not write to file "pg_xlog/xlogtemp.1820": No space left on device
...
LOG    terminating any other active server processes
...
LOG    all server processes terminated; reinitializing
LOG    database system was interrupted; last known up at 2014-11-18 18:04:28 CET
LOG    database system was not properly shut down; automatic recovery in progress
LOG    redo starts at C9/50403B20
LOG    redo done at C9/5AFFFF98
LOG    checkpoint starting: end-of-recovery immediate
LOG    checkpoint complete: ...
LOG    autovacuum launcher started
LOG    database system is ready to accept connections
...
PANIC  could not write to file "pg_xlog/xlogtemp.4417": No space left on device
...
LOG    terminating any other active server processes
...
LOG    all server processes terminated; reinitializing
LOG    database system was interrupted; last known up at 2014-11-18 18:04:38 CET
LOG    database system was not properly shut down; automatic recovery in progress
LOG    redo starts at C9/5B000070
LOG    redo done at C9/5FFFE4E0
LOG    checkpoint starting: end-of-recovery immediate
LOG    checkpoint complete: ...
FATAL  could not write to file "pg_xlog/xlogtemp.4442": No space left on device
LOG    startup process (PID 4442) exited with exit code 1
LOG    aborting startup due to startup process failure

After the disk space problem was resolved, the cluster was restarted.
The log reads as follows:

LOG    ending log output to stderr  Future log output will go to log destination "csvlog".
LOG    database system was shut down at 2014-11-18 18:05:03 CET
LOG    autovacuum launcher started
LOG    database system is ready to accept connections


So no crash recovery was performed, probably because the startup process
failed *after* it completed the end-of-recovery checkpoint.

Now the main fork files for all five unlogged tables are gone; the init fork files
are still there.
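
A rough sketch of how one might spot this state from a directory listing: relation files are named after their relfilenode, and the init fork carries an "_init" suffix, so an orphan is a "<digits>_init" entry with no plain "<digits>" sibling. The function names and the example filenode numbers below are made up for illustration; this is not a PostgreSQL tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if name has the form "<digits>_init", else 0.
 * On success, *stem_len receives the length of the digits part. */
static int is_init_fork(const char *name, size_t *stem_len)
{
    assert(name != NULL && stem_len != NULL);

    size_t n = strspn(name, "0123456789");
    if (n == 0 || strcmp(name + n, "_init") != 0)
        return 0;
    *stem_len = n;
    return 1;
}

/* Return 1 if the listing contains an entry equal to the first
 * stem_len characters of stem (the first main fork segment). */
static int has_main_fork(const char *const *names, size_t count,
                         const char *stem, size_t stem_len)
{
    for (size_t i = 0; i < count; i++)
        if (strlen(names[i]) == stem_len &&
            strncmp(names[i], stem, stem_len) == 0)
            return 1;
    return 0;
}

/* Count init forks without a main fork in the listing.
 * Indexes of the first max orphans are stored in out[]. */
size_t find_orphaned_init_forks(const char *const *names, size_t count,
                                size_t *out, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < count; i++) {
        size_t stem_len;

        if (!is_init_fork(names[i], &stem_len))
            continue;               /* not an init fork */
        if (has_main_fork(names, count, names[i], stem_len))
            continue;               /* main fork is present */
        if (found < max)
            out[found] = i;
        found++;
    }
    return found;
}
```

Fed with the file names of one database directory (collected with readdir() or similar), a nonzero result would point at unlogged relations in the state described above.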

Obviously the main forks got nuked during recovery, but the startup process died
before it could recreate them:
   /*
    * Preallocate additional log files, if wanted.
    */
   PreallocXlogFiles(EndOfLog);

   /*
    * Reset initial contents of unlogged relations.  This has to be done
    * AFTER recovery is complete so that any unlogged relations created
    * during recovery also get picked up.
    */
   if (InRecovery)
      ResetUnloggedRelations(UNLOGGED_RELATION_INIT);

It seems to me that the right fix would be to recreate the unlogged
relations *before* the checkpoint.

Yours,
Laurenz Albe

Re: Unlogged tables can vanish after a crash

From: Andres Freund

Hi,

On 2014-11-19 11:26:56 +0000, Albe Laurenz wrote:
> I observed an interesting (and I think buggy) behaviour today after one of
> our clusters crashed due to an "out of space" condition in the data directory.

Hah, just a couple of days ago I pushed a fix for that ;)

http://archives.postgresql.org/message-id/20140912112246.GA4984%40alap3.anarazel.de
and
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d3586fc8aa5d9365a5c50cb5e555971eb633a4ec

> So no crash recovery was performed, probably because the startup process
> failed *after* it completed the end-of-recovery checkpoint.
> 
> Now the main fork files for all five unlogged tables are gone; the init fork files
> are still there.

You can "recover" them (as empty tables) by restarting with -m immediate or
so again, which forces another crash recovery.

> It seems to me that the right fix would be to recreate the unlogged
> relations *before* the checkpoint.

Yep, that's what we're doing now.

Greetings,

Andres Freund



Re: Unlogged tables can vanish after a crash

From: Albe Laurenz

Andres Freund wrote:
> On 2014-11-19 11:26:56 +0000, Albe Laurenz wrote:
>> I observed an interesting (and I think buggy) behaviour today after one of
>> our clusters crashed due to an "out of space" condition in the data directory.
> 
> Hah, just a couple of days ago I pushed a fix for that ;)
> 
> http://archives.postgresql.org/message-id/20140912112246.GA4984%40alap3.anarazel.de
> and
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d3586fc8aa5d9365a5c50cb5e555971eb633a4ec

Thanks, I didn't see that.
PostgreSQL, the database system where your bugs get fixed before you report them!

Yours,
Laurenz Albe