Re: Hard limit on WAL space used (because PANIC sucks) - Mailing list pgsql-hackers

From MauMau
Subject Re: Hard limit on WAL space used (because PANIC sucks)
Date
Msg-id DF0B1E1C6BD54895B16AAED7CFD94413@maumau
In response to Re: Hard limit on WAL space used (because PANIC sucks)  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Hard limit on WAL space used (because PANIC sucks)
List pgsql-hackers
From: "Craig Ringer" <craig@2ndquadrant.com>
> On 06/09/2013 08:32 AM, MauMau wrote:
>>
>> - Failure of a disk containing data directory or tablespace
>> If checkpoint can't write buffers to disk because of disk failure,
>> checkpoint cannot complete, thus WAL files accumulate in pg_xlog/.
>> This means that one disk failure will lead to postgres shutdown.
>
> I've seen a couple of people bitten by the misunderstanding that
> tablespaces are a way to split up your data based on different
> reliability requirements, and I really need to write a docs patch for
> http://www.postgresql.org/docs/current/static/manage-ag-tablespaces.html
> <http://www.postgresql.org/docs/9.2/static/manage-ag-tablespaces.html>
> that adds a prominent warning like:
>
> WARNING: Every tablespace must be present before the database can be
> started. There is no easy way to recover the database if a tablespace is
> lost to disk failure, deletion, use of volatile storage, etc. <b>Do not
> put a tablespace on a RAM disk</b>; instead just use UNLOGGED tables.
>
> (Opinions on the above?)

Yes, I'm sure this would help DBAs understand how postgres behaves and 
prepare accordingly.  However, it does not apply to my case, because I'm 
using tablespaces to distribute I/O across multiple disks and simply to add 
database capacity.

The problem is that the reliability of the database system decreases as 
disks are added, because the failure of any one of those disks results in a 
database PANIC shutdown.
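
As a rough illustration of that point, here is a small sketch.  It assumes 
each disk fails independently with the same probability over some period; 
the 3% figure and the function name are hypothetical, chosen only to show 
the trend, not measured values.

```python
def p_any_disk_fails(p_single: float, n_disks: int) -> float:
    """Probability that at least one of n_disks fails.

    Assumes independent failures with identical probability p_single,
    which is a simplification (real disks often fail in correlated ways).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_disks < 0:
        raise ValueError("n_disks must be non-negative")
    # All disks survive with probability (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p_single) ** n_disks


if __name__ == "__main__":
    # Hypothetical per-disk failure probability of 3%.
    for n in (1, 2, 4, 8):
        print(n, round(p_any_disk_fails(0.03, n), 4))
```

If any single disk failure stops the whole server, the chance of a PANIC 
shutdown grows with every tablespace placed on a separate disk.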


> I'd rather like to be able to recover from this by treating the
> tablespace as dead, so any attempt to get a lock on any table within it
> fails with an error and already-in-WAL writes to it just get discarded.
> It's the sort of thing that'd only be reasonable to do as a recovery
> option (like zero_damaged_pages) since if applied by default it'd lead
> to potentially severe and unexpected data loss.

I'm in favor of taking a tablespace offline when an I/O failure is 
encountered and continuing to run the database server.  But WAL must not be 
discarded, because committed transactions must be preserved to keep the 
durability guarantee of ACID.

Postgres needs to take these steps when it encounters an I/O error:

1. Take the tablespace offline, so that subsequent reads and writes against 
it return an error without actually issuing I/O against the data files.

2. Discard shared buffers containing data in the tablespace.

WAL is not affected by taking a tablespace offline.  WAL records already 
written to the WAL buffers will be written to pg_xlog/ and archived as usual. 
Those WAL records will be used to recover committed transactions during 
archive recovery.
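
The two steps and the WAL behavior above can be sketched as a toy model.  
This is not PostgreSQL code: ToyServer, TablespaceOfflineError and 
handle_io_error are hypothetical names invented for illustration, and the 
"WAL" is just an in-memory list.

```python
from dataclasses import dataclass, field


class TablespaceOfflineError(Exception):
    """Raised when a read or write targets an offline tablespace."""


@dataclass
class ToyServer:
    offline: set = field(default_factory=set)
    # Maps (tablespace, page) -> data; stands in for shared buffers.
    shared_buffers: dict = field(default_factory=dict)
    # Append-only list standing in for WAL records.
    wal: list = field(default_factory=list)

    def _check_online(self, tablespace):
        if tablespace in self.offline:
            raise TablespaceOfflineError(f"tablespace {tablespace!r} is offline")

    def write(self, tablespace, page, data):
        self._check_online(tablespace)
        self.wal.append((tablespace, page, data))  # log first, then buffer
        self.shared_buffers[(tablespace, page)] = data

    def read(self, tablespace, page):
        self._check_online(tablespace)
        return self.shared_buffers[(tablespace, page)]

    def handle_io_error(self, tablespace):
        # Step 1: take the tablespace offline.
        self.offline.add(tablespace)
        # Step 2: discard buffers that belong to that tablespace.
        self.shared_buffers = {
            key: value
            for key, value in self.shared_buffers.items()
            if key[0] != tablespace
        }
        # The WAL is deliberately left untouched.


srv = ToyServer()
srv.write("ts_a", 1, "row-a")
srv.write("ts_b", 1, "row-b")
srv.handle_io_error("ts_b")  # simulate a disk failure under ts_b
```

After the simulated failure, ts_a keeps serving requests, access to ts_b 
raises an error, and both WAL records remain available for a later 
recovery.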

Regards
MauMau



