Re: Hard limit on WAL space used (because PANIC sucks) - Mailing list pgsql-hackers

From MauMau
Subject Re: Hard limit on WAL space used (because PANIC sucks)
Date
Msg-id 879A3981DBFB4B2D8B3A92C1B5E5CED6@maumau
In response to Re: Hard limit on WAL space used (because PANIC sucks)  (Daniel Farina <daniel@heroku.com>)
List pgsql-hackers
From: "Daniel Farina" <daniel@heroku.com>
> On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Right now, what we're telling users is "You can have continuous backup
>> with Postgres, but you'd better hire an expensive consultant to set it
>> up for you, or use this external tool of dubious provenance which
>> there's no packages for, or you might accidentally cause your database
>> to shut down in the middle of the night."
>>
>> At which point most sensible users say "no thanks, I'll use something 
>> else".

> Inverted and just as well supported: "if you want to not accidentally
> lose data, you better hire an expensive consultant to check your
> systems for all sorts of default 'safety = off' features."  This
> being but the hypothetical first one.
>
> Furthermore, I see no reason why high quality external archiving
> software cannot exist.  Maybe some even exists already, and no doubt
> they can be improved and the contract with Postgres enriched to that
> purpose.
>
> Finally, it's not that hard to teach any archiver how to no-op at
> user-peril, or perhaps Postgres can learn a way to do this expressly
> to standardize the procedure a bit to ease publicly shared recipes, 
> perhaps.

Yes, I feel that designing reliable archiving is very difficult, even for the 
simplest case: copying WAL files to disk.  I know of the following three 
problems if you just follow the PostgreSQL manual.  Average users won't notice 
them, and I guess even professional DBAs migrating from other DBMSs won't, 
either.

1. If the machine or postgres crashes while archive_command is copying a WAL 
file, later archive recovery fails.
This is because cp leaves a file of less than 16MB in the archive area, and 
postgres refuses to start when it finds such a small archived WAL file.
The solution, which IIRC Tomas san told me here, is to do something like "cp 
%p /archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f".
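As a sketch, the copy-then-rename recipe could be wrapped in a small shell 
function like the one below.  The script name, destination directory, and 
argument layout are assumptions for illustration, not a standard recipe:

```shell
#!/bin/sh
# safe_archive.sh -- illustrative sketch of the copy-then-rename recipe above.
# Hypothetical usage in postgresql.conf:
#   archive_command = '/path/to/safe_archive.sh %p %f /archive/dir'

archive_wal() {
    src="$1"     # %p: path of the WAL segment postgres wants archived
    fname="$2"   # %f: the segment's file name
    dest="$3"    # archive directory (assumed to be one filesystem)

    # Copy under a temporary name first; a crash mid-copy then leaves only
    # "$fname.tmp", which recovery ignores.  mv within one filesystem is an
    # atomic rename, so the final name only ever appears fully written.
    cp "$src" "$dest/$fname.tmp" && mv "$dest/$fname.tmp" "$dest/$fname"
}
```

The key point is that the final file name never exists in a half-copied state: 
either the rename happened and the file is complete, or it did not and only 
the .tmp file is left behind.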

2. archive_command dumps core when you run pg_ctl stop -mi.
This is because the postmaster sends SIGQUIT to all its descendants.  The core 
files accumulate in the data directory, which will be backed up together with 
the database.  Of course, those core files are garbage.
The archive_command script needs to catch SIGQUIT and exit.
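A sketch of what that could look like: the script traps SIGQUIT and exits 
with a nonzero status, so an immediate shutdown makes archiving report 
failure instead of leaving a core file behind.  The exit code and structure 
here are my assumptions, not a standard recipe:

```shell
#!/bin/sh
# Illustrative sketch: catch SIGQUIT so an immediate shutdown
# (pg_ctl stop -mi) makes the script exit with a failure status
# instead of taking the default SIGQUIT action (core dump);
# postgres will simply retry the segment after the next start.
trap 'exit 3' QUIT

# The actual copy would follow, e.g. the copy-then-rename from
# problem 1; shown as a function so nothing runs at load time.
do_archive() {
    cp "$1" "$3/$2.tmp" && mv "$3/$2.tmp" "$3/$2"
}
```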

3. You cannot know the reason for an archive_command failure (e.g. archive 
area full) unless you use PostgreSQL's server logging.
This is because archive_command failures are not logged to syslog/eventlog.
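One workaround, sketched below, is for the archive script itself to send the 
failure reason to syslog with logger(1).  The tag, priority, and message 
format are illustrative assumptions:

```shell
#!/bin/sh
# Illustrative sketch: report archive failures to syslog ourselves, so
# the reason (e.g. "No space left on device") is visible even when
# PostgreSQL's own server log is not being collected.
archive_with_syslog() {
    # Capture stderr so the real error message reaches syslog.
    err=$(cp "$1" "$3/$2.tmp" 2>&1 && mv "$3/$2.tmp" "$3/$2" 2>&1) && return 0
    logger -t pg_archive -p local0.err "archiving $2 failed: $err"
    return 1
}
```

Returning nonzero still tells postgres the segment was not archived, so it 
keeps the WAL file and retries; the syslog line just explains why.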


I hope PostgreSQL will provide a reliable archiving facility that is ready 
to use.

Regards
MauMau
