Re: Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt) - Mailing list pgsql-general

From Greg Smith
Subject Re: Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt)
Msg-id Pine.GSO.4.64.0806181358230.11228@westnet.com
In response to Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt)  (Sam Mason <sam@samason.me.uk>)
Responses Re: Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt)  (Sam Mason <sam@samason.me.uk>)
List pgsql-general
On Wed, 18 Jun 2008, Sam Mason wrote:

> Isn't fsync only a side-effect of having a write-back cache between
> programs and the disk?  This means its only purpose is to ensure that
> the cache is consistent with what's on disk.  Because all programs
> running within a system are running on top of the cache they don't know
> or care whether the cache actually matches up to the disk.

Most programs don't.  PostgreSQL writes to the database in two stages:
the WAL, followed by an fsync, then later to the main database files.
You can't trust the WAL will be around for recovery until the first fsync
returns.  The checkpoint process makes sure everything that went into the
WAL then made it to the main database files, and again it doesn't trust
that it's really on disk until the fsync returns.
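
To make the two stages concrete, here is a minimal sketch of the pattern in
Python.  It is not PostgreSQL's implementation; the function names and the
flat-file layout are made up for illustration.  The only point is that
neither stage counts as done until fsync has returned.

```python
import os


def append_durably(path, record):
    """Append bytes to path and return only after fsync reports success."""
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # push Python's buffer into the OS cache
        os.fsync(f.fileno())  # ask the OS to push its cache to the disk


def commit(wal_path, record):
    """Stage 1: the change is recoverable once it is fsynced into the WAL."""
    append_durably(wal_path, record)


def checkpoint(wal_path, data_path):
    """Stage 2: copy WAL contents into the main file, then discard the WAL."""
    with open(wal_path, "rb") as f:
        pending = f.read()
    append_durably(data_path, pending)
    # Only after the fsync above has returned is it safe to drop the WAL.
    with open(wal_path, "wb") as f:
        f.flush()
        os.fsync(f.fileno())
```

If the machine dies between commit() and checkpoint(), the record is still
in the WAL and can be replayed; that is the guarantee the first fsync buys.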

> Therefore, if I understand things correctly, the state of fsync
> shouldn't matter in this use case.  It's equally borken independent to
> the state of fsync.

Quite borken indeed, and fsync has nothing to do with it.  The theory
proposed is that since no writes were done, the backup should be
consistent.  This is quite wrong.  The most obvious case showing that is
one where a time-driven checkpoint occurred (as happens every 5 minutes by
default) while you were in the middle of backing up.  Let's say the main
database files are backed up before the checkpoint, but the backup is
still working through some giant archival table.  The checkpoint happens;
it updates the live copies of files the backup has already copied.  The
checkpoint finishes, and erases the WAL files.  Now the backup makes its
way to the WAL files.  You're screwed when you try to recover this
database from the backup.  The backed-up database files don't have the
latest updates, and the WAL can't recover them because it already cleared
its copy of them out thinking they weren't needed anymore.  You'll be
lucky to get the database to start at all; it's missing data you thought
was committed before the backup started, and who knows what subtle
corruption you'll find.
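
The sequence above can be played out in a toy model.  Everything here is
hypothetical: the "cluster" is a dict of lists, the file names are invented,
and the checkpoint is forced to land between two file copies.  It is a
sketch of the race, not of how PostgreSQL stores anything.

```python
def checkpoint(cluster):
    """Apply the WAL to the main table file, then erase the WAL."""
    cluster["table"].extend(cluster["wal"])
    cluster["wal"].clear()


def run_backup(cluster, order, checkpoint_after):
    """Copy files one at a time; a checkpoint fires after one of the copies."""
    backup = {}
    for name in order:
        backup[name] = list(cluster[name])  # snapshot of this file right now
        if name == checkpoint_after:
            checkpoint(cluster)
    return backup


def recover(backup):
    """Restore: the backed-up table plus whatever WAL the backup contains."""
    return backup["table"] + backup["wal"]


def simulate():
    # row2 is committed (it is in the WAL) but not yet in the table file.
    cluster = {"table": ["row1"], "archive": ["old"] * 3, "wal": ["row2"]}
    backup = run_backup(cluster, ["table", "archive", "wal"],
                        checkpoint_after="table")
    return recover(backup), cluster["table"]


restored, live = simulate()
print(restored)  # ['row1']
print(live)      # ['row1', 'row2']
```

The backup copied the table before the checkpoint and the WAL after it, so
row2 is in neither copy, even though nothing was written by any client
during the backup.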

Now, in reality, even time-driven checkpoints don't do anything if there
hasn't been activity, so it may very well be the case that any one
database backup is fine.  But you can't ignore the requirement to do a
pg_start_backup before making a filesystem level backup and expect you'll
get that lucky--sooner or later you will get a backup that won't restore
if you keep that up.
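
As a rough sketch of the ordering that matters, the copy can be bracketed
like this.  The assumptions: conn is a DB-API connection from a driver that
uses %s placeholders (psycopg2 is one), the helper name base_backup is made
up, and the server is a release where the functions are still called
pg_start_backup and pg_stop_backup (PostgreSQL 15 renamed them to
pg_backup_start and pg_backup_stop).  A usable backup also needs WAL
archiving set up as the manual describes; this only shows the call order.

```python
import contextlib


@contextlib.contextmanager
def base_backup(conn, label):
    """Bracket a filesystem-level copy with the backup start/stop calls."""
    cur = conn.cursor()
    cur.execute("SELECT pg_start_backup(%s)", (label,))
    try:
        yield  # the caller copies the data directory inside the with block
    finally:
        # Runs even if the copy raised, so the server leaves backup mode.
        cur.execute("SELECT pg_stop_backup()")
```

Usage would look like "with base_backup(conn, 'nightly'): copy_files()",
where copy_files stands in for whatever tar or rsync step you already run.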

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
