Re: Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt) - Mailing list pgsql-general

From	Greg Smith
Subject	Re: Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt)
Date	June 18, 2008 18:17:15
Msg-id	Pine.GSO.4.64.0806181358230.11228@westnet.com
In response to	Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt) (Sam Mason <sam@samason.me.uk>)
Responses	Re: Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt) (Sam Mason <sam@samason.me.uk>)
List	pgsql-general

On Wed, 18 Jun 2008, Sam Mason wrote:

> Isn't fsync only a side-effect of having a write-back cache between
> programs and the disk?  This means its only purpose is to ensure that
> the cache is consistent with what's on disk.  Because all programs
> running within a system are running on top of the cache they don't know
> or care whether the cache actually matches up to the disk.

Most programs don't.  PostgreSQL writes to the database in two stages:
the WAL, followed by an fsync, then later to the main database files.
You can't trust the WAL will be around for recovery until the first fsync
returns.  The checkpoint process makes sure everything that went into the
WAL has since made it to the main database files, and again it doesn't
trust that the data is really on disk until the fsync returns.
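
As a rough illustration of the "write, then fsync, then trust it" ordering
described above, here is a minimal Python sketch.  It is not PostgreSQL's
code: the function names and the two file paths are made up for the
example, and the only real call relied on is os.fsync, which blocks until
the operating system reports the data as written out.

```python
import os


def write_durably(path, record):
    """Append record to path; return only once fsync has returned."""
    with open(path, "ab") as f:
        f.write(record)        # lands in Python's buffer
        f.flush()              # hands it to the OS write-back cache
        os.fsync(f.fileno())   # only after this returns can it be trusted


def commit(wal_path, data_path, record):
    """Hypothetical two-stage write: WAL first, main file afterwards."""
    # Stage 1: the change goes to the WAL and is fsynced.  If the machine
    # dies after this point, recovery can replay the record from the WAL.
    write_durably(wal_path, record)
    # Stage 2: the main file is written later (in PostgreSQL, by the
    # checkpoint), and again it is not trusted until fsync returns.
    write_durably(data_path, record)
```

A program that skips the fsync calls sees exactly the same file contents
while the system stays up, which is why most programs never notice the
difference.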

> Therefore, if I understand things correctly, the state of fsync
> shouldn't matter in this use case.  It's equally borken independent to
> the state of fsync.

Quite borken indeed, and fsync has nothing to do with it.  The theory
proposed is that since no writes were done, the backup should be
consistent.  This is quite wrong.  The most obvious case showing that is
one where a time-driven checkpoint occurred (as happens every 5 minutes by
default) while you were in the middle of backing up.  Let's say the main
database files are backed up before the checkpoint, but the backup is
still working through some giant archival table.  The checkpoint happens;
it updates files that the backup has already copied.  The checkpoint
finishes, and erases the WAL files.  Now the backup makes its way to the
WAL files.  You're screwed when you try to recover this database from the
backup.  The backed-up database files don't have the latest updates, and
the WAL can't recover them because it already cleared its copy of them out
thinking they weren't needed anymore.  You'll be lucky to get the database
to start at all; it's missing data you thought was committed before the
backup started, and who knows what subtle corruption you'll find.
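
The sequence above can be played out with a toy model.  This is only a
sketch: a dict stands in for the main database files, a list stands in for
the WAL, and the table names and values are invented.  It shows the
ordering problem and nothing about PostgreSQL's actual file formats.

```python
def checkpoint(data, wal):
    """Apply every WAL record to the main files, then discard the WAL."""
    for name, value in wal:
        data[name] = value
    wal.clear()


def recover(backup_data, backup_wal):
    """Replay whatever WAL the backup captured over the backed-up files."""
    restored = dict(backup_data)
    for name, value in backup_wal:
        restored[name] = value
    return restored


def file_level_backup(checkpoint_midway):
    # One update is committed (it is in the WAL) but not yet checkpointed.
    data = {"accounts": "old", "archive": "big"}
    wal = [("accounts", "new")]

    backup_data = {"accounts": data["accounts"]}  # copied first, still "old"
    if checkpoint_midway:
        checkpoint(data, wal)                     # updates live file, clears WAL
    backup_data["archive"] = data["archive"]      # the giant archival table
    backup_wal = list(wal)                        # WAL files are copied last
    return recover(backup_data, backup_wal)
```

With no checkpoint during the copy, the restored "accounts" value is "new",
because the WAL in the backup still carries the update.  With a checkpoint
midway, the backup holds the old file plus an empty WAL, so the restore
comes back with "old" and the committed update is gone.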

Now, in reality, even time-driven checkpoints don't do anything if there
hasn't been activity, so it may very well be the case that any one
database backup is fine.  But you can't ignore the requirement to do a
pg_start_backup before making a filesystem level backup and expect you'll
get that lucky--sooner or later you will get a backup that won't restore
if you keep that up.
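
One way to make the pg_start_backup step hard to forget is to wrap the
file copy in a helper.  The sketch below is an assumption-laden example,
not a complete backup procedure: run_sql is a hypothetical callable (for
instance the execute method of a DB-API cursor that uses %s placeholders),
the label is arbitrary, and pg_start_backup/pg_stop_backup are the
function names current when this message was written.  The PostgreSQL
documentation lists further requirements that are not shown here.

```python
from contextlib import contextmanager


@contextmanager
def base_backup(run_sql, label):
    """Bracket a filesystem-level copy with the start/stop backup calls."""
    run_sql("SELECT pg_start_backup(%s)", (label,))
    try:
        yield  # the caller copies the data directory here
    finally:
        # Always tell the server the copy is over, even if it failed.
        run_sql("SELECT pg_stop_backup()", ())
```

Usage would look like "with base_backup(cursor.execute, 'nightly'):"
followed by whatever tar or rsync invocation does the actual copy.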

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
