Re: Maximum transaction rate - Mailing list pgsql-general

From Greg Smith
Subject Re: Maximum transaction rate
Date
Msg-id alpine.GSO.2.01.0903172118150.14497@westnet.com
In response to Re: Maximum transaction rate  (Marco Colombo <pgsql@esiway.net>)
Responses Re: Maximum transaction rate  (Marco Colombo <pgsql@esiway.net>)
List pgsql-general

On Wed, 18 Mar 2009, Marco Colombo wrote:

> If you fsync() after each write you want ordered, there can't be any
> "subsequent I/O" (unless there are many different processes concurrently
> writing to the file w/o synchronization).

Inside PostgreSQL, each database backend process ends up writing blocks
to the database disk itself whenever it needs to allocate a new buffer and
the one it is handed is dirty.  You can easily have several of those
writing to the same 1GB underlying file on disk.  So that prerequisite is
there.  The main potential for a problem here would be if a stray
unsynchronized write from one of those backends happened in a way that
wasn't accounted for by the WAL+checkpoint design.  What I was suggesting
is that the way that synchronization happens in the database provides some
defense from running into problems in this area.

The way backends handle writes themselves is also why your suggestion
about the database being able to utilize barriers isn't really helpful.
Those writes trickle out all the time, and normally you don't even have to
care about ordering them.  The only time you do need to care is at
checkpoint time, and there only a hard line is really practical--all
writes up to that point, period.
Trying to implement ordered writes for everything that happened before
then would complicate the code base, which isn't going to happen for such
a platform+filesystem specific feature, one that really doesn't offer much
acceleration from the database's perspective.
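
To make the "hard line" idea concrete, here is a minimal sketch in Python.
It is not PostgreSQL's actual code, just a toy under stated assumptions:
ToyCheckpointer, pwrite_all and demo_checkpoint are hypothetical names, and
the real system tracks pending fsyncs quite differently.  Writes go out in
any order with no fsync, and the checkpoint then syncs every file touched
since the last one.

```python
import os
import tempfile


def pwrite_all(fd, data, offset):
    """Write all of data at offset, retrying on short writes."""
    view = memoryview(data)
    while len(view) > 0:
        n = os.pwrite(fd, view, offset)
        view = view[n:]
        offset += n


class ToyCheckpointer:
    """Hypothetical toy model: unordered writes, one hard line at checkpoint."""

    def __init__(self):
        self.pending = set()  # files written since the last checkpoint

    def write_block(self, path, offset, data):
        # A "backend" writes a dirty block; no fsync, no ordering promise.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            pwrite_all(fd, data, offset)
        finally:
            os.close(fd)
        self.pending.add(path)

    def checkpoint(self):
        # The hard line: fsync every file written before this point.
        flushed = sorted(self.pending)
        for path in flushed:
            fd = os.open(path, os.O_WRONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
        self.pending.clear()
        return flushed


def demo_checkpoint():
    with tempfile.TemporaryDirectory() as d:
        cp = ToyCheckpointer()
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        cp.write_block(b, 8192, b"y" * 8192)  # order between writes is
        cp.write_block(a, 0, b"x" * 8192)     # deliberately arbitrary
        cp.write_block(a, 8192, b"z" * 8192)
        flushed = cp.checkpoint()
        return [os.path.basename(p) for p in flushed], len(cp.pending)
```

Nothing in write_block cares about the order of individual writes; the
only guarantee is made in checkpoint(), which is the point above.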

> only when the journal wraps around there's a (extremely) small window of
> vulnerability. You need to write a carefully crafted torture program to
> get any chance to observe that... such program exists, and triggers the
> problem

Yeah, I've been following all that.  The PostgreSQL WAL design works on
ext2 filesystems with no journal at all.  Some people even put their
pg_xlog directory onto ext2 filesystems for best performance, relying on
the WAL to be the journal.  As long as fsync is honored correctly, the WAL
writes should be re-writing already allocated space, which makes this
category of journal mayhem not so much of a problem.  But when I read
about fsync doing unexpected things, that gets me more concerned.
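
As a rough illustration of "re-writing already allocated space", here is a
small Python sketch.  It assumes a POSIX system and is not how PostgreSQL
itself is written; SEGMENT_SIZE, preallocate_segment, write_record and
demo_wal are hypothetical names, and the segment size is kept tiny for the
demo.  The file is zero-filled and fsync'd once up front, so later record
writes only overwrite existing blocks and never change the file's length.

```python
import os
import tempfile

SEGMENT_SIZE = 1024 * 1024  # hypothetical size, small for the demo
BLOCK_SIZE = 8192


def pwrite_all(fd, data, offset):
    """Write all of data at offset, retrying on short writes."""
    view = memoryview(data)
    while len(view) > 0:
        n = os.pwrite(fd, view, offset)
        view = view[n:]
        offset += n


def preallocate_segment(path, size=SEGMENT_SIZE):
    """Create a zero-filled file of fixed size and force it to disk."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    try:
        zeros = bytes(BLOCK_SIZE)
        written = 0
        while written < size:
            chunk = min(BLOCK_SIZE, size - written)
            pwrite_all(fd, zeros[:chunk], written)
            written += chunk
        os.fsync(fd)
    finally:
        os.close(fd)


def write_record(fd, offset, payload):
    """Overwrite already-allocated space, then fsync before returning."""
    if offset + len(payload) > os.fstat(fd).st_size:
        raise ValueError("record would extend the segment")
    pwrite_all(fd, payload, offset)
    os.fsync(fd)  # the record counts as durable only after this returns
    return offset + len(payload)


def demo_wal():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "segment")
        preallocate_segment(path)
        size_before = os.path.getsize(path)
        fd = os.open(path, os.O_WRONLY)
        try:
            end = write_record(fd, 0, b"commit record 1")
            end = write_record(fd, end, b"commit record 2")
        finally:
            os.close(fd)
        return size_before, os.path.getsize(path), end
```

The file size is the same before and after the record writes, so no new
space has to be allocated for them.  All of this still depends on fsync
actually doing what it claims.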

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
