Re: Perfomance Tuning - Mailing list pgsql-performance

From Reece Hart
Subject Re: Perfomance Tuning
Msg-id 1060642567.15483.220.camel@tallac
In response to Re: Perfomance Tuning  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-performance
On Mon, 2003-08-11 at 15:16, Bruce Momjian wrote:
> That _would_ work if ext2 was a reliable file system --- it is not.

Bruce-

I'd like to know your evidence for this. I'm not disputing it, but I have used Linux for more than 7 years (including several clusters, all of which have run ext2 or ext3) and keep a fairly close ear to kernel newsgroups, announcements, and changelogs. I am aware that there have very occasionally been corruption problems, but my understanding is that these were fixed, and quickly. In any case, I'd say that your assertion is not widely known, and I'd appreciate some data or references.

As for PostgreSQL on ext2 and ext3, I recently switched from ext3 to ext2 (Stephen Tweedie had the foresight to keep ext3 backward compatible, which made this easy). I did this because I had a 45M row update on one table that was taking an inordinate amount of time (killed after 10 hours), even though creating the database from backup takes ~4 hours including indexing (see my pgsql-performance post of 2003/07/22). CPU usage was ~2% on an otherwise unloaded, fast, SCSI160 machine. The io columns of vmstat suggested that PostgreSQL was writing something on the order of 100x as many blocks as it was reading.

My untested interpretation was that the update bookkeeping as well as the data updates were all getting journalled: the journal space would fill, get sync'd, and then the cycle would repeat. In effect, all blocks were being written TWICE just for the journalling, never mind the overhead for PostgreSQL transactions. This suggests that journals probably work best with short bursts of writes and syncing during lulls, rather than with sustained writes.
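For anyone who wants to check the write/read ratio on their own machine, here is a minimal sketch in Python. It is not what I ran at the time; it only assumes that you have saved the text output of something like `vmstat 5` and that the output has the usual `bi` (blocks in) and `bo` (blocks out) columns. The function name and the sample numbers are made up for illustration.

```python
def write_read_ratio(vmstat_text):
    """Return total blocks written (bo) divided by total blocks read (bi).

    Assumes vmstat's usual layout: a row of column names containing
    'bi' and 'bo', followed by rows of integers.
    """
    bi_idx = bo_idx = None
    total_bi = total_bo = 0
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "bi" in fields and "bo" in fields:
            # Column-name row (vmstat repeats it periodically).
            bi_idx, bo_idx = fields.index("bi"), fields.index("bo")
            continue
        if bi_idx is None or len(fields) <= max(bi_idx, bo_idx):
            continue  # banner line, blank line, or short row
        try:
            bi, bo = int(fields[bi_idx]), int(fields[bo_idx])
        except ValueError:
            continue  # not a data row
        total_bi += bi
        total_bo += bo
    if total_bi == 0:
        return float("inf") if total_bo else 0.0
    return total_bo / total_bi


# Made-up sample, NOT real measurements from the machine described above.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 123456  12345 234567    0    0    10  1000  100  200  1  1 97  1  0
 0  1      0 123400  12345 234600    0    0    20  2000  110  210  1  1 96  2  0
"""

if __name__ == "__main__":
    print(write_read_ratio(SAMPLE))
```

Note that the first data row vmstat prints is an average since boot, so you may want to drop it before feeding the text to the function.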

I ended up solving the update issue without actually running the update, so I have no ext2 timings. You may want to test this yourself if you're concerned.

-Reece

-- 
Reece Hart, Ph.D.                       rkh@gene.com, http://www.gene.com/
Genentech, Inc.                         650/225-6133 (voice), -5389 (fax)
Bioinformatics and Protein Engineering
1 DNA Way, MS-93                        http://www.in-machina.com/~reece/
South San Francisco, CA  94080-4990     reece@in-machina.com, GPG: 0x25EC91A0
