WAL recycling, ext3, Linux 2.4.18 - Mailing list pgsql-general

From Doug Fields
Subject WAL recycling, ext3, Linux 2.4.18
Date
Msg-id 5.1.0.14.2.20020708022105.01f36598@pop.pexicom.com
In response to Re: I am being interviewed by OReilly  (Robert L Mathews <lists@tigertech.com>)
List pgsql-general
Hello all,

I'm still trying to track down my very odd periodic pauses/hangs in
PostgreSQL 7.2.1.

I've localized it to what seems to be the "recycled transaction log file"
lines in the log file. Whenever this happens, a whole bunch of queries
which were "on hold" (just sitting there, as can be seen in
pg_stat_activity, when they usually execute in fractions of a second) come
back to life and finish very quickly.

Unfortunately, PostgreSQL doesn't seem to log when it starts doing this
recycling, only when it's done.
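Since the server only logs the end of each recycle, one way to line the pauses up against application-side stalls is to group the "recycled transaction log file" lines into bursts and measure the spacing between them. The following is a minimal sketch, not a tested tool: it assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp (which depends on the logging settings), and the function names and the 30-second grouping gap are hypothetical choices.

```python
from datetime import datetime, timedelta

MARKER = "recycled transaction log file"
# Assumed timestamp prefix on each log line; adjust to the actual log format.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def recycle_bursts(lines, gap=timedelta(seconds=30)):
    """Group recycle messages into bursts.

    Returns a list of (first_timestamp, last_timestamp, message_count),
    where messages closer together than `gap` belong to the same burst.
    """
    bursts = []
    for line in lines:
        if MARKER not in line:
            continue
        ts = datetime.strptime(line[:19], TS_FORMAT)
        if bursts and ts - bursts[-1][1] <= gap:
            first, _, count = bursts[-1]
            bursts[-1] = (first, ts, count + 1)
        else:
            bursts.append((ts, ts, 1))
    return bursts


def burst_intervals(bursts):
    """Seconds between the starts of consecutive bursts."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(bursts, bursts[1:])]
```

Feeding it the server log (for example `recycle_bursts(open(logfile))`, with `logfile` being wherever the postmaster output goes) gives the completion time and size of each burst, which can then be compared with the timestamps at which the application saw its queries resume.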

However, it seems to be taking about 1.5 minutes (yes, around 90 seconds)
to do this recycling on about sixteen of these WAL files at a time.
(I deduced this from the logs of the application that uses the database.) I
currently have about 102 of these WAL files (I don't mind; I have 50 GB
set aside for pg_xlog). My postgresql.conf settings are:

WAL_FILES = 48
WAL_BUFFERS = 16
CHECKPOINT_SEGMENTS = 30

With this, during my heavy load period, I get those 16 WAL recycling
messages every 6.5 minutes. During heavy vacuuming, the recycling happens
every 3 minutes, and that was my goal (no more than every three minutes,
per Bruce Momjian's PDF on tuning).
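To put the numbers above in proportion, here is a small back-of-the-envelope calculation. It assumes the default WAL segment size of 16 MB; the helper names are hypothetical and exist only for this illustration.

```python
# Assumed: default 16 MB WAL segments (not stated in the message itself).
WAL_SEGMENT_MB = 16


def stall_fraction(stall_minutes, cycle_minutes):
    """Fraction of wall-clock time spent stalled."""
    return stall_minutes / cycle_minutes


def wal_disk_mb(n_files, segment_mb=WAL_SEGMENT_MB):
    """Disk space taken by n_files WAL segments, in MB."""
    return n_files * segment_mb


# 1.5 minutes stalled out of every 6.5-minute cycle under heavy load.
print(f"stalled {stall_fraction(1.5, 6.5):.0%} of the time")
# Space held by the 102 segments currently in pg_xlog.
print(f"{wal_disk_mb(102)} MB in pg_xlog")
# Amount of WAL covered by one burst of 16 recycled segments.
print(f"{wal_disk_mb(16)} MB per recycle burst")
```

Under that assumption, roughly a quarter of the heavy-load period is spent waiting, while the 102 segments occupy well under 2 GB of the space set aside for pg_xlog.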

My server specs:
Dual P4 Xeon 2.4 GHz
8 GB RAM
RAID-1 drive for pg_xlog - running ext3
RAID-5 drive dedicated to PostgreSQL for everything else - running ext3
Debian 3.0 (woody) kernel 2.4.18

Some questions:

1) Are there any known bad interactions between ext3fs and PostgreSQL? My
hardware vendor (Pogo Linux, recommended) seemed to suggest that ext3fs has
problems with multi-threading.
2) Any ideas on how to get it to log more info on WAL usage?
3) Which process in PostgreSQL should I attach to using gdb to check out
this WAL stuff?
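On question 2, one approach that needs no server changes is to watch the pg_xlog directory from outside and timestamp every change in its file names. This is only a sketch under an assumption: that recycling shows up as segment files being renamed or removed in that directory. The function names and the one-second polling interval are hypothetical.

```python
import os
import time


def snapshot(path):
    """Return the set of file names currently in the directory."""
    return set(os.listdir(path))


def diff_snapshots(before, after):
    """Return (removed, added) file names between two snapshots, sorted."""
    return sorted(before - after), sorted(after - before)


def watch(path, interval=1.0, rounds=None):
    """Poll `path` and print a timestamped line whenever its names change.

    Runs forever when `rounds` is None, otherwise for `rounds` polls.
    """
    prev = snapshot(path)
    polls = 0
    while rounds is None or polls < rounds:
        time.sleep(interval)
        cur = snapshot(path)
        removed, added = diff_snapshots(prev, cur)
        if removed or added:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} removed={removed} added={added}")
        prev = cur
        polls += 1
```

Running `watch()` on the pg_xlog directory during a heavy-load period would show when the first file name changes, which can be compared with the time of the "recycled transaction log file" messages to estimate how long the recycle actually takes.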

Putting my application on hold for 1.5 minutes out of every 6.5 is of
course very bad... I'm stumped. Any ideas are welcome; I am willing to
provide any additional information and run any other tests.

Thanks,

Doug



