WAL sync behaviour - Mailing list pgsql-performance

From Steinar H. Gunderson
Subject WAL sync behaviour
Date
Msg-id 20051110133241.GA4031@uio.no
Whole thread Raw
Responses Re: WAL sync behaviour  (Richard Huxton <dev@archonet.com>)
List pgsql-performance
Hi,

We're having problems with our PostgreSQL server using forever for simple
queries, even when there's little load -- or rather, the transactions seem
to take forever to commit. We're using 8.1 (yay!) on a single Opteron, with
WAL on the system two-disk (software) RAID-1, separate from the database
four-disk RAID-10. All drives are 10000rpm SCSI disks, with write cache
turned off; we value our data :-) We're running Linux 2.6.13.4, with 64-bit
kernel but 32-bit userspace.

The main oddity is that simple transactions take forever to execute, even on
small tables with no triggers. A COMMIT on an otherwise idle system with one
row to commit can take anything from 60-200ms to execute, which seems quite
excessive -- sometimes (and I've verified that there's not a checkpoint or
vacuum going on at that time), transactions seem to pile up and you get
behaviour like:

LOG:  duration: 836.004 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip = '129.241.93.100', hostname =
'mivu-03.samfundet.no'WHERE posid = 'mivu-03' 
LOG:  duration: 753.545 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip =  '129.241.93.110', hostname =
'mivu-13.samfundet.no'WHERE posid = 'mivu-13' 
LOG:  duration: 567.914 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip = '129.241.93.109', hostname =
'mivu-12.samfundet.no'WHERE posid = 'mivu-12' 
LOG:  duration: 515.013 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip = '129.241.93.105', hostname =
'mivu-08.samfundet.no'WHERE posid = 'mivu-08' 
LOG:  duration: 427.541 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip = '129.241.93.104', hostname =
'mivu-07.samfundet.no'WHERE posid = 'mivu-07' 
LOG:  duration: 383.314 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip = '129.241.93.107', hostname =
'mivu-10.samfundet.no'WHERE posid = 'mivu-10' 
LOG:  duration: 348.965 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip = '129.241.93.103', hostname =
'mivu-06.samfundet.no'WHERE posid = 'mivu-06' 
LOG:  duration: 314.465 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip = '129.241.93.101', hostname =
'mivu-04.samfundet.no'WHERE posid = 'mivu-04' 
LOG:  duration: 824.893 ms  statement: UPDATE mivu3ping SET pingtime = NOW(), ip = '129.241.93.106', hostname =
'mivu-09.samfundet.no'WHERE posid = 'mivu-09' 

Sometimes, six or seven of these transactions even seem to wait for the same
thing, reporting finishing times of something like 6, 5, 4, 3 and 2 seconds
right after each other in the log! This is not a highly loaded system, so I
don't really see why this should happen. (We had the same problems with 7.4,
but if my imagination isn't playing games on me, they seem to have become
slightly worse with 8.1.)

strace shows that fdatasync() takes almost all that time, but when I run my own
fdatasync() test program on the same file system, I can consistently sync a
file (after an 8kB write) in about 30ms every time, so I don't really know why
this would be so much slower with PostgreSQL. We're using the cfq scheduler,
but deadline and noop give about the same results.

Setting wal_sync_method = open_sync seems to improve the situation
dramatically on simple commits; we get down into the 10-30ms range on an idle
system. OTOH, behaviour seems to get slightly worse when there's more stuff
going on, and we still get the 300ms transactions in batches every now and
then.

Any good ideas?

/* Steinar */
--
Homepage: http://www.sesse.net/

pgsql-performance by date:

Previous
From: Nörder-Tuitje, Marcus
Date:
Subject: Re: (View and SQL) VS plpgsql
Next
From: Richard Huxton
Date:
Subject: Re: WAL sync behaviour