Shared buffers, db transactions commited, and write IO on Solaris - Mailing list pgsql-performance

From Erik Jones
Subject Shared buffers, db transactions commited, and write IO on Solaris
Date
Msg-id 9B7F4B34-7E96-4501-BCE0-D0BED739530A@myemma.com
Whole thread Raw
Responses Re: Shared buffers, db transactions commited, and write IO on Solaris  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Greetings,

We've recently made a couple changes to our system that have resulted in a drastic increase in performance as well as some very confusing changes to the database statistics, specifically pg_stat_database.xact_commit.  Here's the details:

OS: Solaris10 x86
Server: Sunfire X4100, 8GB Memory, 2 Dual Core Opterons
Postgres 8.2.3
Disk array: 
Sun STK 6130 + CSM100 SATA tray, dual channel MPXIO, 15k drives, RAID5 across 14 disks
WAL logs on SATA RAID10
SAN architecture, 2 brocade FABRIC switches

The changes we made were:

Increase shared buffers from 150000 to 200000
Set the disk mount for the data directory to use forcedirectio (added that mount option that to the /etc/vfstab entry (ufs fs))

So, the reason we did this was that for months now we'd been experiencing extremely high IO load from both the perspective of the OS and the database, specifically where writes were concerned.  During peak hourse it wasn't unheard of for pg_stat_database to report anywhere from 500000 to 1000000 transactions committed in an hour.  iostat's %b (disk busy) sat at 100% for longer than we'd care to think about with the wait percentage going from a few percent on up to 50% at times and the cpu load almost never rising from around a 2 avg., i.e. we were extremely IO bound in all cases.  

As soon as we restarted postgres after making those changes the IO load was gone.  While we the number and amount of disk reads have stayed pretty much the same and the number of disk writes have stayed the same, the amount of data being written has dropped by about a factor of 10, which is huge.  The cpu load shot way up to around a 20 avg. and stayed that way up and stayed that way for about two days (we're thinking that was autovacuum "catching up").  In addition, and this is the truly confusing part, the xact_commit and xact_rollback stats from pg_stat_database both dropped by an order of magnitude (another factor of 10).  So, we are now doing 50000 to 100000 commits per hour during peak hours.

So, where were all of those extra transactions coming from?  Are transactions reported on in pg_stat_database anything but SQL statements?  What was causing all of the excess(?!) data being written to the disk (it seems that there's a 1:1 correspondence between the xacts and volume of data being written)?  Given that we have the bgwriter on, could it have been the culprit and one of the changes allowed it to now operate more efficiently and/or correctly?

erik jones <erik@myemma.com>
software developer
615-296-0838
emma(r)



pgsql-performance by date:

Previous
From: "Daniel Cristian Cruz"
Date:
Subject: Re: Improving performance on system catalog
Next
From: Dan Harris
Date:
Subject: Planner doing seqscan before indexed join