Re: improving concurrent transactin commit rate - Mailing list pgsql-hackers

From Greg Smith
Subject Re: improving concurrent transactin commit rate
Msg-id alpine.GSO.2.01.0903242255570.16570@westnet.com
In response to improving concurrent transactin commit rate  (Sam Mason <sam@samason.me.uk>)
Responses Re: improving concurrent transactin commit rate  (Sam Mason <sam@samason.me.uk>)
List pgsql-hackers
On Tue, 24 Mar 2009, Sam Mason wrote:

> The conceptual idea is to have at most one outstanding flush for the
> log going through the filesystem at any one time.

Quoting from src/backend/access/transam/xlog.c, inside XLogFlush:

"Since fsync is usually a horribly expensive operation, we try to 
piggyback as much data as we can on each fsync: if we see any more data 
entered into the xlog buffer, we'll write and fsync that too, so that the 
final value of LogwrtResult.Flush is as large as possible. This gives us 
some chance of avoiding another fsync immediately after."
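
As a rough illustration of that comment, here is a toy model in Python. It is not the xlog.c code: the class ToyWal and its fields (inserted, flushed, fsync_calls) are names made up for this sketch, and a single lock stands in for the real locking, which is a simplification.

```python
import threading


class ToyWal:
    """Toy model of piggybacked WAL flushing (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.lock = threading.Lock()  # simplification: one lock for everything
        self.inserted = 0             # position of the last record in the buffer
        self.flushed = 0              # position known to be on disk
        self.fsync_calls = 0          # number of expensive flushes issued

    def insert(self):
        """Append one record to the buffer and return its position."""
        with self.lock:
            self.inserted += 1
            return self.inserted

    def flush(self, upto):
        """Make sure everything up to `upto` is on disk."""
        with self.lock:
            if self.flushed >= upto:
                return  # an earlier fsync already covered this record
            # Flush as far as possible, not just to `upto`, so that later
            # callers can skip their own fsync.
            target = self.inserted
            self.fsync_calls += 1     # placeholder for the real write + fsync
            self.flushed = target


wal = ToyWal()
a, b, c = wal.insert(), wal.insert(), wal.insert()
wal.flush(a)  # issues one fsync that also covers b and c
wal.flush(b)  # already flushed, returns immediately
wal.flush(c)  # already flushed, returns immediately
print(wal.fsync_calls)
```

Three records were committed, but only the first flush call pays for an fsync; the other two find their data already on disk.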

The logic implementing that idea takes care of bunching up flushes for any 
other WAL data that happens to be ready to go at that point.  You can see 
this most easily by doing inserts into a system that's limited by a slow 
fsync, like a single disk without a write cache, where you're bound by 
rotation speed.  If you have, say, a 7200RPM disk, that's 7200/60 = 120 
rotations/second, so no one client can commit faster than 120 
times/second.  But if you have 10 clients all pushing small inserts in, 
it's fairly easy to see >500 transactions/second, because a bunch of 
commits will get batched up during the time the last fsync is waiting for 
the disk to finish.
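
To put rough numbers on that, here is a small deterministic simulation in Python. It is only a sketch under stated assumptions: each fsync costs exactly one disk rotation, a flush covers every commit that was ready when it started, and each client takes a fixed think time before its next commit. The function name simulate_tps and the think_time and stagger parameters are invented for this example and are not measured values.

```python
def simulate_tps(num_clients, rpm=7200, think_time=0.0005,
                 stagger=0.0001, duration=10.0):
    """Return commits/second for a toy group-commit model (hypothetical)."""
    flush_time = 60.0 / rpm  # assumption: one fsync == one full rotation
    # next_ready[i] is when client i's next commit record is in the buffer.
    next_ready = [i * stagger for i in range(num_clients)]
    now = 0.0
    committed = 0
    while True:
        # A flush starts once the previous one is done and a commit is waiting.
        start = max(now, min(next_ready))
        end = start + flush_time
        if end > duration:
            break
        for i in range(num_clients):
            if next_ready[i] <= start:
                # This commit rides along on the flush that just started.
                committed += 1
                next_ready[i] = end + think_time
        now = end
    return committed / duration


single = simulate_tps(1)
ten = simulate_tps(10)
print(f"1 client:   {single:.1f} commits/sec")
print(f"10 clients: {ten:.1f} commits/sec")
```

With these made-up parameters the single client stays below the 120/second ceiling, while the ten-client run lands near 600/second: commits that arrive while one fsync is in progress all go out together on the next one.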

The other idea you'll already find implemented in there is controlled by 
commit_delay.  If there are at least commit_siblings other open 
transactions at the point where a commit is supposed to happen, the 
committing backend will pause for commit_delay microseconds in hopes that 
other transactions will jump onboard via the mechanism described above.  
In practice, it's very hard to tune that usefully.  You can use it to help 
bunch commits together into bigger batches on a really busy system (where 
not having more than one commit ready is unexpected), but it's not much 
help outside of that context.
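
The decision described above can be sketched in a few lines of Python. This is an illustration only, not the server's code: the function maybe_delay_commit and its arguments are hypothetical, and it assumes the threshold is inclusive and that only other backends' open transactions are counted.

```python
import time


def maybe_delay_commit(active_others, commit_delay_us, commit_siblings,
                       sleep=time.sleep):
    """Pause before flushing if enough other transactions are open.

    Returns True if the caller slept. Hypothetical helper for illustration.
    """
    # Assumption: the delay applies when at least commit_siblings other
    # transactions are open, and commit_delay of 0 disables it.
    if commit_delay_us > 0 and active_others >= commit_siblings:
        sleep(commit_delay_us / 1_000_000)  # microseconds -> seconds
        return True
    return False


# Record the requested sleeps instead of actually sleeping.
slept = []
print(maybe_delay_commit(7, 1000, 5, sleep=slept.append))  # busy: delays
print(maybe_delay_commit(2, 1000, 5, sleep=slept.append))  # quiet: no delay
```

On a quiet system the delay never fires, which is one reason the setting only matters when many commits are in flight at once.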

Check out the rest of the comments in xlog.c; there's a lot in there 
that's not really covered in the README.  If you turn on WAL_DEBUG and 
XLOG_DEBUG you can actually watch some of this happen.  I found the time 
spent reading the source to that file and 
src/backend/storage/buffer/bufmgr.c to be really well spent.  Some of the 
most interesting parts of the codebase to understand from a low-level 
performance tuning perspective are in those two.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

