WALWriteLock contention - Mailing list pgsql-hackers

From	Robert Haas
Subject	WALWriteLock contention
Date	May 15, 2015 16:06:49
Msg-id	CA+TgmobWdBcbuipWPsbHSbf+-KDmatnYQYZ=AKAjU6ALB5m+hQ@mail.gmail.com
List	pgsql-hackers

WALWriteLock contention is measurable on some workloads.  In studying
the problem briefly, a couple of questions emerged:

1. Doesn't it suck to rewrite an entire 8kB block every time, instead
of only the new bytes (and maybe a few bytes following that to spoil
any old data that might be there)?  I mean, the OS page size is 4kB on
Linux.  If we generate 2kB of WAL and then flush, we're likely to
dirty two OS blocks instead of one.  The OS isn't going to be smart
enough to notice that one of those pages didn't really change, so
we're potentially generating some extra disk I/O.  My colleague Jan
Wieck has some (inconclusive) benchmark results that suggest this
might actually be hurting us significantly.  More research is needed,
but I thought I'd ask if we've ever considered NOT doing that, or if
we should consider it.

2. I don't really understand why WALWriteLock is set up to prohibit
two backends from flushing WAL at the same time.  That seems
unnecessary.  Suppose we've got two backends that flush WAL one after
the other.  Assume (as is not unlikely) that the second one's flush
position is ahead of the first one's flush position.  So the first one
grabs WALWriteLock and does the flush, and then the second one grabs
WALWriteLock for its turn to flush and has to wait for an entire spin
of the platter to complete before its fsync() can be satisfied.  If
we'd just let the second guy issue his fsync() right away, odds are
good that the disk would have satisfied both in a single rotation.
Now it's possible that the second request would've arrived too late
for that to work out, but AFAICS in that case we're no worse off than
we are now.  And if it does work out we're better off.  The only
reasons I can see why we might NOT want to do this are (1) if we're
trying to compensate for some OS-level bugginess, which is a
horrifying thought, or (2) if we think the extra system calls will
cost more than we save by piggybacking the flushes more efficiently.
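
On question 1, the page-count arithmetic can be sketched roughly as
follows.  This is only an illustration, not PostgreSQL code: it assumes
an 8192-byte WAL block and a 4096-byte OS page, and the function names
are made up for the example.

```python
WAL_BLOCK = 8192  # assumed WAL block size (the 8kB mentioned above)
OS_PAGE = 4096    # assumed OS page size (4kB on Linux, per the text)


def os_pages_dirtied(offset, nbytes, page_size=OS_PAGE):
    """Count the OS pages overlapped by a write of nbytes at offset."""
    if nbytes <= 0:
        return 0
    first = offset // page_size
    last = (offset + nbytes - 1) // page_size
    return last - first + 1


def whole_block_write(wal_start, wal_len, block=WAL_BLOCK):
    """Return (offset, length) of the write if every WAL block that
    contains new bytes is rewritten in full.  Assumes wal_len > 0."""
    first_block = wal_start // block
    last_block = (wal_start + wal_len - 1) // block
    return first_block * block, (last_block - first_block + 1) * block


# 2kB of new WAL at the start of a block:
full = os_pages_dirtied(*whole_block_write(0, 2048))  # rewrite all 8kB
partial = os_pages_dirtied(0, 2048)                   # write only new bytes
print(full, partial)
```

Under these assumptions the full-block rewrite touches two OS pages and
the partial write touches one, which is the extra I/O the question is
about.  The sketch ignores the "few bytes following" that a partial
write might add to spoil old data.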
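
On question 2, a toy model may help make the argument concrete.  It
assumes, purely for illustration, that a flush completes at the first
platter-rotation boundary after it is issued, and it ignores everything
else (system call cost, caches, and any case where the second backend
finds its WAL already flushed).  The 8.0 ms period and the function
names are invented for the example; none of this is PostgreSQL code.

```python
import math


def next_rotation(t, period):
    """Time of the first rotation boundary strictly after time t."""
    return (math.floor(t / period) + 1) * period


def serialized(arrivals, period):
    """One flusher at a time: a flush is issued only once the previous
    flush has completed, as with a single exclusive lock."""
    done = []
    lock_free_at = 0.0
    for t in sorted(arrivals):
        issued = max(t, lock_free_at)
        finished = next_rotation(issued, period)
        done.append(finished)
        lock_free_at = finished
    return done


def concurrent(arrivals, period):
    """Every flush is issued as soon as its backend asks for it."""
    return [next_rotation(t, period) for t in sorted(arrivals)]


PERIOD = 8.0  # hypothetical rotation time in ms

# Second request arrives while the first flush is still in progress.
print(serialized([1.0, 2.0], PERIOD), concurrent([1.0, 2.0], PERIOD))

# Second request arrives too late to share the first rotation.
print(serialized([1.0, 9.0], PERIOD), concurrent([1.0, 9.0], PERIOD))
```

In this model the early second request finishes one full rotation
sooner when flushes are allowed to overlap, while the late request
finishes at the same time either way.  Whether a real disk and kernel
behave like this is exactly what would need measuring.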

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
