Latch for the WAL writer - further reducing idle wake-ups. - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Latch for the WAL writer - further reducing idle wake-ups.
Msg-id CAEYLb_U7S+Z8JOnEOY31w9Hcz-SavxrK4TTMRn4d1M+NxYEy+Q@mail.gmail.com
List pgsql-hackers
Attached patch latches up the WAL Writer, reducing wake-ups and thus
saving electricity in a way that is more-or-less analogous to my work
on the BGWriter:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6d90eaaa89a007e0d365f49d6436f35d2392cfeb

I am hoping this gets into 9.2. I am conscious of the fact that this
is quite late, but the patch addresses an open item, the concluding
part of a much wider feature. In any case, it is a useful patch that
ought to be committed at some point. I should point out:

1. This functionality was covered by the group commit patch that I
worked on back in January, which was submitted in advance of the
commitfest deadline. However, an alternative implementation was
ultimately committed that did not consider WAL Writer wake-ups.

2. The WAL writer is the most important auxiliary process to latch-up.
Though it is tied with the BGWriter at 5 wake-ups per second by
default, I consider the WAL Writer to be more important than the
BGWriter because I find it much more plausible that the WAL Writer
really won't need to be around for much of the time, as with a
read-mostly workload. "Cloud" type deployments often have read-mostly
workloads, so we can still save some power even if the DB is actually
servicing lots of read queries. That being the case, it would be a
shame if we didn't get this last one in, as it adds a lot more value
than any of the other patches.

3. This is a fairly simple patch; as I've said, it works in a way that
is quite analogous to the BGWriter patch, applying lessons learned
there.

With this patch, my instrumentation shows that wake-ups when Postgres
reaches a fully idle state are just 2.7 per second for the entire
postgres process group, quite an improvement on the 7.6 per second in
HEAD. This is exactly what you'd expect from a reduction from 5 wake-ups
per second to 0.1 per second on average for the WAL Writer.

I have determined this with PowerTOP 1.13 on my Fedora 16 laptop. Here
is an example session, begun after the cluster reached a fully idle
state, with this patch applied (if, alternatively, I want to see
things at per-process granularity, I can get that from PowerTOP 1.98
beta 1, which is available from my system's package manager):

[peter@peterlaptop powertop-1.13]$ sudo ./powertop -d --time=300
[sudo] password for peter:
PowerTOP 1.13   (C) 2007 - 2010 Intel Corporation

Collecting data for 300 seconds


Cn              Avg residency
C0 (cpu running)        ( 2.8%)
polling          0.0ms ( 0.0%)
C1 mwait      0.5ms ( 1.0%)
C2 mwait      0.9ms ( 0.6%)
C3 mwait      1.4ms ( 0.1%)
C4 mwait      6.7ms (95.4%)
P-states (frequencies)
  2.61 Ghz     5.7%
  1.80 Ghz     0.1%
  1200 Mhz     0.1%
  1000 Mhz     0.2%
   800 Mhz    93.5%
Wakeups-from-idle per second : 171.3    interval: 300.0s
no ACPI power usage estimate available
Top causes for wakeups:
  23.0% (134.5)   chrome
***SNIP***
   0.5% (  2.7)   postgres
***SNIP***

This is a rather low number that will make us really competitive with
other RDBMSs in this area. Recall that we started from 11.5 wake-ups
per second for an idle Postgres cluster with a default configuration.

To put the 2.7 number in context, I measured MySQL's wake-ups at 2.2
last year (mysql-server version 5.1.56, Fedora 14), though I
subsequently saw much higher numbers (over 20 per second) for version
5.5.19 on Fedora 16, so you should probably take that with a grain of
salt - I don't know anything about MySQL, and so cannot really be sure
that I'm making an objective comparison in comparing that number with
the number of wake-ups Postgres has with a stock postgresql.conf.

I've employed the same trick used when a buffer is dirtied for the
BGWriter - most of the time, the SetLatch() calls will check a single
flag, and find it already set. We are careful to only "arm" the latch
with a call to ResetLatch() when it is really needed. Rather than
waiting for the clocksweep to be lapped, we wait for a set number of
iterations of consistent inactivity.

I've made the WAL Writer use its process latch, rather than the latch
that was previously within XLogCtl. This seems much more idiomatic, as
in doing so we reserve the right to register generic signal handlers.
With a non-process latch, we'd have to worry about signal invalidation
issues on an ongoing basis, since the handler wouldn't be calling
SetLatch() against the latch we waited on. I have also added a comment
in latch.h generally advising against ad-hoc shared latches where .

I took initial steps to quantify the performance hit from this patch.
A simple "insert.sql" pgbench-tools benchmark on my laptop, with a
generic configuration, showed no problems, though I do not assume that
this conclusively proves the case. Results:

http://walwriterlatch.staticloud.com/

My choice of XLogInsert() as an additional site at which to call
SetLatch() was not one that I made lightly, and frankly I'm not
entirely confident that I couldn't have been just as effective while
placing the SetLatch() call in a less hot, perhaps higher-level
codepath. That said, MarkBufferDirty() is also a very hot code path,
and it's where one of the SetLatch() calls goes in the earlier
BGWriter patch, besides which I haven't been able to quantify any
performance hit as yet.

Thoughts?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
