Re: Cost of creating an emply WAL segment - Mailing list pgsql-performance

From Greg Smith
Subject Re: Cost of creating an emply WAL segment
Date
Msg-id 4E04B8E9.3030707@2ndQuadrant.com
Whole thread Raw
In response to Cost of creating an emply WAL segment  (Devrim GÜNDÜZ <devrim@gunduz.org>)
Responses Re: Cost of creating an emply WAL segment  (Shaun Thomas <sthomas@peak6.com>)
List pgsql-performance
On 06/24/2011 10:43 AM, Devrim GÜNDÜZ wrote:
> Has anyone measured the cost of creating empty WAL segments while the
> database is running?
>
> Actually, when is the new file created? Just after one segment is filled
> up, or some time before then? What happens during WAL segment creation?
> If there are pending transactions to be committed, do we see a delay?
>

Excellent set of questions.  Yes, and it can be disturbingly high on a
fresh system that's immediately hit with lots of transactions, or on one
where checkpoint_segments was just raised a lot and then nailed with
activity.  The problem also pops up in an even worse way when you have a
server that's overrun its normal checkpoint finish time, usually due to
a slow sync phase.  That then also results in creation of new
segments--necessary because all of the old ones that would normally be
recycled already have been recycled--and that work is now competing
against the already slow checkpoint writes, and backends running their
own fsync too just to make the mix extra exciting.

I know the server code does try to stay a bit ahead of this problem, by
creating one segment in advance under conditions I forget the details
of, to reduce the odds a client will actually hit a delay there.  It
helps the average case.  But since it doesn't do much for the worst-case
ones people that make my life miserable, working on that doesn't make my
life easier; therefore I don't.

The main argument in favor of pre-populating WAL segments early after
the database starts is that it would probably be a small cheat on some
benchmarks, moving a bit of work that happens during the test to happen
before then instead when it isn't counted.  But I've never been excited
about the idea of creating empty segments near initdb time for 3 reasons:

-Default postgresql.conf at initdb time has the pathetically small 3
checkpoint_segments, so it won't actually work unless we increase it first.
-Most servers go through a data loading stage before they hit production
that takes care of this anyway.
-The worst problems I've ever seen in this area, by far (as in:  at
least 100X worst than what you're asking about), are when new segments
are created due to heavy write activity exhausting the list of ones to
be recycled during a checkpoint.

To give you an example of *that*, here is the new record-setting slow
checkpoint I just found in my inbox this morning, from a system that's
been instrumented for its increasing checkpoint issues the last month:

checkpoint complete: wrote 63297 buffers (6.0%); 0 transaction log
file(s) added, 938 removed, 129 recycled; write=250.384 s,
sync=14525.296 s, total=14786.868 s

Here checkpoint_segments=64 and shared_buffers=8GB.  The fact that this
checkpoint hit the "create a new empty WAL segment" code 938 times
during its 4 hour checkpoint sync phase is much more troubling than the
pauses people run into when making a new segments on a fresh server.  I
would argue that if your database is new enough to not have populated a
full set of checkpoint_segments yet, it's also new enough that it can't
possibly have enough data in it yet for that to really be a problem.

(Note for hackers following along this far:  I strongly suspect today's
latest pathological example is the sort improved by the "compact fsync
request" feature added to 9.1.  I'm now seeing so many of these in the
field that I'm collecting up data to support the idea of backporting
that in 8.3-9.0, as a bug fix because this turns into a "system is too
slow to be considered operational" performance problem when it flares
up.  This was a rare thing I'd never seen before back in September when
I started working on this area again; now I see it once a month on a new
system somewhere.)

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Long Running Update
Next
From: Harry Mantheakis
Date:
Subject: Re: Long Running Update