Re: Load distributed checkpoint V3 - Mailing list pgsql-patches

From Greg Smith
Subject Re: Load distributed checkpoint V3
Msg-id Pine.GSO.4.64.0704060043550.11244@westnet.com
In response to Re: Load distributed checkpoint V3  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-patches
On Thu, 5 Apr 2007, Heikki Linnakangas wrote:

> The purpose of the bgwriter_all_* settings is to shorten the duration of
> the eventual checkpoint. The reason to shorten the checkpoint duration
> is to limit the damage to other I/O activity it causes. My thinking is
> that assuming the LDC patch is effective (agreed, needs more testing) at
> smoothing the checkpoint, the duration doesn't matter anymore. Do you
> want to argue there are other reasons to shorten the checkpoint duration?

My testing results suggest that LDC doesn't usefully smooth the checkpoint
under high load (more than 30 clients here), because (on Linux at least) the
way the OS caches writes clashes badly with how buffers end up being
evicted if the buffer pool fills back up before the checkpoint is done.
In that context, anything that stretches out the checkpoint is going to
make the problem worse rather than better, because it makes it more
likely that the tail end of the checkpoint will have to fight with the
clients for write bandwidth, at which point they both suffer.  If you just
get the checkpoint done fast, the clients can't fill the pool as fast as
BufferSync is writing it out, and things are as happy as they can be
without a major rewrite of all this code.  I can get a tiny improvement in
some respects by delaying 2-5 seconds between finishing the writes and
calling fsync, because that gives Linux a moment to usefully spool some of
the data out to the disk controller's cache; beyond that, any additional
delay is a problem.
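
A minimal sketch of the write, short pause, then fsync ordering described
above, reduced to plain OS calls. It assumes a single data file and a list
of (offset, bytes) blocks; the function name checkpoint_like_flush and its
parameters are hypothetical and this is not PostgreSQL's BufferSync or
fsync code. The pause is just a sleep, during which the kernel may start
pushing some of the dirty data out on its own before fsync forces the rest.

```python
import os
import time


def checkpoint_like_flush(path, blocks, pause_seconds=3.0):
    """Write every (offset, data) block, pause, then fsync once.

    Hypothetical helper for illustration only; not PostgreSQL code.
    """
    # O_CREAT so the sketch can run against a fresh file; no O_TRUNC, so
    # data outside the written blocks is left alone.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # Phase 1: hand all dirty blocks to the OS page cache.
        for offset, data in blocks:
            view = memoryview(data)
            while view:  # loop in case pwrite reports a short write
                n = os.pwrite(fd, view, offset)
                view = view[n:]
                offset += n
        # Phase 2: short pause (the text reports 2-5 s helping a little)
        # before anything is forced to disk.
        time.sleep(pause_seconds)
        # Phase 3: a single fsync makes all of the writes durable.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Passing pause_seconds=0 gives the write-then-immediately-fsync behavior
for comparison.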

Since it's only the high-load cases I'm having trouble dealing with, this
basically makes it a non-starter for me.  The patch's focus on
checkpoint_timeout, while ignoring checkpoint_segments, is also a big issue
for me.  At the same time, I recognize that the approach taken in LDC is
probably a big improvement for many systems; it's just a step backwards for
my highest-throughput one.  I'd really enjoy hearing some results from
someone else.

> The number of buffers evicted by normal backends in a bgwriter_delay period
> is simple to keep track of, just increase a counter in StrategyGetBuffer and
> reset it when bgwriter wakes up.
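
A minimal model of the counter described in the quoted text, assuming one
counter that backends bump on eviction and the background writer reads and
resets once per bgwriter_delay period. The class and method names are
hypothetical; a real version would live in shared memory and need locking
or atomic operations, which this single-threaded sketch ignores.

```python
class BackendEvictionCounter:
    """Hypothetical per-bgwriter_delay eviction counter (illustration only)."""

    def __init__(self):
        self.evicted_since_wakeup = 0

    def note_backend_eviction(self):
        # Bumped each time a normal backend has to evict a buffer itself,
        # the spot the quoted text places in StrategyGetBuffer.
        self.evicted_since_wakeup += 1

    def read_and_reset(self):
        # Called once per bgwriter wakeup: return the count for the period
        # that just ended and start the next period at zero.
        count = self.evicted_since_wakeup
        self.evicted_since_wakeup = 0
        return count
```

The value returned at each wakeup is the raw input a background writer
could use to decide how much to clean in the next period.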

I see you've already found the other helpful Itagaki patch in this area.
I would like to see his code for tracking evictions committed, and then
added as another counter in pg_stat_bgwriter (I mentioned that to Magnus
in passing when he was setting the stats up, but didn't press it because
of the patch dependency).  Ideally (and this idea was also in Itagaki's
patch, with the writtenByBgWriter/ByBackEnds debug hook), you would know
how every buffer written to disk got there: was it the background writer,
a checkpoint, or an eviction that wrote it out?  Track all of those and
you can really learn something about your write performance, data that's
impossible to collect right now.
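
A small sketch of per-source write accounting along those lines, assuming
every path that writes a buffer out tags the write with its source. The
WriteSource labels and the BufferWriteStats class are hypothetical and are
not an actual PostgreSQL structure or pg_stat_bgwriter column layout.

```python
from collections import Counter
from enum import Enum


class WriteSource(Enum):
    # Hypothetical labels for the three ways a buffer reaches disk
    # discussed in the text.
    BGWRITER = "background writer"
    CHECKPOINT = "checkpoint"
    BACKEND_EVICTION = "backend eviction"


class BufferWriteStats:
    """Hypothetical per-source buffer write counters (illustration only)."""

    def __init__(self):
        self.writes = Counter()

    def record_write(self, source, n=1):
        # Each code path that writes buffers out reports how many it wrote.
        self.writes[source] += n

    def fraction(self, source):
        # Share of all recorded buffer writes that came from `source`.
        total = sum(self.writes.values())
        return self.writes[source] / total if total else 0.0
```

For example, a high fraction for BACKEND_EVICTION would suggest clients
are doing writes the background writer was meant to handle.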

However, as Itagaki himself points out, doing something useful with
bgwriter_lru_maxpages is only one piece of automatically tuning the
background writer.  I hate to join in on chopping his patches up, but
without some additional work I don't think the exact auto-tuning logic he
then applies will work in all cases, which could make it more of a problem
than the current crude yet predictable method.  The way
bgwriter_lru_maxpages and num_to_clean play off each other in his code
currently has a number of failure modes I'm concerned about.  I'm not sure
whether a rewrite using a moving-average approach (as I did in my
auto-tuning writer prototype, and as Tom just suggested here) will be
sufficient to fix all of them.  That was already on my to-do list to
investigate further.
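
One way a moving-average estimate could drive how many buffers the LRU
scan cleans per round, sketched under stated assumptions. This is an
exponential moving average chosen for illustration; it is not the
prototype, Tom's suggestion, or Itagaki's code. LruCleanEstimator and its
parameters are hypothetical, and max_pages merely stands in for a
bgwriter_lru_maxpages-style cap.

```python
import math


class LruCleanEstimator:
    """Hypothetical moving-average target for LRU cleaning per round.

    Assumptions: the estimate jumps up immediately when allocations rise,
    decays toward lower values over `smoothing_samples` rounds, is scaled
    by `multiplier`, and is capped at `max_pages`.
    """

    def __init__(self, smoothing_samples=16, multiplier=2.0, max_pages=100):
        self.smoothing_samples = smoothing_samples
        self.multiplier = multiplier
        self.max_pages = max_pages
        self.smoothed_allocs = 0.0

    def next_target(self, recent_allocs):
        if recent_allocs >= self.smoothed_allocs:
            # React to a spike right away so clients do not outrun the writer.
            self.smoothed_allocs = float(recent_allocs)
        else:
            # Otherwise decay slowly, as an exponential moving average.
            self.smoothed_allocs += (
                (recent_allocs - self.smoothed_allocs) / self.smoothing_samples
            )
        target = math.ceil(self.smoothed_allocs * self.multiplier)
        return min(target, self.max_pages)
```

The asymmetric update is one way to keep a plain average from lagging
behind a sudden burst of allocations; whether it avoids the failure modes
mentioned above is exactly the open question.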

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD