Re: Load Distributed Checkpoints, take 3 - Mailing list pgsql-patches

From Greg Smith
Subject Re: Load Distributed Checkpoints, take 3
Msg-id Pine.GSO.4.64.0706251711430.2936@westnet.com
In response to Re: Load Distributed Checkpoints, take 3  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: Load Distributed Checkpoints, take 3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
On Mon, 25 Jun 2007, Heikki Linnakangas wrote:

> Greg, is this the kind of workload you're having, or is there some other
> scenario you're worried about?

The way transitions between completely idle periods and all-out bursts
happen was one problematic area I struggled with.  Since the LRU point doesn't move
during the idle parts, and the lingering buffers have a usage_count>0, the
LRU scan won't touch them; the only way to clear out a bunch of dirty
buffers leftover from the last burst is with the all-scan.  Ideally, you
want those to write during idle periods so you're completely clean when
the next burst comes.  My plan for the code I wanted to put into 8.4 one
day was to have something like the current all-scan that defers to the LRU
and checkpoint, such that if neither of them is doing anything it would
go searching for buffers it might blow out.  Because the all-scan mainly
gets in the way under heavy load right now, I've only found mild settings
helpful, but if it had a bit more information about what else was going on
it could run much harder during slow spots.  That's sort of the next stage
to the auto-tuning LRU writer code in the grand design floating through my
head.
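To make the idle/burst problem concrete, here is a toy Python model.  It is
a simplification, not PostgreSQL's buffer manager code, and the Buffer class,
lru_scan(), idle_all_scan() and its lru_busy/checkpoint_busy/max_writes
parameters are all hypothetical names.  It shows the LRU scan skipping
lingering dirty buffers whose usage_count is above zero, and an all-scan
that only runs when neither the LRU writer nor a checkpoint has work, in the
spirit of the 8.4 idea described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Buffer:
    """Toy shared buffer: only the two fields this sketch needs."""
    dirty: bool = False
    usage_count: int = 0


def lru_scan(buffers: List[Buffer], start: int, lookahead: int) -> int:
    """Write dirty buffers just ahead of the clock hand at `start`, but only
    those with usage_count == 0 (the ones about to be reused).  Buffers that
    still carry usage_count > 0 are skipped.  Returns the number written."""
    written = 0
    n = len(buffers)
    for i in range(min(lookahead, n)):
        buf = buffers[(start + i) % n]
        if buf.dirty and buf.usage_count == 0:
            buf.dirty = False  # stand-in for actually issuing the write
            written += 1
    return written


def idle_all_scan(buffers: List[Buffer], lru_busy: bool,
                  checkpoint_busy: bool, max_writes: int) -> int:
    """Hypothetical all-scan that defers to the LRU writer and checkpoint:
    it does nothing while either has work; otherwise it writes up to
    `max_writes` dirty buffers regardless of their usage_count."""
    if lru_busy or checkpoint_busy:
        return 0
    written = 0
    for buf in buffers:
        if written >= max_writes:
            break
        if buf.dirty:
            buf.dirty = False
            written += 1
    return written


if __name__ == "__main__":
    # Leftovers from the last burst: dirty, recently used, clock hand idle.
    pool = [Buffer(dirty=True, usage_count=2) for _ in range(8)]
    print(lru_scan(pool, start=0, lookahead=8))  # 0: LRU scan writes none
    print(idle_all_scan(pool, lru_busy=False, checkpoint_busy=False,
                        max_writes=100))         # 8: idle all-scan cleans all
```

The max_writes cap plays the same role as a per-round page limit: even when
idle, the all-scan only trickles buffers out, so a real version would want
to raise that cap during slow spots rather than leave it at a mild setting.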

As a general comment on this subject, a lot of the work in LDC presumes
you have an accurate notion of how close the next checkpoint is.  On
systems that can dirty buffers and write WAL really fast, I've found
hyper-bursty workloads are a challenge for it to cope with.  You can go from
thinking you have all sorts of time to stream the data out to discovering
the next checkpoint is coming up fast in only seconds.  In that situation,
you'd have been better off had you been writing faster during the period
preceding the burst when the code thought it should be "smooth"[1].
That falls into the category of things I haven't found a good way for
other people to test (I happened to have an internal bursty app to use
that aggravated this area).
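A rough sketch of why a burst breaks the estimate, assuming a schedule check
that compares write progress (scaled by a completion target) against the
larger of the fraction of the WAL segment budget and the fraction of the
timeout already used.  on_schedule(), simulate() and every parameter value
here are hypothetical, and the time component is held at zero so that only
WAL volume moves the next checkpoint closer.

```python
from typing import List, Optional


def on_schedule(written_frac: float, wal_frac: float, time_frac: float,
                completion_target: float = 0.5) -> bool:
    """Hypothetical schedule check: on schedule if the fraction of buffers
    written, scaled by completion_target, has kept up with the larger of the
    WAL-budget and timeout fractions already consumed."""
    progress = written_frac * completion_target
    elapsed = max(wal_frac, time_frac)
    return progress >= elapsed


def simulate(wal_per_tick: List[float], buffers_total: int = 1000,
             segments_budget: float = 30.0, writes_per_tick: int = 10,
             completion_target: float = 0.5) -> Optional[int]:
    """Write at a fixed "smooth" rate and return the first tick at which the
    writer falls behind schedule, or None if it never does.  Time is ignored
    (time_frac = 0.0), so only WAL volume drives the next checkpoint."""
    wal_used = 0.0
    written = 0
    for tick, wal in enumerate(wal_per_tick):
        wal_used += wal
        written = min(buffers_total, written + writes_per_tick)
        if not on_schedule(written / buffers_total, wal_used / segments_budget,
                           0.0, completion_target):
            return tick
    return None


if __name__ == "__main__":
    quiet = [0.1] * 100               # steady trickle of WAL segments per tick
    bursty = [0.1] * 20 + [1.0] * 30  # quiet period, then a WAL burst at tick 20
    print(simulate(quiet))                      # None: never falls behind
    print(simulate(bursty))                     # 21: second tick of the burst
    print(simulate(bursty, writes_per_tick=20)) # 25: writing harder earlier buys time
```

In this toy model the writer looks comfortably on schedule for the whole
quiet period, then falls behind one tick after the burst starts; doubling
the write rate while things were quiet delays that point, which is the
"should have been writing faster beforehand" effect in miniature.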

[1] This is actually a reference to "Yacht Rock", one of my favorite web
sites:  http://www.channel101.com/shows/show.php?show_id=152

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
