
From Greg Smith
Subject Re: Final background writer cleanup for 8.3
Date Sun, 26 Aug 2007
Msg-id Pine.GSO.4.64.0708261637030.3811@westnet.com
In response to Re: Final background writer cleanup for 8.3  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Final background writer cleanup for 8.3  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: Final background writer cleanup for 8.3  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
On Sun, 26 Aug 2007, Kevin Grittner wrote:

> usagecount | count | isdirty
> ------------+-------+---------
>          0 |  8711 | f
>          1 |  9394 | f
>          2 |  1188 | f
>          3 |   869 | f
>          4 |   160 | f
>          5 |   157 | f

Here's a typical sample from your set.  Notice how few buffers you've 
got with a high usage count.  This is a situation the background writer 
is good at working with.  Either the old LRU writer or the new 
work-in-progress one can aggressively pound away at any of the buffers 
with a 0 usage count shortly after they get dirty, and that won't be 
inefficient because there aren't large numbers of other clients using 
them.
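
In case anyone wants to collect the same numbers: the histograms quoted 
here look like the output of a simple grouping query against the 
pg_buffercache contrib module (assuming a version of it that exposes 
usagecount, which is what the patch under discussion adds on top of 
8.2):

   SELECT usagecount, count(*) AS count, isdirty
     FROM pg_buffercache
     GROUP BY usagecount, isdirty
     ORDER BY usagecount;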

Compare against this other sample:

> usagecount | count | isdirty
> ------------+-------+---------
>          0 |  9093 | f
>          1 |  6702 | f
>          2 |  2267 | f
>          3 |   602 | f
>          4 |   428 | f
>          5 |  1388 | f

Notice that you have a much larger number of buffers where the usage 
count is 4 or 5.  The all-scan part of the 8.2 background writer wastes 
a lot of writes when you have a profile that looks more like this.  If 
4+ client backends have touched a buffer recently, you'd be crazy to 
write it out right now when you could instead be focusing on banging 
out the ones where the usage count is 0.  The 8.2 background writer 
wrote them out anyway, which meant that when you hit a checkpoint, both 
the OS and the controller cache were already filled with such buffers 
before you even started writing the checkpoint data.  The new setup in 
8.3 only worries about the high usage count buffers when you hit a 
checkpoint, at which point it streams them out over a longer, 
adjustable period (so as not to spike the I/O more than necessary and 
block your readers), where the 8.2 design just dumped them all 
immediately.
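
To put names on the knobs involved: the 8.2 all-scan is driven by the 
bgwriter_all_* settings, and the adjustable period above is 8.3's 
checkpoint_completion_target.  Roughly, in postgresql.conf (defaults 
shown, not tuning advice):

   # 8.2: all-scan writes dirty buffers regardless of usage count
   bgwriter_all_percent = 0.333
   bgwriter_all_maxpages = 5
   # 8.3: the all-scan is gone; checkpoint writes are instead spread
   # over this fraction of the checkpoint interval
   checkpoint_completion_target = 0.5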

> Just to be sure that I understand, are you saying it would be a bad scene if
> the physical writes happened, or that the overhead of pushing them out to
> the OS would be crippling?

If you have a lot of buffers where the usage_count is high, it would 
be problematic to write them out every time they were touched; odds are 
good somebody else is going to dirty them again soon enough, so why 
bother?  On your workload, that doesn't seem to be the case.  But it is 
the situation on some other test workloads, and balancing for it has 
been central to the parts of the redesign I've been injecting 
suggestions into.  One of the systems I was tormented by had a usage 
count of 5 on >20% of the buffers in the cache under heavy load, and 
had a physical write been executed every time one of those was touched, 
that would have been crippling (even if the OS were smart enough to 
cache and therefore make redundant some of the writes, which is 
behavior I would prefer not to rely on).
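
Checking how close a system is to that situation is the same sort of 
pg_buffercache query; something like this (again assuming the 
usagecount-exposing version) reports what percentage of the cache sits 
at the maximum usage count:

   SELECT 100.0 * sum(CASE WHEN usagecount = 5 THEN 1 ELSE 0 END)
            / count(*) AS pct_at_max_usagecount
     FROM pg_buffercache;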

> This contrib module seems pretty safe, patch and all.  Does anyone think
> there is significant risk to slipping it into the 8.2.4 database where we
> have massive public exposure on the web site handling 2 million hits per
> day?

I think it's fairly safe, and my patch was pretty small; it just 
exposes some data that nobody had been looking at before.  Think how 
much easier your life would have been during your earlier tuning if you 
had been looking at the data in these terms.  Just be aware that 
running the query is itself intensive and causes its own tiny hiccup in 
throughput every time it executes, so you may want to treat this as a 
snapshot you run periodically to learn more about your data rather than 
something you do constantly.
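
For anyone else who wants to try it: in this era the contrib module is 
installed by running its SQL script against the target database as a 
superuser; from psql that's something like (the path here is just an 
example, it depends on your installation):

   \i /usr/local/pgsql/share/contrib/pg_buffercache.sql

plus applying the patch if you want the usagecount column on 8.2.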

> I also think we need to somehow develop a set of tests which report 
> maximum response time on (what should be) fast queries while the 
> database is under different loads, so that those of us for whom reliable 
> response time is more important than maximum overall throughput are 
> protected from performance regressions.

My guess is that the DBT2 tests Heikki has been running are more 
complicated than you think they are; there are response time guarantee 
requirements in there as well as the throughput numbers.  The tests I 
run (which I haven't been publishing yet, but will be with the final 
patch soon) also report worst-case and 90th-percentile latency numbers 
in addition to TPS.  A "regression" that improved TPS at the expense of 
those two would not be considered an improvement by anyone involved 
here.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

