Re: Load Distributed Checkpoints test results - Mailing list pgsql-hackers

From: Greg Smith
Subject: Re: Load Distributed Checkpoints test results
Msg-id: Pine.GSO.4.64.0706180843360.4392@westnet.com
In response to: Re: Load Distributed Checkpoints test results ("Simon Riggs" <simon@2ndquadrant.com>)
List: pgsql-hackers
On Mon, 18 Jun 2007, Simon Riggs wrote:

> Smoother checkpoints mean smaller resource queues when a burst coincides 
> with a checkpoint, so anybody with throughput-maximised or bursty apps 
> should want longer, smooth checkpoints.

True as long as two conditions hold:

1) Buffers needed to fill allocation requests are still being written fast 
enough.  The buffer allocation code starts burning a lot of CPU and lock 
resources when many clients are all sweeping the pool looking for buffers 
and there aren't many clean ones to be found.  The way the current 
checkpoint code starts at the LRU point and writes everything dirty, as 
fast as possible, in the order new buffers will be allocated means it's 
doing the optimal procedure to keep this from happening.  The presumption 
is that making the LRU writer more active will mitigate this issue; my 
experience suggests it may not be as effective as hoped--unless it gets 
changed so that it's allowed to decrement usage_count.
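
To make that concrete, here's a minimal sketch of a clock-sweep victim 
search in the spirit of StrategyGetBuffer().  Everything in it (pool, 
sweep_hand, write_buffer) is an illustrative stand-in, not the real 
bufmgr code:

#define POOL_SIZE 1024

typedef struct
{
    int usage_count;    /* bumped on access, decayed by the sweep */
    int dirty;          /* needs a write before reuse */
} SketchBuffer;

static SketchBuffer pool[POOL_SIZE];
static int sweep_hand = 0;

static void write_buffer(SketchBuffer *buf)
{
    /* stand-in for the actual write; the point is who pays for it */
    buf->dirty = 0;
}

/*
 * Each allocating backend sweeps the pool.  When most buffers are
 * dirty with high usage_count, it takes several full laps--plus
 * backend-issued writes--to find a victim; that's the CPU+lock burn
 * described above.  Note the decrement: the sweep only makes progress
 * because it lowers usage_count, which is exactly what the LRU writer
 * currently isn't allowed to do.
 */
static int find_victim(void)
{
    for (;;)
    {
        SketchBuffer *buf = &pool[sweep_hand];

        sweep_hand = (sweep_hand + 1) % POOL_SIZE;

        if (buf->usage_count > 0)
        {
            buf->usage_count--;
            continue;
        }
        if (buf->dirty)
            write_buffer(buf);  /* backend pays the write itself */
        return (int) (buf - pool);
    }
}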

To pick one example of a direction I'm a little concerned about here, 
Itagaki's sorted writes results look very interesting.  But his test 
system is such that the actual pgbench TPS numbers are about 1/10 of the 
ones I was seeing when I started having ugly buffer allocation issues, so 
I'm quite sure the particular test he's running isn't sensitive to 
problems in this area at all; there just isn't enough buffer cache churn 
at a couple of hundred TPS for this to happen.
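
For anyone who hasn't followed that thread, the idea is simple enough to 
sketch.  This is just the assumed shape of the technique, not Itagaki's 
actual patch--tag each dirty buffer with its file and block position, 
sort, then write in file order:

#include <stdlib.h>

typedef struct
{
    unsigned    rel;    /* which relation file */
    unsigned    block;  /* block number within it */
} DirtyTag;

static void write_one(unsigned rel, unsigned block)
{
    /* stand-in for the real smgr write call */
    (void) rel;
    (void) block;
}

static int cmp_tag(const void *a, const void *b)
{
    const DirtyTag *x = a;
    const DirtyTag *y = b;

    if (x->rel != y->rel)
        return (x->rel < y->rel) ? -1 : 1;
    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    return 0;
}

/*
 * Sorting turns the checkpoint's scattered writes into mostly
 * sequential I/O the kernel can coalesce.  At a couple hundred TPS
 * that's the dominant effect; the allocation-pressure problem above
 * only shows up at much higher buffer churn.
 */
static void checkpoint_write_sorted(DirtyTag *list, size_t n)
{
    size_t      i;

    qsort(list, n, sizeof(DirtyTag), cmp_tag);
    for (i = 0; i < n; i++)
        write_one(list[i].rel, list[i].block);
}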

2) The checkpoint still finishes in time.

The thing you can't forget about when dealing with an overloaded system is 
that there's no such thing as lowering the load of the checkpoint enough 
that it has no impact.  Assume new transactions are being generated by an 
upstream source such that the database itself is the bottleneck, so you're 
always filling 100% of I/O capacity.  All I'm trying to get everyone to 
consider is that if you have a large pool of dirty buffers to deal with in 
that situation, it's possible (albeit difficult) to get into a state 
where, because the checkpoint isn't writing out the dirty buffers fast 
enough, the client backends evacuate them instead in a way that makes the 
whole process less efficient than the current behavior.
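
Here's a rough sketch of what I mean, pacing checkpoint writes against a 
deadline.  The names and the fixed 100ms nap are my assumptions, not the 
actual patch:

#include <time.h>
#include <unistd.h>

static void write_next_dirty_buffer(void)
{
    /* stand-in for one buffer write */
}

/*
 * The failure mode is visible in the logic: on a system already at
 * 100% of I/O capacity, the nap doesn't lower total load--it just
 * moves the writes of still-dirty buffers onto the allocating
 * backends, which is the less efficient path.
 */
static void paced_checkpoint(int ndirty, double target_secs)
{
    time_t      start = time(NULL);
    int         written;

    for (written = 0; written < ndirty; written++)
    {
        double      done,
                    elapsed;

        write_next_dirty_buffer();

        done = (double) (written + 1) / ndirty;
        elapsed = difftime(time(NULL), start);

        /*
         * Ahead of schedule: nap, yielding I/O bandwidth to the
         * backends.  Behind schedule: keep writing flat out, so the
         * checkpoint still finishes in time (condition 2).
         */
        if (elapsed < done * target_secs)
            usleep(100000);
    }
}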

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

