Re: Controlling Load Distributed Checkpoints - Mailing list pgsql-hackers

From	Gregory Stark
Subject	Re: Controlling Load Distributed Checkpoints
Date	June 7, 2007 16:28:37
Msg-id	87y7ivef8k.fsf@oxford.xeocode.com
In response to	Re: Controlling Load Distributed Checkpoints (Greg Smith <gsmith@gregsmith.com>)
Responses	Re: Controlling Load Distributed Checkpoints Re: .conf File Organization WAS: Controlling Load Distributed Checkpoints
List	pgsql-hackers

"Greg Smith" <gsmith@gregsmith.com> writes:

> I'm completely biased because of the workloads I've been dealing with recently,
> but I consider (2) so much easier to tune for that it's barely worth worrying
> about.  If your system is so underloaded that you can let the checkpoints take
> their own sweet time, I'd ask if you have enough going on that you're suffering
> very much from checkpoint performance issues anyway.  I'm used to being in a
> situation where if you don't push out checkpoint data as fast as physically
> possible, you end up fighting with the client backends for write bandwidth once
> the LRU point moves past where the checkpoint has written out to already.  I'm
> not sure how much always running the LRU background writer will improve that
> situation.

I think you're working from a faulty premise.

There's no relationship between the volume of writes and how important the
speed of checkpoint is. In either scenario you should assume a system that is
close to the max i/o bandwidth. The only question is which task the admin
would prefer take the hit for maxing out the bandwidth, the transactions or
the checkpoint.

You seem to have imagined that letting the checkpoint take longer will slow
down transactions. In fact that's precisely the effect we're trying to avoid.
Right now we're seeing tests where Postgres stops handling *any* transactions
for up to a minute. In virtually any real world scenario that would simply be
unacceptable.

That one-minute outage is a direct consequence of trying to finish the
checkpoint as quickly as possible. If we spread it out then it might increase
the average i/o load if you sum it up over time, but then you just need a
faster i/o controller.

The only scenario where you would prefer the absolute lowest i/o rate summed
over time would be if you were close to maxing out your i/o bandwidth,
couldn't buy a faster controller, and response time was not a factor, only
sheer volume of transactions processed mattered. That's a much less common
scenario than caring about the response time.

The flip side is that if you do have to worry about response time, buying a
faster controller doesn't even help. It would shorten the outage caused by the
checkpoint but not eliminate it. A 30-second outage every half hour is just as
unacceptable as a 1-minute outage every half hour.
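
To put rough numbers on the argument above, here is a minimal back-of-the-envelope
sketch in Python. It is only an illustration, not a model of what Postgres
actually does: the function names, the 6000 MB of dirty data, the 100 MB/s
device and the 15-minute spread are all made-up assumptions, and it ignores any
extra total i/o that spreading the checkpoint might cause.

```python
def burst_stall_seconds(dirty_mb, bandwidth_mb_s):
    """Checkpoint written as fast as possible (hypothetical model).

    The checkpoint takes the whole device, so transactions get no write
    bandwidth until all dirty data has been written out.
    """
    return dirty_mb / bandwidth_mb_s


def spread_leftover_mb_s(dirty_mb, bandwidth_mb_s, spread_seconds):
    """Checkpoint paced evenly over spread_seconds (hypothetical model).

    Returns the write bandwidth left over for transactions while the
    checkpoint is in progress (never negative).
    """
    checkpoint_rate = dirty_mb / spread_seconds
    return max(bandwidth_mb_s - checkpoint_rate, 0.0)


# Assumed figures, chosen only so the burst case matches the one-minute
# outage mentioned in the message.
DIRTY_MB = 6000.0
BANDWIDTH_MB_S = 100.0

# Burst checkpoint: transactions stall for the whole write-out.
print(burst_stall_seconds(DIRTY_MB, BANDWIDTH_MB_S))       # 60.0

# A controller twice as fast shortens the stall but does not remove it.
print(burst_stall_seconds(DIRTY_MB, 2 * BANDWIDTH_MB_S))   # 30.0

# Spread over 15 minutes: the checkpoint takes a small slice of the
# bandwidth and transactions keep most of it the whole time.
print(spread_leftover_mb_s(DIRTY_MB, BANDWIDTH_MB_S, 900.0))
```

Under these assumptions the burst case trades a total stall for a short
checkpoint, while the spread case costs transactions only a few percent of the
bandwidth for longer.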

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com
