Re: Controlling Load Distributed Checkpoints - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Controlling Load Distributed Checkpoints
Date
Msg-id 87y7ivef8k.fsf@oxford.xeocode.com
Whole thread Raw
In response to Re: Controlling Load Distributed Checkpoints  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Controlling Load Distributed Checkpoints
Re: .conf File Organization WAS: Controlling Load Distributed Checkpoints
List pgsql-hackers
"Greg Smith" <gsmith@gregsmith.com> writes:

> I'm completely biased because of the workloads I've been dealing with recently,
> but I consider (2) so much easier to tune for that it's barely worth worrying
> about.  If your system is so underloaded that you can let the checkpoints take
> their own sweet time, I'd ask if you have enough going on that you're suffering
> very much from checkpoint performance issues anyway.  I'm used to being in a
> situation where if you don't push out checkpoint data as fast as physically
> possible, you end up fighting with the client backends for write bandwidth once
> the LRU point moves past where the checkpoint has written out to already.  I'm
> not sure how much always running the LRU background writer will improve that
> situation.

I think you're working from a faulty premise.

There's no relationship between the volume of writes and how important the
speed of checkpoint is. In either scenario you should assume a system that is
close to the max i/o bandwidth. The only question is which task the admin
would prefer take the hit for maxing out the bandwidth, the transactions or
the checkpoint.

You seem to have imagined that letting the checkpoint take longer will slow
down transactions. In fact that's precisely the effect we're trying to avoid.
Right now we're seeing tests where Postgres stops handling *any* transactions
for up to a minute. In virtually any real world scenario that would simply be
unacceptable.

That one-minute outage is a direct consequence of trying to finish the
checkpoint as quick as possible. If we spread it out then it might increase
the average i/o load if you sum it up over time, but then you just need a
faster i/o controller. 

The only scenario where you would prefer the absolute lowest i/o rate summed
over time would be if you were close to maxing out your i/o bandwidth,
couldn't buy a faster controller, and response time was not a factor, only
sheer volume of transactions processed mattered. That's a much less common
scenario than caring about the response time.

The flip side of having to worry about response time buying a faster
controller doesn't even help. It would shorten the duration of the checkpoint
but not eliminate it. A 30-second outage every half hour is just as
unacceptable as a 1-minute outage every half hour.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately
Next
From: "Matthew T. O'Connor"
Date:
Subject: Re: Autovacuum launcher doesn't notice death of postmaster immediately