Re: Load Distributed Checkpoints, take 3 - Mailing list pgsql-patches
| From | Simon Riggs |
|---|---|
| Subject | Re: Load Distributed Checkpoints, take 3 |
| Date | |
| Msg-id | 1182677039.9276.382.camel@silverbirch.site |
| In response to | Re: Load Distributed Checkpoints, take 3 (Heikki Linnakangas <heikki@enterprisedb.com>) |
| Responses | Re: Load Distributed Checkpoints, take 3; Re: Load Distributed Checkpoints, take 3 |
| List | pgsql-patches |
On Fri, 2007-06-22 at 22:19 +0100, Heikki Linnakangas wrote:
> However, I think shortening the checkpoint interval is a perfectly valid
> solution to that.

Agreed. That's what checkpoint_timeout is for. Greg can't choose to use checkpoint_segments as the limit and then complain about unbounded recovery time, because that was clearly a conscious choice.

> In any case, while people sometimes complain that we have a large WAL
> footprint, it's not usually a problem.

IMHO it's a huge problem. Turning on full_page_writes means that the amount of WAL generated varies fairly linearly with the number of blocks touched, which means large databases become a problem. Suzuki-san's team had results that showed this was a problem also.

> This is off-topic, but at PGCon in May, Itagaki-san and his colleagues
> whose names I can't remember, pointed out to me very clearly that our
> recovery is *slow*. So slow, that in the benchmarks they were running,
> their warm stand-by slave couldn't keep up with the master generating
> the WAL, even though both are running on the same kind of hardware.
> The reason is simple: there can be tens of backends doing I/O and
> generating WAL, but in recovery we serialize them. If you have decent
> I/O hardware that could handle for example 10 concurrent random I/Os,
> at recovery we'll be issuing them one at a time. That's a scalability
> issue, and doesn't show up on a laptop or a small server with a single
> disk.

The results showed that the current recovery isn't scalable beyond a certain point, not that it is slow per se. The effect isn't noticeable on systems generating fewer writes (of any size), or on systems where the cache hit ratio is high. It isn't accurate to say this affects only laptops and small servers; it would be accurate to say that high-volume, I/O-bound OLTP is where the scalability of recovery does not match the performance scalability of the master server.

Yes, we need to make recovery more scalable by de-serializing I/O. Slony also serializes changes onto the slave nodes, so the general problem of scalability of recovery solutions needs to be tackled.

> That's one of the first things I'm planning to tackle when the 8.4 dev
> cycle opens. And I'm planning to look at recovery times in general; I've
> never even measured it before, so who knows what comes up.

I'm planning on working on recovery also, as is Florian. Let's make sure we coordinate what we do to avoid patch conflicts.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
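For scale on the full_page_writes point above, here is a back-of-the-envelope sketch (an illustration, not from the thread), assuming the default 8 KB page size and that the first write to each distinct block after a checkpoint emits one full-page image into the WAL:

```python
# Back-of-the-envelope WAL estimate with full_page_writes on.
# Assumptions (not from the mail): 8 KB pages, and the first write to
# each block after a checkpoint adds one full-page image to the WAL.

BLOCK_SIZE = 8192  # PostgreSQL's default page size, in bytes

def full_page_wal_bytes(blocks_touched: int) -> int:
    """WAL bytes attributable to full-page images alone."""
    return blocks_touched * BLOCK_SIZE

for blocks in (10_000, 100_000, 1_000_000):
    gib = full_page_wal_bytes(blocks) / 1024**3
    print(f"{blocks:>9,} blocks touched -> ~{gib:.2f} GiB of full-page WAL")
```

Under these assumptions, a workload dirtying a million distinct blocks between checkpoints pays roughly 7.6 GiB in full-page images alone, which is the linear growth with blocks touched that makes large databases a problem.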
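And to make the serialization point in the quoted paragraph concrete, a toy model (illustrative numbers only, not PostgreSQL code) of why single-pass replay can lag a master whose backends overlap their random I/O:

```python
# Toy model (illustrative, not PostgreSQL code): a master with many
# backends overlaps its random reads, while recovery replays records
# one at a time, so its I/O is effectively serialized.

RANDOM_READ_MS = 8.0   # assumed latency of one random read, in ms
CONCURRENT_IOS = 10    # parallel I/Os the storage can service

def random_ios_per_second(concurrency: int) -> float:
    """Random I/Os completed per second at a given concurrency."""
    return concurrency * (1000.0 / RANDOM_READ_MS)

master = random_ios_per_second(CONCURRENT_IOS)  # overlapping backends
standby = random_ios_per_second(1)              # serialized replay

print(f"master : {master:,.0f} random I/Os/s")
print(f"standby: {standby:,.0f} random I/Os/s")
print(f"replay is behind by a factor of {master / standby:.0f}x")
```

In this model the standby falls behind by exactly the storage's usable concurrency, which only bites once the master is I/O bound: the gap Itagaki-san's benchmarks reportedly hit.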