Re: [GENERAL] Slow PITR restore - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: [GENERAL] Slow PITR restore
Date
Msg-id 200712131609.26419.josh@agliodbs.com
In response to Re: [GENERAL] Slow PITR restore  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [GENERAL] Slow PITR restore  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom,

> [ shrug... ]  This is not consistent with my experience.  I can't help
> suspecting misconfiguration; perhaps shared_buffers much smaller on the
> backup, for example.

You're only going to see it on SMP systems which have a high degree of CPU 
utilization.  That is, when you have 16 cores processing flat-out, then 
the *single* core which will replay that log could certainly have trouble 
keeping up.  And this isn't an issue that would show up when testing on 
a dual-core system.
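
As a back-of-envelope illustration of that argument (a rough sketch only: the 
function name and every number below are made up for the example, not 
measurements, and it assumes WAL volume scales linearly with busy cores while 
replay runs at a fixed single-core rate):

```python
def replay_lag_seconds(writers, wal_per_writer_mb_s, replay_mb_s, duration_s):
    """Extra seconds the single replay process needs beyond `duration_s`.

    Hypothetical model: each busy core on the master generates WAL at a
    constant rate, and one core on the backup replays at a constant rate.
    Returns 0.0 when replay keeps up with generation.
    """
    generated_mb = writers * wal_per_writer_mb_s * duration_s
    replay_time_s = generated_mb / replay_mb_s
    return max(0.0, replay_time_s - duration_s)


# Dual-core test box: 2 MB/s generated vs. 10 MB/s replayed, so no lag appears.
print(replay_lag_seconds(2, 1.0, 10.0, 3600))

# 16 cores flat-out: 16 MB/s generated vs. the same 10 MB/s replay rate,
# so the backup falls further behind for every hour of load.
print(replay_lag_seconds(16, 1.0, 10.0, 3600))
```

With these invented rates the dual-core case never falls behind, while the 
16-core case accumulates lag for as long as the load continues, which is why 
small test machines would not expose the problem.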

I don't have extensive testing data on that myself (I relied on Koichi's 
as well), but I do have another real-world case where our slow recovery 
time is a serious problem: clustered filesystem failover configurations, 
e.g. RHCFS, OpenHACluster, Veritas.  For those configurations, when one 
node fails PostgreSQL is started on a 2nd node against the same data ... 
and goes through recovery.  On very high-volume systems, the recovery can 
be quite slow, up to 15 minutes, which is a long time for a web site to be 
down.

I completely agree that we don't want to risk the reliability of recovery 
in attempts to speed it up, though, so maybe this isn't something we can 
do right now.  But I don't agree that it's not an issue for users.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

