Tom,
> [ shrug... ] This is not consistent with my experience. I can't help
> suspecting misconfiguration; perhaps shared_buffers much smaller on the
> backup, for example.
You're only going to see it on SMP systems which have a high degree of CPU
utilization. That is, when you have 16 cores processing flat-out, then
the *single* core which will replay that log could certainly have trouble
keeping up. And this isn't an issue that would show up when testing on
a dual-core system.
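To make the arithmetic concrete, here is a back-of-the-envelope sketch. It
is a toy model, not a measurement: the per-core WAL rate and the single-core
replay rate are made-up numbers, and the function name is hypothetical. It
only shows why a backlog appears with 16 busy cores and not with two.

```python
def replay_lag_seconds(writer_cores, wal_mb_per_sec_per_core,
                       replay_mb_per_sec, duration_sec):
    """Estimate how far behind a single replay process falls.

    All rates are assumed constants (hypothetical values, not benchmarks).
    Returns the time the replay process would still need to drain the
    backlog after the load stops.
    """
    # WAL produced by all busy cores on the primary over the whole period.
    generated_mb = writer_cores * wal_mb_per_sec_per_core * duration_sec
    # A single core can replay at most this much in the same period.
    replayed_mb = min(generated_mb, replay_mb_per_sec * duration_sec)
    backlog_mb = generated_mb - replayed_mb
    return backlog_mb / replay_mb_per_sec


# Assumed numbers: each core writes 2 MB/s of WAL, replay manages 20 MB/s.
lag_16_cores = replay_lag_seconds(16, 2.0, 20.0, 3600)
lag_2_cores = replay_lag_seconds(2, 2.0, 20.0, 3600)
print(lag_16_cores, lag_2_cores)
```

With those assumed rates, the 16-core case ends the hour with a backlog
while the dual-core case keeps up, which is why the problem would not
appear in small-machine testing.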
I don't have extensive testing data on that myself (I've been relying on
Koichi's as well), but I do have another real-world case where our slow recovery
time is a serious problem: clustered filesystem failover configurations,
e.g. RHCFS, OpenHACluster, Veritas. For those configurations, when one
node fails, PostgreSQL is started on a 2nd node against the same data ...
and goes through recovery. On very high-volume systems, the recovery can
be quite slow, up to 15 minutes, which is a long time for a web site to be
down.
I completely agree that we don't want to risk the reliability of recovery
in attempts to speed it up, though, so maybe this isn't something we can
do right now. But I don't agree that it's not an issue for users.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco