On Tue, 2009-06-02 at 14:54 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > Tom Lane wrote:
> >> That's a good point; don't we recover files under names like
> >> RECOVERYXLOG, not under names that could possibly conflict with regular
> >> WAL files?
>
> > Yes. But we rename RECOVERYXLOG to 000000010000000000000057 or similar
> > at the end of recovery, in exitArchiveRecovery().
>
> > Thinking about this some more, I think we should've changed
> > exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more
> > robust if exitArchiveRecovery() always copied the last WAL file rather
> > than just renamed it. It doesn't seem safe to rely on the file the
> > symlink points to to be valid after recovery is finished, and we might
> > write to it before it's recycled, so the current fix isn't complete.
>
> Hmm. I think really the reason it's coded that way is that we assumed
> the recovery command would be physically copying the file from someplace
> else. pg_standby is violating the backend's expectations by using a
> symlink. And I really doubt that the technique is saving anything, since
> the data has to be read in from the archive location anyway.
>
> I'm leaning back to the position that pg_standby's -l option is simply a
> bad idea and should be removed.
ISTM we didn't clearly state what the recovery_command should do either
way. Even if you remove the pg_standby option that will not fix the
problem for people who have written their own script, or existing users
of pg_standby. The safe way is to do as Heikki suggests and copy the
final file into place and I would add that we must then fsync it also.
That should be back-patched possibly as far as 8.0. Documenting a change
is not nearly enough.
Removing -l is a separate discussion. If there was a consensus against
it, I would suggest that we deprecate the option, so that it does
nothing.
As an aside, I would be also much more comfortable if there was an
option to not recycle the WAL files at all, as a safe mode for error
checking at least. The question is: why do we need to zero fill the file
first anyway? We could save the 8 bytes for the prev pointer on every
record if we added enough zeroes onto every WAL write to zap any
pre-existing header data, causing recovery to fail.
-- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support