Hi,
On Tue, Apr 14, 2009 at 6:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2009-04-13 at 14:52 +0900, Fujii Masao wrote:
>
>> A lookahead (the +1) may have pg_standby get stuck as follows.
>> Am I missing something?
>>
>> 1. the trigger file containing "smart" is created.
>> 2. pg_standby is executed.
>> 2-1. nextWALfile is restored.
>> 2-2. the trigger file is deleted because nextWALfile+1 doesn't exist.
>> 3. the restored nextWALfile is applied.
>> 4. pg_standby is executed again to restore nextWALfile+1.
>
> This can't happen. (4) will never occur when (2-2) has occurred. A
> non-zero error code means file not available which will cause recovery
> to end and hence no requests for further WAL files are made.
When pg_standby exits with a non-zero code, (3) and (4) will never
occur, and the transactions in nextWALfile will be lost. So I think
pg_standby has to call exit(0) in (2-2).

On the other hand, if exit(0) is called in (2-2), the above scenario
happens: the trigger file is already gone by then, so the pg_standby
run in (4) waits indefinitely for nextWALfile+1.
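
To make the two cases concrete, here is a toy model of the sequence.
It is only a sketch of the behaviour as I describe it in this thread,
not the actual pg_standby code: run_recovery, the integer WAL numbers
and max_polls are all hypothetical names, and a real pg_standby would
keep polling forever instead of giving up after max_polls.

```python
def run_recovery(archive, first_wal, exit_code_on_lookahead_miss,
                 max_polls=3):
    """Toy model of smart-trigger failover with a +1 lookahead.

    archive: set of WAL numbers present in the archive (assumption:
             WAL files are modelled as consecutive integers).
    Returns (applied WAL numbers, outcome string).
    """
    trigger_exists = True          # step 1: "smart" trigger created
    applied = []
    wal = first_wal
    while True:
        # One pg_standby invocation for `wal` (steps 2 and 4).
        polls = 0
        while wal not in archive:
            if trigger_exists:
                # Triggered and file missing: non-zero exit.
                return applied, "recovery ended"
            polls += 1
            if polls >= max_polls:
                # No trigger, no file: it would wait forever.
                return applied, "stuck"
        exit_code = 0              # step 2-1: file is restored
        if trigger_exists and (wal + 1) not in archive:
            trigger_exists = False # step 2-2: lookahead miss
            exit_code = exit_code_on_lookahead_miss
        if exit_code != 0:
            # The restored file is treated as not available.
            return applied, "recovery ended"
        applied.append(wal)        # step 3: file is applied
        wal += 1


print(run_recovery({5}, 5, 0))     # exit(0) in (2-2)
print(run_recovery({5}, 5, 1))     # non-zero exit in (2-2)
```

With exit(0) the model applies file 5 and then reports "stuck" on
file 6; with a non-zero exit it ends recovery without applying file 5.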
> It does *seem* as if there is a race condition there in that another WAL
> file may arrive after we have taken the decision there are no more WAL
> files, but it's not a problem. That could happen if we issue the trigger
> while the master is still up, which is a mistake - why would we do that?
> If we only issue the trigger once we are happy the master is down then
> we don't get a problem.
Yeah, I agree that such a race condition is not a problem. The
trigger file has to be created only after all the WAL files have
arrived at the standby server.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center