I think you may have a race condition in your code: the loop does not find
the new file, so it sleeps; while it is sleeping, both the new file and the
stop file arrive; it wakes up, finds the stop file, and never copies the
last segment over.
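
One way to close that window is to look for the segment at the top of every
pass, before looking at the stop file, so that a segment which arrived during
the sleep still gets copied. What follows is only a rough sketch, not a tested
replacement for your script: the function name restore_segment and the
variables WAL_DIR, TRIGGER and POLL_SECS are names I made up, with defaults
taken from the paths in your script.

```shell
#!/bin/sh
# Illustrative names; defaults are the paths from the quoted script.
WAL_DIR=${WAL_DIR:-/var/ipsc/WAL}
TRIGGER=${TRIGGER:-/var/ipsc/pgsql/trigger}
POLL_SECS=${POLL_SECS:-30}

# restore_segment SEGMENT DEST
# Returns cp's status once SEGMENT has been copied to DEST, or 133 if the
# trigger file is present while SEGMENT is still missing.
restore_segment() {
    while :
    do
        # Check for the segment first on every pass, so a segment that
        # arrived during the sleep is copied even if the trigger is there too.
        if test -f "$WAL_DIR/$1"
        then
            cp "$WAL_DIR/$1" "$2"
            return $?
        fi

        # Only give up when the segment is absent and the trigger exists.
        if test -f "$TRIGGER"
        then
            echo "--- trigger found, exiting recovery mode ---"
            return 133
        fi

        echo "waiting for file: $1"
        sleep "$POLL_SECS"
    done
}
```

The script would then end with a call such as restore_segment "$1" "$2" and
exit with whatever status that call returns.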
George Wilk wrote:
>
> I posted to this group before with the same topic but nobody replied.
> Please, provide some feedback if you can…
>
> I am running a warm standby server, which executes the following
> command in a recovery mode:
>
> triggered=false
> while (test ! -f /var/ipsc/WAL/$1 && ! $triggered)
> do
>     echo waiting for file: $1
>
>     sleep 30
>
>     if test -f /var/ipsc/pgsql/trigger
>     then
>         echo --- trigger found ---
>         echo --- exiting recovery mode ---
>         triggered=true
>     fi
>
> done
>
> if ( ! $triggered)
> then
>     cp /var/ipsc/WAL/$1 $2
> else
>     exit 133
> fi
>
> Recovery command works just fine restoring data from the WAL files
> scp’d from the primary server. While in the recovery mode, when I
> create the trigger file breaking the while loop in recovery command,
> postgres does not go gently into the active database mode. Here is output:
>
> waiting for file: 00000001000000000000003A
> --- trigger found ---
> --- exiting recovery mode ---
> FATAL: could not restore file "00000001000000000000003A" from archive: return code 34048
> LOG: startup process (PID 13994) exited with exit code 1
> LOG: aborting startup due to startup process failure
>
> After finding the trigger file my recovery_cmd returns non-zero code.
> Why am I still getting "FATAL: could not restore file"?
>
> Both my primary and standby servers run on Solaris 10 under SMF. When
> the standby server is attempting to change mode from recovery to
> regular database mode, there might be a race condition there between
> SMF trying to restart the server and the server trying to restart
> itself… or am I just hallucinating…
>
> Thanks in advance for your comments.
>
> Cheers,
>
> ~george
>