Re: Race condition in recovery? - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject Re: Race condition in recovery?
Msg-id 4698027d-5c0d-098f-9a8e-8cf09e36a555@nttcom.co.jp_1
In response to Re: Race condition in recovery?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Duplicate history file?
Re: Race condition in recovery?
List pgsql-hackers
Hi Horiguchi-san,

> (Why me?)

Because the story was also related to PG-REX, which you are
also involved in developing. An off-list email might have been
better than -hackers, but I posted to -hackers because PostgreSQL
users who do not use PG-REX could run into the same problem.

  
>> In a project I helped with, I encountered an issue where
>> the archive command kept failing. I thought this issue was
>> related to the problem in this thread, so I'm sharing it here.
>> If I should create a new thread, please let me know.
>>
>> * Problem
>>    - The archive_command always fails.
> 
> Although I think the configuration is somewhat broken, it can be seen
> as mimicking the shared-archive case, where the primary and
> standby share the same archive directory.


To be precise, the environment in this reproduction script is
different from our actual environment; I made it as simple as
possible while still reproducing the problem.
(To make it match the actual environment, you would have to
build a PG-REX environment.)

A simple replication environment might be enough, so I'll try to
write a script that is closer to the actual environment later.

  
> Basically, for that case we need to use an archive command like the
> following to avoid this kind of failure. The script returns "success"
> when the target file already exists and is identical to the source
> file. I couldn't find such a description in the documentation, and
> haven't bothered digging through the mailing-list archive.
> 
> ==
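> #! /bin/bash
> # $1 = source file, $2 = target file in the archive directory
> 
> if [ -f "$2" ]; then
>     # Already archived: succeed only if the contents are identical.
>     if ! cmp -s "$1" "$2"; then
>         exit 1
>     fi
>     exit 0
> fi
> 
> cp "$1" "$2"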
> ==
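To make the behavior of the script above concrete, here is a small sketch that wraps the same logic in a shell function and exercises it in a throwaway temporary directory. The function name archive_copy and the file names are hypothetical, chosen only for illustration; the three cases are a new file, a retry with an identical file already archived, and a different file with the same name.

```shell
#!/bin/bash
# Sketch only: archive_copy is a hypothetical name wrapping the same logic
# as the script quoted above. $1 = source file, $2 = target file.
archive_copy() {
    if [ -f "$2" ]; then
        # Already archived: succeed only if the copies are byte-identical.
        cmp -s "$1" "$2"
        return $?
    fi
    cp "$1" "$2"
}

work=$(mktemp -d)
mkdir "$work/pg_wal" "$work/archive"
printf 'timeline history\n' > "$work/pg_wal/00000002.history"

# 1) First attempt: the file is copied into the archive.
archive_copy "$work/pg_wal/00000002.history" "$work/archive/00000002.history"
echo "first attempt: $?"

# 2) Retry with an identical file already present: still reported as success.
archive_copy "$work/pg_wal/00000002.history" "$work/archive/00000002.history"
echo "identical retry: $?"

# 3) Same name, different contents: must fail so nothing is overwritten.
printf 'something else\n' > "$work/other.history"
archive_copy "$work/other.history" "$work/archive/00000002.history"
echo "conflicting file: $?"

rm -rf "$work"
```

The three echo lines should report 0, 0 and 1: re-archiving an identical file is treated as success, which is exactly what a "test ! -f" guard cannot do.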

Thanks for your reply.
Since the above behavior differs from that of the "test ! -f"
command in the following example from postgresql.conf, I think
we should add a note about this example.

# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

Let me describe the problem we faced.
- When archive_mode=always, archive_command is sometimes executed
   for a history file that already exists on the standby side.

- In this case, if the standby's archive_command in postgresql.conf
   uses "test ! -f", the command keeps failing.

   Note that this problem does not occur when archive_mode=on.
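The failure mode above can be sketched with plain shell commands. The following simulates the example archive_command from postgresql.conf being run repeatedly for a history file that is already present in the archive directory, as described for a standby with archive_mode=always. The directory layout is a hypothetical stand-in for pg_wal and /mnt/server/archivedir, %p and %f are expanded by hand, and the loop merely stands in for the archiver retrying a failed command.

```shell
#!/bin/bash
# Sketch: why 'test ! -f ... && cp ...' keeps failing when the target
# already exists. Paths are hypothetical; %p/%f are filled in by hand.
work=$(mktemp -d)
mkdir "$work/pg_wal" "$work/archivedir"

f=00000002.history
printf 'timeline history\n' > "$work/pg_wal/$f"
# The same history file is already in the archive (e.g. archived earlier).
cp "$work/pg_wal/$f" "$work/archivedir/$f"

# The postgresql.conf example command, with %p and %f expanded.
example_archive_command() {
    test ! -f "$work/archivedir/$1" && cp "$work/pg_wal/$1" "$work/archivedir/$1"
}

# Retrying does not help: the file never disappears from the archive
# on its own, so every attempt fails the same way.
for attempt in 1 2 3; do
    if example_archive_command "$f"; then
        echo "attempt $attempt: success"
    else
        echo "attempt $attempt: failed"
    fi
done

rm -rf "$work"
```

All three attempts report failure even though the archived copy is identical to the one in pg_wal.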

So, what should we tell users? I think we should add a note to
postgresql.conf or to the documentation, for example something
like this:

====
Note: If you use archive_mode=always, the archive_command on the standby side should not use "test ! -f".
====



Regards,
Tatsuro Yamada