Thread: BUG #13077: when a standby crashes and restarts, it needs the next timeline history file, but the upstream node is normal.

The following bug has been logged on the website:

Bug reference:      13077
Logged by:          digoal
Email address:      digoal@126.com
PostgreSQL version: 9.4.1
Operating system:   CentOS 6.x x64
Description:

Hi,
  I have one primary, A, and one standby, B; another standby, C, receives WAL from
the upstream standby B.
Everything was working normally and all standbys had caught up to A. But when the last
standby, C, crashed and I restarted it,
C requested a new timeline history file, and that timeline history file really does not
exist anywhere; I never promoted any standby.
Is it a bug?
On 04/17/2015 06:58 AM, digoal@126.com wrote:
> It's a BUG?

If you never promoted anything, everyone is still on timeline 1. No
timeline history files needed.

- Heikki
But in my environment C does need the new timeline history file: using top or ps, I can see
the restore command copying 00000002.history from the directory I configured (an NFS mount from a remote computer).
And the copy never finishes.
P.S.:
After I restarted the C machine, NFS was OK, recovery was OK, and it no longer needed the 00000002.history file.

 When PostgreSQL crashes, does it always have to copy the next history file first, whether or not it really needs the file?





--
Public welfare is a lifelong cause. I'm Digoal, Just Do It.


At 2015-04-17 21:49:35, "Heikki Linnakangas" <hlinnaka@iki.fi> wrote:
> If you never promoted anything, everyone is still on timeline 1. No
> timeline history files needed.


On Fri, Apr 17, 2015 at 5:39 PM, 德哥 <digoal@126.com> wrote:

>  When postgresql crashed, it need copy new history first whatever it's
> really need the file?
>

It needs to know whether the file exists or not.  Rather than asking if the
file exists, and then separately asking for a copy of it, it just
asks for a copy of the file and sees whether it gets an error back.
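The probing pattern can be modeled in a few lines. This is a minimal Python sketch under simplifying assumptions, not PostgreSQL's C code: `find_newest_timeline`, `history_file_name` and `fake_restore` are hypothetical names, loosely modeled on the findNewestTimeLine() function mentioned later in this thread, and the `restore` callback stands in for running restore_command and checking its exit status. The history file naming (eight uppercase hex digits plus `.history`) follows PostgreSQL's convention.

```python
from typing import Callable


def history_file_name(tli: int) -> str:
    # Timeline history files are named with 8 uppercase hex digits.
    return f"{tli:08X}.history"


def find_newest_timeline(start_tli: int, restore: Callable[[str], bool]) -> int:
    """Return the highest timeline found by probing start_tli+1, +2, ...

    `restore` is a hypothetical stand-in for running restore_command: it
    returns True if the file was copied (exit status 0) and False otherwise.
    There is no separate "does it exist?" query; a failed copy means "no".
    """
    newest = start_tli
    probe = start_tli + 1
    while restore(history_file_name(probe)):
        newest = probe
        probe += 1
    return newest


# Example: nothing was ever promoted, so the archive holds no history files.
archive: set[str] = set()
probed: list[str] = []


def fake_restore(name: str) -> bool:
    probed.append(name)  # record what restore_command would be asked for
    return name in archive


print(find_newest_timeline(1, fake_restore))  # 1
print(probed)  # ['00000002.history']
```

On timeline 1 with no history files, the only request is for 00000002.history, and the "file not found" answer ends the loop. If `restore` blocked instead of returning False, the loop would never finish, which matches the hang digoal observed.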

If your restore_command hangs up forever when asked for a file that doesn't
exist, then there is something wrong with your restore command.

As explained in the documentation, "It is important for the command to
return a zero exit status only if it succeeds. The command will be asked
for file names that are not present in the archive; it must return nonzero
when so asked."
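A restore_command that honors this contract checks for the requested file and exits nonzero when it is absent. The sketch below is a hypothetical Python script; its file name `restore_from_archive.py` and the archive-directory argument are assumptions, and it would be configured along the lines of `restore_command = 'python3 /path/to/restore_from_archive.py /mnt/archive %f %p'`, where `%f` and `%p` are the placeholders PostgreSQL replaces with the requested file name and the destination path.

```python
import shutil
import sys
from pathlib import Path


def restore(archive_dir: str, file_name: str, dest_path: str) -> int:
    """Copy archive_dir/file_name to dest_path and return an exit status.

    0 means the file was restored; nonzero means it is unavailable, which
    PostgreSQL treats as "this file is not in the archive".
    """
    src = Path(archive_dir) / file_name
    if not src.is_file():
        # Requests for absent files are expected (e.g. probing for the next
        # timeline history file); fail fast instead of waiting.
        return 1
    try:
        shutil.copyfile(src, dest_path)
    except OSError as exc:
        # Do not leave a partially written file behind on failure.
        Path(dest_path).unlink(missing_ok=True)
        print(f"restore of {file_name} failed: {exc}", file=sys.stderr)
        return 1
    return 0


def main(argv: list[str]) -> int:
    if len(argv) != 4:
        print("usage: restore_from_archive.py ARCHIVE_DIR %f %p", file=sys.stderr)
        return 2
    return restore(argv[1], argv[2], argv[3])


# Only act when arguments are passed, so the module can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

This only covers the exit-status contract. If the archive directory is an NFS mount that blocks on access, the `is_file()` check or the copy stalls just as `cp` does, which points at the mount rather than at PostgreSQL.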

Cheers,

Jeff
Hi,
Thanks, Jeff.
 I found that findNewestTimeLine() does that job.
 I think this is an NFS problem: the standby couldn't access the NFS directory (the archive directory), so the copy command hung.


On 2015-04-19 01:30:31, "Jeff Janes" <jeff.janes@gmail.com> wrote:
> If your restore_command hangs up forever when asked for a file that doesn't exist, then there is something wrong with your restore command.