> It is odd that the standby server crashes when replication fails,
> because the standby would keep retrying to get the next record even in
> such a case.
As I mentioned earlier, when replication fails, the standby retries to establish streaming replication. At that point, the value of walrcv->flushedUpto does not necessarily reflect what has actually been flushed to disk. However, the startup process mistakenly trusts walrcv->flushedUpto as the latest flushed LSN and attempts to open the corresponding WAL file, which does not exist; the open fails and the startup process PANICs.
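For reference, here is a condensed sketch of the code path in question, paraphrased from WaitForWALToBecomeAvailable() in src/backend/access/transam/xlogrecovery.c (variable setup and surrounding logic are omitted, so this is illustrative rather than the literal source):

/* Ask the walreceiver's shared state how far WAL is known to be flushed. */
flushedUpto = GetWalRcvFlushRecPtr(&latestChunkStart, &receiveTLI);

if (RecPtr < flushedUpto && receiveTLI == curFileTLI)
    havedata = true;    /* "streamed far enough" -- trusts flushedUpto */

if (havedata && readFile < 0)
{
    /*
     * emode is PANIC here: a segment the walreceiver claims to have
     * flushed is expected to exist on disk.  If flushedUpto is stale
     * from a previous connection, the open fails and the startup
     * process PANICs instead of retrying.
     */
    readFile = XLogFileRead(readSegNo, PANIC, receiveTLI,
                            XLOG_FROM_STREAM, false);
}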
Yugo Nagata <nagata@sraoss.co.jp> wrote on Wed, Aug 21, 2024 at 00:49:
> Is s1 a cascading standby of s2? If instead s1 and s2 are each standbys
> of the primary server, it is not surprising that s2 has progressed
> further than s1 when the primary fails. I believe this is the case in
> which you should use pg_rewind. Even if flushedUpto is reset as proposed
> in your patch, s2 might already have applied a WAL record that s1 has
> not processed yet, and there would be no guarantee that subsequent
> applies succeed.

Thank you for your response. In my scenario, s1 and s2 are both standbys of the primary server, with s1 a synchronous standby and s2 an asynchronous one. You mentioned that if s2's replay progress is ahead of s1's, pg_rewind should be used. However, what I'm trying to address is an issue where s2 crashes during replay after s1 has been promoted to primary, even though s2's progress hasn't surpassed s1's.
I understood your point. It is odd that the standby server crashes when replication fails, because the standby would keep retrying to get the next record even in such a case.
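The retry behavior I have in mind is the loop in WaitForWALToBecomeAvailable(), which keeps cycling between WAL sources and restarting the walreceiver rather than giving up. Roughly (a condensed sketch, not the literal source; variable setup omitted):

for (;;)
{
    if (currentSource == XLOG_FROM_STREAM)
    {
        /*
         * If the walreceiver is not running (e.g. the connection to
         * the primary was lost), ask for a new one to be started at
         * the LSN we still need, rather than giving up.
         */
        if (!WalRcvStreaming())
            RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
                                 PrimarySlotName,
                                 wal_receiver_create_temp_slot);

        /* wait for new WAL to arrive; on failure, fall back below */
    }
    else
    {
        /*
         * Try restoring from the archive or pg_wal; if nothing is
         * found there, switch back to streaming and retry.
         */
        currentSource = XLOG_FROM_STREAM;
    }
}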