Thread: [GENERAL] streaming replication - crash on standby

[GENERAL] streaming replication - crash on standby

From
"Seong Son (US)"
Date:

The last line from pg_xlogdump of the last WAL file on the crashed standby server shows the following.

 

pg_xlogdump: FATAL:  error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736

 

I believe this means the standby server received a WAL file out of order?  But why did it crash?  Is crashing normal behavior in a case like this?

 

Thanks,

Seong

Re: [GENERAL] streaming replication - crash on standby

From
Andres Freund
Date:
Hi,

On 2017-08-09 22:03:43 +0000, Seong Son (US) wrote:
> The last line from pg_xlogdump of the last WAL file on the crashed standby server shows the following.
>
> pg_xlogdump: FATAL:  error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736
>
> I believe this means the standby server received WAL file out of order?  But why did it crash?  Is crashing normal behavior in case like this?

This likely just means that that's the end of the WAL.

- Andres
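
For readers following the numbers in the error message, here is a minimal sketch of how an LSN such as DF/4CB95FD0 maps to a WAL segment file name and a byte offset. It assumes the default 16 MB segment size and 8 kB WAL page size, and it takes the timeline part of the file name (00000000) from the error message itself; the function names are hypothetical and are not part of any PostgreSQL tool.

```python
# Assumptions: default 16 MB WAL segments and 8 kB WAL pages.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
XLOG_BLCKSZ = 8192


def parse_lsn(text):
    """Turn an LSN written as 'HI/LO' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def locate(lsn, timeline=0):
    """Return (segment file name, byte offset inside that segment)."""
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE
    segno = lsn // WAL_SEGMENT_SIZE
    name = "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )
    return name, lsn % WAL_SEGMENT_SIZE


def next_page_offset(offset):
    """Offset of the WAL page that starts after the page containing offset."""
    return (offset // XLOG_BLCKSZ + 1) * XLOG_BLCKSZ


record_name, record_off = locate(parse_lsn("DF/4CB95FD0"))
stale_name, stale_off = locate(parse_lsn("DB/62B96000"))

print(record_name, next_page_offset(record_off))
print(stale_name, stale_off)
```

Under these assumptions, the page following the record at DF/4CB95FD0 starts at the offset reported in the error, and the unexpected page address DB/62B96000 sits at the same offset of an older segment. That pattern is what one would expect to see when a reader runs past the last valid record into leftover contents of a recycled segment file, which fits the "end of the WAL" reading above.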


Re: [GENERAL] streaming replication - crash on standby

From
"Seong Son (US)"
Date:
I see.  Thank you.

But the Postgresql process had crashed at that time so the streaming replication was no longer working.  Why would it crash and is that normal?

Thanks,

Seong


This email and any files transmitted with it are intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify the system manager. This message contains
information that is intended only for the individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this
e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that
disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly
prohibited.

-----Original Message-----
From: Andres Freund [mailto:andres@anarazel.de]
Sent: Wednesday, August 09, 2017 6:27 PM
To: Seong Son (US) <Seong.Son@datapath.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] streaming replication - crash on standby

Hi,

On 2017-08-09 22:03:43 +0000, Seong Son (US) wrote:
> The last line from pg_xlogdump of the last WAL file on the crashed standby server shows the following.
>
> pg_xlogdump: FATAL:  error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736
>
> I believe this means the standby server received WAL file out of order?  But why did it crash?  Is crashing normal behavior in case like this?

This likely just means that that's the end of the WAL.

- Andres


Re: [GENERAL] streaming replication - crash on standby

From
Andres Freund
Date:
Hi,

Please quote properly on postgres mailing lists.

On 2017-08-09 22:31:23 +0000, Seong Son (US) wrote:
> I see.  Thank you.
>
> But the Postgresql process had crashed at that time so the streaming replication was no longer working.  Why would it crash and is that normal?

You've given us absolutely zero information to be able to diagnose the
problem.  If you want somebody to help you you'll have to describe
exactly what happened, and what the problem you're facing is.

- Andres

> This email and any files transmitted with it are intended solely for the use of the individual or entity to whom they
> are addressed. If you have received this email in error please notify the system manager. This message contains
> information that is intended only for the individual named. If you are not the named addressee you should not
> disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this
> e-mail by mistake and delete this e-mail from your system. If you are not the intended recipient you are notified that
> disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly
> prohibited.

This footer makes no sense on a public list.


Re: [GENERAL] streaming replication - crash on standby

From
"Seong Son (US)"
Date:
>-----Original Message-----
>From: Andres Freund [mailto:andres@anarazel.de]
>Sent: Wednesday, August 09, 2017 6:34 PM
>To: Seong Son (US) <Seong.Son@datapath.com>
>Cc: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] streaming replication - crash on standby
>
>Hi,
>
>Please quote properly on postgres mailing lists.
>
>On 2017-08-09 22:31:23 +0000, Seong Son (US) wrote:
>> I see.  Thank you.
>>
>> But the Postgresql process had crashed at that time so the streaming replication was no longer working.  Why would it crash and is that normal?
>
>You've given us absolutely zero information to be able to diagnose the problem.  If you want somebody to help you you'll have to describe exactly what happened, and what the problem you're facing is.
>
>- Andres

Sorry for the lack of info.  I've gathered some more.  Hopefully it is enough to help isolate the cause of the crash of the standby server.

The servers are on Windows Server 2012 R2 with PostgreSQL 9.6.  The primary and standby servers are in two different cities, connected over VPN.

Here are the last few lines from pg_log at the time of the standby server's crash:

2017-08-08 21:17:56 UTC FATAL:  invalid memory alloc request size 1656315904
2017-08-08 21:17:56 UTC LOG:  startup process (PID 2972) exited with exit code 1
2017-08-08 21:17:56 UTC LOG:  terminating any other active server processes
2017-08-08 21:17:56 UTC WARNING:  terminating connection because of crash of another server process
2017-08-08 21:17:56 UTC DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-08-08 21:17:56 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-08-08 21:17:56 UTC WARNING:  terminating connection because of crash of another server process
2017-08-08 21:17:56 UTC DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-08-08 21:17:56 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-08-08 21:17:56 UTC LOG:  database system is shut down

And this is the last entry from pg_xlogdump:

-08 21:17:36.864852 Coordinated Universal Time
pg_xlogdump: FATAL:  error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736

One thing I noticed is that the network is not the most stable.  When I ran a Wireshark capture on port 5432, I saw numerous errors and warnings like
    "New fragment overlaps old data (retransmission?)"
    "This frame is a (suspected) out-of-order segment"
    "This frame is a (suspected) retransmission"

So the questions are, why did the standby server crash?  Could the network instability be the cause for the crash?

Thank you in advance for any info.
Seong
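
One arithmetic observation on the figures quoted in this message, offered as a sketch and not as a diagnosis: the size in the "invalid memory alloc request size" line can be compared with the low 32 bits of the unexpected page address reported by pg_xlogdump. The variable names below are illustrative only.

```python
# Values copied from the log excerpts in the message above.
alloc_request_size = 1656315904       # from the pg_log FATAL line
unexpected_pageaddr = "DB/62B96000"   # from the pg_xlogdump FATAL line

# Low 32 bits of the page address, i.e. the part after the slash.
low_word = int(unexpected_pageaddr.split("/")[1], 16)

print(hex(alloc_request_size))
print(alloc_request_size == low_word)
```

The two values are equal. That only shows the two numbers in the logs coincide; it does not by itself establish why the startup process requested that allocation or whether the network played a role.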