Re: [GENERAL] streaming replication - crash on standby - Mailing list pgsql-general

From Seong Son (US)
Subject Re: [GENERAL] streaming replication - crash on standby
Date
Msg-id BY2PR17MB0328763A8900085A498A819684890@BY2PR17MB0328.namprd17.prod.outlook.com
In response to Re: [GENERAL] streaming replication - crash on standby  (Andres Freund <andres@anarazel.de>)
List pgsql-general
>-----Original Message-----
>From: Andres Freund [mailto:andres@anarazel.de]
>Sent: Wednesday, August 09, 2017 6:34 PM
>To: Seong Son (US) <Seong.Son@datapath.com>
>Cc: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] streaming replication - crash on standby
>
>Hi,
>
>Please quote properly on postgres mailing lists.
>
>On 2017-08-09 22:31:23 +0000, Seong Son (US) wrote:
>> I see.  Thank you.
>>
>> But the Postgresql process had crashed at that time so the streaming replication was no longer working.  Why would it crash and is that normal?
>
>You've given us absolutely zero information to be able to diagnose the problem.  If you want somebody to help you, you'll have to describe exactly what happened, and what the problem you're facing is.
>
>- Andres

Sorry for the lack of info.  I've gathered some more.  Hopefully it will be enough to help isolate the cause of the crash of the standby server.

The servers are on Windows Server 2012 R2, running PostgreSQL 9.6.  The primary and standby servers are in two different cities, connected over a VPN.

Here are the last few lines from pg_log at the time of the standby server's crash:

2017-08-08 21:17:56 UTC FATAL:  invalid memory alloc request size 1656315904
2017-08-08 21:17:56 UTC LOG:  startup process (PID 2972) exited with exit code 1
2017-08-08 21:17:56 UTC LOG:  terminating any other active server processes
2017-08-08 21:17:56 UTC WARNING:  terminating connection because of crash of another server process
2017-08-08 21:17:56 UTC DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-08-08 21:17:56 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-08-08 21:17:56 UTC WARNING:  terminating connection because of crash of another server process
2017-08-08 21:17:56 UTC DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-08-08 21:17:56 UTC HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-08-08 21:17:56 UTC LOG:  database system is shut down

And this is the last entry from pg_xlogdump:

-08 21:17:36.864852 Coordinated Universal Time
pg_xlogdump: FATAL:  error in WAL record at DF/4CB95FD0: unexpected pageaddr DB/62B96000 in log segment 00000000000000DF0000004C, offset 12148736
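
For anyone decoding the pg_xlogdump message above, here is a minimal sketch of the LSN arithmetic. It assumes the default 16 MB WAL segment size and 8 kB WAL page size, neither of which is stated in this message, and the helper names (`parse_lsn`, `locate`) are made up for illustration. It only checks how the numbers in the two outputs relate to each other and does not by itself establish the cause of the crash.

```python
# Assumed defaults; a server built with other sizes would give different results.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
WAL_PAGE_SIZE = 8192


def parse_lsn(lsn):
    """Turn an LSN written as 'HI/LO' (hex) into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def locate(lsn):
    """Return (segment name without timeline, offset in segment, page start)."""
    pos = parse_lsn(lsn)
    segno = pos // WAL_SEGMENT_SIZE
    offset = pos % WAL_SEGMENT_SIZE
    segs_per_id = 0x100000000 // WAL_SEGMENT_SIZE
    name_tail = "%08X%08X" % (segno // segs_per_id, segno % segs_per_id)
    page_start = offset - offset % WAL_PAGE_SIZE
    return name_tail, offset, page_start


record = locate("DF/4CB95FD0")          # record named in the error
found = locate("DB/62B96000")           # page address actually found
next_page = record[2] + WAL_PAGE_SIZE   # page following the record's page

print("segment (no timeline):", record[0])
print("record offset:", record[1])
print("next page offset:", next_page)
print("found page belongs to:", found[0], "at offset", found[1])
print("low half of found pageaddr:", int("62B96000", 16))
```

Under those assumptions, the record at DF/4CB95FD0 starts in segment ...000000DF0000004C, and the offset reported in the error, 12148736, is the start of the page right after the one where the record begins. The page address found there, DB/62B96000, sits at the same offset within a segment that belongs to DB/62 rather than DF/4C. The low half of that address, 0x62B96000, is 1656315904 in decimal, the same number as the rejected allocation size in the pg_log output above.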

One thing I noticed is that the network is not the most stable.  When I ran a Wireshark capture on port 5432, I saw numerous errors and warnings like
    "New fragment overlaps old data (retransmission?)"
    "This frame is a (suspected) out-of-order segment"
    "This frame is a (suspected) retransmission"

So the questions are: why did the standby server crash?  Could the network instability be the cause of the crash?

Thank you in advance for any info.
Seong




