Re: Postgres abort found in 9.3.11 - Mailing list pgsql-hackers

From K S, Sandhya (Nokia - IN/Bangalore)
Subject Re: Postgres abort found in 9.3.11
Date
Msg-id DB5PR07MB154156B5B062C8769E8A569ED6E20@DB5PR07MB1541.eurprd07.prod.outlook.com
In response to Re: Postgres abort found in 9.3.11  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hello Tom,

Apologies for the delayed reply.

Our setup is a hot-standby architecture. The crash occurs only on the standby node; Postgres continues to run
without any issues on the active node.
The postmaster is waiting to start and logs these messages:

Aug 22 11:44:21.462555 info node-0 postgres[8222]: [1-2] HINT:  Is another postmaster already running on port 5433? If not, wait a few seconds and retry.
Aug 22 11:44:52.065760 crit node-1 postgres[8629]: [18-1] err-3:  btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data
Aug 22 11:44:52.065971 crit CFPU-1 postgres[8629]: [18-2] CONTEXT:  xlog redo delete: index 1663/16386/17378; iblk 1, heap 1663/16386/16518;
Aug 22 11:44:52.085486 info node-1 coredumper: Generating core file

The standby Postgres recovers automatically on the next restart, because we always take a fresh copy of the database
from the active node on restart.

We implemented a patch to force-kill the walsender on the active side. This avoids a prolonged wait if the standby node
is not reachable (e.g. forced power-off or LAN cable removal). This implementation has existed for a long time; however,
the issue has been observed only recently, after upgrading to 9.3.11. Do you think this force kill of the walsender
might lead to such issues in the latest Postgres?


Regards,
Sandhya

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, August 30, 2016 5:09 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya.k_s@nokia.com>
Cc: pgsql-hackers@postgresql.org; Itnal, Prakash (Nokia - IN/Bangalore) <prakash.itnal@nokia.com>
Subject: Re: [HACKERS] Postgres abort found in 9.3.11

"K S, Sandhya (Nokia - IN/Bangalore)" <sandhya.k_s@nokia.com> writes:
> During the server restart, we are getting postgres crash with sigabrt. No other operation being performed.
> Attached the backtrace.

What shows up in the postmaster log?

> The occurrence is occasional. The issue is seen once in 30~50 times.

Does it successfully restart if you try again?  If not, what are you
doing to recover?
        regards, tom lane


