Re: Postgres abort found in 9.3.11 - Mailing list pgsql-hackers

From	K S, Sandhya (Nokia - IN/Bangalore)
Subject	Re: Postgres abort found in 9.3.11
Date	September 2, 2016 21:34:10
Msg-id	DB5PR07MB154156B5B062C8769E8A569ED6E20@DB5PR07MB1541.eurprd07.prod.outlook.com
In response to	Re: Postgres abort found in 9.3.11 (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Hello Tom,

Apologies for the delayed reply.

Our setup is a hot-standby architecture. The crash occurs only on the standby node; Postgres continues to run
without any issues on the active node.
The postmaster is waiting to start and logs the following messages.

Aug 22 11:44:21.462555 info node-0 postgres[8222]: [1-2] HINT:  Is another postmaster already running on port 5433? If not, wait a few seconds and retry.
Aug 22 11:44:52.065760 crit node-1 postgres[8629]: [18-1] err-3:  btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data
Aug 22 11:44:52.065971 crit CFPU-1 postgres[8629]: [18-2] CONTEXT:  xlog redo delete: index 1663/16386/17378; iblk 1, heap 1663/16386/16518;
Aug 22 11:44:52.085486 info node-1 coredumper: Generating core file

The standby Postgres recovers automatically on the next restart, because we always take a fresh copy of the database
from the active node on restart.

We implemented a patch to force-kill the walsender on the active side. This avoids a prolonged wait if the standby node
is not reachable (e.g. forced power-off or LAN cable removal). This implementation has existed for a long time; however,
the issue has only been observed recently, after upgrading to 9.3.11. Do you think this force kill of the walsender might
lead to such issues in the latest Postgres?

Regards,
Sandhya

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, August 30, 2016 5:09 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya.k_s@nokia.com>
Cc: pgsql-hackers@postgresql.org; Itnal, Prakash (Nokia - IN/Bangalore) <prakash.itnal@nokia.com>
Subject: Re: [HACKERS] Postgres abort found in 9.3.11

"K S, Sandhya (Nokia - IN/Bangalore)" <sandhya.k_s@nokia.com> writes:
> During the server restart, we are getting postgres crash with sigabrt. No other operation being performed.
> Attached the backtrace.

What shows up in the postmaster log?

> The occurrence is occasional. The issue is seen once in 30~50 times.

Does it successfully restart if you try again?  If not, what are you
doing to recover?
        regards, tom lane
