Re: Critical failure of standby - Mailing list pgsql-general

From James Sewell
Subject Re: Critical failure of standby
Msg-id CAANVwEso9qL8h0xyk5sPgHz5+SpcyMJZJc1O5kdbGkX+-=2kLA@mail.gmail.com
In response to Re: Critical failure of standby  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-general
Hello,

I double posted this (posted once from an unregistered email and assumed it would be junked).

I'm continuing all discussion on the other thread now.

Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect 

 

Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009

On Sat, Aug 13, 2016 at 5:20 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
James Sewell wrote:

> 2016-08-12 04:43:53 GMT [23614]: [5-1] user=,db=,client=  (0:00000)LOG:  consistent recovery state reached at 3/8811DFF0
> 2016-08-12 04:43:53 GMT [23614]: [6-1] user=,db=,client=  (0:XX000)FATAL:  invalid memory alloc request size 3445219328
> 2016-08-12 04:43:53 GMT [23612]: [3-1] user=,db=,client=  (0:00000)LOG:  database system is ready to accept read only connections
> 2016-08-12 04:43:53 GMT [23612]: [4-1] user=,db=,client=  (0:00000)LOG:  startup process (PID 23614) exited with exit code 1
> 2016-08-12 04:43:53 GMT [23612]: [5-1] user=,db=,client=  (0:00000)LOG:  terminating any other active server processes
> 2016-08-12 04:43:53 GMT [23612]: [6-1] user=,db=,client=  (0:00000)LOG:  archiver process (PID 23627) exited with exit code 1

What version is this?

Hm, so the startup process finds the consistent point (which signals
postmaster so that line 23612/3 says "ready to accept read-only conns")
and immediately dies because of the invalid memory alloc error.  I
suppose that error must be while trying to process some xlog record, but
without an xlog address it's difficult to say anything.  I suppose you
could try to pg_xlogdump WAL starting at the last known good address
3/8811DFF0 but I wouldn't know what to look for.
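
As a rough aid for the pg_xlogdump suggestion above, here is a minimal sketch that maps the address 3/8811DFF0 to the WAL segment file that should contain it and builds (without running) a dump command starting there. Timeline 1, the default 16 MB segment size, the 9.3+ file naming scheme, and the placeholder WAL directory are all assumptions, since the thread does not state the server version or layout; the helper names are hypothetical.

```python
# Sketch: locate the WAL segment for an LSN and build a pg_xlogdump command.
# Assumptions (not from the thread): timeline 1, 16 MB segments, 9.3+ naming.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024                      # default segment size
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE    # 256 for 16 MB


def parse_lsn(lsn: str) -> int:
    """Convert 'X/Y' (two hexadecimal halves) into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: str, timeline: int = 1) -> str:
    """Return the 24-hex-digit segment file name that contains the LSN."""
    segno = parse_lsn(lsn) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def xlogdump_command(lsn: str, wal_dir: str, limit: int = 50) -> list:
    """Build (but do not execute) a pg_xlogdump call starting at the LSN."""
    # -p: directory holding WAL files, -s: start address, -n: record limit
    return ["pg_xlogdump", "-p", wal_dir, "-s", lsn, "-n", str(limit)]


if __name__ == "__main__":
    print(wal_segment_name("3/8811DFF0"))
    # "/path/to/pg_xlog" is a placeholder, not a path from the thread.
    print(" ".join(xlogdump_command("3/8811DFF0", "/path/to/pg_xlog")))
```

If the standby is on a different timeline or was built with a non-default segment size, the file name will differ accordingly.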

One strange thing is that xlog replay sets up an error context, so you
would have had a line like "xlog redo HEAP" etc, but there's nothing
here.  So maybe the allocation is not exactly in xlog replay, but
something different.  We'd need to see a backtrace in order to see what.
Since this occurs in the startup process, probably the easiest way is to
patch the source to turn that error into PANIC, then re-run and examine
the resulting core file.
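
For context on the error being discussed: PostgreSQL's allocator rejects any ordinary request larger than MaxAllocSize (0x3fffffff, one byte under 1 GB), and that check is what produces the "invalid memory alloc request size" message. The sketch below only mirrors that comparison for the size seen in the log; it is an illustration with hypothetical helper names, not the server code, and it says nothing about which caller made the request.

```python
import re

# Mirrors PostgreSQL's MaxAllocSize limit (0x3fffffff, i.e. 1 GB - 1).
MAX_ALLOC_SIZE = 0x3FFFFFFF

# Matches the message text shown in the quoted log.
ALLOC_ERROR = re.compile(r"invalid memory alloc request size (\d+)")


def alloc_size_is_valid(size: int) -> bool:
    """Same comparison the allocator applies to ordinary requests."""
    return 0 <= size <= MAX_ALLOC_SIZE


def requested_size(log_line: str):
    """Extract the requested size from a log line, or None if absent."""
    match = ALLOC_ERROR.search(log_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    line = ("2016-08-12 04:43:53 GMT [23614]: [6-1] user=,db=,client=  "
            "(0:XX000)FATAL:  invalid memory alloc request size 3445219328")
    size = requested_size(line)
    print(size, "valid" if alloc_size_is_valid(size) else "exceeds limit")
```

A request this far over the limit usually points at a length field read from bad or misinterpreted data, which is why a backtrace from the core file is the useful next step.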

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



