Re: Critical failure of standby - Mailing list pgsql-general

From Alvaro Herrera
Subject Re: Critical failure of standby
Msg-id 20160812192035.GA703905@alvherre.pgsql
In response to Critical failure of standby  (James Sewell <james.sewell@jirotech.com>)
Responses Re: Critical failure of standby  (James Sewell <james.sewell@jirotech.com>)
List pgsql-general
James Sewell wrote:

> 2016-08-12 04:43:53 GMT [23614]: [5-1] user=,db=,client=  (0:00000)LOG:  consistent recovery state reached at 3/8811DFF0
> 2016-08-12 04:43:53 GMT [23614]: [6-1] user=,db=,client=  (0:XX000)FATAL:  invalid memory alloc request size 3445219328
> 2016-08-12 04:43:53 GMT [23612]: [3-1] user=,db=,client=  (0:00000)LOG:  database system is ready to accept read only connections
> 2016-08-12 04:43:53 GMT [23612]: [4-1] user=,db=,client=  (0:00000)LOG:  startup process (PID 23614) exited with exit code 1
> 2016-08-12 04:43:53 GMT [23612]: [5-1] user=,db=,client=  (0:00000)LOG:  terminating any other active server processes
> 2016-08-12 04:43:53 GMT [23612]: [6-1] user=,db=,client=  (0:00000)LOG:  archiver process (PID 23627) exited with exit code 1

What version is this?

Hm, so the startup process finds the consistent point (which signals
postmaster, hence line 3 from PID 23612 saying "ready to accept read only
connections") and immediately dies because of the invalid memory alloc
error.  I suppose that error must occur while trying to process some xlog
record, but without an xlog address it's difficult to say anything.  I
suppose you could try running pg_xlogdump on the WAL starting at the last
known good address 3/8811DFF0, but I wouldn't know what to look for.

One strange thing is that xlog replay sets up an error context, so you
would expect to see a context line like "xlog redo HEAP" etc, but there's
nothing here.  So maybe the allocation is not exactly in xlog replay, but
somewhere else.  We'd need to see a backtrace in order to tell where.
Since this occurs in the startup process, probably the easiest way is to
patch the source to turn that error into PANIC, then re-run and examine
the resulting core file.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

