Re: BUG #6200: standby bad memory allocations on SELECT - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #6200: standby bad memory allocations on SELECT
Msg-id CA+TgmoZ6S5e46xThsHKv6-vV58f==D4_TH_ECB2sQsZRngL+8Q@mail.gmail.com
In response to Re: BUG #6200: standby bad memory allocations on SELECT  (Bridget Frey <bridget.frey@redfin.com>)
Responses Re: BUG #6200: standby bad memory allocations on SELECT  (Bridget Frey <bridget.frey@redfin.com>)
List pgsql-bugs
On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey <bridget.frey@redfin.com> wrote:
> Hello,
> We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing an
> issue that seems very similar to the one reported as bug 6200. We see
> approximately 2 dozen alloc errors per day across 3 slaves, and we are
> getting one segfault approximately every 3 days. We did not experience this
> issue before our upgrade (we were on version 8.4, and used skytools for
> replication).
>
> We are attempting to get a core dump on segfault (our last attempt did not
> work due to a config issue for the core dump). We're also attempting to
> repro the alloc errors on a test setup, but it seems like we may need quite
> a bit of load to trigger the issue. We're not certain that the alloc issues
> and the segfaults are "the same issue", but it seems that they may be, since
> the OP for bug 6200 sees the same behavior. We have seen no issues on the
> master; all alloc errors and segfaults have been on the slaves.
>
> We've seen the alloc errors on a few different tables, but most frequently
> on logins. Rows are added to the logins table one-by-one, and updates
> generally happen one row at a time. The table is pretty basic; it looks
> like this...
>
> CREATE TABLE logins
> (
>   login_id bigserial NOT NULL,
>   <snip - a bunch of columns>
>   CONSTRAINT logins_pkey PRIMARY KEY (login_id),
>   <snip - some other constraints...>
> )
> WITH (
>   FILLFACTOR=80,
>   OIDS=FALSE
> );
>
> The queries that trigger the alloc error on this table look like this (we
> use Hibernate, hence the funny underscoring...)
> select login0_.login_id as login1_468_0_, l... from logins login0_ where
> login0_.login_id=$1
>
> The alloc error in the logs looks like this:
> -01-12_080925.log:2012-01-12 17:33:46 PST [16034]: [7-1] [24/25934] ERROR:
> invalid memory alloc request size 18446744073709551613
>
> The alloc error is nearly always for size 18446744073709551613 - though we
> have seen one time where it was a different amount...

Hmm, that number in hex works out to 0xfffffffffffffffd (that is,
2^64 - 3), which makes it sound an awful lot like the system (for some
unknown reason) attempted to allocate -3 bytes of memory.  I've seen
something like
this once before on a customer system running a modified version of
PostgreSQL.  In that case, the problem turned out to be page
corruption.  Circumstances didn't permit determination of the root
cause of the page corruption, however, nor was I able to figure out
exactly how the corruption I saw resulted in an allocation request
like this.  It would be nice to figure out where in the code this is
happening and put in a higher-level guard so that we get a better
error message.
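To make the arithmetic concrete, here is a small standalone C sketch (not PostgreSQL code; `as_signed64` and `show` are made-up names for illustration) that prints a 64-bit size in decimal, in hex, and reinterpreted as a signed two's-complement value. It avoids an out-of-range signed cast so the signed reading is portable.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Interpret a 64-bit request size as the signed value it would have had
 * before being converted to an unsigned size.  Written without an
 * out-of-range cast, so the result does not depend on the compiler. */
static int64_t as_signed64(uint64_t size)
{
    if (size <= (uint64_t) INT64_MAX)
        return (int64_t) size;
    /* size >= 2^63: the signed value is -(2^64 - size) */
    return -(int64_t) (UINT64_MAX - size) - 1;
}

/* Print the size in decimal, in hex, and as a signed value. */
static void show(uint64_t size)
{
    printf("%" PRIu64 " = 0x%016" PRIx64 " = %" PRId64 "\n",
           size, size, as_signed64(size));
}
```

Calling `show(UINT64_C(18446744073709551613))` prints `18446744073709551613 = 0xfffffffffffffffd = -3`. Going the other way, converting -3 to an unsigned 64-bit type, `(uint64_t) -3`, yields exactly the size in the log message.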

You might want to compile a modified PostgreSQL executable that puts an
extremely long sleep (like a year) just before this error is reported.
Then, when the system hangs at that point, you can attach a debugger
and pull a stack backtrace.  Or you could insert an abort() at that
point in the code and get a backtrace from the core dump.
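The sleep-or-abort idea can be sketched roughly as below. This is a standalone illustration, not a patch against PostgreSQL: `checked_alloc`, `trap_before_error`, the `trap_mode` knob and `SKETCH_MAX_ALLOC` are all hypothetical names, and the 1 GB - 1 limit is assumed to mirror PostgreSQL's `MaxAllocSize`. In a real build, the trap would go immediately before wherever the "invalid memory alloc request size" error is raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical limit, assumed to mirror PostgreSQL's MaxAllocSize (1 GB - 1). */
#define SKETCH_MAX_ALLOC ((size_t) 0x3fffffff)

/* Hypothetical knob: what to do just before reporting a bad request. */
typedef enum { TRAP_NONE, TRAP_SLEEP, TRAP_ABORT } trap_mode;

static bool alloc_size_is_valid(size_t size)
{
    return size <= SKETCH_MAX_ALLOC;
}

/* Stop the process at the failure point so a backtrace can be taken. */
static void trap_before_error(trap_mode mode)
{
    switch (mode) {
    case TRAP_SLEEP:
        /* Park for roughly a year; attach with "gdb -p <pid>", then "bt". */
        fprintf(stderr, "pid %ld parked for debugger\n", (long) getpid());
        sleep(365u * 24u * 60u * 60u);
        break;
    case TRAP_ABORT:
        /* SIGABRT writes a core file if the core-size limit allows it. */
        abort();
    case TRAP_NONE:
        break;
    }
}

/* Allocation wrapper: trap first, then report, on an invalid size. */
static void *checked_alloc(size_t size, trap_mode mode)
{
    if (!alloc_size_is_valid(size)) {
        trap_before_error(mode);
        fprintf(stderr, "invalid memory alloc request size %zu\n", size);
        return NULL;
    }
    return malloc(size);
}
```

With `TRAP_SLEEP`, the process prints its PID and waits, so you can run `gdb -p <pid>` and then `bt`. With `TRAP_ABORT`, make sure the core-size limit (for example `ulimit -c unlimited`) is raised before starting the server, or no core file will be written.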

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
