Thread: BUG #6200: standby bad memory allocations on SELECT

BUG #6200: standby bad memory allocations on SELECT

From
"Daniel Farina"
Date:
The following bug has been logged online:

Bug reference:      6200
Logged by:          Daniel Farina
Email address:      daniel@heroku.com
PostgreSQL version: 9.0.4
Operating system:   Ubuntu 10.04
Description:        standby bad memory allocations on SELECT
Details:

A huge thanks to Conrad Irwin of Rapportive for furnishing virtually all the
details of this bug report.

The following error occurs up to a couple of times a day on a busy
PostgreSQL database:

 ERROR: invalid memory alloc request size 18446744073709551613

The occurrence rate is somewhere in the one per tens-of-millions of
queries.

The number is always the same (2**64 - 3), and there's no obvious
pattern in the distribution of errors (they don't even appear to be
correlated with system load). The error has not been recorded on the
primary database, even though the same workload is submitted.

These errors do not reproduce, seeming to evaporate almost immediately on
the standby, so durable/long lasting index corruption is not likely.  This
problem has persisted among multiple generations of hot standbys on
different hardware and sourced from different base backups.

At least once, a hot standby was promoted to a primary and the errors seem
to discontinue, but then reappear on a newly-provisioned standby.

The VERSION() string is:
   PostgreSQL 9.0.4 on x86_64-pc-linux-gnu, compiled by GCC
       gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit

The problem is confined to particular access patterns and schema objects,
enumerated below:

The points2 table looks like:

                                      Table "public.points2"
      Column      |            Type             |                       Modifiers
------------------+-----------------------------+-------------------------------------------------------
 id               | integer                     | not null default nextval('points2_id_seq'::regclass)
 identifier       | text                        | not null
 scope_id         | integer                     | not null
 class_number     | smallint                    | not null
 authorization_id | integer                     | not null
 sum_json         | text                        | not null
 amended_at       | timestamp without time zone | not null
Indexes:
    "points2_pkey" PRIMARY KEY, btree (id)
    "points2_special" btree (identifier_hash(identifier), scope_id,
        class_number, authorization_id)

CREATE FUNCTION identifier_hash(text) RETURNS bigint IMMUTABLE
LANGUAGE SQL AS $$
 SELECT ('x' || md5($1))::bit(64)::bigint;
$$;

This has only been seen on queries of the form:

SELECT * FROM "points2" WHERE
    (identifier_hash(identifier) = identifier_hash('1104131405')
    AND identifier = '1104131405'
    AND scope_id = 0
    AND authorization_id = 0
    AND class_number = 25)

This table is also accessed, with similar frequency, by queries of the form:

SELECT points2.* FROM points2
   JOIN (VALUES (8585261297509044776, 0, 47,
        'ae9064e6f24127c6a1f483cd71e14e64'))
   AS query(hashed_identifier, scope_id, class_number, identifier)
   ON identifier_hash(points2.identifier) = query.hashed_identifier
   AND points2.scope_id = query.scope_id
   AND points2.class_number = query.class_number
   AND points2.identifier = query.identifier;

These queries do not trigger the problem.

Rows are always updated or inserted one at a time (using the "id" primary
key for updates), though we sometimes update multiple rows in a single
transaction. synchronous_commit is turned off for connections that touch
the points2 table on the primary.

Re: BUG #6200: standby bad memory allocations on SELECT

From
Tom Lane
Date:
"Daniel Farina" <daniel@heroku.com> writes:
> A huge thanks to Conrad Irwin of Rapportive for furnishing virtually all the
> details of this bug report.

This isn't really enough information to reproduce the problem ...

> The occurrence rate is somewhere in the one per tens-of-millions of
> queries.

... and that statement is going to discourage anyone from even trying,
since with such a low occurrence rate it's going to be impossible to be
sure whether the setup to reproduce the problem is correct.  So if you'd
like this to be fixed, you're either going to need to show us exactly
how to reproduce it, or investigate it yourself.

The way that I'd personally proceed to investigate it would probably be
to change the "invalid memory alloc request size" errors (in
src/backend/utils/mmgr/mcxt.c; there are about four occurrences) from
ERROR to PANIC so that they'll provoke a core dump, and then use gdb
to get a stack trace, which would provide at least a little more
information about what happened.  However, if you are only able to
reproduce it in a production server, you might not like that approach.
Perhaps you can set up an extra standby that's only there for testing,
so you don't mind if it crashes?
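
For illustration, the change would look roughly like this (a sketch only;
the exact surrounding code in mcxt.c may differ a bit between versions, and
the same edit applies to the other occurrences of that message):

    /* in MemoryContextAlloc() and friends, src/backend/utils/mmgr/mcxt.c */
    if (!AllocSizeIsValid(size))
        elog(PANIC,    /* was ERROR; PANIC aborts and produces a core dump */
             "invalid memory alloc request size %lu",
             (unsigned long) size);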

            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Heikki Linnakangas
Date:
On 09.09.2011 18:02, Tom Lane wrote:
> The way that I'd personally proceed to investigate it would probably be
> to change the "invalid memory alloc request size" size errors (in
> src/backend/utils/mmgr/mcxt.c; there are about four occurrences) from
> ERROR to PANIC so that they'll provoke a core dump, and then use gdb
> to get a stack trace, which would provide at least a little more
> information about what happened.  However, if you are only able to
> reproduce it in a production server, you might not like that approach.
> Perhaps you can set up an extra standby that's only there for testing,
> so you don't mind if it crashes?

If that's not possible or doesn't reproduce the issue, there are also
functions in glibc to produce a backtrace without aborting the program:
https://www.gnu.org/s/libc/manual/html_node/Backtraces.html.

I think you could also fork() + abort() to generate a core dump, not
just a backtrace.
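
A minimal standalone sketch of the glibc approach (the helper name here is
made up; something like it would be called from wherever the bad request
size is detected):

    #include <execinfo.h>
    #include <unistd.h>

    /* hypothetical helper: write the current call stack to stderr and keep going */
    static void
    log_backtrace(void)
    {
        void   *frames[64];
        int     nframes = backtrace(frames, 64);

        /* one symbolic line per frame; backtrace_symbols_fd() does not malloc */
        backtrace_symbols_fd(frames, nframes, STDERR_FILENO);
    }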

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: BUG #6200: standby bad memory allocations on SELECT

From
Simon Riggs
Date:
On Thu, Sep 8, 2011 at 11:33 PM, Daniel Farina <daniel@heroku.com> wrote:

>  ERROR: invalid memory alloc request size 18446744073709551613

> At least once, a hot standby was promoted to a primary and the errors seem
> to discontinue, but then reappear on a newly-provisioned standby.

So the query that fails uses a btree index on a hot standby. I don't
fully accept it as an HS bug, but let's assume that it is and analyse
what could cause it.

The MO is certain user queries, observed only in HS. So the failing
queries might or might not be related to the way we use indexes there.

There is a single and small difference between how a btree index
operates in HS and "normal" operation, which relates to whether we
kill tuples in the index. That's simple code and there are no obvious
bugs there, nor anything that specifically allocates memory even. So
the only bug that springs to mind is something related to how we
navigate hot chains with/without killed tuples. I.e. the bug is not
actually HS related, but is only observed under conditions typical in
HS.

HS touches almost nothing else in user space, apart from snapshots. So
there could be a bug there also, maybe in CopySnapshot().

--
 Simon Riggs                 http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #6200: standby bad memory allocations on SELECT

From
Bridget Frey
Date:
Hello,
We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing
an issue that seems very similar to the one reported as bug 6200.  We see
approximately 2 dozen alloc errors per day across 3 slaves, and we are
getting one segfault approximately every 3 days.  We did not experience
this issue before our upgrade (we were on version 8.4, and used skytools
for replication).

We are attempting to get a core dump on segfault (our last attempt did not
work due to a config issue for the core dump).  We're also attempting to
repro the alloc errors on a test setup, but it seems like we may need quite
a bit of load to trigger the issue.  We're not certain that the alloc
issues and the segfaults are "the same issue" - but it seems that it may be
since the OP for bug 6200 sees the same behavior.  We have seen no issues
on the master, all alloc errors and segfaults have been on the slaves.

We've seen the alloc errors on a few different tables, but most frequently
on logins.  Rows are added to the logins table one-by-one, and updates
generally happen one row at a time.  The table is pretty basic, it looks
like this...

CREATE TABLE logins
(
  login_id bigserial NOT NULL,
  <snip - a bunch of columns>
  CONSTRAINT logins_pkey PRIMARY KEY (login_id ),
  <snip - some other constraints...>
)
WITH (
  FILLFACTOR=80,
  OIDS=FALSE
);

The queries that trigger the alloc error on this table look like this (we
use hibernate hence the funny underscoring...)
select login0_.login_id as login1_468_0_, l...  from logins login0_ where
login0_.login_id=$1

The alloc error in the logs looks like this:
-01-12_080925.log:2012-01-12 17:33:46 PST [16034]: [7-1] [24/25934] ERROR:
invalid memory alloc request size 18446744073709551613

The alloc error is nearly always for size 18446744073709551613 - though we
have seen one time where it was a different amount...

We have been in touch with the OP for bug 6200, who said he may have time
to help us out a bit on debugging this.  It seems like what is being
suggested is getting a build of postgres that will capture a stack trace
for each alloc issue and/or simply dump core when that happens.  As this is
a production system we would prefer the former.  As I mentioned above we're
also trying to get a core dump for the segfault.

We are treating this as extremely high priority as it is currently causing
2 dozen failures for users of our site per day, as well as a few min of
downtime for the segfault every 3 days.  I realize there may be little that
the postgres experts can do until we provide more information - but since
our use case is really not very complicated here (basic use of HS), and
another site is also experiencing it, I figured it would be worth posting
about what we're seeing.

Thanks,
-Bridget Frey
Redfin

Re: BUG #6200: standby bad memory allocations on SELECT

From
Robert Haas
Date:
On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey <bridget.frey@redfin.com> wrote:
> Hello,
> We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing an
> issue that seems very similar to the one reported as bug 6200.  We see
> approximately 2 dozen alloc errors per day across 3 slaves, and we are
> getting one segfault approximately every 3 days.  We did not experience this
> issue before our upgrade (we were on version 8.4, and used skytools for
> replication).
>
> We are attempting to get a core dump on segfault (our last attempt did not
> work due to a config issue for the core dump).  We're also attempting to
> repro the alloc errors on a test setup, but it seems like we may need quite
> a bit of load to trigger the issue.  We're not certain that the alloc issues
> and the segfaults are "the same issue" - but it seems that it may be since
> the OP for bug 6200 sees the same behavior.  We have seen no issues on the
> master, all alloc errors and segfaults have been on the slaves.
>
> We've seen the alloc errors on a few different tables, but most frequently
> on logins.  Rows are added to the logins table one-by-one, and updates
> generally happen one row at a time.  The table is pretty basic, it looks
> like this...
>
> CREATE TABLE logins
> (
>   login_id bigserial NOT NULL,
>   <snip - a bunch of columns>
>   CONSTRAINT logins_pkey PRIMARY KEY (login_id ),
>   <snip - some other constraints...>
> )
> WITH (
>   FILLFACTOR=80,
>   OIDS=FALSE
> );
>
> The queries that trigger the alloc error on this table look like this (we
> use hibernate hence the funny underscoring...)
> select login0_.login_id as login1_468_0_, l...  from logins login0_ where
> login0_.login_id=$1
>
> The alloc error in the logs looks like this:
> -01-12_080925.log:2012-01-12 17:33:46 PST [16034]: [7-1] [24/25934] ERROR:
> invalid memory alloc request size 18446744073709551613
>
> The alloc error is nearly always for size 18446744073709551613 - though we
> have seen one time where it was a different amount...

Hmm, that number in hex works out to 0xfffffffffffffffd, which makes
it sound an awful lot like the system (for some unknown reason)
attempted to allocate -3 bytes of memory.  I've seen something like
this once before on a customer system running a modified version of
PostgreSQL.  In that case, the problem turned out to be page
corruption.  Circumstances didn't permit determination of the root
cause of the page corruption, however, nor was I able to figure out
exactly how the corruption I saw resulted in an allocation request
like this.  It would be nice to figure out where in the code this is
happening and put in a higher-level guard so that we get a better
error message.
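
(A quick standalone check of that arithmetic, for anyone following along --
this is not PostgreSQL code, and it assumes a 64-bit size_t:)

    #include <stdio.h>

    int
    main(void)
    {
        size_t  request = (size_t) -3;  /* signed -3 reinterpreted as an unsigned size */

        printf("%zu\n", request);       /* prints 18446744073709551613 */
        printf("%#zx\n", request);      /* prints 0xfffffffffffffffd */
        return 0;
    }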

You might want to compile a modified PostgreSQL executable that puts an
extremely long sleep (like a year) just before this error is reported.
 Then, when the system hangs at that point, you can attach a debugger
and pull a stack backtrace.  Or you could insert an abort() at that
point in the code and get a backtrace from the core dump.
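
Roughly, the sleep variant would look like this (a sketch only, at the same
spot in mcxt.c mentioned upthread; mcxt.c would also need <unistd.h> for
sleep(), and the duration just needs to be long enough to attach gdb):

    if (!AllocSizeIsValid(size))
    {
        /* hang here so a debugger can be attached to the stuck backend */
        sleep(365 * 24 * 3600);     /* roughly one year, in seconds */
        elog(ERROR, "invalid memory alloc request size %lu",
             (unsigned long) size);
    }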

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6200: standby bad memory allocations on SELECT

From
Robert Haas
Date:
On Fri, Jan 27, 2012 at 1:31 PM, Bridget Frey <bridget.frey@redfin.com> wrote:
> Thanks for the info - that's very helpful.  We had also noted that the alloc
> seems to be -3 bytes.  We have run pg_check and it found no instances of
> corruption. We've also replayed queries that have failed, and have never
> been able to get the same query to fail twice.  In the case you
> investigated, was there permanent page corruption - e.g. you could run the
> same query over and over and get the same result?

Yes.  I observed that the infomask bits on several tuples had somehow
been overwritten by nonsense.  I am not sure whether there were other
kinds of corruption as well - I suspect probably so - but that's the
only one I saw with my own eyes, courtesy of pg_filedump.

> It really does seem like this is an issue either in Hot Standby or very
> closely related to that feature, where there is temporary corruption of a
> btree index that then disappears.  Our master is not experiencing any malloc
> issues, while the 3 slaves get about a dozen per day, despite similar
> workloads.  We haven't had a slave segfault since we set it up to produce a
> core dump, but we're expecting to have that within the next few days
> (assuming we'll continue to get a segfault every 3-4 days).  We're also
> planning to set up one slave that will panic when it gets a malloc issue, as
> you (and other posters on 6400) had suggested.
>
> Thanks again for the help, and we'll keep you posted as we learn more...

The case I investigated involved corruption on the master, and I think
it predated Hot Standby.  However, the symptom is generic enough that
it seems quite possible that there's more than one way for it to
happen.  :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6200: standby bad memory allocations on SELECT

From
Bridget Frey
Date:
Thanks for the info - that's very helpful.  We had also noted that the
alloc seems to be -3 bytes.  We have run pg_check and it found no instances
of corruption. We've also replayed queries that have failed, and have never
been able to get the same query to fail twice.  In the case you
investigated, was there permanent page corruption - e.g. you could run the
same query over and over and get the same result?

It really does seem like this is an issue either in Hot Standby or very
closely related to that feature, where there is temporary corruption of a
btree index that then disappears.  Our master is not experiencing any
malloc issues, while the 3 slaves get about a dozen per day, despite
similar workloads.  We haven't had a slave segfault since we set it up to
produce a core dump, but we're expecting to have that within the next few
days (assuming we'll continue to get a segfault every 3-4 days).  We're
also planning to set up one slave that will panic when it gets a malloc
issue, as you (and other posters on 6400) had suggested.

Thanks again for the help, and we'll keep you posted as we learn more...
-B

On Fri, Jan 27, 2012 at 6:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey <bridget.frey@redfin.com>
> wrote:
> > Hello,
> > We upgraded to postgres 9.1.2 two weeks ago, and we are also
> experiencing an
> > issue that seems very similar to the one reported as bug 6200.  We see
> > approximately 2 dozen alloc errors per day across 3 slaves, and we are
> > getting one segfault approximately every 3 days.  We did not experience
> this
> > issue before our upgrade (we were on version 8.4, and used skytools for
> > replication).
> >
> > We are attempting to get a core dump on segfault (our last attempt did
> not
> > work due to a config issue for the core dump).  We're also attempting to
> > repro the alloc errors on a test setup, but it seems like we may need
> quite
> > a bit of load to trigger the issue.  We're not certain that the alloc
> issues
> > and the segfaults are "the same issue" - but it seems that it may be since
> > the OP for bug 6200 sees the same behavior.  We have seen no issues on
> the
> > master, all alloc errors and segfaults have been on the slaves.
> >
> > We've seen the alloc errors on a few different tables, but most
> frequently
> > on logins.  Rows are added to the logins table one-by-one, and updates
> > generally happen one row at a time.  The table is pretty basic, it looks
> > like this...
> >
> > CREATE TABLE logins
> > (
> >   login_id bigserial NOT NULL,
> >   <snip - a bunch of columns>
> >   CONSTRAINT logins_pkey PRIMARY KEY (login_id ),
> >   <snip - some other constraints...>
> > )
> > WITH (
> >   FILLFACTOR=80,
> >   OIDS=FALSE
> > );
> >
> > The queries that trigger the alloc error on this table look like this (we
> > use hibernate hence the funny underscoring...)
> > select login0_.login_id as login1_468_0_, l...  from logins login0_ where
> > login0_.login_id=$1
> >
> > The alloc error in the logs looks like this:
> > -01-12_080925.log:2012-01-12 17:33:46 PST [16034]: [7-1] [24/25934]
> ERROR:
> > invalid memory alloc request size 18446744073709551613
> >
> > The alloc error is nearly always for size 18446744073709551613 - though
> we
> > have seen one time where it was a different amount...
>
> Hmm, that number in hex works out to 0xfffffffffffffffd, which makes
> it sound an awful lot like the system (for some unknown reason)
> attempted to allocate -3 bytes of memory.  I've seen something like
> this once before on a customer system running a modified version of
> PostgreSQL.  In that case, the problem turned out to be page
> corruption.  Circumstances didn't permit determination of the root
> cause of the page corruption, however, nor was I able to figure out
> exactly how the corruption I saw resulted in an allocation request
> like this.  It would be nice to figure out where in the code this is
> happening and put in a higher-level guard so that we get a better
> error message.
>
> You might want to compile a modified PostgreSQL executable that puts an
> extremely long sleep (like a year) just before this error is reported.
>  Then, when the system hangs at that point, you can attach a debugger
> and pull a stack backtrace.  Or you could insert an abort() at that
> point in the code and get a backtrace from the core dump.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--
Bridget Frey  Director, Data & Analytics Engineering | Redfin

bridget.frey@redfin.com | tel: 206.576.5894

Re: BUG #6200: standby bad memory allocations on SELECT

From
Michael Brauwerman
Date:
I work with Bridget at Redfin.

We have a core dump from a once-in-5-days (multi-million queries) hot
standby segfault in pg 9.1.2. (It might or might not be the same root issue
as the "alloc" errors. If I should file a new bug report, let me know.)

The postgres executable that crashed did not have debugging symbols
installed, and we were unable to debug (gdb) the core file using a debug
build of postgres. (Symbols didn't match.) Running gdb against a non-debug
postgres executable gave us this stack trace:


[root@query-7 core]# gdb -q -c  /postgres/core/query-9.core.19678
/usr/pgsql-9.1/bin/postgres-non-debug
Reading symbols from /usr/pgsql-9.1/bin/postgres-non-debug...(no debugging
symbols found)...done.

warning: core file may not match specified executable file.
[New Thread 19678]

warning: no loadable sections found in added symbol-file system-supplied
DSO at 0x7fffdcd58000
Core was generated by `postgres: datamover stingray_prod 10.11.0.134(54140)
SELEC'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000045694c in nocachegetattr ()



(gdb) bt
#0  0x000000000045694c in nocachegetattr ()
#1  0x00000000006f93c9 in ?? ()
#2  0x00000000006fa231 in tuplesort_puttupleslot ()
#3  0x0000000000573ad1 in ExecSort ()
#4  0x000000000055cdda in ExecProcNode ()
#5  0x000000000055bcd1 in standard_ExecutorRun ()
#6  0x0000000000623594 in ?? ()
#7  0x0000000000624ae0 in PortalRun ()
#8  0x00000000006220f2 in PostgresMain ()
#9  0x00000000005e6ba4 in ?? ()
#10 0x00000000005e791c in PostmasterMain ()
#11 0x000000000058b9ae in main ()



We have the (5GB) core file, and are happy to do any more forensics anyone
can advise.

Please instruct.

I hope this helps point to a root cause and resolution....

Thank you,

Mike Brauwerman
Data Team, Redfin

On Fri, Jan 27, 2012 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Jan 27, 2012 at 1:31 PM, Bridget Frey <bridget.frey@redfin.com>
> wrote:
> > Thanks for the info - that's very helpful.  We had also noted that the
> alloc
> > seems to be -3 bytes.  We have run pg_check and it found no instances of
> > corruption. We've also replayed queries that have failed, and have never
> > been able to get the same query to fail twice.  In the case you
> > investigated, was there permanent page corruption - e.g. you could run
> the
> > same query over and over and get the same result?
>
> Yes.  I observed that the infomask bits on several tuples had somehow
> been overwritten by nonsense.  I am not sure whether there were other
> kinds of corruption as well - I suspect probably so - but that's the
> only one I saw with my own eyes, courtesy of pg_filedump.
>
> > It really does seem like this is an issue either in Hot Standby or very
> > closely related to that feature, where there is temporary corruption of a
> > btree index that then disappears.  Our master is not experiencing any
> malloc
> > issues, while the 3 slaves get about a dozen per day, despite similar
> > workloads.  We haven't had a slave segfault since we set it up to
> produce a
> > core dump, but we're expecting to have that within the next few days
> > (assuming we'll continue to get a segfault every 3-4 days).  We're also
> > planning to set up one slave that will panic when it gets a malloc
> issue, as
> > you (and other posters on 6400) had suggested.
> >
> > Thanks again for the help, and we'll keep you posted as we learn more...
>
> The case I investigated involved corruption on the master, and I think
> it predated Hot Standby.  However, the symptom is generic enough that
> it seems quite possible that there's more than one way for it to
> happen.  :-(
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--
Mike Brauwerman
Data Team, Redfin

Re: BUG #6200: standby bad memory allocations on SELECT

From
Peter Geoghegan
Date:
On 28 January 2012 21:34, Michael Brauwerman
<michael.brauwerman@redfin.com> wrote:
> We have the (5GB) core file, and are happy to do any more forensics anyone
> can advise.

Ideally, you'd be able to install debug information packages, which
should give a more detailed and useful stack trace, as described here:

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Re: BUG #6200: standby bad memory allocations on SELECT

From
Michael Brauwerman
Date:
We did try that with a postgres 9.1.2 compiled from source with debug
flags, but we got a 0x10 bad address error in gdb. (Obviously we did it
wrong somehow.)

We will keep trying to get a good set of symbols set up.
On Jan 28, 2012 2:34 PM, "Peter Geoghegan" <peter@2ndquadrant.com> wrote:

> On 28 January 2012 21:34, Michael Brauwerman
> <michael.brauwerman@redfin.com> wrote:
> > We have the (5GB) core file, and are happy to do any more forensics
> anyone
> > can advise.
>
> Ideally, you'd be able to install debug information packages, which
> should give a more detailed and useful stack trace, as described here:
>
>
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
> --
> Peter Geoghegan       http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training and Services
>

Re: BUG #6200: standby bad memory allocations on SELECT

From
Robert Haas
Date:
On Sat, Jan 28, 2012 at 8:45 PM, Michael Brauwerman
<michael.brauwerman@redfin.com> wrote:
> We did try that with a postgres 9.1.2, compiled from source with debug
> flags, but we got 0x10 bad address in gdb. (Obviously we did it wrong
> somehow)
>
> We will keep trying to get a good set of symbols set up.

Hmm.  Your backtrace is, on the one hand, unsurprising - because I
don't have complete confidence that nocachegetattr won't index off
into oblivion given corrupt data on disk - and on the other hand,
unconvincing, mostly because of this:

warning: core file may not match specified executable file.

If you're doing a custom build, it should be enough to use
--enable-debug and then gdb the core file with the path to the
executable you built.  If you're using Red Hat, you can just do
debuginfo-install package-name, where package-name is probably
something like "postgresql-server".  But either way, it's critical
that the executable you use to pull the backtrace be exactly the same
one that was running when the system went belly-up, and it's hard to
be confident that's the case here.

All that having been said, it *is* possible, even if you do everything
right, to run across a situation where gdb can't extract a backtrace.
This most commonly happens at higher optimization levels.  You might
want to try building with -fno-omit-frame-pointer or even -O0 to minimize
the chances of the compiler doing something that's too clever for gdb.
 If you do manage to get it working correctly, the backtrace should
show not only ALL (rather than only some) functions in the call stack
but also the values of the arguments passed to each of those
functions, which, probably needless to say, would be awfully helpful
in figuring this one out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6200: standby bad memory allocations on SELECT

From
Bridget Frey
Date:
All right, so we were able to get a full bt of the alloc error on a test
system.  Also, since we have a lot of emails going around on this - I
wanted to make it clear that we're seeing *two* production errors, which
may or may not be related.  (The OP for bug #6200 also sees both issues.)
One is a segfault, our emails over the weekend were for that, and we don't
have a complete bt for that (we're going to install our own build of
postgres to get that.)

The second error is an invalid memory alloc error that we're getting ~2
dozen times per day in production.  The bt for this alloc error is below.

Please note also that this is from a version of postgres which we modified
to panic when it hits the invalid memory alloc line.  Unmodified postgres
won't actually segfault here; it just outputs an error and continues.  We
made it panic so we could catch a bt, and it looks like we did get one :)

Anyway, here goes...

#0  0x0000003a83e30265 in raise () from /lib64/libc.so.6
#1  0x0000003a83e31d10 in abort () from /lib64/libc.so.6
#2  0x00000000007cb84e in errfinish (dummy=0) at elog.c:523
#3  0x00000000007cd951 in elog_finish (elevel=22, fmt=0x95cdf0 "invalid
memory alloc request size %lu") at elog.c:1202
#4  0x00000000007f115c in MemoryContextAlloc (context=0x17b581d0,
size=18446744073709551613) at mcxt.c:516
#5  0x0000000000771a46 in text_to_cstring (t=0x17b23608) at varlena.c:139
#6  0x0000000000770747 in varcharout (fcinfo=0x7fffd44854e0) at
varchar.c:515
#7  0x00000000007d3ea8 in FunctionCall1Coll (flinfo=0x17b7b5c0,
collation=0, arg1=397555208) at fmgr.c:1293
#8  0x00000000007d5562 in OutputFunctionCall (flinfo=0x17b7b5c0,
val=397555208) at fmgr.c:1946
#9  0x000000000045a4a1 in printtup (slot=0x17d27b80, self=0x17a9cf00) at
printtup.c:338
#10 0x00000000005bab14 in ExecutePlan (estate=0x17d27a70,
planstate=0x17d27c10, operation=CMD_SELECT, sendTuples=1 '\001',
    numberTuples=0, direction=ForwardScanDirection, dest=0x17a9cf00) at
execMain.c:1464
#11 0x00000000005b9205 in standard_ExecutorRun (queryDesc=0x17b87918,
direction=ForwardScanDirection, count=0) at execMain.c:313
#12 0x00000000005b9100 in ExecutorRun (queryDesc=0x17b87918,
direction=ForwardScanDirection, count=0) at execMain.c:261
#13 0x00000000006d3b88 in PortalRunSelect (portal=0x17ae2c30, forward=1
'\001', count=0, dest=0x17a9cf00) at pquery.c:943
#14 0x00000000006d3849 in PortalRun (portal=0x17ae2c30,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x17a9cf00,
    altdest=0x17a9cf00, completionTag=0x7fffd4485cd0 "") at pquery.c:787
#15 0x00000000006cf62d in exec_execute_message (portal_name=0x17a9caf0 "",
max_rows=9223372036854775807) at postgres.c:1963
#16 0x00000000006d1e81 in PostgresMain (argc=2, argv=0x179bdaa8,
username=0x179bd8c0 "datamover") at postgres.c:3983
#17 0x0000000000686256 in BackendRun (port=0x179e2350) at postmaster.c:3601
#18 0x000000000068593f in BackendStartup (port=0x179e2350) at
postmaster.c:3286
#19 0x0000000000682af2 in ServerLoop () at postmaster.c:1455
#20 0x0000000000682298 in PostmasterMain (argc=3, argv=0x179bbc80) at
postmaster.c:1116
#21 0x00000000005fe007 in main (argc=3, argv=0x179bbc80) at main.c:199

Re: BUG #6200: standby bad memory allocations on SELECT

From
Tom Lane
Date:
Bridget Frey <bridget.frey@redfin.com> writes:
> The second error is an invalid memory alloc error that we're getting ~2
> dozen times per day in production.  The bt for this alloc error is below.

This trace is consistent with the idea that we're getting a corrupt
tuple out of a table, although it doesn't entirely preclude the
possibility that the corrupt value is manufactured inside the backend.
To get much further you're going to need to look at the specific query
being executed each time this happens, and see if you can detect any
pattern.  Now that you've got debug symbols straightened out, the
gdb command "p debug_query_string" should accomplish this.  (If that
does not produce anything that looks like one of your application's
SQL commands, we'll need to try harder to extract the info.)  You could
probably hack the elog(PANIC) to log that before dying, too, if you'd
rather not manually gdb each core dump.
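
A sketch of that hack (assuming the elog() in MemoryContextAlloc() has
already been changed to PANIC as discussed upthread; mcxt.c would also need
to include tcop/tcopprot.h, which declares debug_query_string):

    if (!AllocSizeIsValid(size))
    {
        /* log the active statement before the PANIC takes the backend down */
        if (debug_query_string != NULL)
            elog(LOG, "failing query was: %s", debug_query_string);
        elog(PANIC, "invalid memory alloc request size %lu",
             (unsigned long) size);
    }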

            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Bridget Frey
Date:
Hi Tom,
Thanks for the reply, we appreciate your time on this.  The alloc error
queries all seem to be selects from a btree primary index.   I gave an
example in my initial post from the logins table.  Usually for us it
is logins but sometimes we have seen it on a few other tables, and
it's always a btree primary key index, very simple type of select.
The queries have been showing up in the logs which is how we know, but
we could also confirm in the core dump.  If the problem is data
corruption, it is transient.  We replay the same queries and get no
errors.  We also have jobs that run that basically do the same series
of selects every day or hour etc. but it is totally random which ones
cause an error.  I.e., if it is corruption, it somehow magically fixes
itself.  Also we still have not seen any issues on the master, this
seems to be a problem only on hot standby slaves (we have three
slaves).  The OP, incidentally, reported the same thing - issue is
only on hot standby slaves, it is transient, and it happens on a
select from a btree primary index.

This also does not seem to be load related.  It often happens under
periods of light load for us.

Please let us know if you have any other thoughts on what we should look at...

Sent from my iPhone

On Jan 30, 2012, at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Bridget Frey <bridget.frey@redfin.com> writes:
>> The second error is an invalid memory alloc error that we're getting ~2
>> dozen times per day in production.  The bt for this alloc error is below.
>
> This trace is consistent with the idea that we're getting a corrupt
> tuple out of a table, although it doesn't entirely preclude the
> possibility that the corrupt value is manufactured inside the backend.
> To get much further you're going to need to look at the specific query
> being executed each time this happens, and see if you can detect any
> pattern.  Now that you've got debug symbols straightened out, the
> gdb command "p debug_query_string" should accomplish this.  (If that
> does not produce anything that looks like one of your application's
> SQL commands, we'll need to try harder to extract the info.)  You could
> probably hack the elog(PANIC) to log that before dying, too, if you'd
> rather not manually gdb each core dump.
>
>            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Tom Lane
Date:
Bridget Frey <bridget.frey@redfin.com> writes:
> Thanks for the reply, we appreciate your time on this.  The alloc error
> queries all seem to be selects from a btree primary index.   I gave an
> example in my initial post from the logins table.  Usually for us it
> is logins but sometimes we have seen it on a few other tables, and
> it's always a btree primary key index, very simple type of select.

Hm.  The stack trace is definitive that it's finding the bad data in a
tuple that it's trying to print to the client, not in an index.
That tuple might've been straight from disk, or it could have been
constructed inside the backend ... but if it's a simple SELECT FROM
single-table WHERE index-condition then the tuple should be raw data
found in a shared buffer.

> The queries have been showing up in the logs which is how we know, but
> we could also confirm in the core dump.  If the problem is data
> corruption, it is transient.  We replay the same queries and get no
> errors.

The idea that comes to mind is that somehow btree index updates are
reaching the standby in advance of the heap updates they refer to.
But how could that be?  And even more to the point, if we did follow
a bogus TID pointer from an index, how come it's failing there?  You'd
expect it to usually notice such a problem much earlier, while examining
the heap tuple header.  (Invalid xmin complaints are the typical symptom
from that, since the xmin is one of the first fields we look at that
can be sanity-checked to any extent.)

Still baffled here.

            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Tom Lane
Date:
I wrote:
> Hm.  The stack trace is definitive that it's finding the bad data in a
> tuple that it's trying to print to the client, not in an index.

BTW, after a bit more reflection it occurs to me that it's not so much
that the data is necessarily *bad*, as that it seemingly doesn't match
the tuple descriptor that the backend's trying to interpret it with.
(In particular, the reported symptom would be consistent with finding
a small integer constant at a place where the descriptor expects to find
a variable-length field.)  So that opens up a different line of thought
about how those could get out of sync on a standby.  Are you in the
habit of issuing ALTER TABLE commands to add/delete/change columns on
these tables?  In fact, is there any DDL whatsoever going on around the
time these failures happen?

            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Bridget Frey
Date:
We have no DDL whatsoever in the code.  We do update rows in the
logins table frequently, but we basically have a policy of only doing
DDL changes during scheduled upgrades when we bring the site down.  We
have been discussing this issue a lot and we really haven't come up
with anything that would be considered unusual here.  The tables
experiencing issues have maybe 1M to 200M rows, we do updates and
selects frequently, they have standard btree primary key indexes, and
the failing query always seems to be a select for a single row based
on a primary key lookup.  All of these code paths worked flawlessly
prior to our 9.1 upgrade (we had been using skytools replication).
And we see no problems on the master despite similar workloads there.
It is definitely puzzling, and we are not too sure what to look into
next...

Sent from my iPhone

On Jan 30, 2012, at 9:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wrote:
>> Hm.  The stack trace is definitive that it's finding the bad data in a
>> tuple that it's trying to print to the client, not in an index.
>
> BTW, after a bit more reflection it occurs to me that it's not so much
> that the data is necessarily *bad*, as that it seemingly doesn't match
> the tuple descriptor that the backend's trying to interpret it with.
> (In particular, the reported symptom would be consistent with finding
> a small integer constant at a place where the descriptor expects to find
> a variable-length field.)  So that opens up a different line of thought
> about how those could get out of sync on a standby.  Are you in the
> habit of issuing ALTER TABLE commands to add/delete/change columns on
> these tables?  In fact, is there any DDL whatsoever going on around the
> time these failures happen?
>
>            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Alvaro Herrera
Date:
Excerpts from Bridget Frey's message of Mon Jan 30 18:59:08 -0300 2012:

> Anyway, here goes...

Maybe a "bt full" could give more insight into what's going on ...

> #0  0x0000003a83e30265 in raise () from /lib64/libc.so.6
> #1  0x0000003a83e31d10 in abort () from /lib64/libc.so.6
> #2  0x00000000007cb84e in errfinish (dummy=0) at elog.c:523
> #3  0x00000000007cd951 in elog_finish (elevel=22, fmt=0x95cdf0 "invalid
> memory alloc request size %lu") at elog.c:1202
> #4  0x00000000007f115c in MemoryContextAlloc (context=0x17b581d0,
> size=18446744073709551613) at mcxt.c:516
> #5  0x0000000000771a46 in text_to_cstring (t=0x17b23608) at varlena.c:139
> #6  0x0000000000770747 in varcharout (fcinfo=0x7fffd44854e0) at
> varchar.c:515

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #6200: standby bad memory allocations on SELECT

From
Robert Haas
Date:
On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> >> Hm.  The stack trace is definitive that it's finding the bad data in a
>> tuple that it's trying to print to the client, not in an index.
>
> BTW, after a bit more reflection it occurs to me that it's not so much
> that the data is necessarily *bad*, as that it seemingly doesn't match
> the tuple descriptor that the backend's trying to interpret it with.

Hmm.  Could this be caused by the recovery process failing to obtain a
sufficiently strong lock on a buffer before replaying some WAL record?
 For example, getting only an exclusive content lock where a cleanup
lock is needed could presumably cause something like this to happen -
it would explain the transient nature of the errors as well as the
fact that they only seem to occur during Hot Standby operation.  On
the other hand, it's a little hard to believe we would have missed
something that obvious; there aren't that many things that need a
cleanup lock on a heap page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6200: standby bad memory allocations on SELECT

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, after a bit more reflection it occurs to me that it's not so much
>> that the data is necessarily *bad*, as that it seemingly doesn't match
>> the tuple descriptor that the backend's trying to interpret it with.

> Hmm.  Could this be caused by the recovery process failing to obtain a
> sufficiently strong lock on a buffer before replaying some WAL record?

Well, I was kinda speculating that inadequate locking could result in
use of a stale (or too-new?) tuple descriptor, and that would be as good
a candidate as any if the basic theory were right.  But Bridget says
they are not doing any DDL, so it's hard to see how there'd be any tuple
descriptor mismatch at all.  Still baffled ...

            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Bridget Frey
Date:
So here's a better stack trace for the segfault issue.  (Again, just to
summarize, since this is a long thread: we're seeing two issues.  1) Alloc
errors that do not crash the DB, although we modified postgres to panic
when this happens in our test environment, and posted a stack earlier.  2) A
postgres segfault that happens once every couple of days on our slaves.)
We're still not sure if these are the same issue or not.  This stack is not
perfect because it still has some things optimized out (this came from our
production database), but it's much more detailed than the last one we
posted for a segfault... hope this helps get closer to an answer on
this...  I'd also be interested in knowing if the postgres experts think
these two symptoms are likely related... or totally separate issues...

Thanks!
-B

#0  0x0000000000455dc1 in slot_deform_tuple (slot=0x53cfc20, natts=70) at
heaptuple.c:1090
1090     off = att_align_pointer(off, thisatt->attalign, -1,
(gdb) bt
#0  0x0000000000455dc1 in slot_deform_tuple (slot=0x53cfc20, natts=70) at
heaptuple.c:1090
#1  0x0000000000455fbd in slot_getallattrs (slot=0x53cfc20) at
heaptuple.c:1253
#2  0x0000000000458ac7 in printtup (slot=0x53cfc20, self=0x534f1e0) at
printtup.c:300
#3  0x000000000055bd69 in ExecutePlan (queryDesc=0x5515978,
direction=<value optimized out>, count=0)
    at execMain.c:1464
#4  standard_ExecutorRun (queryDesc=0x5515978, direction=<value optimized
out>, count=0) at execMain.c:313
#5  0x0000000000623594 in PortalRunSelect (portal=0x5394f10, forward=<value
optimized out>, count=0,
    dest=0x534f1e0) at pquery.c:943
#6  0x0000000000624ae0 in PortalRun (portal=0x5394f10,
count=9223372036854775807, isTopLevel=1 '\001',
    dest=0x534f1e0, altdest=0x534f1e0, completionTag=0x7fff014e0640 "") at
pquery.c:787
#7  0x00000000006220f2 in exec_execute_message (argc=<value optimized out>,
argv=<value optimized out>,
    username=<value optimized out>) at postgres.c:1963
#8  PostgresMain (argc=<value optimized out>, argv=<value optimized out>,
username=<value optimized out>)
    at postgres.c:3983
#9  0x00000000005e6ba4 in ServerLoop () at postmaster.c:3601
#10 0x00000000005e791c in PostmasterMain (argc=5, argv=0x524cab0) at
postmaster.c:1116
#11 0x000000000058b9ae in main (argc=5, argv=<value optimized out>) at
main.c:199

Re: BUG #6200: standby bad memory allocations on SELECT

From
Robert Haas
Date:
On Tue, Jan 31, 2012 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jan 31, 2012 at 12:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> BTW, after a bit more reflection it occurs to me that it's not so much
>>> that the data is necessarily *bad*, as that it seemingly doesn't match
>>> the tuple descriptor that the backend's trying to interpret it with.
>
>> Hmm.  Could this be caused by the recovery process failing to obtain a
>> sufficiently strong lock on a buffer before replaying some WAL record?
>
> Well, I was kinda speculating that inadequate locking could result in
> use of a stale (or too-new?) tuple descriptor, and that would be as good
> a candidate as any if the basic theory were right.  But Bridget says
> they are not doing any DDL, so it's hard to see how there'd be any tuple
> descriptor mismatch at all.  Still baffled ...

No, I wasn't thinking about a tuple descriptor mismatch.  I was
imagining that the page contents themselves might be in flux while
we're trying to read from it.  Off the top of my head I don't see how
that can happen, but it would be awfully interesting to be able to see
which WAL record last touched the relevant heap page, and how long
before the error that happened.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6200: standby bad memory allocations on SELECT

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> No, I wasn't thinking about a tuple descriptor mismatch.  I was
> imagining that the page contents themselves might be in flux while
> we're trying to read from it.

Oh, gotcha.  Yes, that's a horribly plausible idea.  All it'd take is
one WAL replay routine that hasn't been upgraded to acquire sufficient
buffer locks.  Pre-hot-standby, there was no reason for them to be
careful about locking.

On the other hand, if that were the cause, you'd expect the symptoms
to be a bit more variable...

            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Robert Haas
Date:
On Wed, Feb 1, 2012 at 11:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> No, I wasn't thinking about a tuple descriptor mismatch.  I was
>> imagining that the page contents themselves might be in flux while
>> we're trying to read from it.
>
> Oh, gotcha.  Yes, that's a horribly plausible idea.  All it'd take is
> one WAL replay routine that hasn't been upgraded to acquire sufficient
> buffer locks.  Pre-hot-standby, there was no reason for them to be
> careful about locking.
>
> On the other hand, if that were the cause, you'd expect the symptoms
> to be a bit more variable...

Well, OP has two: crash, and invalid memory allocation.  Both share
the common thread that they happen while trying to decode a tuple.

It would be nice to get a dump of what PostgreSQL thought the entire
block looked like at the time the crash happened.  That information is
presumably already in the core dump, but I'm not sure if there's a
nice way to extract it using gdb.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6200: standby bad memory allocations on SELECT

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
>>> No, I wasn't thinking about a tuple descriptor mismatch.  I was
>>> imagining that the page contents themselves might be in flux while
>>> we're trying to read from it.

> It would be nice to get a dump of what PostgreSQL thought the entire
> block looked like at the time the crash happened.  That information is
> presumably already in the core dump, but I'm not sure if there's a
> nice way to extract it using gdb.

It probably would be possible to get the page out of the dump, but
I'd be really surprised if that proved much.  By the time the
crash-dump-making code gets around to examining the shared memory, the
other process that's hypothetically changing the page will have done its
work and moved on.  A crash in process X doesn't freeze execution in
process Y, at least not in any Unixoid system I've ever heard of.

            regards, tom lane

Re: BUG #6200: standby bad memory allocations on SELECT

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of Wed Feb 01 18:06:27 -0300 2012:
> Robert Haas <robertmhaas@gmail.com> writes:
> >>> No, I wasn't thinking about a tuple descriptor mismatch. I was
> >>> imagining that the page contents themselves might be in flux while
> >>> we're trying to read from it.
>
> > It would be nice to get a dump of what PostgreSQL thought the entire
> > block looked like at the time the crash happened.  That information is
> > presumably already in the core dump, but I'm not sure if there's a
> > nice way to extract it using gdb.
>
> It probably would be possible to get the page out of the dump, but
> I'd be really surprised if that proved much.  By the time the
> crash-dump-making code gets around to examining the shared memory, the
> other process that's hypothetically changing the page will have done its
> work and moved on.  A crash in process X doesn't freeze execution in
> process Y, at least not in any Unixoid system I've ever heard of.

Maybe you can do something like send SIGSTOP to every other backend,
then attach to them and find which one was touching the same buffer,
then peek at what it was doing.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #6200: standby bad memory allocations on SELECT

From
Duncan Rance
Date:
I recently raised "BUG #6425: Bus error in slot_deform_tuple". During the
last reproduction of the problem I saw this:

Client 2 aborted in state 0: ERROR:  invalid memory alloc request size
18446744073709551613

So like Tom said, these two issues could well be related. I just wanted to
mention it here in this thread, FYI.
