Thread: 8.2.4 signal 11 with large transaction

8.2.4 signal 11 with large transaction

From
Bill Moran
Date:
The attached PHP script is a derived test case based on an actual problem
we've been seeing in our application.

The result of this script is a crashed (sig 11) backend on pg 8.2.4.

I've now reproduced this on two different systems, a large server with
1G of shared_buffers and many other performance tunings, and my
workstation, which has a pretty much default install of PG.  (The
problem also occurred on 8.2.0, which led to an upgrade but no
resolution)

On 8.1, the query fails but instead of the back end crashing, I get
a log message "[warning] postgres[49484]: [3-1] ERROR:  out of memory"

Oddly, the query succeeds if it's fed into psql.

I'm now full of mystery and wonder.  It would appear as if the
underlying problem has something to do with PHP, but why should this
cause a backend process to crash?

<?php

ini_set("memory_limit", "64M");

$table_name = "some_table1";
$create_table = "
CREATE TABLE $table_name (
  " . $table_name . "_id bigserial PRIMARY KEY,
  field1 varchar(50),
  field2 varchar(50),
  field3 varchar(50),
  field4 varchar(10),
  field5 date,
  field6 varchar(32),
  field7 varchar(100),
  field8 varchar(50),
  field9 date,
  field10 date,
  field11 varchar(50),
  field12 date,
  field13 varchar(100),
  field14 varchar(100),
  field15 varchar(100),
  field16 varchar(100),
  field17 varchar(100),
  field18 varchar(100),
  field19 varchar(50),
  field20 varchar(50),
  field21 varchar(12),
  field22 varchar(100),
  field23 varchar(50),
  field24 varchar(10),
  field25 varchar(50),
  field26 varchar(50),
  field27 varchar(50),
  field28 varchar(10),
  field29 varchar(200)
);
";

$data = array(
  'thomas',
  'norman',
  'anderson',
  '',
  '03/05/1965',
  '1234 road rd',
  '',
  'ST',
  '05/01/2004',
  '05/31/2007',
  '12345',
  '07/01/2009',
  'In 1972, a crack commando unit was sent to prison by a military court for a ',
  'crime they didnt commit. They promptly escaped from a maximum security ',
  'stockade to the Los Angeles underground. Today, still wanted by the ',
  'government, they survive as soldiers of fortune. If you have a problem, if ',
  'no-one else can help, and if you can find them, maybe you can hire the ',
  'A-Team.',
  'In 1972, a crack commando unit was sent',
  'crime they didnt commit. They promptly escaped',
  'laz0r',
  'stockade to the Los Angeles underground. Today',
  'government, they survive as soldiers of fortune',
  'Y',
  'no-one else can help, and if you can find them',
  'A-Team.',
  'maybe you can hire',
  'A-Team',
  'Today, still wanted by the
  government, they survive as soldiers of fortune. If you have a problem, if
  no-one else can help, and if you can find them, maybe you can hire the
  A-Team.'
);

$conn_string = "host=/tmp dbname=test user=pgsql";
$conn = pg_connect($conn_string);

pg_query($conn, "DROP TABLE $table_name");
pg_query($conn, $create_table);

$insert_prefix = "INSERT INTO $table_name ( ";
for($i=1; $i<30; $i++) {
  $insert_prefix .= "field" . $i;
  if ( $i < 29 ) {
    $insert_prefix .= ",";
  }
}
$insert_prefix .= " ) VALUES ( ";
$insert_suffix = " );\n";

$sql = "BEGIN WORK;\n";
for($i=0; $i<=30000; $i++) {
  if ( $i % 1000 == 0 ) {
    echo $i;
  }
  if ( $i % 200 == 0 ) {
    echo ".";
  }
  $sql .= $insert_prefix;
  for($j=0; $j<count($data); $j++) {
    if ( strlen($data[$j]) > 0 ) {
      $sql .= "'" . $data[$j] . "'";
    }
    else {
      $sql .= "NULL";
    }
    if ( $j < count($data) - 1 ) {
      $sql .= ",";
    }
  }
  $sql .= $insert_suffix;
}
$sql .= "COMMIT WORK;\n";

echo "\nQuerying..\n";

pg_query($sql);

echo "\n";

?>


--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: 8.2.4 signal 11 with large transaction

From
Tom Lane
Date:
Bill Moran <wmoran@collaborativefusion.com> writes:
> I'm now full of mystery and wonder.  It would appear as if the
> underlying problem has something to do with PHP, but why should this
> cause a backend process to crash?

I'd bet on PHP submitting the query via extended query protocol
(PQexecParams or equivalent) instead of plain ol PQexec which is what
psql uses.  I don't speak PHP or have it installed here, so this example
is hard for me to investigate.  Can someone make a reproducer that uses
PQexecParams?

            regards, tom lane

Re: 8.2.4 signal 11 with large transaction

From
Jan de Visser
Date:
On Friday 20 July 2007 12:51:47 Tom Lane wrote:
> Can someone make a reproducer that uses
> PQexecParams?

Does JDBC apply?

jan

--
--------------------------------------------------------------
Jan de Visser                     jdevisser@digitalfairway.com

                Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------

Re: 8.2.4 signal 11 with large transaction

From
Bill Moran
Date:
In response to Tom Lane <tgl@sss.pgh.pa.us>:

> Bill Moran <wmoran@collaborativefusion.com> writes:
> > I'm now full of mystery and wonder.  It would appear as if the
> > underlying problem has something to do with PHP, but why should this
> > cause a backend process to crash?
>
> I'd bet on PHP submitting the query via extended query protocol
> (PQexecParams or equivalent) instead of plain ol PQexec which is what
> psql uses.

Doesn't appear that way.  The PHP source is somewhat cryptic, but I
don't seem much ambiguity here:

pgsql_result = PQexec(pgsql, Z_STRVAL_PP(query));

There're no conditional blocks around that, so it's the only possible
choice when pg_query() gets called in a PHP script.  PHP exposes a
seperate pg_query_params() that wraps PQexecParams().

> I don't speak PHP or have it installed here, so this example
> is hard for me to investigate.  Can someone make a reproducer that uses
> PQexecParams?

Is there any way that this (or something similar) could still apply?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: 8.2.4 signal 11 with large transaction

From
Tom Lane
Date:
Bill Moran <wmoran@collaborativefusion.com> writes:
> In response to Tom Lane <tgl@sss.pgh.pa.us>:
>> I'd bet on PHP submitting the query via extended query protocol
>> (PQexecParams or equivalent) instead of plain ol PQexec which is what
>> psql uses.

> Doesn't appear that way.

OK, it seemed like a good first guess.  Can you get a stack trace from
the crashed backend?

            regards, tom lane

Re: 8.2.4 signal 11 with large transaction

From
"Scott Marlowe"
Date:
On 7/20/07, Bill Moran <wmoran@collaborativefusion.com> wrote:
> In response to Tom Lane <tgl@sss.pgh.pa.us>:
>
> > Bill Moran <wmoran@collaborativefusion.com> writes:
> > > I'm now full of mystery and wonder.  It would appear as if the
> > > underlying problem has something to do with PHP, but why should this
> > > cause a backend process to crash?
> >
> > I'd bet on PHP submitting the query via extended query protocol
> > (PQexecParams or equivalent) instead of plain ol PQexec which is what
> > psql uses.
>
> Doesn't appear that way.  The PHP source is somewhat cryptic, but I
> don't seem much ambiguity here:
>
> pgsql_result = PQexec(pgsql, Z_STRVAL_PP(query));
>
> There're no conditional blocks around that, so it's the only possible
> choice when pg_query() gets called in a PHP script.  PHP exposes a
> seperate pg_query_params() that wraps PQexecParams().
>
> > I don't speak PHP or have it installed here, so this example
> > is hard for me to investigate.  Can someone make a reproducer that uses
> > PQexecParams?
>
> Is there any way that this (or something similar) could still apply?

I just ran your script, and only changed the connect string to reflect
my environment.

It ran smoothly against my workstations 8.1.8 pgsql install and
against my reporting server's 8.2.4 installation, inserting 30001 rows
in each.

I'm not familiar with the host=/tmp bit in the connect string, is that
an explicit declaration of using unix local sockets and the directory
to find it?  Does it work if you go to tcp/ip sockets and use a
hostname etc... in your pg_connect?

Solved? Re: 8.2.4 signal 11 with large transaction

From
Bill Moran
Date:
In response to "Scott Marlowe" <scott.marlowe@gmail.com>:

> On 7/20/07, Bill Moran <wmoran@collaborativefusion.com> wrote:
> > In response to Tom Lane <tgl@sss.pgh.pa.us>:
> >
> > > Bill Moran <wmoran@collaborativefusion.com> writes:
> > > > I'm now full of mystery and wonder.  It would appear as if the
> > > > underlying problem has something to do with PHP, but why should this
> > > > cause a backend process to crash?
> > >
> > > I'd bet on PHP submitting the query via extended query protocol
> > > (PQexecParams or equivalent) instead of plain ol PQexec which is what
> > > psql uses.
> >
> > Doesn't appear that way.  The PHP source is somewhat cryptic, but I
> > don't seem much ambiguity here:
> >
> > pgsql_result = PQexec(pgsql, Z_STRVAL_PP(query));
> >
> > There're no conditional blocks around that, so it's the only possible
> > choice when pg_query() gets called in a PHP script.  PHP exposes a
> > seperate pg_query_params() that wraps PQexecParams().
> >
> > > I don't speak PHP or have it installed here, so this example
> > > is hard for me to investigate.  Can someone make a reproducer that uses
> > > PQexecParams?
> >
> > Is there any way that this (or something similar) could still apply?
>
> I just ran your script, and only changed the connect string to reflect
> my environment.
>
> It ran smoothly against my workstations 8.1.8 pgsql install and
> against my reporting server's 8.2.4 installation, inserting 30001 rows
> in each.

Just great.  I love these "hard to pin down" issues ...

> I'm not familiar with the host=/tmp bit in the connect string, is that
> an explicit declaration of using unix local sockets and the directory
> to find it?

Yes.

> Does it work if you go to tcp/ip sockets and use a
> hostname etc... in your pg_connect?

Behaviour is the same.  We first experienced the problem in a development
environment where the PHP server is separate from the PG server, thus
we're using TCP sockets for that connection.

I'm starting to wonder if the OS could be sending the sig 11?

... time warp ...

Yup, that was it.  The OS was limiting the amount of memory a single
process could use via kern.maxdsiz (FreeBSD).  This was evident with
ulimit -d.

So, the fact remains that PG 8.1 returns an out of memory error when
it hits this, and PG 8.2 coredumps.  Is the 8.2 behaviour expected/
planned?  The out of memory error I would expect, the coredump I
would not.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: Solved? Re: 8.2.4 signal 11 with large transaction

From
Bill Moran
Date:
In response to Bill Moran <wmoran@collaborativefusion.com>:

> In response to "Scott Marlowe" <scott.marlowe@gmail.com>:
[snip]
> I'm starting to wonder if the OS could be sending the sig 11?
>
> ... time warp ...
>
> Yup, that was it.  The OS was limiting the amount of memory a single
> process could use via kern.maxdsiz (FreeBSD).  This was evident with
> ulimit -d.
>
> So, the fact remains that PG 8.1 returns an out of memory error when
> it hits this, and PG 8.2 coredumps.  Is the 8.2 behaviour expected/
> planned?  The out of memory error I would expect, the coredump I
> would not.

It just occurred to me that there's another wildcard in this one.
The 8.1 system I tested was on FreeBSD 5.5, while both 8.2 systems
were running on FreeBSD 6.2.  I wonder if FreeBSD has changed
which signal gets sent on memory exhaustion?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: 8.2.4 signal 11 with large transaction

From
Tom Lane
Date:
Bill Moran <wmoran@collaborativefusion.com> writes:
> Oddly, the query succeeds if it's fed into psql.

> I'm now full of mystery and wonder.  It would appear as if the
> underlying problem has something to do with PHP, but why should this
> cause a backend process to crash?

Ah, I see it.  Your PHP script is sending all 30000 INSERT commands
to the backend *in a single PQexec*, ie, one 37MB command string.
psql won't do that, it splits the input at semicolons.

Unsurprisingly, this runs the backend out of memory.  (It's not the
command string that's the problem, so much as the 30000 parse and plan
trees...)

Unfortunately, in trying to prepare the error message, it tries to
attach the command text as the STATEMENT field of the log message.
All 37MB worth.  And of course *that* gets an out-of-memory error.
Presto, infinite recursion, broken only by stack overflow (= SIGSEGV).

It looks like 8.1 and older are also vulnerable to this, it's just that
they don't try to log error statement strings at the default logging
level, whereas 8.2 does.  If you cranked up log_min_error_statement
I think they'd fail too.

I guess what we need to do is hack the emergency-recovery path for
error-during-error-processing such that it will prevent trying to print
a very long debug_query_string.  Maybe we should just not try to print
the command at all in this case, or maybe there's some intermediate
possibility like only printing the first 1K or so.  Thoughts?

            regards, tom lane

Re: Solved? Re: 8.2.4 signal 11 with large transaction

From
Andrew - Supernews
Date:
On 2007-07-20, Bill Moran <wmoran@collaborativefusion.com> wrote:
> It just occurred to me that there's another wildcard in this one.
> The 8.1 system I tested was on FreeBSD 5.5, while both 8.2 systems
> were running on FreeBSD 6.2.  I wonder if FreeBSD has changed
> which signal gets sent on memory exhaustion?

Neither 5.x nor 6.x send any sort of signal simply because maxdsize was
reached; the only effect of reaching maxdsize is that further memory
allocations fail.

I'd guess that the most likely cause for a sig11 under those circumstances
is a stack overflow resulting from recursive errors...

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Re: 8.2.4 signal 11 with large transaction

From
"Chris Hoover"
Date:
On 7/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I guess what we need to do is hack the emergency-recovery path for
error-during-error-processing such that it will prevent trying to print
a very long debug_query_string.  Maybe we should just not try to print
the command at all in this case, or maybe there's some intermediate
possibility like only printing the first 1K or so.  Thoughts?

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


I know my 2 cents are not worth that much, but as a DBA, I would really like for you to print at least some of the string causing the abend.  This would greatly assist in the tracing of the offending query.

Chris


Re: 8.2.4 signal 11 with large transaction

From
"Sibte Abbas"
Date:
On 7/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I guess what we need to do is hack the emergency-recovery path for
> error-during-error-processing such that it will prevent trying to print
> a very long debug_query_string.  Maybe we should just not try to print
> the command at all in this case, or maybe there's some intermediate
> possibility like only printing the first 1K or so.  Thoughts?
>
>                         regards, tom lane
>

I think printing the first 1K would make more sense.

If I understand you correctly, the code path which you are referring
to is the send_message_to_server_log() function in elog.c?

thanks,
--
Sibte Abbas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: [HACKERS] 8.2.4 signal 11 with large transaction

From
Tom Lane
Date:
"Sibte Abbas" <sibtay@gmail.com> writes:
> I think printing the first 1K would make more sense.

> If I understand you correctly, the code path which you are referring
> to is the send_message_to_server_log() function in elog.c?

No, the place that has to change is where errstart() detects that we're
recursing.  We could possibly have it first try to make a shorter string
and only give up entirely if recursion happens again, but given that
this is such a corner case I don't think it's worth the complexity and
risk of further bugs.  I've made it just drop the statement at the same
time that it decides to give up on printing other context (which can
also be a source of out-of-memory problems btw).
http://archives.postgresql.org/pgsql-committers/2007-07/msg00215.php

            regards, tom lane

Re: [HACKERS] 8.2.4 signal 11 with large transaction

From
"Sibte Abbas"
Date:
On 7/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> No, the place that has to change is where errstart() detects that we're
> recursing.  We could possibly have it first try to make a shorter string
> and only give up entirely if recursion happens again, but given that
> this is such a corner case I don't think it's worth the complexity and
> risk of further bugs.  I've made it just drop the statement at the same
> time that it decides to give up on printing other context (which can
> also be a source of out-of-memory problems btw).
> http://archives.postgresql.org/pgsql-committers/2007-07/msg00215.php
>

Makes sense.

regards,
--
Sibte Abbas
EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] 8.2.4 signal 11 with large transaction

From
"Sibte Abbas"
Date:
On 7/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I guess what we need to do is hack the emergency-recovery path for
> error-during-error-processing such that it will prevent trying to print
> a very long debug_query_string.  Maybe we should just not try to print
> the command at all in this case, or maybe there's some intermediate
> possibility like only printing the first 1K or so.  Thoughts?
>
>                         regards, tom lane
>

I think printing the first 1K would make more sense.

If I understand you correctly, the code path which you are referring
to is the send_message_to_server_log() function in elog.c?

thanks,
--
Sibte Abbas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend