Thread: 8.2.4 signal 11 with large transaction
The attached PHP script is a derived test case based on an actual problem we've been seeing in our application. The result of this script is a crashed (sig 11) backend on pg 8.2.4. I've now reproduced this on two different systems, a large server with 1G of shared_buffers and many other performance tunings, and my workstation, which has a pretty much default install of PG. (The problem also occurred on 8.2.0, which led to an upgrade but no resolution) On 8.1, the query fails but instead of the back end crashing, I get a log message "[warning] postgres[49484]: [3-1] ERROR: out of memory" Oddly, the query succeeds if it's fed into psql. I'm now full of mystery and wonder. It would appear as if the underlying problem has something to do with PHP, but why should this cause a backend process to crash? <?php ini_set("memory_limit", "64M"); $table_name = "some_table1"; $create_table = " CREATE TABLE $table_name ( " . $table_name . "_id bigserial PRIMARY KEY, field1 varchar(50), field2 varchar(50), field3 varchar(50), field4 varchar(10), field5 date, field6 varchar(32), field7 varchar(100), field8 varchar(50), field9 date, field10 date, field11 varchar(50), field12 date, field13 varchar(100), field14 varchar(100), field15 varchar(100), field16 varchar(100), field17 varchar(100), field18 varchar(100), field19 varchar(50), field20 varchar(50), field21 varchar(12), field22 varchar(100), field23 varchar(50), field24 varchar(10), field25 varchar(50), field26 varchar(50), field27 varchar(50), field28 varchar(10), field29 varchar(200) ); "; $data = array( 'thomas', 'norman', 'anderson', '', '03/05/1965', '1234 road rd', '', 'ST', '05/01/2004', '05/31/2007', '12345', '07/01/2009', 'In 1972, a crack commando unit was sent to prison by a military court for a ', 'crime they didnt commit. They promptly escaped from a maximum security ', 'stockade to the Los Angeles underground. Today, still wanted by the ', 'government, they survive as soldiers of fortune. If you have a problem, if ', 'no-one else can help, and if you can find them, maybe you can hire the ', 'A-Team.', 'In 1972, a crack commando unit was sent', 'crime they didnt commit. They promptly escaped', 'laz0r', 'stockade to the Los Angeles underground. Today', 'government, they survive as soldiers of fortune', 'Y', 'no-one else can help, and if you can find them', 'A-Team.', 'maybe you can hire', 'A-Team', 'Today, still wanted by the government, they survive as soldiers of fortune. If you have a problem, if no-one else can help, and if you can find them, maybe you can hire the A-Team.' ); $conn_string = "host=/tmp dbname=test user=pgsql"; $conn = pg_connect($conn_string); pg_query($conn, "DROP TABLE $table_name"); pg_query($conn, $create_table); $insert_prefix = "INSERT INTO $table_name ( "; for($i=1; $i<30; $i++) { $insert_prefix .= "field" . $i; if ( $i < 29 ) { $insert_prefix .= ","; } } $insert_prefix .= " ) VALUES ( "; $insert_suffix = " );\n"; $sql = "BEGIN WORK;\n"; for($i=0; $i<=30000; $i++) { if ( $i % 1000 == 0 ) { echo $i; } if ( $i % 200 == 0 ) { echo "."; } $sql .= $insert_prefix; for($j=0; $j<count($data); $j++) { if ( strlen($data[$j]) > 0 ) { $sql .= "'" . $data[$j] . "'"; } else { $sql .= "NULL"; } if ( $j < count($data) - 1 ) { $sql .= ","; } } $sql .= $insert_suffix; } $sql .= "COMMIT WORK;\n"; echo "\nQuerying..\n"; pg_query($sql); echo "\n"; ?> -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Bill Moran <wmoran@collaborativefusion.com> writes: > I'm now full of mystery and wonder. It would appear as if the > underlying problem has something to do with PHP, but why should this > cause a backend process to crash? I'd bet on PHP submitting the query via extended query protocol (PQexecParams or equivalent) instead of plain ol PQexec which is what psql uses. I don't speak PHP or have it installed here, so this example is hard for me to investigate. Can someone make a reproducer that uses PQexecParams? regards, tom lane
On Friday 20 July 2007 12:51:47 Tom Lane wrote: > Can someone make a reproducer that uses > PQexecParams? Does JDBC apply? jan -- -------------------------------------------------------------- Jan de Visser jdevisser@digitalfairway.com Baruk Khazad! Khazad ai-menu! --------------------------------------------------------------
In response to Tom Lane <tgl@sss.pgh.pa.us>: > Bill Moran <wmoran@collaborativefusion.com> writes: > > I'm now full of mystery and wonder. It would appear as if the > > underlying problem has something to do with PHP, but why should this > > cause a backend process to crash? > > I'd bet on PHP submitting the query via extended query protocol > (PQexecParams or equivalent) instead of plain ol PQexec which is what > psql uses. Doesn't appear that way. The PHP source is somewhat cryptic, but I don't seem much ambiguity here: pgsql_result = PQexec(pgsql, Z_STRVAL_PP(query)); There're no conditional blocks around that, so it's the only possible choice when pg_query() gets called in a PHP script. PHP exposes a seperate pg_query_params() that wraps PQexecParams(). > I don't speak PHP or have it installed here, so this example > is hard for me to investigate. Can someone make a reproducer that uses > PQexecParams? Is there any way that this (or something similar) could still apply? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Bill Moran <wmoran@collaborativefusion.com> writes: > In response to Tom Lane <tgl@sss.pgh.pa.us>: >> I'd bet on PHP submitting the query via extended query protocol >> (PQexecParams or equivalent) instead of plain ol PQexec which is what >> psql uses. > Doesn't appear that way. OK, it seemed like a good first guess. Can you get a stack trace from the crashed backend? regards, tom lane
On 7/20/07, Bill Moran <wmoran@collaborativefusion.com> wrote: > In response to Tom Lane <tgl@sss.pgh.pa.us>: > > > Bill Moran <wmoran@collaborativefusion.com> writes: > > > I'm now full of mystery and wonder. It would appear as if the > > > underlying problem has something to do with PHP, but why should this > > > cause a backend process to crash? > > > > I'd bet on PHP submitting the query via extended query protocol > > (PQexecParams or equivalent) instead of plain ol PQexec which is what > > psql uses. > > Doesn't appear that way. The PHP source is somewhat cryptic, but I > don't seem much ambiguity here: > > pgsql_result = PQexec(pgsql, Z_STRVAL_PP(query)); > > There're no conditional blocks around that, so it's the only possible > choice when pg_query() gets called in a PHP script. PHP exposes a > seperate pg_query_params() that wraps PQexecParams(). > > > I don't speak PHP or have it installed here, so this example > > is hard for me to investigate. Can someone make a reproducer that uses > > PQexecParams? > > Is there any way that this (or something similar) could still apply? I just ran your script, and only changed the connect string to reflect my environment. It ran smoothly against my workstations 8.1.8 pgsql install and against my reporting server's 8.2.4 installation, inserting 30001 rows in each. I'm not familiar with the host=/tmp bit in the connect string, is that an explicit declaration of using unix local sockets and the directory to find it? Does it work if you go to tcp/ip sockets and use a hostname etc... in your pg_connect?
In response to "Scott Marlowe" <scott.marlowe@gmail.com>: > On 7/20/07, Bill Moran <wmoran@collaborativefusion.com> wrote: > > In response to Tom Lane <tgl@sss.pgh.pa.us>: > > > > > Bill Moran <wmoran@collaborativefusion.com> writes: > > > > I'm now full of mystery and wonder. It would appear as if the > > > > underlying problem has something to do with PHP, but why should this > > > > cause a backend process to crash? > > > > > > I'd bet on PHP submitting the query via extended query protocol > > > (PQexecParams or equivalent) instead of plain ol PQexec which is what > > > psql uses. > > > > Doesn't appear that way. The PHP source is somewhat cryptic, but I > > don't seem much ambiguity here: > > > > pgsql_result = PQexec(pgsql, Z_STRVAL_PP(query)); > > > > There're no conditional blocks around that, so it's the only possible > > choice when pg_query() gets called in a PHP script. PHP exposes a > > seperate pg_query_params() that wraps PQexecParams(). > > > > > I don't speak PHP or have it installed here, so this example > > > is hard for me to investigate. Can someone make a reproducer that uses > > > PQexecParams? > > > > Is there any way that this (or something similar) could still apply? > > I just ran your script, and only changed the connect string to reflect > my environment. > > It ran smoothly against my workstations 8.1.8 pgsql install and > against my reporting server's 8.2.4 installation, inserting 30001 rows > in each. Just great. I love these "hard to pin down" issues ... > I'm not familiar with the host=/tmp bit in the connect string, is that > an explicit declaration of using unix local sockets and the directory > to find it? Yes. > Does it work if you go to tcp/ip sockets and use a > hostname etc... in your pg_connect? Behaviour is the same. We first experienced the problem in a development environment where the PHP server is separate from the PG server, thus we're using TCP sockets for that connection. I'm starting to wonder if the OS could be sending the sig 11? ... time warp ... Yup, that was it. The OS was limiting the amount of memory a single process could use via kern.maxdsiz (FreeBSD). This was evident with ulimit -d. So, the fact remains that PG 8.1 returns an out of memory error when it hits this, and PG 8.2 coredumps. Is the 8.2 behaviour expected/ planned? The out of memory error I would expect, the coredump I would not. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
In response to Bill Moran <wmoran@collaborativefusion.com>: > In response to "Scott Marlowe" <scott.marlowe@gmail.com>: [snip] > I'm starting to wonder if the OS could be sending the sig 11? > > ... time warp ... > > Yup, that was it. The OS was limiting the amount of memory a single > process could use via kern.maxdsiz (FreeBSD). This was evident with > ulimit -d. > > So, the fact remains that PG 8.1 returns an out of memory error when > it hits this, and PG 8.2 coredumps. Is the 8.2 behaviour expected/ > planned? The out of memory error I would expect, the coredump I > would not. It just occurred to me that there's another wildcard in this one. The 8.1 system I tested was on FreeBSD 5.5, while both 8.2 systems were running on FreeBSD 6.2. I wonder if FreeBSD has changed which signal gets sent on memory exhaustion? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023
Bill Moran <wmoran@collaborativefusion.com> writes: > Oddly, the query succeeds if it's fed into psql. > I'm now full of mystery and wonder. It would appear as if the > underlying problem has something to do with PHP, but why should this > cause a backend process to crash? Ah, I see it. Your PHP script is sending all 30000 INSERT commands to the backend *in a single PQexec*, ie, one 37MB command string. psql won't do that, it splits the input at semicolons. Unsurprisingly, this runs the backend out of memory. (It's not the command string that's the problem, so much as the 30000 parse and plan trees...) Unfortunately, in trying to prepare the error message, it tries to attach the command text as the STATEMENT field of the log message. All 37MB worth. And of course *that* gets an out-of-memory error. Presto, infinite recursion, broken only by stack overflow (= SIGSEGV). It looks like 8.1 and older are also vulnerable to this, it's just that they don't try to log error statement strings at the default logging level, whereas 8.2 does. If you cranked up log_min_error_statement I think they'd fail too. I guess what we need to do is hack the emergency-recovery path for error-during-error-processing such that it will prevent trying to print a very long debug_query_string. Maybe we should just not try to print the command at all in this case, or maybe there's some intermediate possibility like only printing the first 1K or so. Thoughts? regards, tom lane
On 2007-07-20, Bill Moran <wmoran@collaborativefusion.com> wrote: > It just occurred to me that there's another wildcard in this one. > The 8.1 system I tested was on FreeBSD 5.5, while both 8.2 systems > were running on FreeBSD 6.2. I wonder if FreeBSD has changed > which signal gets sent on memory exhaustion? Neither 5.x nor 6.x send any sort of signal simply because maxdsize was reached; the only effect of reaching maxdsize is that further memory allocations fail. I'd guess that the most likely cause for a sig11 under those circumstances is a stack overflow resulting from recursive errors... -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
On 7/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I know my 2 cents are not worth that much, but as a DBA, I would really like for you to print at least some of the string causing the abend. This would greatly assist in the tracing of the offending query.
Chris
I guess what we need to do is hack the emergency-recovery path for
error-during-error-processing such that it will prevent trying to print
a very long debug_query_string. Maybe we should just not try to print
the command at all in this case, or maybe there's some intermediate
possibility like only printing the first 1K or so. Thoughts?
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
I know my 2 cents are not worth that much, but as a DBA, I would really like for you to print at least some of the string causing the abend. This would greatly assist in the tracing of the offending query.
Chris
On 7/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I guess what we need to do is hack the emergency-recovery path for > error-during-error-processing such that it will prevent trying to print > a very long debug_query_string. Maybe we should just not try to print > the command at all in this case, or maybe there's some intermediate > possibility like only printing the first 1K or so. Thoughts? > > regards, tom lane > I think printing the first 1K would make more sense. If I understand you correctly, the code path which you are referring to is the send_message_to_server_log() function in elog.c? thanks, -- Sibte Abbas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend
"Sibte Abbas" <sibtay@gmail.com> writes: > I think printing the first 1K would make more sense. > If I understand you correctly, the code path which you are referring > to is the send_message_to_server_log() function in elog.c? No, the place that has to change is where errstart() detects that we're recursing. We could possibly have it first try to make a shorter string and only give up entirely if recursion happens again, but given that this is such a corner case I don't think it's worth the complexity and risk of further bugs. I've made it just drop the statement at the same time that it decides to give up on printing other context (which can also be a source of out-of-memory problems btw). http://archives.postgresql.org/pgsql-committers/2007-07/msg00215.php regards, tom lane
On 7/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > No, the place that has to change is where errstart() detects that we're > recursing. We could possibly have it first try to make a shorter string > and only give up entirely if recursion happens again, but given that > this is such a corner case I don't think it's worth the complexity and > risk of further bugs. I've made it just drop the statement at the same > time that it decides to give up on printing other context (which can > also be a source of out-of-memory problems btw). > http://archives.postgresql.org/pgsql-committers/2007-07/msg00215.php > Makes sense. regards, -- Sibte Abbas EnterpriseDB http://www.enterprisedb.com
On 7/20/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I guess what we need to do is hack the emergency-recovery path for > error-during-error-processing such that it will prevent trying to print > a very long debug_query_string. Maybe we should just not try to print > the command at all in this case, or maybe there's some intermediate > possibility like only printing the first 1K or so. Thoughts? > > regards, tom lane > I think printing the first 1K would make more sense. If I understand you correctly, the code path which you are referring to is the send_message_to_server_log() function in elog.c? thanks, -- Sibte Abbas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend