Thread: BUG #5622: Query failed: server closed the connection unexpectedly

BUG #5622: Query failed: server closed the connection unexpectedly

From
"Thue Janus Kristensen"
Date:
The following bug has been logged online:

Bug reference:      5622
Logged by:          Thue Janus Kristensen
Email address:      thuejk@gmail.com
PostgreSQL version: 8.4.4
Operating system:   Ubuntu 10.04 LTS i386
Description:        Query failed: server closed the connection unexpectedly
Details:

I have a 100% reproducible server crash with postgresql. This crash does not
happen on my 8.3 installation.

When the crash occurs I get
        pg_query(): Query failed: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

However, I have been unable to simplify the error condition :(. I tried
dumping 8000 lines of sql, but when I run them separately, I don't get a
crash.

The last lines before the crash are ($db->query is a prepared query,
$db->simple_query is a non-prepared query):
<?php
    $db->query("SAVEPOINT lala");
    $res = questions::move_question($t->e, $g1q1_id, $g2_id); //some queries
here
    $db->query("ROLLBACK TO SAVEPOINT lala");
    $db->query("RELEASE SAVEPOINT lala");
    //some queries here
    $res = aas::insert_in_group($t->e, $aa_id, $g2_id);
    $res = questions::move_question($t->e, $g1q1_id, $g2_id);
    //crash when line below is run!
    $db->simple_query("SET CONSTRAINTS ALL IMMEDIATE");
?>

/var/log/syslog:
Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfault
at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000]

/var/log/postgresql/postgresql-8.4-main.log
2010-08-17 23:08:26 CEST LOG:  server process (PID 11461) was terminated by
signal 11: Segmentation fault
2010-08-17 23:08:26 CEST LOG:  terminating any other active server
processes
2010-08-17 23:08:26 CEST WARNING:  terminating connection because of crash
of another server process
2010-08-17 23:08:26 CEST DETAIL:  The postmaster has commanded this server
process to roll back the current transaction and exit,\
 because another server process exited abnormally and possibly corrupted
shared memory.
2010-08-17 23:08:26 CEST HINT:  In a moment you should be able to reconnect
to the database and repeat your command.
2010-08-17 23:08:26 CEST LOG:  all server processes terminated;
reinitializing
2010-08-17 23:08:26 CEST LOG:  database system was interrupted; last known
up at 2010-08-17 23:04:02 CEST
2010-08-17 23:08:26 CEST LOG:  database system was not properly shut down;
automatic recovery in progress
2010-08-17 23:08:26 CEST LOG:  redo starts at 0/37500054

I realize that this is probably not enough to understand the bug, but I will
be happy to try to debug it further, if I am given a pointer as to how.

Re: BUG #5622: Query failed: server closed the connection unexpectedly

From
Robert Haas
Date:
On Tue, Aug 17, 2010 at 5:26 PM, Thue Janus Kristensen <thuejk@gmail.com> w=
rote:
>
> The following bug has been logged online:
>
> Bug reference: =A0 =A0 =A05622
> Logged by: =A0 =A0 =A0 =A0 =A0Thue Janus Kristensen
> Email address: =A0 =A0 =A0thuejk@gmail.com
> PostgreSQL version: 8.4.4
> Operating system: =A0 Ubuntu 10.04 LTS i386
> Description: =A0 =A0 =A0 =A0Query failed: server closed the connection un=
expectedly
> Details:
>
> I have a 100% reproducible server crash with postgresql. This crash does =
not
> happen on my 8.3 installation.
>
> When the crash occurs I get
> =A0 =A0 =A0 =A0pg_query(): Query failed: server closed the connection une=
xpectedly
> =A0 =A0 =A0 =A0This probably means the server terminated abnormally
> =A0 =A0 =A0 =A0before or while processing the request.
>
> However, I have been unable to simplify the error condition :(. I tried
> dumping 8000 lines of sql, but when I run them separately, I don't get a
> crash.
>
> The last lines before the crash are ($db->query is a prepared query,
> $db->simple_query is a non-prepared query):
> <?php
> =A0 =A0$db->query("SAVEPOINT lala");
> =A0 =A0$res =3D questions::move_question($t->e, $g1q1_id, $g2_id); //some=
 queries
> here
> =A0 =A0$db->query("ROLLBACK TO SAVEPOINT lala");
> =A0 =A0$db->query("RELEASE SAVEPOINT lala");
> =A0 =A0//some queries here
> =A0 =A0$res =3D aas::insert_in_group($t->e, $aa_id, $g2_id);
> =A0 =A0$res =3D questions::move_question($t->e, $g1q1_id, $g2_id);
> =A0 =A0//crash when line below is run!
> =A0 =A0$db->simple_query("SET CONSTRAINTS ALL IMMEDIATE");
> ?>
>
> /var/log/syslog:
> Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfa=
ult
> at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000]
>
> /var/log/postgresql/postgresql-8.4-main.log
> 2010-08-17 23:08:26 CEST LOG: =A0server process (PID 11461) was terminate=
d by
> signal 11: Segmentation fault
> 2010-08-17 23:08:26 CEST LOG: =A0terminating any other active server
> processes
> 2010-08-17 23:08:26 CEST WARNING: =A0terminating connection because of cr=
ash
> of another server process
> 2010-08-17 23:08:26 CEST DETAIL: =A0The postmaster has commanded this ser=
ver
> process to roll back the current transaction and exit,\
> =A0because another server process exited abnormally and possibly corrupted
> shared memory.
> 2010-08-17 23:08:26 CEST HINT: =A0In a moment you should be able to recon=
nect
> to the database and repeat your command.
> 2010-08-17 23:08:26 CEST LOG: =A0all server processes terminated;
> reinitializing
> 2010-08-17 23:08:26 CEST LOG: =A0database system was interrupted; last kn=
own
> up at 2010-08-17 23:04:02 CEST
> 2010-08-17 23:08:26 CEST LOG: =A0database system was not properly shut do=
wn;
> automatic recovery in progress
> 2010-08-17 23:08:26 CEST LOG: =A0redo starts at 0/37500054
>
> I realize that this is probably not enough to understand the bug, but I w=
ill
> be happy to try to debug it further, if I am given a pointer as to how.

Well, obviously the best thing would be to isolate a reproducible test
case.  But maybe a good start would be to try to get a list of the
exact series of SQL statements that are being executed.  Perhaps you
could set log_min_duration_statement=3D0 and then find 'em in the logs.
If you do this with some trivial query in the places where you have
"some queries here", it doesn't crash.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5622: Query failed: server closed the connection unexpectedly

From
Craig Ringer
Date:
On 18/08/10 05:26, Thue Janus Kristensen wrote:
> /var/log/syslog:
> Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfault
> at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000]

If you're game, it might also be useful to see what the backend was up
to when it crashed. There are some instructions on how to get a stack
trace on the wiki:

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: BUG #5622: Query failed: server closed the connection unexpectedly

From
Robert Haas
Date:
On Tue, Aug 17, 2010 at 10:55 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 18/08/10 05:26, Thue Janus Kristensen wrote:
>> /var/log/syslog:
>> Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfault
>> at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000]
>
> If you're game, it might also be useful to see what the backend was up
> to when it crashed. There are some instructions on how to get a stack
> trace on the wiki:
>
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

Yeah, that would be very helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5622: Query failed: server closed the connection unexpectedly

From
Thue Janus Kristensen
Date:
I finally succeeded in creating a test case, after much experimentation. Running the attached sql crashes my postgresql server 100% of the time.

Regards, Thue

On Tue, Aug 17, 2010 at 11:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 17, 2010 at 5:26 PM, Thue Janus Kristensen <thuejk@gmail.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5622
> Logged by:          Thue Janus Kristensen
> Email address:      thuejk@gmail.com
> PostgreSQL version: 8.4.4
> Operating system:   Ubuntu 10.04 LTS i386
> Description:        Query failed: server closed the connection unexpectedly
> Details:
>
> I have a 100% reproducible server crash with postgresql. This crash does not
> happen on my 8.3 installation.
>
> When the crash occurs I get
>        pg_query(): Query failed: server closed the connection unexpectedly
>        This probably means the server terminated abnormally
>        before or while processing the request.
>
> However, I have been unable to simplify the error condition :(. I tried
> dumping 8000 lines of sql, but when I run them separately, I don't get a
> crash.
>
> The last lines before the crash are ($db->query is a prepared query,
> $db->simple_query is a non-prepared query):
> <?php
>    $db->query("SAVEPOINT lala");
>    $res = questions::move_question($t->e, $g1q1_id, $g2_id); //some queries
> here
>    $db->query("ROLLBACK TO SAVEPOINT lala");
>    $db->query("RELEASE SAVEPOINT lala");
>    //some queries here
>    $res = aas::insert_in_group($t->e, $aa_id, $g2_id);
>    $res = questions::move_question($t->e, $g1q1_id, $g2_id);
>    //crash when line below is run!
>    $db->simple_query("SET CONSTRAINTS ALL IMMEDIATE");
> ?>
>
> /var/log/syslog:
> Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfault
> at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000]
>
> /var/log/postgresql/postgresql-8.4-main.log
> 2010-08-17 23:08:26 CEST LOG:  server process (PID 11461) was terminated by
> signal 11: Segmentation fault
> 2010-08-17 23:08:26 CEST LOG:  terminating any other active server
> processes
> 2010-08-17 23:08:26 CEST WARNING:  terminating connection because of crash
> of another server process
> 2010-08-17 23:08:26 CEST DETAIL:  The postmaster has commanded this server
> process to roll back the current transaction and exit,\
>  because another server process exited abnormally and possibly corrupted
> shared memory.
> 2010-08-17 23:08:26 CEST HINT:  In a moment you should be able to reconnect
> to the database and repeat your command.
> 2010-08-17 23:08:26 CEST LOG:  all server processes terminated;
> reinitializing
> 2010-08-17 23:08:26 CEST LOG:  database system was interrupted; last known
> up at 2010-08-17 23:04:02 CEST
> 2010-08-17 23:08:26 CEST LOG:  database system was not properly shut down;
> automatic recovery in progress
> 2010-08-17 23:08:26 CEST LOG:  redo starts at 0/37500054
>
> I realize that this is probably not enough to understand the bug, but I will
> be happy to try to debug it further, if I am given a pointer as to how.

Well, obviously the best thing would be to isolate a reproducible test
case.  But maybe a good start would be to try to get a list of the
exact series of SQL statements that are being executed.  Perhaps you
could set log_min_duration_statement=0 and then find 'em in the logs.
If you do this with some trivial query in the places where you have
"some queries here", it doesn't crash.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment

Re: BUG #5622: Query failed: server closed the connection unexpectedly

From
Tom Lane
Date:
Thue Janus Kristensen <thuejk@gmail.com> writes:
> I finally succeeded in creating a test case, after much experimentation.
> Running the attached sql crashes my postgresql server 100% of the time.

Cool, replicated here on CVS HEAD:

Program terminated with signal 10, Bus error.
#0  afterTriggerCheckState (evtshared=0x40163089) at trigger.c:2850
2850            if ((evtshared->ats_event & AFTER_TRIGGER_DEFERRABLE) == 0)
(gdb) bt
#0  afterTriggerCheckState (evtshared=0x40163089) at trigger.c:2850
#1  0x2f61f4 in afterTriggerMarkEvents (events=0x40163089, move_list=0x0,
    immediate_only=1 '\001') at trigger.c:3219
#2  0x2f76dc in AfterTriggerSetState (stmt=0x40064af0) at trigger.c:4149
#3  0x418250 in standard_ProcessUtility (parsetree=0x40064af0,
    queryString=0x400641a8 "SET CONSTRAINTS ALL IMMEDIATE;", params=0x0,
    isTopLevel=1 '\001', dest=0x40064cd8, completionTag=0x7b03b308 "")
    at utility.c:1226
#4  0x416a70 in ProcessUtility (parsetree=0x40064af0,
    queryString=0x400641a8 "SET CONSTRAINTS ALL IMMEDIATE;", params=0x0,
    isTopLevel=1 '\001', dest=0x40064cd8, completionTag=0x7b03b308 "")
    at utility.c:323
#5  0x415740 in PortalRunUtility (portal=0x400e1fa0, utilityStmt=0x40064af0,
    isTopLevel=1 '\001', dest=0x40064cd8, completionTag=0x7b03b308 "")
    at pquery.c:1191
#6  0x4158f0 in PortalRunMulti (portal=0x400e1fa0, isTopLevel=1 '\001',
    dest=0x40064cd8, altdest=0x40064cd8, completionTag=0x7b03b308 "")
    at pquery.c:1298
#7  0x414f64 in PortalRun (portal=0x400e1fa0, count=2147483647,
    isTopLevel=1 '\001', dest=0x40064cd8, altdest=0x40064cd8,
    completionTag=0x7b03b308 "") at pquery.c:822
#8  0x40e7c0 in exec_simple_query (
    query_string=0x400641a8 "SET CONSTRAINTS ALL IMMEDIATE;")

Looking at it ...

            regards, tom lane

Re: BUG #5622: Query failed: server closed the connection unexpectedly

From
Tom Lane
Date:
Thue Janus Kristensen <thuejk@gmail.com> writes:
> I finally succeeded in creating a test case, after much experimentation.
> Running the attached sql crashes my postgresql server 100% of the time.

Cute problem.  The attached fix cures it for me; would you see if it
solves your original case too?

            regards, tom lane

Index: src/backend/commands/trigger.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.248.2.2
diff -c -r1.248.2.2 trigger.c
*** src/backend/commands/trigger.c    24 Jan 2010 21:49:31 -0000    1.248.2.2
--- src/backend/commands/trigger.c    19 Aug 2010 15:14:06 -0000
***************
*** 2534,2539 ****
--- 2534,2540 ----
          else
              events->tail->next = chunk;
          events->tail = chunk;
+         /* events->tailfree is now out of sync, but we'll fix it below */
      }

      /*
***************
*** 2935,2940 ****
--- 2936,2950 ----
          {
              chunk->freeptr = CHUNK_DATA_START(chunk);
              chunk->endfree = chunk->endptr;
+
+             /*
+              * If it's last chunk, must sync event list's tailfree too.  Note
+              * that delete_ok must NOT be passed as true if there could be
+              * stacked AfterTriggerEventList values pointing at this event
+              * list, since we'd fail to fix their copies of tailfree.
+              */
+             if (chunk == events->tail)
+                 events->tailfree = chunk->freeptr;
          }
      }


Re: BUG #5622: Query failed: server closed the connection unexpectedly

From
Thue Janus Kristensen
Date:
On Thu, Aug 19, 2010 at 5:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Thue Janus Kristensen <thuejk@gmail.com> writes:
> > I finally succeeded in creating a test case, after much experimentation.
> > Running the attached sql crashes my postgresql server 100% of the time.
>
> Cute problem.


Generating the test case was very "interesting" black box debugging. Now why
does it need at least 7 (and not 1-6) updates to that unrelated row in that
unrelated table to crash...? At least it was deterministic.


>  The attached fix cures it for me; would you see if it
> solves your original case too?
>

Yes, it fixes my original problem too, thanks :).

Regards, Thue