Thread: BUG #5622: Query failed: server closed the connection unexpectedly
BUG #5622: Query failed: server closed the connection unexpectedly
From
"Thue Janus Kristensen"
Date:
The following bug has been logged online: Bug reference: 5622 Logged by: Thue Janus Kristensen Email address: thuejk@gmail.com PostgreSQL version: 8.4.4 Operating system: Ubuntu 10.04 LTS i386 Description: Query failed: server closed the connection unexpectedly Details: I have a 100% reproducible server crash with postgresql. This crash does not happen on my 8.3 installation. When the crash occurs I get pg_query(): Query failed: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. However, I have been unable to simplify the error condition :(. I tried dumping 8000 lines of sql, but when I run them separately, I don't get a crash. The last lines before the crash are ($db->query is a prepared query, $db->simple_query is a non-prepared query): <?php $db->query("SAVEPOINT lala"); $res = questions::move_question($t->e, $g1q1_id, $g2_id); //some queries here $db->query("ROLLBACK TO SAVEPOINT lala"); $db->query("RELEASE SAVEPOINT lala"); //some queries here $res = aas::insert_in_group($t->e, $aa_id, $g2_id); $res = questions::move_question($t->e, $g1q1_id, $g2_id); //crash when line below is run! $db->simple_query("SET CONSTRAINTS ALL IMMEDIATE"); ?> /var/log/syslog: Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfault at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000] /var/log/postgresql/postgresql-8.4-main.log 2010-08-17 23:08:26 CEST LOG: server process (PID 11461) was terminated by signal 11: Segmentation fault 2010-08-17 23:08:26 CEST LOG: terminating any other active server processes 2010-08-17 23:08:26 CEST WARNING: terminating connection because of crash of another server process 2010-08-17 23:08:26 CEST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit,\ because another server process exited abnormally and possibly corrupted shared memory. 2010-08-17 23:08:26 CEST HINT: In a moment you should be able to reconnect to the database and repeat your command. 2010-08-17 23:08:26 CEST LOG: all server processes terminated; reinitializing 2010-08-17 23:08:26 CEST LOG: database system was interrupted; last known up at 2010-08-17 23:04:02 CEST 2010-08-17 23:08:26 CEST LOG: database system was not properly shut down; automatic recovery in progress 2010-08-17 23:08:26 CEST LOG: redo starts at 0/37500054 I realize that this is probably not enough to understand the bug, but I will be happy to try to debug it further, if I am given a pointer as to how.
On Tue, Aug 17, 2010 at 5:26 PM, Thue Janus Kristensen <thuejk@gmail.com> w= rote: > > The following bug has been logged online: > > Bug reference: =A0 =A0 =A05622 > Logged by: =A0 =A0 =A0 =A0 =A0Thue Janus Kristensen > Email address: =A0 =A0 =A0thuejk@gmail.com > PostgreSQL version: 8.4.4 > Operating system: =A0 Ubuntu 10.04 LTS i386 > Description: =A0 =A0 =A0 =A0Query failed: server closed the connection un= expectedly > Details: > > I have a 100% reproducible server crash with postgresql. This crash does = not > happen on my 8.3 installation. > > When the crash occurs I get > =A0 =A0 =A0 =A0pg_query(): Query failed: server closed the connection une= xpectedly > =A0 =A0 =A0 =A0This probably means the server terminated abnormally > =A0 =A0 =A0 =A0before or while processing the request. > > However, I have been unable to simplify the error condition :(. I tried > dumping 8000 lines of sql, but when I run them separately, I don't get a > crash. > > The last lines before the crash are ($db->query is a prepared query, > $db->simple_query is a non-prepared query): > <?php > =A0 =A0$db->query("SAVEPOINT lala"); > =A0 =A0$res =3D questions::move_question($t->e, $g1q1_id, $g2_id); //some= queries > here > =A0 =A0$db->query("ROLLBACK TO SAVEPOINT lala"); > =A0 =A0$db->query("RELEASE SAVEPOINT lala"); > =A0 =A0//some queries here > =A0 =A0$res =3D aas::insert_in_group($t->e, $aa_id, $g2_id); > =A0 =A0$res =3D questions::move_question($t->e, $g1q1_id, $g2_id); > =A0 =A0//crash when line below is run! > =A0 =A0$db->simple_query("SET CONSTRAINTS ALL IMMEDIATE"); > ?> > > /var/log/syslog: > Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfa= ult > at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000] > > /var/log/postgresql/postgresql-8.4-main.log > 2010-08-17 23:08:26 CEST LOG: =A0server process (PID 11461) was terminate= d by > signal 11: Segmentation fault > 2010-08-17 23:08:26 CEST LOG: =A0terminating any other active server > processes > 2010-08-17 23:08:26 CEST WARNING: =A0terminating connection because of cr= ash > of another server process > 2010-08-17 23:08:26 CEST DETAIL: =A0The postmaster has commanded this ser= ver > process to roll back the current transaction and exit,\ > =A0because another server process exited abnormally and possibly corrupted > shared memory. > 2010-08-17 23:08:26 CEST HINT: =A0In a moment you should be able to recon= nect > to the database and repeat your command. > 2010-08-17 23:08:26 CEST LOG: =A0all server processes terminated; > reinitializing > 2010-08-17 23:08:26 CEST LOG: =A0database system was interrupted; last kn= own > up at 2010-08-17 23:04:02 CEST > 2010-08-17 23:08:26 CEST LOG: =A0database system was not properly shut do= wn; > automatic recovery in progress > 2010-08-17 23:08:26 CEST LOG: =A0redo starts at 0/37500054 > > I realize that this is probably not enough to understand the bug, but I w= ill > be happy to try to debug it further, if I am given a pointer as to how. Well, obviously the best thing would be to isolate a reproducible test case. But maybe a good start would be to try to get a list of the exact series of SQL statements that are being executed. Perhaps you could set log_min_duration_statement=3D0 and then find 'em in the logs. If you do this with some trivial query in the places where you have "some queries here", it doesn't crash. --=20 Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 18/08/10 05:26, Thue Janus Kristensen wrote: > /var/log/syslog: > Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfault > at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000] If you're game, it might also be useful to see what the backend was up to when it crashed. There are some instructions on how to get a stack trace on the wiki: http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD -- Craig Ringer Tech-related writing: http://soapyfrogs.blogspot.com/
On Tue, Aug 17, 2010 at 10:55 PM, Craig Ringer <craig@postnewspapers.com.au> wrote: > On 18/08/10 05:26, Thue Janus Kristensen wrote: >> /var/log/syslog: >> Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfault >> at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000] > > If you're game, it might also be useful to see what the backend was up > to when it crashed. There are some instructions on how to get a stack > trace on the wiki: > > http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD Yeah, that would be very helpful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: BUG #5622: Query failed: server closed the connection unexpectedly
From
Thue Janus Kristensen
Date:
I finally succeeded in creating a test case, after much experimentation. Running the attached sql crashes my postgresql server 100% of the time.
Regards, Thue
Regards, Thue
On Tue, Aug 17, 2010 at 11:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 17, 2010 at 5:26 PM, Thue Janus Kristensen <thuejk@gmail.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference: 5622
> Logged by: Thue Janus Kristensen
> Email address: thuejk@gmail.com
> PostgreSQL version: 8.4.4
> Operating system: Ubuntu 10.04 LTS i386
> Description: Query failed: server closed the connection unexpectedly
> Details:
>
> I have a 100% reproducible server crash with postgresql. This crash does not
> happen on my 8.3 installation.
>
> When the crash occurs I get
> pg_query(): Query failed: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>
> However, I have been unable to simplify the error condition :(. I tried
> dumping 8000 lines of sql, but when I run them separately, I don't get a
> crash.
>
> The last lines before the crash are ($db->query is a prepared query,
> $db->simple_query is a non-prepared query):
> <?php
> $db->query("SAVEPOINT lala");
> $res = questions::move_question($t->e, $g1q1_id, $g2_id); //some queries
> here
> $db->query("ROLLBACK TO SAVEPOINT lala");
> $db->query("RELEASE SAVEPOINT lala");
> //some queries here
> $res = aas::insert_in_group($t->e, $aa_id, $g2_id);
> $res = questions::move_question($t->e, $g1q1_id, $g2_id);
> //crash when line below is run!
> $db->simple_query("SET CONSTRAINTS ALL IMMEDIATE");
> ?>
>
> /var/log/syslog:
> Aug 17 23:08:26 thue-laptop kernel: [46188.626357] postgres[11461]: segfault
> at 21723858 ip 00772286 sp bf8a4a80 error 4 in postgres[621000+446000]
>
> /var/log/postgresql/postgresql-8.4-main.log
> 2010-08-17 23:08:26 CEST LOG: server process (PID 11461) was terminated by
> signal 11: Segmentation fault
> 2010-08-17 23:08:26 CEST LOG: terminating any other active server
> processes
> 2010-08-17 23:08:26 CEST WARNING: terminating connection because of crash
> of another server process
> 2010-08-17 23:08:26 CEST DETAIL: The postmaster has commanded this server
> process to roll back the current transaction and exit,\
> because another server process exited abnormally and possibly corrupted
> shared memory.
> 2010-08-17 23:08:26 CEST HINT: In a moment you should be able to reconnect
> to the database and repeat your command.
> 2010-08-17 23:08:26 CEST LOG: all server processes terminated;
> reinitializing
> 2010-08-17 23:08:26 CEST LOG: database system was interrupted; last known
> up at 2010-08-17 23:04:02 CEST
> 2010-08-17 23:08:26 CEST LOG: database system was not properly shut down;
> automatic recovery in progress
> 2010-08-17 23:08:26 CEST LOG: redo starts at 0/37500054
>
> I realize that this is probably not enough to understand the bug, but I will
> be happy to try to debug it further, if I am given a pointer as to how.
Well, obviously the best thing would be to isolate a reproducible test
case. But maybe a good start would be to try to get a list of the
exact series of SQL statements that are being executed. Perhaps you
could set log_min_duration_statement=0 and then find 'em in the logs.
If you do this with some trivial query in the places where you have
"some queries here", it doesn't crash.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Attachment
Thue Janus Kristensen <thuejk@gmail.com> writes: > I finally succeeded in creating a test case, after much experimentation. > Running the attached sql crashes my postgresql server 100% of the time. Cool, replicated here on CVS HEAD: Program terminated with signal 10, Bus error. #0 afterTriggerCheckState (evtshared=0x40163089) at trigger.c:2850 2850 if ((evtshared->ats_event & AFTER_TRIGGER_DEFERRABLE) == 0) (gdb) bt #0 afterTriggerCheckState (evtshared=0x40163089) at trigger.c:2850 #1 0x2f61f4 in afterTriggerMarkEvents (events=0x40163089, move_list=0x0, immediate_only=1 '\001') at trigger.c:3219 #2 0x2f76dc in AfterTriggerSetState (stmt=0x40064af0) at trigger.c:4149 #3 0x418250 in standard_ProcessUtility (parsetree=0x40064af0, queryString=0x400641a8 "SET CONSTRAINTS ALL IMMEDIATE;", params=0x0, isTopLevel=1 '\001', dest=0x40064cd8, completionTag=0x7b03b308 "") at utility.c:1226 #4 0x416a70 in ProcessUtility (parsetree=0x40064af0, queryString=0x400641a8 "SET CONSTRAINTS ALL IMMEDIATE;", params=0x0, isTopLevel=1 '\001', dest=0x40064cd8, completionTag=0x7b03b308 "") at utility.c:323 #5 0x415740 in PortalRunUtility (portal=0x400e1fa0, utilityStmt=0x40064af0, isTopLevel=1 '\001', dest=0x40064cd8, completionTag=0x7b03b308 "") at pquery.c:1191 #6 0x4158f0 in PortalRunMulti (portal=0x400e1fa0, isTopLevel=1 '\001', dest=0x40064cd8, altdest=0x40064cd8, completionTag=0x7b03b308 "") at pquery.c:1298 #7 0x414f64 in PortalRun (portal=0x400e1fa0, count=2147483647, isTopLevel=1 '\001', dest=0x40064cd8, altdest=0x40064cd8, completionTag=0x7b03b308 "") at pquery.c:822 #8 0x40e7c0 in exec_simple_query ( query_string=0x400641a8 "SET CONSTRAINTS ALL IMMEDIATE;") Looking at it ... regards, tom lane
Thue Janus Kristensen <thuejk@gmail.com> writes: > I finally succeeded in creating a test case, after much experimentation. > Running the attached sql crashes my postgresql server 100% of the time. Cute problem. The attached fix cures it for me; would you see if it solves your original case too? regards, tom lane Index: src/backend/commands/trigger.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v retrieving revision 1.248.2.2 diff -c -r1.248.2.2 trigger.c *** src/backend/commands/trigger.c 24 Jan 2010 21:49:31 -0000 1.248.2.2 --- src/backend/commands/trigger.c 19 Aug 2010 15:14:06 -0000 *************** *** 2534,2539 **** --- 2534,2540 ---- else events->tail->next = chunk; events->tail = chunk; + /* events->tailfree is now out of sync, but we'll fix it below */ } /* *************** *** 2935,2940 **** --- 2936,2950 ---- { chunk->freeptr = CHUNK_DATA_START(chunk); chunk->endfree = chunk->endptr; + + /* + * If it's last chunk, must sync event list's tailfree too. Note + * that delete_ok must NOT be passed as true if there could be + * stacked AfterTriggerEventList values pointing at this event + * list, since we'd fail to fix their copies of tailfree. + */ + if (chunk == events->tail) + events->tailfree = chunk->freeptr; } }
Re: BUG #5622: Query failed: server closed the connection unexpectedly
From
Thue Janus Kristensen
Date:
On Thu, Aug 19, 2010 at 5:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thue Janus Kristensen <thuejk@gmail.com> writes: > > I finally succeeded in creating a test case, after much experimentation. > > Running the attached sql crashes my postgresql server 100% of the time. > > Cute problem. Generating the test case was very "interesting" black box debugging. Now why does it need at least 7 (and not 1-6) updates to that unrelated row in that unrelated table to crash...? At least it was deterministic. > The attached fix cures it for me; would you see if it > solves your original case too? > Yes, it fixes my original problem too, thanks :). Regards, Thue