Thread: Server error
Hello, I have a plpgsql function which dies strangely very often, with the message "server closed the connection unexpectedly". The log file says [...] postgres[3315]: [391-8] :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 } postgres[30748]: [182] DEBUG: reaping dead processes postgres[30748]: [183] DEBUG: child process (pid 3315) was terminated by signal 11 postgres[30748]: [184] DEBUG: server process (pid 3315) was terminated by signal 11 postgres[30748]: [185] DEBUG: terminating any other active server processes postgres[30748]: [186] DEBUG: all server processes terminated; reinitializing shared memory and semaphores postgres[30748]: [187] DEBUG: shmem_exit(0) [...] What is signal 11 (and where is it documented anyway?), and what could be the cause? I tracked the error to a line in the function which consists of a simple EXECUTE call. Erik
On Tue, 6 May 2003, Erik Ronström wrote: > Hello, > > I have a plpgsql function which dies strangely very often, with the > message "server closed the connection unexpectedly". The log file says > > [...] > postgres[3315]: [391-8] :extprm () :locprm () :initplan <> :nprm 0 > :scanrelid 1 } > postgres[30748]: [182] DEBUG: reaping dead processes > postgres[30748]: [183] DEBUG: child process (pid 3315) was terminated > by signal 11 > postgres[30748]: [184] DEBUG: server process (pid 3315) was terminated > by signal 11 > postgres[30748]: [185] DEBUG: terminating any other active server > processes > postgres[30748]: [186] DEBUG: all server processes terminated; > reinitializing shared memory and semaphores > postgres[30748]: [187] DEBUG: shmem_exit(0) > [...] > > What is signal 11 (and where is it documented anyway?), and what could > be the cause? I tracked the error to a line in the function which > consists of a simple EXECUTE call. Sig 11 means you have bad memory or CPU, about 99.9% of the time. www.memtest86.com
Signal 11 is a segfault, as can be seen in 'man kill'. So it is either broken hardware or broken software, in the latter case broken postgres? Perhaps it would be useful if you could post that plpgsql function as well? arjen
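For anyone who wants to check the number without paging through the manual, here is a small sketch in Python (a language chosen for illustration, not something used in the thread). It asks the local platform for the symbolic name of a signal number. The helper name `signal_name` is made up for this example, and signal numbering can differ between platforms, so treat the result as valid only for the machine it runs on.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number does not correspond to a signal known to this platform.
        return f"unknown signal {number}"


if __name__ == "__main__":
    # On a typical Linux system, signal 11 is the segmentation fault signal.
    print(signal_name(11))
```

On a typical Linux box this should report the segmentation-fault signal, which matches what the postmaster log is describing.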
Oh, here's a nice link on sig11: http://www.bitwizard.nl/sig11/
"scott.marlowe" <scott.marlowe@ihs.com> writes: > On Tue, 6 May 2003, Erik Ronström wrote: >> I have a plpgsql function which dies strangely very often, with the >> message "server closed the connection unexpectedly". The log file says > Sig 11 means you have bad memory or CPU, about 99.9% of the time. In my part of the universe, about 99% of the time it means you've found a software bug ;-) ... especially if you can create an example case that is reproducible on another machine. Erik, can you wrap up a test case? And which PG version are you running, anyway? regards, tom lane
Hi again, thanks for the answers. --- Tom Lane <tgl@sss.pgh.pa.us> wrote: > "scott.marlowe" <scott.marlowe@ihs.com> writes: > > Sig 11 means you have bad memory or CPU, about 99.9% of the time. > > In my part of the universe, about 99% of the time it means you've > found a software bug ;-) ... especially if you can create an example > case that is reproducible on another machine. Erik, can you wrap up > a test case? 99% + 99.9%, that makes 198.9 percent :-) Unfortunately, the function depends heavily on the database structure. I tried to extract the essential parts to reproduce the problem within a small test DB, but then everything worked just fine! But I will post an example when I get one... > And which PG version are you running, anyway? 7.2.1. I've heard it has some bugs, but the guy running the server refuses to upgrade to _anything_ that isn't cleared by Debian. 7.3 should have been cleared some days ago, but it hasn't, don't know why. Best regards Erik
On Wed, 2003-05-07 at 11:48, Erik Ronström wrote: > > And which PG version are you running, anyway? > > 7.2.1. I've heard it has some bugs, but the guy running the server > refuses to upgrade to _anything_ that isn't cleared by Debian. 7.3 > should have been cleared some days ago, but it hasn't, don't know why. It's blocked by perl and python2.2. The package dependencies are set by the environment when I build the package and my versions of perl and python are ahead of those in testing. His position is a bit irrational, since progression to testing means only that no release critical bugs have been found within the 10 days since the package was uploaded. It is in no sense a guarantee of perfection; it only means that it will probably not break your system through an egregious packaging error. You may tell him that, as Debian maintainer, I consider the current unstable package to be better than the one in testing! If he wants to keep the rest of his system pure he could download the source package and build from that. If in fact you mean that he is running stable, there is a woody build of 7.3.2 in an aptable repository at http://people.debian.org/~elphick/debian -- Oliver Elphick Oliver.Elphick@lfix.co.uk Isle of Wight, UK http://www.lfix.co.uk/oliver GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C ======================================== "Dearly beloved, avenge not yourselves, but rather give place unto wrath. For it is written, Vengeance is mine; I will repay, saith the Lord. Therefore if thine enemy hunger, feed him; if he thirst, give him drink; for in so doing thou shalt heap coals of fire on his head. Be not overcome of evil, but overcome evil with good." Romans 12:19-21
On Tue, 6 May 2003, Tom Lane wrote: > "scott.marlowe" <scott.marlowe@ihs.com> writes: > > On Tue, 6 May 2003, Erik Ronström wrote: > >> I have a plpgsql function which dies strangely very often, with the > >> message "server closed the connection unexpectedly". The log file says > > > Sig 11 means you have bad memory or CPU, about 99.9% of the time. > > In my part of the universe, about 99% of the time it means you've found > a software bug ;-) ... especially if you can create an example case that > is reproducible on another machine. Erik, can you wrap up a test case? > And which PG version are you running, anyway? Touché. I think the real issue is whether or not the error remains the same each time. If it occurs in the exact same place every time, then it is usually code. But if the sig 11 shows up in different places each time, then it is likely bad hardware. Further, getting a sig 11 every time a certain stored proc is run is not necessarily the same as getting one in the exact same place in the stored proc or the postgresql code while it's running. So, it's a good idea to get several traces of the sig 11, and compare them. If they aren't happening in the same place each time, then the hardware should be checked. My point on this is that YOU shouldn't be chasing down these problems until such time as the user has proven that their hardware is sound. Since bad hardware is pretty common, and your time is a limited resource, I really feel that if someone is getting sig 11s, they should be directed to test their hardware first with something like memtest86 and only after it passes should they come back to you. Especially right now when you and the other developers are working hard to get the 7.4 code ready to go. The old test for bad hardware, by the way, was to compile the Linux kernel 100 times with a -j <bignum> switch with bignum set high enough to use all your memory. 
Of course, that was back when 64 megs was a fair bit, so it wasn't hard to get the machine to use it all. With bigger and bigger memory subsystems, bad memory is much more likely to stay hidden until load increases, then boom, you hit that bad bit and get a sig11. Hence the need for better hardware testing before chasing the software bug possibility.
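The rule of thumb above (same crash site every time points at code, scattered crash sites point at hardware) can be sketched in a few lines of Python. This is only an illustration of the heuristic, not a diagnostic tool: the function name `diagnose` and the idea of feeding it one location string per crash (for example the innermost frame of each backtrace) are assumptions made for the example.

```python
from collections import Counter


def diagnose(crash_sites):
    """Apply the rule of thumb to a list of crash locations, one entry per crash.

    Each entry is whatever identifies where the backend died, for example
    the innermost function name taken from a backtrace.
    """
    if len(crash_sites) < 2:
        # One trace cannot tell us whether the location is stable.
        return "not enough traces"
    distinct = Counter(crash_sites)
    if len(distinct) == 1:
        return "same place every time: suspect software"
    return "different places: check the hardware first"


if __name__ == "__main__":
    # Hypothetical location labels, not taken from any real backtrace.
    print(diagnose(["site_a", "site_a", "site_a"]))
    print(diagnose(["site_a", "site_b", "site_a"]))
```

A heuristic like this only narrows things down; a passing memory test plus a crash that reproduces on a second machine is much stronger evidence of a software bug.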
On Wed, 7 May 2003, Erik Ronström wrote: > Hi again, > > thanks for the answers. > > --- Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "scott.marlowe" <scott.marlowe@ihs.com> writes: > > > Sig 11 means you have bad memory or CPU, about 99.9% of the time. > > > > In my part of the universe, about 99% of the time it means you've > > found a software bug ;-) ... especially if you can create an example > > case that is reproducible on another machine. Erik, can you wrap up > > a test case? > > 99% + 99.9%, that makes 198.9 percent :-) That's because PostgreSQL goes to 11. :-) > Unfortunately, the function depends heavily on the database structure. I > tried to extract the essential parts to reproduce the problem within a > small test DB, but then everything worked just fine! But I will post an > example when I get one... It could easily be that some data structure has to get to a certain size before it clobbers some pointers somewhere. Or that the single bad bit of memory in both machines isn't used until load gets high enough for it to get allocated for a PostgreSQL backend process. > > And which PG version are you running, anyway? > > 7.2.1. I've heard it has some bugs, but the guy running the server > refuses to upgrade to _anything_ that isn't cleared by Debian. 7.3 > should have been cleared some days ago, but it hasn't, don't know why. Hasn't Debian approved 7.2.4 yet? With the known bugs in 7.2.1 your friend is being a bit pedantic if he won't at least upgrade to the latest version of 7.2. I'd surely trust the opinion of the PostgreSQL developers over that of the Debian developers on which versions of PostgreSQL have bugs you should be worried about.
Hello, Still stuck with the same error. Finally managed to upgrade from 7.2.1 to 7.2.4, and realized that the problem is still there. Shit! I've not yet been able to reproduce the problem on another location, but at least I've isolated it a bit: I have a function which creates a "cache" table with a subset of rows from another table (CREATE TABLE new AS SELECT ... FROM old WHERE ...). Then it adds a foreign key (ALTER TABLE new ADD CONSTRAINT ref FOREIGN KEY ... REFERENCES old). Everything is fine so far. Now, if I drop the new table and run the function again, postgres crashes. Things to note: 1) If the old table doesn't contain any rows *when running the query the first time*, there is no crash the second time. 2) If I execute the queries from the function "manually", typing them into psql, everything works fine. Looks to me like there is some sort of cleanup problem, since it is almost always the second run (in each session) that crashes. One question is: is it always safe to create a foreign key constraint, even when the table contains data? Erik
On Thu, 8 May 2003, Erik Ronström wrote: > Still stuck with the same error. Finally managed to upgrade from 7.2.1 > to 7.2.4, and realized that the problem is still there. Shit! I've not > yet been able to reproduce the problem on another location, but at > least I've isolated it a bit: > > I have a function which creates a "cache" table with a subset of rows > from another table (CREATE TABLE new AS SELECT ... FROM old WHERE ...). > Then it adds a foreign key (ALTER TABLE new ADD CONSTRAINT ref FOREIGN > KEY ... REFERENCES old). Everything is fine so far. Now, if I drop the > new table and run the function again, postgres crashes. I can reproduce on 7.2 but not 7.3 or 7.4. It looks like something is getting clobbered. When I recompiled with debugging and assertions enabled, I get a crash the first time the function is called. You may need to go through with a debugger. > One question is: is it always safe to create a foreign key constraint, > even when the table contains data? It'll error if the constraint is violated or invalid, but otherwise it should be safe.
Name your tables with some sort of sequence and a concatenation, like: create table 'temp' || nextval(blah blah) AS blah blah, making sure to delete of course :-) It will avoid any race conditions on cleanup. Do this for a while, and check the catalogs and make sure that everything DOES get cleaned up.
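To make the naming idea concrete, here is a rough sketch in Python that builds the statement text only; it does not talk to a database. Everything in it is an assumption for illustration: the helper names are invented, the `temp` prefix is taken from the suggestion above, and an in-process counter stands in for the database sequence that `nextval()` would read in PostgreSQL. The SELECT is passed in by the caller, since the original query was never posted.

```python
import itertools

# Stand-in for a database sequence. In PostgreSQL this role would be played
# by nextval() on a sequence created for the purpose.
_table_ids = itertools.count(1)


def next_table_name(prefix: str = "temp") -> str:
    """Return a table name that has not been handed out before in this process."""
    return f"{prefix}{next(_table_ids)}"


def create_and_drop(select_sql: str) -> tuple[str, str]:
    """Build a CREATE TABLE ... AS statement and the matching DROP TABLE."""
    name = next_table_name()
    return (f"CREATE TABLE {name} AS {select_sql}", f"DROP TABLE {name}")


if __name__ == "__main__":
    create_stmt, drop_stmt = create_and_drop("SELECT 1 AS placeholder")
    print(create_stmt)
    print(drop_stmt)
```

Because every run gets a fresh name, a leftover from an earlier run cannot collide with the next one, which is what makes it easy to check the catalogs afterwards for anything that was not cleaned up.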
Does it do a full scan of the table's child and parent columns upon creation of a foreign key? Stephan Szabo wrote: >>One question is: is it always safe to create a foreign key constraint, >>even when the table contains data? > > It'll error if the constraint is violated or invalid, but otherwise it > should be.
On Thu, 8 May 2003, Dennis Gearon wrote: > it does a full scan of the table's child and parent columns upon > creation of a foreign key? It currently runs the trigger once per row in the referencing table. Doing a single select with not exists will almost certainly be faster, but that's waiting for someone else to decide to do it or me to get time to do it. :) Some form of check is required AFAIK, however, because if the constraint isn't satisfied at the end of the ALTER TABLE an error should be thrown.
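As an illustration of the "single select with not exists" idea, here is a small self-contained sketch in Python using the standard-library sqlite3 module. It is not how PostgreSQL performs the check; SQLite is used only so the example runs anywhere, and the table and column names (`parent`, `child`, `parent_id`) are invented for the example. The query returns the referencing rows that would make the constraint fail; rows with a NULL reference are skipped on the assumption that a NULL foreign key does not violate the constraint.

```python
import sqlite3


def orphan_rows(conn):
    """Return referencing rows whose parent_id has no match in the parent table."""
    return conn.execute(
        """
        SELECT c.id, c.parent_id
        FROM child AS c
        WHERE c.parent_id IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM parent AS p WHERE p.id = c.parent_id)
        ORDER BY c.id
        """
    ).fetchall()


def demo():
    """Build a tiny in-memory data set with one dangling reference and check it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
        INSERT INTO parent (id) VALUES (1), (2);
        INSERT INTO child (id, parent_id)
            VALUES (10, 1), (11, 2), (12, 3), (13, NULL);
        """
    )
    return orphan_rows(conn)


if __name__ == "__main__":
    print(demo())
```

If the query comes back empty, the existing data already satisfies the constraint; any row it returns is one that would cause the error at the end of the ALTER TABLE.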