Thread: Server error

Server error

From

Erik Ronström

Date:

06 May 2003, 17:53:28

Hello,

I have a plpgsql function which dies strangely very often, with the
message "server closed the connection unexpectedly". The log file says

[...]
postgres[3315]: [391-8]  :extprm () :locprm () :initplan <> :nprm 0
:scanrelid 1 }
postgres[30748]: [182] DEBUG:  reaping dead processes
postgres[30748]: [183] DEBUG:  child process (pid 3315) was terminated
by signal 11
postgres[30748]: [184] DEBUG:  server process (pid 3315) was terminated
by signal 11
postgres[30748]: [185] DEBUG:  terminating any other active server
processes
postgres[30748]: [186] DEBUG:  all server processes terminated;
reinitializing shared memory and semaphores
postgres[30748]: [187] DEBUG:  shmem_exit(0)
[...]

What is signal 11 (and where is it documented anyway?), and what could
be the cause? I tracked the error to a line in the function which
consists of a simple EXECUTE call.

Erik


Re: Server error

From

"scott.marlowe"

Date:

06 May 2003, 18:04:24

On Tue, 6 May 2003, Erik Ronström wrote:

> Hello,
>
> I have a plpgsql function which dies strangely very often, with the
> message "server closed the connection unexpectedly". The log file says
>
> [...]
> postgres[3315]: [391-8]  :extprm () :locprm () :initplan <> :nprm 0
> :scanrelid 1 }
> postgres[30748]: [182] DEBUG:  reaping dead processes
> postgres[30748]: [183] DEBUG:  child process (pid 3315) was terminated
> by signal 11
> postgres[30748]: [184] DEBUG:  server process (pid 3315) was terminated
> by signal 11
> postgres[30748]: [185] DEBUG:  terminating any other active server
> processes
> postgres[30748]: [186] DEBUG:  all server processes terminated;
> reinitializing shared memory and semaphores
> postgres[30748]: [187] DEBUG:  shmem_exit(0)
> [...]
>
> What is signal 11 (and where is it documented anyway?), and what could
> be the cause? I tracked the error to a line in the function which
> consists of a simple EXECUTE call.

Sig 11 means you have bad memory or CPU, about 99.9% of the time.

www.memtest86.com

Re: Server error

From

Arjen van der Meijden

Date:

06 May 2003, 18:35:51

Signal 11 is a segfault.
As can be seen in 'man kill'
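
The number-to-name mapping can also be checked programmatically. Below is a minimal Python sketch, assuming a Unix-like system; signal numbers are platform-specific, and the helper name `describe_signal` is made up for illustration:

```python
import signal


def describe_signal(number: int) -> str:
    """Return the symbolic name of a signal number on this platform.

    Returns "unknown" if the platform defines no signal with that number.
    """
    try:
        return signal.Signals(number).name
    except ValueError:
        return "unknown"


if __name__ == "__main__":
    # On common Linux systems signal 11 is SIGSEGV (segmentation fault).
    print(11, describe_signal(11))
```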

So it is either broken hardware or broken software, in the latter case
broken postgres?
Perhaps it would be useful if you could post that plpgsql function as
well?

arjen

> -----Original message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On behalf of Erik Ronström
> Sent: Tuesday, 6 May 2003 23:53
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Server error
>
>
> Hello,
>
> I have a plpgsql function which dies strangely very often,
> with the message "server closed the connection unexpectedly".
> The log file says
>
> [...]
> postgres[3315]: [391-8]  :extprm () :locprm () :initplan <> :nprm 0
> :scanrelid 1 }
> postgres[30748]: [182] DEBUG:  reaping dead processes
> postgres[30748]: [183] DEBUG:  child process (pid 3315) was
> terminated by signal 11
> postgres[30748]: [184] DEBUG:  server process (pid 3315) was
> terminated by signal 11
> postgres[30748]: [185] DEBUG:  terminating any other active
> server processes
> postgres[30748]: [186] DEBUG:  all server processes
> terminated; reinitializing shared memory and semaphores
> postgres[30748]: [187] DEBUG:  shmem_exit(0)
> [...]
>
> What is signal 11 (and where is it documented anyway?), and
> what could be the cause? I tracked the error to a line in the
> function which consists of a simple EXECUTE call.
>
> Erik
>

Re: Server error

From

"scott.marlowe"

Date:

06 May 2003, 19:29:22

Oh, here's a nice link on sig11:

http://www.bitwizard.nl/sig11/

Re: Server error

From

Tom Lane

Date:

06 May 2003, 23:12:01

"scott.marlowe" <scott.marlowe@ihs.com> writes:
> On Tue, 6 May 2003, Erik Ronström wrote:
>> I have a plpgsql function which dies strangely very often, with the
>> message "server closed the connection unexpectedly". The log file says

> Sig 11 means you have bad memory or CPU, about 99.9% of the time.

In my part of the universe, about 99% of the time it means you've found
a software bug ;-) ... especially if you can create an example case that
is reproducible on another machine.  Erik, can you wrap up a test case?
And which PG version are you running, anyway?

            regards, tom lane

Re: Server error

From

Erik Ronström

Date:

07 May 2003, 06:48:37

Hi again,

thanks for the answers.

 --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > Sig 11 means you have bad memory or CPU, about 99.9% of the time.
>
> In my part of the universe, about 99% of the time it means you've
> found a software bug ;-) ... especially if you can create an example
> case that is reproducible on another machine.  Erik, can you wrap up
> a test case?

99% + 99.9%, that makes 198.9 percent :-)

Unfortunately, the function depends heavily on the database structure. I
tried to extract the essential parts to reproduce the problem within a
small test DB, but then everything worked just fine! But I will post an
example when I get one...

> And which PG version are you running, anyway?

7.2.1. I've heard it has some bugs, but the guy running the server
refuses to upgrade to _anything_ that isn't cleared by Debian. 7.3
should have been cleared some days ago, but it hasn't, don't know why.

Best regards
Erik


Re: Server error

From

Oliver Elphick

Date:

07 May 2003, 09:11:58

On Wed, 2003-05-07 at 11:48, Erik Ronström wrote:

> > And which PG version are you running, anyway?
>
> 7.2.1. I've heard it has some bugs, but the guy running the server
> refuses to upgrade to _anything_ that isn't cleared by Debian. 7.3
> should have been cleared some days ago, but it hasn't, don't know why.

It's blocked by perl and python2.2.  The package dependencies are set by
the environment when I build the package and my versions of perl and
python are ahead of those in testing.

His position is a bit irrational, since progression to testing means
only that no release critical bugs have been found within the 10 days
since the package was uploaded.  It is in no sense a guarantee of
perfection; it only means that it will probably not break your system
through an egregious packaging error.

You may tell him that, as Debian maintainer, I consider the current
unstable package to be better than the one in testing!  If he wants to
keep the rest of his system pure he could download the source package
and build from that.

If in fact you mean that he is running stable, there is a woody build of
7.3.2 in an apt-able repository at
http://people.debian.org/~elphick/debian

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C

Re: Server error

From

"scott.marlowe"

Date:

07 May 2003, 11:37:08

On Tue, 6 May 2003, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > On Tue, 6 May 2003, Erik Ronström wrote:
> >> I have a plpgsql function which dies strangely very often, with the
> >> message "server closed the connection unexpectedly". The log file says
>
> > Sig 11 means you have bad memory or CPU, about 99.9% of the time.
>
> In my part of the universe, about 99% of the time it means you've found
> a software bug ;-) ... especially if you can create an example case that
> is reproducible on another machine.  Erik, can you wrap up a test case?
> And which PG version are you running, anyway?

Touché.  I think the real issue is whether or not the error is the same
each time.  If it occurs in exactly the same place, then it is usually code.
But if the sig 11 shows up in different places each time, then it is
likely bad hardware.

Further, getting a sig 11 every time one runs a certain stored proc is not
necessarily the same as getting one in exactly the same place in the
stored proc or PostgreSQL code while it's running.

So, it's a good idea to get several traces of the sig 11, and compare
them.  If they aren't happening in the same place each time, then the
hardware should be checked.
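
As a rough illustration of that comparison, here is a minimal Python sketch. It assumes you have already extracted the top stack frame (or faulting address) from each core dump by some other means; the function name `classify_crashes` and the frame labels are hypothetical, and the verdicts are only the rule of thumb described above, not a diagnosis:

```python
from collections import Counter


def classify_crashes(crash_sites):
    """Apply the rule of thumb to a list of crash locations.

    crash_sites: one string per sig 11, e.g. the top stack frame
    taken from each backtrace (how you obtain it is up to you).
    """
    if len(crash_sites) < 2:
        return "need more traces"
    distinct = Counter(crash_sites)
    if len(distinct) == 1:
        return "same place every time: suspect software"
    return "different places: check the hardware"


if __name__ == "__main__":
    # Placeholder labels, not real PostgreSQL function names.
    print(classify_crashes(["frame_a", "frame_a", "frame_a"]))
    print(classify_crashes(["frame_a", "frame_b", "frame_a"]))
```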

My point on this is that YOU shouldn't be chasing down these problems
until such time as the user has proven that their hardware is sound.
Since bad hardware is pretty common, and your time is a limited resource,
I really feel that if someone is getting sig 11s, they should be directed
to test their hardware first with something like memtest86 and only after
it passes should they come back to you.  Especially right now when you and
the other developers are working hard to get the 7.4 code ready to go.

The old test for bad hardware, by the way, was to compile the Linux kernel
100 times with a -j <bignum> switch, with bignum set high enough to use
all your memory.  Of course, that was back when 64 megs was a fair bit,
so it wasn't hard to get the machine to use it all.  With bigger and
bigger memory subsystems, bad memory is much more likely to stay hidden
until load increases, then boom, you hit that bad bit and get a sig11.
Hence the need for better hardware testing before chasing the software bug
possibility.

Re: Server error

From

"scott.marlowe"

Date:

07 May 2003, 11:48:57

On Wed, 7 May 2003, Erik Ronström wrote:

> Hi again,
>
> thanks for the answers.
>
>  --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > > Sig 11 means you have bad memory or CPU, about 99.9% of the time.
> >
> > In my part of the universe, about 99% of the time it means you've
> > found a software bug ;-) ... especially if you can create an example
> > case that is reproducible on another machine.  Erik, can you wrap up
> > a test case?
>
> 99% + 99.9%, that makes 198.9 percent :-)

That's because Postgresql goes to 11. :-)

> Unfortunately, the function depends heavily on the database structure. I
> tried to extract the essential parts to reproduce the problem within a
> small test DB, but then everything worked just fine! But I will post an
> example when I get one...

It could easily be that some data structure has to get to a certain size
before it clobbers some pointers somewhere.  Or that the single bad bit of
memory in both machines isn't used until load gets high enough for it to
get allocated for a postgresql backend process.

> > And which PG version are you running, anyway?
>
> 7.2.1. I've heard it has some bugs, but the guy running the server
> refuses to upgrade to _anything_ that isn't cleared by Debian. 7.3
> should have been cleared some days ago, but it hasn't, don't know why.

Hasn't Debian approved 7.2.4 yet?  With the known bugs in 7.2.1 your
friend is being a bit pedantic if he won't at least upgrade to the latest
version of 7.2.  I'd surely trust the opinion of the PostgreSQL developers
over that of the Debian developers on which versions of PostgreSQL have
bugs you should be worried about.

Re: Server error

From

Erik Ronström

Date:

08 May 2003, 16:35:50

Hello,

Still stuck with the same error. Finally managed to upgrade from 7.2.1
to 7.2.4, and realized that the problem is still there. Shit! I've not
yet been able to reproduce the problem on another location, but at
least I've isolated it a bit:

I have a function which creates a "cache" table with a subset of rows
from another table (CREATE TABLE new AS SELECT ... FROM old WHERE ...).
Then it adds a foreign key (ALTER TABLE new ADD CONSTRAINT ref FOREIGN
KEY ... REFERENCES old). Everything is fine so far. Now, if I drop the
new table and run the function again, postgres crashes.

Things to note:
1) If the old table doesn't contain any rows *when running the query
the first time*, there is no crash the second time.
2) If I execute the queries from the function "manually", typing them
into psql, everything works fine.

Looks to me like there is some sort of cleanup problem, since it is
almost always the second run (in each session) that crashes.

One question is: is it always safe to create a foreign key constraint,
even when the table contains data?

Erik


Re: Server error

From

Stephan Szabo

Date:

08 May 2003, 17:54:01

On Thu, 8 May 2003, Erik Ronström wrote:

> Still stuck with the same error. Finally managed to upgrade from 7.2.1
> to 7.2.4, and realized that the problem is still there. Shit! I've not
> yet been able to reproduce the problem on another location, but at
> least I've isolated it a bit:
>
> I have a function which creates a "cache" table with a subset of rows
> from another table (CREATE TABLE new AS SELECT ... FROM old WHERE ...).
> Then it adds a foreign key (ALTER TABLE new ADD CONSTRAINT ref FOREIGN
> KEY ... REFERENCES old). Everything is fine so far. Now, if I drop the
> new table and run the function again, postgres crashes.

I can reproduce on 7.2 but not 7.3 or 7.4.  It looks like something is
getting clobbered.  When I recompiled with debugging and assertions
enabled, I get a crash the first time the function is called.  You may
need to step through it with a debugger.

> One question is: is it always safe to create a foreign key constraint,
> even when the table contains data?

It'll error if the constraint is violated or invalid, but otherwise it
should be.

Re: Server error

From

Dennis Gearon

Date:

08 May 2003, 18:03:14

Name your tables with some sort of sequence and a concatenation, like:

create table 'temp' || nextval(blah blah) AS blah blah, making sure to delete of course :-)

It will avoid any race conditions on cleanup. Do this for a while, and
check the catalogs to make sure that everything DOES get cleaned up.
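
The naming idea can be sketched in a few lines of Python. This is only an illustration under assumptions: an in-process counter stands in for the database sequence (a real sequence is shared across sessions, this counter is not), and the helper names and the `temp` prefix are made up. In plpgsql the statement string would be built the same way and passed to EXECUTE.

```python
import itertools

# Stand-in for a database sequence; only unique within this process.
_table_counter = itertools.count(1)


def next_temp_table_name(prefix="temp"):
    """Return a fresh table name such as temp1, temp2, ..."""
    return f"{prefix}{next(_table_counter)}"


def create_table_statement(select_sql, prefix="temp"):
    """Build a CREATE TABLE ... AS statement using a fresh table name.

    Returns (name, statement) so the caller can drop the table later.
    """
    name = next_temp_table_name(prefix)
    return name, f"CREATE TABLE {name} AS {select_sql}"


if __name__ == "__main__":
    name, stmt = create_table_statement("SELECT 1 AS x")
    print(stmt)
    print(f"DROP TABLE {name}")
```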

Erik Ronström wrote:
> Hello,
>
> Still stuck with the same error. Finally managed to upgrade from 7.2.1
> to 7.2.4, and realized that the problem is still there. Shit! I've not
> yet been able to reproduce the problem on another location, but at
> least I've isolated it a bit:
>
> I have a function which creates a "cache" table with a subset of rows
> from another table (CREATE TABLE new AS SELECT ... FROM old WHERE ...).
> Then it adds a foreign key (ALTER TABLE new ADD CONSTRAINT ref FOREIGN
> KEY ... REFERENCES old). Everything is fine so far. Now, if I drop the
> new table and run the function again, postgres crashes.
>
> Things to note:
> 1) If the old table doesn't contain any rows *when running the query
> the first time*, there is no crash the second time.
> 2) If I execute the queries from the function "manually", typing them
> into psql, everything works fine.
>
> Looks to me like there is some sort of cleanup problem, since it is
> almost always the second run (in each session) that crashes.
>
> One question is: is it always safe to create a foreign key constraint,
> even when the table contains data?
>
> Erik
>

Re: Server error

From

Dennis Gearon

Date:

08 May 2003, 18:05:23

Does it do a full scan of the table's child and parent columns upon creation of a foreign key?

Stephan Szabo wrote:
> On Thu, 8 May 2003, Erik Ronström wrote:
>
>
>>Still stuck with the same error. Finally managed to upgrade from 7.2.1
>>to 7.2.4, and realized that the problem is still there. Shit! I've not
>>yet been able to reproduce the problem on another location, but at
>>least I've isolated it a bit:
>>
>>I have a function which creates a "cache" table with a subset of rows
>>from another table (CREATE TABLE new AS SELECT ... FROM old WHERE ...).
>>Then it adds a foreign key (ALTER TABLE new ADD CONSTRAINT ref FOREIGN
>>KEY ... REFERENCES old). Everything is fine so far. Now, if I drop the
>>new table and run the function again, postgres crashes.
>
>
> I can reproduce on 7.2 but not 7.3 or 7.4.  It looks like something is
> getting clobbered.  When I recompiled with debugging and assertions
> enabled, I get a crash the first time the function is called.  You may
> need to step through it with a debugger.
>
>
>>One question is: is it always safe to create a foreign key constraint,
>>even when the table contains data?
>
>
> It'll error if the constraint is violated or invalid, but otherwise it
> should be.
>
>

Re: Server error

From

Stephan Szabo

Date:

08 May 2003, 18:14:58

On Thu, 8 May 2003, Dennis Gearon wrote:

> Does it do a full scan of the table's child and parent columns upon
> creation of a foreign key?

It currently runs the trigger once per row in the referencing table.
Doing a single select with not exists will almost certainly be faster, but
that's waiting for someone else to decide to do it or me to get time to do
it. :)
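
To make the "single select with not exists" idea concrete, here is a small self-contained Python sketch. It uses the standard-library sqlite3 module rather than PostgreSQL, purely so it runs anywhere, and the table and column names (`parent`, `child`, `parent_id`) are hypothetical. It shows only the shape of such a query, not how PostgreSQL implements the check:

```python
import sqlite3


def find_orphans(conn):
    """Return child.parent_id values with no matching parent.id.

    One set-based query replaces a per-row lookup. NULL references
    are skipped, since a NULL foreign key does not violate the constraint.
    """
    query = """
        SELECT c.parent_id
        FROM child AS c
        WHERE c.parent_id IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM parent AS p WHERE p.id = c.parent_id)
        ORDER BY c.parent_id
    """
    return [row[0] for row in conn.execute(query)]


def demo():
    """Build a tiny in-memory example with one dangling reference."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (parent_id INTEGER);
        INSERT INTO parent (id) VALUES (1), (2);
        INSERT INTO child (parent_id) VALUES (1), (2), (3), (NULL);
    """)
    return find_orphans(conn)


if __name__ == "__main__":
    print(demo())
```

An empty result would mean the constraint can be added; any rows returned are the values that would make the ALTER TABLE fail.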

Some form of check is required, however, AFAIK, because if the constraint
isn't satisfied at the end of the ALTER TABLE an error should be thrown.