Re: [HACKERS] Problem after removal of exec(), help - Mailing list pgsql-hackers

From dg@illustra.com (David Gould)
Subject Re: [HACKERS] Problem after removal of exec(), help
Msg-id 9806230151.AA07582@hawk.illustra.com
In response to Problem after removal of exec(), help  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses Re: [HACKERS] Problem after removal of exec(), help  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
>
> Since the removal of exec(), Thomas has seen, and I have confirmed that
> if a backend crashes, and the postmaster must reset the shared memory,
> no backends can connect anymore.  One way to reproduce it is to run the
> regression tests, which on their last test will crash for an unrelated
> reason.  However, it will not allow you to restart any more backends.
>
> The error it gets is:
>
> Failed Assertion("!((((unsigned long)nextElem) > ShmemBase)):", File: "shmqueue.c", Line: 83)
> !((((unsigned long)nextElem) > ShmemBase)) (0) [No such file or directory]
>
> In this case nextElem = ShmemBase, so it is not greater.  Removing the
> Assert() still does not make things work, so there must be something
> else.
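For readers unfamiliar with this check: assuming, as the assertion text suggests, that shared-queue links are stored as offsets from ShmemBase and turned into pointers by adding ShmemBase, a link offset of 0 (for example a zeroed or never-initialized queue header) resolves to exactly ShmemBase and fails the `>` test. The sketch below models only that check; `shmem_offset`, `make_ptr`, `shm_queue`, `link_is_valid` and `attach_demo_segment` are hypothetical stand-ins, not the actual PostgreSQL definitions.

```c
/* Hypothetical stand-ins; the real types and macros live in the
 * PostgreSQL shared-memory headers. */
typedef unsigned long shmem_offset;

static char demo_segment[4096];   /* pretend shared-memory segment */
static unsigned long ShmemBase;   /* address where the segment is mapped */

/* Turn an offset stored in shared memory into an absolute pointer. */
static void *make_ptr(shmem_offset off)
{
    return (void *) (ShmemBase + off);
}

/* A doubly linked queue whose links are offsets rather than raw pointers,
 * so they stay meaningful even if processes map the segment at
 * different addresses. */
typedef struct shm_queue {
    shmem_offset prev;
    shmem_offset next;
} shm_queue;

/* Mirrors the failing Assert(): a valid link must lie past the base,
 * so an offset of 0 (nextElem == ShmemBase) is rejected. */
static int link_is_valid(const shm_queue *elem)
{
    unsigned long nextElem = (unsigned long) make_ptr(elem->next);
    return nextElem > ShmemBase;
}

/* Point ShmemBase at the demo segment, as attaching a segment would. */
static void attach_demo_segment(void)
{
    ShmemBase = (unsigned long) demo_segment;
}
```

Under these assumptions, the failure means some queue element still carries a zero link after the reset, which fits the suspicion that part of the shared state is not being reinitialized.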
>
> Now, the problem is probably not at that exact spot, but somewhere
> deeper.  There are two differences between the old non-exec() behavior
> and new behavior.  In the old setup, the backend had all its global
> variables initialized, while in the new no-exec case, they take the
> global variable values from the postmaster.  Second, the old setup had
> each backend attaching to the shared memory, while the new setup has
> them inheriting the shared memory from the fork().
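Both differences can be seen in a small standalone program (a sketch, not postmaster code; `postmaster_state` and `fork_without_exec` are hypothetical names). A child created by fork() alone starts with a copy of the parent's globals as they are at fork time, not their initial values, and it inherits the parent's memory mappings, so a shared region created before the fork is usable without a separate attach. An anonymous `MAP_SHARED` mapping stands in here for the System V segment the postmaster actually creates.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS on glibc */
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a postmaster global that changes after startup. */
static int postmaster_state = 0;

/*
 * Fork without exec and report what the child observed.
 * Returns postmaster_state as seen by the child, or -1 on error.
 * *shared_out receives the value the child wrote into a MAP_SHARED
 * region that the parent created before forking.
 */
static int fork_without_exec(int *shared_out)
{
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    postmaster_state = 42;        /* parent modifies its global */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        /* Child: no exec, so it sees the parent's current globals and
         * the parent's mappings, not fresh initial values. */
        *shared = 7;              /* visible to the parent */
        _exit(postmaster_state);  /* report the inherited value */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        munmap(shared, sizeof(int));
        return -1;
    }
    *shared_out = *shared;
    munmap(shared, sizeof(int));
    return WEXITSTATUS(status);
}
```

Had the child called exec() instead, the new image would start with `postmaster_state == 0` and without the mapping, which is why each backend in the old setup had to attach to shared memory itself.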
>
> My guess is that there is something buggy about the reset code in
> postmaster.c that was not resetting completely, but the initialization
> of the global variables in the backend was masking the bug, or the
> attach() operation did some extra work that we now need to do when
> resetting the shared memory:
>
>     static void
>     reset_shared(short port)
>     {
>         ipc_key = port * 1000 + shmem_seq * 100;
>         CreateSharedMemoryAndSemaphores(ipc_key);
>         ActiveBackends = FALSE;
>         shmem_seq += 1;
>         if (shmem_seq >= 10)
>             shmem_seq -= 10;
>     }
>
>
> I am stumped on this.
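The key arithmetic in reset_shared() is easy to check by hand. Assuming the sequence number starts at 0, each reset derives a key from the port and a sequence number that cycles through 0..9, so with port 5432 successive resets use keys 5432000, 5432100, ..., 5432900 and then wrap back to 5432000. The helper below (`next_ipc_key` is a hypothetical name, not a postmaster function) reproduces only that computation.

```c
/* Mirrors the key arithmetic in reset_shared(): port * 1000 + seq * 100,
 * with seq advancing and wrapping modulo 10 after each use. */
static long next_ipc_key(short port, int *shmem_seq)
{
    long ipc_key = (long) port * 1000 + (long) *shmem_seq * 100;

    *shmem_seq += 1;
    if (*shmem_seq >= 10)
        *shmem_seq -= 10;
    return ipc_key;
}
```

In the postmaster the sequence number is a static variable rather than a parameter; passing it by pointer here just keeps the sketch testable.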

No help here, but a request:

Could we have an option to do fork()/exec() the old way, as well as the new
sleek fork()-only way? I want to do some performance testing under gprof and
want to be able to replace my postgres binary with a shell script that saves
the gmon.out file, e.g.:

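#!/bin/sh
# Run the real (gprof-instrumented) backend, then keep its profile under
# a per-invocation name so later backends don't overwrite it.
postgres.bin "$@"
mv gmon.out gmon.$$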

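This won't work unless an exec() is done: with fork() alone the backend is
just a copy of the postmaster, so the postgres binary (or a script installed
in its place) is never run.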
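The request amounts to a startup routine with a switch between the two modes, sketched below. `start_backend`, `use_exec` and `backend_main_fn` are hypothetical names, not existing postmaster code. With `use_exec` set, the child calls execv() on the named program, so a wrapper script installed under the backend's name (like the one above) is what actually runs; without it, the child simply continues in its copy of the parent's image.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical backend entry point used in fork-only mode. */
typedef int (*backend_main_fn)(void);

/*
 * Start a "backend" either by fork() alone or by fork() followed by
 * execv(). Returns the child's exit status, or -1 on error.
 */
static int start_backend(int use_exec, const char *path,
                         char *const argv[], backend_main_fn backend_main)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (use_exec) {
            /* Load the program fresh: new globals, no inherited
             * mappings, and any wrapper script under `path` runs. */
            execv(path, argv);
            _exit(127);           /* only reached if execv() failed */
        }
        /* Fork-only: run the backend code in the copied image. */
        _exit(backend_main());
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

To stay self-contained the sketch waits for the child and returns its exit status; a real postmaster would not block on its backends like this.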

-dg

David Gould            dg@illustra.com           510.628.3783 or 510.305.9468
Informix Software  (No, really)         300 Lakeside Drive  Oakland, CA 94612
"Don't worry about people stealing your ideas.  If your ideas are any
 good, you'll have to ram them down people's throats." -- Howard Aiken
