Problem after removal of exec(), help - Mailing list pgsql-hackers

From Bruce Momjian
Subject Problem after removal of exec(), help
Msg-id 199806221445.KAA13553@candle.pha.pa.us
Responses Re: [HACKERS] Problem after removal of exec(), help  (dg@illustra.com (David Gould))
Re: [HACKERS] Problem after removal of exec(), help  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
Since the removal of exec(), Thomas has seen, and I have confirmed, that
if a backend crashes and the postmaster must reset the shared memory,
no backends can connect anymore.  One way to reproduce it is to run the
regression tests, whose last test crashes a backend for an unrelated
reason.  After that crash, no new backends can be started.

The error it gets is:

Failed Assertion("!((((unsigned long)nextElem) > ShmemBase)):", File: "shmqueue.c", Line: 83)
!((((unsigned long)nextElem) > ShmemBase)) (0) [No such file or directory]

In this case nextElem == ShmemBase, so the comparison fails.  Removing the
Assert() still does not make things work, so something else must be
wrong as well.
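One way the check can end up comparing equal: assume queue links are stored as byte offsets from the start of the shared segment and turned into pointers by adding the segment base. Under that assumption, a link that was zeroed and never re-initialized resolves to exactly the base address, which is what the failed assertion reports. The sketch below is a simplified, hypothetical model (shm_offset, shm_link, offset_to_ptr and next_link_is_valid are illustrative names, not the actual shmqueue.c definitions):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified model of an offset-based shared-memory queue.
 * Links are byte offsets from the segment base, so they remain valid
 * wherever a process happens to map the segment. */
typedef size_t shm_offset;

typedef struct shm_link
{
    shm_offset prev;
    shm_offset next;
} shm_link;

/* Convert an offset into a pointer relative to the given segment base. */
static char *
offset_to_ptr(char *base, shm_offset off)
{
    return base + off;
}

/* Same spirit as the failed check: a real next element must lie strictly
 * above the segment base, so an offset of 0 (a zeroed link) is rejected. */
static bool
next_link_is_valid(char *base, const shm_link *elem)
{
    char *next = offset_to_ptr(base, elem->next);

    return (uintptr_t) next > (uintptr_t) base;
}
```

In this model, seeing nextElem == ShmemBase would suggest a queue header whose links are still zero, i.e. a structure that was never (re)initialized in the segment the process is actually reading. That is only one possible reading of the symptom.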

Now, the problem is probably not at that exact spot, but somewhere
deeper.  There are two differences between the old exec() behavior
and the new non-exec() behavior.  First, in the old setup each backend
started with all its global variables freshly initialized, while in the
new no-exec() case they inherit the postmaster's global variable values.
Second, in the old setup each backend attached to the shared memory
itself, while in the new setup it inherits the shared memory through
fork().
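The first difference can be shown in isolation with plain POSIX calls: after fork() without exec(), the child starts with a copy of whatever the parent's globals held at fork time, not their static initial values. A minimal sketch, where backend_state and child_sees_parent_global are hypothetical stand-ins rather than PostgreSQL names:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a backend global; statically initialized to 0. */
static int backend_state = 0;

/* Set the global in the parent ("postmaster"), fork, and return the value
 * the child ("backend") observes, passed back through its exit status.
 * Returns -1 if fork() or waitpid() fails. */
static int
child_sees_parent_global(int parent_value)
{
    pid_t pid;
    int   status;

    backend_state = parent_value;     /* parent modifies its global */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(backend_state & 0xff);  /* child reports the inherited value */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling child_sees_parent_global(42) returns 42. Had the child exec()'d the backend binary instead, backend_state would start again at its initializer, 0, which is how per-backend initialization could have hidden stale postmaster state.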

My guess is that the reset code in postmaster.c has a bug and does not
reset everything it should, but the initialization of the global
variables in the backend was masking the bug, or the attach() operation
did some extra work that we now need to do ourselves when resetting the
shared memory:

    static void
    reset_shared(short port)
    {
        ipc_key = port * 1000 + shmem_seq * 100;
        CreateSharedMemoryAndSemaphores(ipc_key);
        ActiveBackends = FALSE;
        shmem_seq += 1;
        if (shmem_seq >= 10)
            shmem_seq -= 10;
    }
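For reference, the key arithmetic in reset_shared() can be pulled out and checked on its own. The helper names below are hypothetical; the formula and the wrap-around are copied from the function above. With port 5432, successive calls use keys 5432000, 5432100, ..., 5432900 and then wrap back to 5432000.

```c
#include <assert.h>

/* Same formula as reset_shared(): the IPC key is derived from the port
 * and a sequence number in the range 0..9. */
static long
ipc_key_for(short port, int shmem_seq)
{
    assert(shmem_seq >= 0 && shmem_seq < 10);
    return (long) port * 1000 + shmem_seq * 100;
}

/* Advance the sequence number with the same wrap-around as reset_shared(). */
static int
next_shmem_seq(int shmem_seq)
{
    shmem_seq += 1;
    if (shmem_seq >= 10)
        shmem_seq -= 10;
    return shmem_seq;
}
```

Since every reset creates the shared memory under a new key, any process-local global that still refers to the previous segment would not be refreshed by this function alone; that is one possible shape of the incomplete reset suspected above.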


I am stumped on this.

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
