Re: 7.0RC1: possible query and backend problem - Mailing list pgsql-general
From | Tom Lane |
---|---|
Subject | Re: 7.0RC1: possible query and backend problem |
Date | |
Msg-id | 17744.957067447@sss.pgh.pa.us |
In response to | Re: 7.0RC1: possible query and backend problem (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
I wrote:
>>> IpcMemoryCreate: shmget failed (Invalid argument) key=5432110,
>>> size=144, permission=700

> Hmm, that is odd.  The thing that looks peculiar to me is that
> it seems to be calculating a different size for the segment than
> it did the first time through:

>> # ipcs -a
>> IPC status from <running system> as of Wed Apr 19 16:45:42 2000
>> T    ID   KEY         MODE         OWNER     GROUP     CREATOR
>> CGROUP    NATTCH  SEGSZ  CPID   LPID   ATIME     DTIME     CTIME
>> Shared Memory:
>> m    800  0x0052e32e  --rw-------  postgres  postgres  postgres
>> postgres  0       120    12737  12737  13:01:36  13:01:36  13:01:36

> See the difference?  120 vs 144?  What's causing that I wonder...
> and would it explain the failure to reattach?

After further investigation, "Invalid argument" is the typical kernel
error code from shmget() if one tries to attach to an existing shared
memory segment that is smaller than one asked for.  So that's
consistent.

The size requested for the spinlock segment (which is the only one of
Postgres' three shmem segments that could be as small as 144 bytes) is
computed by "sizeof(struct foo)"; there is no way that that is going to
change from one invocation to the next.  But the numbers 120 and 144
are consistent with the theory that the shmem segment was originally
created by Postgres 6.5 and you are now trying to attach to it with
Postgres 7.0 --- 7.0 has more spinlocks than 6.5 did.

Your trace appeared to show a working 7.0 postmaster getting this error
while trying to reinitialize.  That doesn't make any sense to me; if
the 7.0 postmaster had managed to start up originally, then it must
have found or created a suitably-sized shmem segment.  So I'm confused
about the details, but I've got to think that we are looking at some
sort of interference between 6.5 and 7.0 installations.

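The shmget() behavior described above can be reproduced outside Postgres. The following is only a minimal sketch, not code from the Postgres sources: the function name second_shmget_errno and the key passed to it are made up for illustration, and it assumes a system with System V shared memory where the chosen key is not already in use.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Illustrative helper (hypothetical name, not from Postgres).
 *
 * Creates a segment of old_size bytes under `key`, then asks for the
 * same key again with new_size bytes, the way a later postmaster would.
 *
 * Returns 0 if the second shmget() succeeds, the errno value if it
 * fails, or -1 if the initial segment could not be created at all
 * (key already in use, no SysV shared memory available, and so on).
 * The test segment is removed before returning.
 */
int second_shmget_errno(key_t key, size_t old_size, size_t new_size)
{
    /* IPC_EXCL: refuse to touch a segment someone else already owns. */
    int first_id = shmget(key, old_size, IPC_CREAT | IPC_EXCL | 0600);
    if (first_id < 0)
        return -1;

    /* Second request for the same key, possibly with a larger size. */
    int second_id = shmget(key, new_size, IPC_CREAT | 0600);
    int result = (second_id < 0) ? errno : 0;

    /* Clean up the segment created above. */
    shmctl(first_id, IPC_RMID, NULL);
    return result;
}
```

Calling second_shmget_errno(key, 120, 144) with an unused key mirrors the situation in the trace: the existing segment is 120 bytes, the request is for 144, and the second shmget() is expected to fail with EINVAL. Swapping the sizes (144 first, then 120) should succeed, since the existing segment is then large enough.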
One possibility is that after you started the 7.0 postmaster, you
accidentally tried to start a 6.5 postmaster on the same port number,
and the 6.5 code managed to resize the shared mem segment before
failing because of the port number conflict.  Not sure if that could
happen --- my shmget() man page doesn't say anything about changing the
size of an already-existing shmem segment, but maybe your Unix works
differently.  Comments anyone?

If this actually is what happened, we should reorder the startup
sequence to check for port-number conflicts before any shared memory
segments are touched.  But I'm not sure about it.

			regards, tom lane

PS: if you don't see the connection between port number and shmem,
it's this: the key numbers used for shmem segments are computed from
the port number.  So different postmasters can coexist on one machine
if they have different port numbers; they'll get different shmem
segments.  But starting two postmasters on the same port is bad news.
I thought we had adequate interlocks against that, but now I'm
wondering.
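To make the port-to-key connection in the PS concrete, here is a small sketch. The message does not give the actual formula, so the port * 1000 + offset form below is purely an assumption: it happens to reproduce the key in the error message (5432110) if the postmaster was on the default port 5432, and that key is the same number ipcs printed in hex (0x0052e32e). The names sketch_shmem_key and keys_collide are hypothetical.

```c
/*
 * HYPOTHETICAL key derivation: port * 1000 + offset is an assumption
 * inferred from the numbers in this thread, not a formula quoted from
 * the Postgres sources.
 */
long sketch_shmem_key(int port, int offset)
{
    return (long) port * 1000L + offset;
}

/*
 * Two postmasters contend for the same segment only if they derive
 * the same key, which under this sketch means the same port number.
 * Returns 1 when the keys are equal, 0 otherwise.
 */
int keys_collide(int port_a, int port_b, int offset)
{
    return sketch_shmem_key(port_a, offset) ==
           sketch_shmem_key(port_b, offset);
}
```

Under these assumptions, sketch_shmem_key(5432, 110) gives 5432110, and a postmaster on port 5433 would derive a different key and so never touch the first one's segments, whereas two postmasters on 5432 would both go after the same segment regardless of which Postgres version they are.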