Thread: server auto-restarts and ipcs
A power failure led to failed postmaster restart using 7.4.6 (see output below). The short-term fix is usually to delete the pid file and restart.

I often wonder why ipcs never seems to show the shared memory block in question? Am I using the wrong command? Does the key mentioned by pgsql map to the key in the ipcs output? And if the shared segment is simply not there, would it be possible for pgsql to figure that out ala Apache, search the process table, and go ahead and restart if it didn't see a postmaster already running?

I'm sure this has been asked and answered, I just couldn't find it via google... TIA.

Ed

Database and process is pg746dba...

$ cat logs-pg746-7.4.6/server_log.Mon
pg_ctl: Another postmaster may be running. Trying to start postmaster anyway.
2004-11-08 17:17:22.398 [18038] FATAL: pre-existing shared memory block (key 9746001, ID 658210829) is still in use
HINT: If you're sure there are no old server processes still running, remove the shared memory block with the command "ipcrm",or just delete the file "/users/pg746dba/dbclusters/pg746/postgresql-7.4.6/data/postmaster.pid".
pg_ctl: cannot start postmaster
Examine the log output.

$ ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 32768      ed         777        393216     2          dest
0x00000000 131073     root       644        110592     4          dest
0x00000000 3538946    ed         777        393216     2          dest
0x00000000 3670019    ed         777        393216     2          dest
0x00000000 4685828    ed         777        393216     2          dest
0x00000000 4816901    ed         777        393216     2          dest
0x00000000 4915206    ed         777        393216     2          dest
0x00000000 4980743    ed         777        393216     2          dest
0x00000000 5046280    ed         777        393216     2          dest
0x00000000 5111817    ed         777        393216     2          dest
0x00000000 5537802    root       644        110592     3          dest
0x00000000 6651915    ed         777        393216     2          dest
0x00000000 19595276   ed         666        14400      1          dest
0x00000000 11272205   root       644        110592     2          dest

------ Semaphore Arrays --------
key        semid      owner      perms      nsems

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

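On the question of whether the key in the FATAL message maps to the key column of ipcs: the log message prints the key in decimal, while ipcs prints keys in hexadecimal, so the two can only be compared after conversion. A minimal Python sketch of that conversion follows; the helper name is made up for illustration.

```python
def key_as_ipcs_hex(key: int) -> str:
    """Format a decimal IPC key the way the ipcs "key" column displays it."""
    return f"0x{key:08x}"


# Key taken from the FATAL message in the log above.
print(key_as_ipcs_hex(9746001))  # prints 0x0094b651
```

No row in the listing above carries that key (every row shows 0x00000000), which would fit the segment simply not being there.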
"Ed L." <pgsql@bluepolka.net> writes:
> A power failure led to failed postmaster restart using 7.4.6 (see output
> below). The short-term fix is usually to delete the pid file and restart.
> I often wonder why ipcs never seems to show the shared memory
> block in question?

The shared memory block would certainly not still exist after a system reboot, so what we have here is a misleading error message. Looking at the code, the most plausible explanation appears to be that shmctl(IPC_STAT) is failing (which it ought to) and returning some errno code different from EINVAL (which is the case we are expecting to see). What platform are you on, and what does its shmctl(2) man page document as error conditions?

regards, tom lane

On Monday November 8 2004 6:16, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > A power failure led to failed postmaster restart using 7.4.6 (see
> > output below). The short-term fix is usually to delete the pid file
> > and restart.
> >
> > I often wonder why ipcs never seems to show the shared memory
> > block in question?
>
> The shared memory block would certainly not still exist after a system
> reboot, so what we have here is a misleading error message. Looking at
> the code, the most plausible explanation appears to be that
> shmctl(IPC_STAT) is failing (which it ought to) and returning some errno
> code different from EINVAL (which is the case we are expecting to see).
> What platform are you on, and what does its shmctl(2) man page document
> as error conditions?

Platform is Linux 2.4.20-30.9 on i686 (Pentium 4, I think). From man 2 shmctl:

ERRORS
    On error, errno will be set to one of the following:

    EACCES     is returned if IPC_STAT is requested and shm_perm.modes
               does not allow read access for shmid.

    EFAULT     The argument cmd has value IPC_SET or IPC_STAT but the
               address pointed to by buf isn’t accessible.

    EINVAL     is returned if shmid is not a valid identifier, or cmd is
               not a valid command.

    EIDRM      is returned if shmid points to a removed identifier.

    EPERM      is returned if IPC_SET or IPC_RMID is attempted, and the
               effective user ID of the calling process is not the creator
               (as found in shm_perm.cuid), the owner (as found in
               shm_perm.uid), or the super-user.

    EOVERFLOW  is returned if IPC_STAT is attempted, and the gid or uid
               value is too large to be stored in the structure pointed
               to by buf.

CONFORMING TO
    SVr4, SVID. SVr4 documents additional error conditions EINVAL, ENOENT,
    ENOSPC, ENOMEM, EEXIST. Neither SVr4 nor SVID documents an EIDRM error
    condition.

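Putting Tom's description next to the errno list above, the startup check can be pictured roughly as follows. This is a simplified Python sketch of the decision only, not the PostgreSQL source: the function name is hypothetical, it takes the errno from the shmctl(IPC_STAT) probe as an argument instead of making the call, and it treats a successful probe as "in use" without looking any further.

```python
import errno
from typing import Optional


def old_segment_blocks_startup(stat_errno: Optional[int]) -> bool:
    """Decide whether a leftover shared memory ID should block startup.

    stat_errno is the errno from probing the old ID with shmctl(IPC_STAT),
    or None if the probe succeeded.
    """
    if stat_errno == errno.EINVAL:
        # The expected case: the ID is no longer valid, so the old
        # segment is gone and it is safe to start.
        return False
    # The probe succeeded, or failed with some other code (EACCES and
    # so on): be conservative and report the block as still in use.
    return True


print(old_segment_blocks_startup(errno.EINVAL))  # prints False
print(old_segment_blocks_startup(errno.EACCES))  # prints True
```

Under this reading, any errno other than EINVAL would produce the "still in use" message even when the segment belongs to something else entirely.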
On Monday November 8 2004 7:24, Ed L. wrote:
> On Monday November 8 2004 6:16, Tom Lane wrote:
> > "Ed L." <pgsql@bluepolka.net> writes:
> > > A power failure led to failed postmaster restart using 7.4.6 (see
> > > output below). The short-term fix is usually to delete the pid file
> > > and restart.
> > >
> > > I often wonder why ipcs never seems to show the shared memory
> > > block in question?
> >
> > The shared memory block would certainly not still exist after a system
> > reboot, so what we have here is a misleading error message. Looking at
> > the code, the most plausible explanation appears to be that
> > shmctl(IPC_STAT) is failing (which it ought to) and returning some
> > errno code different from EINVAL (which is the case we are expecting to
> > see). What platform are you on, and what does its shmctl(2) man page
> > document as error conditions?
>
> Platform is Linux 2.4.20-30.9 on i686 (Pentium 4, I think).

I recently saw this same thing happen from a power failure on several HPUX boxes as well (I think running B.11.00/11.23 with 7.3.4/7.3.7, but not sure).

Ed

"Ed L." <pgsql@bluepolka.net> writes:
> A power failure led to failed postmaster restart using 7.4.6 (see
> output below). The short-term fix is usually to delete the pid file
> and restart.

Thinking some more about this ... does anyone know the algorithm used in Linux to assign shared memory segment IDs? Your report shows about a dozen shmem segments in use; which would put the probability of an accidental collision at pretty-tiny. But if the kernel's assignment algorithm is nonrandom then it'd be plausible for the Postgres shmem ID from the previous system boot cycle to match one of the shmem IDs already handed out in the current boot cycle. In that case we'd get EACCES from shmctl() which we take to be a trouble indication. (This is probably over-conservatism, but I don't want to relax it without knowing for sure that we need to.)

BTW, do you know what all those shmem segments are for? My Linux box shows only one segment in use besides the ones Postgres is using.

regards, tom lane

On Monday November 8 2004 8:41, Tom Lane wrote:
>
> BTW, do you know what all those shmem segments are for? My Linux box
> shows only one segment in use besides the ones Postgres is using.

Looks like Ximian Evolution apps, X, Mozilla, Wombat, etc ...

Ed

On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
> I often wonder why ipcs never seems to show the shared memory
> block in question?

The permissions of the shared memory block and the semaphore arrays are 600. ipcs seems not to report objects which you cannot access. Run ipcs as root and you should see the PostgreSQL shared memory segment and semaphores.

-- 
Oliver Elphick olly@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"O death, where is thy sting? O grave, where is thy victory?"
1 Corinthians 15:55

On Tuesday November 9 2004 2:16, Oliver Elphick wrote:
> On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
> > I often wonder why ipcs never seems to show the shared memory
> > block in question?
>
> The permissions of the shared memory block and the semaphore arrays are
> 600. ipcs seems not to report objects which you cannot access. Run
> ipcs as root and you should see the PostgreSQL shared memory segment and
> semaphores.

I don't see them when running ipcs as root, either. Not sure that would make sense given the shared memory is created as the same user running ipcs...

Ed

On Tue, 2004-11-09 at 07:00 -0700, Ed L. wrote:
> On Tuesday November 9 2004 2:16, Oliver Elphick wrote:
> > On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
> > > I often wonder why ipcs never seems to show the shared memory
> > > block in question?
> >
> > The permissions of the shared memory block and the semaphore arrays are
> > 600. ipcs seems not to report objects which you cannot access. Run
> > ipcs as root and you should see the PostgreSQL shared memory segment and
> > semaphores.
>
> I don't see them when running ipcs as root, either. Not sure that would
> make sense given the shared memory is created as the same user running
> ipcs...

If neither root nor their creator can see them, I assume they don't exist. Certainly, with Linux 2.6 and util-linux 2.12, ipcs sees the postgres objects whether it is run by root or by the postgres user.

-- 
Oliver Elphick olly@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"O death, where is thy sting? O grave, where is thy victory?"
1 Corinthians 15:55

Tom Lane <tgl@sss.pgh.pa.us> writes:
> "Ed L." <pgsql@bluepolka.net> writes:
> > A power failure led to failed postmaster restart using 7.4.6 (see
> > output below). The short-term fix is usually to delete the pid file
> > and restart.
>
> Thinking some more about this ... does anyone know the algorithm used
> in Linux to assign shared memory segment IDs?

At least in 2.6 it seems to avoid reuse of ids by keeping a global counter that is incremented every time a segment is created which ranges from 0..128k that it multiplies by 32k and adds to the array index (which is reused quickly).

So it doesn't seem plausible that there was an id collision unless this was different in 2.4.20. However looking at his list of ids they're all separated by multiples of 32769 which is what you would expect from this algorithm at least until they start being reused.

-- 
greg

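The scheme Greg describes (counter times 32k, plus array index) can be checked against the shmids in Ed's ipcs listing with a little arithmetic. The Python sketch below assumes "32k" means exactly 32768 and that the 2.4.20 kernel builds IDs the same way; neither is confirmed in this thread, and the function name is made up for illustration.

```python
from typing import Tuple

SEQ_MULTIPLIER = 32768  # assumed exact value of the "32k" factor


def split_shmid(shmid: int) -> Tuple[int, int]:
    """Return (counter value, array slot) for an ID built as counter * 32k + slot."""
    return divmod(shmid, SEQ_MULTIPLIER)


# shmid column from the ipcs output earlier in the thread, in listing order.
shmids = [
    32768, 131073, 3538946, 3670019, 4685828, 4816901, 4915206,
    4980743, 5046280, 5111817, 5537802, 6651915, 19595276, 11272205,
]

for shmid in shmids:
    counter, slot = split_shmid(shmid)
    print(f"{shmid:>9}  counter={counter:<4} slot={slot}")
```

Under that assumption the fourteen IDs decode to slots 0 through 13 in listing order, which fits an ID made of a reused slot number plus a multiple of 32768.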
Greg Stark <gsstark@MIT.EDU> writes:
> At least in 2.6 it seems to avoid reuse of ids by keeping a global counter
> that is incremented every time a segment is created which ranges from 0..128k
> that it multiplies by 32k and adds to the array index (which is reused
> quickly).
>
> So it doesn't seem plausible that there was an id collision unless this was
> different in 2.4.20. However looking at his list of ids they're all separated
> by multiples of 32769 which is what you would expect from this algorithm at
> least until they start being reused.

Oh I missed the fact that you were talking about after a reboot. So the algorithm I described would produce exactly the same sequence of ids after any reboot given the same sequence of creation and deletions. Even if there's a different sequence as long as the n'th creation is for the m'th array slot it would get the same id. So collisions would be very common.

(though it seems the sequence is shared across all the ipc objects.)

-- 
greg

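The reboot argument can be illustrated with a toy model of the allocator. This Python sketch is not kernel code: the class and function names are hypothetical, the counter is assumed to restart at 0 on every boot, the lowest free slot is assumed to be handed out first, and counter wraparound is ignored.

```python
SEQ_MULTIPLIER = 32768  # assumed exact value of the "32k" factor


class ToyIdAllocator:
    """Toy model: id = counter * 32k + slot, counter bumped on each creation."""

    def __init__(self) -> None:
        self.counter = 0   # global counter, assumed to restart at 0 on boot
        self.used = set()  # array slots currently occupied

    def create(self) -> int:
        slot = 0
        while slot in self.used:  # assumed policy: lowest free slot first
            slot += 1
        self.used.add(slot)
        ident = self.counter * SEQ_MULTIPLIER + slot
        self.counter += 1
        return ident

    def remove(self, ident: int) -> None:
        self.used.discard(ident % SEQ_MULTIPLIER)


def ids_after_boot(n: int) -> list:
    """Create n objects on a fresh allocator, as if right after a reboot."""
    alloc = ToyIdAllocator()
    return [alloc.create() for _ in range(n)]


first_boot = ids_after_boot(5)
second_boot = ids_after_boot(5)
print(first_boot)                 # prints [0, 32769, 65538, 98307, 131076]
print(first_boot == second_boot)  # prints True
```

Because nothing in this model is random, two boots that create objects in the same order hand out identical IDs, so a stale ID recorded before the crash can easily name some other object afterwards.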
Greg Stark <gsstark@MIT.EDU> writes:
> Oh I missed the fact that you were talking about after a reboot. So the
> algorithm I described would produce exactly the same sequence of ids after any
> reboot given the same sequence of creation and deletions. Even if there's a
> different sequence as long as the n'th creation is for the m'th array slot it
> would get the same id. So collisions would be very common.

This seems to square with Ed's complaint that he frequently sees a collision after a reboot. I've just committed some code that makes a more extensive check as to whether a pre-existing segment actually has any relevance to our data directory; should fix the problem.

regards, tom lane

On Tuesday November 9 2004 1:37, Tom Lane wrote:
> >> The shared memory block would certainly not still exist after a system
> >> reboot, so what we have here is a misleading error message. Looking
> >> at the code, the most plausible explanation appears to be that
> >> shmctl(IPC_STAT) is failing (which it ought to) and returning some
> >> errno code different from EINVAL (which is the case we are expecting
> >> to see).
>
> I believe the attached patch will fix this problem for you, at least on
> the assumption that you are starting only one postmaster at system boot.

Just realizing we do start multiple postmasters under the same user id when upgrading a cluster (one on old port, one on new).

I noticed that ipcs on my linux box has a command-line option to list the pid that created the segment. Not sure if such a library exists in usable form, but looking for segments owned by the downed postmaster's pid would seem to be what is needed. Just a thought...

Ed

"Ed L." <pgsql@bluepolka.net> writes:
> I noticed that ipcs on my linux box has a command-line option to list the
> pid that created the segment. Not sure if such a library exists in usable
> form, but looking for segments owned by the downed postmaster's pid would
> seem to be what is needed. Just a thought...

[ thinks about it... ] Nah, it's still not bulletproof, because in a system reboot situation you can't trust the old PID either. It could easily be that the other guy gets both the PID and the shmem ID that belonged to you last time.

I've committed changes for 8.0 that mark a shmem segment with the inode of the associated data directory; that should be a stable enough ID to handle all routine-reboot cases. (If you had to restore your whole filesystem from backup tapes, it might be wrong, but you're going to be doing such recovery manually anyway ...)

regards, tom lane

On Tuesday November 9 2004 4:35, Tom Lane wrote:
> "Ed L." <pgsql@bluepolka.net> writes:
> > I noticed that ipcs on my linux box has a command-line option to list
> > the pid that created the segment. Not sure if such a library exists in
> > usable form, but looking for segments owned by the downed postmaster's
> > pid would seem to be what is needed. Just a thought...
>
> [ thinks about it... ] Nah, it's still not bulletproof, because in a
> system reboot situation you can't trust the old PID either. It could
> easily be that the other guy gets both the PID and the shmem ID that
> belonged to you last time.

I see. Ipcs on my box also lists the date/time of shared memory segment attach/detach/change (ipcs -t), but ...

> I've committed changes for 8.0 that mark a shmem segment with the inode
> of the associated data directory; that should be a stable enough ID to
> handle all routine-reboot cases. (If you had to restore your whole
> filesystem from backup tapes, it might be wrong, but you're going to be
> doing such recovery manually anyway ...)

...that will remove a major hassle for us and lots of others. Thanks.

Ed
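
The idea behind the 8.0 change, marking the segment with the inode of its data directory, can be sketched as below. This is an illustrative Python sketch only, not the committed code: both function names are hypothetical, and how the marker is stored in and read back from the segment is left out entirely.

```python
import os


def data_dir_marker(data_dir: str) -> int:
    """Return the inode number of the data directory, used as an identity tag."""
    return os.stat(data_dir).st_ino


def segment_is_ours(marker_in_segment: int, data_dir: str) -> bool:
    """True if the marker read from an existing segment matches this data directory."""
    return marker_in_segment == data_dir_marker(data_dir)
```

A segment whose marker does not match the data directory being started would then be treated as unrelated, even if its ID happens to equal the one recorded before the crash. As Tom notes, the inode can change if the filesystem is restored from backup.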