Thread: Re: shmem_seq may be a bad idea
Tom Lane writes: > When a postmaster initially starts up, it uses a key value of > PortNumber * 1000. However, if it is forced to do a system-wide > restart because of a backend crash, it generates a new key value > different from the old one, namely PortNumber * 1000 + shmem_seq * > 100, so that the old shared-memory segments are discarded and a new > set is created. Why not use IPC_EXCL to ensure you're getting a freshly baked shmem segment rather than a recycled one? > The intent of this logic is evidently to ensure that the old, failed > backends can't somehow corrupt the new ones. But what if someone runs another postmaster at port 5433, will it eventually interfere? Or some totally different program? Trying to generate distinct number to use for keys is only one part of the equation, but you still have to check whether the number was distinct enough. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes: > Why not use IPC_EXCL to ensure you're getting a freshly baked shmem > segment rather than a recycled one? Hmm, that might work --- we'd need to add some logic to try to delete a conflicting segment and/or move on to another key if we can't. But that seems like it'd resolve both the wrong-size issue and the conflict-with-another-program issue. I like it. >> The intent of this logic is evidently to ensure that the old, failed >> backends can't somehow corrupt the new ones. > But what if someone runs another postmaster at port 5433, will it > eventually interfere? No; I neglected to mention that shmem_seq wraps around at 10. So there's no possible conflict between postmasters at different port#s in the current scheme. > Or some totally different program? That, on the other hand, is a very real risk. Trying a new key if we fail to get a new shmem seg (or semaphore set) seems like a good recovery method. We'd need to rejigger the meaning of shmem_seq a little bit --- it'd no longer be a global variable, but rather a local count of the number of tries to create a particular seg or set. So, logic would be something like for (seq = 0; seq < 10; seq++) { key = port*1000 + seq*100 + (id for particular item); shmctl(key, IPC_RMID, 0); // ignore error shmget(key, size, IPC_CREAT | IPC_EXCL); if (success) return key;}// if get here, report shmgetfailure ... It looks like we can apply the same method for semaphore sets, too. Sound good? regards, tom lane
On Mon, 1 May 2000, Tom Lane wrote: > > Why not use IPC_EXCL to ensure you're getting a freshly baked shmem > > segment rather than a recycled one? > > Hmm, that might work --- we'd need to add some logic to try to delete > a conflicting segment and/or move on to another key if we can't. How about you just pick a key (preferably using a standard method such as ftok, but the current is fine as well if you like the traceability of keys to servers) and if it's already used then increase it by one, try again. For efficiency you could keep the last key that worked in a global and start retrying from there. No need to try any fancy sequence number that wrap after 10 times anyway and thus don't help in general. A while ago while thinking about a way to make ipcclean better I thunk that perhaps the postmaster should write the keys of the segments it gets to a flat-text file. If it somehow crashes and loses track of what it allocated before it can use that information to clean up. Not sure how often that would take effect but it's very socially friendly. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
> A while ago while thinking about a way to make ipcclean better I thunk > that perhaps the postmaster should write the keys of the segments it gets > to a flat-text file. If it somehow crashes and loses track of what it > allocated before it can use that information to clean up. Not sure how > often that would take effect but it's very socially friendly. Hmm. Could we write this to a separate shared memory segment? Much more likely to be of fixed length and compatible between versions, and more likely to exist or not exist with the same behavior as the large shared memory segment under discussion?? - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Peter Eisentraut <e99re41@DoCS.UU.SE> writes: >>>> Why not use IPC_EXCL to ensure you're getting a freshly baked shmem >>>> segment rather than a recycled one? >> >> Hmm, that might work --- we'd need to add some logic to try to delete >> a conflicting segment and/or move on to another key if we can't. > How about you just pick a key (preferably using a standard method such as > ftok, but the current is fine as well if you like the traceability of keys > to servers) and if it's already used then increase it by one, try again. The thing I don't like about using ftok() is that it opens you up to cross-postmaster conflicts, if there's more than one postmaster running on the same machine. Since ftok is so weakly specified, we have no way to be sure that two different postmasters wouldn't generate the same key. As for the issue of whether to try to delete or not, we don't have automatic cleanup of no-longer-used segments unless we try to delete. That's why it's critical that we not have cross-postmaster conflicts. The skeleton code I exhibited yesterday depends on the assumption that any segment/semset that the postmaster has permission to delete is OK for it to delete. If there is a possibility of cross-postmaster collision then that doesn't work anymore. (Of course, if the postgres account also runs non-postgres applications that use shmem or semas, you could have trouble anyway ... but that seems unlikely, and perhaps more to the point it's pretty easy to avoid *as long as postgres generates predictable keys*. With ftok the keys aren't predictable, and you're trusting to luck that you don't have a collision.) > A while ago while thinking about a way to make ipcclean better I thunk > that perhaps the postmaster should write the keys of the segments it gets > to a flat-text file. If it somehow crashes and loses track of what it > allocated before it can use that information to clean up. We could do that, but I'm not sure it's necessary. With the code I proposed, the segments would essentially always have the same keys, and so restarting the postmaster would clean up automatically. (The only time the sequence number would get above 0 would be if you had a collision with an ftok-using app, and presumably on your next try you'd get the same collision and end up with the same final choice of sequence number ...) regards, tom lane
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> A while ago while thinking about a way to make ipcclean better I thunk >> that perhaps the postmaster should write the keys of the segments it gets >> to a flat-text file. > Hmm. Could we write this to a separate shared memory segment? Much > more likely to be of fixed length and compatible between versions, and > more likely to exist or not exist with the same behavior as the large > shared memory segment under discussion?? What happens if you get a key collision with some other application for that segment? Seems to me that using shmem to remember where you put your shmem segments is dangerously circular ;-) The flat text file is not a bad idea, but I think the logic I suggested yesterday makes it unnecessary... regards, tom lane
Thomas Lockhart writes: > > A while ago while thinking about a way to make ipcclean better I thunk > > that perhaps the postmaster should write the keys of the segments it gets > > to a flat-text file. If it somehow crashes and loses track of what it > > allocated before it can use that information to clean up. Not sure how > > often that would take effect but it's very socially friendly. > > Hmm. Could we write this to a separate shared memory segment? But how would ipcclean get to the key of *that* segment? I was thinking file because that'd always be in a known location and could also be accessible to humans to sort things out by hand or debug stuff. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Tom Lane writes: > The thing I don't like about using ftok() is that it opens you up to > cross-postmaster conflicts, if there's more than one postmaster running > on the same machine. Since ftok is so weakly specified, we have no way > to be sure that two different postmasters wouldn't generate the same > key. The name ftok=file-to-key implies that the file name is being used to create some fairly unique keys. All systems I peeked at used the inode and device number of the file name you give it. Surely OS vendors could alias ftok() to random(), but I'd like to give them a little credit at least. But this is not the issue, I don't care what the key making scheme is. It could in fact be random(). But you *must* use IPC_EXCL to check if the segment already exists and change your key if so. If you do that then you *never* have a conflict, no matter what you do. That should be a no-brainer. Temporary files are handled this way as well (in theory anyway). > (Of course, if the postgres account also runs non-postgres applications > that use shmem or semas, you could have trouble anyway ... but that > seems unlikely, and perhaps more to the point it's pretty easy to avoid > *as long as postgres generates predictable keys*. But you still have to trust other applications to avoid the PostgreSQL key space, which they probably couldn't care less about. The whole point of this exercise is to increase fault-tolerance, so you shouldn't assume too much about a sane environment. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes: > But this is not the issue, I don't care what the key making scheme is. It > could in fact be random(). But you *must* use IPC_EXCL to check if the > segment already exists and change your key if so. Oh, I agree that IPC_EXCL would be an improvement. I'm just dubious that using ftok() is an improvement. >> Since ftok is so weakly specified, we have no way >> to be sure that two different postmasters wouldn't generate the same >> key. > The name ftok=file-to-key implies that the file name is being used to > create some fairly unique keys. All systems I peeked at used the inode and > device number of the file name you give it. Yup, that's what I would have expected from the way it's described in the man page. My point is you can't put more than 32 bits of stuff in a 32-bit sack; therefore, ftok cannot guarantee that it delivers a distinct token for every combination of inode + device + application key. I doubt it even tries very hard, more likely just xor's them together and trusts to luck. So I'd rather use a predictable mapping that we *know* will not generate cross-postmaster conflicts between postmasters running concurrently on different ports. However, IPC_EXCL would definitely make us more robust against other sorts of conflicts, so I agree that's a good change to make. TODO list entry, please, Bruce? * Use IPC_EXCL when creating shared memory and semaphores regards, tom lane
> However, IPC_EXCL would definitely make us more robust against other > sorts of conflicts, so I agree that's a good change to make. TODO > list entry, please, Bruce? > * Use IPC_EXCL when creating shared memory and semaphores Done. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026