Thread: Re: shmem_seq may be a bad idea

Re: shmem_seq may be a bad idea

From
Peter Eisentraut
Date:
Tom Lane writes:

> When a postmaster initially starts up, it uses a key value of
> PortNumber * 1000. However, if it is forced to do a system-wide
> restart because of a backend crash, it generates a new key value
> different from the old one, namely PortNumber * 1000 + shmem_seq *
> 100, so that the old shared-memory segments are discarded and a new
> set is created.

Why not use IPC_EXCL to ensure you're getting a freshly baked shmem
segment rather than a recycled one?

> The intent of this logic is evidently to ensure that the old, failed
> backends can't somehow corrupt the new ones.

But what if someone runs another postmaster at port 5433, will it
eventually interfere? Or some totally different program? Trying to
generate distinct number to use for keys is only one part of the equation,
but you still have to check whether the number was distinct enough.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: shmem_seq may be a bad idea

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Why not use IPC_EXCL to ensure you're getting a freshly baked shmem
> segment rather than a recycled one?

Hmm, that might work --- we'd need to add some logic to try to delete
a conflicting segment and/or move on to another key if we can't.
But that seems like it'd resolve both the wrong-size issue and the
conflict-with-another-program issue.  I like it.

>> The intent of this logic is evidently to ensure that the old, failed
>> backends can't somehow corrupt the new ones.

> But what if someone runs another postmaster at port 5433, will it
> eventually interfere?

No; I neglected to mention that shmem_seq wraps around at 10.  So
there's no possible conflict between postmasters at different port#s
in the current scheme.

> Or some totally different program?

That, on the other hand, is a very real risk.  Trying a new key if we
fail to get a new shmem seg (or semaphore set) seems like a good
recovery method.

We'd need to rejigger the meaning of shmem_seq a little bit --- it'd
no longer be a global variable, but rather a local count of the number
of tries to create a particular seg or set.  So, logic would be
something like
for (seq = 0; seq < 10; seq++) {    key = port*1000 + seq*100 + (id for particular item);    shmctl(key, IPC_RMID, 0);
// ignore error    shmget(key, size, IPC_CREAT | IPC_EXCL);    if (success)        return key;}// if get here, report
shmgetfailure ...
 

It looks like we can apply the same method for semaphore sets, too.

Sound good?
        regards, tom lane


Re: shmem_seq may be a bad idea

From
Peter Eisentraut
Date:
On Mon, 1 May 2000, Tom Lane wrote:

> > Why not use IPC_EXCL to ensure you're getting a freshly baked shmem
> > segment rather than a recycled one?
> 
> Hmm, that might work --- we'd need to add some logic to try to delete
> a conflicting segment and/or move on to another key if we can't.

How about you just pick a key (preferably using a standard method such as
ftok, but the current is fine as well if you like the traceability of keys
to servers) and if it's already used then increase it by one, try again.
For efficiency you could keep the last key that worked in a global and
start retrying from there. No need to try any fancy sequence number that
wrap after 10 times anyway and thus don't help in general.

A while ago while thinking about a way to make ipcclean better I thunk
that perhaps the postmaster should write the keys of the segments it gets
to a flat-text file. If it somehow crashes and loses track of what it
allocated before it can use that information to clean up. Not sure how
often that would take effect but it's very socially friendly.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: shmem_seq may be a bad idea

From
Thomas Lockhart
Date:
> A while ago while thinking about a way to make ipcclean better I thunk
> that perhaps the postmaster should write the keys of the segments it gets
> to a flat-text file. If it somehow crashes and loses track of what it
> allocated before it can use that information to clean up. Not sure how
> often that would take effect but it's very socially friendly.

Hmm. Could we write this to a separate shared memory segment? Much
more likely to be of fixed length and compatible between versions, and
more likely to exist or not exist with the same behavior as the large
shared memory segment under discussion??
                       - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: shmem_seq may be a bad idea

From
Tom Lane
Date:
Peter Eisentraut <e99re41@DoCS.UU.SE> writes:
>>>> Why not use IPC_EXCL to ensure you're getting a freshly baked shmem
>>>> segment rather than a recycled one?
>> 
>> Hmm, that might work --- we'd need to add some logic to try to delete
>> a conflicting segment and/or move on to another key if we can't.

> How about you just pick a key (preferably using a standard method such as
> ftok, but the current is fine as well if you like the traceability of keys
> to servers) and if it's already used then increase it by one, try again.

The thing I don't like about using ftok() is that it opens you up to
cross-postmaster conflicts, if there's more than one postmaster running
on the same machine.  Since ftok is so weakly specified, we have no way
to be sure that two different postmasters wouldn't generate the same
key.

As for the issue of whether to try to delete or not, we don't have
automatic cleanup of no-longer-used segments unless we try to delete.
That's why it's critical that we not have cross-postmaster conflicts.
The skeleton code I exhibited yesterday depends on the assumption that
any segment/semset that the postmaster has permission to delete is OK
for it to delete.  If there is a possibility of cross-postmaster
collision then that doesn't work anymore.

(Of course, if the postgres account also runs non-postgres applications
that use shmem or semas, you could have trouble anyway ... but that
seems unlikely, and perhaps more to the point it's pretty easy to avoid
*as long as postgres generates predictable keys*.  With ftok the keys
aren't predictable, and you're trusting to luck that you don't have a
collision.)

> A while ago while thinking about a way to make ipcclean better I thunk
> that perhaps the postmaster should write the keys of the segments it gets
> to a flat-text file. If it somehow crashes and loses track of what it
> allocated before it can use that information to clean up.

We could do that, but I'm not sure it's necessary.  With the code I
proposed, the segments would essentially always have the same keys,
and so restarting the postmaster would clean up automatically.  (The
only time the sequence number would get above 0 would be if you had
a collision with an ftok-using app, and presumably on your next try
you'd get the same collision and end up with the same final choice
of sequence number ...)
        regards, tom lane


Re: shmem_seq may be a bad idea

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> A while ago while thinking about a way to make ipcclean better I thunk
>> that perhaps the postmaster should write the keys of the segments it gets
>> to a flat-text file.

> Hmm. Could we write this to a separate shared memory segment? Much
> more likely to be of fixed length and compatible between versions, and
> more likely to exist or not exist with the same behavior as the large
> shared memory segment under discussion??

What happens if you get a key collision with some other application
for that segment?  Seems to me that using shmem to remember where you
put your shmem segments is dangerously circular ;-)

The flat text file is not a bad idea, but I think the logic I suggested
yesterday makes it unnecessary...
        regards, tom lane


Re: shmem_seq may be a bad idea

From
Peter Eisentraut
Date:
Thomas Lockhart writes:

> > A while ago while thinking about a way to make ipcclean better I thunk
> > that perhaps the postmaster should write the keys of the segments it gets
> > to a flat-text file. If it somehow crashes and loses track of what it
> > allocated before it can use that information to clean up. Not sure how
> > often that would take effect but it's very socially friendly.
> 
> Hmm. Could we write this to a separate shared memory segment?

But how would ipcclean get to the key of *that* segment? I was thinking
file because that'd always be in a known location and could also be
accessible to humans to sort things out by hand or debug stuff.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: shmem_seq may be a bad idea

From
Peter Eisentraut
Date:
Tom Lane writes:

> The thing I don't like about using ftok() is that it opens you up to
> cross-postmaster conflicts, if there's more than one postmaster running
> on the same machine.  Since ftok is so weakly specified, we have no way
> to be sure that two different postmasters wouldn't generate the same
> key.

The name ftok=file-to-key implies that the file name is being used to
create some fairly unique keys. All systems I peeked at used the inode and
device number of the file name you give it. Surely OS vendors could alias
ftok() to random(), but I'd like to give them a little credit at least.

But this is not the issue, I don't care what the key making scheme is. It
could in fact be random(). But you *must* use IPC_EXCL to check if the
segment already exists and change your key if so. If you do that then you
*never* have a conflict, no matter what you do. That should be a
no-brainer. Temporary files are handled this way as well (in theory
anyway).

> (Of course, if the postgres account also runs non-postgres applications
> that use shmem or semas, you could have trouble anyway ... but that
> seems unlikely, and perhaps more to the point it's pretty easy to avoid
> *as long as postgres generates predictable keys*.

But you still have to trust other applications to avoid the PostgreSQL key
space, which they probably couldn't care less about. The whole point of
this exercise is to increase fault-tolerance, so you shouldn't assume too
much about a sane environment.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: shmem_seq may be a bad idea

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> But this is not the issue, I don't care what the key making scheme is. It
> could in fact be random(). But you *must* use IPC_EXCL to check if the
> segment already exists and change your key if so.

Oh, I agree that IPC_EXCL would be an improvement.  I'm just dubious
that using ftok() is an improvement.

>> Since ftok is so weakly specified, we have no way
>> to be sure that two different postmasters wouldn't generate the same
>> key.

> The name ftok=file-to-key implies that the file name is being used to
> create some fairly unique keys. All systems I peeked at used the inode and
> device number of the file name you give it.

Yup, that's what I would have expected from the way it's described in
the man page.  My point is you can't put more than 32 bits of stuff in
a 32-bit sack; therefore, ftok cannot guarantee that it delivers a
distinct token for every combination of inode + device + application
key.  I doubt it even tries very hard, more likely just xor's them
together and trusts to luck.  So I'd rather use a predictable mapping
that we *know* will not generate cross-postmaster conflicts between
postmasters running concurrently on different ports.

However, IPC_EXCL would definitely make us more robust against other
sorts of conflicts, so I agree that's a good change to make.  TODO
list entry, please, Bruce?
* Use IPC_EXCL when creating shared memory and semaphores
        regards, tom lane


Re: shmem_seq may be a bad idea

From
Bruce Momjian
Date:
> However, IPC_EXCL would definitely make us more robust against other
> sorts of conflicts, so I agree that's a good change to make.  TODO
> list entry, please, Bruce?
> * Use IPC_EXCL when creating shared memory and semaphores

Done.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026