Thread: Do we need a ShmList implementation?
On the Serializable Snapshot Isolation thread, Heikki pointed out a collection of objects in an HTAB which didn't really need its key on VirtualTransactionId, but there isn't really any other useful key, either. One of these objects may live and die, seeing use from multiple processes, without ever getting a TransactionId assigned; and it needs to be in a collection in shared memory the whole time. This suggests to me that some sort of list would be better.

SHM_QUEUE objects provide the infrastructure for maintaining a shared memory linked list, but they don't do anything about the allocation and release of the space for the objects. So it occurs to me that I'm using an HTAB for this collection because it provides the infrastructure for managing the memory for the collection, rather than because I need hash lookup. :-( It works, but that hardly seems optimal.

Have I missed something we already have which could meet that need? If not, how would people feel about a ShmList implementation? A quick first draft of the API (which can almost certainly be improved, so don't be shy) is:

    ShmList ShmInitList(const char *name, Size entrySize,
                        int initialEntryAlloc, int maxExtensions);
    Size ShmListEstimateSize(ShmList list);
    void *CreateShmListEntry(ShmList list);
    void ReleaseShmListEntry(ShmList list, void *entry);
    int ShmListSize(ShmList list);
    void *ShmListFirst(ShmList list);
    void *ShmListNext(ShmList list, void *entry);

I see this as grabbing the initial allocation, filling it with zeros, and then creating a linked list of available entries. Internally the entries would be a SHM_QUEUE structure followed by space for the entrySize passed on init. A "create entry" call would remove an entry from the available list, link it into the collection, and return a pointer to the structure. Releasing an entry would remove it from the collection list, zero it, and link it to the available list.

Hopefully the rest is fairly self-evident -- if not, let me know.

Thoughts?

-Kevin
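To make the mechanics just described concrete, here is a minimal sketch of how the create/release cycle might be built on the SHM_QUEUE primitives in storage/shmem.h as they stood at the time. Only the names from the draft API above are Kevin's; the internal struct layout and field names are hypothetical, and alignment padding is glossed over.

    #include "postgres.h"
    #include "storage/shmem.h"

    /* Hypothetical internal layout; callers would never see these structs. */
    typedef struct ShmListEntryHeader
    {
        SHM_QUEUE   links;      /* position in "collection" or "available" */
        /* entrySize bytes of caller data follow (real code would MAXALIGN) */
    } ShmListEntryHeader;

    typedef struct ShmListData
    {
        SHM_QUEUE   collection; /* entries currently in use */
        SHM_QUEUE   available;  /* zeroed entries awaiting reuse */
        Size        entrySize;  /* caller-visible size of each entry */
        int         maxEntries; /* initial allocation plus all extensions */
        int         count;      /* number of in-use entries */
    } ShmListData;

    typedef ShmListData *ShmList;

    void *
    CreateShmListEntry(ShmList list)
    {
        /* take the first entry off the available list, if there is one */
        ShmListEntryHeader *hdr = (ShmListEntryHeader *)
            SHMQueueNext(&list->available, &list->available,
                         offsetof(ShmListEntryHeader, links));

        if (hdr == NULL)
            return NULL;        /* free list exhausted */
        SHMQueueDelete(&hdr->links);
        SHMQueueInsertBefore(&list->collection, &hdr->links);
        list->count++;
        return (void *) (hdr + 1);  /* caller sees only its own struct */
    }

    void
    ReleaseShmListEntry(ShmList list, void *entry)
    {
        ShmListEntryHeader *hdr = ((ShmListEntryHeader *) entry) - 1;

        SHMQueueDelete(&hdr->links);
        MemSet(entry, 0, list->entrySize);  /* zero it for the next user */
        SHMQueueInsertBefore(&list->available, &hdr->links);
        list->count--;
    }

ShmListFirst() and ShmListNext() would walk the collection queue the same way, via SHMQueueNext() with the same link offset.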
On 20/09/10 18:12, Kevin Grittner wrote:
> On the Serializable Snapshot Isolation thread, Heikki pointed out a collection of objects in an HTAB which didn't really need its key on VirtualTransactionId, but there isn't really any other useful key, either. One of these objects may live and die, seeing use from multiple processes, without ever getting a TransactionId assigned; and it needs to be in a collection in shared memory the whole time. This suggests to me that some sort of list would be better.

In the SSI patch, you'd also need a way to insert an existing struct into a hash table. You currently work around that by using a hash element that contains only the hash key, and a pointer to the SERIALIZABLEXACT struct. It isn't too bad I guess, but I find it a bit confusing.

> SHM_QUEUE objects provide the infrastructure for maintaining a shared memory linked list, but they don't do anything about the allocation and release of the space for the objects. So it occurs to me that I'm using an HTAB for this collection because it provides the infrastructure for managing the memory for the collection, rather than because I need hash lookup. :-( It works, but that hardly seems optimal.
>
> Have I missed something we already have which could meet that need?

Well, we generally try to avoid dynamic structures in shared memory, because shared memory can't be resized. So, you'd typically use an array with a fixed number of elements. One could even argue that we specifically *don't* want to have the kind of infrastructure you propose, to discourage people from writing patches that need dynamic shmem structures.

Any chance of collapsing together entries of already-committed transactions in the SSI patch, to put an upper limit on the number of shmem list entries needed? If you can do that, then a simple array allocated at postmaster startup will do fine.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
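For contrast, here is a minimal sketch of the fixed-allocation pattern Heikki describes, using the existing ShmemInitStruct() machinery at postmaster startup. The slot multiplier, struct contents, and function names are invented for illustration; SxactSlot stands in for the real SERIALIZABLEXACT.

    #include "postgres.h"
    #include "miscadmin.h"          /* MaxBackends */
    #include "storage/shmem.h"

    #define SXACT_SLOTS_PER_BACKEND 10  /* arbitrary illustrative multiplier */

    typedef struct SxactSlot        /* stand-in for SERIALIZABLEXACT */
    {
        bool        inUse;
        /* ... per-transaction SSI state ... */
    } SxactSlot;

    static SxactSlot *sxactSlots;

    /* Included in the shared-memory size estimate at startup. */
    Size
    SxactArrayShmemSize(void)
    {
        return mul_size(SXACT_SLOTS_PER_BACKEND * MaxBackends,
                        sizeof(SxactSlot));
    }

    /* Called once during shared-memory initialization; never resized. */
    void
    SxactArrayShmemInit(void)
    {
        bool        found;

        sxactSlots = (SxactSlot *)
            ShmemInitStruct("SERIALIZABLEXACT slots",
                            SxactArrayShmemSize(), &found);
        if (!found)
            MemSet(sxactSlots, 0, SxactArrayShmemSize());
    }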
Kevin,

On 09/20/2010 05:12 PM, Kevin Grittner wrote:
> SHM_QUEUE objects provide the infrastructure for maintaining a shared memory linked list, but they don't do anything about the allocation and release of the space for the objects.

Did you have a look at my dynshmem stuff? It tries to solve the problem of dynamic allocation from shared memory. Not just for lists, but very generally.

Regards

Markus Wanner
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> In the SSI patch, you'd also need a way to insert an existing struct into a hash table. You currently work around that by using a hash element that contains only the hash key, and a pointer to the SERIALIZABLEXACT struct. It isn't too bad I guess, but I find it a bit confusing.

Hmmm... Mucking with the hash table implementation to accommodate that seems like a lot of work and risk for pretty minimal benefit. Are you sure it's worth it? Perhaps better commenting around the SERIALIZABLEXID structure to indicate it's effectively used as a non-primary-key index into the other collection?

> Well, we generally try to avoid dynamic structures in shared memory, because shared memory can't be resized.

But don't HTAB structures go beyond their estimated sizes as needed? I was trying to accommodate the situation where one collection might not be anywhere near its limit while some other collection has edged past. Unless I'm misunderstanding things (which is always possible), the current HTAB implementation takes advantage of the "slush fund" of unused space to some degree. I was just trying to maintain the same flexibility with the list.

I was thinking of returning a size based on the *maximum* allowed allocations from the estimated-size function, and actually limiting the list to that size. So it wasn't so much a matter of grabbing more than expected as of leaving something for the hash table slush if possible. Of course, I was also thinking that this would allow one to be a little more generous with the maximum, as it might have benefit elsewhere...

> So, you'd typically use an array with a fixed number of elements.

That's certainly a little easier, if you think it's better.

> Any chance of collapsing together entries of already-committed transactions in the SSI patch, to put an upper limit on the number of shmem list entries needed? If you can do that, then a simple array allocated at postmaster startup will do fine.

I suspect it can be done, but I'm quite sure that any such scheme would increase the rate of serialization failures. Right now I'm trying to see how much I can do to *decrease* the rate of serialization failures, so I'm not eager to go there. :-/ If it is necessary, the most obvious way to manage this is just to force cancellation of the oldest running serializable transaction and run ClearOldPredicateLocks(), perhaps iterating, until we free an entry to service the new request.

-Kevin
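The sizing behavior Kevin describes might look like the following, reusing the hypothetical ShmListData layout sketched earlier: the estimate function reports the list's maximum possible footprint, and the list is then hard-limited to it, so the list never grows into the shared "slush fund" the way an HTAB can.

    /* Report the most shared memory this list will ever consume; the list
     * is capped at this, leaving any "slush fund" to the hash tables. */
    Size
    ShmListEstimateSize(ShmList list)
    {
        Size        perEntry = add_size(sizeof(SHM_QUEUE),
                                        MAXALIGN(list->entrySize));

        return add_size(MAXALIGN(sizeof(ShmListData)),
                        mul_size((Size) list->maxEntries, perEntry));
    }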
Markus Wanner <markus@bluegap.ch> wrote:
> On 09/20/2010 05:12 PM, Kevin Grittner wrote:
>> SHM_QUEUE objects provide the infrastructure for maintaining a shared memory linked list, but they don't do anything about the allocation and release of the space for the objects.
>
> Did you have a look at my dynshmem stuff? It tries to solve the problem of dynamic allocation from shared memory. Not just for lists, but very generally.

Yeah, I mostly followed that thread. If such a feature were present, it might well make sense to use it for this; however, I've got enough trouble selling the SSI technology without making it dependent on something else which was clearly quite controversial, and which seemed to have some technical hurdles of its own left to clear. :-/

At the point where there is an implementation which is accepted by the community, I'll certainly take another look.

-Kevin
On Mon, 2010-09-20 at 18:37 +0300, Heikki Linnakangas wrote:

>> SHM_QUEUE objects provide the infrastructure for maintaining a shared memory linked list, but they don't do anything about the allocation and release of the space for the objects. So it occurs to me that I'm using an HTAB for this collection because it provides the infrastructure for managing the memory for the collection, rather than because I need hash lookup. :-( It works, but that hardly seems optimal.
>>
>> Have I missed something we already have which could meet that need?
>
> Well, we generally try to avoid dynamic structures in shared memory, because shared memory can't be resized. So, you'd typically use an array with a fixed number of elements. One could even argue that we specifically *don't* want to have the kind of infrastructure you propose, to discourage people from writing patches that need dynamic shmem structures.

My understanding is that we used to have that and it was removed for the reasons Heikki states. There are still vestigial bits in the code.

Not exactly impressed with the SHM_QUEUE stuff, though, so I appreciate the sentiment that Kevin expresses.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> wrote:

> My understanding is that we used to have that and it was removed for the reasons Heikki states. There are still vestigial bits in the code.
>
> Not exactly impressed with the SHM_QUEUE stuff, though, so I appreciate the sentiment that Kevin expresses.

So, if I just allocated a fixed memory space to provide an API similar to my previous post, does that sound reasonable to you?

For the record, my intention would be to hide the SHM_QUEUE structures in this API -- an entry would be just the structure you're interested in working with. If practical, I would prefer for ShmList to be a pointer to an opaque structure; users of this shouldn't really be exposed to or depend upon the implementation.

-Kevin
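The opaque-handle arrangement Kevin mentions is the usual C idiom: the public header declares only a pointer to an incomplete type, and the full layout -- SHM_QUEUE plumbing included -- is private to the one file that implements it. A sketch, with hypothetical file names:

    /* shmlist.h -- callers see only an opaque handle */
    typedef struct ShmListData *ShmList;

    extern void *CreateShmListEntry(ShmList list);
    extern void ReleaseShmListEntry(ShmList list, void *entry);

    /* shmlist.c -- the real layout, invisible to callers */
    struct ShmListData
    {
        SHM_QUEUE   collection;
        SHM_QUEUE   available;
        Size        entrySize;
        int         count;
    };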
On 20/09/10 19:04, Kevin Grittner wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
>
>> In the SSI patch, you'd also need a way to insert an existing struct into a hash table. You currently work around that by using a hash element that contains only the hash key, and a pointer to the SERIALIZABLEXACT struct. It isn't too bad I guess, but I find it a bit confusing.
>
> Hmmm... Mucking with the hash table implementation to accommodate that seems like a lot of work and risk for pretty minimal benefit. Are you sure it's worth it?

No, I'm not sure at all.

>> Well, we generally try to avoid dynamic structures in shared memory, because shared memory can't be resized.
>
> But don't HTAB structures go beyond their estimated sizes as needed?

Yes, but not in a very smart way. The memory allocated for hash table elements is never freed. So if you use up all the "slush fund" shared memory for SIREAD locks, it can't be used for anything else anymore, even if the SIREAD locks are later released.

>> Any chance of collapsing together entries of already-committed transactions in the SSI patch, to put an upper limit on the number of shmem list entries needed? If you can do that, then a simple array allocated at postmaster startup will do fine.
>
> I suspect it can be done, but I'm quite sure that any such scheme would increase the rate of serialization failures. Right now I'm trying to see how much I can do to *decrease* the rate of serialization failures, so I'm not eager to go there. :-/

I see. It's worth spending some mental power on; an upper limit would make life a lot easier. It doesn't matter much whether it's 2*max_connections or 100*max_connections, as long as it's finite.

> If it is necessary, the most obvious way to manage this is just to force cancellation of the oldest running serializable transaction and run ClearOldPredicateLocks(), perhaps iterating, until we free an entry to service the new request.

Hmm, that's not very appealing either. But perhaps it's still better than not letting any new transactions begin. We could say "snapshot too old" in the error message :-).

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Simon Riggs <simon@2ndQuadrant.com> wrote: >> My understanding is that we used to have that and it was removed >> for the reasons Heikki states. There are still vestigial bits >> still in code. There's nothing vestigial about SHM_QUEUE --- it's used by the lock manager. But it's intended to link together structs whose existence is managed by somebody else. >> Not exactly impressed with the SHM_QUEUE stuff though, so I >> appreciate the sentiment that Kevin expresses. > So, if I just allocated a fixed memory space to provide an API > similar to my previous post, does that sound reasonable to you? I'm not excited about inventing an API with just one use-case; it's unlikely that you actually end up with anything generally useful. (SHM_QUEUE seems like a case in point...) Especially when there are so many other constraints on what shared memory is usable for. You might as well just do this internally to the SERIALIZABLEXACT management code. regards, tom lane
On Mon, 2010-09-20 at 12:35 -0400, Tom Lane wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Simon Riggs <simon@2ndQuadrant.com> wrote:
>>> My understanding is that we used to have that and it was removed for the reasons Heikki states. There are still vestigial bits in the code.
>
> There's nothing vestigial about SHM_QUEUE --- it's used by the lock manager.

Yes, I was talking about an implementation that allocated memory as well. There are sections of IFDEF'd-out code there...

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> There's nothing vestigial about SHM_QUEUE --- it's used by the lock manager. But it's intended to link together structs whose existence is managed by somebody else.

Yep, that's exactly my problem.

> I'm not excited about inventing an API with just one use-case; it's unlikely that you actually end up with anything generally useful. (SHM_QUEUE seems like a case in point...) Especially when there are so many other constraints on what shared memory is usable for. You might as well just do this internally to the SERIALIZABLEXACT management code.

Fair enough. I'll probably abstract it within the SSI patch anyway, just because it will keep the other code cleaner where the logic is necessarily kinda messy anyway, and I think it'll reduce the chance of weird memory bugs. I just won't get quite so formal about the interface.

-Kevin
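Abstracting it within the SSI patch could be as simple as keeping the same create/release/iterate operations but making them static to the predicate-lock module and fixing the element type to SERIALIZABLEXACT. These prototypes are a guess at the shape, not the actual patch:

    /* Hypothetical module-private versions of the ShmList operations,
     * specialized to SERIALIZABLEXACT inside the SSI code. */
    static SERIALIZABLEXACT *CreatePredXact(void);
    static void ReleasePredXact(SERIALIZABLEXACT *sxact);
    static SERIALIZABLEXACT *FirstPredXact(void);
    static SERIALIZABLEXACT *NextPredXact(SERIALIZABLEXACT *sxact);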
On 09/20/2010 06:09 PM, Kevin Grittner wrote:
> Yeah, I mostly followed that thread. If such a feature were present, it might well make sense to use it for this; however, I've got enough trouble selling the SSI technology without making it dependent on something else which was clearly quite controversial, and which seemed to have some technical hurdles of its own left to clear. :-/

Okay, well understandable. I'm wondering how you want to implement the memory allocation part, though.

> At the point where there is an implementation which is accepted by the community, I'll certainly take another look.

Fair enough, thanks.

Regards

Markus Wanner
Markus Wanner <markus@bluegap.ch> wrote:

> I'm wondering how you want to implement the memory allocation part

Based on the feedback I've received, it appears that the only sane way to do that in the current shared memory environment is to allocate a fixed amount of memory to hold these entries at postmaster startup. To minimize the chance that we'll be forced to cancel running transactions to deal with the limit, it will need to be sized to some multiple of max_connections.

Obviously, if there were a dynamic way to add entries as needed, there would be one less setting (hard-coded or GUC) to worry about getting right. Too low means transactions need to be canceled; too high means you're wasting space which could otherwise go to caching. And of course, the optimal number could change from day to day or hour to hour.

-Kevin
On 09/20/2010 08:06 PM, Kevin Grittner wrote:
> Obviously, if there were a dynamic way to add entries as needed, there would be one less setting (hard-coded or GUC) to worry about getting right. Too low means transactions need to be canceled; too high means you're wasting space which could otherwise go to caching. And of course, the optimal number could change from day to day or hour to hour.

Yeah, same problem as with lots of the other users of shared memory. It certainly makes sense to decouple the two projects, so you'll have to pick some number that sounds good to you now.

Regards

Markus
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: >> I'm not excited about inventing an API with just one use-case; >> it's unlikely that you actually end up with anything generally >> useful. (SHM_QUEUE seems like a case in point...) Especially >> when there are so many other constraints on what shared memory is >> usable for. You might as well just do this internally to the >> SERIALIZABLEXACT management code. > > Fair enough. I'll probably abstract it within the SSI patch > anyway, just because it will keep the other code cleaner where the > logic is necessarily kinda messy anyway, and I think it'll reduce > the chance of weird memory bugs. I just won't get quite so formal > about the interface. OK, I'd say it's a little rough yet, but it works. Is this reasonable?: http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=b8eca245ab63725d0fbfc3b5969f4a17fc765f2c In particular, I'm a little squeamish about how I allocated the shared memory for the list, but I couldn't think of anything that seemed better. -Kevin
Hi,

On 09/21/2010 05:48 PM, Kevin Grittner wrote:
> OK, I'd say it's a little rough yet, but it works. Is this reasonable?
>
> http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=b8eca245ab63725d0fbfc3b5969f4a17fc765f2c

I only get a "404 - Unknown commit object" on that link. Did you push your work?

Regards

Markus Wanner
Markus Wanner <markus@bluegap.ch> wrote:

> I only get a "404 - Unknown commit object" on that link. Did you push your work?

Yeah, but it has since been blown away (at my request) as part of my attempt to get it based on the new git conversion. Sorry about that. Attached is an mbox representation of what was there. Hopefully I can get it out there again today some time.

-Kevin