Re: cheaper snapshots redux - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: cheaper snapshots redux
Msg-id: CA+TgmoZHt4iGyVp3vxOTOi8ev4J_faWZNQ-3OXgcabed7uFaDA@mail.gmail.com
In response to: Re: cheaper snapshots redux (Jim Nasby <jim@nasby.net>)
List: pgsql-hackers
On Mon, Aug 22, 2011 at 6:45 PM, Jim Nasby <jim@nasby.net> wrote:
> Something that would be really nice to fix is our reliance on a fixed size of shared memory, and I'm wondering if this could be an opportunity to start in a new direction. My thought is that we could maintain two distinct shared memory snapshots and alternate between them. That would allow us to actually resize them as needed. We would still need something like what you suggest to allow for adding to the list without locking, but with this scheme we wouldn't need to worry about extra locking when taking a snapshot, since we'd be doing that in a new segment that no one is using yet.
>
> The downside is such a scheme adds non-trivial complexity on top of what you proposed. I suspect it would be much better if we had a separate mechanism for dealing with shared memory requirements (shalloc?). But if it's just not practical to make a generic shared memory manager, it would be good to start thinking about ways we can work around fixed shared memory size issues.

Well, the system I'm proposing is actually BETTER than having two distinct shared memory snapshots. For example, right now we cache up to 64 subxids per backend. I'm imagining that going away and using that memory for the ring buffer. Out of the box, that would imply a ring buffer of 64 * 103 = 6592 slots. If the average snapshot lists 100 XIDs, you could rewrite the snapshot dozens of times before the buffer wraps around, which is obviously a lot more than two. Even if subtransactions are being heavily used and each snapshot lists 1000 XIDs, you still have enough space to rewrite the snapshot several times over before wraparound occurs. Of course, at some point the snapshot gets too big and you have to switch to retaining only the toplevel XIDs, which is more or less the equivalent of what happens under the current implementation when any single transaction's subxid cache overflows.
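The sizing arithmetic above can be sketched as follows. This is a hypothetical illustration, not PostgreSQL code: the macro names and the backend-slot count of 103 are assumptions taken from the numbers in the message (64 cached subxids per backend, 64 * 103 = 6592 slots), and integer division gives the number of complete snapshot rewrites that fit before wraparound.

```c
#include <assert.h>

/* Hypothetical constants, matching the figures in the message:
 * 64 subxid cache slots per backend, 103 backend slots in a
 * default build (assumed, not taken from actual source). */
#define SUBXID_SLOTS_PER_BACKEND 64
#define BACKEND_SLOTS            103

static int ring_buffer_slots(void)
{
    return SUBXID_SLOTS_PER_BACKEND * BACKEND_SLOTS;    /* 6592 */
}

/* How many complete snapshots of a given size can be written
 * before the ring buffer wraps around. */
static int snapshots_before_wrap(int xids_per_snapshot)
{
    return ring_buffer_slots() / xids_per_snapshot;
}
```

With 100 XIDs per snapshot this gives 65 rewrites before wraparound ("dozens of times"); with 1000 XIDs it still gives 6, several times over and well more than the two copies a double-buffering scheme provides.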
With respect to a general-purpose shared memory allocator, I think that there are cases where that would be useful to have, but I don't think there are as many of them as many people seem to think. I wouldn't choose to implement this using a general-purpose allocator even if we had it, both because it's undesirable to allow this or any subsystem to consume an arbitrary amount of memory (nor can it fail... especially in the abort path) and because a ring buffer is almost certainly faster than a general-purpose allocator. We have enough trouble with palloc overhead already.

That having been said, I do think there are cases where it would be nice to have... and it wouldn't surprise me if I end up working on something along those lines in the next year or so. It turns out that memory management is a major issue in lock-free programming; you can't assume that it's safe to recycle an object once the last pointer to it has been removed from shared memory - because someone may have fetched the pointer just before you removed it and still be using it to examine the object. An allocator with some built-in capabilities for handling such problems seems like it might be very useful....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
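The reclamation hazard described in the message (an object cannot be freed the moment the last shared pointer is cleared, because a reader may have fetched that pointer just beforehand) can be sketched with one of the standard remedies, a reference count. This is a minimal C11 illustration of the general technique, not anything from PostgreSQL; all names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <stdatomic.h>

/* A shared object whose lifetime is governed by a reference count.
 * Unlinking it from shared memory drops one reference; a reader that
 * acquired it beforehand still holds another, so the free is deferred
 * until the reader is done. */
typedef struct
{
    atomic_int  refcount;
    int         payload;
} shared_obj;

static shared_obj *
obj_create(int payload)
{
    shared_obj *o = malloc(sizeof(shared_obj));

    atomic_init(&o->refcount, 1);   /* the reference held by shared memory */
    o->payload = payload;
    return o;
}

/* A reader takes a reference before examining the object. */
static shared_obj *
obj_acquire(shared_obj *o)
{
    atomic_fetch_add(&o->refcount, 1);
    return o;
}

/* Drop a reference; free the object only when the last one goes away.
 * Returns 1 if the object was actually freed. */
static int
obj_release(shared_obj *o)
{
    if (atomic_fetch_sub(&o->refcount, 1) == 1)
    {
        free(o);
        return 1;
    }
    return 0;
}
```

The point of the sketch is the deferred free: removing the object from shared memory calls obj_release() but does not free it while a concurrent reader still holds a reference. Hazard pointers and epoch-based reclamation solve the same problem without per-access atomic updates, which is the kind of built-in capability such an allocator might provide.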