Re: Notes on lock table spilling - Mailing list pgsql-hackers
| From | Alvaro Herrera |
|---|---|
| Subject | Re: Notes on lock table spilling |
| Date | |
| Msg-id | 20050404202759.GB30022@dcc.uchile.cl |
| In response to | Re: Notes on lock table spilling (Simon Riggs <simon@2ndquadrant.com>) |
| Responses | Re: Notes on lock table spilling; Re: Notes on lock table spilling |
| List | pgsql-hackers |
On Mon, Apr 04, 2005 at 07:08:11PM +0100, Simon Riggs wrote:

> IIRC there is not another major system that spills locks to disk and
> there's a big reason: performance is very poor. Other systems accept
> some limitations in order to avoid that. Oracle takes locks and holds
> them within the block itself, which then needs to have block cleanup
> performed on it the next time it's read. DB2 has a lock table which
> doesn't grow in size or spill to disk, ever.

We can have the in-memory hash table grow to be "enough for most applications"; say, have it store 10000 locks without resorting to spilling. As secondary storage there would be the in-memory buffers for the spill area, comfortably another 5000 locks. Because there is no need for the lock table to persist across system crashes (except in case we implement 2PC), there's no need to fsync() or XLog this information. So we would not touch disk until we had acquired 15000 locks or so.

This doesn't really address your concern, though, because for the spilling logic to work we may need to evict any transaction's locks, not just our own. Thus we can't guarantee that only the heavy locker's locks are spilled (spilt?).

Now, I talked to Bruce via IM the other day and he doesn't like the idea of using the regular lock manager to lock tuples; it's too heavy, he says. So he proposed using "phantom Xids" instead. (Some of you may remember the phantom CommandIds from the subtransactions days; these are quite different, so they have none of the problems the others had.) This idea is explained below.

In that chat we also discussed a third possible solution: using a lock manager process instead of shared memory for locks. I'm not sure this is really workable; there would need to be some means of communication beyond simple signals for it to be effective, I think. Bruce didn't seem very fond of this idea; at least he seemed much more attracted to the phantoms.
So, we now have four ideas for solving the shared-row locking problem:

A. Using the lock manager to lock tuples. This requires some sort of
   spill-to-disk logic.

B. Not using the lock manager to lock tuples. This can be done by using
   Bruce's phantom Xids idea.

C. Using a hybrid approach: use the unused space in heap pages, as
   proposed by Paul Tillotson, and fall back to the lock manager. Thus we
   still need some spilling logic. I'm still not clear how this would
   work given the current page layout and XLog requirements.

D. Using a lock manager process. With a separate process we don't need
   any of this, because it can use whatever amount of memory it sees fit.
   We may have some communication overhead which could make the idea
   infeasible. But of course we would retain a shared memory area for
   "caching" (so we wouldn't need to touch the process until we are
   holding lots of locks). Does anyone think this is an idea worth
   pursuing? Or rather, does anyone see huge showstoppers here?

Using Phantom Xids
==================

The idea here is to use an approach similar to what we use now: mark the tuple with an Xid when it is locked. A phantom Xid is a sort-of Xid with multiple real Xids associated with it. So we mark the tuple with the regular Xid the first time the share lock is acquired; if a second transaction wants to lock the tuple, it creates a new phantom Xid which "contains" both the original Xid in the tuple and its own Xid, inserts it into the phantom Xid table, and marks the tuple with it as Xmax.

Thus, the operations we need to implement this are:

1. Create a new phantom Xid with two real Xids
2. Find the phantom Xid which holds some combination of real Xids
3. Add a real Xid to a phantom Xid
4. Clean up the whole thing

Bruce proposes using two hash tables for this: one with the phantom Xid as the key, and the other with XOR(real Xids) as the key. Thus we could implement 1, 2 and 3 easily.
Cleanup would happen when somebody else visits the phantom Xid and finds it populated with non-running transactions (it can clean the Xmax marking as well, if the phantom Xid ends up empty), and also during VACUUM, so that we don't have problems with Xid wraparound.

Comments?

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"La vida es para el que se aventura"