Thread: augmenting MultiXacts to improve foreign keys
Hi,

Previously, we had some discussion about introducing a new type of tuple lock
to make foreign keys lighter weight, allowing for more concurrency. During
the course of that discussion, it became evident that the solution being
proposed didn't go all the way to solving the problem. A proposal by Noah
Misch seems like it would fix the whole problem, but it requires more
rejiggering than I had considered. This email summarizes what I just posted
in a blog article here:
http://www.commandprompt.com/blogs/alvaro_herrera/2011/08/fixing_foreign_key_deadlocks_part_three/
and adds some implementation details.

The proposal there is to create two new tuple lock types, not one. They have
been called "KEY UPDATE" and "KEY SHARE". Proposed conflict table:

              KEY UPDATE   FOR UPDATE   FOR SHARE   KEY SHARE
KEY UPDATE        X            X            X           X
FOR UPDATE        X            X            X
FOR SHARE         X            X
KEY SHARE         X

DELETE always grabs KEY UPDATE lock on a tuple.
UPDATE grabs KEY UPDATE if the key is being modified, otherwise FOR UPDATE.
SELECT FOR UPDATE grabs FOR UPDATE.
SELECT FOR SHARE grabs FOR SHARE.
SELECT FOR KEY SHARE grabs KEY SHARE. This is the mode used by RI triggers.

Note that this means that UPDATE would always have to check whether key
columns are being modified. (I loosely talk about "key columns" here,
referring to columns involved in unique indexes. On tables with no unique
indexes, I think this would have to mean all columns, but I haven't thought
much about this bit.)

Note that the KEY UPDATE lock would be an internal option, not exposed to
SQL. I think we already have enough extensions in this area. We are forced
to expose KEY SHARE because RI triggers get to it via SPI, but I would be
happy to avoid doing that if I knew how. I would also love to get rid of FOR
SHARE because I don't think it serves any purpose anymore, but since it has
already been exposed to SQL for several releases, I'm not sure that removing
it is viable.

To implement this, we need to augment MultiXact to store the lock type that
each comprising Xid holds on the tuple. Two bits per Xid are needed. My
thinking is that we could simply extend the "members" area to add a byte for
every four members. This should be relatively simple, though it forecloses
the option of using MultiXact for some purpose other than locking tuples.
That being purely theoretical, I don't have a problem with it.

Note that the original keylocks patch I posted a couple of weeks ago has a
caveat: if transaction A grabs a share lock and transaction B grabs a key
lock, there's no way to know who owns which. I dismissed that problem as
unimportant (and probably infrequent); the good thing about this new idea is
that we wouldn't have that problem.

This would also help us find a solution to the problem that an aborted
subtransaction that updates or exclusive-locks a tuple causes an earlier
shared lock to be forgotten. We would deal with this by marking the Xmax
with a new MultiXact that includes both the lock and the update. This means
that this MultiXact would have to be WAL-logged. I would leave that for a
later patch, but I think it's good that there's a way to fix it.

Thoughts?

-- 
Álvaro Herrera <alvherre@alvh.no-ip.org>
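For illustration, the conflict matrix above could be encoded as a symmetric
boolean table. This is a minimal sketch, not code from any patch; the enum
and all names here are invented:

    #include <stdbool.h>

    typedef enum TupleLockMode
    {
        TLM_KEY_SHARE = 0,    /* SELECT FOR KEY SHARE; used by RI triggers */
        TLM_FOR_SHARE,        /* SELECT FOR SHARE */
        TLM_FOR_UPDATE,       /* SELECT FOR UPDATE; UPDATE not touching keys */
        TLM_KEY_UPDATE        /* DELETE; UPDATE that modifies key columns */
    } TupleLockMode;

    /* conflicts[a][b] is true iff modes a and b cannot coexist on a tuple */
    static const bool tuple_lock_conflicts[4][4] = {
        /*                 KEYSHR  SHARE   UPDATE  KEYUPD */
        /* KEY SHARE  */ { false,  false,  false,  true  },
        /* FOR SHARE  */ { false,  false,  true,   true  },
        /* FOR UPDATE */ { false,  true,   true,   true  },
        /* KEY UPDATE */ { true,   true,   true,   true  },
    };

    static inline bool
    tuple_locks_conflict(TupleLockMode a, TupleLockMode b)
    {
        return tuple_lock_conflicts[a][b];
    }

The key property is visible in the first row: KEY SHARE conflicts only with
KEY UPDATE, so RI triggers no longer block -- and are not blocked by --
updates that leave the key columns alone.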
On Tue, 2011-08-09 at 13:01 -0400, Alvaro Herrera wrote:
> Note that the KEY UPDATE lock would be an internal option, not exposed
> to SQL. I think we already have enough extensions in this area. We are
> forced to expose KEY SHARE because RI triggers get to it via SPI, but I
> would be happy to avoid doing that if I knew how.

Right now, FKs aren't really very special, they are mostly just syntactic
sugar (right?). This proposal would make FKs special internal mechanisms,
and I don't see the benefit in doing so.

[ I didn't read through the previous threads yet, so perhaps this was
already discussed. ]

Regards,
Jeff Davis
Excerpts from Jeff Davis's message of mar ago 09 14:41:14 -0400 2011:
> On Tue, 2011-08-09 at 13:01 -0400, Alvaro Herrera wrote:
> > Note that the KEY UPDATE lock would be an internal option, not exposed
> > to SQL. I think we already have enough extensions in this area. We are
> > forced to expose KEY SHARE because RI triggers get to it via SPI, but I
> > would be happy to avoid doing that if I knew how.
>
> Right now, FKs aren't really very special, they are mostly just
> syntactic sugar (right?). This proposal would make FKs special internal
> mechanisms, and I don't see the benefit in doing so.

Well, you can get the same behavior by adding the constraint triggers
manually. But those triggers are written in C, so you could equally claim
that they are "special internal" already. The SPI interface has some special
entry points to allow them to work correctly (for example, passing a
snapshot for the checks to run with).

In any case, this is certainly not something I'm really interested in doing.
I don't have a problem with simply adding the new syntax to SQL and
documenting it appropriately ("this is only for internal RI use").

> [ I didn't read through the previous threads yet, so perhaps this was
> already discussed. ]

Nope.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Aug9, 2011, at 19:01 , Alvaro Herrera wrote:
> This would also help us find a solution to the problem that an aborted
> subtransaction that updates or exclusive-locks a tuple causes an earlier
> shared lock to be forgotten. We would deal with this by marking the Xmax
> with a new MultiXact that includes both the lock and the update. This
> means that this MultiXact would have to be WAL-logged. I would leave
> that for a later patch, but I think it's good that there's a way to fix
> it.

Yeah, I pondered something similar in the past, but never got around to
figuring out the details.

This would also allow us to modify the behaviour of tuple locks under
isolation level REPEATABLE READ to throw a serialization error if the row
was locked by an invisible transaction. Doing so would allow us to get rid
of the strange re-checking logic that RI triggers must currently use in
REPEATABLE READ mode, and would thus make it possible to implement custom
RI triggers in PL/pgSQL.

I'll try to find some time to check how that fits in with the new per-tuple
lock levels you proposed.

best regards,
Florian Pflug
On Aug9, 2011, at 19:01 , Alvaro Herrera wrote:
> Note that this means that UPDATE would always have to check whether key
> columns are being modified. (I loosely talk about "key columns" here,
> referring to columns involved in unique indexes. On tables with no
> unique indexes, I think this would have to mean all columns, but I
> haven't thought much about this bit.)
>
> To implement this, we need to augment MultiXact to store the lock type
> that each comprising Xid holds on the tuple. Two bits per Xid are
> needed. My thinking is that we could simply extend the "members" area to
> add a byte for every four members. This should be relatively simple,
> though it forecloses the option of using MultiXact for some purpose
> other than locking tuples. That being purely theoretical, I don't have a
> problem with it.

Hm, I'm still not a huge fan of only having the choice between locking all
columns, or locking all columns that are part of a unique index, for
multiple reasons.

First, I'd like to see us support FKs which reference non-unique columns.
As long as we restrict such FK constraints to ON UPDATE/DELETE RESTRICT (or
NO ACTION), there's no conceptual difficulty in doing that. Yeah, I know
that there'll probably be a lot of push-back on the grounds of standards
compliance, but I still believe it'd be a nice feature.

Second, there are lots of reasons for adding UNIQUE constraints to columns
besides being on the referenced side of a FK constraint. If we make the
lock strength taken by UPDATE depend on whether or not the UPDATE modified
a column included in a UNIQUE constraint, we force people to drop UNIQUE
constraints to avoid deadlocks, and thus indirectly encourage sloppy schema
design.

Couldn't we simply give the user a way to specify, per column, whether or
not that column is a "KEY" column? Creating a foreign key constraint could
still implicitly mark all referenced columns as KEY columns, but columns
would no longer be KEY columns simply because they're part of a UNIQUE
constraint. Users would be free to add arbitrary columns to the set of KEY
columns by doing ALTER TABLE ALTER COLUMN SET KEY.

best regards,
Florian Pflug
Excerpts from Florian Pflug's message of mar ago 09 15:41:00 -0400 2011:
> First, I'd like to see us support FKs which reference non-unique
> columns. As long as we restrict such FK constraints to ON UPDATE/DELETE
> RESTRICT (or NO ACTION), there's no conceptual difficulty in doing that.
> Yeah, I know that there'll probably be a lot of push-back on the grounds
> of standards compliance, but I still believe it'd be a nice feature.
>
> Second, there are lots of reasons for adding UNIQUE constraints to
> columns besides being on the referenced side of a FK constraint. If we
> make the lock strength taken by UPDATE depend on whether or not the
> UPDATE modified a column included in a UNIQUE constraint, we force
> people to drop UNIQUE constraints to avoid deadlocks, and thus
> indirectly encourage sloppy schema design.
>
> Couldn't we simply give the user a way to specify, per column, whether
> or not that column is a "KEY" column? Creating a foreign key constraint
> could still implicitly mark all referenced columns as KEY columns, but
> columns would no longer be KEY columns simply because they're part of a
> UNIQUE constraint. Users would be free to add arbitrary columns to the
> set of KEY columns by doing ALTER TABLE ALTER COLUMN SET KEY.

What you propose seems reasonable to me (allegedly not an expert in keys).
However, I don't see that it conflicts with my proposal in any way -- I
mean, on top of my patch you could build something that changes what
columns are considered "keys". This is why the "KEY SHARE" rowmark option
is going to be documented as "only for internal use": it might change
depending on how we define foreign-key-referenceable fields.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Jeff Davis's message of mar ago 09 14:41:14 -0400 2011:
>> Right now, FKs aren't really very special, they are mostly just
>> syntactic sugar (right?). This proposal would make FKs special internal
>> mechanisms, and I don't see the benefit in doing so.

> Well, you can get the same behavior by adding the constraint triggers
> manually. But those triggers are written in C, so you could equally
> claim that they are "special internal" already. The SPI interface has
> some special entry points to allow them to work correctly (for example,
> passing a snapshot for the checks to run with).

Yeah, the crosscheck-snapshot logic already puts the lie to any idea that
the RI triggers are equivalent to anything available at the SQL level.

ISTM you could pass down the "please do this with FOR KEY UPDATE not just
FOR SHARE" flag similarly to the way the crosscheck snapshot is passed
down, or maybe even just do it if you see a crosscheck snapshot is present
in the executor state (though that's a bit ugly).

Like Florian, I'm considerably more concerned about the aspect of deciding
which columns are "key columns" and whether they changed.

            regards, tom lane
Excerpts from Tom Lane's message of mar ago 09 16:40:15 -0400 2011:
> ISTM you could pass down the "please do this with FOR KEY UPDATE not
> just FOR SHARE" flag similarly to the way the crosscheck snapshot is
> passed down, or maybe even just do it if you see a crosscheck snapshot
> is present in the executor state (though that's a bit ugly).

You mean FOR KEY SHARE ... Yeah, I could pass it down that way. But if we
go that route, do we need to specify any specific rowmark clause at all?
Maybe we could go back to FOR UPDATE and use the flag to tell the executor
that it's actually FOR KEY SHARE, and deprecate the FOR SHARE clause
altogether.

> Like Florian, I'm considerably more concerned about the aspect of
> deciding which columns are "key columns" and whether they changed.

I will keep this in mind. I think this problem is considerably easier to
solve than the different-rowmark-per-xact-in-MultiXact one, anyway. If we
require the user to specify which unique indexes or constraints are
appropriate for FKs, it becomes trivial.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Aug 9, 2011, at 4:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Like Florian, I'm considerably more concerned about the aspect of
> deciding which columns are "key columns" and whether they changed.

Should we consider trying to implement a system that can lock individual
columns?

Either way, my main concern is that we're going to end up pessimizing the
common case of UPDATE, by making it do extra work to reduce the chances of
a lock conflict from an incoming foreign key. Most of the time there won't
be an incoming foreign key, or the referring row won't get updated, and
that work will be wasted. It would be nice to just lock the row "for some
kind of update" and sort out what exactly we locked only if a possible
conflict comes along.

...Robert
On Tue, Aug 09, 2011 at 09:41:00PM +0200, Florian Pflug wrote:
> Couldn't we simply give the user a way to specify, per column, whether
> or not that column is a "KEY" column? Creating a foreign key constraint
> could still implicitly mark all referenced columns as KEY columns, but
> columns would no longer be KEY columns simply because they're part of a
> UNIQUE constraint. Users would be free to add arbitrary columns to the
> set of KEY columns by doing ALTER TABLE ALTER COLUMN SET KEY.

What would be the use case for manually expanding the set of key columns?

If you have a heavily-updated table referenced by slowly-changing tables,
it might pay off to disable the optimization entirely. That would be an
esoteric escape hatch to provide for an uncommon use case, though.

-- 
Noah Misch                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 09, 2011 at 01:01:04PM -0400, Alvaro Herrera wrote:
>               KEY UPDATE   FOR UPDATE   FOR SHARE   KEY SHARE
> KEY UPDATE        X            X            X           X
> FOR UPDATE        X            X            X
> FOR SHARE         X            X
> KEY SHARE         X
>
> DELETE always grabs KEY UPDATE lock on a tuple.
> UPDATE grabs KEY UPDATE if the key is being modified, otherwise FOR UPDATE.
> SELECT FOR UPDATE grabs FOR UPDATE.
> SELECT FOR SHARE grabs FOR SHARE.
> SELECT FOR KEY SHARE grabs KEY SHARE. This is the mode used by RI triggers.
>
> Note that this means that UPDATE would always have to check whether key
> columns are being modified. (I loosely talk about "key columns" here,
> referring to columns involved in unique indexes. On tables with no
> unique indexes, I think this would have to mean all columns, but I
> haven't thought much about this bit.)

On tables with no key columns, we can skip the datum comparisons and use
KEY UPDATE, because it does not matter: nobody would try to take KEY SHARE
anyway. (If KEY SHARE is SQL-exposed, a manual acquisition remains
possible. It does not seem important to cater to that, though.)

Key columns should be those columns actually referenced by incoming foreign
keys; the relcache can maintain that list. This was less important for the
previous version, which didn't compare datums prior to encountering a live
KEY LOCK. It will now be more important to avoid that overhead for tables
lacking incoming FKs.

> To implement this, we need to augment MultiXact to store the lock type
> that each comprising Xid holds on the tuple. Two bits per Xid are
> needed. My thinking is that we could simply extend the "members" area to
> add a byte for every four members. This should be relatively simple,
> though it forecloses the option of using MultiXact for some purpose
> other than locking tuples. That being purely theoretical, I don't have a
> problem with it.
>
> Note that the original keylocks patch I posted a couple of weeks ago has
> a caveat: if transaction A grabs a share lock and transaction B grabs a
> key lock, there's no way to know who owns which. I dismissed that
> problem as unimportant (and probably infrequent); the good thing about
> this new idea is that we wouldn't have that problem.

Is there a case not involving manual SELECT ... FOR <locktype> that will
depend on this for correctness? Even if not, there's a lot to like about
this proposal. However, I think I may be missing the condition you had in
mind when designing it.

> This would also help us find a solution to the problem that an aborted
> subtransaction that updates or exclusive-locks a tuple causes an earlier
> shared lock to be forgotten. We would deal with this by marking the Xmax
> with a new MultiXact that includes both the lock and the update. This
> means that this MultiXact would have to be WAL-logged. I would leave
> that for a later patch, but I think it's good that there's a way to fix
> it.

Interesting. Growing pg_multixact/members by 6% seems reasonable for those
two benefits. It also makes possible a manual FOR UPDATE over a KEY LOCK;
that previously looked problematic due to the lack of a later tuple version
to continue bearing the KEY LOCK.

Consider the simple case of a tuple with a single KEY LOCK which we then
proceed to UPDATE, not touching any key column. Will that old tuple version
get a multixact bearing both the FOR UPDATE locker and the KEY LOCK locker,
or will it carry just the FOR UPDATE locker, with the new tuple witnessing
the KEY LOCK? The latter seems preferable to avoid an increase in
pg_multixact usage, but I haven't worked out whether it covers enough of
the problem space.

Thanks,
nm

-- 
Noah Misch                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
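To make the datum-comparison step concrete, here is a sketch of how an
UPDATE might choose its lock level under this scheme. It reuses the
hypothetical TupleLockMode enum from the sketch upthread, and
RelationGetFKReferencedCols and datum_changed are invented helper names,
not real APIs:

    static TupleLockMode
    lock_mode_for_update(Relation rel, HeapTuple oldtup, HeapTuple newtup)
    {
        /* hypothetical: set of columns referenced by incoming FKs,
         * maintained in the relcache as suggested above */
        Bitmapset  *keycols = RelationGetFKReferencedCols(rel);
        int         attnum = -1;

        /* no incoming FKs: nobody will take KEY SHARE, so the strongest
         * lock costs nothing and all datum comparisons are skipped */
        if (bms_is_empty(keycols))
            return TLM_KEY_UPDATE;

        while ((attnum = bms_next_member(keycols, attnum)) >= 0)
        {
            /* hypothetical helper comparing one attribute's old/new datums */
            if (datum_changed(rel, oldtup, newtup, attnum))
                return TLM_KEY_UPDATE;      /* a key column changed */
        }

        return TLM_FOR_UPDATE;              /* keys intact; weaker lock */
    }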
On Aug9, 2011, at 22:40 , Tom Lane wrote:
> Yeah, the crosscheck-snapshot logic already puts the lie to any idea
> that the RI triggers are equivalent to anything available at the SQL
> level.

True, but I still consider that to be a quite unfortunate limitation, and
one that I hope to one day remove. So I'd hate to see us adding more
stumbling blocks along that road.

Even today, you can do user-space FK-like constraints if you restrict
yourself to either running all transactions in isolation level READ
COMMITTED, or all transactions in isolation level SERIALIZABLE (though in
the latter case you don't need the locks anyway).

best regards,
Florian Pflug
On Aug10, 2011, at 08:45 , Noah Misch wrote:
> On Tue, Aug 09, 2011 at 09:41:00PM +0200, Florian Pflug wrote:
>> Couldn't we simply give the user a way to specify, per column, whether
>> or not that column is a "KEY" column? [...]
>
> What would be the use case for manually expanding the set of key columns?

You'd use that if you implemented a FK-like constraint yourself (e.g. a FK
on an array-valued field). Without a way to manually expand the set of KEY
columns, such a non-core FK constraint couldn't use the lighter locks.

best regards,
Florian Pflug
Excerpts from Alvaro Herrera's message of mar ago 09 13:01:04 -0400 2011:
> To implement this, we need to augment MultiXact to store the lock type
> that each comprising Xid holds on the tuple. Two bits per Xid are
> needed. My thinking is that we could simply extend the "members" area to
> add a byte for every four members.

So I've been working on this, and I've noticed that somehow it seems to
have turned into a giant snowball. I'd like opinions on the items below
before I continue work here; if any of these ideas turns out to be a
showstopper, I'd like to know sooner rather than later.

1. Since MultiXacts are going to contain updates and not just locks, they
will need to persist beyond OldestXmin -- in fact, pg_multixact is going to
have the same truncation rules as pg_clog, namely the vacuum freeze
horizon. Currently they are truncated very quickly; this is not going to be
true anymore.

2. This also means that we may have to resolve MultiXacts to their
comprising TransactionIds when tqual.c is doing visibility checks on the
tuples. Right now, the code simply does things like this:

    if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
    {
        /* MultiXacts are currently only allowed to lock tuples */
        Assert(tuple->t_infomask & HEAP_IS_LOCKED);
        return true;
    }
    /* further code checks HeapTupleHeaderGetXmax(tuple) */

It's now going to need to do something more like

    if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
    {
        if (tuple->t_infomask & HEAP_IS_LOCKED)
            return true;
        xmax = HeapTupleGetUpdateXid(tuple);
    }
    else
        xmax = HeapTupleHeaderGetXmax(tuple);
    /* further code checks our derived xmax */

where the HeapTupleGetUpdateXid function looks more or less like this:

    TransactionId
    HeapTupleGetUpdateXid(HeapTupleHeader tuple)
    {
        TransactionId    update_xact = InvalidTransactionId;
        MultiXactMember *members;
        int              nmembers;

        Assert(!(tuple->t_infomask & HEAP_XMAX_IS_NOT_UPDATE));
        Assert(tuple->t_infomask & HEAP_XMAX_IS_MULTI);

        nmembers = GetMultiXactIdMembers(HeapTupleHeaderGetXmax(tuple),
                                         &members);

        if (nmembers > 0)
        {
            int i;

            for (i = 0; i < nmembers; i++)
            {
                /* KEY SHARE lockers are okay -- ignore them */
                if (members[i].status == MultiXactStatusKeyShare)
                    continue;
                /* there should be at most one updater */
                Assert(update_xact == InvalidTransactionId);
                /* other status values are not acceptable because they
                 * conflict with update */
                Assert(members[i].status == MultiXactStatusUpdate);
                update_xact = members[i].xid;
            }
        }

        return update_xact;
    }

Which leads us to:

3. I've come up with HEAP_XMAX_IS_NOT_UPDATE in t_infomask, which means
that the Xmax, being a MultiXact, does not contain an update Xid. This
reuses the old value of HEAP_XMAX_SHARED_LOCK. I've used these rather weird
semantics for three reasons:

a) It's pg_upgrade compatible. Any tuple that has the SHARED_LOCK bit set
from the older version cannot contain an update, and so the bit is
automatically right.
b) It quickly avoids having to run the GetMultiXactIdMembers thingy (which
is expensive) in the common case that there's no update.
c) It lets me keep the HEAP_IS_LOCKED semantics, which mean "this tuple is
only locked by Xmax, not updated" and are used extensively in various
places:

    /*
     * if any of these bits is set, xmax hasn't deleted the tuple, only
     * locked it.
     */
    #define HEAP_IS_LOCKED  (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_IS_NOT_UPDATE | \
                             HEAP_XMAX_KEYSHR_LOCK)

4. FOR SHARE is not a very widely used clause, I think. FOR UPDATE is part
of the standard and thus warrants quick innards, i.e. its own hint bit. FOR
KEY SHARE is going to be used by RI triggers, so it should also be quick;
I've given it its own hint bit too. However, FOR SHARE is probably not used
much, and I've relegated it to being mandatorily stored in MultiXact. That
is, if someone requests a FOR SHARE lock on a tuple, it will get a
singleton MultiXact. The reason for this is that I didn't want to use one
more hint bit.

5. I've now used three hint bits -- reused HEAP_XMAX_SHARED_LOCK as
HEAP_XMAX_IS_NOT_UPDATE (already explained); used the free hint bit from
t_infomask as HEAP_XMAX_KEYSHR_LOCK (should be obvious); and used 0x2000 in
t_infomask2 as HEAP_UPDATE_KEY_INTACT, to mean that this tuple has been
updated but the key columns have not been modified. This lets heapam.c know
that the tuple can be further key-share locked.

6. When locking a tuple that is being concurrently updated, the locking
transaction must obviously walk up the update chain (t_self) and lock the
updated version too. I have not tried to implement this yet, but I think
it's going to be a bit weird -- I think I might need to modify tuples that
are HeapTupleInvisible and/or HeapTupleSelfUpdated according to
HeapTupleSatisfiesUpdate. (I even wonder if I'm going to need to create a
new HeapTupleSatisfies routine for this purpose.)

Thoughts?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
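For concreteness, the "two bits per Xid, one byte per four members" layout
works out as below. This is a sketch with invented names, not patch code;
since each member is a 4-byte TransactionId, the extra flag byte per four
members grows the area by 1/16, which is the ~6% figure Noah cited
upthread:

    #define MXACT_MEMBER_BITS_PER_XACT   2   /* two status bits per member */
    #define MXACT_MEMBERS_PER_FLAGBYTE   4   /* 8 bits / 2 bits per member */

    /* which flag byte holds member n's status bits */
    static inline int
    mxact_flagbyte(int n)
    {
        return n / MXACT_MEMBERS_PER_FLAGBYTE;
    }

    /* bit offset of member n's status bits within that byte */
    static inline int
    mxact_flagshift(int n)
    {
        return (n % MXACT_MEMBERS_PER_FLAGBYTE) * MXACT_MEMBER_BITS_PER_XACT;
    }

    /* extract member n's two-bit lock status from the flags array */
    static inline int
    mxact_member_status(const unsigned char *flags, int n)
    {
        return (flags[mxact_flagbyte(n)] >> mxact_flagshift(n)) & 0x03;
    }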
Anyone on all of this?

On 09/09/2011 02:31 PM, Alvaro Herrera wrote:
> So I've been working on this, and I've noticed that somehow it seems to
> have turned into a giant snowball. I'd like opinions on the items below
> before I continue work here; if any of these ideas turns out to be a
> showstopper, I'd like to know sooner rather than later.
>
> [ full quote of the six items above trimmed ]

-- 
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579
On Fri, Sep 9, 2011 at 5:31 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> [ multixact complexity ]

I wonder if it's a mistake to be thinking about solving this problem by
extending the MultiXact mechanism. Pushing xmax out-of-line so that we have
room to store tuple information seems expensive, especially because there's
no convenient way to undo it once the locks are old enough not to be
interesting any more. The current system works because we never need both
pieces of information at the same time, but that's not going to be true any
more.

I'm wondering if it would be possible to modify the main lock manager, or
create a special-purpose tuple lock manager, to record all tuple locks,
both awaited and granted. You'd need to make sure that if there were more
than a few locks the information could spill to disk somehow, and you'd
also need to make sure that you didn't pull that information in from disk
more often than necessary -- i.e. you should try to keep enough summary
info in memory to determine whether there *might* be a conflicting lock
that spilled out, so that you only need to go examine the spilled data if
there's a possibility that you might find something interesting there.

A system like this would make it possible to clean up all the lock entries
at transaction end, which would avoid a lot of the complexity you mention.
On the other hand, it's clearly not simple either, and I haven't thought
through all the details...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
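The "summary info" idea could be as simple as a Bloom-style filter over the
spilled locks. The sketch below uses invented names, a single hash, and
PostgreSQL's Oid/BlockNumber/OffsetNumber types; it is not a proposal for
the real lock manager. A miss proves no conflicting lock spilled, while a
hit forces a (possibly wasted) trip to the spill file:

    #define SPILL_FILTER_BITS  (1 << 20)

    static unsigned char spill_filter[SPILL_FILTER_BITS / 8];

    static inline unsigned int
    tuple_lock_hash(Oid relid, BlockNumber blkno, OffsetNumber offnum)
    {
        /* any reasonable hash would do; this one is just for the sketch */
        unsigned int h = relid;

        h = h * 2654435761u + blkno;
        h = h * 2654435761u + offnum;
        return h % SPILL_FILTER_BITS;
    }

    /* called whenever a granted tuple lock is pushed out to disk */
    static void
    remember_spilled_lock(Oid relid, BlockNumber blkno, OffsetNumber offnum)
    {
        unsigned int bit = tuple_lock_hash(relid, blkno, offnum);

        spill_filter[bit / 8] |= (1 << (bit % 8));
    }

    /* cheap in-memory test: false means no spilled lock can conflict */
    static bool
    spilled_lock_possible(Oid relid, BlockNumber blkno, OffsetNumber offnum)
    {
        unsigned int bit = tuple_lock_hash(relid, blkno, offnum);

        return (spill_filter[bit / 8] & (1 << (bit % 8))) != 0;
    }

One catch the sketch glosses over: bits cannot be cleared when a lock is
released, so the filter saturates over time unless it is periodically
rebuilt from the spilled data.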
Excerpts from Robert Haas's message of mar sep 13 11:02:51 -0300 2011:
> I wonder if it's a mistake to be thinking about solving this problem
> by extending the MultiXact mechanism. Pushing xmax out-of-line so
> that we have room to store tuple information seems expensive,
> especially because there's no convenient way to undo it once the locks
> are old enough not to be interesting any more. The current system
> works because we never need both pieces of information at the same
> time, but that's not going to be true any more.

Hmm, it doesn't look that way to me: whenever you lock a row, all previous
lockers that are gone can now be forgotten. Locks that are old enough not
to be interesting are constantly and automatically gone. The only reason
that multixact now needs to persist beyond the currently running
transactions is the chance that there might be an "update xmax" hiding
somewhere; and tuples marked with those are going to be removed by vacuum
anyway. (I have been thinking that long before vacuum, we could remove the
multixact and replace it with a plain Xid, if the lockers are all gone --
which is another part of your "undo it once the locks are old enough".)

The expensive bit is the reason why I used a hint bit to mark this
possibility: we distinguish the cheap case of locked-but-not-updated from
the expensive one of locked-and-updated with hint bits, so the cheap case
stays cheap. The expensive one requires a bit more work, yes, but it brings
more concurrency overall.

> I'm wondering if it would be possible to modify the main lock manager,
> or create a special-purpose tuple lock manager, to record all tuple
> locks, both awaited and granted. [...]

This is where we started, back when we were creating SELECT FOR SHARE:
trying to spill the lock table. That idea went down in flames; consider
that someone might request to lock every row of an entire huge table, and
you're in trouble. There might have been other problems I don't recall with
that idea. I don't want to go back to that drawing board -- obviously I'm
not very keen on going to great lengths down the same path, only to fail,
twice.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
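Alvaro's parenthetical -- demoting a multixact to a plain Xid once the
lockers are gone -- might look roughly like this. Sketch only: it assumes
the patch's MultiXactMember representation from upthread, and it ignores
the buffer locking and WAL considerations a real implementation would need:

    static void
    maybe_demote_multixact_xmax(HeapTupleHeader tuple)
    {
        MultiXactMember *members;
        int              nmembers;
        TransactionId    update_xact = InvalidTransactionId;
        int              i;

        if (!(tuple->t_infomask & HEAP_XMAX_IS_MULTI))
            return;

        nmembers = GetMultiXactIdMembers(HeapTupleHeaderGetXmax(tuple),
                                         &members);

        for (i = 0; i < nmembers; i++)
        {
            if (members[i].status == MultiXactStatusUpdate)
            {
                update_xact = members[i].xid;   /* updater may still run */
                continue;
            }
            if (TransactionIdIsInProgress(members[i].xid))
                return;         /* a locker is still alive; keep the multi */
        }

        /* all lockers gone: keep only the updater, or clear xmax outright */
        if (TransactionIdIsValid(update_xact))
        {
            HeapTupleHeaderSetXmax(tuple, update_xact);
            tuple->t_infomask &= ~HEAP_XMAX_IS_MULTI;
        }
        else
            tuple->t_infomask |= HEAP_XMAX_INVALID;
    }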
On Tue, Sep 13, 2011 at 9:27 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Hmm, it doesn't look that way to me: whenever you lock a row, all
> previous lockers that are gone can now be forgotten. Locks that are old
> enough not to be interesting are constantly and automatically gone.

Yes... but as you pointed out, you'll still have to keep the MultiXacts
around for a long time, because you have no way of knowing for certain
whether there might be an xmax referring to them.

> The only reason that multixact now needs to persist beyond the currently
> running transactions is the chance that there might be an "update xmax"
> hiding somewhere; and tuples marked with those are going to be removed
> by vacuum anyway. (I have been thinking that long before vacuum, we
> could remove the multixact and replace it with a plain Xid, if the
> lockers are all gone -- which is another part of your "undo it once the
> locks are old enough".)

You could do that, but odds are it won't buy you much. Generally, by the
time you could do this, you'll either be able to mark xmax invalid or
truncate the tuple to a dead line pointer. The exception would be the case
where the xmax outlives all the lockers, but I'm not sure whether it's
worth having code just to cater to that case.

> The expensive bit is the reason why I used a hint bit to mark this
> possibility: we distinguish the cheap case of locked-but-not-updated
> from the expensive one of locked-and-updated with hint bits, so the
> cheap case stays cheap. The expensive one requires a bit more work, yes,
> but it brings more concurrency overall.

Right. I don't think what you're trying to do is necessarily bad; I was
just casting around for other solutions. From an efficiency standpoint, I
can see two things to worry about: first, keeping multixacts around for
longer consumes more disk space than not doing so, and most of the time
that disk space will be wasted; and second, we'll need to dereference
multixact XIDs more frequently, and that's not free. Now, it may well be
that both of those problems are sufficiently small that we needn't worry
about them.

> This is where we started, back when we were creating SELECT FOR SHARE:
> trying to spill the lock table. That idea went down in flames; consider
> that someone might request to lock every row of an entire huge table,
> and you're in trouble.

Yeah, that's not much fun. On the other hand, our current representation
involves dirtying every page in the table, which isn't much fun either.
I'm not sure that this couldn't be made to work -- with some
highly-compressed lock representation -- but I can see why you're not
interested in trying it.

I wish I had some better ideas here, but I think this is just a tough
problem, and it may be that any solution will be somewhat messy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company