Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE |
Date | |
Msg-id | CAM3SWZRpycmx=EwcEpH5dpt4f7B8Db-TrfYq63QbMB2DsLsi=g@mail.gmail.com Whole thread Raw |
In response to | Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>) |
List | pgsql-hackers |
On Mon, Nov 18, 2013 at 4:37 PM, Peter Geoghegan <pg@heroku.com> wrote: > You're right that the value locking is scary. I think we need to very > carefully consider it, once I have buy-in on the basic approach. I > really do think it's the least-worst approach described to date. It > isn't like we can't discuss making it inherently less scary, but I > hesitate to do that now, given that I don't know if that discussing > will go anywhere. One possible compromise would be "promise tuples" where we know we'll be able to keep our promise. In other words: 1. We lock values in the first phase, in more or less the manner of the extant patch. 2. When a consensus exists that heap tuple insertion proceeds, we proceed with insertion of these promise index tuples (and probably keep just a pin on the relevant pages). 3. Proceed with insertion of the heap tuple (with no "value locks" of any kind held). 3. Go back to the unique indexes, update the heap tid and unset the index tuple flag (that indicates that the tuples are in this promise state). Probably we can even be bright about re-finding the existing promise tuples with their proper heap tid (e.g. maybe we can avoid doing a regular index scan at least some of the time - chances are pretty good that the index tuple is on the same page as before, so it's generally well worth a shot looking there first). As with the earlier promise tuple proposals, we store our xid in the ItemPointer. 4. Finally, insertion of non-unique index tuples occurs in the regular manner. Obviously the big advantage here is that we don't have to worry about value locking across heap tuple insertion at all, and yet we don't have to worry about bloating, because we really do know that insertion proper will proceed when inserting *this* type of promise index tuple. Maybe that even makes it okay to just use buffer locks, if we think some more about the other edge cases. Regular index scans take the aforementioned flag as a kind of visibility hint, perhaps, so we don't have to worry about them. And VACUUM would kill any dead promise tuples - this would be much less of a concern than with the earlier promise tuple proposals, because it is extremely non routine. Maybe it's fine to not make autovacuum concerned about a whole new class of (index-only) bloat, which seemed like a big problem with those earlier proposals, simply because crashes within this tiny window are hopefully so rare that it couldn't possibly amount to much bloat in the grand scheme of things (at least before a routine VACUUM - UPDATEs tend to necessitate those). If you have 50 upserting backends in this tiny window during a crash, that would be only 50 dead index tuples. Given the window is so tiny, I doubt it would be much of a problem at all - even 50 seems like a very high number. The track_counts counts that drive autovacuum here are already not crash safe, so I see no regression. Now, you still have to value lock across multiple btree unique indexes, and I understand there are reservations about this. But the surface area is made significantly smaller at reasonably low cost. Furthermore, doing TOASTing out-of-line and so on ceases to be necessary. The LOCK FOR UPDATE case is the same as before. Nothing else changes. FWIW, without presuming anything about value locking implementation, I'm not too worried about making the implementation scale to very large numbers of unique indexes, with very low shared_buffer settings. We already have a fairly similar situation with max_locks_per_transaction and so on, no? -- Peter Geoghegan
pgsql-hackers by date: