Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Msg-id CAM3SWZRpycmx=EwcEpH5dpt4f7B8Db-TrfYq63QbMB2DsLsi=g@mail.gmail.com
In response to Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Mon, Nov 18, 2013 at 4:37 PM, Peter Geoghegan <pg@heroku.com> wrote:
> You're right that the value locking is scary. I think we need to very
> carefully consider it, once I have buy-in on the basic approach. I
> really do think it's the least-worst approach described to date. It
> isn't like we can't discuss making it inherently less scary, but I
> hesitate to do that now, given that I don't know if that discussion
> will go anywhere.

One possible compromise would be "promise tuples" where we know we'll
be able to keep our promise. In other words:

1. We lock values in the first phase, in more or less the manner of
the extant patch.

2. When a consensus exists that heap tuple insertion will proceed, we
proceed with inserting these promise index tuples (and probably keep
just a pin on the relevant pages).

3. Proceed with insertion of the heap tuple (with no "value locks" of
any kind held).

4. Go back to the unique indexes, update the heap tid and unset the
index tuple flag (the flag that marks tuples as being in this promise
state). Probably we can even be smart about re-finding the existing
promise tuples in order to store their proper heap tid (e.g. maybe we
can avoid doing a regular index scan at least some of the time:
chances are pretty good that the index tuple is still on the same
page as before, so it's generally well worth looking there first).
As with the earlier promise tuple proposals, until this step a
promise tuple stores our xid in its ItemPointer.

5. Finally, insertion of non-unique index tuples occurs in the regular manner.
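The ordering of these steps can be sketched as a toy Python model. Everything in it is hypothetical: `UniqueIndex`, `insert_with_promises`, the list-of-pages "index" and the per-index `threading.Lock` standing in for a value (or buffer) lock are illustrative assumptions, not PostgreSQL's nbtree code or the patch's API. It only shows the sequencing: value locks are held while checking for conflicts and inserting promise tuples, and are released before the heap insertion; afterwards each promise tuple is re-found (hinted page first), given its heap TID, and has its flag unset.

```python
import threading
from dataclasses import dataclass


@dataclass
class IndexTuple:
    key: object
    tid: object          # heap TID once fulfilled; the inserter's xid while a promise
    promise: bool = False


class UniqueIndex:
    """Hypothetical stand-in for a btree unique index: fixed-size pages of tuples."""

    def __init__(self, column, page_size=4):
        self.column = column
        self.page_size = page_size
        self.pages = [[]]
        self.lock = threading.Lock()  # stands in for a value (or buffer) lock

    def find(self, key):
        # Sees promise tuples too, so a concurrent inserter treats an
        # unfulfilled promise as a conflict (a real design would presumably
        # wait on the promising xid instead).
        for page in self.pages:
            for t in page:
                if t.key == key:
                    return t
        return None

    def insert_promise(self, key, xid):
        if len(self.pages[-1]) >= self.page_size:
            self.pages.append([])
        self.pages[-1].append(IndexTuple(key, xid, promise=True))
        return len(self.pages) - 1  # page number, remembered as a re-find hint

    def fulfil_promise(self, key, xid, heap_tid, hint):
        # Try the hinted page first, then fall back to scanning every page.
        hinted = self.pages[hint] if hint < len(self.pages) else []
        everything = [t for page in self.pages for t in page]
        for tuples in (hinted, everything):
            for t in tuples:
                if t.key == key and t.promise and t.tid == xid:
                    t.tid, t.promise = heap_tid, False  # store TID, unset flag
                    return True
        return False


def insert_with_promises(xid, row, heap, unique_indexes, other_indexes):
    """Return the new heap TID, or None if a unique index already has the key."""
    locked = []
    try:
        # Step 1: value-lock each unique index (always in list order) and check.
        for idx in unique_indexes:
            idx.lock.acquire()
            locked.append(idx)
            if idx.find(row[idx.column]) is not None:
                return None  # conflict: presumably the LOCK FOR UPDATE path takes over
        # Step 2: insertion will proceed, so insert the promise tuples.
        hints = [idx.insert_promise(row[idx.column], xid) for idx in unique_indexes]
    finally:
        for idx in locked:
            idx.lock.release()  # no value locks are held from here on

    # Step 3: heap insertion, with no value locks held.
    heap.append(row)
    tid = len(heap) - 1

    # Step 4: re-find each promise tuple, store the heap TID, unset the flag.
    for idx, hint in zip(unique_indexes, hints):
        if not idx.fulfil_promise(row[idx.column], xid, tid, hint):
            raise RuntimeError("promise tuple not found")

    # Step 5: non-unique index tuples are inserted in the regular manner.
    for column, entries in other_indexes.items():
        entries.append((row[column], tid))
    return tid
```

In this toy, pages never split, so the page hint always succeeds; the fallback scan stands in for the case where a real btree page split has moved the tuple.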

Obviously the big advantage here is that we don't have to worry about
value locking across heap tuple insertion at all, and yet we don't
have to worry about bloating either, because we really do know that
insertion proper will proceed when inserting *this* type of promise
index tuple. Maybe that even makes it okay to just use buffer locks,
if we think some more about the other edge cases. Perhaps regular
index scans could take the aforementioned flag as a kind of
visibility hint, so we don't have to worry about them. And VACUUM
would kill any dead promise tuples. This would be much less of a
concern than with the earlier promise tuple proposals, because dead
promise tuples would be extremely non-routine. Maybe it's fine not to
make autovacuum concerned about a whole new class of (index-only)
bloat, which seemed like a big problem with those earlier proposals,
simply because crashes within this tiny window are hopefully so rare
that the resulting bloat couldn't amount to much in the grand scheme
of things (at least before a routine VACUUM, which UPDATEs tend to
necessitate anyway). If you have 50 upserting backends in this tiny
window during a crash, that would be only 50 dead index tuples. Given
that the window is so tiny, I doubt it would be much of a problem at
all; even 50 seems like a very high number. The statistics counters
(collected when track_counts is enabled) that drive autovacuum are
already not crash-safe, so I see no regression.
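The visibility-hint and VACUUM points can be illustrated with a similarly toy Python model: a promise flag plus an xid stored in place of the heap TID. The names `IndexTuple`, `index_scan` and `vacuum_promises` are hypothetical, and treating a promise tuple as dead once its xid is no longer running is an assumption of this sketch, not something the post specifies. It also reproduces the post's crash arithmetic: 50 backends caught in the window leave 50 dead promise tuples in a single unique index (under this model, one per backend per unique index).

```python
from dataclasses import dataclass


@dataclass
class IndexTuple:
    key: object
    tid: object          # heap TID, or the inserting xid while promise is set
    promise: bool = False


def index_scan(tuples, key):
    """Return heap TIDs for key, treating the promise flag as a visibility hint."""
    return [t.tid for t in tuples if t.key == key and not t.promise]


def vacuum_promises(tuples, running_xids):
    """Drop promise tuples whose xid is no longer running.

    Assumption: a promise tuple is dead once its inserting xid is not running.
    Returns (surviving tuples, number removed).
    """
    keep = [t for t in tuples if not (t.promise and t.tid not in running_xids)]
    return keep, len(tuples) - len(keep)


# Simulate a crash inside the window: 50 backends each left one promise
# tuple in a single unique index, alongside 1000 ordinary index tuples.
index = [IndexTuple(key=k, tid=k) for k in range(1000)]
index += [IndexTuple(key=2000 + b, tid=5000 + b, promise=True) for b in range(50)]
print(index_scan(index, 2000))  # promise tuples are invisible to scans: []
index, removed = vacuum_promises(index, running_xids=set())  # nothing runs after a crash
print(removed, len(index))      # 50 1000
```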

Now, you still have to value lock across multiple btree unique
indexes, and I understand there are reservations about this. But the
surface area of value locking is made significantly smaller, at
reasonably low cost.
Furthermore, doing TOASTing out-of-line and so on ceases to be
necessary.

The LOCK FOR UPDATE case is the same as before. Nothing else changes.

FWIW, without presuming anything about the value locking implementation,
I'm not too worried about making the implementation scale to very
large numbers of unique indexes with very low shared_buffers settings.
We already have a fairly similar situation with
max_locks_per_transaction and so on, no?

-- 
Peter Geoghegan


