Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date:
Msg-id: CAM3SWZRpnkuVrENDV3zM=BNTCv8-X3PYXt76pohGyAuP1iq-ug@mail.gmail.com
In response to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
List: pgsql-hackers
Attached revision only uses heavyweight page locks across complex operations. I haven't benchmarked it, but it appears to perform reasonably well. I haven't attempted to measure a regression for regular insertion, but offhand it seems likely that any regression would be well within the noise - more or less immeasurably small.

I won't repeat too much of what is already well commented in the patch. For those who would like a relatively quick summary of what I've done, I include inline a new section that I've added to the nbtree README:

Notes about speculative insertion
---------------------------------

As an amcanunique AM, the btree implementation is required to support
"speculative insertion". This means that the value locking method
through which unique index enforcement conventionally occurs is
extended and generalized, such that insertion is staggered: the core
code attempts to get full consensus on whether values proposed for
insertion will not cause duplicate key violations. Speculative
insertion is only possible for unique index insertion without deferred
uniqueness checking (since speculative insertion into a deferred
unique constraint's index is a contradiction in terms).

For conventional unique index insertion, the btree implementation
exclusive-locks a buffer holding the first page that the value to be
inserted could possibly be on, though only for an instant, during and
shortly after uniqueness verification. It would not be acceptable to
hold this lock across complex operations for the duration of the
remainder of the first phase of speculative insertion. Therefore, we
convert this exclusive buffer lock to an exclusive page lock managed
by the lock manager. This greatly ameliorates the consequences of
undiscovered deadlocking implementation bugs (though deadlocks are not
expected), and minimizes the impact on system interruptibility, while
not affecting index scans.

It may be useful to informally think of the page lock type acquired by
speculative insertion as similar to an intention exclusive lock, a
type of lock found in some legacy 2PL database systems that use
multi-granularity locking. A session establishes the exclusive right
to subsequently establish a full write lock, without actually blocking
reads of the page unless and until a lock conversion actually occurs,
at which point both reads and writes are blocked. Under this mental
model, buffer shared locks can be thought of as intention shared
locks.

As implemented, these heavyweight locks are only relevant to the
insertion case; at no other point are they actually considered, since
insertion is the only way through which new values are introduced. The
first page a value proposed for insertion into an index could be on
represents a natural choke point for our extended, though still rather
limited, system of value locking. Naturally, when we perform a "lock
escalation" and acquire an exclusive buffer lock, all other buffer
locks on the same buffer are blocked, which is how the implementation
localizes knowledge of the heavyweight lock to insertion-related
routines. Apart from deletion, which is concomitantly prevented by
holding a pin on the buffer throughout, all exclusive locking of btree
buffers happens as a direct or indirect result of insertion, so this
approach is sufficient. (Actually, an exclusive lock may still be
acquired without insertion in order to initialize a root page, but
that hardly matters.)
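To make the acquisition ordering concrete, here is a minimal sketch of
how the escalation described above might look. LockPage(),
UnlockPage() and LockBuffer() are existing lock manager and buffer
manager primitives; the function itself and its structure are purely
illustrative, and are not taken from the patch:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "utils/rel.h"

/*
 * Illustrative only -- not the patch's actual code.  Shows the intended
 * ordering: the heavyweight page lock is taken before any buffer lock
 * on the page, and outlives the buffer lock across "complex
 * operations".  (The buffer pin, not shown, is held throughout.)
 */
static void
speculative_insertion_sketch(Relation rel, Buffer buf, BlockNumber blkno)
{
    /*
     * Establish the page-level value lock through the lock manager
     * first.  This is the "intention exclusive" style lock discussed
     * above: by itself it does not block readers of the page.
     */
    LockPage(rel, blkno, ExclusiveLock);

    /* Escalate: exclusive buffer lock for uniqueness verification. */
    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    /* ... uniqueness verification against the page happens here ... */

    /*
     * Drop the buffer lock quickly; the heavyweight lock alone
     * continues to represent our value lock while complex operations
     * take place, so index scans are unaffected and the backend stays
     * interruptible.
     */
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    /* ... complex operations occur here, with no buffer lock held ... */

    /* Insertion proper (or abandonment) releases the page lock. */
    UnlockPage(rel, blkno, ExclusiveLock);
}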
Note that all value locks (including buffer pins) are dropped
immediately as speculative insertion is aborted, as the implementation
waits on the outcome of another xact, or as "insertion proper" occurs.
These page-level locks are not intended to last more than an instant.
In general, the protocol for heavyweight locking of btree pages is
that heavyweight locks are acquired before any buffer locks are held,
and only released after all buffer locks are released. While not a
hard and fast rule, presently we avoid holding a heavyweight lock on
more than one page per unique index concurrently.
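The release discipline for, say, the wait-on-another-xact path might
be sketched as follows (using the same headers as the sketch above).
Again, this is a hypothetical illustration rather than code from the
patch; UnlockReleaseBuffer() and UnlockPage() are the real primitives,
and the ordering is the point:

/*
 * Hypothetical sketch of the release protocol above: every value lock
 * (buffer pin included) is given up before we block on another xact,
 * so no value lock is held across the wait.
 */
static void
speculative_release_sketch(Relation rel, Buffer buf, BlockNumber blkno)
{
    /* The buffer lock and pin are released first ... */
    UnlockReleaseBuffer(buf);

    /*
     * ... and only then the heavyweight page lock, matching the rule
     * that heavyweight locks are released after all buffer locks.  At
     * this point it is safe to sleep on the other transaction's
     * outcome, and later restart speculative insertion from scratch.
     */
    UnlockPage(rel, blkno, ExclusiveLock);
}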
Happy new year
--
Peter Geoghegan