Re: Brain dump: btree collapsing - Mailing list pgsql-hackers
From | Curtis Faith |
---|---|
Subject | Re: Brain dump: btree collapsing |
Date | |
Msg-id | 001401c2d6df$af61f9c0$29043ac8@curtislaptop Whole thread Raw |
In response to | Brain dump: btree collapsing (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
> tom lane wrote: > > Hm. A single lock that must be grabbed for operations anywhere in > > the index is a concurrency bottleneck. I replied a bit later: > I don't think it would be that bad. It's not a lock but a > mutex that would only be held for relatively brief duration. > It looks like tens or a few hundred instructions at most > covered by the mutex and only during scans that cross page boundaries. > > Unless you were talking about a 16 to 64 way SMP box, I don't > think there would be much actual contention since there would > have to be contention for the same index at the same time and > depending on the number of keys per page, very little of the > run time is spent on the lock transition for traversal across > pages as compared to all the other stuff that would be going on. tom also presciently wrote: > But maybe we could avoid that After sleeping on it a bit, it occurs to me that you are again correct; the mutex is not required to do what I suggested. It is simply a matter of assuring that during the page traversal the lock is either obtained or the process added to the waiters queue before the current page lock is released. This will ensure than any process trying to obtain an exclusive or superexclusive lock will wait or fail an immediate lock acquisition. traversal pseudo code: if the next page lock can be obtained immediately grab it release the current page lock else if we would have to wait add current process to waiters queue release current page lock sleep on the lock end if end traversal pseudo code Like the current release before next page acquisition method, this won't cause deadlocks with other scanners going in the opposite direction or with concurrently inserting processes since the current page lock is released before sleeping. If this change is made then my original suggestion, that the merge process attempt a non-deadlocking acquisition of all the locks with a drop of all locks and a retry if any of the attempts would have caused a wait, should work fine. If the traversal code is changed as per above, a merging VACUUM process is guaranteed to either run into the lock held by the scanning process on the current page or the next page. This has the potential side benefit of permitting the simplification of the scan code in the case where the lock is obtained immediately. Since the scanning process would know that no inserting process has added pages to the tree because of a split, it wouldn't have to check for newly inserted pages to the right of the next page. I don't know how much work is involved in this check but if it is significant it could be eliminated in this case, one which is the most likely case in many scenarios. What do you think? - Curtis
pgsql-hackers by date: