Re: Brain dump: btree collapsing - Mailing list pgsql-hackers

From Curtis Faith
Subject Re: Brain dump: btree collapsing
Date
Msg-id 001401c2d6df$af61f9c0$29043ac8@curtislaptop
Whole thread Raw
In response to Brain dump: btree collapsing  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> tom lane wrote:
> > Hm.  A single lock that must be grabbed for operations anywhere in 
> > the index is a concurrency bottleneck.

I replied a bit later: 
> I don't think it would be that bad.  It's not a lock but a
> mutex that would only be held for relatively brief duration. 
> It looks like tens or a few hundred instructions at most 
> covered by the mutex and only during scans that cross page boundaries.
> 
> Unless you were talking about a 16 to 64 way SMP box, I don't
> think there would be much actual contention since there would 
> have to be contention for the same index at the same time and 
> depending on the number of keys per page, very little of the 
> run time is spent on the lock transition for traversal across 
> pages as compared to all the other stuff that would be going on.


tom also presciently wrote:
> But maybe we could avoid that

After sleeping on it a bit, it occurs to me that you are again correct;
the mutex is not required to do what I suggested. It is simply a matter
of assuring that during the page traversal the lock is either obtained
or the process added to the waiters queue before the current page lock
is released. This will ensure than any process trying to obtain an
exclusive or superexclusive lock will wait or fail an immediate lock
acquisition.


traversal pseudo code:
   if the next page lock can be obtained immediately
       grab it
       release the current page lock
   else if we would have to wait
       add current process to waiters queue
       release current page lock
       sleep on the lock
   end if

end traversal pseudo code


Like the current release before next page acquisition method, this won't
cause deadlocks with other scanners going in the opposite direction or
with concurrently inserting processes since the current page lock is
released before sleeping.

If this change is made then my original suggestion, that the merge
process attempt a non-deadlocking acquisition of all the locks with a
drop of all locks and a retry if any of the attempts would have caused a
wait, should work fine.

If the traversal code is changed as per above, a merging VACUUM process
is guaranteed to either run into the lock held by the scanning process
on the current page or the next page.

This has the potential side benefit of permitting the simplification of
the scan code in the case where the lock is obtained immediately. Since
the scanning process would know that no inserting process has added
pages to the tree because of a split, it wouldn't have to check for
newly inserted pages to the right of the next page. I don't know how
much work is involved in this check but if it is significant it could be
eliminated in this case, one which is the most likely case in many
scenarios.

What do you think?

- Curtis





pgsql-hackers by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: pg environment? metadata?
Next
From: Christopher Kings-Lynne
Date:
Subject: Re: pg environment? metadata?