Re: Stating the significance of Lehman & Yao in the nbtree README - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Stating the significance of Lehman & Yao in the nbtree README
Date
Msg-id CAM3SWZRUdMBoCr5=MT0k6n=k724=g5S3PmMsSA1MVXwmKnBGKw@mail.gmail.com
Whole thread Raw
In response to Re: Stating the significance of Lehman & Yao in the nbtree README  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Stating the significance of Lehman & Yao in the nbtree README  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Stating the significance of Lehman & Yao in the nbtree README  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Tue, Jul 22, 2014 at 8:59 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, but how does this justify to add below new text in README.
> + Even with these read locks, Lehman and Yao's approach obviates the
> + need of earlier schemes to hold multiple read locks concurrently when
> + descending the tree as part of servicing index scans (pessimistic lock
> + coupling).
>
> Actually I think putting it can lead to inconsistency in the README.
> Currently it indicates that our algorithm is different from L&Y w.r.t taking
> Read Locks and has given explanation for same.

Not really. Firstly, that sentence acknowledges that there are read
locks where L&Y assume there will not be. "Even with these read locks"
references the first paragraph, where it is stated the Postgres
B-Trees still acquire read locks while descending the tree. Secondly,
I'm pretty sure that even Lehman and Yao realized that their apparent
assumption that real implementations would not require read locks
isn't realistic. Their handling of deletion seems perfunctory to me.
They say "In situations where excessive deletions cause the storage
utilization of tree nodes to be unacceptably low, a batch
reorganization or an underflow operation which locks the entire tree
can be performed". I'm pretty sure that that sounded almost as bad in
1980 as it does now. We don't have a "not quite L&Y" implementation
just because there are single read locks acquired while descending the
tree. Prior schemes needed multiple *concurrent* exclusive locks.
B-Trees were around for about 10 years before L&Y.

There is reason to think that pretty much every practical
implementation uses read locks for many years, because there is a well
received 2001 paper [1] that describes a scheme where L&Y style B-link
trees can *actually* be made to not require read locks, which
discusses things like caching effects on contemporary hardware - it
involves playing tricks with detecting and recovering from page level
inconsistencies, IIRC. Furthermore, it references a scheme from the
late 90s involving local copies of B-Link pages. I thought about
pursuing something like that myself, but the cost of "latching"
(buffer locking) B-Trees in PostgreSQL is likely to be reduced before
too long anyway, which makes the general idea seem unenticing right
now.

> Basically I think it will be better if you can explain in bit more detail
> that
> how does "right-links at all levels and high-key" helps to detect and
> recover from concurrent page splits.

You might be right about that - perhaps I should go into more detail.

[1] http://www.vldb.org/conf/2001/P181.pdf
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal (9.5) : psql unicode border line styles
Next
From: Viswanatham kirankumar
Date:
Subject: Re: [TODO] Process pg_hba.conf keywords as case-insensitive