Thread: Stating the significance of Lehman & Yao in the nbtree README

Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
While talking to various people during pgCon, I was reminded that the
nbtree README does a poor job of explaining the actual practical
advantages of L&Y from a high level. The "move right" trick is
initially mentioned only as an adjunct to a discussion of the
special-case handling of root page splits, even though it's of huge
significance. We also say this within nbtinsert.c:

/*
 * If the page was split between the time that we surrendered our read
 * lock and acquired our write lock, then this page may no longer be the
 * right place for the key we want to insert.  In this case, we need to
 * move right in the tree.  See Lehman and Yao for an excruciatingly
 * precise description.
 */

(Why we need to say something about _bt_moveright() within
_bt_doinsert() that is equally true of both _bt_moveright() callers
isn't clear).

I think that this isn't enough. Attached patch proposes to add a small
paragraph at the top of the nbtree README, to clarify the advantages
of L&Y from a high level. I don't think it's appropriate to state the
advantages of an algorithm in a README file generally, but in this
instance it makes it easier to understand the algorithm.

--
Peter Geoghegan

Attachment

Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Mon, May 26, 2014 at 1:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I think that this isn't enough. Attached patch proposes to add a small
> paragraph at the top of the nbtree README, to clarify the advantages
> of L&Y from a high level.


I've added this to the ongoing commitfest.

-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Abhijit Menon-Sen
Date:
At 2014-07-02 17:24:06 -0700, pg@heroku.com wrote:
>
> I've added this to the ongoing commitfest.

I don't actually understand why you were able to do that (seeing as this
CF is no longer open for new patches). Trivial or not, I think at this
point it should go into the next one. Or it should be handled outside
the CF altogether.

-- Abhijit



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Michael Paquier
Date:
On Thu, Jul 3, 2014 at 11:40 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> At 2014-07-02 17:24:06 -0700, pg@heroku.com wrote:
>>
>> I've added this to the ongoing commitfest.
>
> I don't actually understand why you were able to do that (seeing as this
> CF is no longer open for new patches). Trivial or not, I think at this
> point it should go into the next one. Or it should be handled outside
> the CF altogether.
+1. Even if it is just a doc patch, it has been submitted after 6/15,
which was the CF1 deadline.
-- 
Michael



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Abhijit Menon-Sen
Date:
At 2014-07-03 11:59:21 +0900, michael.paquier@gmail.com wrote:
>
> +1. Even if it is just a doc patch, it has been submitted after 6/15,
> which was the CF1 deadline.

Well, to be fair, the original patch was posted to the list more than a
month ago, and it should have been in this CF. But… it wasn't. And now
after more than two weeks into this CF, I don't think it should be.

I moved it to 2014-08. (Sorry, Peter.)

-- Abhijit



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Wed, Jul 2, 2014 at 8:14 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote:
> Well, to be fair, the original patch was posted to the list more than a
> month ago, and it should have been in this CF. But… it wasn't. And now
> after more than two weeks into this CF, I don't think it should be.

Is that how the rule is interpreted? Okay, I defer to you. I guess
I've just never seen a situation where that distinction needed to be
drawn come up before.

> I moved it to 2014-08. (Sorry, Peter.)

I don't mind if no one looks at this until then. I agree that this is
exactly the kind of thing that generally doesn't need to be handled
through a commitfest submission. The only reason I added this to any
commitfest was to avoid having it be forgotten about entirely, which
there is a real danger of for something like this when it isn't
handled quickly. I almost forgot about it myself.

--
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Amit Kapila
Date:
On Tue, May 27, 2014 at 1:52 AM, Peter Geoghegan <pg@heroku.com> wrote:
>
> While talking to various people during pgCon, I was reminded that the
> nbtree README does a poor job of explaining the actual practical
> advantages of L&Y from a high level. The "move right" trick is
> initially mentioned only as an adjunct to a discussion of the
> special-case handling of root page splits, even though it's of huge
> significance. We also say this within nbtinsert.c:
>
> /*
>  * If the page was split between the time that we surrendered our read
>  * lock and acquired our write lock, then this page may no longer be the
>  * right place for the key we want to insert.  In this case, we need to
>  * move right in the tree.  See Lehman and Yao for an excruciatingly
>  * precise description.
>  */
>
> (Why we need to say something about _bt_moveright() within
> _bt_doinsert() that is equally true of both _bt_moveright() callers
> isn't clear).

There is a mention about the race condition where it needs to move right
in another caller (_bt_search) of _bt_moveright() as well.

/*
* Race -- the page we just grabbed may have split since we read its
* pointer in the parent (or metapage).  If it has, we may need to
* move right to its new sibling.  Do that.
..

Do you think there is more to what is already mentioned on top of second
caller which we should add or you think if it is true for both, then it should
be on top of _bt_moveright()?

> I think that this isn't enough. Attached patch proposes to add a small
> paragraph at the top of the nbtree README, to clarify the advantages
> of L&Y from a high level.  I don't think it's appropriate to state the
> advantages of an algorithm in a README file generally, but in this
> instance it makes it easier to understand the algorithm.

In general, I agree with you that we should mention about any advantage
of the algorithm we are using and especially if it is significant.  I think it
will be better if can also mention how that advantage or use is realized
in our implementation as we are already doing in README of nbtree.

+
+ Even with these read locks, Lehman and Yao's approach obviates the
+ need of earlier schemes to hold multiple read locks concurrently when
+ descending the tree as part of servicing index scans (pessimistic lock
+ coupling).  The addition of right-links at all levels, as well as the
+ addition of a page "high key" allows detection of, and dynamic
+ recovery from concurrent page splits (that is, splits between
+ unlocking an internal page, and subsequently locking its child page
+ during a descent).  L&Y Trees are sometimes referred to as "B-Link
+ trees" in the literature.
+

The above indicates 2 things:
a. L & Y doesn't need to hold read locks concurrently.
b. Advantage of right-links at all levels and "high-key".

As per my understanding, we are not following point (a) in our code,
so what is the benefit we get by having a reference of same in README?

Isn't it better if we mention how the point (b) is used in our code and
it's advantage together rather than keeping it at top of README?

Already README mentions in brief about right-link and how it is used
as below:
".. The scan must remember the page's right-link at the time it was scanned,
since that is the page to move right to; if we move right to the current
right-link then we'd re-scan any items moved by a page split. ..."


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> There is a mention about the race condition where it needs to move right
> in another caller (_bt_search) of _bt_moveright() as well.
>
> /*
> * Race -- the page we just grabbed may have split since we read its
> * pointer in the parent (or metapage).  If it has, we may need to
> * move right to its new sibling.  Do that.
> ..
>
> Do you think there is more to what is already mentioned on top of second
> caller which we should add or you think if it is true for both, then it
> should
> be on top of _bt_moveright()?

Well, maybe it is justified to mention it multiple times, so as to not
break the reader's train of thought. I'm not sure.

> In general, I agree with you that we should mention about any advantage
> of the algorithm we are using and especially if it is significant.  I think
> it
> will be better if can also mention how that advantage or use is realized
> in our implementation as we are already doing in README of nbtree.

Right. It seems like the nbtree README is very shy about telling us
what the point of all this extra work is. IMV that should be stated
very prominently to aid understanding. A lot of people believe that we
have to do lock coupling/"crabbing" when descending the tree. We do
not. The locks acquired when descending a B-Tree in Postgres are
pretty granular. One read buffer lock is held at a time in the process
of servicing index scans. There are times during the descent of the
tree when no buffer locks are held whatsoever. Moreover, (with some
caveats) it doesn't really matter if a stale view of the page is seen
during a descent (as it happens I've been trying to think of ways to
further take advantage of this). That's pretty cool. If you only know
one thing about how the nbtree code works, that should probably be it.

> The above indicates 2 things:
> a. L & Y doesn't need to hold read locks concurrently.
> b. Advantage of right-links at all levels and "high-key".
>
> As per my understanding, we are not following point (a) in our code,
> so what is the benefit we get by having a reference of same in README?

The major reason that we don't completely avoid read locks, is, I
suppose, the need for self-consistent pages (but also because it would
break page deletion - I'm pretty sure that L&Y don't consider page
deletion, and the page deletion logic is entirely based on the Lanin
and Shasha paper and original research, but I didn't check). I think
that the sentence "Lehman and Yao don't require read locks, but assume
that in-memory copies of tree pages are unshared" is intended to
convey the point on the self-consistency of pages. Of course, Lehman
and Yao must assume that the B-Tree is in some sense in shared memory.
Otherwise, there wouldn't be much point to their elaborate locking
protocol.  :-)

> Isn't it better if we mention how the point (b) is used in our code and
> it's advantage together rather than keeping it at top of README?

Maybe it deserves more prominent mention in the code too.

> Already README mentions in brief about right-link and how it is used
> as below:
> ".. The scan must remember the page's right-link at the time it was scanned,
> since that is the page to move right to; if we move right to the current
> right-link then we'd re-scan any items moved by a page split. ..."

This is talking about how index scans interlock against VACUUM while
going to the heap, by keeping a leaf page pinned (this prevents "super
exclusive lock" acquisition). This is after the tree has been
descended. This is really just a detail (albeit one that follows
similar principles, since pages split right and it similarly exploits
that fact), whereas the use of right links and high keys while
descending the tree, and in particular the fact that the "move right"
L&Y technique obviates the prior need for "lock coupling" is pretty
much the whole point of L&Y.

In more concrete terms, _bt_search() releases and only then acquires
read locks during a descent of the tree (by calling
_bt_relandgetbuf()), and, perhaps counterintuitively, that's just
fine.
-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Tom Lane
Date:
Peter Geoghegan <pg@heroku.com> writes:
> Right. It seems like the nbtree README is very shy about telling us
> what the point of all this extra work is.

IIRC, the README was written on the assumption that you'd already read
L&Y.  If this patch is mostly about not assuming that, why not?
        regards, tom lane



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Amit Kapila
Date:
On Wed, Jul 23, 2014 at 5:58 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The above indicates 2 things:
> > a. L & Y doesn't need to hold read locks concurrently.
> > b. Advantage of right-links at all levels and "high-key".
> >
> > As per my understanding, we are not following point (a) in our code,
> > so what is the benefit we get by having a reference of same in README?
>
> The major reason that we don't completely avoid read locks, is, I
> suppose, the need for self-consistent pages (but also because it would
> break page deletion - I'm pretty sure that L&Y don't consider page
> deletion, and the page deletion logic is entirely based on the Lanin
> and Shasha paper and original research, but I didn't check). I think
> that the sentence "Lehman and Yao don't require read locks, but assume
> that in-memory copies of tree pages are unshared" is intended to
> convey the point on the self-consistency of pages. Of course, Lehman
> and Yao must assume that the B-Tree is in some sense in shared memory.
> Otherwise, there wouldn't be much point to their elaborate locking
> protocol.  :-)

Okay, but how does this justify to add below new text in README.
+ Even with these read locks, Lehman and Yao's approach obviates the
+ need of earlier schemes to hold multiple read locks concurrently when
+ descending the tree as part of servicing index scans (pessimistic lock
+ coupling).

Actually I think putting it can lead to inconsistency in the README.
Currently it indicates that our algorithm is different from L&Y w.r.t taking
Read Locks and has given explanation for same.

> > Isn't it better if we mention how the point (b) is used in our code and
> > it's advantage together rather than keeping it at top of README?
>
> Maybe it deserves more prominent mention in the code too.
>
> > Already README mentions in brief about right-link and how it is used
> > as below:
> > ".. The scan must remember the page's right-link at the time it was scanned,
> > since that is the page to move right to; if we move right to the current
> > right-link then we'd re-scan any items moved by a page split. ..."
>
> This is talking about how index scans interlock against VACUUM while
> going to the heap, by keeping a leaf page pinned (this prevents "super
> exclusive lock" acquisition). This is after the tree has been
> descended. This is really just a detail (albeit one that follows
> similar principles, since pages split right and it similarly exploits
> that fact), whereas the use of right links and high keys while
> descending the tree, and in particular the fact that the "move right"
> L&Y technique obviates the prior need for "lock coupling" is pretty
> much the whole point of L&Y.
>
> In more concrete terms, _bt_search() releases and only then acquires
> read locks during a descent of the tree (by calling
> _bt_relandgetbuf()), and, perhaps counterintuitively, that's just
> fine.

So don't you think that it needs bit more explanation than you have
quoted in below text.
+ The addition of right-links at all levels, as well as the
+ addition of a page "high key" allows detection of, and dynamic
+ recovery from concurrent page splits (that is, splits between
+ unlocking an internal page, and subsequently locking its child page
+ during a descent).

Basically I think it will be better if you can explain in bit more detail that
how does "right-links at all levels and high-key" helps to detect and
recover from concurrent page splits.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Tue, Jul 22, 2014 at 5:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> IIRC, the README was written on the assumption that you'd already read
> L&Y.  If this patch is mostly about not assuming that, why not?

L&Y made the same mistake that the authors of most influential papers
make - they never get around to telling the reader why they should
bother to read it. The paper is over 30 years old, and we now know
that it's very influential, and the reasons why. I think that both the
nbtree README and L&Y would be a lot more approachable with a high
level introduction (arguably L&Y attempt this, but the way they go
about it seems impenetrable, mostly consisting of esoteric references
to other papers). Surely making that code more approachable is a
worthy goal.

-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Tue, Jul 22, 2014 at 8:59 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, but how does this justify to add below new text in README.
> + Even with these read locks, Lehman and Yao's approach obviates the
> + need of earlier schemes to hold multiple read locks concurrently when
> + descending the tree as part of servicing index scans (pessimistic lock
> + coupling).
>
> Actually I think putting it can lead to inconsistency in the README.
> Currently it indicates that our algorithm is different from L&Y w.r.t taking
> Read Locks and has given explanation for same.

Not really. Firstly, that sentence acknowledges that there are read
locks where L&Y assume there will not be. "Even with these read locks"
references the first paragraph, where it is stated the Postgres
B-Trees still acquire read locks while descending the tree. Secondly,
I'm pretty sure that even Lehman and Yao realized that their apparent
assumption that real implementations would not require read locks
isn't realistic. Their handling of deletion seems perfunctory to me.
They say "In situations where excessive deletions cause the storage
utilization of tree nodes to be unacceptably low, a batch
reorganization or an underflow operation which locks the entire tree
can be performed". I'm pretty sure that that sounded almost as bad in
1980 as it does now. We don't have a "not quite L&Y" implementation
just because there are single read locks acquired while descending the
tree. Prior schemes needed multiple *concurrent* exclusive locks.
B-Trees were around for about 10 years before L&Y.

There is reason to think that pretty much every practical
implementation uses read locks for many years, because there is a well
received 2001 paper [1] that describes a scheme where L&Y style B-link
trees can *actually* be made to not require read locks, which
discusses things like caching effects on contemporary hardware - it
involves playing tricks with detecting and recovering from page level
inconsistencies, IIRC. Furthermore, it references a scheme from the
late 90s involving local copies of B-Link pages. I thought about
pursuing something like that myself, but the cost of "latching"
(buffer locking) B-Trees in PostgreSQL is likely to be reduced before
too long anyway, which makes the general idea seem unenticing right
now.

> Basically I think it will be better if you can explain in bit more detail
> that
> how does "right-links at all levels and high-key" helps to detect and
> recover from concurrent page splits.

You might be right about that - perhaps I should go into more detail.

[1] http://www.vldb.org/conf/2001/P181.pdf
-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Amit Kapila
Date:
On Wed, Jul 23, 2014 at 11:19 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Jul 22, 2014 at 8:59 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Okay, but how does this justify to add below new text in README.
> > + Even with these read locks, Lehman and Yao's approach obviates the
> > + need of earlier schemes to hold multiple read locks concurrently when
> > + descending the tree as part of servicing index scans (pessimistic lock
> > + coupling).
> >
> > Actually I think putting it can lead to inconsistency in the README.
> > Currently it indicates that our algorithm is different from L&Y w.r.t taking
> > Read Locks and has given explanation for same.
>
> Not really. Firstly, that sentence acknowledges that there are read
> locks where L&Y assume there will not be. "Even with these read locks"
> references the first paragraph, where it is stated the Postgres
> B-Trees still acquire read locks while descending the tree. 

I think here you want to state that the difference in Postgres is "as we are
using L & Y approach, it don't need to hold *multiple* read locks concurrently",
and L & Y approach which obviates this need is explained in second line
(which indicates the importance of maintaining right-links and high-keys to
detect and recover from page splits).  

As such there is no problem in saying the way you have mentioned, but
I feel it would be better if we can mention the mechanism of _bt_search()
as quoted by you upthread in the first line.
"> In more concrete terms, _bt_search() releases and only then acquires
> read locks during a descent of the tree (by calling
> _bt_relandgetbuf()), and, perhaps counterintuitively, that's just
> fine."

One more point, why you think it is important to add this new text
on top?  I think adding new text after "Lehman and Yao don't require read
locks, .." paragraph is okay. 


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Wed, Jul 23, 2014 at 8:52 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> As such there is no problem in saying the way you have mentioned, but
> I feel it would be better if we can mention the mechanism of _bt_search()
> as quoted by you upthread in the first line.
> "> In more concrete terms, _bt_search() releases and only then acquires
>> read locks during a descent of the tree (by calling
>> _bt_relandgetbuf()), and, perhaps counterintuitively, that's just
>> fine."

I guess I could say that too.

> One more point, why you think it is important to add this new text
> on top?  I think adding new text after "Lehman and Yao don't require read
> locks, .." paragraph is okay.

I've added it to the top because it's really the most important point
on Lehman and Yao. It's the _whole_ point. Consider how it's
introduced here, for example:
http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html

Why should I "bury the lead"?

-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Amit Kapila
Date:
On Thu, Jul 24, 2014 at 9:28 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Jul 23, 2014 at 8:52 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > As such there is no problem in saying the way you have mentioned, but
> > I feel it would be better if we can mention the mechanism of _bt_search()
> > as quoted by you upthread in the first line.
> > "> In more concrete terms, _bt_search() releases and only then acquires
> >> read locks during a descent of the tree (by calling
> >> _bt_relandgetbuf()), and, perhaps counterintuitively, that's just
> >> fine."
>
> I guess I could say that too.

Okay.

> > One more point, why you think it is important to add this new text
> > on top?  I think adding new text after "Lehman and Yao don't require read
> > locks, .." paragraph is okay.
>
> I've added it to the top because it's really the most important point
> on Lehman and Yao. It's the _whole_ point. Consider how it's
> introduced here, for example:
> http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html
>
> Why should I "bury the lead"?

I think even if you want to keep it at top, may be we could have another
heading like : Concurrency Considerations with Lehman & Yao Approach

However, I think we can leave this point for Committer to decide. 


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Tue, Jul 22, 2014 at 10:49 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Basically I think it will be better if you can explain in bit more detail
>> that
>> how does "right-links at all levels and high-key" helps to detect and
>> recover from concurrent page splits.
>
> You might be right about that - perhaps I should go into more detail.

I've gone into more detail on what a high key is, and how we arguably
do not follow Lehman & Yao to the letter because we still have read
locks. L&Y's assumption of atomic page reads/writes, and cursory
handling of deletion kind of make it inevitable that read locks are
used, which I now imply. So nbtree isn't a substandard L&Y
implementation - it's a realistic one, which only needs to hold a
single read lock at a time when servicing index scans (or when finding
a place for insertion). I guess Lehman & Yao preferred to put forward
the claim "no read locks" rather than "only one read lock at a time on
internal pages, even for insertion" because there might be some uses
of their algorithm where that is actually realistic. It is really a
sympathetic way of spinning things to say that *no* read locks are
used, though.

--
Peter Geoghegan

Attachment

Re: Stating the significance of Lehman & Yao in the nbtree README

From
Heikki Linnakangas
Date:
On 09/07/2014 05:58 AM, Peter Geoghegan wrote:
> + Lehman and Yao don't require read locks, but assume that in-memory
> + copies of tree pages are unshared.  Postgres shares in-memory buffers
> + among backends.  As a result, we do page-level read locking on btree
> + pages in order to guarantee that no record is modified while we are
> + examining it.  This reduces concurrency but guarantees correct
> + behavior.  An advantage is that when trading in a read lock for a
> + write lock, we need not re-read the page after getting the write lock.
> + Since we're also holding a pin on the shared buffer containing the
> + page, we know that buffer still contains the page and is up-to-date.

This is the existing paragraph, just moved to different place in the README.

> + Although it could be argued that Lehman and Yao isn't followed to the
> + letter because single pages are read locked as the tree is descended,
> + this is at least necessary to support deletion, a common requirement
> + which L&Y hardly acknowledge.  Read locks also ensure that B-tree
> + pages are self-consistent (L&Y appear to assume atomic page reads and
> + writes).

This is just duplicating the existing paragraph. I don't see the point 
of this.

>  Even with these read locks, following L&Y obviates the need
> + of earlier schemes to hold multiple locks concurrently when descending
> + the tree as part of servicing index scans (pessimistic lock coupling).
> + The addition of right-links at all levels, as well as the addition of
> + a page "high key" allows detection and dynamic recovery from
> + concurrent page splits (that is, splits between unlocking an internal
> + page, and subsequently locking its child page during a descent).  When
> + a page is first locked (at every level of a descent servicing an index
> + scan), we consider the need to "move right":  if the scankey value is
> + less than (or sometimes less than or equal to) the page's existing
> + highkey value, a value which serves as an upper bound for values on
> + the page generally, then it must be necessary to move the scan to the
> + right-hand page on the same level.  It's even possible that the scan
> + needs to move right more than once.  Once the other session's
> + concurrent page split finishes, a downlink will be inserted into the
> + parent, and so assuming there are no further page splits, future index
> + scans using the same scankey value will not need to move right.  L&Y
> + Trees are sometimes referred to as "B-Link trees" in the literature.

This explains in a few sentences what a L&Y B-tree looks like. The 
current README assumes that you're already familiar with what a L&Y tree 
looks like, or that you go read the paper mentioned at the top of the 
README. It might be a good idea to expand on that, and add an 
introduction like this so that an unfamiliar reader doesn't need to read 
the L&Y paper first. Is that the purpose of this patch? Please make it 
more explicit. And please make the sentences simpler - the above reads 
like a Shakespeare play. Something like:

The basic Lehman & Yao Algorithm
================================

Compared to a classic B-tree, L&Y adds a right-link pointer to each 
page, to the page's right sibling. It also adds a "high key" to each 
page, which is an upper bound on the keys that are allowed on that page. 
These two additions make it possible detect a concurrent page split, 
which allows the tree to be searched without holding any read locks 
(except to keep a single page from being modified while reading it).

When a search follows a downlink to a child page, it compares the page's 
high key with the search key. If the search key is greater than the high 
key, the page must've been split concurrently, and you must follow the 
right-link to find the new page containing the key range you're looking 
for. This might need to be repeated, if the page has been split more 
than once.

Differences to the Lehman & Yao algorithm
=========================================

(current "Lehamn and Yao Algorithm and Insertions section)



I think that's pretty much the same information you added, but it's in a 
separate section, with the clear purpose that it explains what a L&Y 
tree looks like. You can skip over it, if you have read the paper or are 
otherwise already familiar with it. It still assumes that you're 
familiar with B-trees in general.

Anyway, I see that you had resurrected this in the commitfest app after 
three weeks of inactivity. I'm going to mark this back to "Returned with 
Feedback". Please don't resurrect it again, this patch has received more 
than its fair share of attention. Instead, please help by signing up to 
review a patch. The commitfest progress is glacial at the moment, and we 
really need experienced reviewers like you to get closure to people's 
patches.

- Heikki




Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Tue, Sep 9, 2014 at 12:01 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
>> + Lehman and Yao don't require read locks, but assume that in-memory
>> + copies of tree pages are unshared.

> This is the existing paragraph, just moved to different place in the README.

That's right - it seemed to make just as much sense to talk about that
here, while doing so establishes the context of talking about how what
we do differs from the canonical algorithm (which I think is true of
real-world L&Y implementations generally). Which makes the next
paragraph easier to understand:

>> + Although it could be argued that Lehman and Yao isn't followed to the
>> + letter because single pages are read locked as the tree is descended,
>> + this is at least necessary to support deletion, a common requirement
>> + which L&Y hardly acknowledge.  Read locks also ensure that B-tree
>> + pages are self-consistent (L&Y appear to assume atomic page reads and
>> + writes).
>
> This is just duplicating the existing paragraph. I don't see the point of
> this.

The point is to make clear the reason for the differences - evidently,
Amit felt it was unclear why we don't follow L&Y. I am suggesting that
L&Y's talk of having no read locks is unrealistic (it might be
realistic in the 21st century, but that's a whole other story).

>>  Even with these read locks, following L&Y obviates the need
>> + of earlier schemes to hold multiple locks concurrently when descending
>> + the tree as part of servicing index scans (pessimistic lock coupling).
>> + The addition of right-links at all levels, as well as the addition of
>> + a page "high key" allows detection and dynamic recovery from
>> + concurrent page splits (that is, splits between unlocking an internal
>> + page, and subsequently locking its child page during a descent).

> This explains in a few sentences what a L&Y B-tree looks like. The current
> README assumes that you're already familiar with what a L&Y tree looks like,
> or that you go read the paper mentioned at the top of the README. It might
> be a good idea to expand on that, and add an introduction like this so that
> an unfamiliar reader doesn't need to read the L&Y paper first. Is that the
> purpose of this patch? Please make it more explicit.

Yes, although even L&Y don't get around to telling the reader why they
should care, the mistake that many good papers make. We now know its
significance, both in general and to Postgres. Framing the discussion
like this aids understanding more than you'd think.

> And please make the
> sentences simpler - the above reads like a Shakespeare play.

Out, damned lock!

> Something like:
> The basic Lehman & Yao Algorithm
> ================================
>
> Compared to a classic B-tree, L&Y adds a right-link pointer to each page, to
> the page's right sibling. It also adds a "high key" to each page, which is
> an upper bound on the keys that are allowed on that page. These two
> additions make it possible detect a concurrent page split, which allows the
> tree to be searched without holding any read locks (except to keep a single
> page from being modified while reading it).
>
> When a search follows a downlink to a child page, it compares the page's
> high key with the search key. If the search key is greater than the high
> key, the page must've been split concurrently, and you must follow the
> right-link to find the new page containing the key range you're looking for.
> This might need to be repeated, if the page has been split more than once.
>
> Differences to the Lehman & Yao algorithm
> =========================================
>
> (current "Lehamn and Yao Algorithm and Insertions section)
>
>
>
> I think that's pretty much the same information you added, but it's in a
> separate section, with the clear purpose that it explains what a L&Y tree
> looks like. You can skip over it, if you have read the paper or are
> otherwise already familiar with it. It still assumes that you're familiar
> with B-trees in general.

That seems fair enough - I'd just expand on why we don't completely
avoid read locks, which L&Y suppose we can get away with. That is
clearly a point of confusion.

> Anyway, I see that you had resurrected this in the commitfest app after
> three weeks of inactivity. I'm going to mark this back to "Returned with
> Feedback". Please don't resurrect it again, this patch has received more
> than its fair share of attention.

I didn't mean to suggest that it deserves much attention. I didn't
know how to interpret the fact that you changed the status, since in
fact not much additional work was required. I was busy throughout, for
reasons that are perhaps obvious. I am not fussed about when this
happens, but I really think we should get around to it.

> Instead, please help by signing up to
> review a patch. The commitfest progress is glacial at the moment, and we
> really need experienced reviewers like you to get closure to people's
> patches.

I'm back from holidays now. I plan to look at the Tomas Vondra's
patch. If you think I should look at some particular patch, please let
me know.
-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Heikki Linnakangas
Date:
On 09/11/2014 11:47 PM, Peter Geoghegan wrote:
> On Tue, Sep 9, 2014 at 12:01 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>>> + Although it could be argued that Lehman and Yao isn't followed to the
>>> + letter because single pages are read locked as the tree is descended,
>>> + this is at least necessary to support deletion, a common requirement
>>> + which L&Y hardly acknowledge.  Read locks also ensure that B-tree
>>> + pages are self-consistent (L&Y appear to assume atomic page reads and
>>> + writes).
>>
>> This is just duplicating the existing paragraph. I don't see the point of
>> this.
>
> The point is to make clear the reason for the differences - evidently,
> Amit felt it was unclear why we don't follow L&Y. I am suggesting that
> L&Y's talk of having no read locks is unrealistic (it might be
> realistic in the 21st century, but that's a whole other story).

IMHO the existing paragraph does a much better job explaining that:

> Lehman and Yao don't require read locks, but assume that in-memory
> copies of tree pages are unshared.  Postgres shares in-memory buffers
> among backends.  As a result, we do page-level read locking on btree
> pages in order to guarantee that no record is modified while we are
> examining it.

Amit: did you notice that paragraph in the README? If not, and now that 
you have read it, does that paragraph make things clear? If that's not 
enough, what do you think is missing?

- Heikki




Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Fri, Sep 12, 2014 at 2:15 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Amit: did you notice that paragraph in the README? If not, and now that you
> have read it, does that paragraph make things clear? If that's not enough,
> what do you think is missing?

Amit raised the fact that L&Y say that no read locks are required, so
clearly he read that paragraph. We don't try and justify that right
now, and trying to explain the whole point of L&Y without first
explaining this satisfactorily seems odd. The current text reads:

"""
Lehman and Yao don't require read locks, but assume that in-memory
copies of tree pages are unshared.
"""

If they assume that they're unshared, uh, then why bother with all
that locking stuff? Or does this mean that page reads and writes are
(dubiously) assumed atomic without locks? If you take a step back, you
can see how confusing that is. L&Y don't get around to explaining
this, but it's pretty clear that this is what's going on. If L&Y did a
better job of explaining their algorithm, they'd just say that the
"latch coupling" (coupling, or "crabbing", of what we'd call buffer
locks) previously required is no longer required, but single shared
locks are required. As I pointed out, it looks like someone figured
out a way to make that true much, much later [1]. I think page-level
MVCC designs might have also been used, but basically our
interpretation of L&Y is the standard one. We cannot really be
considered to have deviated from L&Y, since AFAICT everyone else went
with the same interpretation.

FWIW, in recent years Stonebraker has used "latch coupling" as an
example of the (implicitly no longer acceptable) overhead of designs
influenced by System R and ARIES. This is unfounded, but plenty of
bright people still at least believe that "latch coupling" is common.

[1] http://www.vldb.org/conf/2001/P181.pdf
-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Robert Haas
Date:
On Fri, Sep 12, 2014 at 12:57 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Sep 12, 2014 at 2:15 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> Amit: did you notice that paragraph in the README? If not, and now that you
>> have read it, does that paragraph make things clear? If that's not enough,
>> what do you think is missing?
>
> Amit raised the fact that L&Y say that no read locks are required, so
> clearly he read that paragraph. We don't try and justify that right
> now, and trying to explain the whole point of L&Y without first
> explaining this satisfactorily seems odd. The current text reads:
>
> """
> Lehman and Yao don't require read locks, but assume that in-memory
> copies of tree pages are unshared.
> """
>
> If they assume that they're unshared, uh, then why bother with all
> that locking stuff? Or does this mean that page reads and writes are
> (dubiously) assumed atomic without locks? If you take a step back, you
> can see how confusing that is. L&Y don't get around to explaining
> this, but it's pretty clear that this is what's going on. If L&Y did a
> better job of explaining their algorithm, they'd just say that the
> "latch coupling" (coupling, or "crabbing", of what we'd call buffer
> locks) previously required is no longer required, but single shared
> locks are required. As I pointed out, it looks like someone figured
> out a way to make that true much, much later [1]. I think page-level
> MVCC designs might have also been used, but basically our
> interpretation of L&Y is the standard one. We cannot really be
> considered to have deviated from L&Y, since AFAICT everyone else went
> with the same interpretation.

Gosh, I think you're making this way more complicated than it needs to
be.  My interpretation of the above statement was that they knew
individual page reads and writes would need to be made atomic -
probably using some form of simple locking - but omitted that from
their pseudocode for clarity.  Such elisions are common in computer
science literature and don't typically detract from understanding.  It
is assumed that the implementor knows enough to avoid the trivial
pitfalls of whatever is being discussed and therefore focuses on the
higher-level algorithmic issues.

If this is what we're arguing about, it's completely not worth the
time we've spent on it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Fri, Sep 12, 2014 at 10:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Gosh, I think you're making this way more complicated than it needs to
> be.  My interpretation of the above statement was that they knew
> individual page reads and writes would need to be made atomic -
> probably using some form of simple locking - but omitted that from
> their pseudocode for clarity.

That clearly isn't the case. The introductory paragraph of L&Y says
the following:

"Our solution compares favorably with earlier solutions in that the
locking scheme is simpler (no read-locks are used) and only a (small)
constant number of nodes are locked by any update process at any given
time."

They clearly and prominently state that not needing read locks is a
major advantage of their algorithm, which doesn't quite ring true.

> If this is what we're arguing about, it's completely not worth the
> time we've spent on it.

It isn't. It's a minor point, originally raised by Amit.

-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Kevin Grittner
Date:
Peter Geoghegan <pg@heroku.com> wrote:

> The introductory paragraph of L&Y says the following:
>
> "Our solution compares favorably with earlier solutions in that the
> locking scheme is simpler (no read-locks are used) and only a (small)
> constant number of nodes are locked by any update process at any given
> time."
>
> They clearly and prominently state that not needing read locks is a
> major advantage of their algorithm, which doesn't quite ring true.

It's been a while since I read that paper, but my recollection is
that they assumed that each process or thread looking at a buffer
would have its own private copy of that buffer, which it could be
sure nobody was changing (even if the "master" copy somewhere else
was changing).  Locking was only needed to prevent conflicting
writes.  Now, whether it is safe to assume that creating a
process-local buffer and copying to it is cheaper than getting a
lock seems dicey, but that seemed to be the implicit assumption.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Fri, Sep 12, 2014 at 12:39 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> It's been a while since I read that paper, but my recollection is
> that they assumed that each process or thread looking at a buffer
> would have its own private copy of that buffer, which it could be
> sure nobody was changing (even if the "master" copy somewhere else
> was changing).  Locking was only needed to prevent conflicting
> writes.  Now, whether it is safe to assume that creating a
> process-local buffer and copying to it is cheaper than getting a
> lock seems dicey, but that seemed to be the implicit assumption.

That is one way to make reads atomic, but I don't recall any explicit
mention of it. In 1981, I think page sizes were about the same as
today, but 4K was a lot of memory. We could actually do this, with
some work. I think that this has actually been implemented elsewhere,
though. Note that L&Y have practically nothing to say about deletion -
they simply suggest that it be done offline.

It is really useful that we can recover from page splits as and when
problems arise. That's really what I'd like to prominently convey - it
is the whole point of L&Y.

-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Amit Kapila
Date:
On Fri, Sep 12, 2014 at 2:45 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>
> On 09/11/2014 11:47 PM, Peter Geoghegan wrote:
>>
>> On Tue, Sep 9, 2014 at 12:01 AM, Heikki Linnakangas
>> <hlinnakangas@vmware.com> wrote:
>>>>
>>>> + Although it could be argued that Lehman and Yao isn't followed to the
>>>> + letter because single pages are read locked as the tree is descended,
>>>> + this is at least necessary to support deletion, a common requirement
>>>> + which L&Y hardly acknowledge.  Read locks also ensure that B-tree
>>>> + pages are self-consistent (L&Y appear to assume atomic page reads and
>>>> + writes).
>>>
>>>
>>> This is just duplicating the existing paragraph. I don't see the point of
>>> this.
>>
>>
>> The point is to make clear the reason for the differences - evidently,
>> Amit felt it was unclear why we don't follow L&Y. I am suggesting that
>> L&Y's talk of having no read locks is unrealistic (it might be
>> realistic in the 21st century, but that's a whole other story).
>
>
> IMHO the existing paragraph does a much better job explaining that:
>
>> Lehman and Yao don't require read locks, but assume that in-memory
>> copies of tree pages are unshared.  Postgres shares in-memory buffers
>> among backends.  As a result, we do page-level read locking on btree
>> pages in order to guarantee that no record is modified while we are
>> examining it.
>
>
> Amit: did you notice that paragraph in the README?

Yeah, I have noticed and even discussed in this thread that
it might be better to add new text after that paragraph.

> If not, and now that you have read it, does that paragraph make things clear? If that's not enough, what do you think is missing?

The paragraph is quite clear to me and the only thing, I have
mentioned above is that if you (Peter) want to add anything
new in that regard, then it is about an additional point why
read locks are taken during traversal of tree which is something
he is trying to say in his new patch as below:
"this is at least necessary to support deletion,"

However it could have been added in the existing text as well.

Having said that, I think that was just a very minor thing which is
generally obvious and was not at all the main feedback for patch.
The main point was to explain about how does "right-links at all
levels and high-key" helps to detect and recover from concurrent
page splits which I think the patch has tried to explain, but honestly
the wording you have posted (under heading
"The basic Lehman & Yao Algorithm
================================") is more clear to me.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Stating the significance of Lehman & Yao in the nbtree README

From
Amit Kapila
Date:
On Fri, Sep 12, 2014 at 10:54 PM, Peter Geoghegan <pg@heroku.com> wrote:
>
> It isn't. It's a minor point, originally raised by Amit.

I have observed that this patch is in 'Needs Review' state for
next CF.  Do you expect any further review from myside?  I think
we can use text recommended by Heikki and after that if you
feel something more is still required, then you can update the same
and send an updated patch.  I believe it should be in 'Waiting On Author'
state, please do let me know if you feel otherwise.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Fri, Sep 26, 2014 at 11:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have observed that this patch is in 'Needs Review' state for
> next CF.  Do you expect any further review from myside?  I think
> we can use text recommended by Heikki and after that if you
> feel something more is still required, then you can update the same
> and send an updated patch.  I believe it should be in 'Waiting On Author'
> state, please do let me know if you feel otherwise.

I don't think so. Thanks for the review!

-- 
Peter Geoghegan



Re: Stating the significance of Lehman & Yao in the nbtree README

From
Heikki Linnakangas
Date:
On 09/27/2014 09:36 AM, Peter Geoghegan wrote:
> On Fri, Sep 26, 2014 at 11:34 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I have observed that this patch is in 'Needs Review' state for
>> next CF.  Do you expect any further review from myside?  I think
>> we can use text recommended by Heikki and after that if you
>> feel something more is still required, then you can update the same
>> and send an updated patch.  I believe it should be in 'Waiting On Author'
>> state, please do let me know if you feel otherwise.
>
> I don't think so. Thanks for the review!

Ok, applied those extra paragraphs now, and marked as "committed" in the 
commitfest.

- Heikki




Re: Stating the significance of Lehman & Yao in the nbtree README

From
Peter Geoghegan
Date:
On Mon, Nov 24, 2014 at 3:51 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> Ok, applied those extra paragraphs now, and marked as "committed" in the
> commitfest.

Thanks!

-- 
Peter Geoghegan