Thread: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

[HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Jeff Davis
Date:
https://www.postgresql.org/message-id/CAJEAwVE4UAmm8fr%2BNW8XTnKV6M--ACoNhL3ES8yoKL2sKhbaiw%40mail.gmail.com

Let me re-summarize what's been done here to make sure I understand:

Each key in GIN has its own posting tree, which is itself a btree,
holding all of the tuples that contain that key. This posting tree is
really just a set of tuples, and searches always scan an entire
posting tree (because all the tuples contain the key, so all are a
potential match).

Currently, the cleanup process requires blocking all reads and writes
in the posting list. I assume this is only a problem when a key is
common enough that its posting list is quite large (how large?).

This patch makes a couple changes to improve concurrency:

1. Vacuum leaves first, without removing any pages. Instead, just keep
track of pages which are completely empty (more generally, subtrees
that contain empty pages).

2. Free the empty pages in those subtrees. That requires blocking
reads and writes to that subtree, but not the entire posting list.
Blocking writes is easy: acquiring a cleanup lock on the root of any
subtree always blocks any writes to that subtree, because the write
keeps a pin on all pages in that path. Blocking reads is accomplished
by locking the page to be deleted, then locking/unlocking the page one
left of that (which may be outside the subtree?).


Maybe a basic question, but why is the posting list a btree at all,
and not, say a doubly-linked list?

Review of the code itself:

* Nice and simple, with a net line count of only +45.
* Remove commented-out code.
* In the README, "leaf-to-right" should be left-to-right; and "takinf"
should be "taking".
* When you say the leftmost page is never deleted; do you mean the
leftmost of the entire posting tree, or leftmost of the subtree that
you are removing pages from?
* In order to keep out concurrent reads, you need to lock/unlock the
left page while holding exclusive lock on the page being deleted, but
I didn't see how that happens exactly in the code. Where does that
happen?

Regards,    Jeff Davis



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
Hi, Jeff!

Thanks for review!

Here's a patch corrected according to your suggestions.

2017-01-19 11:48 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:
> https://www.postgresql.org/message-id/CAJEAwVE4UAmm8fr%2BNW8XTnKV6M--ACoNhL3ES8yoKL2sKhbaiw%40mail.gmail.com
>
> Let me re-summarize what's been done here to make sure I understand:
>
> Each key in GIN has its own posting tree, which is itself a btree,
> holding all of the tuples that contain that key. This posting tree is
> really just a set of tuples, and searches always scan an entire
> posting tree (because all the tuples contain the key, so all are a
> potential match).
>
> Currently, the cleanup process requires blocking all reads and writes
> in the posting list. I assume this is only a problem when a key is
> common enough that its posting list is quite large (how large?).
The posting tree shall be at least 3 levels high to make this patch
useful. I guess it's at least 10000 postings of a single term
(assuming average hundred of tuples per page).

> This patch makes a couple changes to improve concurrency:
>
> 1. Vacuum leaves first, without removing any pages. Instead, just keep
> track of pages which are completely empty (more generally, subtrees
> that contain empty pages).
>
> 2. Free the empty pages in those subtrees. That requires blocking
> reads and writes to that subtree, but not the entire posting list.
> Blocking writes is easy: acquiring a cleanup lock on the root of any
> subtree always blocks any writes to that subtree, because the write
> keeps a pin on all pages in that path. Blocking reads is accomplished
> by locking the page to be deleted, then locking/unlocking the page one
> left of that (which may be outside the subtree?).
No, leftmost page within tree. It may be referenced by rightlink from
outside the subtree, that's the reason we ceep it alive even if it's
empy.

> Maybe a basic question, but why is the posting list a btree at all,
> and not, say a doubly-linked list?
When GIN executes searches usually it have to fetch documents which
contains intersection of posting lists. It is faster to intersect
B-trees, if common elements are rare.

>
> Review of the code itself:
>
> * Nice and simple, with a net line count of only +45.
> * Remove commented-out code.
I've removed the code. Though it's purpose was to accent changed
tricky concurrency places
> * In the README, "leaf-to-right" should be left-to-right; and "takinf"
> should be "taking".
Fixed
> * When you say the leftmost page is never deleted; do you mean the
> leftmost of the entire posting tree, or leftmost of the subtree that
> you are removing pages from?
Leftmost of a subtree, containing totally empty subtree. It's not
neccesarily leftmost page of whole posting tree.
> * In order to keep out concurrent reads, you need to lock/unlock the
> left page while holding exclusive lock on the page being deleted, but
> I didn't see how that happens exactly in the code. Where does that
> happen?
in function ginDeletePage()
LockBuffer(lBuffer, GIN_EXCLUSIVE);
We have to mark this page dirty, since it's rightlink is changed. We
cannot mark it dirty without locking it, even if we surely know that
no any reader can reach it out (due to cleanup lock at the root of
cleaned subtree).

Thank you for your suggestions, do not hesitate to ask any questions,
concurrency and GIN both are very interesting topics.

Best regards, Andrey Borodin.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
Hello Jeff!

>Review of the code itself:
How do you think, is there anything else to improve in that patch or
we can mark it as 'Ready for committer'?

I've checked the concurrency algorithm with original work of Lehman
and Yao on B-link-tree. For me everything looks fine (safe, deadlock
free and not increased probability of livelock due to
LockBufferForCleanup call).

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Jeff Davis
Date:
On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin <borodin@octonica.com> wrote:
> Hello Jeff!
>
>>Review of the code itself:
> How do you think, is there anything else to improve in that patch or
> we can mark it as 'Ready for committer'?
>
> I've checked the concurrency algorithm with original work of Lehman
> and Yao on B-link-tree. For me everything looks fine (safe, deadlock
> free and not increased probability of livelock due to
> LockBufferForCleanup call).

Hi,

I looked at the patch some more.

I have some reservations about the complexity and novelty of the
approach. I don't think I've seen an approach quite like this one
before. It seems like the main reason you are using this approach is
because the btree structure was too simple (no left links); is that
right? My gut is telling me we should either leave it simple, or use a
real concurrent btree implementation.

One idea I had that might be simpler is to use a two-stage page
delete. The first stage would remove the link from the parent and mark
the page deleted, but leave the right link intact and prevent
recycling. The second stage would follow the chain of right links
along each level, removing the right links to deleted pages and
freeing the page to be recycled.

Another idea is to partition the posting tree so that the root page
actually points to K posting trees. Scans would need to merge them,
but it would allow inserts to progress in one tree while the other is
being vacuumed. I think this would give good concurrency even for K=2.
I just had this idea now, so I didn't think it through very well.

What do you think?

Regards,    Jeff Davis



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
Hi!

2017-01-23 11:32 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:
> I have some reservations about the complexity and novelty of the
> approach. I don't think I've seen an approach quite like this one
> before. It seems like the main reason you are using this approach is
> because the btree structure was too simple (no left links); is that
> right? My gut is telling me we should either leave it simple, or use a
> real concurrent btree implementation.

GIN B-tree is departed from backend\access\nbtree almost like nbtree
is departed from L&Y algorithm. As far as I understand there is no
"high key" concept as in nbtree. I'm not sure that it's possible to
implement same level of cuncurrency as in nbtree without that.
Also GIN B-tree is actually two different B-trees, large portions of
code is shared between Entries tree and Postings tree. I'm not sure
going fully concurrent vacuum worth it, there will be a lot of
changes. And then there is compression...installing high keys into GIN
may be a mess, with pg_upgrade.

Proposed patch, on a contrary, simplifies code. There is more code
deleted than added. It does not affect WAL, changes nothing in page
layout.

> One idea I had that might be simpler is to use a two-stage page
> delete. The first stage would remove the link from the parent and mark
> the page deleted, but leave the right link intact and prevent
> recycling. The second stage would follow the chain of right links
> along each level, removing the right links to deleted pages and
> freeing the page to be recycled.

As far as I recall, this is resemplant to Lenin and Shasha algorithm.
I'll look into it, but I think it relies on concept of "high key" on a
page somehow (high key is the key from a sibling page, not local
rightmost key as GIN B-tree uses).

> Another idea is to partition the posting tree so that the root page
> actually points to K posting trees. Scans would need to merge them,
> but it would allow inserts to progress in one tree while the other is
> being vacuumed. I think this would give good concurrency even for K=2.
> I just had this idea now, so I didn't think it through very well.

This is efficient from a point of view of frequent vacuums.
concurrency of same key inserts can benefit from this approach. But
all aother operations are more complicated and less efficient.
GIN already has Fast Update list to tackle problems in this manner.
But I do not feel I have resources to try to implement it...


Thank you for your considerations. I'll think about concurrency and
"high keys" more, but I'm realy afraid of amount of changes which may
be prerequisites.

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Jeff Davis
Date:
On Mon, Jan 23, 2017 at 1:55 AM, Andrew Borodin <borodin@octonica.com> wrote:
> Proposed patch, on a contrary, simplifies code. There is more code
> deleted than added. It does not affect WAL, changes nothing in page
> layout.

I appreciate that it is fewer lines of code. My hesitation comes from
a couple areas:

1. I haven't seen the approach before of finding the parent of any
empty subtree. While that sounds like a reasonable thing to do, it
worries me to invent new approaches in an area as well-studied as
btrees. The problem is less that it's too complex, and more that it's
different. Future developers looking at this code will expect it to be
either very simple or very standard, because it's just a posting list.

2. It relies on searches to only start from the very leftmost page
(correct?). If someone wants to optimize the intersection of two
posting trees later, they might break it if they don't understand the
assumptions in this patch.

3. This adds a heap allocation, and copies the contents of the page,
and then releases the pin and lock. GinStepRight is careful to avoid
doing that (it locks the target before releasing the page containing
the pointer). I don't see a problem in your patch, but again, we are
breaking an assumption that future developers might make.

Your patch solves a real problem (a 90-second stall is clearly not
good) and I don't want to dismiss that. But I'd like to consider some
alternatives that may not have these downsides.

Regards,    Jeff Davis



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
Hello!

I see your point on alternatives. I will do my best to generate more
ideas on how vacuum can be done withing existing index structure, this
will take a day or two.
In this message I'll answer some questions.

2017-01-23 22:45 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:
> 1. I haven't seen the approach before of finding the parent of any
> empty subtree. While that sounds like a reasonable thing to do, it
> worries me to invent new approaches in an area as well-studied as
> btrees. The problem is less that it's too complex, and more that it's
> different. Future developers looking at this code will expect it to be
> either very simple or very standard, because it's just a posting list.

Technically, approach of locking a subtree is not novel. Lehman and
Yao focused on "that any process for manipulating the tree uses only a
small (constant) number of locks at any time." We are locking unknown
and possibly large amount of pages.

> 2. It relies on searches to only start from the very leftmost page
> (correct?). If someone wants to optimize the intersection of two
> posting trees later, they might break it if they don't understand the
> assumptions in this patch.

Searches start from root, to find a correct position.
But cuncurrency of the patch does not depend on that.
Patch relies on the idea, that at any given subtree, if:
1. There is no inserts within subtree (guaranteed by cleanup lock)
2. Every page was locked with Exclusive lock at least once, locks were
taken from left to right, without lock coupling.
than: there is no read searches within the subtree.

> 3. This adds a heap allocation, and copies the contents of the page,
> and then releases the pin and lock. GinStepRight is careful to avoid
> doing that (it locks the target before releasing the page containing
> the pointer). I don't see a problem in your patch, but again, we are
> breaking an assumption that future developers might make.

Yes, heap allocation worries me a lot too, but let me explain details.

GinStepRight - is the place where lock coupling is done. That is, you
lock rightlink before releasing already locked page.
Generally, B-tree tends to lock just one page at a time, but this
place is an important exception (not the only exception, but others
behave similarly).

Proposed patch is doing tree traversal at worst two times.

First: scan trough the tree, lock internal pages for read, lock leafs
for write, delete dead tuples. If empty subtrees are found, invoke for
them second stage.
Here children Block Numbers are stored in palloc`d array, to do this
scan with minimum of locks. This is unacceptable if there may be
concurrent VACUUM, which can delete a page when we are asleep for some
reason. (My point of worring number 1)

Second: this stage recives subtree containing a lot of pages which
must be just deleted. Here we take cleanup lock on subtree root,
ensuring no inserts are within (they hold pins on stack from root to
leaf). Than we take exclusive locks on all pages from left to right,
without lock coupling. When we find empty page, we lock left page too,
thus doing lock coupling form right to left. If search might be wihtin
a subtree, it could deadlock with us. (My point of worring number 2)

But both points seems to have reasonable explanation why it cannot happen.

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Jeff Davis
Date:
On Tue, Jan 24, 2017 at 3:18 AM, Andrew Borodin <borodin@octonica.com> wrote:
> Technically, approach of locking a subtree is not novel. Lehman and
> Yao focused on "that any process for manipulating the tree uses only a
> small (constant) number of locks at any time." We are locking unknown
> and possibly large amount of pages.

By the way, can you show me where the Lehman and Yao paper addresses
page recycling?

It says that one approach is to allow fewer than K entries on a leaf
node; presumably as few as zero. But it doesn't seem to show how to
remove all references to the page and recycle it in a new place in the
tree.

Regards,    Jeff Davis



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
2017-01-24 22:29 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:
> On Tue, Jan 24, 2017 at 3:18 AM, Andrew Borodin <borodin@octonica.com> wrote:
>> Technically, approach of locking a subtree is not novel. Lehman and
>> Yao focused on "that any process for manipulating the tree uses only a
>> small (constant) number of locks at any time." We are locking unknown
>> and possibly large amount of pages.
>
> By the way, can you show me where the Lehman and Yao paper addresses
> page recycling?
>
> It says that one approach is to allow fewer than K entries on a leaf
> node; presumably as few as zero. But it doesn't seem to show how to
> remove all references to the page and recycle it in a new place in the
> tree.
>
> Regards,
>      Jeff Davis

Here J. Hellerstein comments L&Y paper [1] saying that effectively
there is no deletion algorithm provided.
Here [2] is paper referenced from nbtree docs. I'll check if this is
applicable to GIN B-tree.

[1] http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html
[2] http://dl.acm.org/citation.cfm?id=324589



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
2017-01-25 12:09 GMT+05:00 Andrew Borodin <borodin@octonica.com>:
> 2017-01-24 22:29 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:
>> By the way, can you show me where the Lehman and Yao paper addresses
>> page recycling?
>
> Here J. Hellerstein comments L&Y paper [1] saying that effectively
> there is no deletion algorithm provided.
> Here [2] is paper referenced from nbtree docs. I'll check if this is
> applicable to GIN B-tree.
>
> [1] http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html
> [2] http://dl.acm.org/citation.cfm?id=324589

Hi!

I'll summarize here my state of studying concurrent methods of page unlinking.

GIN B-tree does not have "high key". That means, that rightmost key on
a page is maximal for the non-leaf page.
But I do not see anything theoretical in a way of implementation of
Lanin and Shasha`s methods of page merging, with slight modifications.
Their paper does not even mention high key(high fence key in papers by
Goetz Graefe).

But it's not a simple task due to large portions of shared code
between entry tree and posting tree.

Also, I do not see a reason why this method can be practically
superior to proposed patch.

Currently, I do not have resources to implement a proof of concept for
fully concurrent page unlinking to make benchmarking.

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Michael Paquier
Date:
On Mon, Jan 30, 2017 at 4:28 PM, Andrew Borodin <borodin@octonica.com> wrote:
> I'll summarize here my state of studying concurrent methods of page unlinking.
>
> GIN B-tree does not have "high key". That means, that rightmost key on
> a page is maximal for the non-leaf page.
> But I do not see anything theoretical in a way of implementation of
> Lanin and Shasha`s methods of page merging, with slight modifications.
> Their paper does not even mention high key(high fence key in papers by
> Goetz Graefe).
>
> But it's not a simple task due to large portions of shared code
> between entry tree and posting tree.
>
> Also, I do not see a reason why this method can be practically
> superior to proposed patch.
>
> Currently, I do not have resources to implement a proof of concept for
> fully concurrent page unlinking to make benchmarking.

I am marking this patch as returned with feedback.
-- 
Michael



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Vladimir Borodin
Date:

31 янв. 2017 г., в 9:50, Michael Paquier <michael.paquier@gmail.com> написал(а):

On Mon, Jan 30, 2017 at 4:28 PM, Andrew Borodin <borodin@octonica.com> wrote:
I'll summarize here my state of studying concurrent methods of page unlinking.

GIN B-tree does not have "high key". That means, that rightmost key on
a page is maximal for the non-leaf page.
But I do not see anything theoretical in a way of implementation of
Lanin and Shasha`s methods of page merging, with slight modifications.
Their paper does not even mention high key(high fence key in papers by
Goetz Graefe).

But it's not a simple task due to large portions of shared code
between entry tree and posting tree.

Also, I do not see a reason why this method can be practically
superior to proposed patch.

Currently, I do not have resources to implement a proof of concept for
fully concurrent page unlinking to make benchmarking.

I am marking this patch as returned with feedback.

Michael, sorry, but why? If I understood everything right, the main question from Jeff was why is it implemented in such way? And Jeff wanted to see other ways of solving the problem. Andrew wrote about them above and it seems that implementing them would be quite expensive and without any obvious win. I would rather expect some reaction from Jeff or may be someone else, so shouldn’t it be marked as «Ready for committer» or at least «Moved to next CF»?

--
Michael


--
May the force be with you…

Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Michael Paquier
Date:
On Tue, Jan 31, 2017 at 4:31 PM, Vladimir Borodin <root@simply.name> wrote:
> 31 янв. 2017 г., в 9:50, Michael Paquier <michael.paquier@gmail.com>
> написал(а):
>
>> I am marking this patch as returned with feedback.
>
> Michael, sorry, but why?

Because I have been through many patches today.

> If I understood everything right, the main question
> from Jeff was why is it implemented in such way? And Jeff wanted to see
> other ways of solving the problem. Andrew wrote about them above and it
> seems that implementing them would be quite expensive and without any
> obvious win. I would rather expect some reaction from Jeff or may be someone
> else, so shouldn’t it be marked as «Ready for committer» or at least «Moved
> to next CF»?

I have moved to to the next CF.
--
Michael



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Jeff Davis
Date:
On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin <borodin@octonica.com> wrote:
> One idea I had that might be simpler is to use a two-stage page
> delete. The first stage would remove the link from the parent and mark
> the page deleted, but leave the right link intact and prevent
> recycling. The second stage would follow the chain of right links
> along each level, removing the right links to deleted pages and
> freeing the page to be recycled.

Do you think this approach is viable as a simplification?

Regards,   Jeff Davis



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
Hi, Jeff!

2017-02-05 3:45 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:
> On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin <borodin@octonica.com> wrote:
>> One idea I had that might be simpler is to use a two-stage page
>> delete. The first stage would remove the link from the parent and mark
>> the page deleted, but leave the right link intact and prevent
>> recycling. The second stage would follow the chain of right links
>> along each level, removing the right links to deleted pages and
>> freeing the page to be recycled.
>
> Do you think this approach is viable as a simplification?

To consider this approach I need to ask several questions.

1. Are we discussing simplification of existing GIN vacuum? Or
proposed GIN vacuum? Please, note that they do not differ in the way
page is unlinked, function ginDeletePage() is almost untoched.

2. What do you mean by "stage"? In terms of the paper "A symmetric
concurrent B-tree algorithm" by Lanin&Shasha: stage is an
uninterrupted period of holding lock on nonempty page set.
Here's the picture https://www.screencast.com/t/xUpGKgkkU from L&S
Both paper (L&Y and L&S) tend to avoid lock coupling, which is
inevitable if you want to do parent unlink first. To prevent insertion
of records on a page, you have to mark it. If you are doing this in
the stage when you unlink from parent - you have to own both locks.

3. What do you want to simplify? Currently we unlink a page from
parent and left sibling in one stage, at cost of aquiring CleanUp lock
(the way we aquire it - is the diference between current and patched
version).
2-stage algorithms will not be simplier, yet it will be more concurrent.
Please note, that during absence of high fence keys in GIN B-tree we
actually should implement algorithm from figure 3A
https://www.screencast.com/t/2cfGZtrzaz0z  (It would be incorrect, but
only with presence of high keys)

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
David Steele
Date:
On 2/5/17 11:04 AM, Andrew Borodin wrote:
> Hi, Jeff!
> 
> 2017-02-05 3:45 GMT+05:00 Jeff Davis <pgsql@j-davis.com>:
>> On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>>> On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin <borodin@octonica.com> wrote:
>>> One idea I had that might be simpler is to use a two-stage page
>>> delete. The first stage would remove the link from the parent and mark
>>> the page deleted, but leave the right link intact and prevent
>>> recycling. The second stage would follow the chain of right links
>>> along each level, removing the right links to deleted pages and
>>> freeing the page to be recycled.
>>
>> Do you think this approach is viable as a simplification?
> 
> To consider this approach I need to ask several questions.
> 
> 1. Are we discussing simplification of existing GIN vacuum? Or
> proposed GIN vacuum? Please, note that they do not differ in the way
> page is unlinked, function ginDeletePage() is almost untoched.
> 
> 2. What do you mean by "stage"? In terms of the paper "A symmetric
> concurrent B-tree algorithm" by Lanin&Shasha: stage is an
> uninterrupted period of holding lock on nonempty page set.
> Here's the picture https://www.screencast.com/t/xUpGKgkkU from L&S
> Both paper (L&Y and L&S) tend to avoid lock coupling, which is
> inevitable if you want to do parent unlink first. To prevent insertion
> of records on a page, you have to mark it. If you are doing this in
> the stage when you unlink from parent - you have to own both locks.
> 
> 3. What do you want to simplify? Currently we unlink a page from
> parent and left sibling in one stage, at cost of aquiring CleanUp lock
> (the way we aquire it - is the diference between current and patched
> version).
> 2-stage algorithms will not be simplier, yet it will be more concurrent.
> Please note, that during absence of high fence keys in GIN B-tree we
> actually should implement algorithm from figure 3A
> https://www.screencast.com/t/2cfGZtrzaz0z  (It would be incorrect, but
> only with presence of high keys)

This patch applies cleanly and compiles at cccbdde.

Jeff, any thoughts on Andrew's responses?

-- 
-David
david@pgmasters.net



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
2017-03-16 21:27 GMT+05:00 David Steele <david@pgmasters.net>:
> This patch applies cleanly and compiles at cccbdde.
>
> Jeff, any thoughts on Andrew's responses?

Hi, David!

I've got some updates on the matter of this patch, since the
understanding of the B-tree bothered me much.
Currently, I'm at PgConf.Russia, where I've contacted Theodor Sigaev,
and he answered my questions about the GIN.
0. I think that proposed patch is safe (deadlock free, does not
introduce new livelocks, all the resources guarded properly)
1. There _are_ high keys at the posting trees, they are just called
rightmost keys, but in fact they are high keys in terms of L&Y
algorithm.
2.  Thus, L&S fully concurrent vacuum is possible, indeed, and
furthermore Theodor suggested that I should implement not only page
eviction, but also page merge and tree condence algorithm.
3. Eventually, I'll do that, certainly, but, currently, I can't
predict the time it'll take. I think I'll start somewhere in the
summer, may be right after GiST intrapage indexing.

As for now, I think that having this patch in PostgreSQL 10 is viable.

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Peter Geoghegan
Date:
On Thu, Mar 16, 2017 at 11:53 AM, Andrew Borodin <borodin@octonica.com> wrote:
> 2.  Thus, L&S fully concurrent vacuum is possible, indeed, and
> furthermore Theodor suggested that I should implement not only page
> eviction, but also page merge and tree condence algorithm.

I think that it's very hard to make merging of pages that are not
completely empty work, while also using the L&Y algorithm.


-- 
Peter Geoghegan



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
2017-03-16 23:55 GMT+05:00 Peter Geoghegan <pg@bowt.ie>:
> On Thu, Mar 16, 2017 at 11:53 AM, Andrew Borodin <borodin@octonica.com> wrote:
>> 2.  Thus, L&S fully concurrent vacuum is possible, indeed, and
>> furthermore Theodor suggested that I should implement not only page
>> eviction, but also page merge and tree condence algorithm.
>
> I think that it's very hard to make merging of pages that are not
> completely empty work, while also using the L&Y algorithm.

That's true. This is a distant plan...

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Teodor Sigaev
Date:
> Thank you for your suggestions, do not hesitate to ask any questions,
> concurrency and GIN both are very interesting topics.

I had a look on patch and found some issue.
Look at ginvacuum.c around line 387, function ginVacuumPostingTreeLeaves():

        /*         * All subtree is empty - just return TRUE to indicate that parent must         * do a cleanup.
Unlesswe are ROOT an there is way to go upper.         */
 
        if(isChildHasVoid && !isAnyNonempy && !isRoot)            return TRUE;
        if(isChildHasVoid)        {    ...    ginScanToDelete(gvs, blkno, TRUE, &root, InvalidOffsetNumber);}

In first 'if' clause I see !isRoot, so second if and corresponding 
ginScanToDelete() could be called only for root in posting tree. If so, it seems 
to me, it wasn't a good idea to move ginScanToDelete() from
ginVacuumPostingTree() and patch should just remove lock for cleanup over 
ginVacuumPostingTreeLeaves() and if it returns that tree contains empty page 
then lock the whole posting tree to do ginScanToDelete() work.



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
Hi, Teodor!

2017-03-21 20:32 GMT+05:00 Teodor Sigaev <teodor@sigaev.ru>:
> I had a look on patch

That's great, thanks!

>
>         /*
>          * All subtree is empty - just return TRUE to indicate that parent
> must
>          * do a cleanup. Unless we are ROOT an there is way to go upper.
>          */
>
>         if(isChildHasVoid && !isAnyNonempy && !isRoot)
>             return TRUE;
>
>         if(isChildHasVoid)
>         {
>                 ...
>                 ginScanToDelete(gvs, blkno, TRUE, &root,
> InvalidOffsetNumber);
>         }
>
> In first 'if' clause I see !isRoot, so second if and corresponding
> ginScanToDelete() could be called only for root in posting tree.

No, second conditional code will be called for any subtree, which
contains totally empty subtree. That check !isRoot covers case when
the entire posting tree should be erased: we cannot just quit out of
recursive cleanup, we have to make a scan here, starting from root.

Probably, variable isChildHasVoid has a bit confusing name. This flag
indicates that some subtree:
1. Had empty pages
2. Did not bother deleting them, because there is a chance that it is
a part of a bigger empty subtree.
May be it'd be better to call the variable "someChildIsVoidSubtree".

>just remove lock for cleanup over ginVacuumPostingTreeLeaves() and if it returns that tree contains empty page then
lockthe whole posting tree to do ginScanToDelete() work.
 

It is, indeed, viable approach. But currently proposed patch is
capable of dealing with small page deletes without much of locking
fuss, even in 4-5 level trees.

How do you think, which way should we take?

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Teodor Sigaev
Date:
> No, second conditional code will be called for any subtree, which
> contains totally empty subtree. That check !isRoot covers case when
> the entire posting tree should be erased: we cannot just quit out of
> recursive cleanup, we have to make a scan here, starting from root.
Oh, I see

> Probably, variable isChildHasVoid has a bit confusing name. This flag
> indicates that some subtree:
> 1. Had empty pages
> 2. Did not bother deleting them, because there is a chance that it is
> a part of a bigger empty subtree.
> May be it'd be better to call the variable "someChildIsVoidSubtree".

hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')

And if the whole posting tree is empty,then we could mark root page as leaf and 
remove all other pages in tree without any locking. Although, it could be a task 
for separate patch.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Andrew Borodin
Date:
2017-03-22 22:48 GMT+05:00 Teodor Sigaev <teodor@sigaev.ru>:
> hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')

Yes, I think this naming is good. It's clear what's in common in these
flags and what's different.

> And if the whole posting tree is empty,then we could mark root page as leaf
> and remove all other pages in tree without any locking. Although, it could
> be a task for separate patch.

From the performance point of view, this is a very good idea. Both,
performance of VACUUM and performance of Scans. But doing so we risk
to leave some garbage pages in case of a crash. And I do not see how
to avoid these without unlinking pages one by one. I agree, that
leaving this trick for a separate patch is quite reasonable.

Best regards, Andrey Borodin.



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

From
Teodor Sigaev
Date:
Thank you, pushed

Andrew Borodin wrote:
> 2017-03-22 22:48 GMT+05:00 Teodor Sigaev <teodor@sigaev.ru>:
>> hasEmptyChild? and hasNonEmptyChild (BTW, isAnyNonempy has missed 't')
>
> Yes, I think this naming is good. It's clear what's in common in these
> flags and what's different.
>
>> And if the whole posting tree is empty,then we could mark root page as leaf
>> and remove all other pages in tree without any locking. Although, it could
>> be a task for separate patch.
>
> From the performance point of view, this is a very good idea. Both,
> performance of VACUUM and performance of Scans. But doing so we risk
> to leave some garbage pages in case of a crash. And I do not see how
> to avoid these without unlinking pages one by one. I agree, that
> leaving this trick for a separate patch is quite reasonable.
>
> Best regards, Andrey Borodin.
>
>

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/