Re: GiST VACUUM - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: GiST VACUUM
Date
Msg-id B15F859E-D858-4DC4-800C-69B2CB3094BF@yandex-team.ru
Whole thread Raw
In response to Re: GiST VACUUM  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: GiST VACUUM  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
> 14 июля 2018 г., в 0:28, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
>
> On 13/07/18 21:28, Andrey Borodin wrote:
>>> 13 июля 2018 г., в 18:25, Heikki Linnakangas <hlinnaka@iki.fi>
>>> написал(а):
>>> Looking at the second patch, to scan the GiST index in physical
>>> order, that seems totally unsafe, if there are any concurrent page
>>> splits. In the logical scan, pushStackIfSplited() deals with that,
>>> by comparing the page's NSN with the parent's LSN. But I don't see
>>> anything like that in the physical scan code.
>> Leaf page can be pointed by internal page and rightlink
>> simultaneously. The purpose of NSN is to visit this page exactly once
>> by following only on of two links in a scan. This is achieved
>> naturally if we read everything from the beginning to the end. (That
>> is how I understand, I can be wrong)
>
> The scenario where this fails goes like this:
>
> 1. Vacuum scans physical pages 1-10
> 2. A concurrent insertion splits page 15. The new left half stays on page 15, but the new right half goes to page 5
> 3. Vacuum scans pages 11-20
>
> Now, if there were any dead tuples on the right half of the split, moved to page 5, the vacuum would miss them.
>
> The way this is handled in B-tree is that when a page is split, the page is stamped with current "vacuum cycle id".
Whenthe vacuum scan encounters a page with the current cycle id, whose right-link points to a lower-numbered page, it
immediatelyfollows the right link, and re-scans it. I.e. in the above example, if it was a B-tree, in step 3 when
vacuumscans page 15, it would see that it was concurrently split. It would immediately vacuum page 5 again, before
continuingthe scan in physical order. 
>
> We'll need to do something similar in GiST.

OK, I will do this.

This is tradeoff between complex concurrency feature and possibility of few dead tuples left after VACUUM. I want to
understand:is it something dangerous in this dead tuples? 
There is one more serious race condition: result of first scan is just a hint where to look for downlinks to empty
pages.If internal page splits between scan and cleanup, offsets of downlinks will be changed, cleanup will lock pages,
seenon-empty pages and will not delete them (though there are not dead tuples, just not deleted empty leafs). 


Best regards, Andrey Borodin.

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Fix some error handling for read() and errno
Next
From: John Naylor
Date:
Subject: Re: missing toast table for pg_policy