Re: GiST VACUUM - Mailing list pgsql-hackers

From Andrey Borodin
Subject Re: GiST VACUUM
Date
Msg-id 5742CFFD-3A65-4980-9091-D74BE7C6D9E3@yandex-team.ru
Whole thread Raw
In response to Re: GiST VACUUM  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: GiST VACUUM
List pgsql-hackers
> 22 марта 2019 г., в 1:04, Heikki Linnakangas <hlinnaka@iki.fi> написал(а):
> ...
> When I started testing this, I quickly noticed that empty pages were not being deleted nearly as much as I expected.
Itracked it to this check in gistdeletepage: 
>
>> +       if (GistFollowRight(leafPage)
>> +               || GistPageGetNSN(parentPage) > GistPageGetNSN(leafPage))
>> +       {
>> +               /* Don't mess with a concurrent page split. */
>> +               return false;
>> +       }
>
> That NSN test was bogus. It prevented the leaf page from being reused, if the parent page was *ever* split after the
leafpage was created. I don't see any reason to check the NSN here. 
That's true. This check had sense only when parent page was not locked (and seems like comparison should be opposite).
Whenboth pages are locked, this test as no sense at all. Check was incorrectly "fixed" by me when transitioning from
single-scandelete to two-scan delete during summer 2018. Just wandering how hard is it to simply delete a page... 

>> Though, I'm not sure it is important for GIN. Scariest thing that can
>> happen: it will return same tid twice. But it is doing bitmap scan,
>> you cannot return same bit twice...
>
> Hmm. Could it return a completely unrelated tuple?
No, I do not think so, it will do comparisons on posting tree tuples.

> We don't always recheck the original index quals in a bitmap index scan, IIRC. Also, a search might get confused if
it'sdescending down a posting tree, and lands on a different kind of a page, altogether. 
Yes, search will return an error, user will reissue query and everything will be almost fine.

> PS. for Gist, we could almost use the LSN / NSN mechanism to detect the case that a deleted page is reused: Add a new
fieldto the GiST page header, to store a new "deleteNSN" field. When a page is deleted, the deleted page's deleteNSN is
setto the LSN of the deletion record. When the page is reused, the deleteNSN field is kept unchanged. When you follow a
downlinkduring search, if you see that the page's deleteNSN > parent's LSN, you know that it was concurrently deleted
andrecycled, and should be ignored. That would allow reusing deleted pages immediately. Unfortunately that would
requireadding a new field to the gist page header/footer, which requires upgrade work :-(. Maybe one day, we'll bite
thebullet. Something to keep in mind, if we have to change the page format anyway, for some reason. 
Yeah, the same day we will get rid of invalid tuples. I can make a patch for v13. Actually, I have a lot of patches
thatI want in GiST in v13. Or v14. 

Best regards, Andrey Borodin.

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: speeding up planning with partitions
Next
From: Alexander Korotkov
Date:
Subject: Re: Connections hang indefinitely while taking a gin index's LWLockbuffer_content lock