GiST VACUUM - Mailing list pgsql-hackers

From Andrey Borodin
Subject GiST VACUUM
Date
Msg-id 1B9FAC6F-FA19-4A24-8C1B-F4F574844892@yandex-team.ru
Whole thread Raw
Responses Re: GiST VACUUM  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
Hi, hackers!

Here's new version of GiST VACUUM patch set aimed at CF 2018-09.
Original thread can be found at [0]

==Purpose==
Currently GiST never reuse pages even if they are completely empty. This can lead to a bloat, if a lot of index tuples
aredeleted from key space, which do not receive newly inserted tuples. 
First patch fixes that issue: empty pages are collected and reused for new page splits.

Second patch improves scanning algorithm to read GiST blocks in physical order. This can dramatically increase speed of
scanning,especially on HDD. 
This scan is using relatively big chunk of memory to build map of whole GiST graph. If there is not enough maintenance
memory,patch had the fallback to old GiST VACUUM (from first patch). 

==How logical scan works==
GiST VACUUM scans graph in DFS search to find removed tuples on leaf pages. It remembers internal pages that are
referencingcompletely empty leaf pages. 
Then in additional step, these pages are rescanned to delete references and mark leaf pages as free.

==How physical scan works==
Scan builds array of GistPSItem (Physical scan item). GistPSItem contains List of offsets pointing to potentially empty
leafpages and the information necessary to collect that list in one sequential file read. 
Array of GistPSItems has one item for each GiST block.
When we encounter leaf page, if it is empty - we mark it empty and jump to parent (if known) to update it's list.
When we encounter internal page, we check GistPSItem of every child block to check if it is empty leaf and to mark
parentptr there. 

==Limitations==
At least one reference on each internal pages is left undeleted to preserve balancing of the tree.
Pages that has FOLLOW-RIGHT flag also are not deleted, even if empty.


Thank you for your attention, any thoughts are welcome.

Best regards, Andrey Borodin.


[0] https://www.postgresql.org/message-id/flat/1147341441925550%40web17j.yandex.ru#1147341441925550@web17j.yandex.ru

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Rewrite of pg_dump TAP tests
Next
From: Andrey Borodin
Date:
Subject: Re: 2018-03 CFM