Re: Questions/Observations related to Gist vacuum - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: Questions/Observations related to Gist vacuum
Date:
Msg-id: CAA4eK1JHR7riKw_mbc7O7ad5pS12WhnLJAiwPER51KFvxGjvgw@mail.gmail.com
In response to: Re: Questions/Observations related to Gist vacuum (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses: Re: Questions/Observations related to Gist vacuum
List: pgsql-hackers
On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 15/10/2019 09:37, Amit Kapila wrote:
> > 2. Right now, in gistbulkdelete we make a note of empty leaf pages and
> > internal pages and then in the second pass during gistvacuumcleanup,
> > we delete all the empty leaf pages. I was thinking why, unlike nbtree,
> > we have delayed the deletion of empty pages till gistvacuumcleanup. I
> > don't see any problem if we do this during gistbulkdelete itself,
> > similar to nbtree; also, I think there is some advantage in marking the
> > pages as deleted as early as possible. Basically, if the vacuum
> > operation is canceled or errors out between gistbulkdelete and
> > gistvacuumcleanup, then I think the deleted pages could be marked as
> > recyclable very early in the next vacuum operation. The other advantage
> > of doing this during gistbulkdelete is that we can avoid sharing
> > information between gistbulkdelete and gistvacuumcleanup, which is
> > quite helpful for a parallel vacuum as the information is not trivial
> > (it is internally stored as an in-memory btree). OTOH, there might be
> > some advantage in delaying the deletion of pages, especially in the
> > case of multiple scans during a single VACUUM command. We can
> > probably delete all empty leaf pages in one go, which could in some
> > cases lead to fewer internal page reads. However, I am not sure it
> > is really advantageous to postpone the deletion, as there seem to be
> > some downsides to it as well. I don't see it documented why, unlike
> > nbtree, we delay the deletion of empty pages till gistvacuumcleanup,
> > but I might be missing something.
>
> Hmm. The thinking is/was that removing the empty pages is somewhat
> expensive, because it has to scan all the internal nodes to find the
> downlinks to the to-be-deleted pages. Furthermore, it needs to scan all
> the internal pages (or at least until it has found all the downlinks),
> regardless of how many empty pages are being deleted. So it makes sense
> to do it only once, for all the empty pages. You're right though, that
> there would be advantages, too, in doing it after each pass.
>

I was thinking more about this, and it seems there could be more cases
where delaying the delete-marking of pages further delays their
recycling. It is quite possible that immediately after the bulk delete
the value of nextFullXid (used as deleteXid) is X, whereas during
vacuum cleanup it can be X + N, and the chances of N being large are
higher when there are multiple passes of the vacuum scan. Now, if we
had set the value of deleteXid to X, the next vacuum would have a
better chance of recycling the page. I am not sure, but in the future
we might come up with something (say, if we can recompute
RecentGlobalXmin) that lets us recycle the pages deleted in the first
index scan during the next scan of the same index within a single
vacuum operation. This is just to emphasize that delete-marking pages
in the same pass has advantages; otherwise, I understand that there
are advantages in delaying it as well. (See the small sketch of the
recycling test below.)
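To make the recycling condition concrete, here is a minimal sketch; the
comparison mirrors the FullTransactionId test used for deleted Gist
pages, but the function and parameter names are illustrative, not the
actual tree's code:

#include "postgres.h"
#include "access/transam.h"	/* FullTransactionId, FullTransactionIdPrecedes */

/*
 * Illustrative sketch: a deleted page becomes recyclable once its
 * deleteXid precedes the oldest transaction that might still follow
 * a downlink to it.  If the page is stamped in gistbulkdelete with
 * deleteXid = X, this test starts succeeding N transactions earlier
 * than if stamping is postponed to gistvacuumcleanup, where the
 * stamp becomes X + N.
 */
static bool
deleted_page_recyclable(FullTransactionId deleteXid,
						FullTransactionId oldest_running_fxid)
{
	return FullTransactionIdPrecedes(deleteXid, oldest_running_fxid);
}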
> All things considered, I'm not sure which is better.
>

Yeah, this is a tough call to make, but if we can allow the pages to be
deleted conditionally in bulkdelete for parallel vacuum workers, that
would be better. I think we have the below options w.r.t. Gist indexes
for parallel vacuum:

a. Don't allow Gist indexes to participate in parallel vacuum.
b. Allow the deletion of leaf pages in bulkdelete when it is run by a
   parallel worker.
c. Always allow deleting leaf pages in bulkdelete.
d. Invent some mechanism to share all the Gist stats via shared memory.

(a) is not a very good option, but it is a safe one, as we can extend
it in the future, and we might decide to go with it, especially if we
can't decide among the other options. (b) would serve the need but
would add some additional checks in gistbulkdelete and would look more
like a hack. (c) would be best, but I think it will be difficult to be
sure that it is a good decision for all types of cases. (d) could
require a lot of effort, and I am not sure it is worth adding that
complexity to the proposed patch. Do you have any thoughts on this?

Just to give you an idea of the current parallel vacuum patch: the
master backend scans the heap and forms the dead tuple array in DSM,
and then we launch one worker for each index, based on the availability
of workers, and share the dead tuple array with each worker. Each
worker performs bulkdelete for its index. In the end, we perform the
cleanup of all the indexes, either via a worker or the master backend,
based on some conditions. (A rough sketch of this flow follows below.)
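For reference, a rough pseudo-C sketch of that flow. All helper names
below are hypothetical placeholders, not the patch's actual symbols,
except index_bulk_delete() and index_vacuum_cleanup(), which are the
existing indexam entry points that dispatch to the index AM's
ambulkdelete/amvacuumcleanup (gistbulkdelete/gistvacuumcleanup for a
Gist index):

#include "postgres.h"
#include "access/genam.h"	/* IndexVacuumInfo, index_bulk_delete, ... */

/* Hypothetical placeholders for the patch's worker/DSM machinery. */
extern bool try_launch_worker_for_index(int idx, void *dead_tuples_in_dsm);
extern void wait_for_all_workers(void);
extern bool dead_tuple_reaped(ItemPointer itemptr, void *state);

static void
parallel_vacuum_indexes_sketch(IndexVacuumInfo **infos, int nindexes,
							   void *dead_tuples_in_dsm)
{
	int			i;

	for (i = 0; i < nindexes; i++)
	{
		/* Hand the index to a worker if one is available ... */
		if (try_launch_worker_for_index(i, dead_tuples_in_dsm))
			continue;

		/* ... otherwise the master backend bulk-deletes it itself,
		 * using the shared dead tuple array from DSM. */
		index_bulk_delete(infos[i], NULL,
						  dead_tuple_reaped, dead_tuples_in_dsm);
	}

	wait_for_all_workers();

	/* In the end, cleanup runs for each index, in a worker or in the
	 * master backend, based on some conditions (master shown here). */
	for (i = 0; i < nindexes; i++)
		index_vacuum_cleanup(infos[i], NULL);
}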
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com