Re: Index Skip Scan - Mailing list pgsql-hackers
From | Dmitry Dolgov |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | 20200121160942.eu4bvbu46nt223ug@localhost Whole thread Raw |
In response to | Re: Index Skip Scan (Peter Geoghegan <pg@bowt.ie>) |
Responses |
RE: Index Skip Scan
|
List | pgsql-hackers |
> On Mon, Jan 20, 2020 at 01:19:30PM -0800, Peter Geoghegan wrote: Thanks for the commentaries. I'm trying to clarify your conclusions for myself, so couple of questions. > > > - nbtsearch.c in general > > > Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to neverrelease the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1]I've been trying to come up with cases in which this may break the patch, but I haven't able to produce such a scenario- so it may be fine. > > Try making _bt_findinsertloc() call _bt_vacuum_one_page() whenever the > page is P_HAS_GARBAGE(), regardless of whether or not the page is > about to split. That will still be correct, while having a much better > chance of breaking the patch during stress-testing. > > Relying on a buffer pin to prevent the B-Tree structure itself from > changing in any important way seems likely to be broken already. Even > if it isn't, it sounds fragile. Except for checking low/high key (which should be done with a lock), I believe the current implementation follows the same pattern I see quite often, namely * get a lock on a page of interest and test it's values (if we can find next distinct value right on the next one without goind down the tree). * if not, unlock the current page, search within the tree with _bt_search (which locks a resuling new page) and examine values on a new page, when necessary do _bt_steppage Is there an obvious problem with this approach, when it comes to the page structure modification? > A leaf page doesn't really have anything called a low key. It usually > has a current first "data item"/non-pivot tuple, which is an > inherently unstable thing. Would this inherent instability be resolved for this particular case by having a lock on a page while checking a first data item, or there is something else I need to take into account? > > There is a BT_READ lock in place when finding the correct leaf page, or > > searching within the leaf page itself. _bt_vacuum_one_page deletes only > > LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do > > you have some feedback for this ? > > It sounds like the design of the patch relies on doing something other > than stopping a scan "between" pages, in the sense that is outlined in > the commit message of commit 09cb5c0e. If so, then that's a serious > flaw in its design. Could you please elaborate why does it sound like that? If I understand correctly, to stop a scan only "between" pages one need to use only _bt_readpage/_bt_steppage? Other than that there is no magic with scan position in the patch, so I'm not sure if I'm missing something here.
pgsql-hackers by date: