Re: Index Skip Scan (new UniqueKeys) - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Index Skip Scan (new UniqueKeys)
Date
Msg-id 20201024164553.tel5uynf2dzyh4az@localhost
Whole thread Raw
In response to Re: Index Skip Scan (new UniqueKeys)  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Index Skip Scan (new UniqueKeys)
Re: Index Skip Scan (new UniqueKeys)
List pgsql-hackers
> On Tue, Oct 06, 2020 at 05:20:39PM +0200, Dmitry Dolgov wrote:
> > On Mon, Sep 21, 2020 at 05:59:32PM -0700, Peter Geoghegan wrote:
> >
> > * I see the following compiler warning:
> >
> > /code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:
> > In function ‘populate_baserel_uniquekeys’:
> > /code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:797:13:
> > warning: ‘expr’ may be used uninitialized in this function
> > [-Wmaybe-uninitialized]
> >   797 |   else if (!list_member(unique_index->rel->reltarget->exprs, expr))
> >       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> This is mostly for UniqueKeys patch, which is attached here only as a
> dependency, but I'll prepare changes for that. Interesting enough I
> can't reproduce this warning, but if I understand correctly gcc has some
> history of spurious uninitialized warnings, so I guess it could be
> version dependent.
>
> > * Perhaps the warning is related to this nearby code that I noticed
> > Valgrind complains about:
> >
> > ==1083468== VALGRINDERROR-BEGIN
> > ==1083468== Invalid read of size 4
> > ==1083468==    at 0x59568A: get_exprs_from_uniqueindex (uniquekeys.c:771)
> > ==1083468==    by 0x593C5B: populate_baserel_uniquekeys (uniquekeys.c:140)
>
> This also belongs to UniqueKeys patch, but at least I can reproduce this
> one. My guess is that nkeycolums should be used there, not ncolums,
> which is visible in index_incuding tests. The same as previous one, will
> prepare corresponding changes.
>
> > * Do we really need the AM-level boolean flag/argument named
> > "scanstart"? Why not just follow the example of btgettuple(), which
> > determines whether or not the scan has been initialized based on the
> > current scan position?
> >
> > Just because you set so->currPos.buf to InvalidBuffer doesn't mean you
> > cannot or should not take the same approach as btgettuple(). And even
> > if you can't take exactly the same approach, I would still think that
> > the scan's opaque B-Tree state should remember if it's the first call
> > to _bt_skip() (rather than some subsequent call) in some other way
> > (e.g. carrying a "scanstart" bool flag directly).
>
> Yes, agree, carrying this flag inside the opaque state would be better.

Here is a new version which doesn't require "scanstart" argument and
contains few other changes to address the issues mentioned earlier. It's
also based on the latest UniqueKeys patches with the valgrind issue
fixed (as before they're attached also just for the references, you can
find more in the original thread). I didn't rework commentaries yet,
will post it soon (need to get an inspiration first, probably via
reading Shakespeare unless someone has better suggestions).

> > * Why is it okay to do anything important based on the
> > _bt_scankey_within_page() return value?
> >
> > If the page is empty, then how can we know that it's okay to go to the
> > next value? I'm concerned that there could be subtle bugs in this
> > area. VACUUM will usually just delete the empty page. But it won't
> > always do so, for a variety of reasons that aren't worth going into
> > now. This could mask bugs in this area. I'm concerned about patterns
> > like this one from _bt_skip():
> >
> >             while (!nextFound)
> >             {
> >                 ....
> >
> >                 if (_bt_scankey_within_page(scan, so->skipScanKey,
> >                                             so->currPos.buf, dir))
> >                 {
> >                     ...
> >                 }
> >                 else
> >                     /*
> >                      * If startItup could be not found within the current page,
> >                      * assume we found something new
> >                      */
> >                     nextFound = true;
> >                 ....
> >             }
> >
> > Why would you assume that "we found something new" here? In general I
> > just don't understand the design of _bt_skip(). I get the basic idea
> > of what you're trying to do, but it could really use better comments.
>
> Yeah, I'll put more efforts into clear comments. There are two different
> ways in which _bt_scankey_within_page is being used.
>
> The first one is to check if it's possible to skip traversal of the tree
> from root in case if what we're looking for could be on the current
> page. In this case an empty page would mean we need to search from the
> root, so not sure what could be the issue here?
>
> The second one (that you've highlighted above) I admit is probably the
> most questionable part of the patch and open for suggestions how to
> improve it. It's required for one particular case with a cursor when
> scan advances forward but reads backward. What could happen here is we
> found one valid item, but the next one e.g. do not pass scan key
> conditions, and we end up with the previous item again. I'm not entirely
> sure how presence of an empty page could change this scenario, could you
> please show an example?
>
> > *The "jump one more time if it's the same as at the beginning" thing
> > seems scary to me. Maybe you should be doing something with the actual
> > high key here.
>
> Same as for the previous question, can you give a hint what do you mean
> by "doing something with the actual high key"?

The question is still there and I would really appreciate clarification
about what exactly scenarios I need to look for with empty pages. I've
tried to perform testing with "attempt_pagedel = false" suggestion, but
didn't find anything suspicious.

Attachment

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Timing of relcache inval at parallel worker init
Next
From: David Rowley
Date:
Subject: Re: Parallel Append can break run-time partition pruning