Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Avoid full GIN index scan when possible
Date
Msg-id CAOBaU_YJadWvDVO3FQ_hddaTHSreWre6hPSNw=4wY6SskrvLuA@mail.gmail.com
Whole thread Raw
In response to Re: Avoid full GIN index scan when possible  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses Re: Avoid full GIN index scan when possible
List pgsql-hackers
Hi,

On Sat, Jan 11, 2020 at 1:10 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>
> On Fri, Jan 10, 2020 at 7:36 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> >
> > On Fri, Jan 10, 2020 at 6:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>
> >> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> >> > So, I think v10 is a version of patch, which can be committed after
> >> > some cleanup.  And we can try doing better nulls handling in a separate
> >> > patch.
> >>
> >> The cfbot reports that this doesn't pass regression testing.
> >> I haven't looked into why not.
> >
> >
> > Thank you for noticing.  I'll take care of it.
>
>
> In the v10 I've fixed a bug with nulls handling, but it appears that
> test contained wrong expected result.  I've modified this test so that
> it directly compares sequential scan results with bitmap indexscan
> results.

Thanks a lot for working on that!  I'm not very familiar with gin
internals so additional eyes are definitely needed here but I agree
that this approach is simpler and cleaner.  I didn't find any problem
with the modified logic, the patch applies cleanly, compiles without
warning and all regression tests pass, so it all seems good to me.

Here are a few comments:

- In keyGetItem(), it seems that some comments would need to be
updated wrt. the new excludeOnly flag.  I'm thinking of:

     * Ok, we now know that there are no matches < minItem.
     *
     * If minItem is lossy, it means that there were no exact items on the
     * page among requiredEntries, because lossy pointers sort after exact
     * items. However, there might be exact items for the same page among
     * additionalEntries, so we mustn't advance past them.

and

        /*
         * Normally, none of the items in additionalEntries can have a curItem
         * larger than minItem. But if minItem is a lossy page, then there
         * might be exact items on the same page among additionalEntries.
         */        if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
        {
            Assert(ItemPointerIsLossyPage(&minItem) || key->nrequired == 0);
            minItem = entry->curItem;
        }

While at it, IIUC only excludeOnly key can have nrequired == 0 (if
that's the case, this could be explicitly said in startScanKey
relevant comment), so it'd be more consistent to also use excludeOnly
rather than nrequired in this assert?

- the pg_trgm regression tests check for the number of rows returned
with the new "excludeOnly" permutations, but only with an indexscan,
should we make sure that the same results are returned with a seq
scan?



pgsql-hackers by date:

Previous
From: Dent John
Date:
Subject: Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from aREFCURSOR
Next
From: Tomas Vondra
Date:
Subject: Re: Amcheck: do rightlink verification with lock coupling