Home > mailing lists

Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers

From	Julien Rouhaud
Subject	Re: Avoid full GIN index scan when possible
Date	January 11, 2020 15:21:15
Msg-id	CAOBaU_YJadWvDVO3FQ_hddaTHSreWre6hPSNw=4wY6SskrvLuA@mail.gmail.com Whole thread Raw
In response to	Re: Avoid full GIN index scan when possible (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses	Re: Avoid full GIN index scan when possible
List	pgsql-hackers

Tree view

Hi,

On Sat, Jan 11, 2020 at 1:10 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>
> On Fri, Jan 10, 2020 at 7:36 PM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> >
> > On Fri, Jan 10, 2020 at 6:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>
> >> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> >> > So, I think v10 is a version of patch, which can be committed after
> >> > some cleanup.  And we can try doing better nulls handling in a separate
> >> > patch.
> >>
> >> The cfbot reports that this doesn't pass regression testing.
> >> I haven't looked into why not.
> >
> >
> > Thank you for noticing.  I'll take care of it.
>
>
> In the v10 I've fixed a bug with nulls handling, but it appears that
> test contained wrong expected result.  I've modified this test so that
> it directly compares sequential scan results with bitmap indexscan
> results.

Thanks a lot for working on that!  I'm not very familiar with gin
internals so additional eyes are definitely needed here but I agree
that this approach is simpler and cleaner.  I didn't find any problem
with the modified logic, the patch applies cleanly, compiles without
warning and all regression tests pass, so it all seems good to me.

Here are a few comments:

- In keyGetItem(), it seems that some comments would need to be
updated wrt. the new excludeOnly flag.  I'm thinking of:

     * Ok, we now know that there are no matches < minItem.
     *
     * If minItem is lossy, it means that there were no exact items on the
     * page among requiredEntries, because lossy pointers sort after exact
     * items. However, there might be exact items for the same page among
     * additionalEntries, so we mustn't advance past them.

and

        /*
         * Normally, none of the items in additionalEntries can have a curItem
         * larger than minItem. But if minItem is a lossy page, then there
         * might be exact items on the same page among additionalEntries.
         */        if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
        {
            Assert(ItemPointerIsLossyPage(&minItem) || key->nrequired == 0);
            minItem = entry->curItem;
        }

While at it, IIUC only excludeOnly key can have nrequired == 0 (if
that's the case, this could be explicitly said in startScanKey
relevant comment), so it'd be more consistent to also use excludeOnly
rather than nrequired in this assert?

- the pg_trgm regression tests check for the number of rows returned
with the new "excludeOnly" permutations, but only with an indexscan,
should we make sure that the same results are returned with a seq
scan?

pgsql-hackers by date:

From: Dent John
Date: 11 January 2020, 15:04:05
Subject: Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from aREFCURSOR

From: Tomas Vondra
Date: 11 January 2020, 15:25:01
Subject: Re: Amcheck: do rightlink verification with lock coupling

Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers

Previous

Next