Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Avoid full GIN index scan when possible
Date
Msg-id CAOBaU_YgU=WmUN22uGHEjUTymqEpM_aZi=ETUZd2nhEZabBc4Q@mail.gmail.com
Whole thread Raw
In response to Re: Avoid full GIN index scan when possible  (Nikita Glukhov <n.gluhov@postgrespro.ru>)
Responses Re: Avoid full GIN index scan when possible
List pgsql-hackers
On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
<n.gluhov@postgrespro.ru> wrote:>
> On 29.06.2019 1:23, Julien Rouhaud wrote:
>
> But that kinda resembles stuff we already have - selectivity/cost. So
> why shouldn't this be considered as part of costing?
>
> Yeah, I'm not entirely convinced that we need anything new here.
> The cost estimate function can detect such situations, and so can
> the index AM at scan start --- for example, btree checks for
> contradictory quals at scan start.  There's a certain amount of
> duplicative effort involved there perhaps, but you also have to
> keep in mind that we don't know the values of run-time-determined
> comparison values until scan start.  So if you want certainty rather
> than just a cost estimate, you may have to do these sorts of checks
> at scan start.
>
> Ah, I didn't know about _bt_preprocess_keys().  I'm not familiar with
> this code, so please bear with me.  IIUC the idea would be to add
> additional logic in gingetbitmap() / ginNewScanKey() to drop some
> quals at runtime.  But that would mean that additional logic would
> also be required in BitmapHeapScan, or that all the returned bitmap
> should be artificially marked as lossy to enforce a recheck?
>
> We have a similar solution for this problem.  The idea is to avoid full index
> scan inside GIN itself when we have some GIN entries, and forcibly recheck
> all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no
> GIN entries.

Thanks for looking at it.  That's I think a way better approach.

> The attached patch in its current shape contain at least two ugly places:
>
> 1. We still need to initialize empty scan key to call triconsistent(), but
>    then we have to remove it from the list of scan keys.  Simple refactoring
>    of ginFillScanKey() can be helpful here.
>
> 2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL
>    if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL
>    because we need to skip NULLs in GIN_SEARCH_MODE_ALL.  Simplest example here
>    is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not forced,
>    and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked.  Maybe
>    it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL.

Also

+       if (searchMode == GIN_SEARCH_MODE_ALL && nQueryValues <= 0)
+       {
+           /*
+            * Don't emit ALL key with no entries, check only whether
+            * unconditional recheck is needed.
+            */
+           GinScanKey  key = &so->keys[--so->nkeys];
+
+           hasSearchAllMode = true;
+           so->forcedRecheck = key->triConsistentFn(key) != GIN_TRUE;
+       }

Shouldn't you make sure that  the forcedRecheck flag can't reset?

> -- patched
> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
>                                                       QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on test  (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
>    Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>    Rows Removed by Index Recheck: 2
>    Heap Blocks: exact=114
>    ->  Bitmap Index Scan on test_t_idx  (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302
loops=1)
>          Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>  Planning Time: 0.080 ms
>  Execution Time: 0.450 ms
> (8 rows)

One thing that's bothering me is that the explain implies that the
LIKE '%i% was part of the index scan, while in reality it wasn't.  One
of the reason why I tried to modify the qual while generating the path
was to have the explain be clearer about what is really done.



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Next
From: Magnus Hagander
Date:
Subject: Re: Commitfest 2019-07, the first of five* for PostgreSQL 13