Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Avoid full GIN index scan when possible |
Date | |
Msg-id | CAOBaU_YgU=WmUN22uGHEjUTymqEpM_aZi=ETUZd2nhEZabBc4Q@mail.gmail.com Whole thread Raw |
In response to | Re: Avoid full GIN index scan when possible (Nikita Glukhov <n.gluhov@postgrespro.ru>) |
Responses |
Re: Avoid full GIN index scan when possible
|
List | pgsql-hackers |
On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov <n.gluhov@postgrespro.ru> wrote:> > On 29.06.2019 1:23, Julien Rouhaud wrote: > > But that kinda resembles stuff we already have - selectivity/cost. So > why shouldn't this be considered as part of costing? > > Yeah, I'm not entirely convinced that we need anything new here. > The cost estimate function can detect such situations, and so can > the index AM at scan start --- for example, btree checks for > contradictory quals at scan start. There's a certain amount of > duplicative effort involved there perhaps, but you also have to > keep in mind that we don't know the values of run-time-determined > comparison values until scan start. So if you want certainty rather > than just a cost estimate, you may have to do these sorts of checks > at scan start. > > Ah, I didn't know about _bt_preprocess_keys(). I'm not familiar with > this code, so please bear with me. IIUC the idea would be to add > additional logic in gingetbitmap() / ginNewScanKey() to drop some > quals at runtime. But that would mean that additional logic would > also be required in BitmapHeapScan, or that all the returned bitmap > should be artificially marked as lossy to enforce a recheck? > > We have a similar solution for this problem. The idea is to avoid full index > scan inside GIN itself when we have some GIN entries, and forcibly recheck > all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no > GIN entries. Thanks for looking at it. That's I think a way better approach. > The attached patch in its current shape contain at least two ugly places: > > 1. We still need to initialize empty scan key to call triconsistent(), but > then we have to remove it from the list of scan keys. Simple refactoring > of ginFillScanKey() can be helpful here. > > 2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL > if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL > because we need to skip NULLs in GIN_SEARCH_MODE_ALL. Simplest example here > is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not forced, > and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked. Maybe > it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL. Also + if (searchMode == GIN_SEARCH_MODE_ALL && nQueryValues <= 0) + { + /* + * Don't emit ALL key with no entries, check only whether + * unconditional recheck is needed. + */ + GinScanKey key = &so->keys[--so->nkeys]; + + hasSearchAllMode = true; + so->forcedRecheck = key->triConsistentFn(key) != GIN_TRUE; + } Shouldn't you make sure that the forcedRecheck flag can't reset? > -- patched > EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%'; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------- > Bitmap Heap Scan on test (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1) > Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text)) > Rows Removed by Index Recheck: 2 > Heap Blocks: exact=114 > -> Bitmap Index Scan on test_t_idx (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1) > Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text)) > Planning Time: 0.080 ms > Execution Time: 0.450 ms > (8 rows) One thing that's bothering me is that the explain implies that the LIKE '%i% was part of the index scan, while in reality it wasn't. One of the reason why I tried to modify the qual while generating the path was to have the explain be clearer about what is really done.
pgsql-hackers by date: