Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers
From | Nikita Glukhov |
---|---|
Subject | Re: Avoid full GIN index scan when possible |
Date | |
Msg-id | ae1ecc00-2de6-ea77-1236-f754292ba828@postgrespro.ru Whole thread Raw |
In response to | Re: Avoid full GIN index scan when possible (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
Re: Avoid full GIN index scan when possible
Re: Avoid full GIN index scan when possible |
List | pgsql-hackers |
Hi!
On 29.06.2019 1:23, Julien Rouhaud wrote:
But that kinda resembles stuff we already have - selectivity/cost. So why shouldn't this be considered as part of costing?Yeah, I'm not entirely convinced that we need anything new here. The cost estimate function can detect such situations, and so can the index AM at scan start --- for example, btree checks for contradictory quals at scan start. There's a certain amount of duplicative effort involved there perhaps, but you also have to keep in mind that we don't know the values of run-time-determined comparison values until scan start. So if you want certainty rather than just a cost estimate, you may have to do these sorts of checks at scan start.Ah, I didn't know about _bt_preprocess_keys(). I'm not familiar with this code, so please bear with me. IIUC the idea would be to add additional logic in gingetbitmap() / ginNewScanKey() to drop some quals at runtime. But that would mean that additional logic would also be required in BitmapHeapScan, or that all the returned bitmap should be artificially marked as lossy to enforce a recheck?
We have a similar solution for this problem. The idea is to avoid full index scan inside GIN itself when we have some GIN entries, and forcibly recheck all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no GIN entries. The attached patch in its current shape contain at least two ugly places: 1. We still need to initialize empty scan key to call triconsistent(), but then we have to remove it from the list of scan keys. Simple refactoring of ginFillScanKey() can be helpful here. 2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL because we need to skip NULLs in GIN_SEARCH_MODE_ALL. Simplest example here is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not forced, and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked. Maybe it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL. Example: CREATE TABLE test AS SELECT i::text AS t FROM generate_series(0, 999999) i; CREATE INDEX ON test USING gin (t gin_trgm_ops); -- master EXPLAIN ANALYZE SELECT * FROM test WHERE LIKE '%1234%' AND t LIKE '%1%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------Bitmap Heap Scan on test (cost=11777.99..16421.73 rows=7999 width=32) (actual time=65.431..65.857 rows=300 loops=1) Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text)) Rows Removed by Index Recheck: 2 Heap Blocks: exact=114 -> Bitmap Index Scan on test_t_idx (cost=0.00..11775.99 rows=7999 width=0) (actual time=65.380..65.380 rows=302 loops=1) Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))Planning Time: 0.151 msExecution Time: 65.900 ms (8 rows) -- patched EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%'; QUERY PLAN -----------------------------------------------------------------------------------------------------------------------Bitmap Heap Scan on test (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1) Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text)) Rows Removed by Index Recheck: 2 Heap Blocks: exact=114 -> Bitmap Index Scan on test_t_idx (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1) Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))Planning Time: 0.080 msExecution Time: 0.450 ms (8 rows) -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
pgsql-hackers by date: