Re: Avoid full GIN index scan when possible - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Avoid full GIN index scan when possible
Date
Msg-id 20190629131135.ilktnqtollvbfnml@development
Whole thread Raw
In response to Re: Avoid full GIN index scan when possible  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Avoid full GIN index scan when possible
List pgsql-hackers
On Sat, Jun 29, 2019 at 02:50:51PM +0200, Julien Rouhaud wrote:
>On Sat, Jun 29, 2019 at 12:25 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Sat, Jun 29, 2019 at 11:10:03AM +0200, Julien Rouhaud wrote:
>> >On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
>> >> -- patched
>> >> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
>> >>                                                       QUERY PLAN
>> >>
-----------------------------------------------------------------------------------------------------------------------
>> >>  Bitmap Heap Scan on test  (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
>> >>    Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>> >>    Rows Removed by Index Recheck: 2
>> >>    Heap Blocks: exact=114
>> >>    ->  Bitmap Index Scan on test_t_idx  (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302
loops=1)
>> >>          Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
>> >>  Planning Time: 0.080 ms
>> >>  Execution Time: 0.450 ms
>> >> (8 rows)
>> >
>> >One thing that's bothering me is that the explain implies that the
>> >LIKE '%i% was part of the index scan, while in reality it wasn't.  One
>> >of the reason why I tried to modify the qual while generating the path
>> >was to have the explain be clearer about what is really done.
>>
>> Yeah, I think that's a bit annoying - it'd be nice to make it clear
>> which quals were actually used to scan the index. It some cases it may
>> not be possible (e.g. in cases when the decision is done at runtime, not
>> while planning the query), but it'd be nice to show it when possible.
>
>Maybe we could somehow add some runtime information about ignored
>quals, similar to the "never executed" information for loops?
>

Maybe. I suppose it depends on when exactly we make the decision about
which quals to ignore.

>> A related issue is that during costing is too late to modify cardinality
>> estimates, so the 'Bitmap Index Scan' will be expected to return fewer
>> rows than it actually returns (after ignoring the full-scan quals).
>> Ignoring redundant quals (the way btree does it at execution) does not
>> have such consequence, of course.
>>
>> Which may be an issue, because we essentially want to modify the list of
>> quals to minimize the cost of
>>
>>    bitmap index scan + recheck during bitmap heap scan
>>
>> OTOH it's not a huge issue, because it won't affect the rest of the plan
>> (because that uses the bitmap heap scan estimates, and those are not
>> affected by this).
>
>Doesn't this problem already exists, as the quals that we could drop
>can't actually reduce the node's results?

How could it not reduce the node's results, if you ignore some quals
that are not redundant? My understanding is we have a plan like this:

    Bitmap Heap Scan
      -> Bitmap Index Scan

and by ignoring some quals at the index scan level, we trade the (high)
cost of evaluating the qual there for a plain recheck at the bitmap heap
scan. But it means the index scan may produce more rows, so it's only a
win if the "extra rechecks" are cheaper than the (removed) full scan.

So the full scan might actually reduce the number of rows from the index
scan, but clearly whatever we do the results from the bitmap heap scan
must remain the same.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Choosing values for multivariate MCV lists
Next
From: Julien Rouhaud
Date:
Subject: Re: Avoid full GIN index scan when possible