Re: PATCH: index-only scans with partial indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: index-only scans with partial indexes
Date
Msg-id 55F6AF33.8000206@2ndquadrant.com
Whole thread Raw
In response to Re: PATCH: index-only scans with partial indexes  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: PATCH: index-only scans with partial indexes  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers

On 09/14/2015 12:51 PM, Kyotaro HORIGUCHI wrote:
> I rethinked on this from the first.
>
>> Sorry.
>>
>>> Hi, this looks to be a bug of cost_index(). The attached patch
>>> would fix that.
>>
>> No, that's wrong. please forget the patch. The qual in qpquals
>> should be indexquals which is excluded because it is not
>> necessary to be applied. The right way would be remove the cost
>> for qpqual in cost_index().
>
> Your patch allows index only scan even if a qual contains
> non-index column when the qual can be removed by implication from
> index predicates.
>
> So the 'implied' clauses is not needed ever after. It should be
> excluded from cost estimation and it is not needed on execution
> even if index only scan is found not to be doable finally.
>
> So the implicit quals may be removed on building index paths but
> I think check_index_only is not the place.
>
> Removing implied quals from index quals is not only for index
> *only* scan so the place for removing such quals is in
> build_index_paths, in the loop of step 1. After removing the
> quals there, check_index_only will naturally give disired result.
>
> # I remember that I have tried the same or similar thing. I don't
> # recall what made me give up then.

I don't think this is particularly related to the patch, because some of 
the anomalies can be observed even on master. For example, let's force 
the index scans to be non-IOS by requesting another column from the heap:
  create table t (a int, b int, c int);  insert into t select i,i,i from generate_series(1,1000000) s(i);
  create index idx1 on t (a)   where b between 300000 and 600000;  create index idx2 on t (a,b) where b between 300000
and600000;
 
  analyze t;  vacuum t;

The indexes have exactly the same size (thanks to alignment of 
IndexTuples), and should have exactly the same statistics:

test=# \di+                       List of relations Schema | Name | Type  | Owner | Table |  Size   | Description
--------+------+-------+-------+-------+---------+------------- public | idx1 | index | user  | t     | 6600 kB |
public| idx2 | index | user  | t     | 6600 kB |
 
(2 rows)

Now, let's see the query reading column 'c' (forcing heap fetches)

explain select c from t where b between 300000 and 600000;                              QUERY PLAN
----------------------------------------------------------------------- Index Scan using idx1 on t
(cost=0.42..10945.99rows=300971 width=4)
 
(1 row)

drop index idx1;
set enable_bitmapscan = off;

explain select c from t where b between 300000 and 600000;                              QUERY PLAN
----------------------------------------------------------------------- Index Scan using idx2 on t
(cost=0.42..19688.08rows=300971 width=4)   Index Cond: ((b >= 300000) AND (b <= 600000))
 
(2 rows)

I claim that either both or none of the indexes should use "Index Cond".

This is exactly the same reason that lead to the strange behavior after 
applying the patch, but in this case the heap access actually introduces 
some additional cost so the issue is not that obvious.

But in reality the costs should be pretty much exactly the same - the 
indexes have exactly the same size, statistics, selectivity etc.

Also, the plan difference suggests this is not merely a costing issue, 
because while with idx1 (first plan) it was correctly detected we don't 
need to evaluate the condition on the partial index, on idx2 that's not 
true and we'll waste time doing that. So we probably can't just tweak 
the costing a bit - this probably needs to be addressed when actually 
building the index path.


regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Next
From: Robert Haas
Date:
Subject: Re: row_security GUC, BYPASSRLS