Re: PATCH: index-only scans with partial indexes - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: PATCH: index-only scans with partial indexes
Date
Msg-id 20150917.194553.172396487.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: PATCH: index-only scans with partial indexes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: PATCH: index-only scans with partial indexes
List pgsql-hackers
Hi, sorry in advance for hardly readable long descriptions..

At Mon, 14 Sep 2015 13:27:47 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<55F6AF33.8000206@2ndquadrant.com>
> I don't think this is particularly related to the patch, because some
> of the anomalies can be observed even on master. For example, let's
> force the index scans to be non-IOS by requesting another column from
> the heap:
...
> I claim that either both or none of the indexes should use "Index
> Cond".

Perhaps I understood your point and I think I understood this.

Planner dicides whether to use the partial index far before
creating specific paths, when measuring baserel sizes. If some
partial index is implied by the restriction list,
check_partial_indexes() marks the index as predOk, which means
that the index is usable for the baserel scan even if the
restriction clauses doesn't match the index columns.

Then create_index_paths allows to generating paths for partial
indexes with predOK. Index clauses of the created paths for the
partial indexes don't contain clauses doesn't match the index
columns. It is empty for your first example and it is the same
with restrictinfo for the second.

Finally, create_indexscan_plan strips baserestriction caluses
implied by index predicate in addtion to index conditions. So
both of your example has no filter conditions.

The following example would show you the result of the above
steps.

explain select c from t where b between 300000 + 1 and 600000 and c = 3;                           QUERY PLAN
                
 
------------------------------------------------------------------Index Scan using idx1 on t  (cost=0.42..11665.77
rows=1width=4)  Filter: ((b >= 300001) AND (c = 3))
 

The index predicate (b >= 300000 AND b <= 600000) implies the
restrction (b >= 300001 AND b <= 600000) so idx1 is allowed to be
used, then conversely the restriction b >= 300001 is implied by
the index predicate b >= 300000 so it is not shown as Index Cond
and the other two are not, so they are shown and executed as
Filter.

But regardless of whether stripped as implied conditions or not
at planning phase, the cost for all clauses that don't match the
index columns are added when creating index paths. That will be
one of the cause of cost error.

> This is exactly the same reason that lead to the strange behavior
> after applying the patch, but in this case the heap access actually
> introduces some additional cost so the issue is not that obvious.

So you're right on the point that check_index_only is doing
wrong. It should properly ignore restrictions implied by index
predicates as your patch is doing. But cost_index doesn't know
that some nonindex-conditions of baserestrictinfo is finally
useless, and it is assuming that nonindex conditions are always
applied on heap tuples.


After all, what should be done to properly ignore useless
conditions would be,
1. Make create_index_paths() to make the list of restrict   clauses which are implied by the index predicate of the
index  in focus. The list would be stored as a member in   IndexOptInfo. Then create index clauses excluding the listed
 clauses and call get_index_paths using them. Modify   check_index_only() to ignore the listed clauses when pulling
varattnos.This is similar but different a bit to what I said   in the previous mail.
 
2. Pass the listed clauses to generated IndexPath.
3. Modify extract_nonindex_conditions to be called with the   exclusion list and the cluases are exluded from the
resultof   the function.
 
4. Make create_indexscan_plan to extract qpqual excluding the   exclusion list.

The same result could be obtained by more smarter way but the
steps above will archive right decision on whether to do index
only scan on partial index and correct cost estimate propery
ignoring unused restrictions.

Does this make sense for you?

> But in reality the costs should be pretty much exactly the same - the
> indexes have exactly the same size, statistics, selectivity etc.
> 
> Also, the plan difference suggests this is not merely a costing issue,
> because while with idx1 (first plan) it was correctly detected we
> don't need to evaluate the condition on the partial index, on idx2
> that's not true and we'll waste time doing that. So we probably can't
> just tweak the costing a bit - this probably needs to be addressed
> when actually building the index path.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [patch] Proposal for \rotate in psql
Next
From: Robert Haas
Date:
Subject: Re: Parallel Seq Scan