Thread: Redundant filter in index scan with a bool column

Redundant filter in index scan with a bool column

From
Alexander Kuzmenkov
Date:
Hi hackers,

Consider this query plan:

create table t (i int, b bool);
create index on t(i, b);
set enable_bitmapscan to off;
explain select * from t where i = 300 and b;

                                QUERY PLAN
-------------------------------------------------------------------------
  Index Only Scan using t_i_b_idx on t  (cost=0.15..24.27 rows=6 width=5)
    Index Cond: ((i = 300) AND (b = true))
    Filter: b


The filter is not needed, why is it there? Turns out we can't recognize 
that the restriction clause 'b' and the index clause 'b = true' are 
equivalent. My first reaction was to patch operator_predicate_proof to 
handle this case, but there is a more straightforward way: mark the 
expanded index clause as potentially redundant when it is generated in 
expand_indexqual_conditions. There is already RestrictInfo.parent_ec 
that is used to mark redundant EC-derived join clauses. The patch 
renames it to rinfo_parent and uses it to mark the expanded index 
clauses. Namely, for both the expanded and the original clause, 
rinfo_parent points to the original clause or its generating EC, if set. 
The name is no good -- the original clause is not a parent of itself, 
after all. I considered something like redundancy_tag, but some places 
actually use the fact that it points to the generating EC, so I don't 
like this name either.

What do you think?

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Redundant filter in index scan with a bool column

From
Tom Lane
Date:
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
> The filter is not needed, why is it there? Turns out we can't recognize 
> that the restriction clause 'b' and the index clause 'b = true' are 
> equivalent.

Yeah, it's intentional that we don't get rid of the extra clause;
it doesn't really seem worth the expense and complexity to do so.
Indexes on bool columns are a pretty niche case in the first place.
Usually, if you are interested in just the rows where b = true,
you're better off using "where b" as an index predicate.  In your
example, we can do this instead:

regression=# create index on t(i) where b;
CREATE INDEX
regression=# explain select * from t where i = 300 and b;
                            QUERY PLAN                            
------------------------------------------------------------------
 Index Scan using t_i_idx on t  (cost=0.12..24.19 rows=6 width=5)
   Index Cond: (i = 300)
(2 rows)

resulting in a much smaller index, if the b=true condition is selective
enough to be worth indexing.  Even in the case you showed, how much is
the redundant filter clause really costing?

> My first reaction was to patch operator_predicate_proof to 
> handle this case, but there is a more straightforward way: mark the 
> expanded index clause as potentially redundant when it is generated in 
> expand_indexqual_conditions. There is already RestrictInfo.parent_ec 
> that is used to mark redundant EC-derived join clauses. The patch 
> renames it to rinfo_parent and uses it to mark the expanded index 
> clauses.

That's an unbelievable hack that almost certainly breaks existing uses.

The approach of teaching predtest.c that "b = true" implies "b" would
work, but it seems a bit brute-force because ordinarily such cases
would never be seen there, thanks to simplify_boolean_equality having
canonicalized the former into the latter.  The problem we have is that
indxpath.c re-generates "b = true" in indexscan conditions.  Thinking
about it now, I wonder if we could postpone that conversion till later,
say do it in create_indexscan_plan after having checked for redundant
clauses.  Not sure how messy that'd be.

            regards, tom lane