Thread: Strange logic for partial index proving

Strange logic for partial index proving

From
Simon Riggs
Date:
Sweating over the logic of the theorem prover, I notice it doesn't
actually bother to complete an accurate test. I can't see that it
produces an error, but I thought I would raise it, if only to share my
annoyance at the realisation of how it does things. :-(

drop table tenk;
create table tenk (col1 int);

insert into tenk select generate_series(1,10000);

create index idx1 on tenk (col1) where col1 > 1 and col1 < 10;

explain select * from tenk where col1 > 5 and col1 < -5;                            QUERY PLAN
--------------------------------------------------------------------Bitmap Heap Scan on tenk  (cost=2.05..49.87 rows=50
width=4) Recheck Cond: ((col1 > 5) AND (col1 < -5))  ->  Bitmap Index Scan on idx1  (cost=0.00..2.05 rows=50 width=0)
    Index Cond: ((col1 > 5) AND (col1 < -5))
 
(4 rows)

...thus it uses an index which does *not* match the query clause to test
the impossible condition and thus returns the correct answer of zero.
Seems fairly quick also :-)

AFAICS this is just a feature of the theorem prover and it never returns
an incorrect answer. Anybody think differently?

Best Regards, Simon Riggs




Re: Strange logic for partial index proving

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Sweating over the logic of the theorem prover, I notice it doesn't
> actually bother to complete an accurate test.

Sure it does.

> create index idx1 on tenk (col1) where col1 > 1 and col1 < 10;

> explain select * from tenk where col1 > 5 and col1 < -5;
> [ uses that index ]

This is a perfectly legitimate situation.  "col1 > 5" implies "col1 > 1"
and "col1 < -5" implies "col1 < 10", therefore the query WHERE condition
implies the index predicate, therefore the index contains all tuples
that could pass the WHERE condition, therefore the index is usable.

Kindly do not break this.
        regards, tom lane


Re: Strange logic for partial index proving

From
Simon Riggs
Date:
On Tue, 2005-06-21 at 16:29 -0400, Tom Lane wrote:
> > create index idx1 on tenk (col1) where col1 > 1 and col1 < 10;
> 
> > explain select * from tenk where col1 > 5 and col1 < -5;
> > [ uses that index ]
> 
> This is a perfectly legitimate situation.  

Like I said, its correct. I didn't suggest changing it.

> "col1 > 5" implies "col1 > 1"
> and "col1 < -5" implies "col1 < 10", therefore the query WHERE condition
> implies the index predicate, therefore the index contains all tuples
> that could pass the WHERE condition, therefore the index is usable.

..."all tuples that pass the WHERE condition", like none.

Guess I'm not Mr Logic.

Best Regards, Simon Riggs





Re: Strange logic for partial index proving

From
"Jim C. Nasby"
Date:
On Tue, Jun 21, 2005 at 10:33:45PM +0100, Simon Riggs wrote:
> On Tue, 2005-06-21 at 16:29 -0400, Tom Lane wrote:
> > > create index idx1 on tenk (col1) where col1 > 1 and col1 < 10;
> > 
> > > explain select * from tenk where col1 > 5 and col1 < -5;
> > > [ uses that index ]
> > 
> > This is a perfectly legitimate situation.  
> 
> Like I said, its correct. I didn't suggest changing it.
> 
> > "col1 > 5" implies "col1 > 1"
> > and "col1 < -5" implies "col1 < 10", therefore the query WHERE condition
> > implies the index predicate, therefore the index contains all tuples
> > that could pass the WHERE condition, therefore the index is usable.
> 
> ..."all tuples that pass the WHERE condition", like none.
> 
> Guess I'm not Mr Logic.

Has anyone looked at how hard it would be to identify impossible
conditions as part of planning the query? In this case, you obviously
can't get any results, so there's no point in even planning anything. Of
course this is a somewhat nonsensical example, but I suspect that there
are cases where QBE or other front-ends will generate queries that
contain some impossible conditions that can be eliminated.
-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Strange logic for partial index proving

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> Has anyone looked at how hard it would be to identify impossible
> conditions as part of planning the query?

The question in my mind is not so much how hard it would be as how many
cycles we would waste trying to prove things that won't be true for
99.999% of queries.  There is always a tradeoff involved when you add
more processing to the planner, and in this case I can't believe that it
would be a win.

Simon is looking at a different and much more constrained case (WHERE
clause provably inconsistent with check constraints of individual tables
in an inheritance hierarchy), and so the risk of wasted processing
doesn't loom so large.

Note also that when the contradictory constraints are on a column of a
btree index, the amount you save by recognizing the condition in the
planner isn't all that great, since the btree index code discovers it
during plan startup anyway.
        regards, tom lane


Re: Strange logic for partial index proving

From
laser
Date:
This thread make me to think about the question:
could this "feature" be used in select count(*) type
query that force it to use index?

I make a little test, but found a strange phenoment,
created a simple table:

create table partial_idx_t(id serial, f1 integer, f2 text);

then insert many rows into it. then build a partial index
on it:

create index partial_idx on partial_idx_t (id) where id >=1 and id <=
2000000000;

my purpose is to see if I could use partial index while doing count(*),
it seems the index being used after a 'set enable_seqscan=off':

laser=# explain analyze select count(*) from partial_idx_t where id >=1
and id <=2000000000;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=13933.39..13933.39 rows=1 width=0) (actual
time=1901.761..1901.762 rows=1 loops=1)
-> Index Scan using partial_idx on partial_idx_t (cost=0.00..12622.93
rows=524183 width=0) (actual time=0.130..1230.634 rows=524288 loops=1)
Index Cond: ((id >= 1) AND (id <= 2000000000))
Total runtime: 1901.876 ms

but it seems a count(*) without WHERE condition is still faster:

laser=# explain analyze select count(*) from partial_idx_t;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=100009638.60..100009638.60 rows=1 width=0) (actual
time=1567.317..1567.318 rows=1 loops=1)
-> Seq Scan on partial_idx_t (cost=100000000.00..100008327.88
rows=524288 width=0) (actual time=0.046..906.747 rows=524288 loops=1)
Total runtime: 1567.401 ms

but the cost field of the explain result that used partial index is really
lower. but the runtime been much more high (I'm with default planner
setting).

How can I understand above? (BTW, all test done after a vacuum full
analyze).

regards laser


Re: Strange logic for partial index proving

From
Richard Huxton
Date:
laser wrote:
> This thread make me to think about the question:
> could this "feature" be used in select count(*) type
> query that force it to use index?

No. Because of issues with concurrent updates to the table. See archives
for discussion.

> I make a little test, but found a strange phenoment,
> created a simple table:
> 
> create table partial_idx_t(id serial, f1 integer, f2 text);
> 
> then insert many rows into it. then build a partial index
> on it:
> 
> create index partial_idx on partial_idx_t (id) where id >=1 and id <=
> 2000000000;
> 
> my purpose is to see if I could use partial index while doing count(*),
> it seems the index being used after a 'set enable_seqscan=off':
> 
> laser=# explain analyze select count(*) from partial_idx_t where id >=1
> and id <=2000000000;
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=13933.39..13933.39 rows=1 width=0) (actual
> time=1901.761..1901.762 rows=1 loops=1)
> -> Index Scan using partial_idx on partial_idx_t (cost=0.00..12622.93
> rows=524183 width=0) (actual time=0.130..1230.634 rows=524288 loops=1)
> Index Cond: ((id >= 1) AND (id <= 2000000000))
> Total runtime: 1901.876 ms
> 
> but it seems a count(*) without WHERE condition is still faster:
> 
> laser=# explain analyze select count(*) from partial_idx_t;
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=100009638.60..100009638.60 rows=1 width=0) (actual
> time=1567.317..1567.318 rows=1 loops=1)
> -> Seq Scan on partial_idx_t (cost=100000000.00..100008327.88
> rows=524288 width=0) (actual time=0.046..906.747 rows=524288 loops=1)
> Total runtime: 1567.401 ms

OK - the time is less for the seq-scan because with an index it has to
go to the index, find a row, go to the table and check that row is
visible. With the seq-scan it just works through the table.

> but the cost field of the explain result that used partial index is really
> lower. but the runtime been much more high (I'm with default planner
> setting).

I'd say your configuration settings are a long way from accurate.
Probably the general/performance lists would be a better place to
discuss this though.

-- Richard Huxton Archonet Ltd


Re: Strange logic for partial index proving

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> laser wrote:
>> Aggregate (cost=13933.39..13933.39 rows=1 width=0) (actual
>> time=1901.761..1901.762 rows=1 loops=1)
>> -> Index Scan using partial_idx on partial_idx_t (cost=0.00..12622.93
>> rows=524183 width=0) (actual time=0.130..1230.634 rows=524288 loops=1)
>> Index Cond: ((id >= 1) AND (id <= 2000000000))
>> Total runtime: 1901.876 ms

>> Aggregate (cost=100009638.60..100009638.60 rows=1 width=0) (actual
>> time=1567.317..1567.318 rows=1 loops=1)
>> -> Seq Scan on partial_idx_t (cost=100000000.00..100008327.88
>> rows=524288 width=0) (actual time=0.046..906.747 rows=524288 loops=1)
>> Total runtime: 1567.401 ms

> I'd say your configuration settings are a long way from accurate.

Actually, I'd say these estimates are pretty good.  Ignoring the
100000000 penalty from "set enable_seqscan off", we have:

Planner cost ratio: 13933.39 / 9638.60 = 1.45
Actual cost ratio: 1901.876 / 1567.401 = 1.21

Given all the inaccuracies in the planning process, getting as close as
that is about the best you can hope for.
        regards, tom lane


Re: Strange logic for partial index proving

From
Richard Huxton
Date:
Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
> 
>>laser wrote:
>>>Aggregate (cost=100009638.60..100009638.60 rows=1 width=0) (actual
>>>time=1567.317..1567.318 rows=1 loops=1)
>>>-> Seq Scan on partial_idx_t (cost=100000000.00..100008327.88
>>>rows=524288 width=0) (actual time=0.046..906.747 rows=524288 loops=1)
>>>Total runtime: 1567.401 ms
> 
> 
>>I'd say your configuration settings are a long way from accurate.
> 
> 
> Actually, I'd say these estimates are pretty good.  Ignoring the
> 100000000 penalty from "set enable_seqscan off", we have:

<sigh> of course the 100 million is from seqscan=off, that's what he was 
demonstrating. Sometimes I wonder why I get out of bed in the mornings :-/

--   Richard Huxton  Archonet Ltd


Re: Strange logic for partial index proving

From
Bruno Wolff III
Date:
On Thu, Jun 23, 2005 at 16:13:24 +0800, laser <laser@toping.com.cn> wrote:
> 
> This thread make me to think about the question:
> could this "feature" be used in select count(*) type
> query that force it to use index?

count(*) can already be helped by indexes, but probably not the way you think.
The count isn't saved anywhere, so each row needs to be checked to make sure
it is visible to the current transaction and that it satisfies any WHERE
conditions. The latter can sometimes be sped up using index scans (typically
if the matchin grows are less than 5% of the table or somewhat more if
the rows are clustered according to the index). If you are counting all
of the visible records in a table a sequential scan is going to be
the fastest plan.