Thread: Strange logic for partial index proving
Sweating over the logic of the theorem prover, I notice it doesn't actually bother to complete an accurate test. I can't see that it produces an error, but I thought I would raise it, if only to share my annoyance at the realisation of how it does things. :-( drop table tenk; create table tenk (col1 int); insert into tenk select generate_series(1,10000); create index idx1 on tenk (col1) where col1 > 1 and col1 < 10; explain select * from tenk where col1 > 5 and col1 < -5; QUERY PLAN --------------------------------------------------------------------Bitmap Heap Scan on tenk (cost=2.05..49.87 rows=50 width=4) Recheck Cond: ((col1 > 5) AND (col1 < -5)) -> Bitmap Index Scan on idx1 (cost=0.00..2.05 rows=50 width=0) Index Cond: ((col1 > 5) AND (col1 < -5)) (4 rows) ...thus it uses an index which does *not* match the query clause to test the impossible condition and thus returns the correct answer of zero. Seems fairly quick also :-) AFAICS this is just a feature of the theorem prover and it never returns an incorrect answer. Anybody think differently? Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > Sweating over the logic of the theorem prover, I notice it doesn't > actually bother to complete an accurate test. Sure it does. > create index idx1 on tenk (col1) where col1 > 1 and col1 < 10; > explain select * from tenk where col1 > 5 and col1 < -5; > [ uses that index ] This is a perfectly legitimate situation. "col1 > 5" implies "col1 > 1" and "col1 < -5" implies "col1 < 10", therefore the query WHERE condition implies the index predicate, therefore the index contains all tuples that could pass the WHERE condition, therefore the index is usable. Kindly do not break this. regards, tom lane
On Tue, 2005-06-21 at 16:29 -0400, Tom Lane wrote: > > create index idx1 on tenk (col1) where col1 > 1 and col1 < 10; > > > explain select * from tenk where col1 > 5 and col1 < -5; > > [ uses that index ] > > This is a perfectly legitimate situation. Like I said, its correct. I didn't suggest changing it. > "col1 > 5" implies "col1 > 1" > and "col1 < -5" implies "col1 < 10", therefore the query WHERE condition > implies the index predicate, therefore the index contains all tuples > that could pass the WHERE condition, therefore the index is usable. ..."all tuples that pass the WHERE condition", like none. Guess I'm not Mr Logic. Best Regards, Simon Riggs
On Tue, Jun 21, 2005 at 10:33:45PM +0100, Simon Riggs wrote: > On Tue, 2005-06-21 at 16:29 -0400, Tom Lane wrote: > > > create index idx1 on tenk (col1) where col1 > 1 and col1 < 10; > > > > > explain select * from tenk where col1 > 5 and col1 < -5; > > > [ uses that index ] > > > > This is a perfectly legitimate situation. > > Like I said, its correct. I didn't suggest changing it. > > > "col1 > 5" implies "col1 > 1" > > and "col1 < -5" implies "col1 < 10", therefore the query WHERE condition > > implies the index predicate, therefore the index contains all tuples > > that could pass the WHERE condition, therefore the index is usable. > > ..."all tuples that pass the WHERE condition", like none. > > Guess I'm not Mr Logic. Has anyone looked at how hard it would be to identify impossible conditions as part of planning the query? In this case, you obviously can't get any results, so there's no point in even planning anything. Of course this is a somewhat nonsensical example, but I suspect that there are cases where QBE or other front-ends will generate queries that contain some impossible conditions that can be eliminated. -- Jim C. Nasby, Database Consultant decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <decibel@decibel.org> writes: > Has anyone looked at how hard it would be to identify impossible > conditions as part of planning the query? The question in my mind is not so much how hard it would be as how many cycles we would waste trying to prove things that won't be true for 99.999% of queries. There is always a tradeoff involved when you add more processing to the planner, and in this case I can't believe that it would be a win. Simon is looking at a different and much more constrained case (WHERE clause provably inconsistent with check constraints of individual tables in an inheritance hierarchy), and so the risk of wasted processing doesn't loom so large. Note also that when the contradictory constraints are on a column of a btree index, the amount you save by recognizing the condition in the planner isn't all that great, since the btree index code discovers it during plan startup anyway. regards, tom lane
This thread make me to think about the question: could this "feature" be used in select count(*) type query that force it to use index? I make a little test, but found a strange phenoment, created a simple table: create table partial_idx_t(id serial, f1 integer, f2 text); then insert many rows into it. then build a partial index on it: create index partial_idx on partial_idx_t (id) where id >=1 and id <= 2000000000; my purpose is to see if I could use partial index while doing count(*), it seems the index being used after a 'set enable_seqscan=off': laser=# explain analyze select count(*) from partial_idx_t where id >=1 and id <=2000000000; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=13933.39..13933.39 rows=1 width=0) (actual time=1901.761..1901.762 rows=1 loops=1) -> Index Scan using partial_idx on partial_idx_t (cost=0.00..12622.93 rows=524183 width=0) (actual time=0.130..1230.634 rows=524288 loops=1) Index Cond: ((id >= 1) AND (id <= 2000000000)) Total runtime: 1901.876 ms but it seems a count(*) without WHERE condition is still faster: laser=# explain analyze select count(*) from partial_idx_t; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=100009638.60..100009638.60 rows=1 width=0) (actual time=1567.317..1567.318 rows=1 loops=1) -> Seq Scan on partial_idx_t (cost=100000000.00..100008327.88 rows=524288 width=0) (actual time=0.046..906.747 rows=524288 loops=1) Total runtime: 1567.401 ms but the cost field of the explain result that used partial index is really lower. but the runtime been much more high (I'm with default planner setting). How can I understand above? (BTW, all test done after a vacuum full analyze). regards laser
laser wrote: > This thread make me to think about the question: > could this "feature" be used in select count(*) type > query that force it to use index? No. Because of issues with concurrent updates to the table. See archives for discussion. > I make a little test, but found a strange phenoment, > created a simple table: > > create table partial_idx_t(id serial, f1 integer, f2 text); > > then insert many rows into it. then build a partial index > on it: > > create index partial_idx on partial_idx_t (id) where id >=1 and id <= > 2000000000; > > my purpose is to see if I could use partial index while doing count(*), > it seems the index being used after a 'set enable_seqscan=off': > > laser=# explain analyze select count(*) from partial_idx_t where id >=1 > and id <=2000000000; > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=13933.39..13933.39 rows=1 width=0) (actual > time=1901.761..1901.762 rows=1 loops=1) > -> Index Scan using partial_idx on partial_idx_t (cost=0.00..12622.93 > rows=524183 width=0) (actual time=0.130..1230.634 rows=524288 loops=1) > Index Cond: ((id >= 1) AND (id <= 2000000000)) > Total runtime: 1901.876 ms > > but it seems a count(*) without WHERE condition is still faster: > > laser=# explain analyze select count(*) from partial_idx_t; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=100009638.60..100009638.60 rows=1 width=0) (actual > time=1567.317..1567.318 rows=1 loops=1) > -> Seq Scan on partial_idx_t (cost=100000000.00..100008327.88 > rows=524288 width=0) (actual time=0.046..906.747 rows=524288 loops=1) > Total runtime: 1567.401 ms OK - the time is less for the seq-scan because with an index it has to go to the index, find a row, go to the table and check that row is visible. With the seq-scan it just works through the table. > but the cost field of the explain result that used partial index is really > lower. but the runtime been much more high (I'm with default planner > setting). I'd say your configuration settings are a long way from accurate. Probably the general/performance lists would be a better place to discuss this though. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > laser wrote: >> Aggregate (cost=13933.39..13933.39 rows=1 width=0) (actual >> time=1901.761..1901.762 rows=1 loops=1) >> -> Index Scan using partial_idx on partial_idx_t (cost=0.00..12622.93 >> rows=524183 width=0) (actual time=0.130..1230.634 rows=524288 loops=1) >> Index Cond: ((id >= 1) AND (id <= 2000000000)) >> Total runtime: 1901.876 ms >> Aggregate (cost=100009638.60..100009638.60 rows=1 width=0) (actual >> time=1567.317..1567.318 rows=1 loops=1) >> -> Seq Scan on partial_idx_t (cost=100000000.00..100008327.88 >> rows=524288 width=0) (actual time=0.046..906.747 rows=524288 loops=1) >> Total runtime: 1567.401 ms > I'd say your configuration settings are a long way from accurate. Actually, I'd say these estimates are pretty good. Ignoring the 100000000 penalty from "set enable_seqscan off", we have: Planner cost ratio: 13933.39 / 9638.60 = 1.45 Actual cost ratio: 1901.876 / 1567.401 = 1.21 Given all the inaccuracies in the planning process, getting as close as that is about the best you can hope for. regards, tom lane
Tom Lane wrote: > Richard Huxton <dev@archonet.com> writes: > >>laser wrote: >>>Aggregate (cost=100009638.60..100009638.60 rows=1 width=0) (actual >>>time=1567.317..1567.318 rows=1 loops=1) >>>-> Seq Scan on partial_idx_t (cost=100000000.00..100008327.88 >>>rows=524288 width=0) (actual time=0.046..906.747 rows=524288 loops=1) >>>Total runtime: 1567.401 ms > > >>I'd say your configuration settings are a long way from accurate. > > > Actually, I'd say these estimates are pretty good. Ignoring the > 100000000 penalty from "set enable_seqscan off", we have: <sigh> of course the 100 million is from seqscan=off, that's what he was demonstrating. Sometimes I wonder why I get out of bed in the mornings :-/ -- Richard Huxton Archonet Ltd
On Thu, Jun 23, 2005 at 16:13:24 +0800, laser <laser@toping.com.cn> wrote: > > This thread make me to think about the question: > could this "feature" be used in select count(*) type > query that force it to use index? count(*) can already be helped by indexes, but probably not the way you think. The count isn't saved anywhere, so each row needs to be checked to make sure it is visible to the current transaction and that it satisfies any WHERE conditions. The latter can sometimes be sped up using index scans (typically if the matchin grows are less than 5% of the table or somewhat more if the rows are clustered according to the index). If you are counting all of the visible records in a table a sequential scan is going to be the fastest plan.