Re: limit clause breaks query planner? - Mailing list pgsql-performance
From | Pavel Stehule |
---|---|
Subject | Re: limit clause breaks query planner? |
Date | |
Msg-id | 162867790809011217n587e6090qdec2c4c6a1f580e6@mail.gmail.com Whole thread Raw |
In response to | Re: limit clause breaks query planner? ("Pavel Stehule" <pavel.stehule@gmail.com>) |
Responses |
Re: limit clause breaks query planner?
Re: limit clause breaks query planner? |
List | pgsql-performance |
Hello 2008/9/1 David West <david.west@cusppoint.com>: > Thanks for your suggestion but the result is the same. > > Here is the explain analyse output from different queries. > Select * from my_table where A is null and B = '21' limit 15 > > "Limit (cost=0.00..3.68 rows=15 width=128) (actual time=85837.043..85896.140 rows=15 loops=1)" > " -> Seq Scan on my_table this_ (cost=0.00..258789.88 rows=1055580 width=128) (actual time=85837.038..85896.091 rows=15loops=1)" > " Filter: ((A IS NULL) AND ((B)::text = '21'::text))" > "Total runtime: 85896.214 ms" > I see it - problem is in statistics - system expect 1055580, but there is only 15 values. try a) increase statistics on column a and b - probably there are strong dependency between column a nad b, because statistic are totally out b) try cursors declare cursor c as Select * from my_table where A is null and B = '21' limit 15; fetch forward 15 from c; http://www.postgresql.org/docs/8.2/static/sql-fetch.html maybe select * from (select * from mytable where b = '21' offset 0) where a is null limit 15 regards Pavel Stehule > As you can see the estimated cost was 3.68: a long way from the true value. > > Doing 'set enable_seqscan=false' and repeating the select: > "Limit (cost=0.00..5.58 rows=15 width=128) (actual time=4426.438..4426.834 rows=15 loops=1)" > " -> Index Scan using idx_A on my_table this_ (cost=0.00..392956.76 rows=1055970 width=128) (actual time=4426.433..4426.768rows=15 loops=1)" > " Index Cond: (A IS NULL)" > " Filter: ((B)::text = '21'::text)" > "Total runtime: 4426.910 ms" > > Probably some caching made this query faster, but it's still too slow, and using the wrong index. > > Deleting index A gives: > "Limit (cost=0.00..56.47 rows=15 width=128) (actual time=10.298..10.668 rows=15 loops=1)" > " -> Index Scan using idx_B on my_table this_ (cost=0.00..3982709.15 rows=1057960 width=128) (actual time=10.293..10.618rows=15 loops=1)" > " Index Cond: ((B)::text = '21'::text)" > " Filter: (A IS NULL)" > "Total runtime: 10.735 ms" > Much better. However I need index A for another query so I can't just delete it. > > Looking at the estimated cost, you can see why it's choosing the order that it is choosing, but it just doesn't seem toreflect reality at all. > > Now here's the result of the query, with both indexes in place and sequential scan enabled > Select * from my_table where A is null and B = '21' > "Bitmap Heap Scan on my_table this_ (cost=20412.89..199754.37 rows=1060529 width=128) (actual time=470.772..7432.062 rows=1020062loops=1)" > " Recheck Cond: ((B)::text = '21'::text)" > " Filter: (A IS NULL)" > " -> Bitmap Index Scan on idx_B (cost=0.00..20147.76 rows=1089958 width=0) (actual time=466.545..466.545 rows=1020084loops=1)" > " Index Cond: ((B)::text = '21'::text)" > "Total runtime: 8940.119 ms" > > In this case it goes for the correct index. It appears that the query planner makes very simplistic assumptions when itcomes to LIMIT? > > Thanks > David > > -----Original Message----- > From: Pavel Stehule [mailto:pavel.stehule@gmail.com] > Sent: 01 September 2008 13:53 > To: David West > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] limit clause breaks query planner? > > Hello > > you should partial index > > create index foo(b) on mytable where a is null; > > regards > Pavel Stehule > > 2008/9/1 David West <david.west@cusppoint.com>: >> Hi, >> >> >> >> I have a single table with about 10 million rows, and two indexes. Index A >> is on a column A with 95% null values. Index B is on a column B with about >> 10 values, ie. About a million rows of each value. >> >> >> >> When I do a simple query on the table (no joins) with the following >> condition: >> >> A is null AND >> >> B = '21' >> >> >> >> it uses the correct index, index B. However, when I add a limit clause of >> 15, postgres decides to do a sequential scan :s. Looking at the results >> from explain: >> >> >> >> "Limit (cost=0.00..3.69 rows=15 width=128)" >> >> " -> Seq Scan on my_table this_ (cost=0.00..252424.24 rows=1025157 >> width=128)" >> >> " Filter: ((A IS NULL) AND ((B)::text = '21'::text))" >> >> >> >> It appears that postgres is (very incorrectly) assuming that it will only >> have to retrieve 15 rows on a sequential scan, and gives a total cost of >> 3.69. In reality, it has to scan many rows forward until it finds the >> correct value, yielding very poor performance for my table. >> >> >> >> If I disable sequential scan (set enable_seqscan=false) it then incorrectly >> uses the index A that has 95% null values: it seems to incorrectly apply the >> same logic again that it will only have to retrieve 15 rows with the limit >> clause, and thinks that the index scan using A is faster than index scan B. >> >> >> >> Only by deleting the index on A and disabling sequential scan will it use >> the correct index, which is of course by far the fastest. >> >> >> >> Is there an assumption in the planner that a limit of 15 will mean that >> postgres will only have to read 15 rows? If so is this a bad assumption? >> If a particular query is faster without a limit, then surely it will also be >> faster with the limit. >> >> >> >> Any workarounds for this? >> >> >> >> Thanks >> >> David > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
pgsql-performance by date: