Thread: limit 1 and functional indexes
Hi, Postgres choses the wrong index when I add limit 1 to the query. This should not affect the index chosen. I read that functional indexes are sometimes not chosen correctly by optimizer. Is there anything I can do to always use the functional index in the following queries? Query with limit 1 choses wrong index: --------------------------------------------------------------------------------------- explain select code from transactions where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') order by order_date DESC LIMIT 1 Index Scan Backward using transactions_date_aff on transactions (cost=0.00..930780.96 rows=2879 width=33) --------------------------------------------------------------------------------------- Without limit 1 choses correct index: --------------------------------------------------------------------------------------- explain select code from transactions where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') order by order_date DESC Index Scan using transactions_pop_i on transactions (cost=0.00..11351.72 rows=2879 width=33) --------------------------------------------------------------------------------------- We have postgresql-7.3.2-3. Thank you, Alexandra
On Wed, Jan 28, 2004 at 12:23:38 +0100, Alexandra Birch <alexandra@trymedia.com> wrote: > Hi, > > Postgres choses the wrong index when I add limit 1 to the query. > This should not affect the index chosen. I don't know the complete answer to your question, but since no one else has commented I will answer what I can. It IS reasobable for the planner to choose a different plan when you add a LIMIT clause to a query. > I read that functional indexes are sometimes not chosen correctly by > optimizer. I don't believe there are any particular problems with functional indexes. The opitmizer isn't perfect and will sometimes choose poor plans. > Is there anything I can do to always use the functional index in the > following queries? Have you done an ANALYZE of the table recently? It might be useful to see the EXPLAIN ANALYZE output, rather than just the EXPLAIN output, as that will give the actual times needed to do the various steps. > > Query with limit 1 choses wrong index: > --------------------------------------------------------------------------------------- > explain > select code > from transactions > where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') > order by order_date DESC LIMIT 1 > > Index Scan Backward using transactions_date_aff on transactions (cost=0.00..930780.96 rows=2879 width=33) > --------------------------------------------------------------------------------------- > > Without limit 1 choses correct index: > --------------------------------------------------------------------------------------- > explain > select code > from transactions > where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') > order by order_date DESC > > Index Scan using transactions_pop_i on transactions (cost=0.00..11351.72 rows=2879 width=33) > --------------------------------------------------------------------------------------- > > We have postgresql-7.3.2-3. > Thank you, > > Alexandra > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
> > > > Postgres choses the wrong index when I add limit 1 to the query. > > This should not affect the index chosen. > > I don't know the complete answer to your question, but since no one else > has commented I will answer what I can. Thanks - your reply is apreciated :) > It IS reasobable for the planner to choose a different plan when you > add a LIMIT clause to a query. OK - I'll investigate this further. > > I read that functional indexes are sometimes not chosen correctly by > > optimizer. > > I don't believe there are any particular problems with functional indexes. > The opitmizer isn't perfect and will sometimes choose poor plans. OK - but there was some discussion about statistics for functional indexes, for eg: http://archives.postgresql.org/pgsql-general/2004-01/msg00978.php This does not help me solve my problem though :) > > Is there anything I can do to always use the functional index in the > > following queries? > > Have you done an ANALYZE of the table recently? Yip - I should have said we do a daily VACUUM ANALYZE. > It might be useful to see the EXPLAIN ANALYZE output, rather than just > the EXPLAIN output, as that will give the actual times needed to do > the various steps. I thought the cost values would be enough from the EXPLAIN alone. And the query takes so long to run :( Here is the output of EXPLAIN ANALYZE first with limit 1 then without: explain analyze select code from transactions where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') order by order_date DESC LIMIT 1; --------------------------------------------------------------------------------------------------Limit (cost=0.00..332.44rows=1 width=33) (actual time=377745.75..377745.75 rows=0 loops=1) -> Index Scan Backward using transactions_date_affon transactions (cost=0.00..982549.96 rows=2956 width=33) (actual time=377718.61..377718.61 rows=0 loops=1) Filter: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text)Totalruntime: 378439.32 msec explain analyze select code from transactions where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') order by order_date DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ -------------Sort (cost=11824.16..11831.55 rows=2956 width=33) (actual time=248.17..248.17 rows=0 loops=1) Sort Key: order_date -> Index Scan using transactions_pop_i on transactions (cost=0.00..11653.79 rows=2956 width=33) (actual time=126.13..126.13 rows=0 loops=1) Index Cond: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text)Total runtime: 248.25 msec Thank you, Alexandra
On Thu, Jan 29, 2004 at 16:02:06 +0100, Alexandra Birch <alexandra@trymedia.com> wrote: > > Here is the output of EXPLAIN ANALYZE first with limit 1 then without: The time estimate for the limit 1 case is way off. I can't tell if that is a bug or not having detailed enough statistics. Hopefully someone more knowlegable will take a look at this question. > > explain analyze > select code > from transactions > where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') > order by order_date DESC LIMIT 1; > -------------------------------------------------------------------------------------------------- > Limit (cost=0.00..332.44 rows=1 width=33) (actual time=377745.75..377745.75 rows=0 loops=1) > -> Index Scan Backward using transactions_date_aff on transactions (cost=0.00..982549.96 rows=2956 width=33) (actual > time=377718.61..377718.61 rows=0 loops=1) > Filter: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text) > Total runtime: 378439.32 msec > > explain analyze > select code > from transactions > where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') > order by order_date DESC; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------ > ------------- > Sort (cost=11824.16..11831.55 rows=2956 width=33) (actual time=248.17..248.17 rows=0 loops=1) > Sort Key: order_date > -> Index Scan using transactions_pop_i on transactions (cost=0.00..11653.79 rows=2956 width=33) (actual time=126.13..126.13 > rows=0 loops=1) > Index Cond: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text) > Total runtime: 248.25 msec > > Thank you, > > Alexandra >
Bruno Wolff III <bruno@wolff.to> writes: > > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------------------------------ > > Sort (cost=11824.16..11831.55 rows=2956 width=33) (actual time=248.17..248.17 rows=0 loops=1) > > Sort Key: order_date > > -> Index Scan using transactions_pop_i on transactions > > (cost=0.00..11653.79 rows=2956 width=33) > > (actual time=126.13..126.13 rows=0 loops=1) > > Index Cond: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text) > > Total runtime: 248.25 msec Yeah, the problem with functional indexes is that the optimizer doesn't have any clue how the records are distributed since it only has statistics for columns, not your expression. Notice it's estimating 2956 rows where in fact there are 0. I think someone was actually working on this so it may be improved in 7.5 but I'm not sure. Given the type of data you're storing, which looks like hex strings, are you sure you need to do a case-insensitive search here? Can't you just uppercase it when you store it? The other option would be to use a subquery and force the planner not to pull it up, something like: select code from ( select code from transactions where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111') offset 0 )order by order_date DESC; The offset 0 prevents the optimizer from pulling the subquery into the outer query. I think this will prevent it from even considering the order_date index scan, but you'll have to try to be sure. -- greg
> From: gsstark@mit.edu [mailto:gsstark@mit.edu] > Sent: viernes, 30 de enero de 2004 7:08 > > Yeah, the problem with functional indexes is that the optimizer doesn't have > any clue how the records are distributed since it only has statistics for > columns, not your expression. Notice it's estimating 2956 rows where in fact > there are 0. Thanks for the explication. > Given the type of data you're storing, which looks like hex strings, are you > sure you need to do a case-insensitive search here? Can't you just uppercase > it when you store it? That would be great but we store a variety of case insensitive proof of purchase codes here. Some we give to customers in upper case and some in lower case. Hopefully someday we can redesign it all to just be in uppercase... > The offset 0 prevents the optimizer from pulling the subquery into the outer > query. I think this will prevent it from even considering the order_date index > scan, but you'll have to try to be sure. It works perfectly - thanks a million! Strangely the offset 0 does not seem to make any difference. Gotta read up more about subqueries :) explain analyzeselect code,order_date from ( select code, order_date from transactions where UPPER(pop)= UPPER('c892eb2f877e3a28ddc8e196cd5a8aad') limit 1 ) as fooorder by order_date DESC; --------------------------------------------------Sort (cost=3.95..3.96 rows=1 width=33) (actual time=0.14..0.14 rows=1loops=1) Sort Key: order_date -> Subquery Scan foo (cost=0.00..3.94 rows=1 width=33) (actual time=0.06..0.07 rows=1loops=1) -> Limit (cost=0.00..3.94 rows=1 width=33) (actual time=0.06..0.06 rows=1 loops=1) -> Index Scan using transactions_pop_i on transactions (cost=0.00..11653.84 rows=2956 width=33) (actual time=0.05..0.06 rows=2 loops=1) Index Cond: (upper((pop)::text) = 'C892EB2F877E3A28DDC8E196CD5A8AAD'::text)Totalruntime: 0.20 msec (7 rows) explain analyzeselect code,order_date from ( select code, order_date from transactions where UPPER(pop)= UPPER('c892eb2f877e3a28ddc8e196cd5a8aad') limit 1 offset 0 ) as fooorder by order_dateDESC; --------------------------------------------------Sort (cost=3.95..3.96 rows=1 width=33) (actual time=0.14..0.14 rows=1loops=1) Sort Key: order_date -> Subquery Scan foo (cost=0.00..3.94 rows=1 width=33) (actual time=0.06..0.07 rows=1loops=1) -> Limit (cost=0.00..3.94 rows=1 width=33) (actual time=0.06..0.06 rows=1 loops=1) -> Index Scan using transactions_pop_i on transactions (cost=0.00..11653.84 rows=2956 width=33) (actual time=0.06..0.06 rows=2 loops=1) Index Cond: (upper((pop)::text) = 'C892EB2F877E3A28DDC8E196CD5A8AAD'::text)Totalruntime: 0.20 msec
"Alexandra Birch" <alexandra@trymedia.com> writes: > It works perfectly - thanks a million! > Strangely the offset 0 does not seem to make any difference. > Gotta read up more about subqueries :) > > explain analyze > select code,order_date > from ( > select code, order_date > from transactions > where UPPER(pop) = UPPER('c892eb2f877e3a28ddc8e196cd5a8aad') > limit 1 > ) as foo > order by order_date DESC; I think what you're trying to do here is get the last order? Then you'll want the limit to be on the outer query where it's ordered by order_date: select code,order_date from ( select code, order_date from transactions where UPPER(pop) =UPPER('c892eb2f877e3a28ddc8e196cd5a8aad') offset 0 ) as foo order by order_date DESC; limit 1 Note that in theory postgres should be able to find the same plan for this query as yours since it's equivalent. It really ought to use the order_date index since it thinks it would be more efficient. However it's unable to because postgres doesn't try every possible index, only the ones that look like they'll be useful for a where clause or an order by. And the order by on the outer query isn't considered when it's looking at the subquery. It normally handles this case by merging the subquery into the outer query, but it can't do that if there's a limit or offset. So an "offset 0" is convenient for fooling it into thinking the subquery can't be pulled up without actually changing the output. You could do "order by upper(pop)" instead which might be clearer for someone reading the query in that it makes it look like you're trying to encourage it to use the index on upper(pop). In theory "order by"s on subqueries are useless and postgres could ignore them, but it doesn't. -- greg