Thread: limit 1 and functional indexes

limit 1 and functional indexes

From
"Alexandra Birch"
Date:
Hi, 

Postgres choses the wrong index when I add limit 1 to the query.
This should not affect the index chosen.
I read that functional indexes are sometimes not chosen correctly by 
optimizer. 
Is there anything I can do to always use the functional index in the
following queries? 

Query with limit 1 choses wrong index:
---------------------------------------------------------------------------------------
explain
select code 
from transactions 
where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111')
order by order_date DESC LIMIT 1

Index Scan Backward using transactions_date_aff on transactions (cost=0.00..930780.96 rows=2879 width=33)
---------------------------------------------------------------------------------------

Without limit 1 choses correct index:
---------------------------------------------------------------------------------------
explain
select code 
from transactions 
where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111')
order by order_date DESC

Index Scan using transactions_pop_i on transactions  (cost=0.00..11351.72 rows=2879 width=33)
---------------------------------------------------------------------------------------

We have postgresql-7.3.2-3.
Thank you,

Alexandra


Re: limit 1 and functional indexes

From
Bruno Wolff III
Date:
On Wed, Jan 28, 2004 at 12:23:38 +0100, Alexandra Birch <alexandra@trymedia.com> wrote:
> Hi, 
> 
> Postgres choses the wrong index when I add limit 1 to the query.
> This should not affect the index chosen.

I don't know the complete answer to your question, but since no one else
has commented I will answer what I can.

It IS reasobable for the planner to choose a different plan when you
add a LIMIT clause to a query.

> I read that functional indexes are sometimes not chosen correctly by 
> optimizer. 

I don't believe there are any particular problems with functional indexes.
The opitmizer isn't perfect and will sometimes choose poor plans.

> Is there anything I can do to always use the functional index in the
> following queries? 

Have you done an ANALYZE of the table recently?

It might be useful to see the EXPLAIN ANALYZE output, rather than just
the EXPLAIN output, as that will give the actual times needed to do
the various steps.

> 
> Query with limit 1 choses wrong index:
> ---------------------------------------------------------------------------------------
> explain
> select code 
> from transactions 
> where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111')
> order by order_date DESC LIMIT 1
> 
> Index Scan Backward using transactions_date_aff on transactions (cost=0.00..930780.96 rows=2879 width=33)
> ---------------------------------------------------------------------------------------
> 
> Without limit 1 choses correct index:
> ---------------------------------------------------------------------------------------
> explain
> select code 
> from transactions 
> where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111')
> order by order_date DESC
> 
> Index Scan using transactions_pop_i on transactions  (cost=0.00..11351.72 rows=2879 width=33)
> ---------------------------------------------------------------------------------------
> 
> We have postgresql-7.3.2-3.
> Thank you,
> 
> Alexandra
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly


Re: limit 1 and functional indexes

From
"Alexandra Birch"
Date:
> >
> > Postgres choses the wrong index when I add limit 1 to the query.
> > This should not affect the index chosen.
>
> I don't know the complete answer to your question, but since no one else
> has commented I will answer what I can.

Thanks - your reply is apreciated :)

> It IS reasobable for the planner to choose a different plan when you
> add a LIMIT clause to a query.

OK - I'll investigate this further.

> > I read that functional indexes are sometimes not chosen correctly by
> > optimizer.
>
> I don't believe there are any particular problems with functional indexes.
> The opitmizer isn't perfect and will sometimes choose poor plans.

OK - but there was some discussion about statistics for functional indexes, for eg:
http://archives.postgresql.org/pgsql-general/2004-01/msg00978.php
This does not help me solve my problem though :)

> > Is there anything I can do to always use the functional index in the
> > following queries?
>
> Have you done an ANALYZE of the table recently?

Yip - I should have said we do a daily VACUUM ANALYZE.

> It might be useful to see the EXPLAIN ANALYZE output, rather than just
> the EXPLAIN output, as that will give the actual times needed to do
> the various steps.

I thought the cost values would be enough from the EXPLAIN alone.
And the query takes so long to run :(

Here is the output of EXPLAIN ANALYZE first with limit 1 then without:

explain analyze
select code
from transactions
where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111')
order by order_date DESC LIMIT 1;
--------------------------------------------------------------------------------------------------Limit
(cost=0.00..332.44rows=1 width=33) (actual time=377745.75..377745.75 rows=0 loops=1)  ->  Index Scan Backward using
transactions_date_affon transactions  (cost=0.00..982549.96 rows=2956 width=33) (actual
 
time=377718.61..377718.61 rows=0 loops=1)        Filter: (upper((pop)::text) =
'79BCDC8A4A4F99E7C111111111111111'::text)Totalruntime: 378439.32 msec
 

explain analyze
select code
from transactions
where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111')
order by order_date DESC;                                                                  QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
-------------Sort  (cost=11824.16..11831.55 rows=2956 width=33) (actual time=248.17..248.17 rows=0 loops=1)  Sort Key:
order_date ->  Index Scan using transactions_pop_i on transactions  (cost=0.00..11653.79 rows=2956 width=33) (actual
time=126.13..126.13
rows=0 loops=1)        Index Cond: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text)Total runtime: 248.25
msec

Thank you,

Alexandra



Re: limit 1 and functional indexes

From
Bruno Wolff III
Date:
On Thu, Jan 29, 2004 at 16:02:06 +0100, Alexandra Birch <alexandra@trymedia.com> wrote:
> 
> Here is the output of EXPLAIN ANALYZE first with limit 1 then without:

The time estimate for the limit 1 case is way off. I can't tell if that
is a bug or not having detailed enough statistics.

Hopefully someone more knowlegable will take a look at this question.

> 
> explain analyze
> select code
> from transactions
> where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111')
> order by order_date DESC LIMIT 1;
> --------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..332.44 rows=1 width=33) (actual time=377745.75..377745.75 rows=0 loops=1)
>    ->  Index Scan Backward using transactions_date_aff on transactions  (cost=0.00..982549.96 rows=2956 width=33)
(actual
> time=377718.61..377718.61 rows=0 loops=1)
>          Filter: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text)
>  Total runtime: 378439.32 msec
> 
> explain analyze
> select code
> from transactions
> where UPPER(pop) = UPPER('79bcdc8a4a4f99e7c111111111111111')
> order by order_date DESC;
>                                                                    QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------
> -------------
>  Sort  (cost=11824.16..11831.55 rows=2956 width=33) (actual time=248.17..248.17 rows=0 loops=1)
>    Sort Key: order_date
>    ->  Index Scan using transactions_pop_i on transactions  (cost=0.00..11653.79 rows=2956 width=33) (actual
time=126.13..126.13
> rows=0 loops=1)
>          Index Cond: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text)
>  Total runtime: 248.25 msec
> 
> Thank you,
> 
> Alexandra
> 


Re: limit 1 and functional indexes

From
Greg Stark
Date:
Bruno Wolff III <bruno@wolff.to> writes:

> >                                                                    QUERY PLAN
> >
------------------------------------------------------------------------------------------------------------------------------------
> >  Sort  (cost=11824.16..11831.55 rows=2956 width=33) (actual time=248.17..248.17 rows=0 loops=1)
> >    Sort Key: order_date
> >    ->  Index Scan using transactions_pop_i on transactions
> >             (cost=0.00..11653.79 rows=2956 width=33) 
> >             (actual time=126.13..126.13 rows=0 loops=1)
> >          Index Cond: (upper((pop)::text) = '79BCDC8A4A4F99E7C111111111111111'::text)
> >  Total runtime: 248.25 msec


Yeah, the problem with functional indexes is that the optimizer doesn't have
any clue how the records are distributed since it only has statistics for
columns, not your expression. Notice it's estimating 2956 rows where in fact
there are 0.

I think someone was actually working on this so it may be improved in 7.5 but
I'm not sure.

Given the type of data you're storing, which looks like hex strings, are you
sure you need to do a case-insensitive search here? Can't you just uppercase
it when you store it?

The other option would be to use a subquery and force the planner not to pull
it up, something like:

select code  from (        select code           from transactions          where UPPER(pop) =
UPPER('79bcdc8a4a4f99e7c111111111111111')        offset 0       )order by order_date DESC;
 


The offset 0 prevents the optimizer from pulling the subquery into the outer
query. I think this will prevent it from even considering the order_date index
scan, but you'll have to try to be sure.

-- 
greg



Re: limit 1 and functional indexes: SOLVED

From
"Alexandra Birch"
Date:
> From: gsstark@mit.edu [mailto:gsstark@mit.edu]
> Sent: viernes, 30 de enero de 2004 7:08
>
> Yeah, the problem with functional indexes is that the optimizer doesn't have
> any clue how the records are distributed since it only has statistics for
> columns, not your expression. Notice it's estimating 2956 rows where in fact
> there are 0.

Thanks for the explication.

> Given the type of data you're storing, which looks like hex strings, are you
> sure you need to do a case-insensitive search here? Can't you just uppercase
> it when you store it?

That would be great but we store a variety of case insensitive proof of purchase
codes here. Some we give to customers in upper case and some in lower case.
Hopefully someday we can redesign it all to just be in uppercase...

> The offset 0 prevents the optimizer from pulling the subquery into the outer
> query. I think this will prevent it from even considering the order_date index
> scan, but you'll have to try to be sure.

It works perfectly - thanks a million!
Strangely the offset 0 does not seem to make any difference.
Gotta read up more about subqueries :)
explain analyzeselect code,order_date  from (        select code, order_date          from transactions         where
UPPER(pop)= UPPER('c892eb2f877e3a28ddc8e196cd5a8aad')         limit 1       ) as fooorder by order_date DESC;
 
--------------------------------------------------Sort  (cost=3.95..3.96 rows=1 width=33) (actual time=0.14..0.14
rows=1loops=1)  Sort Key: order_date  ->  Subquery Scan foo  (cost=0.00..3.94 rows=1 width=33) (actual time=0.06..0.07
rows=1loops=1)        ->  Limit  (cost=0.00..3.94 rows=1 width=33) (actual time=0.06..0.06 rows=1 loops=1)
-> Index Scan using transactions_pop_i on transactions  (cost=0.00..11653.84 rows=2956 width=33) (actual
 
time=0.05..0.06 rows=2 loops=1)                    Index Cond: (upper((pop)::text) =
'C892EB2F877E3A28DDC8E196CD5A8AAD'::text)Totalruntime: 0.20 msec
 
(7 rows)


explain analyzeselect code,order_date  from (        select code, order_date          from transactions         where
UPPER(pop)= UPPER('c892eb2f877e3a28ddc8e196cd5a8aad')         limit 1         offset 0       ) as fooorder by
order_dateDESC;
 
--------------------------------------------------Sort  (cost=3.95..3.96 rows=1 width=33) (actual time=0.14..0.14
rows=1loops=1)  Sort Key: order_date  ->  Subquery Scan foo  (cost=0.00..3.94 rows=1 width=33) (actual time=0.06..0.07
rows=1loops=1)        ->  Limit  (cost=0.00..3.94 rows=1 width=33) (actual time=0.06..0.06 rows=1 loops=1)
-> Index Scan using transactions_pop_i on transactions  (cost=0.00..11653.84 rows=2956 width=33) (actual
 
time=0.06..0.06 rows=2 loops=1)                    Index Cond: (upper((pop)::text) =
'C892EB2F877E3A28DDC8E196CD5A8AAD'::text)Totalruntime: 0.20 msec
 





Re: limit 1 and functional indexes: SOLVED

From
Greg Stark
Date:
"Alexandra Birch" <alexandra@trymedia.com> writes:

> It works perfectly - thanks a million!
> Strangely the offset 0 does not seem to make any difference.
> Gotta read up more about subqueries :)
> 
>  explain analyze
>  select code,order_date
>    from (
>          select code, order_date
>            from transactions
>           where UPPER(pop) = UPPER('c892eb2f877e3a28ddc8e196cd5a8aad')
>           limit 1
>         ) as foo
>  order by order_date DESC;

I think what you're trying to do here is get the last order? Then you'll want
the limit to be on the outer query where it's ordered by order_date:
 select code,order_date   from (         select code, order_date           from transactions          where UPPER(pop)
=UPPER('c892eb2f877e3a28ddc8e196cd5a8aad')         offset 0        ) as foo  order by order_date DESC;  limit 1
 

Note that in theory postgres should be able to find the same plan for this
query as yours since it's equivalent. It really ought to use the order_date
index since it thinks it would be more efficient. 

However it's unable to because postgres doesn't try every possible index, only
the ones that look like they'll be useful for a where clause or an order by.
And the order by on the outer query isn't considered when it's looking at the
subquery. 

It normally handles this case by merging the subquery into the outer query,
but it can't do that if there's a limit or offset. So an "offset 0" is
convenient for fooling it into thinking the subquery can't be pulled up
without actually changing the output.

You could do "order by upper(pop)" instead which might be clearer for someone
reading the query in that it makes it look like you're trying to encourage it
to use the index on upper(pop). In theory "order by"s on subqueries are
useless and postgres could ignore them, but it doesn't.

-- 
greg