Re: ORDER BY ... LIMIT.. performance - Mailing list pgsql-performance

From Tom Lane
Subject Re: ORDER BY ... LIMIT.. performance
Date
Msg-id 13210.1039210098@sss.pgh.pa.us
Whole thread Raw
In response to Re: ORDER BY ... LIMIT.. performance  ("john cartmell" <john.cartmell@mediaburst.co.uk>)
List pgsql-performance
"john cartmell" <john.cartmell@mediaburst.co.uk> writes:
> The query:
> SELECT * FROM tblcompany WHERE lower(companyname) like 'a g m%' ORDER BY
> companyname;
> returns 20 rows.
  ^^^^^^^^^^^^^^^

Ahh, light dawns.

> When limit is 19:
>     EXPLAIN ANALYZE SELECT * FROM tblcompany WHERE
> lower(companyname) like 'a g m%' ORDER BY companyname LIMIT 19,0;
>     NOTICE:  QUERY PLAN:

>     Limit  (cost=0.00..4621.68 rows=19 width=223) (actual
> time=561.20..563.11 rows=19 loops=1)
>       ->  Index Scan using idx_tblcompany_companyname on tblcompany
> (cost=0.00..1542006.83 rows=6339 width=223) (actual time=561.19..563.07
> rows=20 loops=1)
>     Total runtime: 563.22 msec

> But when it is 20:
>     EXPLAIN ANALYZE SELECT * FROM tblcompany WHERE
> lower(companyname) like 'a g m%' ORDER BY companyname LIMIT 20,0;
>     NOTICE:  QUERY PLAN:

>     Limit  (cost=0.00..4864.92 rows=20 width=223) (actual
> time=559.58..21895.02 rows=20 loops=1)
>       ->  Index Scan using idx_tblcompany_companyname on tblcompany
> (cost=0.00..1542006.83 rows=6339 width=223) (actual
> time=559.57..21894.97 rows=20 loops=1)
>     Total runtime: 21895.13 msec

The problem here is that in current releases, the Limit plan node tries
to fetch one more row than requested (you can see this in the actual
rowcounts for the first example).  So in your second example, the base
indexscan is actually being run to completion before the Limit gives up.
And since that scan is being used for ordering, not for implementing the
WHERE clause, it visits all the rows.  (When you leave off LIMIT, the
planner chooses a plan that's more amenable to fetching all the data...)

I recently revised the Limit logic so that it doesn't fetch the extra
row.  This takes more code, but you're not the first to complain of
the old behavior.  It'll be in 7.4, or if you're brave you could
probably apply the diff to 7.3.

In the meantime, a more appropriate query would be

SELECT * FROM tblcompany
WHERE lower(companyname) like 'a g m%'
ORDER BY lower(companyname)
LIMIT whatever

so that an index on lower(companyname) could be used both for the WHERE
clause and for the ordering.

            regards, tom lane

pgsql-performance by date:

Previous
From: Rod Taylor
Date:
Subject: Re: ALTER TABLE .. < ADD | DROP > OIDS
Next
From: Hannu Krosing
Date:
Subject: Re: Speeding up aggregates