Thread: ORDER BY ... LIMIT.. performance

ORDER BY ... LIMIT.. performance

From
"john cartmell"
Date:
I am not sure whether this is a known problem, but we discovered this
the other day.
We are using PostgreSQL 7.2.1 on Red Hat 7.3.

The table has just over a million rows (~1.4 million).

The query concerned is of the form

SELECT *
FROM tblCompany
WHERE lower(companyname) like 'company%'
ORDER BY companyname
LIMIT 20,0

There is a functional index lower(companyname) for the like clause.
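For reference, such a functional index would typically have been created along these lines (the index name here is illustrative; the thread only states that the index exists):

```sql
-- Functional index on lower(companyname), so that the
-- lower(companyname) LIKE 'company%' clause can use an index scan.
CREATE INDEX idx_tblcompany_lower_companyname
    ON tblCompany (lower(companyname));
```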

Without the LIMIT clause the query takes approximately 3-5 seconds to
return.
If the total number of rows returned without the LIMIT clause is
greater than 20 records, then the above query also takes the same
amount of time. But if the total number of rows is 20 or fewer, then
the time taken for the above query to return goes up to 20-30 seconds.
Has anyone else come across this? We have managed to get round it by
performing a count first and only applying the LIMIT if there are
enough rows, but surely the query should be able to do this itself!
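The count-first workaround described above might look roughly like this (a sketch of the idea, not the poster's actual code):

```sql
-- Step 1: count the matching rows.
SELECT count(*)
FROM tblCompany
WHERE lower(companyname) LIKE 'company%';

-- Step 2: if the count is greater than 20, run the query with the
-- LIMIT; otherwise run it without the LIMIT, which was observed to be
-- faster when 20 or fewer rows match.
SELECT *
FROM tblCompany
WHERE lower(companyname) LIKE 'company%'
ORDER BY companyname
LIMIT 20;
```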

John Cartmell

Re: ORDER BY ... LIMIT.. performance

From
Tom Lane
Date:
"john cartmell" <john.cartmell@mediaburst.co.uk> writes:
> Without the LIMIT clause the query takes approximately 3-5 seconds to
> return.
> If total number of rows returned without the LIMIT clause is greater
> than 20 records, then the above query also takes the same amount of time.
> But if the total number of rows is 20 or fewer, then the time taken
> for the above query to return goes up to 20-30 seconds.

What does EXPLAIN (or better EXPLAIN ANALYZE) show for these various
cases?  Evidently the planner is shifting to a different plan because
of the small LIMIT, but with no details it's hard to say anything
useful.

            regards, tom lane

Re: ORDER BY ... LIMIT.. performance

From
"Josh Berkus"
Date:
John,

> I am not sure whether this is a known problem, but we discovered
> this the other day.
> We are using PostgreSQL 7.2.1 on Red Hat 7.3.

First of all, there are a few bug fixes between 7.2.1 and 7.2.3.  One
relates to backups, and another to security.  So you should upgrade to
7.2.3 immediately -- no initdb or restore from backup required (but not
to version 7.3, which has some significant changes).

> The table has just over a million rows (~1.4 million).
>
> The query concerned is of the form
>
> SELECT *
> FROM tblCompany
> WHERE lower(companyname) like 'company%'
> ORDER BY companyname
> LIMIT 20,0
>
> There is a functional index lower(companyname) for the like clause.
>
> Without the LIMIT clause the query takes approximately 3-5 seconds
> to return.
> If the total number of rows returned without the LIMIT clause is
> greater than 20 records, then the above query also takes the same
> amount of time.
> But if the total number of rows is 20 or fewer, then the time taken
> for the above query to return goes up to 20-30 seconds. Has anyone
> else come across this? We have managed to get round it by performing
> a count first and only applying the LIMIT if there are enough rows,
> but surely the query should be able to do this itself!

This seems very odd.  Please do the following:

1) Post an EXPLAIN ANALYZE statement for the above query, with limit,
that returns in 3-5 seconds.
2) Post an EXPLAIN ANALYZE for a query that returns slowly (20-30
seconds).

Thanks!

-Josh



Re: ORDER BY ... LIMIT.. performance

From
"john cartmell"
Date:


> 1) Post an EXPLAIN ANALYZE statement for the above query, with limit,
> that returns in 3-5 seconds.
> 2) Post an EXPLAIN ANALYZE for a query that returns slowly (20-30
> seconds).

The query:
SELECT * FROM tblcompany WHERE lower(companyname) like 'a g m%' ORDER BY
companyname;
returns 20 rows.
Its EXPLAIN ANALYZE is as follows:
    NOTICE:  QUERY PLAN:

    Sort  (cost=64196.18..64196.18 rows=6339 width=224) (actual
time=2274.64..2274.66 rows=20 loops=1)
      ->  Seq Scan on tblcompany  (cost=0.00..63795.86 rows=6339
width=224) (actual time=1023.37..2274.41 rows=20 loops=1)
    Total runtime: 2274.78 msec

When limit is 19:
    EXPLAIN ANALYZE SELECT * FROM tblcompany WHERE
lower(companyname) like 'a g m%' ORDER BY companyname LIMIT 19,0;
    NOTICE:  QUERY PLAN:

    Limit  (cost=0.00..4621.68 rows=19 width=223) (actual
time=561.20..563.11 rows=19 loops=1)
      ->  Index Scan using idx_tblcompany_companyname on tblcompany
(cost=0.00..1542006.83 rows=6339 width=223) (actual time=561.19..563.07
rows=20 loops=1)
    Total runtime: 563.22 msec


But when it is 20:
    EXPLAIN ANALYZE SELECT * FROM tblcompany WHERE
lower(companyname) like 'a g m%' ORDER BY companyname LIMIT 20,0;
    NOTICE:  QUERY PLAN:

    Limit  (cost=0.00..4864.92 rows=20 width=223) (actual
time=559.58..21895.02 rows=20 loops=1)
      ->  Index Scan using idx_tblcompany_companyname on tblcompany
(cost=0.00..1542006.83 rows=6339 width=223) (actual
time=559.57..21894.97 rows=20 loops=1)
    Total runtime: 21895.13 msec


    Admittedly the query without the limit has a different query plan,
but the last two share the same plan and yet vary wildly in runtime.

    John Cartmell



Re: ORDER BY ... LIMIT.. performance

From
"Josh Berkus"
Date:
John,

> But when it is 20:
>  EXPLAIN ANALYZE SELECT * FROM tblcompany WHERE
> lower(companyname) like 'a g m%' ORDER BY companyname LIMIT 20,0;
>  NOTICE:  QUERY PLAN:
>
>  Limit  (cost=0.00..4864.92 rows=20 width=223) (actual
> time=559.58..21895.02 rows=20 loops=1)
>    ->  Index Scan using idx_tblcompany_companyname on tblcompany
> (cost=0.00..1542006.83 rows=6339 width=223) (actual
> time=559.57..21894.97 rows=20 loops=1)
>  Total runtime: 21895.13 msec

That's extremely odd.   From the look of it, Postgres is taking an
extra 18 seconds just to find that 20th row.

Does this table experience very frequent deletions and updates, or
perhaps mass record replacement from a file?  Try running VACUUM FULL
ANALYZE, and possibly even REINDEX on idx_tblcompany_companyname.
Massive numbers of dead tuples could account for this performance
irregularity.
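Concretely, the suggested maintenance would be run as follows (note that VACUUM FULL and REINDEX both take exclusive locks, so they are best run during a quiet period):

```sql
VACUUM FULL ANALYZE tblcompany;
REINDEX INDEX idx_tblcompany_companyname;
```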

-Josh

Re: ORDER BY ... LIMIT.. performance

From
Tom Lane
Date:
"john cartmell" <john.cartmell@mediaburst.co.uk> writes:
> The query:
> SELECT * FROM tblcompany WHERE lower(companyname) like 'a g m%' ORDER BY
> companyname;
> returns 20 rows.
  ^^^^^^^^^^^^^^^

Ahh, light dawns.

> When limit is 19:
>     EXPLAIN ANALYZE SELECT * FROM tblcompany WHERE
> lower(companyname) like 'a g m%' ORDER BY companyname LIMIT 19,0;
>     NOTICE:  QUERY PLAN:

>     Limit  (cost=0.00..4621.68 rows=19 width=223) (actual
> time=561.20..563.11 rows=19 loops=1)
>       ->  Index Scan using idx_tblcompany_companyname on tblcompany
> (cost=0.00..1542006.83 rows=6339 width=223) (actual time=561.19..563.07
> rows=20 loops=1)
>     Total runtime: 563.22 msec

> But when it is 20:
>     EXPLAIN ANALYZE SELECT * FROM tblcompany WHERE
> lower(companyname) like 'a g m%' ORDER BY companyname LIMIT 20,0;
>     NOTICE:  QUERY PLAN:

>     Limit  (cost=0.00..4864.92 rows=20 width=223) (actual
> time=559.58..21895.02 rows=20 loops=1)
>       ->  Index Scan using idx_tblcompany_companyname on tblcompany
> (cost=0.00..1542006.83 rows=6339 width=223) (actual
> time=559.57..21894.97 rows=20 loops=1)
>     Total runtime: 21895.13 msec

The problem here is that in current releases, the Limit plan node tries
to fetch one more row than requested (you can see this in the actual
rowcounts for the first example).  So in your second example, the base
indexscan is actually being run to completion before the Limit gives up.
And since that scan is being used for ordering, not for implementing the
WHERE clause, it visits all the rows.  (When you leave off LIMIT, the
planner chooses a plan that's more amenable to fetching all the data...)

I recently revised the Limit logic so that it doesn't fetch the extra
row.  This takes more code, but you're not the first to complain of
the old behavior.  It'll be in 7.4, or if you're brave you could
probably apply the diff to 7.3.

In the meantime, a more appropriate query would be

SELECT * FROM tblcompany
WHERE lower(companyname) like 'a g m%'
ORDER BY lower(companyname)
LIMIT whatever

so that an index on lower(companyname) could be used both for the WHERE
clause and for the ordering.
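For that query to use a single index scan for both the WHERE clause and the ordering, an index on the expression would need to exist, e.g. (index name illustrative):

```sql
-- One expression index serves both the LIKE prefix match and the
-- ORDER BY lower(companyname).
CREATE INDEX idx_tblcompany_lower_companyname
    ON tblcompany (lower(companyname));
```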

            regards, tom lane