Thread: limit clause produces wrong query plan

limit clause produces wrong query plan

From
"Andrus"
Date:
Adding a limit clause causes a very slow query:

explain analyze select * from firma2.dok where doktyyp='J' order by dokumnr
limit 100
"Limit  (cost=0.00..4371.71 rows=100 width=1107) (actual
time=33189.971..33189.971 rows=0 loops=1)"
"  ->  Index Scan using dok_dokumnr_idx on dok  (cost=0.00..278740.01
rows=6376 width=1107) (actual time=33189.959..33189.959 rows=0 loops=1)"
"        Filter: (doktyyp = 'J'::bpchar)"
"Total runtime: 33190.103 ms"


Without limit it is fast:

explain analyze select * from firma2.dok where doktyyp='J' order by dokumnr
"Sort  (cost=7061.80..7077.74 rows=6376 width=1107) (actual
time=0.119..0.119 rows=0 loops=1)"
"  Sort Key: dokumnr"
"  ->  Index Scan using dok_doktyyp on dok  (cost=0.00..3118.46 rows=6376
width=1107) (actual time=0.101..0.101 rows=0 loops=1)"
"        Index Cond: (doktyyp = 'J'::bpchar)"
"Total runtime: 0.245 ms"

How to fix this without dropping the dok_doktyyp index, so that LIMIT can be
safely used for paged data access?

indexes:

dok_doktyyp:  dok(doktyyp)
dok_dokumnr_idx:  dok(dokumnr)

types:

dokumnr int primary key
doktyyp char(1)

Andrus.

Using 8.1.4
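
(Sketch, for illustration only and not something proposed in the thread: one
classic workaround for this kind of plan choice is to order by an expression
such as dokumnr + 0, which produces the same ordering but cannot be satisfied
by dok_dokumnr_idx, so the planner tends to fall back to the selective
dok_doktyyp index plus an explicit sort.)

select * from firma2.dok
where doktyyp = 'J'
order by dokumnr + 0   -- same order as dokumnr, but defeats the index-ordered scan
limit 100;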


Re: limit clause produces wrong query plan

From
"Andrus"
Date:
> it was veery fast. To be honest I do not know what is happening?!

This is really weird.
It seems that PostgreSql OFFSET / LIMIT are not optimized and thus typical
paging queries

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET pageno*100 LIMIT 100

or even first page query

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100

cannot be used in PostgreSql at all for big tables.

Do you have any idea how to fix this ?

Andrus.


Re: limit clause produces wrong query plan

From
"Scott Marlowe"
Date:
On Mon, Nov 24, 2008 at 10:26 AM, Andrus <kobruleht2@hot.ee> wrote:
>> it was veery fast. To be honest I do not know what is happening?!
>
> This is really weird.
> It seems that PostgreSql OFFSET / LIMIT are not optimized and thus typical
> paging queries

And how exactly should it be optimized?  If a query is even moderately
interesting, with a few joins and a where clause, postgresql HAS to
create the rows that come before your offset in order to assure that
it's giving you the right rows.

> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET pageno*100 LIMIT 100

This will get progressively slower as pageno goes up.

> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100

That should be plenty fast.

> cannot be used in PostgreSql at all for big tables.

Can't be used in any real database with any real complexity to its query either.

> Do you have any idea how to fix this ?

A standard workaround is to use some kind of sequential, or nearly so,
id field, and then use between on that field.

select * from table where idfield between x and x+100;
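
A sketch of that idea in keyset form, using the bigtable / intprimarykey
names from the earlier examples (the literal below stands in for the last
key shown on the previous page, which the application has to remember; each
page starts from a key rather than from an OFFSET):

select * from bigtable
where intprimarykey > 100000   -- last key from the previous page (hypothetical value)
order by intprimarykey
limit 100;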

Re: limit clause produces wrong query plan

From
"Andrus"
Date:
Scott,

>And how exactly should it be optimized?  If a query is even moderately
>interesting, with a few joins and a where clause, postgresql HAS to
>create the rows that come before your offset in order to assure that
>it's giving you the right rows.

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100

It should scan the primary key in index order for the first 200 keys,
skipping the first 100 keys.

>> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100
>
> That should be plenty fast.

The example which I  posted shows that

SELECT ... FROM bigtable ORDER BY intprimarykey LIMIT 100

this is extremely *slow*: a seq scan is performed over the whole bigtable.

> A standard workaround is to use some kind of sequential, or nearly so,
> id field, and then use between on that field.
>
> select * from table where idfield between x and x+100;

Users can delete and insert any rows in the table.
This approach requires updating x in every row of the big table after each
insert, delete or order column change, and is thus extremely slow.
So I don't understand how this can be used for large tables.

Andrus.


Re: limit clause produces wrong query plan

From
Chris
Date:
Andrus wrote:
> Scott,
>
>> And how exactly should it be optimized?  If a query is even moderately
>> interesting, with a few joins and a where clause, postgresql HAS to
>> create the rows that come before your offset in order to assure that
>> it's giving you the right rows.
>
> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100
>
> It should scan the primary key in index order for the first 200 keys,
> skipping the first 100 keys.

... which if you have a lot of table joins, unions/intersects/whatever
else, should be done on which field and how?

For a query like:

select * from t1 join t2 using (id) where t1.id='x' order by t1.id limit 100;

it has to join the tables first (may involve a seq scan) to make sure
the id's match up, reduce the number of rows to match the where clause
(may/may not be done first, I don't know) - the limit is applied last.

it can't grab the first 100 entries from t1 - because they might not
have a matching id in t2, let alone match the where clause.
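
A sketch of the rewrite that does *not* work in general, to make that
concrete (same hypothetical t1/t2 query): pre-limiting t1 can starve the
join, because some of those 100 rows may have no partner in t2:

select *
from (select * from t1 where t1.id = 'x' order by t1.id limit 100) t1_page
join t2 using (id)
order by id
limit 100;
-- can legitimately return fewer than 100 rows even when the original
-- join query would have returned 100 or more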

--
Postgresql & php tutorials
http://www.designmagick.com/


Re: limit clause produces wrong query plan

From
PFC
Date:
> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100

    I think pagination is overrated.

    If the query produces, for instance, something like 100 rows or less,
more often than not, getting all the rows will take the exact same time as
getting a portion of the rows... in all those cases, it is much better to
cache the results somewhere (user session, table, whatever) and paginate
based on that, rather than perform the same query lots of times.
Especially when some non-indexed sorting takes place in which case you are
gonna fetch all the rows anyway. Something like row-id can be stored
instead of the full rows, also. There are exceptions of course.
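
A sketch of that caching idea against the original dok query (the page_cache
name is made up for the illustration): run the expensive part once, keep only
the ordered ids, and page against the small id list:

create temporary table page_cache as
  select dokumnr from firma2.dok where doktyyp = 'J' order by dokumnr;

select d.*
from (select dokumnr from page_cache order by dokumnr limit 100 offset 100) p
join firma2.dok d using (dokumnr)
order by d.dokumnr;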

    And if the query produces 20.000 results... who is ever going to scroll
to page 1257 ?

> The example which I  posted shows that
>
> SELECT ... FROM bigtable ORDER BY intprimarykey LIMIT 100
>
> this is extremely *slow*: a seq scan is performed over the whole bigtable.

    This is wrong though. It should use an index, especially if you have a
LIMIT...


Re: limit clause produces wrong query plan

From
Scott Carey
Date:
I believe his earlier (original) complaint was that it was slower with the
LIMIT than with no limit.  As in, the select (*) query was faster to get the
whole thing than to apply the limit.  Wherever that is the case, it is broken.

Certainly a complex join makes this more difficult, but one would agree that
with a LIMIT it should never be slower than without, right?

Any LIMIT combined with an ORDER BY at the last stage can't be completed
without fetching every row and sorting.  If there is an index on the column
being sorted at each table in the join, it is possible to short-circuit the
plan after enough result rows are found, but this would have to iterate on
each arm of the join until there were enough matches, and postgres does not
have any such iterative query strategy that I'm aware of.
For a single large, indexed table with no joins it certainly makes sense to
terminate the index scan early.

Often, one is better off with explicit subselects for the join branches with
an explicit LIMIT on each of those (oversized to compensate for the
likelihood of a match), but you have to be willing to get less than the
LIMIT amount if there are not enough matches, and that is usually bad for
'paging' style queries rather than 'report on top X' queries where the
premature truncation is probably ok and is enough to make the query fast.
But if you have very large datasets on the query branch arms, asking for
only 10K of 300K rows to sort and then match against 10K in another arm,
and finding 500 items, can be a huge gain versus doing the entire thing.
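
A sketch of that pattern with made-up table names (the per-branch limits are
oversized relative to the final LIMIT, as described above):

select *
from (select * from orders   order by order_id limit 10000) o
join (select * from payments order by order_id limit 10000) p using (order_id)
order by order_id
limit 100;
-- if fewer than 100 of the pre-limited rows actually match, the result
-- comes back short: fine for a 'top X' report, usually not for paging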

