Thread: limit clause produces wrong query plan
Adding a limit clause causes a very slow query:

explain analyze select * from firma2.dok where doktyyp='J' order by dokumnr limit 100

"Limit  (cost=0.00..4371.71 rows=100 width=1107) (actual time=33189.971..33189.971 rows=0 loops=1)"
"  ->  Index Scan using dok_dokumnr_idx on dok  (cost=0.00..278740.01 rows=6376 width=1107) (actual time=33189.959..33189.959 rows=0 loops=1)"
"        Filter: (doktyyp = 'J'::bpchar)"
"Total runtime: 33190.103 ms"

Without the limit it is fast:

explain analyze select * from firma2.dok where doktyyp='J' order by dokumnr

"Sort  (cost=7061.80..7077.74 rows=6376 width=1107) (actual time=0.119..0.119 rows=0 loops=1)"
"  Sort Key: dokumnr"
"  ->  Index Scan using dok_doktyyp on dok  (cost=0.00..3118.46 rows=6376 width=1107) (actual time=0.101..0.101 rows=0 loops=1)"
"        Index Cond: (doktyyp = 'J'::bpchar)"
"Total runtime: 0.245 ms"

How can this be fixed without dropping the dok_doktyyp index, so that limit can be safely used for paged data access?

indexes:

dok_doktyyp: dok(doktyyp)
dok_dokumnr_idx: dok(dokumnr)

types:

dokumnr int primary key
doktyyp char(1)

Andrus.

Using 8.1.4
> it was veery fast. To be honest I do not know what is happening?!

This is really weird.
It seems that PostgreSQL OFFSET / LIMIT are not optimized, and thus typical paging queries

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET pageno*100 LIMIT 100

or even the first page query

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100

cannot be used in PostgreSQL at all for big tables.

Do you have any idea how to fix this?

Andrus.
On Mon, Nov 24, 2008 at 10:26 AM, Andrus <kobruleht2@hot.ee> wrote:
>> it was veery fast. To be honest I do not know what is happening?!
>
> This is really weird.
> It seems that PostgreSQL OFFSET / LIMIT are not optimized and thus typical
> paging queries

And how exactly should it be optimized? If a query is even moderately interesting, with a few joins and a where clause, postgresql HAS to create the rows that come before your offset in order to assure that it's giving you the right rows.

> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET pageno*100 LIMIT 100

This will get progressively slower as pageno goes up.

> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100

That should be plenty fast.

> cannot be used in PostgreSQL at all for big tables.

Can't be used in any real database with any real complexity to its query either.

> Do you have any idea how to fix this ?

A standard workaround is to use some kind of sequential, or nearly so, id field, and then use between on that field.

select * from table where idfield between x and x+100;
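[A rough sketch of that between-on-an-id workaround, applied to the dok table from the first post. The bounds 1001/1100 are made-up page boundaries standing in for x and x+99; the application would track x itself, and the sketch assumes dokumnr values are reasonably dense.]

-- Sketch: page selected by a range on the id column rather than OFFSET.
-- 1001 stands in for x, the first key of the requested page.
SELECT *
FROM firma2.dok
WHERE dokumnr BETWEEN 1001 AND 1100
ORDER BY dokumnr;

[Both bounds are index conditions on dok_dokumnr_idx, so the query is satisfied by a small index range scan regardless of table size.]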
Scott,

> And how exactly should it be optimized? If a query is even moderately
> interesting, with a few joins and a where clause, postgresql HAS to
> create the rows that come before your offset in order to assure that
> it's giving you the right rows.

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100

It should scan the primary key in index order for the first 200 keys, skipping the first 100.

>> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100
>
> That should be plenty fast.

The example which I posted shows that

SELECT ... FROM bigtable ORDER BY intprimarykey LIMIT 100

is extremely *slow*: a seq scan is performed over the whole bigtable.

> A standard workaround is to use some kind of sequential, or nearly so,
> id field, and then use between on that field.
>
> select * from table where idfield between x and x+100;

Users can delete and insert rows in the table at any time. This approach requires updating x in every row of the big table after each insert, delete, or order column change, and is thus extremely slow. So I don't understand how this can be used for large tables.

Andrus.
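[A variant of the between workaround that avoids any renumbering is to remember only the last key of the previous page and seek past it. Sketched here against the bigtable/intprimarykey names used above; the literal 100123 is a made-up value for the application-supplied bookmark.]

-- Sketch: keyset-style paging that tolerates inserts and deletes.
-- 100123 stands in for the intprimarykey of the last row on the previous page.
SELECT *
FROM bigtable
WHERE intprimarykey > 100123
ORDER BY intprimarykey
LIMIT 100;

[No row numbers are stored, so inserts and deletes elsewhere in the table do not invalidate the bookmark; the trade-off is that pages can only be stepped through in order rather than jumped to by page number.]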
Andrus wrote:
> Scott,
>
>> And how exactly should it be optimized? If a query is even moderately
>> interesting, with a few joins and a where clause, postgresql HAS to
>> create the rows that come before your offset in order to assure that
>> it's giving you the right rows.
>
> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100
>
> It should scan the primary key in index order for the first 200 keys,
> skipping the first 100.

... which, if you have a lot of table joins, unions/intersects/whatever else, should be done on which field and how?

For a query like:

select * from t1 join t2 using (id) where t1.id='x' order by t1.id limit 100;

it has to join the tables first (which may involve a seq scan) to make sure the ids match up, and reduce the number of rows to match the where clause (which may or may not be done first, I don't know) - the limit is applied last.

It can't grab the first 100 entries from t1 - because they might not have a matching id in t2, let alone match the where clause.

--
Postgresql & php tutorials
http://www.designmagick.com/
> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100

I think pagination is overrated.

If the query produces, for instance, something like 100 rows or less, more often than not getting all the rows will take exactly the same time as getting a portion of the rows... In all those cases, it is much better to cache the results somewhere (user session, table, whatever) and paginate based on that, rather than perform the same query lots of times. Especially when some non-indexed sorting takes place, in which case you are gonna fetch all the rows anyway. Just the row ids can be stored instead of the full rows, too.

There are exceptions of course. And if the query produces 20,000 results... who is ever going to scroll to page 1257?

> The example which I posted shows that
>
> SELECT ... FROM bigtable ORDER BY intprimarykey LIMIT 100
>
> is extremely *slow*: a seq scan is performed over the whole bigtable.

This is wrong though. It should use an index, especially if you have a LIMIT...
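[A rough sketch of the cache-then-paginate idea, reusing the dok table and doktyyp='J' filter from the first post. The page_cache table name and the 1201..1300 page bounds are made up for illustration; per-user session handling is omitted.]

-- Sketch: run the expensive query once, cache only the ordered row ids,
-- and page against the cache afterwards.
CREATE TEMP TABLE page_cache (
    position  serial,   -- page offsets are computed from this
    dokumnr   int
);

-- With a simple plan like this the serial values follow the ORDER BY;
-- with a more complex query the positions would be assigned explicitly.
INSERT INTO page_cache (dokumnr)
SELECT dokumnr FROM firma2.dok WHERE doktyyp = 'J' ORDER BY dokumnr;

-- Page 13 (rows 1201..1300): fetch the ids for that page, then join back
-- to dok for the full rows.
SELECT d.*
FROM page_cache p
JOIN firma2.dok d USING (dokumnr)
WHERE p.position BETWEEN 1201 AND 1300
ORDER BY p.position;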
I believe his earlier (original) complaint was that it was slower with the LIMIT than with no limit. As in, the select (*) query was faster to get the whole thing than to apply the limit. Wherever that is the case, it is broken.

Certainly a complex join makes this more difficult, but one would agree that with a LIMIT it should never be slower than without, right?

Any LIMIT combined with an ORDER BY at the last stage can't be completed without fetching every row and sorting. If there is an index on the column being sorted at each table in the join, it is possible to short-circuit the plan after enough result rows are found, but this would have to iterate on each arm of the join until there were enough matches, and postgres does not have any such iterative query strategy that I'm aware of.

For a single large, indexed table with no joins it certainly makes sense to terminate the index scan early.

Often, one is better off with explicit subselects for the join branches with explicit LIMIT on each of those (oversized to compensate for the likelihood of a match), but you have to be willing to get less than the LIMIT amount if there are not enough matches, and that is usually bad for 'paging' style queries rather than 'report on top X' queries, where the premature truncation is probably ok and is enough to make the query fast. But if you have very large datasets on the query branch arms, asking for only 10K of 300K rows to sort and then match against 10K in another arm, and finding 500 items, can be a huge gain versus doing the entire thing.
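[A rough sketch of that subselect-with-oversized-LIMIT rewrite, reusing the hypothetical t1/t2 tables from the earlier join example. The inner limit of 1000 is an arbitrary oversizing factor chosen in the hope that at least 100 of the pre-fetched rows survive the join.]

-- Sketch: take an oversized, already-ordered slice of the driving table,
-- join only that slice, then apply the final LIMIT.
SELECT *
FROM (SELECT *
      FROM t1
      ORDER BY id
      LIMIT 1000) AS t1_head     -- oversized inner limit on the join branch
JOIN t2 USING (id)
ORDER BY id
LIMIT 100;

[If fewer than 100 of the 1000 pre-fetched t1 rows have a match in t2, the page comes back short, which is the trade-off described above.]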