Thread: Inefficient SELECT with OFFSET and LIMIT

Inefficient SELECT with OFFSET and LIMIT

From
Clive Page
Date:
I have just discovered that if one does a SELECT with a LIMIT and OFFSET
     values, say

SELECT myfunc(mycol) FROM table LIMIT 50 OFFSET 10000 ;

Then the whole of the selection expressions, including the function calls,
are actuall executed for every record, not just those being selected but
also those being skipped, i.e. 10050 in this case.
Actually it's even odder, as the number is that plus one, as the next
record in sequence is also passed to the function.

I discovered this by accident, since I was using a user-defined function
in pl/pgsql and included by mistake some debug code using RAISE INFO, so
this diagnostic output gave the game away (and all of it came out before
any of the results of the selection, which was another surprise).

It looks as if OFFSET is implemented just be throwing away the results,
until the OFFSET has been reached.

It would be nice if OFFSET could be implemented in some more efficient
way.



--
Clive Page,
Dept of Physics & Astronomy,
University of Leicester, U.K.


Re: Inefficient SELECT with OFFSET and LIMIT

From
Greg Stark
Date:
Clive Page <cgp@leicester.ac.uk> writes:

> SELECT myfunc(mycol) FROM table LIMIT 50 OFFSET 10000 ;

> It looks as if OFFSET is implemented just be throwing away the results,
> until the OFFSET has been reached.
>
> It would be nice if OFFSET could be implemented in some more efficient
> way.

You could do something like:

select myfunc(mycol) from (select mycol from table limit 50 offset 10000) as x;

I think it's not easy for the optimizer to do it because there are lots of
cases where it can't. Consider if you had an ORDER BY clause on the myfunc
output column for example. Or if myfunc was a set-returning function.

--
greg

Re: Inefficient SELECT with OFFSET and LIMIT

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Clive Page <cgp@leicester.ac.uk> writes:
>> It would be nice if OFFSET could be implemented in some more efficient
>> way.

> You could do something like:

> select myfunc(mycol) from (select mycol from table limit 50 offset 10000) as x;

Note that this won't eliminate the major inefficiency, which is having
to read 10000+50 rows from the table.  But if myfunc() has side-effects
or is very expensive to run, it'd probably be worth doing.

> I think it's not easy for the optimizer to do it because there are lots of
> cases where it can't.

I don't actually know of any cases where it could do much of anything to
avoid fetching the OFFSET rows.  The problems are basically the same as
with COUNT(*) optimization: without examining each row, you don't know
if it would have been returned or not.  We could possibly postpone
evaluation of the SELECT output list until after the OFFSET step (thus
automating the above hack), but even that only works if there are no
set-returning functions in the output list ...

            regards, tom lane

PS: BTW, the one-extra-row effect that Clive noted is gone in 7.4.