Thread: Shortcutting too-large offsets?

Shortcutting too-large offsets?

From
Josh Berkus
Date:
All,

Here's a case which it seems like we ought to be able to optimize for:

datamart-# ORDER BY txn_timestamp DESC
datamart-# LIMIT 200
datamart-# OFFSET 6000;

                                       QUERY PLAN

---------------------------
 Limit  (cost=560529.82..560529.82 rows=1 width=145) (actual
time=22419.760..22419.760 rows=0 loops=1)
   ->  Sort  (cost=560516.17..560529.82 rows=5459 width=145) (actual
time=22418.076..22419.144 rows=5828 loops=1)
         Sort Key: lh.txn_timestamp
         Sort Method: quicksort  Memory: 1744kB
         ->  Nested Loop Left Join  (cost=0.00..560177.32 rows=5459
width=145) (actual time=4216.898..22398.658 rows=5828 loops=1)
               ->  Nested Loop Left Join  (cost=0.00..88186.22 rows=5459
width=135) (actual time=4216.747..19250.891 rows=5828 loops=1)
                     ->  Nested Loop Left Join  (cost=0.00..86657.26
rows=5459 width=124) (actual time=4216.723..19206.461 rows=5828 loops=1)

... it seems like, if we get as far as the sort and the executors knows
that there are less rows than the final offset, it ought to be able to
skip the final sort.

Is there some non-obvious reason which would make this kind of
optimization difficult?  Doesn't the executor know at that point how
many rows it has?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Shortcutting too-large offsets?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Here's a case which it seems like we ought to be able to optimize for:
> [ offset skips all the output of a sort node ]
> Is there some non-obvious reason which would make this kind of
> optimization difficult?  Doesn't the executor know at that point how
> many rows it has?

In principle, yeah, we could make it do that, but it seems like a likely
source of maintenance headaches.  This example is not exactly compelling
enough to make me want to do it.  Large OFFSETs are always going to be
problematic from a performance standpoint, and the fact that we could
short-circuit this one corner case isn't really going to make them much
more usable.

            regards, tom lane

Re: Shortcutting too-large offsets?

From
pasman pasmański
Date:
It may be difficult, i think. When unsorted recordset is stored in
temp table, number of records may be saved and used. Otherwise it is
unknown.

2011/9/30, Josh Berkus <josh@agliodbs.com>:
> All,
>
> Here's a case which it seems like we ought to be able to optimize for:
>
> datamart-# ORDER BY txn_timestamp DESC
> datamart-# LIMIT 200
> datamart-# OFFSET 6000;
>
>                                        QUERY PLAN
>
> ---------------------------
>  Limit  (cost=560529.82..560529.82 rows=1 width=145) (actual
> time=22419.760..22419.760 rows=0 loops=1)
>    ->  Sort  (cost=560516.17..560529.82 rows=5459 width=145) (actual
> time=22418.076..22419.144 rows=5828 loops=1)
>          Sort Key: lh.txn_timestamp
>          Sort Method: quicksort  Memory: 1744kB
>          ->  Nested Loop Left Join  (cost=0.00..560177.32 rows=5459
> width=145) (actual time=4216.898..22398.658 rows=5828 loops=1)
>                ->  Nested Loop Left Join  (cost=0.00..88186.22 rows=5459
> width=135) (actual time=4216.747..19250.891 rows=5828 loops=1)
>                      ->  Nested Loop Left Join  (cost=0.00..86657.26
> rows=5459 width=124) (actual time=4216.723..19206.461 rows=5828 loops=1)
>
> ... it seems like, if we get as far as the sort and the executors knows
> that there are less rows than the final offset, it ought to be able to
> skip the final sort.
>
> Is there some non-obvious reason which would make this kind of
> optimization difficult?  Doesn't the executor know at that point how
> many rows it has?
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


--
------------
pasman

Re: Shortcutting too-large offsets?

From
Josh Berkus
Date:
Tom,

> In principle, yeah, we could make it do that, but it seems like a likely
> source of maintenance headaches.  This example is not exactly compelling
> enough to make me want to do it.  Large OFFSETs are always going to be
> problematic from a performance standpoint, and the fact that we could
> short-circuit this one corner case isn't really going to make them much
> more usable.

It's not that uncommon of a corner case though; it's one which happens
all the time in webapps which paginate.  People just have to ask for a
page after the end.  It's really a question of how simple the code to
make the optimization would be; if it would be a 5-line patch, then it's
worth it; if it would be a 110-line refactor, no.

Lemme see if I can figure it out ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Shortcutting too-large offsets?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> In principle, yeah, we could make it do that, but it seems like a likely
>> source of maintenance headaches.  This example is not exactly compelling
>> enough to make me want to do it.  Large OFFSETs are always going to be
>> problematic from a performance standpoint, and the fact that we could
>> short-circuit this one corner case isn't really going to make them much
>> more usable.

> It's not that uncommon of a corner case though; it's one which happens
> all the time in webapps which paginate.  People just have to ask for a
> page after the end.  It's really a question of how simple the code to
> make the optimization would be; if it would be a 5-line patch, then it's
> worth it; if it would be a 110-line refactor, no.

No, it's really a question of whether it's worth any lines at all,
and I remain unconvinced.  If someone asks for the last page, or any
page near the end, it'll take just about the same amount of time as
asking for a page past the end.  So any app like this is going to have
sucky performance, and kluging the corner case isn't going to help much.

            regards, tom lane

Re: Shortcutting too-large offsets?

From
Robert Haas
Date:
On Fri, Sep 30, 2011 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>>> In principle, yeah, we could make it do that, but it seems like a likely
>>> source of maintenance headaches.  This example is not exactly compelling
>>> enough to make me want to do it.  Large OFFSETs are always going to be
>>> problematic from a performance standpoint, and the fact that we could
>>> short-circuit this one corner case isn't really going to make them much
>>> more usable.
>
>> It's not that uncommon of a corner case though; it's one which happens
>> all the time in webapps which paginate.  People just have to ask for a
>> page after the end.  It's really a question of how simple the code to
>> make the optimization would be; if it would be a 5-line patch, then it's
>> worth it; if it would be a 110-line refactor, no.
>
> No, it's really a question of whether it's worth any lines at all,
> and I remain unconvinced.  If someone asks for the last page, or any
> page near the end, it'll take just about the same amount of time as
> asking for a page past the end.  So any app like this is going to have
> sucky performance, and kluging the corner case isn't going to help much.

It looks to me like it took 22.3 seconds to do the nested loop and
then 22.4 seconds to do the nested loop plus the sort.  So the sort
itself only took 100 ms, which is hardly worth getting excited about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company