Thread: Shortcutting too-large offsets?
All, Here's a case which it seems like we ought to be able to optimize for: datamart-# ORDER BY txn_timestamp DESC datamart-# LIMIT 200 datamart-# OFFSET 6000; QUERY PLAN --------------------------- Limit (cost=560529.82..560529.82 rows=1 width=145) (actual time=22419.760..22419.760 rows=0 loops=1) -> Sort (cost=560516.17..560529.82 rows=5459 width=145) (actual time=22418.076..22419.144 rows=5828 loops=1) Sort Key: lh.txn_timestamp Sort Method: quicksort Memory: 1744kB -> Nested Loop Left Join (cost=0.00..560177.32 rows=5459 width=145) (actual time=4216.898..22398.658 rows=5828 loops=1) -> Nested Loop Left Join (cost=0.00..88186.22 rows=5459 width=135) (actual time=4216.747..19250.891 rows=5828 loops=1) -> Nested Loop Left Join (cost=0.00..86657.26 rows=5459 width=124) (actual time=4216.723..19206.461 rows=5828 loops=1) ... it seems like, if we get as far as the sort and the executors knows that there are less rows than the final offset, it ought to be able to skip the final sort. Is there some non-obvious reason which would make this kind of optimization difficult? Doesn't the executor know at that point how many rows it has? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Here's a case which it seems like we ought to be able to optimize for: > [ offset skips all the output of a sort node ] > Is there some non-obvious reason which would make this kind of > optimization difficult? Doesn't the executor know at that point how > many rows it has? In principle, yeah, we could make it do that, but it seems like a likely source of maintenance headaches. This example is not exactly compelling enough to make me want to do it. Large OFFSETs are always going to be problematic from a performance standpoint, and the fact that we could short-circuit this one corner case isn't really going to make them much more usable. regards, tom lane
It may be difficult, i think. When unsorted recordset is stored in temp table, number of records may be saved and used. Otherwise it is unknown. 2011/9/30, Josh Berkus <josh@agliodbs.com>: > All, > > Here's a case which it seems like we ought to be able to optimize for: > > datamart-# ORDER BY txn_timestamp DESC > datamart-# LIMIT 200 > datamart-# OFFSET 6000; > > QUERY PLAN > > --------------------------- > Limit (cost=560529.82..560529.82 rows=1 width=145) (actual > time=22419.760..22419.760 rows=0 loops=1) > -> Sort (cost=560516.17..560529.82 rows=5459 width=145) (actual > time=22418.076..22419.144 rows=5828 loops=1) > Sort Key: lh.txn_timestamp > Sort Method: quicksort Memory: 1744kB > -> Nested Loop Left Join (cost=0.00..560177.32 rows=5459 > width=145) (actual time=4216.898..22398.658 rows=5828 loops=1) > -> Nested Loop Left Join (cost=0.00..88186.22 rows=5459 > width=135) (actual time=4216.747..19250.891 rows=5828 loops=1) > -> Nested Loop Left Join (cost=0.00..86657.26 > rows=5459 width=124) (actual time=4216.723..19206.461 rows=5828 loops=1) > > ... it seems like, if we get as far as the sort and the executors knows > that there are less rows than the final offset, it ought to be able to > skip the final sort. > > Is there some non-obvious reason which would make this kind of > optimization difficult? Doesn't the executor know at that point how > many rows it has? > > -- > Josh Berkus > PostgreSQL Experts Inc. > http://pgexperts.com > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > -- ------------ pasman
Tom, > In principle, yeah, we could make it do that, but it seems like a likely > source of maintenance headaches. This example is not exactly compelling > enough to make me want to do it. Large OFFSETs are always going to be > problematic from a performance standpoint, and the fact that we could > short-circuit this one corner case isn't really going to make them much > more usable. It's not that uncommon of a corner case though; it's one which happens all the time in webapps which paginate. People just have to ask for a page after the end. It's really a question of how simple the code to make the optimization would be; if it would be a 5-line patch, then it's worth it; if it would be a 110-line refactor, no. Lemme see if I can figure it out ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >> In principle, yeah, we could make it do that, but it seems like a likely >> source of maintenance headaches. This example is not exactly compelling >> enough to make me want to do it. Large OFFSETs are always going to be >> problematic from a performance standpoint, and the fact that we could >> short-circuit this one corner case isn't really going to make them much >> more usable. > It's not that uncommon of a corner case though; it's one which happens > all the time in webapps which paginate. People just have to ask for a > page after the end. It's really a question of how simple the code to > make the optimization would be; if it would be a 5-line patch, then it's > worth it; if it would be a 110-line refactor, no. No, it's really a question of whether it's worth any lines at all, and I remain unconvinced. If someone asks for the last page, or any page near the end, it'll take just about the same amount of time as asking for a page past the end. So any app like this is going to have sucky performance, and kluging the corner case isn't going to help much. regards, tom lane
On Fri, Sep 30, 2011 at 2:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Josh Berkus <josh@agliodbs.com> writes: >>> In principle, yeah, we could make it do that, but it seems like a likely >>> source of maintenance headaches. This example is not exactly compelling >>> enough to make me want to do it. Large OFFSETs are always going to be >>> problematic from a performance standpoint, and the fact that we could >>> short-circuit this one corner case isn't really going to make them much >>> more usable. > >> It's not that uncommon of a corner case though; it's one which happens >> all the time in webapps which paginate. People just have to ask for a >> page after the end. It's really a question of how simple the code to >> make the optimization would be; if it would be a 5-line patch, then it's >> worth it; if it would be a 110-line refactor, no. > > No, it's really a question of whether it's worth any lines at all, > and I remain unconvinced. If someone asks for the last page, or any > page near the end, it'll take just about the same amount of time as > asking for a page past the end. So any app like this is going to have > sucky performance, and kluging the corner case isn't going to help much. It looks to me like it took 22.3 seconds to do the nested loop and then 22.4 seconds to do the nested loop plus the sort. So the sort itself only took 100 ms, which is hardly worth getting excited about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company