Re: selects with large offset really slow - Mailing list pgsql-general

From John Smith
Subject Re: selects with large offset really slow
Date
Msg-id 20030208005125.60785.qmail@web40701.mail.yahoo.com
Whole thread Raw
In response to selects with large offset really slow  (John Smith <john_smith_45678@yahoo.com>)
List pgsql-general

This is interesting:

explain analyze select * from links order by id limit 100 offset 90000;
Index Scan...Total runtime: 2951.00 msec

explain analyze select * from links order by id limit 100 offset 9000;
Index Scan...Total runtime: 232.00 msec

explain analyze select * from links order by id limit 100 offset 900;
Index Scan...Total runtime: 39.00 msec

explain analyze select * from links order by id limit 100 offset 90;
Index Scan...Total runtime: 6.00 msec

I don't get it - why's it take longer the larger the offset is? Disk seeks to find index entries?

John

Dennis Gearon <gearond@cvc.net> wrote:

Seems like it should do intelligent retrieval of records for that stage of processing.

2/7/2003 12:54:21 PM, John Smith wrote:

> Thanks, I'll try those suggestions. But...
>
>
> Why can't PG just use an index? Say, look at the index for 'url', go to
> entry 90000, then get the next 100 entries? I was surprised that it retrieves
> *all* records then sorts them (when there's already a sorted index). I'm
> trying to switch from mysql - the same exact query with it is very fast with
> 100-500K+ rows, and a large offset doesn't seem to affect the query's speed.
>
> John
>
>
> Richard Huxton wrote:
>
> On Friday 07 Feb 2003 5:01 am, John Smith wrote:
> > There are 90K-100K records in each of two tables. This simple join is
> > really slow and the larger the offset, the longer it takes. Anything I can
> > do to speed it up (a lot)? I've double-checked and there are indexes on
> > everything used for joins and ordering.
> >
> > QUERY PLAN
> > ---------------------------------------------------------------------------
> > Limit (cost=19546.62..19546.87 rows=100 width=62)
> >       (actual time=20557.00..20558.00 rows=100 loops=1)
> >   -> Sort (cost=19321.62..19571.32 rows=99881 width=62)
> >           (actual time=19775.00..20410.00 rows=90101 loops=1)
> >        Sort Key: l.url
> >        -> Hash Join (cost=2471.00..7662.54 rows=99881 width=62)
> >                     (actual time=3013.00..12002.00 rows=100000 loops=1)
> >             Hash Cond: ("outer".id = "inner".link_id)
>
> It's the join and sort that's getting you. PG has to fetch and sort all the
> rows so it can discard 90,000 of them. I can't think of a good way for it to
> optimise this, though you might want to check your sort_mem is set high
> enough.
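For reference, sort_mem is a per-sort memory budget measured in kB and can be raised for a single session before running the query; the 8192 below is only an illustrative figure, not a recommendation:

```sql
-- Raise this session's sort memory to ~8 MB (value in kB; illustrative only).
SET sort_mem = 8192;
-- Confirm the new setting.
SHOW sort_mem;
```

Setting it in postgresql.conf instead makes the change apply to all backends.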
>
> > explain analyze select l.id, l.url
> > from links l
> > inner join stats s
> > on l.id = s.link_id
> > and s.referrer_id = 1
> > order by l.url
> > limit 100
> > offset 90000;
>
> There are three options you might want to look at:
>
> 1. Use a temporary table, then select from that for each page.
> 2. Use a cursor, and just fetch 100 records at a time from it.
> 3. Cheat and fetch where l.url>=X, remembering X as the highest url from the
> last set of results. This of course means pages of results will overlap.
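Options 2 and 3 might look something like this in practice; the cursor name, the placeholder URL, and the 90000/100 figures are all made up for the sketch:

```sql
-- Option 2: a cursor inside a transaction. MOVE/FETCH walk forward, so the
-- skipped rows are paid for once per cursor rather than once per query.
BEGIN;
DECLARE page_cur CURSOR FOR
    SELECT l.id, l.url
    FROM links l
    INNER JOIN stats s ON l.id = s.link_id AND s.referrer_id = 1
    ORDER BY l.url;
MOVE 90000 IN page_cur;   -- skip ahead to the wanted page
FETCH 100 FROM page_cur;  -- the page itself
CLOSE page_cur;
COMMIT;

-- Option 3: remember the last url of the previous page and seek past it.
-- 'http://example.org/last-seen' is a placeholder. The >= form repeats
-- boundary rows (hence the overlap mentioned above); a strict > avoids
-- that but can skip ties on url unless the sort key is made unique,
-- e.g. ORDER BY l.url, l.id.
SELECT l.id, l.url
FROM links l
INNER JOIN stats s ON l.id = s.link_id AND s.referrer_id = 1
WHERE l.url >= 'http://example.org/last-seen'
ORDER BY l.url
LIMIT 100;
```

With the WHERE clause the planner can start the index scan at the remembered key instead of fetching, sorting, and discarding 90,000 rows.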
>
> --
> Richard Huxton
>





pgsql-general by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: selects with large offset really slow
Next
From: John Smith
Date:
Subject: Re: selects with large offset really slow