Thread: Query slows after offset of 100K
Hi all,

I've been reading through the performance list for the last few months, and haven't been able to find a solution to my problem yet, so I'm posting the specifics here now. If anyone can suggest what might work (or point me to where this has been covered before), that would be great. My current suspicion is that the shared_buffers setting is far too low.

My query is as follows:

SELECT o.objectid, o.objectname, o.isactive, o.modificationtime
FROM object o
WHERE ( o.deleted = false OR o.deleted IS NULL )
AND o.accountid = 111
ORDER BY 2
LIMIT 20 OFFSET 10000;

The object table has primary key objectid, an index on objectname, and a unique constraint on ( accountid, objectname ).

What I'm trying to do is show only 20 records to the user at a time, sorting on objectname, and the ones I display depend on the page they're on (that's why I've got LIMIT plus OFFSET, of course).

When offsetting up to about 90K records, the EXPLAIN ANALYZE is similar to the following:

 Limit  (cost=15357.06..15387.77 rows=20 width=35) (actual time=19.235..19.276 rows=20 loops=1)
   ->  Index Scan using account_objectname on "object" o  (cost=0.00..1151102.10 rows=749559 width=35) (actual time=0.086..14.981 rows=10020 loops=1)
         Index Cond: (accountid = 354)
         Filter: ((NOT deleted) OR (deleted IS NULL))
 Total runtime: 19.315 ms

If I move the offset up to 100K records or higher, I get:

 Limit  (cost=145636.26..145636.31 rows=20 width=35) (actual time=13524.327..13524.355 rows=20 loops=1)
   ->  Sort  (cost=145386.26..147260.16 rows=749559 width=35) (actual time=13409.216..13481.793 rows=100020 loops=1)
         Sort Key: objectname
         ->  Seq Scan on "object" o  (cost=0.00..16685.49 rows=749559 width=35) (actual time=0.011..1600.683 rows=749549 loops=1)
               Filter: (((NOT deleted) OR (deleted IS NULL)) AND (accountid = 354))
 Total runtime: 14452.374 ms

That's a huge decrease in performance, and I'm wondering if there's a way around it.

Right now there are about 750K records in the object table, and that number will only increase with time. I've already run a VACUUM FULL on the table and played with changing work_mem, but so far am not seeing any improvement.

Are there any other settings I can change to get back to that super-fast index scan? Is the shared_buffers = 2000 setting way too low? The reason I haven't actually changed that setting is due to some system limitations, etc., that require more work than just a change in the config file. If I can get confirmation that this is a likely cause/solution, then I can get the extra changes made.

I'm running a quad core 2.33GHz Xeon with 4GB memory (1.2GB free), using Postgres 8.1.11.

Thanks,
Michael Lorenz
Michael Lorenz <mlorenz1@hotmail.com> writes:
> My query is as follows:
> SELECT o.objectid, o.objectname, o.isactive, o.modificationtime
> FROM object o
> WHERE ( o.deleted = false OR o.deleted IS NULL )
> AND o.accountid = 111
> ORDER BY 2
> LIMIT 20 OFFSET 10000;

This is guaranteed to lose --- huge OFFSET values are never a good idea (hint: the database still has to fetch those rows it's skipping over).

A saner way to do pagination is to remember the last key you displayed and do something like "WHERE key > $lastkey ORDER BY key LIMIT 20", which will allow the database to go directly to the desired rows, as long as you have an index on the key. You do need a unique ordering key for this to work, though.

			regards, tom lane
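For reference, a minimal sketch of what that suggestion could look like against the schema from the original post, sorting on objectname. Here $lastname is a placeholder for the objectname of the last row shown on the previous page (to be bound by the application); within one account it is a unique ordering key thanks to the (accountid, objectname) unique constraint.

-- Keyset pagination sketch; $lastname is an application-supplied placeholder
-- for the last objectname already displayed for this account.
SELECT o.objectid, o.objectname, o.isactive, o.modificationtime
FROM object o
WHERE ( o.deleted = false OR o.deleted IS NULL )
  AND o.accountid = 111
  AND o.objectname > $lastname
ORDER BY o.objectname
LIMIT 20;

Because the scan starts at $lastname instead of counting past an OFFSET, the cost stays flat no matter how deep into the result set the user pages.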
Fair enough, and I did think of this as well. However, I didn't think this was a viable option in my case, since we're currently allowing the user to randomly access the pages (so $lastkey wouldn't really have any meaning). The user can choose to sort on object ID, name or modification time, and then go straight to any page in the list. With 750K records, that's around 37K pages.

Maybe a better way to phrase my question is: how can I paginate my data on 3 different keys which allow random access to any given page, and still get reasonable performance? Should I just force the user to limit their result set to some given number of records before allowing any paginated access? Or is it just not practical, period?

Thanks,
Michael Lorenz

----------------------------------------
> To: mlorenz1@hotmail.com
> CC: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Query slows after offset of 100K
> Date: Thu, 14 Feb 2008 14:08:15 -0500
> From: tgl@sss.pgh.pa.us
>
> This is guaranteed to lose --- huge OFFSET values are never a good idea
> (hint: the database still has to fetch those rows it's skipping over).
>
> A saner way to do pagination is to remember the last key you displayed
> and do something like "WHERE key > $lastkey ORDER BY key LIMIT 20",
> which will allow the database to go directly to the desired rows,
> as long as you have an index on the key. You do need a unique ordering
> key for this to work, though.
>
> 			regards, tom lane
Michael,

Our application had a similar problem, and what we did to avoid having people click into the middle of 750K records was to show the first page with forward/back links but no link to go to the middle. So people could manually page forward as far as they want, but nobody is going to sit there clicking next 37K times.

We have several thousand users and none of them complained about the change. Maybe it's because at the same time as we made that change we also improved the rest of the searching/filtering interface. But I think that really people don't need to jump to the middle of the records anyway, as long as you have decent search abilities.

If you wanted to keep your same GUI, one workaround would be to periodically update a table which maps "page number" to "first unique key on page". That decouples the expensive work to generate the page offsets from the query itself, so if your data changes fairly infrequently it might be appropriate. Sort of a materialized-view type approach; a rough sketch follows below.

If you can be approximate in your GUI you can do a lot more with this optimization -- if people don't necessarily need to be able to go directly to page 372898, but instead would be satisfied with a page roughly 47% of the way into the massive result set (think of a GUI slider), then you wouldn't need to update the lookup table as often even if the data changed frequently. Adding a few thousand records to a 750K-row result set is statistically insignificant, so your markers wouldn't need to be updated very often, and you wouldn't need to store a marker for each page -- maybe only 100 markers spread evenly across the result set would be sufficient.

-- Mark Lewis

On Thu, 2008-02-14 at 19:49 +0000, Michael Lorenz wrote:
> Maybe a better way to phrase my question is: how can I paginate my data on 3 different keys which allow random access to any given page, and still get reasonable performance? Should I just force the user to limit their result set to some given number of records before allowing any paginated access? Or is it just not practical, period?
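A rough sketch of that lookup-table idea, with made-up names (objectname_page_marker, page_start) purely for illustration; a periodic job would repopulate it, storing either one row per page or only every Nth marker for the approximate-slider variant.

-- Hypothetical marker table: maps a page (or slider position) to the first
-- objectname on that page for a given account. Refreshed by a periodic job.
CREATE TABLE objectname_page_marker (
    accountid  integer NOT NULL,
    pageno     integer NOT NULL,
    page_start text    NOT NULL,  -- objectname of the first row on that page
    PRIMARY KEY (accountid, pageno)
);

-- Serving page 5000 then becomes a marker lookup plus the usual keyset query:
SELECT o.objectid, o.objectname, o.isactive, o.modificationtime
FROM object o
WHERE ( o.deleted = false OR o.deleted IS NULL )
  AND o.accountid = 111
  AND o.objectname >= (SELECT page_start
                       FROM objectname_page_marker
                       WHERE accountid = 111 AND pageno = 5000)
ORDER BY o.objectname
LIMIT 20;

How the refresh job numbers the rows is left open here; 8.1 has no row_number(), so the numbering would likely happen in the application or via a temporary sequence.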
Michael Lorenz <mlorenz1@hotmail.com> writes:
> Fair enough, and I did think of this as well. However, I didn't think this was a viable option in my case, since we're currently allowing the user to randomly access the pages (so $lastkey wouldn't really have any meaning). The user can choose to sort on object ID, name or modification time, and then go straight to any page in the list. With 750K records, that's around 37K pages.

Well, my first question is whether that user interface is actually useful to anyone. If you have a dictionary and you want to look up "foosball", do you start by guessing that it's on page 432? No, you look for the "F" tab.

I'd suggest that what you want is to present links that let people go to specific spots in the key distribution, rather than expressing it in terms of so-many-pages.

			regards, tom lane
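As a sketch, a "go to the Fs" link in the GUI could translate into a plain range condition on the ordering key; the 'F' boundary here is just an example value the interface would supply.

-- Jump to the part of the key space starting at 'F' for this account,
-- instead of asking for an arbitrary page number.
SELECT o.objectid, o.objectname, o.isactive, o.modificationtime
FROM object o
WHERE ( o.deleted = false OR o.deleted IS NULL )
  AND o.accountid = 111
  AND o.objectname >= 'F'
ORDER BY o.objectname
LIMIT 20;

This keeps using the account_objectname index, just like the fast 90K-offset plan, regardless of where in the alphabet the user jumps.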
On Thu, 14 Feb 2008, Michael Lorenz wrote:

> When offsetting up to about 90K records, the EXPLAIN ANALYZE is similar to the following:
> Limit  (cost=15357.06..15387.77 rows=20 width=35) (actual time=19.235..19.276 rows=20 loops=1)
>   ->  Index Scan using account_objectname on "object" o  (cost=0.00..1151102.10 rows=749559 width=35) (actual time=0.086..14.981 rows=10020 loops=1)

It looks like the planner thinks that index scan will have to go through 749559 rows, but there are actually only 10020 there. Is this table getting ANALYZE'd usefully? VACUUM FULL doesn't do that. If the row estimates are so far off, that might explain why it thinks the index scan is going to be so huge it might as well just walk the whole thing. Actually, VACUUM FULL can be its own problem--you probably want a plain VACUUM, run regularly, instead.

> Is the shared_buffers = 2000 setting way too low?

Quite; with 4GB of RAM that could easily be 100,000+ instead. I wouldn't make that whole jump at once, but 2000 is only a mere 16MB of memory dedicated to the database. Also, be sure to set effective_cache_size to something reflective of your total memory minus application+OS, as it also has an impact here; you've probably also got that set extremely low, and if this server is mostly for PostgreSQL a good starting point would be something like 300000 (=2.4GB).

> Are there any other settings I can change to get back to that super-fast
> index scan?

Well, you can try to turn off sequential scans for the query. You can test if that makes a difference like this:

SET enable_seqscan TO off;
EXPLAIN ANALYZE <x>;
SET enable_seqscan TO on;

It's also possible to tweak parameters like random_page_cost to similarly prefer indexes. Far better to fix the true underlying issues though (above and below).

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
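Roughly what those suggestions amount to, as a sketch rather than tuned values. On 8.1 these settings are expressed in 8kB buffers/pages (no MB/GB suffixes), and raising shared_buffers needs a server restart (and possibly a larger kernel SHMMAX).

-- Refresh planner statistics (VACUUM FULL alone doesn't do this):
ANALYZE "object";

# postgresql.conf sketch for 8.1:
# shared_buffers = 50000          # ~400MB, a first step up from 2000 (16MB)
# effective_cache_size = 300000   # ~2.4GB, reflecting the OS cache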
Greg Smith <gsmith@gregsmith.com> writes:
> On Thu, 14 Feb 2008, Michael Lorenz wrote:
>> When offsetting up to about 90K records, the EXPLAIN ANALYZE is similar to the following:
>> Limit  (cost=15357.06..15387.77 rows=20 width=35) (actual time=19.235..19.276 rows=20 loops=1)
>>   ->  Index Scan using account_objectname on "object" o  (cost=0.00..1151102.10 rows=749559 width=35) (actual time=0.086..14.981 rows=10020 loops=1)

> It looks like the planner thinks that index scan will have to go through
> 749559 rows, but there are actually only 10020 there.

No, you have to be careful about that. The estimated rowcount is for the case where the plan node is run to completion, but when there's a LIMIT over it, it may not get run to completion. In this case the limit was satisfied after pulling 10020 rows from the indexscan, but we can't tell what fraction of the underlying scan was actually completed.

			regards, tom lane
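One way to see how close that 749559 estimate is to reality, independent of the LIMIT cutting the scan short, would be to count the qualifying rows directly (a diagnostic only, and it will read a large part of the table):

-- Actual number of rows the index scan would produce if run to completion:
SELECT count(*)
FROM object o
WHERE ( o.deleted = false OR o.deleted IS NULL )
  AND o.accountid = 354;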
On Thu, 14 Feb 2008, Michael Lorenz wrote:

> When offsetting up to about 90K records, the EXPLAIN ANALYZE is similar to the following:
> Limit  (cost=15357.06..15387.77 rows=20 width=35) (actual time=19.235..19.276 rows=20 loops=1)
>   ->  Index Scan using account_objectname on "object" o  (cost=0.00..1151102.10 rows=749559 width=35) (actual time=0.086..14.981 rows=10020 loops=1)
>         Index Cond: (accountid = 354)
>         Filter: ((NOT deleted) OR (deleted IS NULL))
> Total runtime: 19.315 ms

Since this is scanning through 10,000 random rows in 19 milliseconds, I say all this data is already in the cache. If it wasn't, you'd be looking at 10,000 random seeks on disk, at about 7ms each, which is 70 seconds. Try dropping the OS caches (on Linux, echo "1" > /proc/sys/vm/drop_caches) and see if the performance is worse.

> If I move the offset up to 100K records or higher, I get:
> Limit  (cost=145636.26..145636.31 rows=20 width=35) (actual time=13524.327..13524.355 rows=20 loops=1)
>   ->  Sort  (cost=145386.26..147260.16 rows=749559 width=35) (actual time=13409.216..13481.793 rows=100020 loops=1)
>         Sort Key: objectname
>         ->  Seq Scan on "object" o  (cost=0.00..16685.49 rows=749559 width=35) (actual time=0.011..1600.683 rows=749549 loops=1)
>               Filter: (((NOT deleted) OR (deleted IS NULL)) AND (accountid = 354))
> Total runtime: 14452.374 ms

And here, it only takes 1.5 seconds to fetch the entire table from disc (or it's already in the cache or something), but 14 seconds to sort the whole lot in memory. In any case, Postgres is making a good choice - it's just that you have an unexpected benefit in the first case that the data is in cache. Setting the effective cache size correctly will help the planner in this case. Setting work_mem higher will improve the performance of the sort in the second case.

Of course, what others have said about trying to avoid large offsets is good advice. You don't actually need a unique index, but it makes it simpler if you do.

Matthew

--
The early bird gets the worm. If you want something else for breakfast, get up later.
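To test the work_mem point, one could bump it for a single session before re-running the slow query; on 8.1 the value is in kilobytes, and 131072 (128MB) is only an illustrative figure, not a recommendation for the global config.

-- Per-session experiment only; every sort/hash in every backend can use
-- up to work_mem, so don't set a value this large globally without thought.
SET work_mem = 131072;   -- 128MB, illustrative
EXPLAIN ANALYZE
SELECT o.objectid, o.objectname, o.isactive, o.modificationtime
FROM object o
WHERE ( o.deleted = false OR o.deleted IS NULL )
  AND o.accountid = 354
ORDER BY 2
LIMIT 20 OFFSET 100000;
RESET work_mem;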