Re: query with offset stops using index scan - Mailing list pgsql-general
From | Stanislav Raskin |
---|---|
Subject | Re: query with offset stops using index scan |
Date | |
Msg-id | E1KZpQG-0002AE-00@teena.zerebecki.de Whole thread Raw |
In response to | Re: query with offset stops using index scan ("Scott Marlowe" <scott.marlowe@gmail.com>) |
List | pgsql-general |
> If there's a chance to upgrade to 8.3 please do so. I am aware of the benefits with 8.3, but such an upgrade would require quite some changes in our application, including introduction of explicit casting mechanisms. We are going to do so sooner or later, but right now we need to focus on other stuff. > What's happening here is that the query > planner is switching plans because it thinks the sequential scan and > sort are cheaper. and at some point it will likely be right. Thank you very much for pointing out the issue. I am still a bit puzzled, because there are only about 2000 data sets in this table. It's not like we're handling millions of rows. Clustering on the id did the trick. Now the planner always chooses to use the index. Unfortunately, I cannot use multi-column indexes here, because the expressions in the WHERE statement can vary quite strongly, depending on user input. Cursors are not an option, because it is a web application, meaning that I have to use a new connection for basically every HTTP request. The "where id between x and x+y" is indeed much, much faster, but it presumes the knowledge of x and y, which is not the case, because serial ids are not necessarily continuous (i.e. if some data sets were deleted). -----Ursprüngliche Nachricht----- Von: Scott Marlowe [mailto:scott.marlowe@gmail.com] Gesendet: Sonntag, 31. August 2008 17:26 An: Stanislav Raskin Cc: pgsql-general@postgresql.org Betreff: Re: [GENERAL] query with offset stops using index scan On Sun, Aug 31, 2008 at 7:14 AM, Stanislav Raskin <sr@brainswell.de> wrote: > Hello everybody, > > Now, if I increase OFFSET slowly, it works all the same way, until OFFSET > reaches the value of 750. Then, the planner refuses to use an index scan and > does a plain seq scan+sort, which makes the query about 10-20 times slower: > > I use 8.1.4, and I did a vacuum full analyze before running the queries. If there's a chance to upgrade to 8.3 please do so. While 8.1 was a solid reliable workhorse of a database, there's been a lot of work done in general for better performance and more features. It likely won't fix this one problem, but it's often smarter about corner cases in query plans than 8.1 so it's worth looking into. Now back to your problem. What's happening here is that the query planner is switching plans because it thinks the sequential scan and sort are cheaper. and at some point it will likely be right. That's because a random page cost is much higher than a sequential page cost. So at some point, say when you're grabbing 2% to 25% of a table, it will switch to sequential scans. Now, if the data is all cached, then it's still quicker to do the index scan further along than to use a seq scan and a sort. Unless your table is clustered to the index you're sorting on, a Seq scan will almost always win if you need the whole table. However, you may be in a position where a multi-column index and clustering on id will allow you to run this offset higher. It's still a poor performer for large chunks of large tables. first cluster on the primary key id, then create a three column index for (active, valid_until, locked) Note that the order should be from the most choosey to least choosey column, generally. So assuming only a tiny percentage of records meet valid_until, make it the first column, and so forth. A query like: select active, count(active) from table group by active; will give you an idea there. In the long run if you want good performance on larger data sets (i.e. higher offset numbers) you'll likely need to switch to either cursors, or using "where id between x and x+y" or lookup tables, or something like that.
pgsql-general by date: