Thread: Pagination - 1 or 2 queries?
Since you usually need to know the total number of rows a query would return, do you think it's better to: a) Do one query with a LIMIT and OFFSET to get the results, and another COUNT query to get the total number of rows? b) Do a single query without a LIMIT and OFFSET, then do a seek or similiar to get at the rows you want? Most tutorials, code, etc. I've seen do "a". The eclipse library does "b". __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com
On Fri, 5 Sep 2003, CSN wrote: > Since you usually need to know the total number of > rows a query would return, do you think it's better > to: > > a) Do one query with a LIMIT and OFFSET to get the > results, and another COUNT query to get the total > number of rows? > > b) Do a single query without a LIMIT and OFFSET, then > do a seek or similiar to get at the rows you want? > > Most tutorials, code, etc. I've seen do "a". The > eclipse library does "b". Either way works. Does the eclipse library use a cursor, or grab the whole dataset and then seek on the client side? If it uses a cursor, I'd expect it to be the fastest and simplest implementation. Since a lot of libs are designed to work with MySQL, they often are written in the first method, where select count(*) is quite quick on MySQL, and MySQL doesn't have cursor support. With Postgresql, the cursor is likely to be the faster method.
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > On Fri, 5 Sep 2003, CSN wrote: > > > > > --- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > > > On Fri, 5 Sep 2003, CSN wrote: > > > > > > > Since you usually need to know the total > number of > > > > rows a query would return, do you think it's > > > better > > > > to: > > > > > > > > a) Do one query with a LIMIT and OFFSET to get > the > > > > results, and another COUNT query to get the > total > > > > number of rows? > > > > > > > > b) Do a single query without a LIMIT and > OFFSET, > > > then > > > > do a seek or similiar to get at the rows you > want? > > > > > > > > > > > Most tutorials, code, etc. I've seen do "a". > The > > > > eclipse library does "b". > > > > > > Either way works. Does the eclipse library use > a > > > cursor, or grab the > > > whole dataset and then seek on the client side? > If > > > it uses a cursor, I'd > > > expect it to be the fastest and simplest > > > implementation. Since a lot of > > > libs are designed to work with MySQL, they often > are > > > written in the first > > > method, where select count(*) is quite quick on > > > MySQL, and MySQL doesn't > > > have cursor support. > > > > > > With Postgresql, the cursor is likely to be the > > > faster method. > > > > > > > Eclipse appears to just use > pg_fetch_array($result, > > $index). That'd be pretty similiar to a cursor > > wouldn't it? i.e. only the specified rows would be > > sent to the client (but all rows would be in the > > server's memory). > > > > Eclipse's docs make the argument that "b" is > better > > because "a" still needs to select/examine all rows > > before doing the LIMIT and OFFSET. > > If they aren't explicitly declaring a cursor, then b > isn't exactly the > same. If you do: > > select * from table order by fieldname > > then > > $row = pg_fetch_array() > > then the whole data set is returned to the client > (i.e. php) before we can > get the row. Now, if they do: > > begin; > declare bubba as cursor for select * from table > order by fieldname; > move forward 100 in bubba; > fetch 5 from bubba; > rollback; > > Then you get the same kind of effect, but only 5 > rows have to be retrieved > from the database to the client, and pg_fetch_array > will now iterate over > those 5 rows only, and then run dry, so to speak. > Ah, I think you're right ;). PG would get all the rows, then pass them all on to PHP, then scripts access whichever ones they want. With Mysql it looks like mysql_unbuffered_query would avoid sending all rows to PHP. PHP also has asynchronous methods for PG: pg_send_query, pg_get_result, etc. Not sure if they'd have a similiar effect, but probably better off using a cursor at any rate :). CSN __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > On Fri, 5 Sep 2003, CSN wrote: > > > Since you usually need to know the total number of > > rows a query would return, do you think it's > better > > to: > > > > a) Do one query with a LIMIT and OFFSET to get the > > results, and another COUNT query to get the total > > number of rows? > > > > b) Do a single query without a LIMIT and OFFSET, > then > > do a seek or similiar to get at the rows you want? > > > > > Most tutorials, code, etc. I've seen do "a". The > > eclipse library does "b". > > Either way works. Does the eclipse library use a > cursor, or grab the > whole dataset and then seek on the client side? If > it uses a cursor, I'd > expect it to be the fastest and simplest > implementation. Since a lot of > libs are designed to work with MySQL, they often are > written in the first > method, where select count(*) is quite quick on > MySQL, and MySQL doesn't > have cursor support. > > With Postgresql, the cursor is likely to be the > faster method. > Eclipse appears to just use pg_fetch_array($result, $index). That'd be pretty similiar to a cursor wouldn't it? i.e. only the specified rows would be sent to the client (but all rows would be in the server's memory). Eclipse's docs make the argument that "b" is better because "a" still needs to select/examine all rows before doing the LIMIT and OFFSET. http://www.students.cs.uu.nl/people/voostind/eclipse/api/index.html (PagedQuery) CSN __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com
On Fri, 5 Sep 2003, CSN wrote: > > --- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > > On Fri, 5 Sep 2003, CSN wrote: > > > > > Since you usually need to know the total number of > > > rows a query would return, do you think it's > > better > > > to: > > > > > > a) Do one query with a LIMIT and OFFSET to get the > > > results, and another COUNT query to get the total > > > number of rows? > > > > > > b) Do a single query without a LIMIT and OFFSET, > > then > > > do a seek or similiar to get at the rows you want? > > > > > > > > Most tutorials, code, etc. I've seen do "a". The > > > eclipse library does "b". > > > > Either way works. Does the eclipse library use a > > cursor, or grab the > > whole dataset and then seek on the client side? If > > it uses a cursor, I'd > > expect it to be the fastest and simplest > > implementation. Since a lot of > > libs are designed to work with MySQL, they often are > > written in the first > > method, where select count(*) is quite quick on > > MySQL, and MySQL doesn't > > have cursor support. > > > > With Postgresql, the cursor is likely to be the > > faster method. > > > > Eclipse appears to just use pg_fetch_array($result, > $index). That'd be pretty similiar to a cursor > wouldn't it? i.e. only the specified rows would be > sent to the client (but all rows would be in the > server's memory). > > Eclipse's docs make the argument that "b" is better > because "a" still needs to select/examine all rows > before doing the LIMIT and OFFSET. If they aren't explicitly declaring a cursor, then b isn't exactly the same. If you do: select * from table order by fieldname then $row = pg_fetch_array() then the whole data set is returned to the client (i.e. php) before we can get the row. Now, if they do: begin; declare bubba as cursor for select * from table order by fieldname; move forward 100 in bubba; fetch 5 from bubba; rollback; Then you get the same kind of effect, but only 5 rows have to be retrieved from the database to the client, and pg_fetch_array will now iterate over those 5 rows only, and then run dry, so to speak.
scott.marlowe wrote: > On Fri, 5 Sep 2003, CSN wrote: > > > Since you usually need to know the total number of > > rows a query would return, do you think it's better > > to: > > > > a) Do one query with a LIMIT and OFFSET to get the > > results, and another COUNT query to get the total > > number of rows? > > > > b) Do a single query without a LIMIT and OFFSET, then > > do a seek or similiar to get at the rows you want? > > > > Most tutorials, code, etc. I've seen do "a". The > > eclipse library does "b". > > Either way works. Does the eclipse library use a cursor, or grab the > whole dataset and then seek on the client side? If it uses a cursor, I'd > expect it to be the fastest and simplest implementation. Since a lot of > libs are designed to work with MySQL, they often are written in the first > method, where select count(*) is quite quick on MySQL, and MySQL doesn't > have cursor support. > > With Postgresql, the cursor is likely to be the faster method. I agree --- with a LIMIT and COUNT(*), you run the query twice. With a cursor, you run it once, and only pull the rows to the client you want. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Fri, 5 Sep 2003, CSN wrote: > > --- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > > begin; > > declare bubba as cursor for select * from table > > order by fieldname; > > move forward 100 in bubba; > > fetch 5 from bubba; > > rollback; > > > > Then you get the same kind of effect, but only 5 > > rows have to be retrieved > > from the database to the client, and pg_fetch_array > > will now iterate over > > those 5 rows only, and then run dry, so to speak. > > Actually, with this method would you be able to get > the count of all rows that could be returned (not just > the 5)? Yes, you can use "fetch absolute count from cursorname" should work.
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > begin; > declare bubba as cursor for select * from table > order by fieldname; > move forward 100 in bubba; > fetch 5 from bubba; > rollback; > > Then you get the same kind of effect, but only 5 > rows have to be retrieved > from the database to the client, and pg_fetch_array > will now iterate over > those 5 rows only, and then run dry, so to speak. Actually, with this method would you be able to get the count of all rows that could be returned (not just the 5)? __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com
CSN wrote: > > Behind the scenes, is there much performance > difference between: > > SELECT * > FROM table_with_millions_of_rows > ORDER BY col1; > > and: > > SELECT * > FROM table_with_millions_of_rows > ORDER BY col1 > LIMIT 100 > OFFSET 100000; Yes, the optimizer knows it doesn't have to execute the entire query, so the later can be faster, but that doesn't return the COUNT(*), as requested. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Behind the scenes, is there much performance difference between: SELECT * FROM table_with_millions_of_rows ORDER BY col1; and: SELECT * FROM table_with_millions_of_rows ORDER BY col1 LIMIT 100 OFFSET 100000; ? Wouldn't the second query would use far less memory? CSN --- Bruce Momjian <pgman@candle.pha.pa.us> wrote: > scott.marlowe wrote: > > On Fri, 5 Sep 2003, CSN wrote: > > > > > Since you usually need to know the total number > of > > > rows a query would return, do you think it's > better > > > to: > > > > > > a) Do one query with a LIMIT and OFFSET to get > the > > > results, and another COUNT query to get the > total > > > number of rows? > > > > > > b) Do a single query without a LIMIT and OFFSET, > then > > > do a seek or similiar to get at the rows you > want? > > > > > > Most tutorials, code, etc. I've seen do "a". The > > > eclipse library does "b". > > > > Either way works. Does the eclipse library use a > cursor, or grab the > > whole dataset and then seek on the client side? > If it uses a cursor, I'd > > expect it to be the fastest and simplest > implementation. Since a lot of > > libs are designed to work with MySQL, they often > are written in the first > > method, where select count(*) is quite quick on > MySQL, and MySQL doesn't > > have cursor support. > > > > With Postgresql, the cursor is likely to be the > faster method. > > I agree --- with a LIMIT and COUNT(*), you run the > query twice. With a > cursor, you run it once, and only pull the rows to > the client you want. > > -- > Bruce Momjian | > http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) > 359-1001 > + If your life is a hard drive, | 13 Roberts > Road > + Christ can be your backup. | Newtown > Square, Pennsylvania 19073 __________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com
On Fri, 5 Sep 2003, CSN wrote: > > --- "scott.marlowe" <scott.marlowe@ihs.com> wrote: > > begin; > > declare bubba as cursor for select * from table > > order by fieldname; > > move forward 100 in bubba; > > fetch 5 from bubba; > > rollback; > > > > Then you get the same kind of effect, but only 5 > > rows have to be retrieved > > from the database to the client, and pg_fetch_array > > will now iterate over > > those 5 rows only, and then run dry, so to speak. > > Actually, with this method would you be able to get > the count of all rows that could be returned (not just > the 5)? My previous one about using absolute count was wrong, btw, so you can either fetch forward all and get the count that returns or run select count(*). note that if you fetch forward all on a complex query, you may NOT be able to fetch backward all since cursors have a hard time going backwards on complex queries.