Thread: Pagination - 1 or 2 queries?

Pagination - 1 or 2 queries?

From
CSN
Date:
Since you usually need to know the total number of
rows a query would return, do you think it's better
to:

a) Do one query with a LIMIT and OFFSET to get the
results, and another COUNT query to get the total
number of rows?

b) Do a single query without a LIMIT and OFFSET, then
do a seek or similiar to get at the rows you want?

Most tutorials, code, etc. I've seen do "a". The
eclipse library does "b".


__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Re: Pagination - 1 or 2 queries?

From
"scott.marlowe"
Date:
On Fri, 5 Sep 2003, CSN wrote:

> Since you usually need to know the total number of
> rows a query would return, do you think it's better
> to:
>
> a) Do one query with a LIMIT and OFFSET to get the
> results, and another COUNT query to get the total
> number of rows?
>
> b) Do a single query without a LIMIT and OFFSET, then
> do a seek or similiar to get at the rows you want?
>
> Most tutorials, code, etc. I've seen do "a". The
> eclipse library does "b".

Either way works.  Does the eclipse library use a cursor, or grab the
whole dataset and then seek on the client side?  If it uses a cursor, I'd
expect it to be the fastest and simplest implementation.  Since a lot of
libs are designed to work with MySQL, they often are written in the first
method, where select count(*) is quite quick on MySQL, and MySQL doesn't
have cursor support.

With Postgresql, the cursor is likely to be the faster method.


Re: Pagination - 1 or 2 queries?

From
CSN
Date:
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> On Fri, 5 Sep 2003, CSN wrote:
>
> >
> > --- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> > > On Fri, 5 Sep 2003, CSN wrote:
> > >
> > > > Since you usually need to know the total
> number of
> > > > rows a query would return, do you think it's
> > > better
> > > > to:
> > > >
> > > > a) Do one query with a LIMIT and OFFSET to get
> the
> > > > results, and another COUNT query to get the
> total
> > > > number of rows?
> > > >
> > > > b) Do a single query without a LIMIT and
> OFFSET,
> > > then
> > > > do a seek or similiar to get at the rows you
> want?
> > >
> > > >
> > > > Most tutorials, code, etc. I've seen do "a".
> The
> > > > eclipse library does "b".
> > >
> > > Either way works.  Does the eclipse library use
> a
> > > cursor, or grab the
> > > whole dataset and then seek on the client side?
> If
> > > it uses a cursor, I'd
> > > expect it to be the fastest and simplest
> > > implementation.  Since a lot of
> > > libs are designed to work with MySQL, they often
> are
> > > written in the first
> > > method, where select count(*) is quite quick on
> > > MySQL, and MySQL doesn't
> > > have cursor support.
> > >
> > > With Postgresql, the cursor is likely to be the
> > > faster method.
> > >
> >
> > Eclipse appears to just use
> pg_fetch_array($result,
> > $index). That'd be pretty similiar to a cursor
> > wouldn't it? i.e. only the specified rows would be
> > sent to the client (but all rows would be in the
> > server's memory).
> >
> > Eclipse's docs make the argument that "b" is
> better
> > because "a" still needs to select/examine all rows
> > before doing the LIMIT and OFFSET.
>
> If they aren't explicitly declaring a cursor, then b
> isn't exactly the
> same.  If you do:
>
> select * from table order by fieldname
>
> then
>
> $row = pg_fetch_array()
>
> then the whole data set is returned to the client
> (i.e. php) before we can
> get the row.  Now, if they do:
>
> begin;
> declare bubba as cursor for select * from table
> order by fieldname;
> move forward 100 in bubba;
> fetch 5 from bubba;
> rollback;
>
> Then you get the same kind of effect, but only 5
> rows have to be retrieved
> from the database to the client, and pg_fetch_array
> will now iterate over
> those 5 rows only, and then run dry, so to speak.
>


Ah, I think you're right ;). PG would get all the
rows, then pass them all on to PHP, then scripts
access whichever ones they want.

With Mysql it looks like mysql_unbuffered_query would
avoid sending all rows to PHP.

PHP also has asynchronous methods for PG:
pg_send_query, pg_get_result, etc. Not sure if they'd
have a similiar effect, but probably better off using
a cursor at any rate :).

CSN



__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Re: Pagination - 1 or 2 queries?

From
CSN
Date:
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> On Fri, 5 Sep 2003, CSN wrote:
>
> > Since you usually need to know the total number of
> > rows a query would return, do you think it's
> better
> > to:
> >
> > a) Do one query with a LIMIT and OFFSET to get the
> > results, and another COUNT query to get the total
> > number of rows?
> >
> > b) Do a single query without a LIMIT and OFFSET,
> then
> > do a seek or similiar to get at the rows you want?
>
> >
> > Most tutorials, code, etc. I've seen do "a". The
> > eclipse library does "b".
>
> Either way works.  Does the eclipse library use a
> cursor, or grab the
> whole dataset and then seek on the client side?  If
> it uses a cursor, I'd
> expect it to be the fastest and simplest
> implementation.  Since a lot of
> libs are designed to work with MySQL, they often are
> written in the first
> method, where select count(*) is quite quick on
> MySQL, and MySQL doesn't
> have cursor support.
>
> With Postgresql, the cursor is likely to be the
> faster method.
>

Eclipse appears to just use pg_fetch_array($result,
$index). That'd be pretty similiar to a cursor
wouldn't it? i.e. only the specified rows would be
sent to the client (but all rows would be in the
server's memory).

Eclipse's docs make the argument that "b" is better
because "a" still needs to select/examine all rows
before doing the LIMIT and OFFSET.

http://www.students.cs.uu.nl/people/voostind/eclipse/api/index.html
(PagedQuery)

CSN



__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Re: Pagination - 1 or 2 queries?

From
"scott.marlowe"
Date:
On Fri, 5 Sep 2003, CSN wrote:

>
> --- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> > On Fri, 5 Sep 2003, CSN wrote:
> >
> > > Since you usually need to know the total number of
> > > rows a query would return, do you think it's
> > better
> > > to:
> > >
> > > a) Do one query with a LIMIT and OFFSET to get the
> > > results, and another COUNT query to get the total
> > > number of rows?
> > >
> > > b) Do a single query without a LIMIT and OFFSET,
> > then
> > > do a seek or similiar to get at the rows you want?
> >
> > >
> > > Most tutorials, code, etc. I've seen do "a". The
> > > eclipse library does "b".
> >
> > Either way works.  Does the eclipse library use a
> > cursor, or grab the
> > whole dataset and then seek on the client side?  If
> > it uses a cursor, I'd
> > expect it to be the fastest and simplest
> > implementation.  Since a lot of
> > libs are designed to work with MySQL, they often are
> > written in the first
> > method, where select count(*) is quite quick on
> > MySQL, and MySQL doesn't
> > have cursor support.
> >
> > With Postgresql, the cursor is likely to be the
> > faster method.
> >
>
> Eclipse appears to just use pg_fetch_array($result,
> $index). That'd be pretty similiar to a cursor
> wouldn't it? i.e. only the specified rows would be
> sent to the client (but all rows would be in the
> server's memory).
>
> Eclipse's docs make the argument that "b" is better
> because "a" still needs to select/examine all rows
> before doing the LIMIT and OFFSET.

If they aren't explicitly declaring a cursor, then b isn't exactly the
same.  If you do:

select * from table order by fieldname

then

$row = pg_fetch_array()

then the whole data set is returned to the client (i.e. php) before we can
get the row.  Now, if they do:

begin;
declare bubba as cursor for select * from table order by fieldname;
move forward 100 in bubba;
fetch 5 from bubba;
rollback;

Then you get the same kind of effect, but only 5 rows have to be retrieved
from the database to the client, and pg_fetch_array will now iterate over
those 5 rows only, and then run dry, so to speak.


Re: Pagination - 1 or 2 queries?

From
Bruce Momjian
Date:
scott.marlowe wrote:
> On Fri, 5 Sep 2003, CSN wrote:
>
> > Since you usually need to know the total number of
> > rows a query would return, do you think it's better
> > to:
> >
> > a) Do one query with a LIMIT and OFFSET to get the
> > results, and another COUNT query to get the total
> > number of rows?
> >
> > b) Do a single query without a LIMIT and OFFSET, then
> > do a seek or similiar to get at the rows you want?
> >
> > Most tutorials, code, etc. I've seen do "a". The
> > eclipse library does "b".
>
> Either way works.  Does the eclipse library use a cursor, or grab the
> whole dataset and then seek on the client side?  If it uses a cursor, I'd
> expect it to be the fastest and simplest implementation.  Since a lot of
> libs are designed to work with MySQL, they often are written in the first
> method, where select count(*) is quite quick on MySQL, and MySQL doesn't
> have cursor support.
>
> With Postgresql, the cursor is likely to be the faster method.

I agree --- with a LIMIT and COUNT(*), you run the query twice.  With a
cursor, you run it once, and only pull the rows to the client you want.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Pagination - 1 or 2 queries?

From
"scott.marlowe"
Date:
On Fri, 5 Sep 2003, CSN wrote:

>
> --- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> > begin;
> > declare bubba as cursor for select * from table
> > order by fieldname;
> > move forward 100 in bubba;
> > fetch 5 from bubba;
> > rollback;
> >
> > Then you get the same kind of effect, but only 5
> > rows have to be retrieved
> > from the database to the client, and pg_fetch_array
> > will now iterate over
> > those 5 rows only, and then run dry, so to speak.
>
> Actually, with this method would you be able to get
> the count of all rows that could be returned (not just
> the 5)?

Yes, you can use "fetch absolute count from cursorname" should work.


Re: Pagination - 1 or 2 queries?

From
CSN
Date:
--- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> begin;
> declare bubba as cursor for select * from table
> order by fieldname;
> move forward 100 in bubba;
> fetch 5 from bubba;
> rollback;
>
> Then you get the same kind of effect, but only 5
> rows have to be retrieved
> from the database to the client, and pg_fetch_array
> will now iterate over
> those 5 rows only, and then run dry, so to speak.

Actually, with this method would you be able to get
the count of all rows that could be returned (not just
the 5)?



__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Re: Pagination - 1 or 2 queries?

From
Bruce Momjian
Date:
CSN wrote:
>
> Behind the scenes, is there much performance
> difference between:
>
> SELECT *
> FROM table_with_millions_of_rows
> ORDER BY col1;
>
> and:
>
> SELECT *
> FROM table_with_millions_of_rows
> ORDER BY col1
> LIMIT 100
> OFFSET 100000;

Yes, the optimizer knows it doesn't have to execute the entire query, so
the later can be faster, but that doesn't return the COUNT(*), as
requested.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Pagination - 1 or 2 queries?

From
CSN
Date:
Behind the scenes, is there much performance
difference between:

SELECT *
FROM table_with_millions_of_rows
ORDER BY col1;

and:

SELECT *
FROM table_with_millions_of_rows
ORDER BY col1
LIMIT 100
OFFSET 100000;

?

Wouldn't the second query would use far less memory?


CSN


--- Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> scott.marlowe wrote:
> > On Fri, 5 Sep 2003, CSN wrote:
> >
> > > Since you usually need to know the total number
> of
> > > rows a query would return, do you think it's
> better
> > > to:
> > >
> > > a) Do one query with a LIMIT and OFFSET to get
> the
> > > results, and another COUNT query to get the
> total
> > > number of rows?
> > >
> > > b) Do a single query without a LIMIT and OFFSET,
> then
> > > do a seek or similiar to get at the rows you
> want?
> > >
> > > Most tutorials, code, etc. I've seen do "a". The
> > > eclipse library does "b".
> >
> > Either way works.  Does the eclipse library use a
> cursor, or grab the
> > whole dataset and then seek on the client side?
> If it uses a cursor, I'd
> > expect it to be the fastest and simplest
> implementation.  Since a lot of
> > libs are designed to work with MySQL, they often
> are written in the first
> > method, where select count(*) is quite quick on
> MySQL, and MySQL doesn't
> > have cursor support.
> >
> > With Postgresql, the cursor is likely to be the
> faster method.
>
> I agree --- with a LIMIT and COUNT(*), you run the
> query twice.  With a
> cursor, you run it once, and only pull the rows to
> the client you want.
>
> --
>   Bruce Momjian                        |
> http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610)
> 359-1001
>   +  If your life is a hard drive,     |  13 Roberts
> Road
>   +  Christ can be your backup.        |  Newtown
> Square, Pennsylvania 19073


__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

Re: Pagination - 1 or 2 queries?

From
"scott.marlowe"
Date:
On Fri, 5 Sep 2003, CSN wrote:

>
> --- "scott.marlowe" <scott.marlowe@ihs.com> wrote:
> > begin;
> > declare bubba as cursor for select * from table
> > order by fieldname;
> > move forward 100 in bubba;
> > fetch 5 from bubba;
> > rollback;
> >
> > Then you get the same kind of effect, but only 5
> > rows have to be retrieved
> > from the database to the client, and pg_fetch_array
> > will now iterate over
> > those 5 rows only, and then run dry, so to speak.
>
> Actually, with this method would you be able to get
> the count of all rows that could be returned (not just
> the 5)?

My previous one about using absolute count was wrong, btw, so you can
either fetch forward all and get the count that returns or run select
count(*).  note that if you fetch forward all on a complex query, you may
NOT be able to fetch backward all since cursors have a hard time going
backwards on complex queries.