Thread: COUNT & Pagination

COUNT & Pagination

From
David Shadovitz
Date:
I understand that COUNT queries are expensive.  So I'm looking for advice on
displaying paginated query results.

I display my query results like this:

  Displaying 1 to 50 of 2905.
  1-50 | 51-100 | 101-150 | etc.

I do this by executing two queries.  One is of the form:

  SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m OFFSET n

The other is identical except that I replace the select list with COUNT(*).

I'm looking for suggestions to replace that COUNT query.  I cannot use the
method of storing the number of records in a separate table because my queries
(a) involve joins, and (b) have a WHERE clause.

And an unrelated question:
I'm running PG 7.2.2 and want to upgrade to 7.4.1.  I've never upgraded PG
before and I'm nervous.  Can I simply run pg_dumpall, install 7.4.1, and then
feed the dump into psql?  I'm planning to use pg_dumpall rather than pg_dump
because I want to preserve the users I've defined.  My database is the only one
on the system.

Thanks.
-David (who would love to go to Bruce Momjian's boot camp)

Re: COUNT & Pagination

From
Jeff Fitzmyers
Date:
> So I'm looking for advice on displaying paginated query results.
> Displaying 1 to 50 of 2905.
>   1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries.  One is of the form:
>   SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m
> OFFSET n
> The other is identical except that I replace the select list with
> COUNT(*).

This is the only way I know of how to do it.

> I'm running PG 7.2.2 and want to upgrade to 7.4.1.  I've never
> upgraded PG
> before and I'm nervous.  Can I simply run pg_dumpall, install 7.4.1,
> and then
> feed the dump into psql?

I would practice and play with it on another machine until you can do
it easily. You will learn a lot and the experience might prove
invaluable in may ways :-)

Jeff



Re: COUNT & Pagination

From
Christopher Kings-Lynne
Date:
> I understand that COUNT queries are expensive.  So I'm looking for advice on
> displaying paginated query results.
>
> I display my query results like this:
>
>   Displaying 1 to 50 of 2905.
>   1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries.  One is of the form:
>
>   SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m OFFSET n
>
> The other is identical except that I replace the select list with COUNT(*).
>
> I'm looking for suggestions to replace that COUNT query.  I cannot use the
> method of storing the number of records in a separate table because my queries
> (a) involve joins, and (b) have a WHERE clause.

Well, on all my sites, I do what you do and just live with it :P  You
can investigate using cursors however (DECLARE, MOVE & FETCH)

> And an unrelated question:
> I'm running PG 7.2.2 and want to upgrade to 7.4.1.  I've never upgraded PG
> before and I'm nervous.  Can I simply run pg_dumpall, install 7.4.1, and then
> feed the dump into psql?  I'm planning to use pg_dumpall rather than pg_dump
> because I want to preserve the users I've defined.  My database is the only one
> on the system.

I recommend something like this:

-- disable access to your database to make sure you have a complete dump

-- run dump as database owner account
su pgsql (or whatever your postgres user is)

-- do compressed dump
pg_dumpall > backup.sql

-- backup old data dir
mv /usr/local/pgsql/data /usr/local/pgsql/data.7.2

-- remove old postgres, install new
-- run NEW initdb.  replace latin1 with your encoding
-- -W specifies a superuser password
initdb -D /usr/local/pgsql/data -E LATIN1 -W

-- restore dump, watching output VERY CAREFULLY:
-- (run as pgsql user again)
psql template1 < backup.sql > log.txt
-- Watch stderr very carefully to check any errors that might occur.

-- If restore fails, re-initdb and re-restore

Chris


Re: COUNT & Pagination

From
Richard Huxton
Date:
On Sunday 11 January 2004 18:10, David Shadovitz wrote:
> I understand that COUNT queries are expensive.  So I'm looking for advice
> on displaying paginated query results.
>
> I display my query results like this:
>
>   Displaying 1 to 50 of 2905.
>   1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries.

If you only need the count when you've got the results, most PG client
interfaces will tell you how many rows you've got. What language is your app
in?

--
  Richard Huxton
  Archonet Ltd

Re: COUNT & Pagination

From
David Shadovitz
Date:
> If you only need the count when you've got the results, most PG client
> interfaces will tell you how many rows you've got. What language is your app
> in?

PHP.
But I have only a subset of the results, retrieved via a query with a "LIMIT
<m>" clause, so $pg_numrows is m.
And retrieving all results (i.e. no LIMIT) is at least as expensive as
COUNT(*).

-David

Re: COUNT & Pagination

From
Robert Treat
Date:
On Mon, 2004-01-12 at 10:37, David Shadovitz wrote:
> > If you only need the count when you've got the results, most PG client
> > interfaces will tell you how many rows you've got. What language is your app
> > in?
>
> PHP.
> But I have only a subset of the results, retrieved via a query with a "LIMIT
> <m>" clause, so $pg_numrows is m.
> And retrieving all results (i.e. no LIMIT) is at least as expensive as
> COUNT(*).
>

Depending on frequency of updates and need for real time info, you could
cache the count in session as long as the user stays within the given
piece of your app.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: COUNT & Pagination

From
mallah@trade-india.com
Date:
> I understand that COUNT queries are expensive.  So I'm looking for advice
> on
> displaying paginated query results.
>
> I display my query results like this:
>
>   Displaying 1 to 50 of 2905.
>   1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries.  One is of the form:
>
>   SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m
> OFFSET n
>
> The other is identical except that I replace the select list with
> COUNT(*).
>
> I'm looking for suggestions to replace that COUNT query.



We avert the subsequent execution of count(*) by passing the
value of cout(*) as a query parameter through the link in page
numbers. This works for us.

This ofcourse assumes that that the number of rows matching the
Where clause does not changes while the user is viewing the search
results.

Hope it helps.

Regds
Mallah.




I cannot use the
> method of storing the number of records in a separate table because my
> queries
> (a) involve joins, and (b) have a WHERE clause.
>
> And an unrelated question:
> I'm running PG 7.2.2 and want to upgrade to 7.4.1.  I've never upgraded PG
> before and I'm nervous.  Can I simply run pg_dumpall, install 7.4.1, and
> then
> feed the dump into psql?  I'm planning to use pg_dumpall rather than
> pg_dump
> because I want to preserve the users I've defined.  My database is the
> only one
> on the system.
>
> Thanks.
> -David (who would love to go to Bruce Momjian's boot camp)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>
>


Re: COUNT & Pagination

From
"David Shadovitz"
Date:
> We avert the subsequent execution of count(*) by passing the
> value of count(*) as a query parameter through the link in page
> numbers.

Mallah, and others who mentioned caching the record count:

Yes, I will certainly do this.  I can detect whether the query's filter has
been changed, or whether the user is merely paging through the results or
sorting* the results.

I'd love to completely eliminate the cost of the COUNT(*) query, but I guess
that I cannot have everything.

* My HTML table column headers are hyperlinks which re-execute the query,
sorting the results by the selected column.  The first click does an ASC
sort; a second click does a DESC sort.

Thanks.
-David

Re: COUNT & Pagination

From
"scott.marlowe"
Date:
On Tue, 13 Jan 2004, David Shadovitz wrote:

> > We avert the subsequent execution of count(*) by passing the
> > value of count(*) as a query parameter through the link in page
> > numbers.
>
> Mallah, and others who mentioned caching the record count:
>
> Yes, I will certainly do this.  I can detect whether the query's filter has
> been changed, or whether the user is merely paging through the results or
> sorting* the results.
>
> I'd love to completely eliminate the cost of the COUNT(*) query, but I guess
> that I cannot have everything.
>
> * My HTML table column headers are hyperlinks which re-execute the query,
> sorting the results by the selected column.  The first click does an ASC
> sort; a second click does a DESC sort.

another useful trick is to have your script save out the count(*) result
in a single row table with a timestamp, and every time you grab if, check
to see if x number of minutes have passed, and if so, update that row with
a count(*).  You can even have a cron job do it so your own scripts don't
incur the cost of the count(*) and delay output to the user.



Re: COUNT & Pagination

From
Rajesh Kumar Mallah
Date:
scott.marlowe wrote:
On Tue, 13 Jan 2004, David Shadovitz wrote:
 
We avert the subsequent execution of count(*) by passing the
value of count(*) as a query parameter through the link in page
numbers.     
Mallah, and others who mentioned caching the record count:

Yes, I will certainly do this.  I can detect whether the query's filter has 
been changed, or whether the user is merely paging through the results or 
sorting* the results.

I'd love to completely eliminate the cost of the COUNT(*) query, but I guess 
that I cannot have everything.

* My HTML table column headers are hyperlinks which re-execute the query, 
sorting the results by the selected column.  The first click does an ASC 
sort; a second click does a DESC sort.   
another useful trick is to have your script save out the count(*) result 
in a single row table with a timestamp, and every time you grab if, check 
to see if x number of minutes have passed, and if so, update that row with 
a count(*). 

Greetings!

The count(*) can get evaluated with any arbitrary combination
in whre clause how do you plan to store that information ?

In a typical application pagination could be required in n number
of contexts . I would be interested to know more about this trick
and its applicability in such situations.

Offtopic:

Does PostgreSQL optimise repeated execution of similar queries ie
queries on same table or set of tables (in a join) with same where clause
 and only differing in LIMIT and OFFSET.

I dont know much about MySQL, Is their "Query Cache" achieving
better results in such cases? and do we have anything similar in
PostgreSQL ? I think  the most recently accessed tables  anyways
get loaded in shared buffers in PostgreSQL  so that its not accessed
from the disk. But is the "Query Cache" really different from this.
Can anyone knowing a little better about the working of MySQLs'
query cache throw some light?

Regds
Mallah.

 You can even have a cron job do it so your own scripts don't 
incur the cost of the count(*) and delay output to the user.



---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate     subscribe-nomail command to majordomo@postgresql.org so that your     message can get through to the mailing list cleanly
 

Re: COUNT & Pagination

From
"scott.marlowe"
Date:
On Wed, 14 Jan 2004, Rajesh Kumar Mallah wrote:

> scott.marlowe wrote:
>
> >On Tue, 13 Jan 2004, David Shadovitz wrote:
> >
> >
> >
> >>>We avert the subsequent execution of count(*) by passing the
> >>>value of count(*) as a query parameter through the link in page
> >>>numbers.
> >>>
> >>>
> >>Mallah, and others who mentioned caching the record count:
> >>
> >>Yes, I will certainly do this.  I can detect whether the query's filter has
> >>been changed, or whether the user is merely paging through the results or
> >>sorting* the results.
> >>
> >>I'd love to completely eliminate the cost of the COUNT(*) query, but I guess
> >>that I cannot have everything.
> >>
> >>* My HTML table column headers are hyperlinks which re-execute the query,
> >>sorting the results by the selected column.  The first click does an ASC
> >>sort; a second click does a DESC sort.
> >>
> >>
> >
> >another useful trick is to have your script save out the count(*) result
> >in a single row table with a timestamp, and every time you grab if, check
> >to see if x number of minutes have passed, and if so, update that row with
> >a count(*).
> >
>
> Greetings!
>
> The count(*) can get evaluated with any arbitrary combination
> in whre clause how do you plan to store that information ?
>
> In a typical application pagination could be required in n number
> of contexts . I would be interested to know more about this trick
> and its applicability in such situations.
>
> Offtopic:
>
> Does PostgreSQL optimise repeated execution of similar queries ie
> queries on same table or set of tables (in a join) with same where clause
>  and only differing in LIMIT and OFFSET.

Yes, and no.

Yes, previously run query should be faster, if it fits in kernel cache.

No, Postgresql doesn't cache any previous results or plans (unless you use
prepare / execute, then it only caches the plan, not the query results).

Plus, the design of Postgresql is such that it would have to do a LOT of
cache checking to see if there were any updates to the underlying data
between selects.  Since such queries are unlikely to be repeated inside a
transaction, the only place where you wouldn't have to check for new
tuples, it's not really worth trying to implement.

Keep in mind most databases can use an index on max(*) because each
aggregate is programmed by hand to do one thing.  In Postgresql, you can
create your own aggregate, and since there's no simple way to make
aggregates use indexes in the general sense, it's not likely to get
optimized.  I.e. any optimization for JUST max(*)/min(*) is unlikely
unless it can be used for the other aggregates.



Re: COUNT & Pagination

From
CoL
Date:
Hi,

David Shadovitz wrote, On 1/11/2004 7:10 PM:

> I understand that COUNT queries are expensive.  So I'm looking for advice on
> displaying paginated query results.
>
> I display my query results like this:
>
>   Displaying 1 to 50 of 2905.
>   1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries.  One is of the form:
>
>   SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m OFFSET n
>
> The other is identical except that I replace the select list with COUNT(*).
yes, you need 2 query. Or select it from one:
select *, (select count(*) from table) as count from table...

pg will optimize this query, and do the count only once

>
> And an unrelated question:
> I'm running PG 7.2.2 and want to upgrade to 7.4.1.  I've never upgraded PG
> before and I'm nervous.  Can I simply run pg_dumpall, install 7.4.1, and then
> feed the dump into psql?  I'm planning to use pg_dumpall rather than pg_dump
> because I want to preserve the users I've defined.  My database is the only one
> on the system.

yes. But check tha faq and the manual for a better explain.

C.

Re: COUNT & Pagination

From
Neil Conway
Date:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> Yes, previously run query should be faster, if it fits in kernel
> cache.

Or the PostgreSQL buffer cache.

> Plus, the design of Postgresql is such that it would have to do a
> LOT of cache checking to see if there were any updates to the
> underlying data between selects.

Last I checked (which was a while ago, admittedly), the MySQL design
completely purges the query cache for a relation whenever that
relation is mentioned in an INSERT, UPDATE, or DELETE. When this was
discussed (check the -hackers archives for more), IIRC the consensus
was that it's not worth implementing it if we can't do better than
that.

-Neil