Thread: COUNT & Pagination
I understand that COUNT queries are expensive. So I'm looking for advice on displaying paginated query results.

I display my query results like this:

Displaying 1 to 50 of 2905.
1-50 | 51-100 | 101-150 | etc.

I do this by executing two queries. One is of the form:

SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m OFFSET n

The other is identical except that I replace the select list with COUNT(*).

I'm looking for suggestions to replace that COUNT query. I cannot use the method of storing the number of records in a separate table because my queries (a) involve joins, and (b) have a WHERE clause.

And an unrelated question:
I'm running PG 7.2.2 and want to upgrade to 7.4.1. I've never upgraded PG before and I'm nervous. Can I simply run pg_dumpall, install 7.4.1, and then feed the dump into psql? I'm planning to use pg_dumpall rather than pg_dump because I want to preserve the users I've defined. My database is the only one on the system.

Thanks.
-David (who would love to go to Bruce Momjian's boot camp)
> So I'm looking for advice on displaying paginated query results.
>
> Displaying 1 to 50 of 2905.
> 1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries. One is of the form:
>
> SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m OFFSET n
>
> The other is identical except that I replace the select list with COUNT(*).

This is the only way I know of to do it.

> I'm running PG 7.2.2 and want to upgrade to 7.4.1. I've never upgraded PG
> before and I'm nervous. Can I simply run pg_dumpall, install 7.4.1, and then
> feed the dump into psql?

I would practice and play with it on another machine until you can do it easily. You will learn a lot, and the experience might prove invaluable in many ways :-)

Jeff
> I understand that COUNT queries are expensive. So I'm looking for advice on
> displaying paginated query results.
>
> I display my query results like this:
>
> Displaying 1 to 50 of 2905.
> 1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries. One is of the form:
>
> SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m OFFSET n
>
> The other is identical except that I replace the select list with COUNT(*).
>
> I'm looking for suggestions to replace that COUNT query. I cannot use the
> method of storing the number of records in a separate table because my
> queries (a) involve joins, and (b) have a WHERE clause.

Well, on all my sites, I do what you do and just live with it :P

You can, however, investigate using cursors (DECLARE, MOVE & FETCH).

> And an unrelated question:
> I'm running PG 7.2.2 and want to upgrade to 7.4.1. I've never upgraded PG
> before and I'm nervous. Can I simply run pg_dumpall, install 7.4.1, and then
> feed the dump into psql? I'm planning to use pg_dumpall rather than pg_dump
> because I want to preserve the users I've defined. My database is the only
> one on the system.

I recommend something like this:

-- disable access to your database to make sure you have a complete dump

-- run the dump as the database owner account
su pgsql (or whatever your postgres user is)

-- do the dump (pipe it through gzip if you want it compressed)
pg_dumpall > backup.sql

-- back up the old data dir
mv /usr/local/pgsql/data /usr/local/pgsql/data.7.2

-- remove the old postgres, install the new one

-- run the NEW initdb; replace LATIN1 with your encoding
-- (-W makes initdb prompt for a superuser password)
initdb -D /usr/local/pgsql/data -E LATIN1 -W

-- restore the dump (run as the pgsql user again), watching the output VERY CAREFULLY:
psql template1 < backup.sql > log.txt

-- Watch stderr very carefully for any errors that might occur.
-- If the restore fails, re-initdb and re-restore.

Chris
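For what it's worth, a minimal sketch of what the cursor route can look like (the table and column names here are made up, and this assumes the newer server once you have upgraded):

BEGIN;

DECLARE page_cur SCROLL CURSOR FOR
    SELECT o.id, o.customer, o.created_at
      FROM orders o
     WHERE o.status = 'open'
     ORDER BY o.created_at;

-- skip the rows before the requested page
MOVE FORWARD 100 IN page_cur;

-- pull one page of 50 rows
FETCH FORWARD 50 FROM page_cur;

CLOSE page_cur;
COMMIT;

The catch is that a cursor normally lives only as long as its transaction (WITH HOLD cursors arrive in 7.4, if I remember right), so a stateless web app would have to keep the connection and transaction open between page requests, which is usually why people fall back to LIMIT/OFFSET.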
On Sunday 11 January 2004 18:10, David Shadovitz wrote:

> I understand that COUNT queries are expensive. So I'm looking for advice
> on displaying paginated query results.
>
> I display my query results like this:
>
> Displaying 1 to 50 of 2905.
> 1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries.

If you only need the count when you've got the results, most PG client interfaces will tell you how many rows you've got. What language is your app in?

-- 
Richard Huxton
Archonet Ltd
> If you only need the count when you've got the results, most PG client
> interfaces will tell you how many rows you've got. What language is your
> app in?

PHP.

But I have only a subset of the results, retrieved via a query with a "LIMIT <m>" clause, so $pg_numrows is m. And retrieving all results (i.e. no LIMIT) is at least as expensive as COUNT(*).

-David
On Mon, 2004-01-12 at 10:37, David Shadovitz wrote:

> > If you only need the count when you've got the results, most PG client
> > interfaces will tell you how many rows you've got. What language is your
> > app in?
>
> PHP.
> But I have only a subset of the results, retrieved via a query with a
> "LIMIT <m>" clause, so $pg_numrows is m.
> And retrieving all results (i.e. no LIMIT) is at least as expensive as
> COUNT(*).

Depending on the frequency of updates and the need for real-time info, you could cache the count in the session as long as the user stays within the given piece of your app.

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
> I understand that COUNT queries are expensive. So I'm looking for advice
> on displaying paginated query results.
>
> I display my query results like this:
>
> Displaying 1 to 50 of 2905.
> 1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries. One is of the form:
>
> SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m OFFSET n
>
> The other is identical except that I replace the select list with COUNT(*).
>
> I'm looking for suggestions to replace that COUNT query.

We avert the subsequent execution of count(*) by passing the value of count(*) as a query parameter through the link in the page numbers. This works for us. It of course assumes that the number of rows matching the WHERE clause does not change while the user is viewing the search results.

Hope it helps.

Regds
Mallah.

> I cannot use the method of storing the number of records in a separate
> table because my queries (a) involve joins, and (b) have a WHERE clause.
>
> And an unrelated question:
> I'm running PG 7.2.2 and want to upgrade to 7.4.1. I've never upgraded PG
> before and I'm nervous. Can I simply run pg_dumpall, install 7.4.1, and then
> feed the dump into psql? I'm planning to use pg_dumpall rather than pg_dump
> because I want to preserve the users I've defined. My database is the only
> one on the system.
>
> Thanks.
> -David (who would love to go to Bruce Momjian's boot camp)
> We avert the subsequent execution of count(*) by passing the
> value of count(*) as a query parameter through the link in page
> numbers.

Mallah, and others who mentioned caching the record count:

Yes, I will certainly do this. I can detect whether the query's filter has been changed, or whether the user is merely paging through the results or sorting* the results.

I'd love to completely eliminate the cost of the COUNT(*) query, but I guess that I cannot have everything.

* My HTML table column headers are hyperlinks which re-execute the query, sorting the results by the selected column. The first click does an ASC sort; a second click does a DESC sort.

Thanks.
-David
On Tue, 13 Jan 2004, David Shadovitz wrote:

> > We avert the subsequent execution of count(*) by passing the
> > value of count(*) as a query parameter through the link in page
> > numbers.
>
> Mallah, and others who mentioned caching the record count:
>
> Yes, I will certainly do this. I can detect whether the query's filter has
> been changed, or whether the user is merely paging through the results or
> sorting* the results.
>
> I'd love to completely eliminate the cost of the COUNT(*) query, but I guess
> that I cannot have everything.
>
> * My HTML table column headers are hyperlinks which re-execute the query,
> sorting the results by the selected column. The first click does an ASC
> sort; a second click does a DESC sort.

Another useful trick is to have your script save out the count(*) result in a single-row table with a timestamp, and every time you grab it, check to see if x number of minutes have passed, and if so, update that row with a count(*). You can even have a cron job do it so your own scripts don't incur the cost of the count(*) and delay output to the user.
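A minimal sketch of that trick, with made-up table and column names (and note, as Mallah points out below, it only really works when the set of filters is known in advance):

CREATE TABLE count_cache (
    filter_tag  text PRIMARY KEY,
    row_count   integer NOT NULL,
    refreshed   timestamp NOT NULL DEFAULT now()
);

-- the page script reads the cached value instead of running count(*)
SELECT row_count, refreshed
  FROM count_cache
 WHERE filter_tag = 'open_orders';

-- a cron job (or the script itself, whenever "refreshed" is too old)
-- recomputes the count every few minutes
UPDATE count_cache
   SET row_count = (SELECT count(*) FROM orders WHERE status = 'open'),
       refreshed = now()
 WHERE filter_tag = 'open_orders';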
scott.marlowe wrote:

> another useful trick is to have your script save out the count(*) result
> in a single-row table with a timestamp, and every time you grab it, check
> to see if x number of minutes have passed, and if so, update that row with
> a count(*). You can even have a cron job do it so your own scripts don't
> incur the cost of the count(*) and delay output to the user.

Greetings!

The count(*) can get evaluated with any arbitrary combination in the WHERE clause; how do you plan to store that information? In a typical application, pagination could be required in n number of contexts. I would be interested to know more about this trick and its applicability in such situations.

Offtopic:

Does PostgreSQL optimise repeated execution of similar queries, i.e. queries on the same table or set of tables (in a join) with the same WHERE clause and differing only in LIMIT and OFFSET?

I don't know much about MySQL. Is their "Query Cache" achieving better results in such cases, and do we have anything similar in PostgreSQL? I think the most recently accessed tables anyway get loaded into shared buffers in PostgreSQL so that they are not accessed from the disk. But is the "Query Cache" really different from this? Can anyone knowing a little better about the workings of MySQL's query cache throw some light?

Regds
Mallah.
On Wed, 14 Jan 2004, Rajesh Kumar Mallah wrote:

> Does PostgreSQL optimise repeated execution of similar queries, i.e.
> queries on the same table or set of tables (in a join) with the same
> WHERE clause and differing only in LIMIT and OFFSET?

Yes, and no.

Yes, previously run query should be faster, if it fits in kernel cache. No, PostgreSQL doesn't cache any previous results or plans (unless you use PREPARE / EXECUTE, and then it only caches the plan, not the query results).

Plus, the design of PostgreSQL is such that it would have to do a LOT of cache checking to see if there were any updates to the underlying data between selects. Since such queries are unlikely to be repeated inside a transaction (the only place where you wouldn't have to check for new tuples), it's not really worth trying to implement.

Keep in mind most databases can use an index for max()/min() because each aggregate is programmed by hand to do one thing. In PostgreSQL, you can create your own aggregate, and since there's no simple way to make aggregates use indexes in the general sense, it's not likely to get optimized. I.e. any optimization for JUST max()/min() is unlikely unless it can be used for the other aggregates.
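For reference, a minimal PREPARE / EXECUTE sketch (available from 7.3 on), again with made-up table and column names; only the plan is cached and reused, the query itself still runs in full on every EXECUTE:

PREPARE open_orders_page (text) AS
    SELECT id, customer, created_at
      FROM orders
     WHERE status = $1
     ORDER BY created_at
     LIMIT 50 OFFSET 0;

-- each EXECUTE reuses the stored plan for the rest of the session
EXECUTE open_orders_page('open');

DEALLOCATE open_orders_page;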
Hi,

David Shadovitz wrote, On 1/11/2004 7:10 PM:

> I understand that COUNT queries are expensive. So I'm looking for advice on
> displaying paginated query results.
>
> I display my query results like this:
>
> Displaying 1 to 50 of 2905.
> 1-50 | 51-100 | 101-150 | etc.
>
> I do this by executing two queries. One is of the form:
>
> SELECT <select list> FROM <view/table list> WHERE <filter> LIMIT m OFFSET n
>
> The other is identical except that I replace the select list with COUNT(*).

Yes, you need 2 queries. Or select it all in one:

select *, (select count(*) from table) as count from table...

pg will optimize this query, and do the count only once.

> And an unrelated question:
> I'm running PG 7.2.2 and want to upgrade to 7.4.1. I've never upgraded PG
> before and I'm nervous. Can I simply run pg_dumpall, install 7.4.1, and then
> feed the dump into psql? I'm planning to use pg_dumpall rather than pg_dump
> because I want to preserve the users I've defined. My database is the only
> one on the system.

Yes. But check the FAQ and the manual for a better explanation.

C.
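Concretely, and with made-up table and column names, the one-query form might look like this. The filter has to be repeated in both places, and the count(*) itself is no cheaper; it just saves the second round trip, since the uncorrelated subquery is evaluated only once:

SELECT o.*,
       (SELECT count(*)
          FROM orders
         WHERE status = 'open') AS total_rows
  FROM orders o
 WHERE o.status = 'open'
 ORDER BY o.created_at
 LIMIT 50 OFFSET 0;

Every row of the page then carries the same total_rows value, which is what the "Displaying 1 to 50 of 2905" header needs.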
"scott.marlowe" <scott.marlowe@ihs.com> writes: > Yes, previously run query should be faster, if it fits in kernel > cache. Or the PostgreSQL buffer cache. > Plus, the design of Postgresql is such that it would have to do a > LOT of cache checking to see if there were any updates to the > underlying data between selects. Last I checked (which was a while ago, admittedly), the MySQL design completely purges the query cache for a relation whenever that relation is mentioned in an INSERT, UPDATE, or DELETE. When this was discussed (check the -hackers archives for more), IIRC the consensus was that it's not worth implementing it if we can't do better than that. -Neil