Re: Slow count(*) again... - Mailing list pgsql-performance
From: david@lang.hm
Subject: Re: Slow count(*) again...
Date:
Msg-id: alpine.DEB.2.00.1010120846410.21889@asgard.lang.hm
In response to: Re: Slow count(*) again... (Joe Uhl <joeuhl@gmail.com>)
List: pgsql-performance
On Tue, 12 Oct 2010, Joe Uhl wrote:

>>> The biggest single problem with "select count(*)" is that it is
>>> seriously overused. People use that idiom to establish existence, which
>>> usually leads to a performance disaster in the application using it,
>>> unless the table has no more than a few hundred records. The SQL
>>> language, of which PostgreSQL offers an excellent implementation, has
>>> offered the [NOT] EXISTS clause since its inception in the Jurassic
>>> era. The problem is with the sequential scan, not with counting. I'd
>>> even go as far as to suggest that 99% of instances of the
>>> "select count(*)" idiom are probably bad use of the SQL language.
>>
>> I agree, I have seen many very bad examples of using count(*). I will go
>> so far as to question the use of count(*) in my examples here. Is there
>> a better way to come up with a page list than using count(*)? What is
>> the best method to make a page of results and a list of links to other
>> pages of results? Am I barking up the wrong tree here?
>
> One way I have dealt with this on very large tables is to cache the
> count(*) at the application level (using memcached, terracotta, or
> something along those lines) and then increment that cache whenever you
> add a row to the relevant table. On application restart that cache is
> re-initialized with a regular old count(*). This approach works really
> well, and all large systems in my experience need caching in front of
> the DB eventually. If you have a simpler system with, say, a single
> application/web server, you can simply store the value in a variable;
> the specifics would depend on the language and framework you are using.

This works if you know ahead of time what the criteria of the search are
going to be, so it will work for

  select count(*) from table;

What this won't work for is cases where the criteria of the search are
unpredictable, i.e.
ask the user for input:

  select count(*) from table where field=$input;

David Lang

> Another more all-DB approach is to create a statistics table into which
> you place aggregated statistics rows (num deleted, num inserted, totals,
> etc.) at an appropriate time interval in your code. So you have rows
> containing aggregated statistics for the past, and some tiny portion of
> the new data happening right now that hasn't yet been aggregated.
> Queries then look like a summation of the aggregated values in the
> statistics table plus a count(*) over just the newest portion of the
> data table, and are generally very fast.
>
> Overall I have found that once things get big, the layers of your app
> stack start to blend together and have to be combined in clever ways to
> keep speed up. Postgres is a beast, but when you run into things it
> can't do well, just find a way to cache it or make it work together with
> some other persistence tech to handle those cases.
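The existence-check point made at the top of the thread can be sketched as follows. This is a minimal illustration, not anyone's production code: SQLite stands in for PostgreSQL so it runs self-contained, and the `orders`/`customer_id` names are made up for the example. The idea carries over directly to Postgres.

```python
# EXISTS can stop at the first matching row; count(*) must visit every
# match. Using count(*) just to test existence does needless work.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)",
                 [(i % 100,) for i in range(10_000)])

# Anti-pattern: count every matching row just to answer a yes/no question.
n = conn.execute("SELECT count(*) FROM orders WHERE customer_id = 42").fetchone()[0]
has_orders_slow = n > 0

# Better: EXISTS lets the planner stop after the first hit.
has_orders = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = 42)"
).fetchone()[0] == 1

print(has_orders_slow, has_orders)
```

Both queries return the same answer here, but the EXISTS form scans at most one matching row instead of all of them.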
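Joe's application-level counter cache might look like the sketch below, under the assumption of a single-process app (a plain dict stands in for memcached/terracotta; the table and function names are invented for illustration). As noted above, this only helps when the query is known ahead of time.

```python
# Counter cache: seed once with a real count(*) (e.g. on application
# restart), then increment the cached value on every insert instead of
# re-running count(*). SQLite stands in for PostgreSQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

count_cache = {}  # table name -> cached row count

def cached_count(table):
    # Seed on first use with one real count(*); serve from cache after that.
    if table not in count_cache:
        count_cache[table] = conn.execute(
            f"SELECT count(*) FROM {table}").fetchone()[0]
    return count_cache[table]

def insert_event(payload):
    conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))
    if "events" in count_cache:
        count_cache["events"] += 1  # keep the cache in step with the table

insert_event("a")
print(cached_count("events"))  # seeded by a real count(*)
insert_event("b")
insert_event("c")
print(cached_count("events"))  # served from the cache, no table scan
```

Deletes would need a matching decrement, and a multi-server deployment would need the shared cache Joe describes rather than a local dict.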
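The statistics-table approach in Joe's follow-up can be sketched the same way: roll old rows up into an aggregate table on some interval, then answer count queries as the sum of the rollups plus a count(*) over only the small unaggregated tail. Again SQLite stands in for PostgreSQL and the schema is illustrative, not from the thread.

```python
# Rollup table: pre-aggregate historical rows, count only the new tail.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (id INTEGER PRIMARY KEY, value INTEGER);
CREATE TABLE data_stats (upto_id INTEGER, num_rows INTEGER, total INTEGER);
""")
conn.executemany("INSERT INTO data (value) VALUES (?)",
                 [(i,) for i in range(1000)])

def roll_up():
    # Aggregate everything past the last rollup boundary into one stats row.
    last = conn.execute(
        "SELECT coalesce(max(upto_id), 0) FROM data_stats").fetchone()[0]
    hi, n, total = conn.execute(
        "SELECT max(id), count(*), sum(value) FROM data WHERE id > ?",
        (last,)).fetchone()
    if n:
        conn.execute("INSERT INTO data_stats VALUES (?, ?, ?)", (hi, n, total))

def fast_count():
    # Sum of pre-aggregated counts plus a count(*) over only the new tail.
    last, agg = conn.execute(
        "SELECT coalesce(max(upto_id), 0), coalesce(sum(num_rows), 0) "
        "FROM data_stats").fetchone()
    tail = conn.execute(
        "SELECT count(*) FROM data WHERE id > ?", (last,)).fetchone()[0]
    return agg + tail

roll_up()  # aggregate the first 1000 rows
conn.executemany("INSERT INTO data (value) VALUES (?)",
                 [(i,) for i in range(5)])
print(fast_count())  # 1000 from the stats table + 5 counted directly
```

The count(*) at query time only ever touches the rows inserted since the last rollup, which stays small if the rollup interval is reasonable.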