Re: Slow count(*) again... - Mailing list pgsql-performance

From david@lang.hm
Subject Re: Slow count(*) again...
Date
Msg-id alpine.DEB.2.00.1010120846410.21889@asgard.lang.hm
Whole thread Raw
In response to Re: Slow count(*) again...  (Joe Uhl <joeuhl@gmail.com>)
List pgsql-performance
On Tue, 12 Oct 2010, Joe Uhl wrote:

>>> The biggest single problem with "select count(*)" is that it is
>>> seriously overused. People use that idiom to establish existence, which
>>> usually leads to a performance disaster in the application using it,
>>> unless the table has no more than few hundred records. SQL language, of
>>> which PostgreSQL offers an excellent implementation,  offers [NOT]
>>> EXISTS clause since its inception in the Jurassic era. The problem is
>>> with the sequential scan, not with counting. I'd even go as far as to
>>> suggest that 99% instances of the "select count(*)" idiom are probably
>>> bad use of the SQL language.
>>
>> I agree, I have seen many very bad examples of using count(*). I will go so
>> far as to question the use of count(*) in my examples here. It there a
>> better
>> way to come up with a page list than using count(*)? What is the best
>> method
>> to make a page of results and a list of links to other pages of results? Am
>> I
>> barking up the wrong tree here?
> One way I have dealt with this on very large tables is to cache the count(*)
> at the application level (using memcached, terracotta, or something along
> those lines) and then increment that cache whenever you add a row to the
> relevant table.  On application restart that cache is re-initialized with a
> regular old count(*).  This approach works really well and all large systems
> in my experience need caching in front of the DB eventually.  If you have a
> simpler system with say a single application/web server you can simply store
> the value in a variable, the specifics would depend on the language and
> framework you are using.

this works if you know ahead of time what the criteria of the search is
going to be.

so it will work for

select count(*) from table;

what this won't work for is cases wher the criteria of the search is
unpredictable, i.e.

ask the user for input

select count(*) from table where field=$input;

David Lang

> Another more all-DB approach is to create a statistics tables into which you
> place aggregated statistics rows (num deleted, num inserted, totals, etc) at
> an appropriate time interval in your code.  So you have rows containing
> aggregated statistics information for the past and some tiny portion of the
> new data happening right now that hasn't yet been aggregated.  Queries then
> look like a summation of the aggregated values in the statistics table plus a
> count(*) over just the newest portion of the data table and are generally
> very fast.
>
> Overall I have found that once things get big the layers of your app stack
> start to blend together and have to be combined in clever ways to keep speed
> up.  Postgres is a beast but when you run into things it can't do well just
> find a way to cache it or make it work together with some other persistence
> tech to handle those cases.
>
>
>

pgsql-performance by date:

Previous
From: Dan Harris
Date:
Subject: Re: Slow count(*) again...
Next
From: david@lang.hm
Date:
Subject: Re: Slow count(*) again...