Home > mailing lists

Re: Performance of count(*) - Mailing list pgsql-performance

From	Craig A. James
Subject	Re: Performance of count(*)
Date	March 22, 2007 15:17:12
Msg-id	46029DE3.9030505@modgraph-usa.com Whole thread Raw
In response to	Re: Performance of count(*) (Michael Stone <mstone+postgres@mathom.us>)
Responses	Re: Performance of count() Re: Performance of count()
List	pgsql-performance

Tree view

Michael Stone wrote:
> On Thu, Mar 22, 2007 at 01:30:35PM +0200, ismo.tuononen@solenovo.fi wrote:
>> approximated count?????
>>
>> why? who would need it? where you can use it?
>
> Do a google query. Look at the top of the page, where it says "results N
> to M of about O". For user interfaces (which is where a lot of this
> count(*) stuff comes from) you quite likely don't care about the exact
> count...

Right on, Michael.

One of our biggest single problems is this very thing.  It's not a Postgres problem specifically, but more embedded in
theidea of a relational database: There are no "job status" or "rough estimate of results" or "give me part of the
answer"features that are critical to many real applications. 

In our case (for a variety of reasons, but this one is critical), we actually can't use Postgres indexing at all -- we
wrotean entirely separate indexing system for our data, one that has the following properties: 

  1. It can give out "pages" of information (i.e. "rows 50-60") without
     rescanning the skipped pages the way "limit/offset" would.
  2. It can give accurate estimates of the total rows that will be returned.
  3. It can accurately estimate the time it will take.

For our primary business-critical data, Postgres is merely a storage system, not a search system, because we have to do
the"heavy lifting" in our own code.  (To be fair, there is no relational database that can handle our data.) 

Many or most web-based search engines face these exact problems.

Craig

pgsql-performance by date:

From: Andreas Kostyrka
Date: 22 March 2007, 15:16:36
Subject: Re: Performance of count(*)

From: Tino Wildenhain
Date: 22 March 2007, 15:31:59
Subject: Re: Performance of count(*)

Re: Performance of count(*) - Mailing list pgsql-performance

Previous

Next