Re: getting count for a specific querry - Mailing list pgsql-sql
| From | Joel Fradkin | 
|---|---|
| Subject | Re: getting count for a specific querry | 
| Date | |
| Msg-id | 004b01c53c72$5e4c6390$797ba8c0@jfradkin Whole thread Raw | 
| In response to | Re: getting count for a specific querry (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Responses | Re: getting count for a specific querry Re: getting count for a specific querry | 
| List | pgsql-sql | 
As always thanks Tom,
I will definitely look at what I can do.
Since it is a count of matched condition records I may not have a way
around.
I don't think my clients would like me to aprox as it is a count of their
records. What I plan on doing assuming I can get all my other problems fixed
(as mentioned I am going to try and get paid help to see if I goofed it up
some where) is make the count a button, so they don't wait everytime, but
can choose to wait if need be, maybe I can store the last count with a count
on day for the generic search it defaults to, and just have them do a count
on demand if they have a specific query. Our screens have several criteria
fields in each application.
Joel Fradkin
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, April 08, 2005 2:28 PM
To: Bob Henkel
Cc: Scott Marlowe; Joel Fradkin; Andrew Sullivan; pgsql-sql@postgresql.org
Subject: Re: [SQL] getting count for a specific querry
Bob Henkel <luckyratfoot@gmail.com> writes:
> From a simple/high level perspective why is this? That is why can't
> PostgreSQL do aggregates as well across large chunks of data. I'm
> assuming it extremely complicated. Otherwise the folks around here
> would have churned out a fix in a month or less and made this issue a
> past story.
You can find very detailed discussions of this in the archives, but
the basic reason is that we have a very general/extensible view of
aggregates (which is how come we can support custom aggregates).
An aggregate is a function that you feed all the input rows to, one
at a time, and then it produces the answer.  Nice, general, extensible,
and not at all optimizable :-(
Now in general that is the only way to do it, and so Scott's implication
that we always suck compared to other databases is really an
overstatement.  Ask another database to do a standard deviation
calculation, for instance, and it'll be just as slow.  However there are
special cases that other DBs can optimize that we don't even try to.
The big ones are:
* COUNT(*) across a whole table --- most non-MVCC databases keep tabs of
the physical number of the rows in the table, and so they can answer
this very quickly.  Postgres doesn't keep such a count, and under MVCC
rules it wouldn't necessarily be the right answer if we had it.
(BTW, count of rows satisfying a particular condition is a different
ballgame entirely; in most cases that can't be optimized at all, AFAIK.)
If you are willing to accept approximate answers there are various
tricks you can use --- see the archives --- but we don't get to fudge
on COUNT(*) itself because it's in the SQL standard.
* MIN or MAX of an indexed column --- most DBs can use an index scan to
find such a row relatively quickly, although whether this trick works or
not depends a whole lot on whether you have WHERE or GROUP BY and just
what those conditions look like.
You can fake the min/max answer in Postgres by doing the transformstion
to an indexable query by hand, for instance instead of MAX(col) doSELECT col FROM tab ORDER BY col DESC LIMIT 1;
There are periodic discussions in the hackers list about teaching the
planner to do that automatically, and it will probably happen someday;
but it's a complicated task and not exceedingly high on the priority list.
        regards, tom lane