Re: getting count for a specific querry - Mailing list pgsql-sql
From | Joel Fradkin |
---|---|
Subject | Re: getting count for a specific querry |
Date | |
Msg-id | 004b01c53c72$5e4c6390$797ba8c0@jfradkin Whole thread Raw |
In response to | Re: getting count for a specific querry (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: getting count for a specific querry
Re: getting count for a specific querry |
List | pgsql-sql |
As always thanks Tom, I will definitely look at what I can do. Since it is a count of matched condition records I may not have a way around. I don't think my clients would like me to aprox as it is a count of their records. What I plan on doing assuming I can get all my other problems fixed (as mentioned I am going to try and get paid help to see if I goofed it up some where) is make the count a button, so they don't wait everytime, but can choose to wait if need be, maybe I can store the last count with a count on day for the generic search it defaults to, and just have them do a count on demand if they have a specific query. Our screens have several criteria fields in each application. Joel Fradkin -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Friday, April 08, 2005 2:28 PM To: Bob Henkel Cc: Scott Marlowe; Joel Fradkin; Andrew Sullivan; pgsql-sql@postgresql.org Subject: Re: [SQL] getting count for a specific querry Bob Henkel <luckyratfoot@gmail.com> writes: > From a simple/high level perspective why is this? That is why can't > PostgreSQL do aggregates as well across large chunks of data. I'm > assuming it extremely complicated. Otherwise the folks around here > would have churned out a fix in a month or less and made this issue a > past story. You can find very detailed discussions of this in the archives, but the basic reason is that we have a very general/extensible view of aggregates (which is how come we can support custom aggregates). An aggregate is a function that you feed all the input rows to, one at a time, and then it produces the answer. Nice, general, extensible, and not at all optimizable :-( Now in general that is the only way to do it, and so Scott's implication that we always suck compared to other databases is really an overstatement. Ask another database to do a standard deviation calculation, for instance, and it'll be just as slow. However there are special cases that other DBs can optimize that we don't even try to. The big ones are: * COUNT(*) across a whole table --- most non-MVCC databases keep tabs of the physical number of the rows in the table, and so they can answer this very quickly. Postgres doesn't keep such a count, and under MVCC rules it wouldn't necessarily be the right answer if we had it. (BTW, count of rows satisfying a particular condition is a different ballgame entirely; in most cases that can't be optimized at all, AFAIK.) If you are willing to accept approximate answers there are various tricks you can use --- see the archives --- but we don't get to fudge on COUNT(*) itself because it's in the SQL standard. * MIN or MAX of an indexed column --- most DBs can use an index scan to find such a row relatively quickly, although whether this trick works or not depends a whole lot on whether you have WHERE or GROUP BY and just what those conditions look like. You can fake the min/max answer in Postgres by doing the transformstion to an indexable query by hand, for instance instead of MAX(col) doSELECT col FROM tab ORDER BY col DESC LIMIT 1; There are periodic discussions in the hackers list about teaching the planner to do that automatically, and it will probably happen someday; but it's a complicated task and not exceedingly high on the priority list. regards, tom lane