Thread: Make COUNT(*) Faster?
Hello all you PostgreSQL/SQL gurus!

I've started using PostgreSQL pretty recently, and I am quite disturbed
about the performance of a simple SELECT COUNT(*) FROM table. What should
(in my mind) be a nearly instantaneous operation instead takes nearly 700ms
in a table with only 87k rows of data! If I run an EXPLAIN on this query I
can see that it is doing a sequential scan, which seems quite needless, as
surely this information is cached in some secret location.

It is very possible that I am missing something, so I ask you: is there a
faster way to find out how many rows are in a table? I've tried doing a
COUNT(column) where I have an index on column, but it still does a
sequential scan and it is still very very slow. What are my options?

I offer you many thanks in advance,
Varun Mehta
On Thu, Jul 07, 2005 at 03:48:39PM -0700, Varun Mehta wrote:
> I've started using PostgreSQL pretty recently, and I am quite
> disturbed about the performance of a simple SELECT COUNT(*) FROM
> table. What should (in my mind) be a nearly instantaneous operation
> instead takes nearly 700ms in a table with only 87k rows of data!

Speeding up COUNT is on the developers' TODO list, but it's not as simple
as it might seem or it would have been done already. This has been brought
up many times over the years -- search the archives to see past discussion.
Words to search for include "MVCC," "index," and "visibility."

> If I run an EXPLAIN on this query I can see that it is doing a
> sequential scan, which seems quite needless, as surely this
> information is cached in some secret location.

If an estimate will suffice then you could use the table's
pg_class.reltuples value, but beware that it can be rather out of date.

http://www.postgresql.org/docs/8.0/static/catalog-pg-class.html

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
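A minimal sketch of the reltuples lookup Michael describes, assuming a
table named mytable (the name is just a placeholder):

    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'mytable';  -- estimate, refreshed by VACUUM/ANALYZE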
vmehta@apple.com (Varun Mehta) writes:
> If I run an EXPLAIN on this query I can see that it is doing a
> sequential scan, which seems quite needless, as surely this
> information is cached in some secret location.

That would in fact surely *NOT* be the case.

If you have multiple users performing updates on that table concurrently,
with the possibility of some of those updates rolling back, then it doesn't
make sense for there to be any such "one place" where a count would be
stored.

Consider the case where you ask for COUNT(*) while the following set of
transactions are outstanding:

1. A transaction which, as it turns out, will get rolled back, and which
   has inserted 40 tuples;

2. A transaction which has modified 10 tuples, thereby generating 10 dead
   tuples and adding 10 new ones;

3. 14 transactions, each of which has added 2 tuples to the table.

None of those transactions have COMMITted, so there are some 78 tuples
"in limbo" spread across 16 transactions.

If there were some "single secret place" with a count, how would you
suggest it address those 78 tuples and 16 transactions that aren't yet
(and maybe never will be) part of the count?

> It is very possible that I am missing something, so I ask you: is
> there a faster way to find out how many rows are in a table? I've
> tried doing a COUNT(column) where I have an index on column, but it
> still does a sequential scan and it is still very very slow. What
> are my options?

Use of the index doesn't help because the index isn't forcibly up to date.
It has no notion of marking "index tuples" as dead/not visible; visibility
information is only attached to the tuples themselves. Look up "MVCC" for
more details...
--
(format nil "~S@~S" "cbbrowne" "acm.org") http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78. "I will not tell my Legions of Terror
"And he must be taken alive!" The command will be: ``And try to take him
alive if it is reasonably practical.''" <http://www.eviloverlord.com/>
Chris Browne <cbbrowne@acm.org> writes:
> vmehta@apple.com (Varun Mehta) writes:
>> If I run an EXPLAIN on this query I can see that it is doing a
>> sequential scan, which seems quite needless, as surely this
>> information is cached in some secret location.
> [ example scenario snipped ]
> If there were some "single secret place" with a count, how would you
> suggest it address those 78 tuples and 16 transactions that aren't yet
> (and maybe never will be) part of the count?

It's worse than that: once some of those transactions have committed, the
right answer is observer-dependent, since some onlooker transactions may
see those guys as committed while others think they are not yet committed.
So there could certainly not be just one secret place...

There are solutions suggested in the archives, but they all amount to
making COUNT(*)-with-no-WHERE-or-GROUP-BY-clause fast at the price of
nontrivial distributed overhead for all updates --- overhead that would be
paid whether or not the application ever did such a COUNT. That's not a
tradeoff we've wanted to make in general. You can implement it yourself
via triggers for specific tables that you think it's worth doing for.

Also, if an approximate answer is good enough, there are a whole other set
of possible solutions.

			regards, tom lane
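A minimal sketch of the per-table trigger approach Tom mentions, with
placeholder names throughout (note that the single-row counter table
serializes concurrent writers, which is part of the overhead he describes):

    CREATE TABLE mytable_rowcount (n bigint NOT NULL);
    INSERT INTO mytable_rowcount SELECT count(*) FROM mytable;

    CREATE FUNCTION mytable_count_trig() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE mytable_rowcount SET n = n + 1;
        ELSE  -- DELETE
            UPDATE mytable_rowcount SET n = n - 1;
        END IF;
        RETURN NULL;  -- AFTER trigger; return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_count AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

    -- Exact count (for your transaction) without scanning mytable:
    SELECT n FROM mytable_rowcount;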
Chris Browne wrote:
> None of those transactions have COMMITted, so there are some 78 tuples
> "in limbo" spread across 16 transactions.
>
> If there were some "single secret place" with a count, how would you
> suggest it address those 78 tuples and 16 transactions that aren't yet
> (and maybe never will be) part of the count?

Hmmm, I understand this and don't doubt it, but out of curiosity, how does
the current SELECT COUNT(*) handle this? It doesn't lock the entire table
while counting (I assume), so the current implementation is really just an
approximate count in the above scenario anyway. Or even when not, since
the true 'count' is likely to have changed by the time the user does
anything with the result of SELECT COUNT(*) on any active table (and on an
inactive table, pg_class.reltuples is nearly as good as SELECT COUNT(*)
and far faster to get to).

I assume this has been beaten well past death, but I don't see why it
wouldn't be possible to keep pg_class.reltuples a bit more up-to-date
instead of updating it only on vacuums.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
On Fri, Jul 08, 2005 at 07:12:26 -0700,
  Steve Wampler <swampler@noao.edu> wrote:
> Hmmm, I understand this and don't doubt it, but out of curiosity, how
> does the current SELECT COUNT(*) handle this? It doesn't lock the entire

It only counts tuples visible to the current transaction.

> table while counting (I assume) so the current implementation is really
> just an approximate count in the above scenario anyway. Or even when

No, it is an exact count.

> not, since the true 'count' is likely to have changed by the time the

There is no single true count. There is a separate true count for each
transaction.

> user does anything with the result of SELECT COUNT(*) on any active table
> (and on an inactive table, pg_class.reltuples is nearly as good as
> SELECT COUNT(*) and far faster to get to.)
>
> I assume this has been beaten well past death, but I don't see why it
> wouldn't be possible to keep pg_class.reltuples a bit more up-to-date
> instead of updating it only on vacuums.

Because it costs resources to keep track of that, and people don't usually
need exact tuple counts for whole tables. Those that do and are willing to
pay the price can use triggers to maintain a count in a separate table.
Bruno Wolff III wrote:
> No, it is an exact count.

Yes, for the transaction, but it's an approximation of the number of
tuples in the table - which is probably what the people who worry about
its cost are more interested in (an approximate count for the table).
I'm also claiming that a true count for any active table is meaningless
and am *not* suggesting that effort be spent on trying to produce such a
true count.

>> I assume this has been beaten well past death, but I don't see why it
>> wouldn't be possible to keep pg_class.reltuples a bit more up-to-date
>> instead of updating it only on vacuums.
>
> Because it costs resources to keep track of that and people don't
> usually need exact tuple counts for whole tables.

Yes, we agree completely! (Which is why I said 'a bit more' instead of
'exactly' above.) My uses for COUNT(*) are to get 'reasonable' approximate
counts of the table sizes - not true counts, but approximate values.
Unfortunately, pg_class.reltuples gets too far off too fast for me to use
it as a consistent guide to current table size.

If you Folks Who Know believe that simply keeping pg_class.reltuples
'closer' to the actual table size is too expensive, I'll accept that
[after all, I have to right now anyway], but I'm surprised that it is,
given all the other work that must go on at the start/close of a
transaction.

I also understand that 'reasonable' and 'closer' are vague terms. In the
example scenario where there were around 80 rows in an indeterminate
state, my claim is that, in a table of around a million rows, it doesn't
matter whether some portion of those indeterminate rows are included in an
approximation of the table size or not (though it might in a table of 100
'true' rows - but the decision to ask for a true 'transaction' count
(slow) or an approximate table size (fast) should be left to the user in
either case).

So, leave COUNT(*) alone. But it would be very handy to have a way to get
an approximate table size that is more accurate than is provided by a
pg_class.reltuples that is only updated on vacuums.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
> So, leave COUNT(*) alone. But it would be very handy to have a
> way to get an approximate table size that is more accurate than is
> provided by a pg_class.reltuples that is only updated on vacuums.

Create 2 sequences, one for counting tuple additions and one for counting
tuple deletions.

When you INSERT a tuple, bump the "added" sequence (select nextval());

When you DELETE a tuple, bump the "deleted" sequence (select nextval());

To retrieve an approximate count, take the current value of both sequences
(select directly -- don't use currval) and subtract the "deletes" from the
"adds".

This is a very fast tracking mechanism with the catch that it does not
handle rollbacks -- but you only wanted approximate. Put all of the logic
inside a pair of triggers and a function within the DB.
--
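A rough sketch of that approach (names are made up; the counters track
activity since the triggers were installed, so seed them from an initial
COUNT(*) if you want an absolute figure rather than a delta):

    CREATE SEQUENCE mytable_added;
    CREATE SEQUENCE mytable_deleted;

    CREATE FUNCTION mytable_track_trig() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            PERFORM nextval('mytable_added');
        ELSE  -- DELETE
            PERFORM nextval('mytable_deleted');
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_track AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE mytable_track_trig();

    -- Approximate count: read both sequences directly, don't use currval().
    SELECT a.last_value - d.last_value AS approx_count
    FROM mytable_added a, mytable_deleted d;

The appeal over a single counter row is that nextval() neither blocks
concurrent writers nor rolls back -- which is also exactly why the result
is only approximate.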
On 7/8/05, Steve Wampler <swampler@noao.edu> wrote:
> > None of those transactions have COMMITted, so there are some 78 tuples
> > "in limbo" spread across 16 transactions.
> >
> > If there were some "single secret place" with a count, how would you
> > suggest it address those 78 tuples and 16 transactions that aren't yet
> > (and maybe never will be) part of the count?
>
> Hmmm, I understand this and don't doubt it, but out of curiosity, how
> does the current SELECT COUNT(*) handle this? It doesn't lock the entire
> table while counting (I assume) so the current implementation is really
> just an approximate count in the above scenario anyway. Or even when
> not, since the true 'count' is likely to have changed by the time the
> user does anything with the result of SELECT COUNT(*) on any active table
> (and on an inactive table, pg_class.reltuples is nearly as good as
> SELECT COUNT(*) and far faster to get to.)
>
> I assume this has been beaten well past death, but I don't see why it
> wouldn't be possible to keep pg_class.reltuples a bit more up-to-date
> instead of updating it only on vacuums.

Use

EXPLAIN SELECT * FROM yourcountedtable;

Planner seems to track estimated statistics on-the-fly. :) You can even
wrap EXPLAIN SELECT in a pgsql function if you need it.

Regards,
Dawid

PS: And be aware that these are 'statistics'. And the statement that there
are lies, big lies and statistics is sometimes true even for PostgreSQL. ;-)
On Fri, Jul 08, 2005 at 08:07:27AM -0700, Steve Wampler wrote:
> Bruno Wolff III wrote:
> > No, it is an exact count.
>
> Yes, for the transaction, but it's an approximation of the number of
> tuples in the table - which is probably what the people who worry about
> its cost are more interested in (an approximate count for the table).

You seem to have the wrong idea of "tuples in the table" here. You need to
think harder about MVCC visibility rules. Given MVCC, there isn't really a
"view from nowhere" in the system -- there's just the question of which
tuples are visible to which transaction.

For a little more, you might want to look at the presentation Tom Lane
made on this:

http://www.postgresql.org/files/developer/transactions.pdf

A

--
Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are against
all taxes for raising money to pay it off.
		--Alexander Hamilton
On 7/8/05, Rod Taylor <pg@rbt.ca> wrote:
> Create 2 sequences, one for counting tuple additions and one for
> counting tuple deletions.
>
> When you INSERT a tuple, bump the "added" sequence (select nextval());
>
> When you DELETE a tuple, bump the "deleted" sequence (select nextval());
>
> To retrieve an approximate count, take the current value of both
> sequences (select directly -- don't use currval) and subtract the
> "deletes" from the "adds".

Never thought of that! Good idea. :-)

Regards,
Dawid

PS: There aren't any on ROLLBACK triggers, right? ;-)))
Steve Wampler <swampler@noao.edu> writes:
> So, leave COUNT(*) alone. But it would be very handy to have a
> way to get an approximate table size that is more accurate than is
> provided by a pg_class.reltuples that is only updated on vacuums.

If you want something cheap, you could use the same technique the planner
uses nowadays: take RelationGetNumberOfBlocks() (which is guaranteed
accurate) and multiply by reltuples/relpages. I don't see anyplace where
RelationGetNumberOfBlocks is directly exposed to users now, but it'd be
trivial to code up a couple of C functions to provide this functionality.

			regards, tom lane
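The same arithmetic can be sketched in plain SQL on releases that expose
the table's on-disk size from SQL (pg_relation_size() arrived later than
the 8.0 series under discussion; mytable is a placeholder):

    SELECT (pg_relation_size(c.oid) / current_setting('block_size')::bigint)
           * c.reltuples / c.relpages AS approx_rows
    FROM pg_class c
    WHERE c.relname = 'mytable'
      AND c.relpages > 0;  -- guard against division by zero on empty tables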
On Fri, 2005-07-08 at 17:34 +0200, Dawid Kuroczko wrote:
> On 7/8/05, Rod Taylor <pg@rbt.ca> wrote:
> > Create 2 sequences, one for counting tuple additions and one for
> > counting tuple deletions.
> >
> > When you INSERT a tuple, bump the "added" sequence (select nextval());
> >
> > When you DELETE a tuple, bump the "deleted" sequence (select nextval());
> >
> > To retrieve an approximate count, take the current value of both
> > sequences (select directly -- don't use currval) and subtract the
> > "deletes" from the "adds".
>
> Never thought of that! Good idea. :-)
>
> Regards,
> Dawid
>
> PS: There aren't any on ROLLBACK triggers, right? ;-)))

No. You could make them RI triggers and have them deferred until commit,
though.
--
Tom Lane wrote:
> Steve Wampler <swampler@noao.edu> writes:
>> So, leave COUNT(*) alone. But it would be very handy to have a
>> way to get an approximate table size that is more accurate than is
>> provided by a pg_class.reltuples that is only updated on vacuums.
>
> If you want something cheap, you could use the same technique the
> planner uses nowadays: take RelationGetNumberOfBlocks() (which is
> guaranteed accurate) and multiply by reltuples/relpages. I don't
> see anyplace where RelationGetNumberOfBlocks is directly exposed to
> users now, but it'd be trivial to code up a couple of C functions to
> provide this functionality.

Yes - this would be an excellent approximation for my needs!

The solution that Dawid Kuroczko suggested (just call "explain select *
on ..." and parse the result) would be equivalent these days, right? (I
think in the 7.x versions the planner was just using pg_class.reltuples,
which wouldn't have helped.) If true, I can handle that parsing myself
easily enough without exposing RelationGetNumberOfBlocks.

Thanks (Tom and Dawid)!

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
> which wouldn't have helped.) If true, I can handle that parsing myself
> easily enough without exposing RelationGetNumberOfBlocks.

Is there a way to get EXPLAIN results in a non-text-formatted way for
easier use? I'm asking because it seems the feature set grows by the
minute in Postgres nowadays...
Steve Wampler <swampler@noao.edu> writes:
> Tom Lane wrote:
>> If you want something cheap, you could use the same technique the
>> planner uses nowadays: take RelationGetNumberOfBlocks() (which is
>> guaranteed accurate) and multiply by reltuples/relpages.

> Yes - this would be an excellent approximation for my needs! The
> solution that Dawid Kuroczko suggested (just call "explain select *
> on ..." and parse the result) would be equivalent these days, right?

Close enough (the planner actually does some additional heuristic stuff
to avoid going crazy on corner cases).

			regards, tom lane
> I'm also claiming that a true count for any active table is
> meaningless and am *not* suggesting that effort be spent on trying
> to produce such a true count.

That's a pretty big assumption that would in fact be WRONG.

We have managers interested in counting the number of objects we have
around (As a domain registry, what objects would you imagine those might
be :-)), and they're keen on possibly even being able to reconcile those
counts from day to day based on transaction activity.

Leaping into some sort of vague guesstimation would destroy the ability
to do any kind of analysis of activity, and I daresay enrage them.

There may be times that a really rough guess can suffice; there are other
times when exactness is absolutely vital. Creating a "fast but WRONG
COUNT(*)" which prevented getting the exact answer that the present
implementation provides would be a severe misfeature.
--
output = reverse("gro.gultn" "@" "enworbbc")
http://linuxdatabases.info/info/rdbms.html
"The test of a principle is whether it applies even to people you don't
like." -- Henry Spencer
Christopher Browne wrote:
>> I'm also claiming that a true count for any active table is
>> meaningless and am *not* suggesting that effort be spent on trying
>> to produce such a true count.
>
> That's a pretty big assumption that would in fact be WRONG.

Please reread the message from Bruno and reconcile the above statement
with his assertion (which I believe) that there is *no* single true count
for an active table. [I'm defining 'active' as currently undergoing
insert/copy/delete/update actions.]

> We have managers interested in counting the number of objects we have
> around (As a domain registry, what objects would you imagine those
> might be :-)), and they're keen on possibly even being able to
> reconcile those counts from day to day based on transaction activity.

If Bruno is correct, then they need to do this reconciliation from within
a single transaction (the same one that does the COUNT(*)) - or else they
are working on an 'inactive' table [one not currently accepting changes].
If neither condition holds, then isn't the result they are currently
getting from COUNT(*) *already* an approximation?

> Leaping into some sort of vague guesstimation would destroy the
> ability to do any kind of analysis of activity, and I daresay enrage
> them.

No doubt! Let's hope the above conditions hold.

> There may be times that a really rough guess can suffice; there are
> other times when exactness is absolutely vital.

But, as others have said, COUNT(*) does not return a true count for a
table, but rather just a true count for the *current transaction*. So
COUNT(*)'s from different simultaneous transactions may very well produce
different values.

> Creating a "fast but WRONG COUNT(*)" which prevented getting the exact
> answer that the present implementation provides would be a severe
> misfeature.

Agreed - note that I did not suggest replacing the current COUNT(*) with
an inexact version, but wanted (and now have) a quick way to get a
reasonable approximation of the current table size.

--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
On Fri, Jul 08, 2005 at 06:08:03PM +0200, PFC wrote:
> > which wouldn't have helped.) If true, I can handle that parsing myself
> > easily enough without exposing RelationGetNumberOfBlocks.
>
> Is there a way to get EXPLAIN results in a non-text-formatted way
> for easier use?
> I'm asking because it seems the feature set grows by the minute in
> Postgres nowadays...

Not yet, but a friend of mine was working on a patch to make it output in
a custom XML format, to be able to create a tool similar to Red Hat's
Visual Explain. I expect he will show up in pgsql-hackers sometime ...

In Spanish: http://www.ubiobio.cl/~gpoo/weblog/archives/000397.html

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program will work
forever" (Oliver Silfridge)
In article <758d5e7f05070808305c049aae@mail.gmail.com>,
Dawid Kuroczko <qnex42@gmail.com> writes:

> Use
>
> EXPLAIN SELECT * FROM yourcountedtable;
>
> Planner seems to track estimated statistics on-the-fly. :)
> You can even wrap EXPLAIN SELECT in a pgsql function if you
> need it.

Do you know how to do that? A function "approx_count(tablename)" would be
really handy, but

  FOR row IN EXECUTE 'EXPLAIN SELECT * FROM ' || tbl LOOP

fails with the following message:

  ERROR:  cannot open non-SELECT query as cursor

> PS: And be aware that these are 'statistics'. And the statement that there
> are lies, big lies and statistics is sometimes true even for PostgreSQL. ;-)

Does this mean that PostgreSQL believes only in statistics it has faked
itself? ;-)
Harald Fuchs <hf0614x@protecting.net> writes:
> FOR row IN EXECUTE 'EXPLAIN SELECT * FROM ' || tbl LOOP
> fails with the following message:
> ERROR:  cannot open non-SELECT query as cursor

[ checks CVS history... ] Use 8.0.2 or later.

			regards, tom lane
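A rough sketch of such an approx_count() function, on 8.0.2 or later per
Tom's note (the function name, variable names, and regexp are just
illustrative):

    CREATE FUNCTION approx_count(tbl text) RETURNS bigint AS $$
    DECLARE
        rec record;
        est bigint;
    BEGIN
        -- The first EXPLAIN line is the top plan node, which carries the
        -- planner's total row estimate.
        FOR rec IN EXECUTE 'EXPLAIN SELECT * FROM ' || tbl LOOP
            est := substring(rec."QUERY PLAN" FROM 'rows=([0-9]+)')::bigint;
            EXIT WHEN est IS NOT NULL;
        END LOOP;
        RETURN est;
    END;
    $$ LANGUAGE plpgsql;

    SELECT approx_count('mytable');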