Re: count(*) performance improvement ideas - Mailing list pgsql-hackers
| From | Mark Mielke |
|---|---|
| Subject | Re: count(*) performance improvement ideas |
| Date | |
| Msg-id | 47D7FA4A.2060109@mark.mielke.cc |
| In response to | count(*) performance improvement ideas ("Pavan Deolasee" <pavan.deolasee@gmail.com>) |
| Responses | Re: count(*) performance improvement ideas |
| List | pgsql-hackers |
Pavan Deolasee wrote:
> I am reading discussion about improving count(*) performance. I have
> also seen a TODO for this.
>
> Many people have suggested a TRIGGER based solution to the slow count(*)
> problem. I looked at the following link which presents the solution
> neatly.
>
> http://www.varlena.com/GeneralBits/120.php
>
> But how does that really work for SERIALIZABLE transactions? If
> two concurrent transactions INSERT/DELETE rows from a table,
> the trigger execution of one of the transactions is bound to fail
> because of concurrent access. Even for READ COMMITTED transactions,
> the trigger execution would wait if the other transaction has executed
> the trigger on the same table. Well, I think the READ COMMITTED case
> can be handled with DEFERRED triggers, but that may require queuing up
> too many triggers if there are many inserts/deletes in a transaction.
>
> Running a trigger for every insert/delete seems too expensive. I wonder
> if we can have a separate "counter" table (as suggested in the TRIGGER
> based solution) and track the total number of tuples inserted and deleted
> in a transaction (and all the committed subtransactions). We then
> execute a single UPDATE at the end of the transaction. With HOT,
> updating the "counter" table should not be a big pain since all these
> updates can potentially be HOT updates. Also, since the update of
> the "counter" table happens at commit time, other transactions
> inserting/deleting from the same user table may need to wait for a
> very small period on the "counter" table tuple.
>
> This still doesn't solve the serializable transaction problem,
> though. But I am sure we can figure out some solution for that case
> as well if we agree on the general approach.
>
> I am sure this must have been discussed before. So what are the
> objections?

If you are talking about automatically doing this for every table, I have an objection that the performance impact seems unwarranted against the gain. We are still talking about every insert or update updating some counter table, with the only mitigating factor being that the trigger would be coded deeper into PostgreSQL, theoretically making it cheaper?

You can already, today, create a trigger on insert that will append to a summary table of some sort, whose only purpose is to maintain counts (a rough sketch follows below). At the simplest, it is as somebody else suggested, where you might have the other table store only the primary keys, with foreign key references back to the main table for handling deletes and updates. Storing transaction numbers and such might allow the data to be reduced in size (and therefore in elapsed time to scan), but it seems complex.

If this really is a problem that must be solved, I prefer the suggestion from the past of keeping track of live rows per block for a certain transaction range: any transaction that falls within this range can check off such a block quickly with an exact count, and then the exceptional blocks (the ones being changed) can be scanned to be sure. But it's still pretty complicated to implement right and maintain, for what is probably limited gain.

I don't personally buy into the need to do an exact count(*) on a whole table quickly. I know people ask for it, but I find these same people either confused, or trying to use this functionality to accomplish some other end, under the assumption that because they can get counts faster from other databases, PostgreSQL should therefore do it as well.
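(Editorial aside: for readers who have not followed the varlena link above, the following is a minimal sketch of the kind of trigger-maintained counter table under discussion. It is not the GeneralBits recipe verbatim, and the table, function, and trigger names are hypothetical.)

```sql
-- Hypothetical names throughout; a sketch of the per-row counter-table
-- trigger approach, not the GeneralBits recipe verbatim.
CREATE TABLE row_counts (
    table_name  text PRIMARY KEY,
    n_rows      bigint NOT NULL
);

CREATE OR REPLACE FUNCTION count_rows() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n_rows = n_rows + 1
         WHERE table_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE row_counts SET n_rows = n_rows - 1
         WHERE table_name = TG_TABLE_NAME;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

-- Seed the counter once, then attach the trigger to the table of interest.
INSERT INTO row_counts VALUES ('mytable', (SELECT count(*) FROM mytable));

CREATE TRIGGER mytable_count
    AFTER INSERT OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE count_rows();

-- The "fast" count is then a single-row lookup:
SELECT n_rows FROM row_counts WHERE table_name = 'mytable';
```

Note that every writer now contends on the same row_counts row, which is exactly the SERIALIZABLE failure and lock-wait behaviour Pavan describes above.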
I sometimes wonder whether these people would even notice if PostgreSQL translated count(*) on the whole table into a query against reltuples. :-)

Cheers,

mark

--
Mark Mielke <mark@mielke.cc>
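(Editorial aside: the reltuples figure mentioned above is the planner's row-count estimate kept in the pg_class catalog and refreshed by VACUUM and ANALYZE, so it is approximate rather than transactionally exact. Reading it is a single-row lookup; the table name below is hypothetical.)

```sql
-- Approximate row count from the catalog; 'mytable' is a hypothetical name.
SELECT reltuples
FROM pg_class
WHERE relname = 'mytable';
```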