Re: count(*) performance improvement ideas - Mailing list pgsql-hackers

From Mark Mielke
Subject Re: count(*) performance improvement ideas
Date
Msg-id 47D7FA4A.2060109@mark.mielke.cc
Whole thread Raw
In response to count(*) performance improvement ideas  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Responses Re: count(*) performance improvement ideas
List pgsql-hackers
Pavan Deolasee wrote:
> I am reading discussion about improving count(*) performance. I have
> also seen a TODO for this.
>
> Many people have suggested TRIGGER based solution to the slow count(*)
> problem. I looked at the following link which presents the solution
> neatly.
>
> http://www.varlena.com/GeneralBits/120.php
>
> But how does that really work for SERIALIZABLE transactions ? If
> two concurrent transactions INSERT/DELETE rows from a table,
> the trigger execution of one of the transactions is bound to fail
> because of concurrent access. Even for READ COMMITTED transactions,
> the trigger execution would wait if the other transaction has executed
> the trigger on the same table. Well, I think the READ COMMITTED case
> can be handled with DEFERRED triggers, but that may require queuing up
> too many triggers if there are many inserts/deletes in a transaction.
>
> Running trigger for every insert/delete seems too expensive. I wonder
> if we can have a separate "counter" table (as suggested in the TRIGGER
> based solution) and track total number of tuples inserted and deleted
> in a transaction (and all the committed subtransactions). We then
> execute a single UPDATE at the end of the transaction. With HOT,
> updating the "counter" table should not be a big pain since all these
> updates can potentially be HOT updates. Also since the update of
> the "counter" table happens at the commit time, other transactions
> inserting/deleting from the same user table may need to wait for a
> very small period on the "counter" table tuple.
>
> This still doesn't solve the serializable transaction problem
> though. But I am sure we can figure out some solution for that case
> as well if we agree on the general approach.
>
> I am sure this must have been discussed before. So what are the
> objections 

If you are talking about automatically doing this for every table - I 
have an objection that the performance impact seems unwarranted against 
the gain. We are still talking about every insert or update updating 
some counter table, with the only mitigating factor being that the 
trigger would be coded deeper into PostgreSQL theoretically making it 
cheaper?

You can already today create a trigger on insert that will append to a 
summary table of some sort, whose only purpose is to maintain counts. At 
the simplest, it is as somebody else suggested where you might have the 
other table only store the primary keys with foreign key references back 
to the main table for handling deletes and updates. Storing transaction 
numbers and such might allow the data to be reduced in terms of size 
(and therefore elapsed time to scan), but it seems complex.

If this really is a problem that must be solved - I prefer the 
suggestion from the past of keeping track of live rows per block for a 
certain transaction range, and any that fall within this range can check 
off this block quickly with an exact count, then the exceptional blocks 
(the ones being changed) can be scanned to be sure. But, it's still 
pretty complicated to implement right and maintain, for what is probably 
limited gain. I don't personally buy into the need to do exact count(*) 
on a whole table quickly. I know people ask for it - but I find these 
same people either confused, or trying to use this functionality to 
accomplish some other end, under the assumption that because they can 
get counts faster from other databases, therefore PostgreSQL should do 
it as well. I sometimes wonder whether these people would even notice if 
PostgreSQL translated count(*) on the whole table to query reltuples. :-)

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: TODO-list on wiki (was: TODO update about SQLSTATE to PGconn)
Next
From: "Dave Page"
Date:
Subject: Re: Re: TODO-list on wiki (was: TODO update about SQLSTATE to PGconn)