Re: Much Ado About COUNT(*) - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Much Ado About COUNT(*)
Date
Msg-id 1106170367.2886.623.camel@jeff
Whole thread Raw
In response to Re: Much Ado About COUNT(*)  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
Responses Re: Much Ado About COUNT(*)
List pgsql-hackers
To fill in some details I think what he's saying is this:

=> create table foo(...);
=> create table foo_count(num int);
=> insert into foo_count values(0);
=> create table foo_change(num int);

then create a trigger "after delete on foo" that does "insert into
foo_change values(-1)" and a trigger "after insert on foo" that inserts
a +1 into foo_change.

Periodically, do:
=> begin;
=> set transaction isolation level serializable;
=> update foo_count set num=num+(select sum(num) from foo_change);
=> delete from foo_change;
=> commit;
=> VACUUM;

And then any time you need the correct count(*) value, do instead:
=> select sum(num) from (select num from foo_count union select num from
foo_change);

And that should work. I haven't tested this exact example, so I may have
overlooked something.

Hope that helps. That way, you don't have huge waste from the second
table, and also triggers maintain it for you and you don't need to think
about it.

Regards,Jeff Davis

On Wed, 2005-01-19 at 17:40 -0300, Alvaro Herrera wrote:
> On Wed, Jan 19, 2005 at 10:16:38AM -0600, Bruno Wolff III wrote:
> > On Wed, Jan 19, 2005 at 14:59:17 -0000,
> >   Mark Cave-Ayland <m.cave-ayland@webbased.co.uk> wrote:
> >
> > > So then I would use SELECT COUNT(*) FROM person_count whenever I wanted to
> > > know the current number of person records. How much quicker would a COUNT(*)
> > > be if visibility were included in the indices as opposed to a "hacked"
> > > approach like this?
> > 
> > You are only going to get a constant factor speed up unless the space savings
> > allows much better use of cache. You probably want to look at using
> > triggers to maintain counts in another table.
> 
> I'd try using a "start value" and a differences list.  So the
> differences list would be initially empty and the start value would be
> 0.  On insert or delete, you create a new difference (with +1 or
> whatever).  Periodically, the differences would be added to the start
> value and the records deleted.  Thus the time to calculate the total
> count is much smaller, and it follows MVCC rules.  Of course there are
> lots of minor details not mentioned here.
> 
> Not sure if I'd model this with a single table or two.
> 



pgsql-hackers by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Re: Two-phase commit for 8.1
Next
From: Neil Conway
Date:
Subject: Re: Caching of frequently used objects