Home > mailing lists

Re: Much Ado About COUNT(*) - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Much Ado About COUNT(*)
Date	January 19, 2005 21:33:17
Msg-id	1106170367.2886.623.camel@jeff Whole thread
In response to	Re: Much Ado About COUNT(*) (Alvaro Herrera <alvherre@dcc.uchile.cl>)
Responses	Re: Much Ado About COUNT(*)
List	pgsql-hackers

Tree view

To fill in some details I think what he's saying is this:

=> create table foo(...);
=> create table foo_count(num int);
=> insert into foo_count values(0);
=> create table foo_change(num int);

then create a trigger "after delete on foo" that does "insert into
foo_change values(-1)" and a trigger "after insert on foo" that inserts
a +1 into foo_change.

Periodically, do:
=> begin;
=> set transaction isolation level serializable;
=> update foo_count set num=num+(select sum(num) from foo_change);
=> delete from foo_change;
=> commit;
=> VACUUM;

And then any time you need the correct count(*) value, do instead:
=> select sum(num) from (select num from foo_count union select num from
foo_change);

And that should work. I haven't tested this exact example, so I may have
overlooked something.

Hope that helps. That way, you don't have huge waste from the second
table, and also triggers maintain it for you and you don't need to think
about it.

Regards,Jeff Davis

On Wed, 2005-01-19 at 17:40 -0300, Alvaro Herrera wrote:
> On Wed, Jan 19, 2005 at 10:16:38AM -0600, Bruno Wolff III wrote:
> > On Wed, Jan 19, 2005 at 14:59:17 -0000,
> >   Mark Cave-Ayland <m.cave-ayland@webbased.co.uk> wrote:
> >
> > > So then I would use SELECT COUNT(*) FROM person_count whenever I wanted to
> > > know the current number of person records. How much quicker would a COUNT(*)
> > > be if visibility were included in the indices as opposed to a "hacked"
> > > approach like this?
> > 
> > You are only going to get a constant factor speed up unless the space savings
> > allows much better use of cache. You probably want to look at using
> > triggers to maintain counts in another table.
> 
> I'd try using a "start value" and a differences list.  So the
> differences list would be initially empty and the start value would be
> 0.  On insert or delete, you create a new difference (with +1 or
> whatever).  Periodically, the differences would be added to the start
> value and the records deleted.  Thus the time to calculate the total
> count is much smaller, and it follows MVCC rules.  Of course there are
> lots of minor details not mentioned here.
> 
> Not sure if I'd model this with a single table or two.
>

pgsql-hackers by date:

From: "Marc G. Fournier"
Date: 19 January 2005, 20:59:09
Subject: Re: Two-phase commit for 8.1

From: Neil Conway
Date: 19 January 2005, 23:01:50
Subject: Re: Caching of frequently used objects

Re: Much Ado About COUNT(*) - Mailing list pgsql-hackers

Previous

Next