Christopher Browne wrote:
> > IMHO portability is an important point. People are used to MAX() and
> > COUNT(*), and will be surprised that they need some special
> > treatment. While the reasons for this are perfectly explainable,
> > speeding up these aggregates with some extra effort would make porting
> > a bit easier.
>
> The availability of cleverness with MAX()/MIN() is no grand surprise;
> it would be very nice to get some expansion of that to "SELECT VALUE
> FROM TABLE WHERE (CRITERIA) ORDER BY VALUE DESCENDING LIMIT 1;"
>
> But I'm _very_ curious as to what the anticipated treatment to collect
> COUNT() more efficiently would be. I would expect that it would only
> be able to get tuned much more if there's NO "where" clause, so that
> it could use some ("magically-kept-up-to-date") stats on table size.
>
> I don't see any way to optimize COUNT when numbers of rows can
> continually vary. Storing stats somewhere will just make updates more
> expensive. And if those stats are for the table, that doesn't help me
> if I want "COUNT(*) FROM TABLE WHERE UPDATED_ON BETWEEN NOW() - '1
> day' and NOW()".
Yes, count would only use the cached stats for non-WHERE clause
COUNT(*).
My idea is that if a transaction doing a COUNT(*) would first look to
see if there already was a visible cached value, and if not, it would do
the COUNT(*) and insert into the cache table. Any INSERT/DELETE would
remove the value from the cache. As I see it, the commit of the
INSERT/DELETE transaction would then auto-invalidate the cache at the
exact time the transaction commits. This would allow MVCC visibility of
the counts.
A trickier idea would be for INSERT/DELETE to UPDATE the cached value.
It might be possible to always have a valid cache value for COUNT(*).
(COPY would also need to update the cache.)
-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610)
359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square,
Pennsylvania19073