Tom Lane wrote:
> Sean Chittenden <sean@chittenden.org> writes:
> > Now, there are some obvious problems:
>
> You missed the real reason why this will never happen: it completely
> kills any prospect of concurrent updates. If transaction A has issued
> an update on some row, and gone and modified the relevant aggregate
> cache entries, what happens when transaction B wants to update another
> row? It has to wait for A to commit or not, so it knows whether to
> believe A's changes to the aggregate cache entries.
>
> For some aggregates you could imagine an 'undo' operator to allow
> A's updates to be retroactively removed even after B has applied its
> changes. But that doesn't work very well in general. And in any case,
> you'd have to provide serialization interlocks on physical access to
> each of the aggregate cache entries. That bottleneck applied to every
> update would be likely to negate any possible benefit from using the
> cached values.
Hmm...any chance, then, of giving aggregate functions a means of
asking which table(s) and column(s) the original query referred to so
that it could do proper optimization on its own? For instance, for a
"SELECT min(x) FROM mytable" query, the min() function would be told
upon asking that it's operating on column x of mytable, whereas it
would be told "undefined" for the column if the query were "SELECT
min(x+y) FROM mytable". In the former case, it would be able to do a
"SELECT x FROM mytable ORDER BY x LIMIT 1" on its own, whereas in the
latter it would have no choice but to fetch the data to do its
calculation via the normal means.
But that may be more trouble than it's worth, if aggregate functions
aren't responsible for retrieving the values they're supposed to base
their computations on, or if it's not possible to get the system to
refrain from prefetching data for the aggregate function until the
function asks for it.
--
Kevin Brown kevin@sysexperts.com