Re: nodeAgg perf tweak - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: nodeAgg perf tweak
Date
Msg-id 1101889526.5728.18.camel@localhost.localdomain
Whole thread Raw
In response to nodeAgg perf tweak  (Neil Conway <neilc@samurai.com>)
Responses Re: nodeAgg perf tweak
List pgsql-hackers
On Wed, 2004-12-01 at 02:48, Neil Conway wrote:
> I noticed that we have a bottleneck in aggregate performance in
> advance_transition_function(): according to callgrind, doing datumCopy()
> and pfree() for every tuple produced by the transition function is
> pretty expensive. Some queries bare this out:
> 
...

> I can reproduce this performance difference consistently. I thought this
> might have been attributable to memory checking overhead because
> assertions were enabled, but that doesn't appear to be the case (the
> above results are without --enable-cassert).

That looks like a useful performance gain, and count(*) is a weak point
on the performance roadmap. Thanks.

> The attached patch invokes the transition function in the current memory
> context. I don't think that's right: a memory leak in an aggregate's
> transition function would be problematic when we're invoked from a
> per-query memory context, as is the case with advance_aggregates().
> Perhaps we need an additional short-lived memory context in
> AggStatePerAggData: we could invoke the transition function in that
> context, and reset it once per, say, 1000 tuples.

I'd be a little twitchy about new memory contexts at this stage of beta,
but if the code is fairly well isolated that could be good.

> Alternatively we could just mandate that aggregate transition function's
> not leak memory and then invoke the transition function in, say, the
> aggregate's memory context, but that seems a little fragile.

Would it possible to differentiate between well-known builtin functions
and added transition functions? I realise nothing is that simple, but it
seems we could trust some functions more than others. The trust level
could be determined at planning time. [DB2 uses FENCED or UNFENCED
declarations for this purpose, though the exact meaning is different to
your proposal - I'm not suggesting adding external syntax though]

A performance gain on the built-ins alone would satisfy 80% of users.

-- 
Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: Adding Reply-To: to Lists configuration ...
Next
From: Neil Conway
Date:
Subject: Re: nodeAgg perf tweak