Re: Performance gain from reduction of GROUP BY memory - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Performance gain from reduction of GROUP BY memory
Date
Msg-id 1125387424.4010.398.camel@localhost.localdomain
Whole thread Raw
In response to Re: Performance gain from reduction of GROUP BY memory allocations  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Performance gain from reduction of GROUP BY memory
List pgsql-hackers
On Mon, 2005-08-29 at 20:25 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I notice that Neil's patch regarding reducing the number of memory
> > allocations during aggregation operations isn't mentioned. It was
> > originally discussed in 8.0beta (2-3?) time.
> 
> > What happened there?
> > - patch not committed in the end
> > - committed but not mentioned, as a dropped item
> > - committed but not mentioned, since part of a larger patch
> 
> Are you speaking of these patches?

Yes, those look like the ones I mentioned.

Those seem to have a useful performance improvement?

At very least, the change in Aggregate function API should be mentioned,
no?

> 2005-04-06 19:56  neilc
> 
>     * src/backend/utils/adt/: float.c, numeric.c: Apply the "nodeAgg"
>     optimization to more of the builtin transition functions. This
>     patch optimizes int2_sum(), int4_sum(), float4_accum() and
>     float8_accum() to avoid needing to copy the transition function's
>     state for each input tuple of the aggregate. In an extreme case
>     (e.g. SELECT sum(int2_col) FROM table where table has a single
>     column), it improves performance by about 20%. For more complex
>     queries or tables with wider rows, the relative performance
>     improvement will not be as significant.
> 
> 2005-04-04 19:50  neilc
> 
>     * src/backend/utils/adt/numeric.c: This patch changes
>     int2_avg_accum() and int4_avg_accum() use the nodeAgg performance
>     hack Tom introduced recently. This means we can avoid copying the
>     transition array for each input tuple if these functions are
>     invoked as aggregate transition functions.
>     
>     To test the performance improvement, I created a 1 million row
>     table with a single int4 column. Without the patch, SELECT avg(col)
>     FROM table took about 4.2 seconds (after the data was cached); with
>     the patch, it took about 3.2 seconds. Naturally, the performance
>     improvement for a less trivial query (or a table with wider rows)
>     would be relatively smaller.
> 
> 2005-03-12 15:25  tgl
> 
>     * contrib/intagg/int_aggregate.c,
>     contrib/intagg/int_aggregate.sql.in, doc/src/sgml/xaggr.sgml,
>     doc/src/sgml/xfunc.sgml, src/backend/executor/nodeAgg.c,
>     src/backend/utils/adt/int8.c: Adjust the API for aggregate function
>     calls so that a C-coded function can tell whether it is being used
>     as an aggregate or not.  This allows such a function to avoid
>     re-pallocing a pass-by-reference transition value; normally it
>     would be unsafe for a function to scribble on an input, but in the
>     aggregate case it's safe to reuse the old transition value.  Make
>     int8inc() do this.  This gets a useful improvement in the speed of
>     COUNT(*), at least on narrow tables (it seems to be swamped by I/O
>     when the table rows are wide).    Per a discussion in early December
>     with Neil Conway.  I also fixed int_aggregate.c to check this,
>     thereby turning it into something approaching a supportable
>     technique instead of being a crude hack.

I'll search CVS directly next time. Thanks.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: SHMMAX seems entirely broken in OS X 10.4.2
Next
From: Simon Riggs
Date:
Subject: Re: Query Sampling