Re: Performance gain from reduction of GROUP BY memory - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Performance gain from reduction of GROUP BY memory |
Date | |
Msg-id | 1125387424.4010.398.camel@localhost.localdomain Whole thread Raw |
In response to | Re: Performance gain from reduction of GROUP BY memory allocations (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Performance gain from reduction of GROUP BY memory
|
List | pgsql-hackers |
On Mon, 2005-08-29 at 20:25 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > I notice that Neil's patch regarding reducing the number of memory > > allocations during aggregation operations isn't mentioned. It was > > originally discussed in 8.0beta (2-3?) time. > > > What happened there? > > - patch not committed in the end > > - committed but not mentioned, as a dropped item > > - committed but not mentioned, since part of a larger patch > > Are you speaking of these patches? Yes, those look like the ones I mentioned. Those seem to have a useful performance improvement? At very least, the change in Aggregate function API should be mentioned, no? > 2005-04-06 19:56 neilc > > * src/backend/utils/adt/: float.c, numeric.c: Apply the "nodeAgg" > optimization to more of the builtin transition functions. This > patch optimizes int2_sum(), int4_sum(), float4_accum() and > float8_accum() to avoid needing to copy the transition function's > state for each input tuple of the aggregate. In an extreme case > (e.g. SELECT sum(int2_col) FROM table where table has a single > column), it improves performance by about 20%. For more complex > queries or tables with wider rows, the relative performance > improvement will not be as significant. > > 2005-04-04 19:50 neilc > > * src/backend/utils/adt/numeric.c: This patch changes > int2_avg_accum() and int4_avg_accum() use the nodeAgg performance > hack Tom introduced recently. This means we can avoid copying the > transition array for each input tuple if these functions are > invoked as aggregate transition functions. > > To test the performance improvement, I created a 1 million row > table with a single int4 column. Without the patch, SELECT avg(col) > FROM table took about 4.2 seconds (after the data was cached); with > the patch, it took about 3.2 seconds. Naturally, the performance > improvement for a less trivial query (or a table with wider rows) > would be relatively smaller. > > 2005-03-12 15:25 tgl > > * contrib/intagg/int_aggregate.c, > contrib/intagg/int_aggregate.sql.in, doc/src/sgml/xaggr.sgml, > doc/src/sgml/xfunc.sgml, src/backend/executor/nodeAgg.c, > src/backend/utils/adt/int8.c: Adjust the API for aggregate function > calls so that a C-coded function can tell whether it is being used > as an aggregate or not. This allows such a function to avoid > re-pallocing a pass-by-reference transition value; normally it > would be unsafe for a function to scribble on an input, but in the > aggregate case it's safe to reuse the old transition value. Make > int8inc() do this. This gets a useful improvement in the speed of > COUNT(*), at least on narrow tables (it seems to be swamped by I/O > when the table rows are wide). Per a discussion in early December > with Neil Conway. I also fixed int_aggregate.c to check this, > thereby turning it into something approaching a supportable > technique instead of being a crude hack. I'll search CVS directly next time. Thanks. Best Regards, Simon Riggs
pgsql-hackers by date: