Thread: avg(int2) and avg(int8) micro-opt

avg(int2) and avg(int8) micro-opt

From
Neil Conway
Date:
This patch changes int2_avg_accum() and int4_avg_accum() use the nodeAgg
performance hack Tom introduced recently. This means we can avoid
copying the transition array for each input tuple if these functions are
invoked as aggregate transition functions.

To test the performance improvement, I created a 1 million row table
with a single int4 column. Without the patch, SELECT avg(col) FROM table
took about 4.2 seconds (after the data was cached); with the patch, it
took about 3.2 seconds. Naturally, the performance improvement for a
less trivial query (or a table with wider rows) would be relatively smaller.

It is possible that the transition array might be TOAST'ed (not that I'd
expect that to occur in practice, of course). The aggregates should
continue to work in this case: PG_DETOAST_DATUM() is equivalent to
PG_DETOAST_DATUM_COPY() if the datum is toast'ed, so in effect we just
won't implement the nodeAgg performance hack if the transition array is
toasted. If I've mucked this up, let me know.

Barring any objections, I'll commit this tomorrow.

-Neil
Index: src/backend/utils/adt/numeric.c
===================================================================
RCS file: /Users/neilc/local/cvs/pgsql/src/backend/utils/adt/numeric.c,v
retrieving revision 1.81
diff -c -r1.81 numeric.c
*** src/backend/utils/adt/numeric.c    1 Jan 2005 05:43:07 -0000    1.81
--- src/backend/utils/adt/numeric.c    4 Apr 2005 11:00:41 -0000
***************
*** 2462,2478 ****
  Datum
  int2_avg_accum(PG_FUNCTION_ARGS)
  {
!     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P_COPY(0);
      int16        newval = PG_GETARG_INT16(1);
      Int8TransTypeData *transdata;

      /*
!      * We copied the input array, so it's okay to scribble on it directly.
       */
      if (ARR_SIZE(transarray) != ARR_OVERHEAD(1) + sizeof(Int8TransTypeData))
          elog(ERROR, "expected 2-element int8 array");
-     transdata = (Int8TransTypeData *) ARR_DATA_PTR(transarray);

      transdata->count++;
      transdata->sum += newval;

--- 2462,2485 ----
  Datum
  int2_avg_accum(PG_FUNCTION_ARGS)
  {
!     ArrayType  *transarray;
      int16        newval = PG_GETARG_INT16(1);
      Int8TransTypeData *transdata;

      /*
!      * If we're invoked by nodeAgg, we can cheat and modify our first
!      * parameter in-place to reduce palloc overhead. Otherwise we need
!      * to make a copy of it before scribbling on it.
       */
+     if (fcinfo->context && IsA(fcinfo->context, AggState))
+         transarray = PG_GETARG_ARRAYTYPE_P(0);
+     else
+         transarray = PG_GETARG_ARRAYTYPE_P_COPY(0);
+
      if (ARR_SIZE(transarray) != ARR_OVERHEAD(1) + sizeof(Int8TransTypeData))
          elog(ERROR, "expected 2-element int8 array");

+     transdata = (Int8TransTypeData *) ARR_DATA_PTR(transarray);
      transdata->count++;
      transdata->sum += newval;

***************
*** 2482,2498 ****
  Datum
  int4_avg_accum(PG_FUNCTION_ARGS)
  {
!     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P_COPY(0);
      int32        newval = PG_GETARG_INT32(1);
      Int8TransTypeData *transdata;

      /*
!      * We copied the input array, so it's okay to scribble on it directly.
       */
      if (ARR_SIZE(transarray) != ARR_OVERHEAD(1) + sizeof(Int8TransTypeData))
          elog(ERROR, "expected 2-element int8 array");
-     transdata = (Int8TransTypeData *) ARR_DATA_PTR(transarray);

      transdata->count++;
      transdata->sum += newval;

--- 2489,2512 ----
  Datum
  int4_avg_accum(PG_FUNCTION_ARGS)
  {
!     ArrayType  *transarray;
      int32        newval = PG_GETARG_INT32(1);
      Int8TransTypeData *transdata;

      /*
!      * If we're invoked by nodeAgg, we can cheat and modify our first
!      * parameter in-place to reduce palloc overhead. Otherwise we need
!      * to make a copy of it before scribbling on it.
       */
+     if (fcinfo->context && IsA(fcinfo->context, AggState))
+         transarray = PG_GETARG_ARRAYTYPE_P(0);
+     else
+         transarray = PG_GETARG_ARRAYTYPE_P_COPY(0);
+
      if (ARR_SIZE(transarray) != ARR_OVERHEAD(1) + sizeof(Int8TransTypeData))
          elog(ERROR, "expected 2-element int8 array");

+     transdata = (Int8TransTypeData *) ARR_DATA_PTR(transarray);
      transdata->count++;
      transdata->sum += newval;


Re: avg(int2) and avg(int8) micro-opt

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> This patch changes int2_avg_accum() and int4_avg_accum() use the nodeAgg
> performance hack Tom introduced recently.

Why only those two?  Might as well make all the accum functions look alike.

> It is possible that the transition array might be TOAST'ed (not that I'd
> expect that to occur in practice, of course).

AFAICS that is impossible, since the transition value is never stored to
disk.  But your analysis is good anyway.

            regards, tom lane

Re: avg(int2) and avg(int8) micro-opt

From
Neil Conway
Date:
Tom Lane wrote:
> Why only those two?  Might as well make all the accum functions look alike.

Yeah, there might be some others we could improve. float4_accum() and
float8_accum() look like they could be improved pretty easily, and
do_numeric_accum() should also be fixable with some hackery. I suppose
it's also worth optimizing int2_sum(), int4_sum() and int8_sum(). I'll
send a patch for this later today or tomorrow. Are there any other
transition functions where we can apply this technique?

BTW, int2_avg_accum() and int4_avg_accum() patch applied to HEAD.

-Neil

Re: avg(int2) and avg(int8) micro-opt

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
>> Why only those two?  Might as well make all the accum functions look alike.

> Yeah, there might be some others we could improve. float4_accum() and
> float8_accum() look like they could be improved pretty easily, and
> do_numeric_accum() should also be fixable with some hackery. I suppose
> it's also worth optimizing int2_sum(), int4_sum() and int8_sum(). I'll
> send a patch for this later today or tomorrow. Are there any other
> transition functions where we can apply this technique?

Actually, do_numeric_accum can't be fixed easily because numeric is a
varlena type.  The basic requirement for this hack is that the size of
the transition value be constant ...

            regards, tom lane

Re: avg(int2) and avg(int8) micro-opt

From
Neil Conway
Date:
Neil Conway wrote:
> Yeah, there might be some others we could improve. float4_accum() and
> float8_accum() look like they could be improved pretty easily, and
> do_numeric_accum() should also be fixable with some hackery. I suppose
> it's also worth optimizing int2_sum(), int4_sum() and int8_sum().

Attached is a patch that applies the same optimization to int2_sum(),
int4_sum(), float4_accum(), and float8_accum(). It wasn't possible to
optimize do_numeric_accum() or int8_sum() since they both use numerics.
Performance gains are similar to those measured when this optimization
has been applied to similar aggregates (e.g. avg(float8) goes from about
6400 ms to about 4300 ms on my machine, for a single column of float8
and about 1 million rows).

Barring any objections, I'll apply this to HEAD tomorrow.

-Neil
Index: src/backend/utils/adt/float.c
===================================================================
RCS file: /Users/neilc/local/cvs/pgsql/src/backend/utils/adt/float.c,v
retrieving revision 1.113
diff -c -r1.113 float.c
*** src/backend/utils/adt/float.c    11 Feb 2005 04:08:58 -0000    1.113
--- src/backend/utils/adt/float.c    6 Apr 2005 07:24:38 -0000
***************
*** 1902,1909 ****
      float8        N,
                  sumX,
                  sumX2;
-     Datum        transdatums[3];
-     ArrayType  *result;

      transvalues = check_float8_array(transarray, "float8_accum");
      N = transvalues[0];
--- 1902,1907 ----
***************
*** 1914,1928 ****
      sumX += newval;
      sumX2 += newval * newval;

!     transdatums[0] = Float8GetDatumFast(N);
!     transdatums[1] = Float8GetDatumFast(sumX);
!     transdatums[2] = Float8GetDatumFast(sumX2);
!
!     result = construct_array(transdatums, 3,
!                              FLOAT8OID,
!                          sizeof(float8), false /* float8 byval */ , 'd');

!     PG_RETURN_ARRAYTYPE_P(result);
  }

  Datum
--- 1912,1946 ----
      sumX += newval;
      sumX2 += newval * newval;

!     /*
!      * If we're invoked by nodeAgg, we can cheat and modify our first
!      * parameter in-place to reduce palloc overhead. Otherwise we
!      * construct a new array with the updated transition data and
!      * return it.
!      */
!     if (fcinfo->context && IsA(fcinfo->context, AggState))
!     {
!         transvalues[0] = N;
!         transvalues[1] = sumX;
!         transvalues[2] = sumX2;
!
!         PG_RETURN_ARRAYTYPE_P(transarray);
!     }
!     else
!     {
!         Datum        transdatums[3];
!         ArrayType  *result;
!
!         transdatums[0] = Float8GetDatumFast(N);
!         transdatums[1] = Float8GetDatumFast(sumX);
!         transdatums[2] = Float8GetDatumFast(sumX2);
!
!         result = construct_array(transdatums, 3,
!                                  FLOAT8OID,
!                                  sizeof(float8), false /* float8 byval */ , 'd');

!         PG_RETURN_ARRAYTYPE_P(result);
!     }
  }

  Datum
***************
*** 1935,1942 ****
                  sumX,
                  sumX2,
                  newval;
-     Datum        transdatums[3];
-     ArrayType  *result;

      transvalues = check_float8_array(transarray, "float4_accum");
      N = transvalues[0];
--- 1953,1958 ----
***************
*** 1950,1964 ****
      sumX += newval;
      sumX2 += newval * newval;

!     transdatums[0] = Float8GetDatumFast(N);
!     transdatums[1] = Float8GetDatumFast(sumX);
!     transdatums[2] = Float8GetDatumFast(sumX2);
!
!     result = construct_array(transdatums, 3,
!                              FLOAT8OID,
!                          sizeof(float8), false /* float8 byval */ , 'd');

!     PG_RETURN_ARRAYTYPE_P(result);
  }

  Datum
--- 1966,2000 ----
      sumX += newval;
      sumX2 += newval * newval;

!     /*
!      * If we're invoked by nodeAgg, we can cheat and modify our first
!      * parameter in-place to reduce palloc overhead. Otherwise we
!      * construct a new array with the updated transition data and
!      * return it.
!      */
!     if (fcinfo->context && IsA(fcinfo->context, AggState))
!     {
!         transvalues[0] = N;
!         transvalues[1] = sumX;
!         transvalues[2] = sumX2;
!
!         PG_RETURN_ARRAYTYPE_P(transarray);
!     }
!     else
!     {
!         Datum        transdatums[3];
!         ArrayType  *result;
!
!         transdatums[0] = Float8GetDatumFast(N);
!         transdatums[1] = Float8GetDatumFast(sumX);
!         transdatums[2] = Float8GetDatumFast(sumX2);
!
!         result = construct_array(transdatums, 3,
!                                  FLOAT8OID,
!                                  sizeof(float8), false /* float8 byval */ , 'd');

!         PG_RETURN_ARRAYTYPE_P(result);
!     }
  }

  Datum
Index: src/backend/utils/adt/numeric.c
===================================================================
RCS file: /Users/neilc/local/cvs/pgsql/src/backend/utils/adt/numeric.c,v
retrieving revision 1.82
diff -c -r1.82 numeric.c
*** src/backend/utils/adt/numeric.c    4 Apr 2005 23:50:27 -0000    1.82
--- src/backend/utils/adt/numeric.c    6 Apr 2005 08:07:37 -0000
***************
*** 2357,2363 ****
  Datum
  int2_sum(PG_FUNCTION_ARGS)
  {
-     int64        oldsum;
      int64        newval;

      if (PG_ARGISNULL(0))
--- 2357,2362 ----
***************
*** 2370,2391 ****
          PG_RETURN_INT64(newval);
      }

!     oldsum = PG_GETARG_INT64(0);

!     /* Leave sum unchanged if new input is null. */
!     if (PG_ARGISNULL(1))
!         PG_RETURN_INT64(oldsum);

!     /* OK to do the addition. */
!     newval = oldsum + (int64) PG_GETARG_INT16(1);

!     PG_RETURN_INT64(newval);
  }

  Datum
  int4_sum(PG_FUNCTION_ARGS)
  {
-     int64        oldsum;
      int64        newval;

      if (PG_ARGISNULL(0))
--- 2369,2407 ----
          PG_RETURN_INT64(newval);
      }

!     /*
!      * If we're invoked by nodeAgg, we can cheat and modify out first
!      * parameter in-place to avoid palloc overhead. If not, we need to
!      * return the new value of the transition variable.
!      */
!     if (fcinfo->context && IsA(fcinfo->context, AggState))
!     {
!         int64 *oldsum = (int64 *) PG_GETARG_POINTER(0);

!         /* Leave the running sum unchanged in the new input is null */
!         if (!PG_ARGISNULL(1))
!             *oldsum = *oldsum + (int64) PG_GETARG_INT16(1);

!         PG_RETURN_POINTER(oldsum);
!     }
!     else
!     {
!         int64        oldsum = PG_GETARG_INT64(0);
!
!         /* Leave sum unchanged if new input is null. */
!         if (PG_ARGISNULL(1))
!             PG_RETURN_INT64(oldsum);
!
!         /* OK to do the addition. */
!         newval = oldsum + (int64) PG_GETARG_INT16(1);

!         PG_RETURN_INT64(newval);
!     }
  }

  Datum
  int4_sum(PG_FUNCTION_ARGS)
  {
      int64        newval;

      if (PG_ARGISNULL(0))
***************
*** 2398,2413 ****
          PG_RETURN_INT64(newval);
      }

!     oldsum = PG_GETARG_INT64(0);

!     /* Leave sum unchanged if new input is null. */
!     if (PG_ARGISNULL(1))
!         PG_RETURN_INT64(oldsum);

!     /* OK to do the addition. */
!     newval = oldsum + (int64) PG_GETARG_INT32(1);

!     PG_RETURN_INT64(newval);
  }

  Datum
--- 2414,2447 ----
          PG_RETURN_INT64(newval);
      }

!     /*
!      * If we're invoked by nodeAgg, we can cheat and modify out first
!      * parameter in-place to avoid palloc overhead. If not, we need to
!      * return the new value of the transition variable.
!      */
!     if (fcinfo->context && IsA(fcinfo->context, AggState))
!     {
!         int64 *oldsum = (int64 *) PG_GETARG_POINTER(0);

!         /* Leave the running sum unchanged in the new input is null */
!         if (!PG_ARGISNULL(1))
!             *oldsum = *oldsum + (int64) PG_GETARG_INT32(1);

!         PG_RETURN_POINTER(oldsum);
!     }
!     else
!     {
!         int64        oldsum = PG_GETARG_INT64(0);

!         /* Leave sum unchanged if new input is null. */
!         if (PG_ARGISNULL(1))
!             PG_RETURN_INT64(oldsum);
!
!         /* OK to do the addition. */
!         newval = oldsum + (int64) PG_GETARG_INT32(1);
!
!         PG_RETURN_INT64(newval);
!     }
  }

  Datum
***************
*** 2426,2431 ****
--- 2460,2471 ----
          PG_RETURN_DATUM(newval);
      }

+     /*
+      * Note that we cannot special-case the nodeAgg case here, as we
+      * do for int2_sum and int4_sum: numeric is of variable size, so
+      * we cannot modify our first parameter in-place.
+      */
+
      oldsum = PG_GETARG_NUMERIC(0);

      /* Leave sum unchanged if new input is null. */

Re: avg(int2) and avg(int8) micro-opt

From
Neil Conway
Date:
Neil Conway wrote:
> Attached is a patch that applies the same optimization to int2_sum(),
> int4_sum(), float4_accum(), and float8_accum(). It wasn't possible to
> optimize do_numeric_accum() or int8_sum() since they both use numerics.

Applied.

-Neil