Thread: Avg performance for int8/numeric

Avg performance for int8/numeric

From
Mark Kirkwood
Date:
Avg performance for these two datatypes can be improved by *not*
calculating the sum of squares in the shared accumulator
(do_numeric_accum). However there is a little subtlety as this function
is also the shared by variance and stddev!

This patch:

- Modifies do_numeric_accum to have an extra bool parameter and does not
calc sumX2 when it is false.
- Amends all the accumulators that call it to include the bool (set to
true).
- Adds new functions [int8|numeric]_avg_accum that call do_numeric_accum
with the bool set to false.
- Amends the the bootstrap entries for pg_aggregate to use the new
accumulators for avg(int8|numeric).
- Adds the new accumulators into the bootstrap entries for pg_proc.

Performance gain is approx 33% (it is still slower than doing sum/count
- possibly due to the construct/deconstruct overhead of the numeric
transition array).

Cheers

Mark
Index: src/backend/utils/adt/numeric.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/numeric.c,v
retrieving revision 1.96
diff -c -r1.96 numeric.c
*** src/backend/utils/adt/numeric.c    4 Oct 2006 00:29:59 -0000    1.96
--- src/backend/utils/adt/numeric.c    23 Nov 2006 08:51:44 -0000
***************
*** 2060,2066 ****
   */

  static ArrayType *
! do_numeric_accum(ArrayType *transarray, Numeric newval)
  {
      Datum       *transdatums;
      int            ndatums;
--- 2060,2066 ----
   */

  static ArrayType *
! do_numeric_accum(ArrayType *transarray, Numeric newval, bool useSumX2)
  {
      Datum       *transdatums;
      int            ndatums;
***************
*** 2082,2095 ****
      N = DirectFunctionCall1(numeric_inc, N);
      sumX = DirectFunctionCall2(numeric_add, sumX,
                                 NumericGetDatum(newval));
!     sumX2 = DirectFunctionCall2(numeric_add, sumX2,
                                  DirectFunctionCall2(numeric_mul,
                                                      NumericGetDatum(newval),
                                                      NumericGetDatum(newval)));

      transdatums[0] = N;
      transdatums[1] = sumX;
!     transdatums[2] = sumX2;

      result = construct_array(transdatums, 3,
                               NUMERICOID, -1, false, 'i');
--- 2082,2097 ----
      N = DirectFunctionCall1(numeric_inc, N);
      sumX = DirectFunctionCall2(numeric_add, sumX,
                                 NumericGetDatum(newval));
!     if (useSumX2)
!         sumX2 = DirectFunctionCall2(numeric_add, sumX2,
                                  DirectFunctionCall2(numeric_mul,
                                                      NumericGetDatum(newval),
                                                      NumericGetDatum(newval)));

      transdatums[0] = N;
      transdatums[1] = sumX;
!     if (useSumX2)
!         transdatums[2] = sumX2;

      result = construct_array(transdatums, 3,
                               NUMERICOID, -1, false, 'i');
***************
*** 2103,2109 ****
      ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
      Numeric        newval = PG_GETARG_NUMERIC(1);

!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }

  /*
--- 2105,2123 ----
      ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
      Numeric        newval = PG_GETARG_NUMERIC(1);

!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval, true));
! }
!
! /*
!  * Optimized case for average of numeric.
!  */
! Datum
! numeric_avg_accum(PG_FUNCTION_ARGS)
! {
!     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
!     Numeric        newval = PG_GETARG_NUMERIC(1);
!
!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval, false));
  }

  /*
***************
*** 2124,2130 ****

      newval = DatumGetNumeric(DirectFunctionCall1(int2_numeric, newval2));

!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }

  Datum
--- 2138,2144 ----

      newval = DatumGetNumeric(DirectFunctionCall1(int2_numeric, newval2));

!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval, true));
  }

  Datum
***************
*** 2136,2142 ****

      newval = DatumGetNumeric(DirectFunctionCall1(int4_numeric, newval4));

!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }

  Datum
--- 2150,2156 ----

      newval = DatumGetNumeric(DirectFunctionCall1(int4_numeric, newval4));

!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval, true));
  }

  Datum
***************
*** 2148,2156 ****

      newval = DatumGetNumeric(DirectFunctionCall1(int8_numeric, newval8));

!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }

  Datum
  numeric_avg(PG_FUNCTION_ARGS)
  {
--- 2162,2186 ----

      newval = DatumGetNumeric(DirectFunctionCall1(int8_numeric, newval8));

!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval, true));
! }
!
! /*
!  * Optimized case for average of int8.
!  */
! Datum
! int8_avg_accum(PG_FUNCTION_ARGS)
! {
!     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
!     Datum        newval8 = PG_GETARG_DATUM(1);
!     Numeric        newval;
!
!     newval = DatumGetNumeric(DirectFunctionCall1(int8_numeric, newval8));
!
!     PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval, false));
  }

+
  Datum
  numeric_avg(PG_FUNCTION_ARGS)
  {
Index: src/include/catalog/pg_aggregate.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_aggregate.h,v
retrieving revision 1.58
diff -c -r1.58 pg_aggregate.h
*** src/include/catalog/pg_aggregate.h    4 Oct 2006 00:30:07 -0000    1.58
--- src/include/catalog/pg_aggregate.h    23 Nov 2006 08:51:49 -0000
***************
*** 80,89 ****
   */

  /* avg */
! DATA(insert ( 2100    int8_accum        numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2101    int4_avg_accum    int8_avg        0    1016    "{0,0}" ));
  DATA(insert ( 2102    int2_avg_accum    int8_avg        0    1016    "{0,0}" ));
! DATA(insert ( 2103    numeric_accum    numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2104    float4_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2105    float8_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2106    interval_accum    interval_avg    0    1187    "{0 second,0 second}" ));
--- 80,89 ----
   */

  /* avg */
! DATA(insert ( 2100    int8_avg_accum    numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2101    int4_avg_accum    int8_avg        0    1016    "{0,0}" ));
  DATA(insert ( 2102    int2_avg_accum    int8_avg        0    1016    "{0,0}" ));
! DATA(insert ( 2103    numeric_avg_accum    numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2104    float4_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2105    float8_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2106    interval_accum    interval_avg    0    1187    "{0 second,0 second}" ));
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.427
diff -c -r1.427 pg_proc.h
*** src/include/catalog/pg_proc.h    4 Oct 2006 00:30:07 -0000    1.427
--- src/include/catalog/pg_proc.h    23 Nov 2006 08:52:37 -0000
***************
*** 2697,2708 ****
--- 2697,2710 ----
  DATA(insert OID = 1832 (  float8_stddev_samp    PGNSP PGUID 12 f f t f i 1 701 "1022" _null_ _null_ _null_
float8_stddev_samp- _null_ )); 
  DESCR("STDDEV_SAMP aggregate final function");
  DATA(insert OID = 1833 (  numeric_accum    PGNSP PGUID 12 f f t f i 2 1231 "1231 1700" _null_ _null_ _null_
numeric_accum- _null_ )); 
+ DATA(insert OID = 2858 (  numeric_avg_accum    PGNSP PGUID 12 f f t f i 2 1231 "1231 1700" _null_ _null_ _null_
numeric_avg_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1834 (  int2_accum       PGNSP PGUID 12 f f t f i 2 1231 "1231 21" _null_ _null_ _null_ int2_accum
-_null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1835 (  int4_accum       PGNSP PGUID 12 f f t f i 2 1231 "1231 23" _null_ _null_ _null_ int4_accum
-_null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1836 (  int8_accum       PGNSP PGUID 12 f f t f i 2 1231 "1231 20" _null_ _null_ _null_ int8_accum
-_null_ )); 
+ DATA(insert OID = 2857 (  int8_avg_accum       PGNSP PGUID 12 f f t f i 2 1231 "1231 20" _null_ _null_ _null_
int8_avg_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1837 (  numeric_avg       PGNSP PGUID 12 f f t f i 1 1700 "1231" _null_ _null_ _null_
numeric_avg- _null_ )); 
  DESCR("AVG aggregate final function");
Index: src/include/utils/builtins.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.282
diff -c -r1.282 builtins.h
*** src/include/utils/builtins.h    18 Sep 2006 22:40:40 -0000    1.282
--- src/include/utils/builtins.h    23 Nov 2006 08:52:41 -0000
***************
*** 833,841 ****
--- 833,843 ----
  extern Datum text_numeric(PG_FUNCTION_ARGS);
  extern Datum numeric_text(PG_FUNCTION_ARGS);
  extern Datum numeric_accum(PG_FUNCTION_ARGS);
+ extern Datum numeric_avg_accum(PG_FUNCTION_ARGS);
  extern Datum int2_accum(PG_FUNCTION_ARGS);
  extern Datum int4_accum(PG_FUNCTION_ARGS);
  extern Datum int8_accum(PG_FUNCTION_ARGS);
+ extern Datum int8_avg_accum(PG_FUNCTION_ARGS);
  extern Datum numeric_avg(PG_FUNCTION_ARGS);
  extern Datum numeric_var_pop(PG_FUNCTION_ARGS);
  extern Datum numeric_var_samp(PG_FUNCTION_ARGS);

Re: Avg performance for int8/numeric

From
Neil Conway
Date:
On Fri, 2006-11-24 at 11:08 +1300, Mark Kirkwood wrote:
> - Modifies do_numeric_accum to have an extra bool parameter and does not
> calc sumX2 when it is false.

I think it would be clearer to reorganize this function slightly, and
have only a single branch on "useSumX2". On first glance it isn't
obviously that transdatums[2] is defined (but unchanged) when useSumX2
is false.

> Performance gain is approx 33%

Nice.

> (it is still slower than doing sum/count - possibly due to the
> construct/deconstruct overhead of the numeric transition array).

This would indeed be worth profiling. If it turns out that array
overhead is significant, I wonder if we could use a composite type for
the transition variable instead. That might also make it easier to
represent the "N" value as an int8 rather than a numeric.

-Neil



Re: Avg performance for int8/numeric

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Fri, 2006-11-24 at 11:08 +1300, Mark Kirkwood wrote:
>> - Modifies do_numeric_accum to have an extra bool parameter and does not
>> calc sumX2 when it is false.

> I think it would be clearer to reorganize this function slightly, and
> have only a single branch on "useSumX2".

It would be clearer to just have a separate function.

            regards, tom lane

Re: Avg performance for int8/numeric

From
Bruce Momjian
Date:
This has been saved for the 8.3 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Mark Kirkwood wrote:
> Avg performance for these two datatypes can be improved by *not*
> calculating the sum of squares in the shared accumulator
> (do_numeric_accum). However there is a little subtlety as this function
> is also the shared by variance and stddev!
>
> This patch:
>
> - Modifies do_numeric_accum to have an extra bool parameter and does not
> calc sumX2 when it is false.
> - Amends all the accumulators that call it to include the bool (set to
> true).
> - Adds new functions [int8|numeric]_avg_accum that call do_numeric_accum
> with the bool set to false.
> - Amends the the bootstrap entries for pg_aggregate to use the new
> accumulators for avg(int8|numeric).
> - Adds the new accumulators into the bootstrap entries for pg_proc.
>
> Performance gain is approx 33% (it is still slower than doing sum/count
> - possibly due to the construct/deconstruct overhead of the numeric
> transition array).
>
> Cheers
>
> Mark


>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Avg performance for int8/numeric

From
Mark Kirkwood
Date:
Neil Conway wrote:
> On Fri, 2006-11-24 at 11:08 +1300, Mark Kirkwood wrote:
>> - Modifies do_numeric_accum to have an extra bool parameter and does not
>> calc sumX2 when it is false.
>
> I think it would be clearer to reorganize this function slightly, and
> have only a single branch on "useSumX2". On first glance it isn't
> obviously that transdatums[2] is defined (but unchanged) when useSumX2
> is false.
>

Right - new patch attached that adds a new function do_numeric_avg_accum
that only uses N and sum(X). This means I could amend the avg aggregates
for numeric, int8 to have a initvalues of {0,0}.

Cheers

Mark
? gmon.out
Index: src/backend/utils/adt/numeric.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/numeric.c,v
retrieving revision 1.96
diff -c -r1.96 numeric.c
*** src/backend/utils/adt/numeric.c    4 Oct 2006 00:29:59 -0000    1.96
--- src/backend/utils/adt/numeric.c    25 Nov 2006 00:00:58 -0000
***************
*** 2097,2102 ****
--- 2097,2136 ----
      return result;
  }

+ /*
+  * Improve avg performance by not caclulating sum(X*X).
+  */
+ static ArrayType *
+ do_numeric_avg_accum(ArrayType *transarray, Numeric newval)
+ {
+     Datum       *transdatums;
+     int            ndatums;
+     Datum        N,
+                 sumX;
+     ArrayType  *result;
+
+     /* We assume the input is array of numeric */
+     deconstruct_array(transarray,
+                       NUMERICOID, -1, false, 'i',
+                       &transdatums, NULL, &ndatums);
+     if (ndatums != 2)
+         elog(ERROR, "expected 2-element numeric array");
+     N = transdatums[0];
+     sumX = transdatums[1];
+
+     N = DirectFunctionCall1(numeric_inc, N);
+     sumX = DirectFunctionCall2(numeric_add, sumX,
+                                NumericGetDatum(newval));
+
+     transdatums[0] = N;
+     transdatums[1] = sumX;
+
+     result = construct_array(transdatums, 2,
+                              NUMERICOID, -1, false, 'i');
+
+     return result;
+ }
+
  Datum
  numeric_accum(PG_FUNCTION_ARGS)
  {
***************
*** 2107,2112 ****
--- 2141,2158 ----
  }

  /*
+  * Optimized case for average of numeric.
+  */
+ Datum
+ numeric_avg_accum(PG_FUNCTION_ARGS)
+ {
+     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+     Numeric        newval = PG_GETARG_NUMERIC(1);
+
+     PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+
+ /*
   * Integer data types all use Numeric accumulators to share code and
   * avoid risk of overflow.    For int2 and int4 inputs, Numeric accumulation
   * is overkill for the N and sum(X) values, but definitely not overkill
***************
*** 2151,2156 ****
--- 2197,2218 ----
      PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }

+ /*
+  * Optimized case for average of int8.
+  */
+ Datum
+ int8_avg_accum(PG_FUNCTION_ARGS)
+ {
+     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+     Datum        newval8 = PG_GETARG_DATUM(1);
+     Numeric        newval;
+
+     newval = DatumGetNumeric(DirectFunctionCall1(int8_numeric, newval8));
+
+     PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+
+
  Datum
  numeric_avg(PG_FUNCTION_ARGS)
  {
***************
*** 2164,2174 ****
      deconstruct_array(transarray,
                        NUMERICOID, -1, false, 'i',
                        &transdatums, NULL, &ndatums);
!     if (ndatums != 3)
!         elog(ERROR, "expected 3-element numeric array");
      N = DatumGetNumeric(transdatums[0]);
      sumX = DatumGetNumeric(transdatums[1]);
-     /* ignore sumX2 */

      /* SQL92 defines AVG of no values to be NULL */
      /* N is zero iff no digits (cf. numeric_uminus) */
--- 2226,2235 ----
      deconstruct_array(transarray,
                        NUMERICOID, -1, false, 'i',
                        &transdatums, NULL, &ndatums);
!     if (ndatums != 2)
!         elog(ERROR, "expected 2-element numeric array");
      N = DatumGetNumeric(transdatums[0]);
      sumX = DatumGetNumeric(transdatums[1]);

      /* SQL92 defines AVG of no values to be NULL */
      /* N is zero iff no digits (cf. numeric_uminus) */
Index: src/include/catalog/pg_aggregate.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_aggregate.h,v
retrieving revision 1.58
diff -c -r1.58 pg_aggregate.h
*** src/include/catalog/pg_aggregate.h    4 Oct 2006 00:30:07 -0000    1.58
--- src/include/catalog/pg_aggregate.h    25 Nov 2006 00:01:01 -0000
***************
*** 80,89 ****
   */

  /* avg */
! DATA(insert ( 2100    int8_accum        numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2101    int4_avg_accum    int8_avg        0    1016    "{0,0}" ));
  DATA(insert ( 2102    int2_avg_accum    int8_avg        0    1016    "{0,0}" ));
! DATA(insert ( 2103    numeric_accum    numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2104    float4_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2105    float8_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2106    interval_accum    interval_avg    0    1187    "{0 second,0 second}" ));
--- 80,89 ----
   */

  /* avg */
! DATA(insert ( 2100    int8_avg_accum    numeric_avg        0    1231    "{0,0}" ));
  DATA(insert ( 2101    int4_avg_accum    int8_avg        0    1016    "{0,0}" ));
  DATA(insert ( 2102    int2_avg_accum    int8_avg        0    1016    "{0,0}" ));
! DATA(insert ( 2103    numeric_avg_accum    numeric_avg        0    1231    "{0,0}" ));
  DATA(insert ( 2104    float4_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2105    float8_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2106    interval_accum    interval_avg    0    1187    "{0 second,0 second}" ));
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.428
diff -c -r1.428 pg_proc.h
*** src/include/catalog/pg_proc.h    24 Nov 2006 21:18:42 -0000    1.428
--- src/include/catalog/pg_proc.h    25 Nov 2006 00:01:20 -0000
***************
*** 2697,2708 ****
--- 2697,2710 ----
  DATA(insert OID = 1832 (  float8_stddev_samp    PGNSP PGUID 12 f f t f i 1 701 "1022" _null_ _null_ _null_
float8_stddev_samp- _null_ )); 
  DESCR("STDDEV_SAMP aggregate final function");
  DATA(insert OID = 1833 (  numeric_accum    PGNSP PGUID 12 f f t f i 2 1231 "1231 1700" _null_ _null_ _null_
numeric_accum- _null_ )); 
+ DATA(insert OID = 2858 (  numeric_avg_accum    PGNSP PGUID 12 f f t f i 2 1231 "1231 1700" _null_ _null_ _null_
numeric_avg_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1834 (  int2_accum       PGNSP PGUID 12 f f t f i 2 1231 "1231 21" _null_ _null_ _null_ int2_accum
-_null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1835 (  int4_accum       PGNSP PGUID 12 f f t f i 2 1231 "1231 23" _null_ _null_ _null_ int4_accum
-_null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1836 (  int8_accum       PGNSP PGUID 12 f f t f i 2 1231 "1231 20" _null_ _null_ _null_ int8_accum
-_null_ )); 
+ DATA(insert OID = 2857 (  int8_avg_accum       PGNSP PGUID 12 f f t f i 2 1231 "1231 20" _null_ _null_ _null_
int8_avg_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1837 (  numeric_avg       PGNSP PGUID 12 f f t f i 1 1700 "1231" _null_ _null_ _null_
numeric_avg- _null_ )); 
  DESCR("AVG aggregate final function");
Index: src/include/utils/builtins.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.282
diff -c -r1.282 builtins.h
*** src/include/utils/builtins.h    18 Sep 2006 22:40:40 -0000    1.282
--- src/include/utils/builtins.h    25 Nov 2006 00:01:23 -0000
***************
*** 833,841 ****
--- 833,843 ----
  extern Datum text_numeric(PG_FUNCTION_ARGS);
  extern Datum numeric_text(PG_FUNCTION_ARGS);
  extern Datum numeric_accum(PG_FUNCTION_ARGS);
+ extern Datum numeric_avg_accum(PG_FUNCTION_ARGS);
  extern Datum int2_accum(PG_FUNCTION_ARGS);
  extern Datum int4_accum(PG_FUNCTION_ARGS);
  extern Datum int8_accum(PG_FUNCTION_ARGS);
+ extern Datum int8_avg_accum(PG_FUNCTION_ARGS);
  extern Datum numeric_avg(PG_FUNCTION_ARGS);
  extern Datum numeric_var_pop(PG_FUNCTION_ARGS);
  extern Datum numeric_var_samp(PG_FUNCTION_ARGS);

Re: Avg performance for int8/numeric

From
Mark Kirkwood
Date:
Neil Conway wrote:

>
>> (it is still slower than doing sum/count - possibly due to the
>> construct/deconstruct overhead of the numeric transition array).
>
> This would indeed be worth profiling. If it turns out that array
> overhead is significant, I wonder if we could use a composite type for
> the transition variable instead. That might also make it easier to
> represent the "N" value as an int8 rather than a numeric.
>

I've profiled the 2nd patch using the setup indicated below. The first
64 lines of the flat graph are attached. The complete profile is here:

http://homepages.paradise.net.nz/markir/download/postgres/postgres-avg.gprof.gz

Setup:

avg=# \d avgtest
        Table "public.avgtest"
  Column |     Type      | Modifiers
--------+---------------+-----------
  id     | integer       |
  val0   | bigint        |
  val1   | numeric(12,2) |
  val2   | numeric(10,0) |

avg=# analyze verbose avgtest;
INFO:  analyzing "public.avgtest"
INFO:  "avgtest": scanned 3000 of 87689 pages, containing 342138 live
rows and 0 dead rows; 3000 rows in sample, 10000580 estimated total rows
ANALYZE
Time: 252.033 ms
avg=# select avg(val2) from avgtest;
          avg
---------------------
  714285.214285800000
(1 row)

Time: 35196.028 ms
avg=# \q

regards

Mark
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 14.42      2.16     2.16 100002977     0.00     0.00  AllocSetAlloc
  9.08      3.52     1.36 20000000     0.00     0.00  add_abs
  5.54      4.35     0.83 10000000     0.00     0.00  slot_deform_tuple
  5.41      5.16     0.81 60001673     0.00     0.00  AllocSetFree
  4.34      5.81     0.65 10000000     0.00     0.00  construct_md_array
  4.21      6.44     0.63 20000003     0.00     0.00  make_result
  3.54      6.97     0.53 10000000     0.00     0.00  numeric_add
  3.27      7.46     0.49 30000003     0.00     0.00  set_var_from_num
  3.00      7.91     0.45 100002652     0.00     0.00  MemoryContextAlloc
  2.74      8.32     0.41 10000001     0.00     0.00  heapgettup_pagemode
  2.54      8.70     0.38 10000000     0.00     0.00  advance_transition_function
  2.40      9.06     0.36 30000006     0.00     0.00  alloc_var
  2.27      9.40     0.34 10000000     0.00     0.00  do_numeric_avg_accum
  2.00      9.70     0.30 10000001     0.00     0.00  CopyArrayEls
  2.00     10.00     0.30 10000000     0.00     0.00  numeric_inc
  1.94     10.29     0.29 20000002     0.00     0.00  ArrayGetNItems
  1.94     10.58     0.29 10000001     0.00     0.00  deconstruct_array
  1.87     10.86     0.28 20000002     0.00     0.00  ArrayCastAndSet
  1.74     11.12     0.26 60001672     0.00     0.00  pfree
  1.67     11.37     0.25 10000001     0.00     0.00  slot_getattr
  1.60     11.61     0.24 10000000     0.00     0.00  advance_aggregates
  1.54     11.84     0.23 40000006     0.00     0.00  free_var
  1.54     12.07     0.23 10000001     0.00     0.00  datumCopy
  1.47     12.29     0.22 10000001     0.00     0.00  SeqNext
  1.40     12.50     0.21 20000000     0.00     0.00  add_var
  1.34     12.70     0.20 20000003     0.00     0.00  strip_var
  1.34     12.90     0.20 10000001     0.00     0.00  ExecScan
  1.27     13.09     0.19 10000003     0.00     0.00  AllocSetReset
  1.20     13.27     0.18 10000003     0.00     0.00  ExecProcNode
  1.13     13.44     0.17 70000010     0.00     0.00  pg_detoast_datum
  0.93     13.58     0.14 10000000     0.00     0.00  numeric_avg_accum
  0.93     13.72     0.14        2     0.07     6.61  ExecAgg
  0.87     13.85     0.13 10000001     0.00     0.00  datumGetSize
  0.87     13.98     0.13    87860     0.00     0.00  heapgetpage
  0.73     14.09     0.11 10000001     0.00     0.00  DirectFunctionCall2
  0.73     14.20     0.11 10000000     0.00     0.00  construct_array
  0.60     14.29     0.09 10000148     0.00     0.00  DirectFunctionCall1
  0.53     14.37     0.08 10000001     0.00     0.00  ExecStoreTuple
  0.53     14.45     0.08 10000000     0.00     0.00  HeapTupleSatisfiesSnapshot
  0.40     14.51     0.06 10000103     0.00     0.00  heap_getnext
  0.33     14.56     0.05   254419     0.00     0.00  hash_search_with_hash_value
  0.27     14.60     0.04 10000001     0.00     0.00  MemoryContextReset
  0.27     14.64     0.04 10000000     0.00     0.00  ExecEvalVar
  0.27     14.68     0.04 10000000     0.00     0.00  XidInSnapshot
  0.27     14.72     0.04   511482     0.00     0.00  LWLockRelease
  0.27     14.76     0.04   164939     0.00     0.00  hash_any
  0.27     14.80     0.04    87760     0.00     0.00  StrategyGetBuffer
  0.20     14.83     0.03 10000009     0.00     0.00  TransactionIdPrecedes
  0.20     14.86     0.03    87760     0.00     0.00  FileRead
  0.13     14.88     0.02 10000001     0.00     0.00  ExecSeqScan
  0.13     14.90     0.02   511481     0.00     0.00  LWLockAcquire
  0.13     14.92     0.02    88217     0.00     0.00  ReadBuffer
  0.13     14.94     0.02    87760     0.00     0.00  TerminateBufferIO
  0.07     14.95     0.01   175906     0.00     0.00  ResourceOwnerForgetBuffer
  0.07     14.96     0.01   163587     0.00     0.00  get_hash_value
  0.07     14.97     0.01    88019     0.00     0.00  ReleaseBuffer
  0.07     14.98     0.01    87760     0.00     0.00  PinBuffer_Locked
  0.00     14.98     0.00   176868     0.00     0.00  LockBuffer

Re: Avg performance for int8/numeric

From
"Luke Lonergan"
Date:
So, if I understand this correctly, we're calling Alloc and ContextAlloc 10
times for every row being summed?

There are approx 10M rows and the profile snippet below shows 100M calls to
each of those.

- Luke


On 11/24/06 4:46 PM, "Mark Kirkwood" <markir@paradise.net.nz> wrote:

>  time   seconds   seconds    calls   s/call   s/call  name
>  14.42      2.16     2.16 100002977     0.00     0.00  AllocSetAlloc
>   9.08      3.52     1.36 20000000     0.00     0.00  add_abs
>   5.54      4.35     0.83 10000000     0.00     0.00  slot_deform_tuple
>   5.41      5.16     0.81 60001673     0.00     0.00  AllocSetFree
>   4.34      5.81     0.65 10000000     0.00     0.00  construct_md_array
>   4.21      6.44     0.63 20000003     0.00     0.00  make_result
>   3.54      6.97     0.53 10000000     0.00     0.00  numeric_add
>   3.27      7.46     0.49 30000003     0.00     0.00  set_var_from_num
>   3.00      7.91     0.45 100002652     0.00     0.00  MemoryContextAlloc



Re: Avg performance for int8/numeric

From
Mark Kirkwood
Date:
Luke Lonergan wrote:
> So, if I understand this correctly, we're calling Alloc and ContextAlloc 10
> times for every row being summed?
>
> There are approx 10M rows and the profile snippet below shows 100M calls to
> each of those.
>

Unless I've accidentally run gprof on the profile output for a 100M row
case I had lying around :-( ... I'll check

Re: Avg performance for int8/numeric

From
Mark Kirkwood
Date:
Mark Kirkwood wrote:
> Luke Lonergan wrote:
>> So, if I understand this correctly, we're calling Alloc and
>> ContextAlloc 10
>> times for every row being summed?
>>
>> There are approx 10M rows and the profile snippet below shows 100M
>> calls to
>> each of those.
>>
>
> Unless I've accidentally run gprof on the profile output for a 100M row
> case I had lying around :-( ... I'll check
>

I haven't (so profile as attached is ok)...

Re: Avg performance for int8/numeric

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Mark Kirkwood wrote:
> Neil Conway wrote:
> > On Fri, 2006-11-24 at 11:08 +1300, Mark Kirkwood wrote:
> >> - Modifies do_numeric_accum to have an extra bool parameter and does not
> >> calc sumX2 when it is false.
> >
> > I think it would be clearer to reorganize this function slightly, and
> > have only a single branch on "useSumX2". On first glance it isn't
> > obviously that transdatums[2] is defined (but unchanged) when useSumX2
> > is false.
> >
>
> Right - new patch attached that adds a new function do_numeric_avg_accum
> that only uses N and sum(X). This means I could amend the avg aggregates
> for numeric, int8 to have a initvalues of {0,0}.
>
> Cheers
>
> Mark


>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Avg performance for int8/numeric

From
Bruce Momjian
Date:
I have tested this patch but it generates regression failures.

There was some code drift, so I am attaching an updated version of the
patch, and the regression diffs.  The 'four' column is an 'int4' so my
guess is that somehow the wrong aggregate is being called.

---------------------------------------------------------------------------

Mark Kirkwood wrote:
> Neil Conway wrote:
> > On Fri, 2006-11-24 at 11:08 +1300, Mark Kirkwood wrote:
> >> - Modifies do_numeric_accum to have an extra bool parameter and does not
> >> calc sumX2 when it is false.
> >
> > I think it would be clearer to reorganize this function slightly, and
> > have only a single branch on "useSumX2". On first glance it isn't
> > obviously that transdatums[2] is defined (but unchanged) when useSumX2
> > is false.
> >
>
> Right - new patch attached that adds a new function do_numeric_avg_accum
> that only uses N and sum(X). This means I could amend the avg aggregates
> for numeric, int8 to have a initvalues of {0,0}.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
*** ./expected/aggregates.out    Fri Jul 28 14:33:04 2006
--- ./results/aggregates.out    Thu Feb 15 23:54:45 2007
***************
*** 238,248 ****

  -- user-defined aggregates
  SELECT newavg(four) AS avg_1 FROM onek;
!        avg_1
! --------------------
!  1.5000000000000000
! (1 row)
!
  SELECT newsum(four) AS sum_1500 FROM onek;
   sum_1500
  ----------
--- 238,244 ----

  -- user-defined aggregates
  SELECT newavg(four) AS avg_1 FROM onek;
! ERROR:  expected 2-element numeric array
  SELECT newsum(four) AS sum_1500 FROM onek;
   sum_1500
  ----------

======================================================================

Index: src/backend/utils/adt/numeric.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/numeric.c,v
retrieving revision 1.99
diff -c -c -r1.99 numeric.c
*** src/backend/utils/adt/numeric.c    16 Jan 2007 21:41:13 -0000    1.99
--- src/backend/utils/adt/numeric.c    16 Feb 2007 04:30:27 -0000
***************
*** 2165,2170 ****
--- 2165,2204 ----
      return result;
  }

+ /*
+  * Improve avg performance by not caclulating sum(X*X).
+  */
+ static ArrayType *
+ do_numeric_avg_accum(ArrayType *transarray, Numeric newval)
+ {
+     Datum       *transdatums;
+     int            ndatums;
+     Datum        N,
+                 sumX;
+     ArrayType  *result;
+
+     /* We assume the input is array of numeric */
+     deconstruct_array(transarray,
+                       NUMERICOID, -1, false, 'i',
+                       &transdatums, NULL, &ndatums);
+     if (ndatums != 2)
+         elog(ERROR, "expected 2-element numeric array");
+     N = transdatums[0];
+     sumX = transdatums[1];
+
+     N = DirectFunctionCall1(numeric_inc, N);
+     sumX = DirectFunctionCall2(numeric_add, sumX,
+                                NumericGetDatum(newval));
+
+     transdatums[0] = N;
+     transdatums[1] = sumX;
+
+     result = construct_array(transdatums, 2,
+                              NUMERICOID, -1, false, 'i');
+
+     return result;
+ }
+
  Datum
  numeric_accum(PG_FUNCTION_ARGS)
  {
***************
*** 2175,2180 ****
--- 2209,2226 ----
  }

  /*
+  * Optimized case for average of numeric.
+  */
+ Datum
+ numeric_avg_accum(PG_FUNCTION_ARGS)
+ {
+     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+     Numeric        newval = PG_GETARG_NUMERIC(1);
+
+     PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+
+ /*
   * Integer data types all use Numeric accumulators to share code and
   * avoid risk of overflow.    For int2 and int4 inputs, Numeric accumulation
   * is overkill for the N and sum(X) values, but definitely not overkill
***************
*** 2219,2224 ****
--- 2265,2286 ----
      PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }

+ /*
+  * Optimized case for average of int8.
+  */
+ Datum
+ int8_avg_accum(PG_FUNCTION_ARGS)
+ {
+     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+     Datum        newval8 = PG_GETARG_DATUM(1);
+     Numeric        newval;
+
+     newval = DatumGetNumeric(DirectFunctionCall1(int8_numeric, newval8));
+
+     PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+
+
  Datum
  numeric_avg(PG_FUNCTION_ARGS)
  {
***************
*** 2232,2242 ****
      deconstruct_array(transarray,
                        NUMERICOID, -1, false, 'i',
                        &transdatums, NULL, &ndatums);
!     if (ndatums != 3)
!         elog(ERROR, "expected 3-element numeric array");
      N = DatumGetNumeric(transdatums[0]);
      sumX = DatumGetNumeric(transdatums[1]);
-     /* ignore sumX2 */

      /* SQL92 defines AVG of no values to be NULL */
      /* N is zero iff no digits (cf. numeric_uminus) */
--- 2294,2303 ----
      deconstruct_array(transarray,
                        NUMERICOID, -1, false, 'i',
                        &transdatums, NULL, &ndatums);
!     if (ndatums != 2)
!         elog(ERROR, "expected 2-element numeric array");
      N = DatumGetNumeric(transdatums[0]);
      sumX = DatumGetNumeric(transdatums[1]);

      /* SQL92 defines AVG of no values to be NULL */
      /* N is zero iff no digits (cf. numeric_uminus) */
Index: src/include/catalog/catversion.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/catversion.h,v
retrieving revision 1.384
diff -c -c -r1.384 catversion.h
*** src/include/catalog/catversion.h    14 Feb 2007 01:58:58 -0000    1.384
--- src/include/catalog/catversion.h    16 Feb 2007 04:30:28 -0000
***************
*** 53,58 ****
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    200702131

  #endif
--- 53,58 ----
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    200702151

  #endif
Index: src/include/catalog/pg_aggregate.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_aggregate.h,v
retrieving revision 1.60
diff -c -c -r1.60 pg_aggregate.h
*** src/include/catalog/pg_aggregate.h    20 Jan 2007 09:27:19 -0000    1.60
--- src/include/catalog/pg_aggregate.h    16 Feb 2007 04:30:28 -0000
***************
*** 80,89 ****
   */

  /* avg */
! DATA(insert ( 2100    int8_accum        numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2101    int4_avg_accum    int8_avg        0    1016    "{0,0}" ));
  DATA(insert ( 2102    int2_avg_accum    int8_avg        0    1016    "{0,0}" ));
! DATA(insert ( 2103    numeric_accum    numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2104    float4_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2105    float8_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2106    interval_accum    interval_avg    0    1187    "{0 second,0 second}" ));
--- 80,89 ----
   */

  /* avg */
! DATA(insert ( 2100    int8_avg_accum    numeric_avg        0    1231    "{0,0}" ));
  DATA(insert ( 2101    int4_avg_accum    int8_avg        0    1016    "{0,0}" ));
  DATA(insert ( 2102    int2_avg_accum    int8_avg        0    1016    "{0,0}" ));
! DATA(insert ( 2103    numeric_avg_accum    numeric_avg        0    1231    "{0,0}" ));
  DATA(insert ( 2104    float4_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2105    float8_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2106    interval_accum    interval_avg    0    1187    "{0 second,0 second}" ));
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.443
diff -c -c -r1.443 pg_proc.h
*** src/include/catalog/pg_proc.h    7 Feb 2007 23:11:30 -0000    1.443
--- src/include/catalog/pg_proc.h    16 Feb 2007 04:30:31 -0000
***************
*** 2744,2755 ****
--- 2744,2759 ----
  DESCR("STDDEV_SAMP aggregate final function");
  DATA(insert OID = 1833 (  numeric_accum    PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 1700" _null_ _null_ _null_
numeric_accum- _null_ )); 
  DESCR("aggregate transition function");
+ DATA(insert OID = 2858 (  numeric_avg_accum    PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 1700" _null_ _null_ _null_
numeric_avg_accum- _null_ )); 
+ DESCR("aggregate transition function");
  DATA(insert OID = 1834 (  int2_accum       PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 21" _null_ _null_ _null_
int2_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1835 (  int4_accum       PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 23" _null_ _null_ _null_
int4_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1836 (  int8_accum       PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 20" _null_ _null_ _null_
int8_accum- _null_ )); 
  DESCR("aggregate transition function");
+ DATA(insert OID = 2746 (  int8_avg_accum       PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 20" _null_ _null_ _null_
int8_avg_accum- _null_ )); 
+ DESCR("aggregate transition function");
  DATA(insert OID = 1837 (  numeric_avg       PGNSP PGUID 12 1 0 f f t f i 1 1700 "1231" _null_ _null_ _null_
numeric_avg- _null_ )); 
  DESCR("AVG aggregate final function");
  DATA(insert OID = 2514 (  numeric_var_pop  PGNSP PGUID 12 1 0 f f t f i 1 1700 "1231" _null_ _null_ _null_
numeric_var_pop- _null_ )); 
Index: src/include/utils/builtins.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.287
diff -c -c -r1.287 builtins.h
*** src/include/utils/builtins.h    28 Jan 2007 16:16:54 -0000    1.287
--- src/include/utils/builtins.h    16 Feb 2007 04:30:32 -0000
***************
*** 841,849 ****
--- 841,851 ----
  extern Datum text_numeric(PG_FUNCTION_ARGS);
  extern Datum numeric_text(PG_FUNCTION_ARGS);
  extern Datum numeric_accum(PG_FUNCTION_ARGS);
+ extern Datum numeric_avg_accum(PG_FUNCTION_ARGS);
  extern Datum int2_accum(PG_FUNCTION_ARGS);
  extern Datum int4_accum(PG_FUNCTION_ARGS);
  extern Datum int8_accum(PG_FUNCTION_ARGS);
+ extern Datum int8_avg_accum(PG_FUNCTION_ARGS);
  extern Datum numeric_avg(PG_FUNCTION_ARGS);
  extern Datum numeric_var_pop(PG_FUNCTION_ARGS);
  extern Datum numeric_var_samp(PG_FUNCTION_ARGS);

Re: Avg performance for int8/numeric

From
Mark Kirkwood
Date:
Bruce Momjian wrote:
> I have tested this patch but it generates regression failures.
>
> There was some code drift, so I am attaching an updated version of the
> patch, and the regression diffs.  The 'four' column is an 'int4' so my
> guess is that somehow the wrong aggregate is being called.
>

Good catch - I must have neglected to run the regression test after
amending the number of array arguments for the numeric avg :-(.

Hmmm - this changing the number of array args for avg means we can't mix
transition functions for variance with final functions for avg - which
is exactly what the regression suite does with the 'newavg' aggregate.

I've 'fixed' this by amending the definition of 'newavg' to use the
transition and final function that 'avg' does. However I found myself
asking if this lost us the point of that test - so I looked back at the
older postgres versions (e.g. 7.1.3) and saw that back *then* 'newavg'
and 'avg' were defined using the same functions...so I think making the
change as indicated is ok.

I've attached a new patch with this change.

Cheers

Mark
diff -Nacr pgsql.orig/src/backend/utils/adt/numeric.c pgsql/src/backend/utils/adt/numeric.c
*** pgsql.orig/src/backend/utils/adt/numeric.c    Sun Jan 21 11:36:20 2007
--- pgsql/src/backend/utils/adt/numeric.c    Fri Feb 16 18:09:24 2007
***************
*** 2165,2170 ****
--- 2165,2204 ----
      return result;
  }

+ /*
+  * Improve avg performance by not caclulating sum(X*X).
+  */
+ static ArrayType *
+ do_numeric_avg_accum(ArrayType *transarray, Numeric newval)
+ {
+     Datum       *transdatums;
+     int            ndatums;
+     Datum        N,
+                 sumX;
+     ArrayType  *result;
+
+     /* We assume the input is array of numeric */
+     deconstruct_array(transarray,
+                       NUMERICOID, -1, false, 'i',
+                       &transdatums, NULL, &ndatums);
+     if (ndatums != 2)
+         elog(ERROR, "expected 2-element numeric array");
+     N = transdatums[0];
+     sumX = transdatums[1];
+
+     N = DirectFunctionCall1(numeric_inc, N);
+     sumX = DirectFunctionCall2(numeric_add, sumX,
+                                NumericGetDatum(newval));
+
+     transdatums[0] = N;
+     transdatums[1] = sumX;
+
+     result = construct_array(transdatums, 2,
+                              NUMERICOID, -1, false, 'i');
+
+     return result;
+ }
+
  Datum
  numeric_accum(PG_FUNCTION_ARGS)
  {
***************
*** 2175,2180 ****
--- 2209,2226 ----
  }

  /*
+  * Optimized case for average of numeric.
+  */
+ Datum
+ numeric_avg_accum(PG_FUNCTION_ARGS)
+ {
+     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+     Numeric        newval = PG_GETARG_NUMERIC(1);
+
+     PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+
+ /*
   * Integer data types all use Numeric accumulators to share code and
   * avoid risk of overflow.    For int2 and int4 inputs, Numeric accumulation
   * is overkill for the N and sum(X) values, but definitely not overkill
***************
*** 2219,2224 ****
--- 2265,2286 ----
      PG_RETURN_ARRAYTYPE_P(do_numeric_accum(transarray, newval));
  }

+ /*
+  * Optimized case for average of int8.
+  */
+ Datum
+ int8_avg_accum(PG_FUNCTION_ARGS)
+ {
+     ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
+     Datum        newval8 = PG_GETARG_DATUM(1);
+     Numeric        newval;
+
+     newval = DatumGetNumeric(DirectFunctionCall1(int8_numeric, newval8));
+
+     PG_RETURN_ARRAYTYPE_P(do_numeric_avg_accum(transarray, newval));
+ }
+
+
  Datum
  numeric_avg(PG_FUNCTION_ARGS)
  {
***************
*** 2232,2242 ****
      deconstruct_array(transarray,
                        NUMERICOID, -1, false, 'i',
                        &transdatums, NULL, &ndatums);
!     if (ndatums != 3)
!         elog(ERROR, "expected 3-element numeric array");
      N = DatumGetNumeric(transdatums[0]);
      sumX = DatumGetNumeric(transdatums[1]);
-     /* ignore sumX2 */

      /* SQL92 defines AVG of no values to be NULL */
      /* N is zero iff no digits (cf. numeric_uminus) */
--- 2294,2303 ----
      deconstruct_array(transarray,
                        NUMERICOID, -1, false, 'i',
                        &transdatums, NULL, &ndatums);
!     if (ndatums != 2)
!         elog(ERROR, "expected 2-element numeric array");
      N = DatumGetNumeric(transdatums[0]);
      sumX = DatumGetNumeric(transdatums[1]);

      /* SQL92 defines AVG of no values to be NULL */
      /* N is zero iff no digits (cf. numeric_uminus) */
diff -Nacr pgsql.orig/src/include/catalog/catversion.h pgsql/src/include/catalog/catversion.h
*** pgsql.orig/src/include/catalog/catversion.h    Fri Feb 16 18:06:34 2007
--- pgsql/src/include/catalog/catversion.h    Fri Feb 16 18:09:24 2007
***************
*** 53,58 ****
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    200702131

  #endif
--- 53,58 ----
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    200702151

  #endif
diff -Nacr pgsql.orig/src/include/catalog/pg_aggregate.h pgsql/src/include/catalog/pg_aggregate.h
*** pgsql.orig/src/include/catalog/pg_aggregate.h    Sun Jan 21 11:36:22 2007
--- pgsql/src/include/catalog/pg_aggregate.h    Fri Feb 16 18:09:24 2007
***************
*** 80,89 ****
   */

  /* avg */
! DATA(insert ( 2100    int8_accum        numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2101    int4_avg_accum    int8_avg        0    1016    "{0,0}" ));
  DATA(insert ( 2102    int2_avg_accum    int8_avg        0    1016    "{0,0}" ));
! DATA(insert ( 2103    numeric_accum    numeric_avg        0    1231    "{0,0,0}" ));
  DATA(insert ( 2104    float4_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2105    float8_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2106    interval_accum    interval_avg    0    1187    "{0 second,0 second}" ));
--- 80,89 ----
   */

  /* avg */
! DATA(insert ( 2100    int8_avg_accum    numeric_avg        0    1231    "{0,0}" ));
  DATA(insert ( 2101    int4_avg_accum    int8_avg        0    1016    "{0,0}" ));
  DATA(insert ( 2102    int2_avg_accum    int8_avg        0    1016    "{0,0}" ));
! DATA(insert ( 2103    numeric_avg_accum    numeric_avg        0    1231    "{0,0}" ));
  DATA(insert ( 2104    float4_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2105    float8_accum    float8_avg        0    1022    "{0,0,0}" ));
  DATA(insert ( 2106    interval_accum    interval_avg    0    1187    "{0 second,0 second}" ));
diff -Nacr pgsql.orig/src/include/catalog/pg_proc.h pgsql/src/include/catalog/pg_proc.h
*** pgsql.orig/src/include/catalog/pg_proc.h    Fri Feb 16 18:06:37 2007
--- pgsql/src/include/catalog/pg_proc.h    Fri Feb 16 18:09:24 2007
***************
*** 2744,2754 ****
--- 2744,2758 ----
  DESCR("STDDEV_SAMP aggregate final function");
  DATA(insert OID = 1833 (  numeric_accum    PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 1700" _null_ _null_ _null_
numeric_accum- _null_ )); 
  DESCR("aggregate transition function");
+ DATA(insert OID = 2858 (  numeric_avg_accum    PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 1700" _null_ _null_ _null_
numeric_avg_accum- _null_ )); 
+ DESCR("aggregate transition function");
  DATA(insert OID = 1834 (  int2_accum       PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 21" _null_ _null_ _null_
int2_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1835 (  int4_accum       PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 23" _null_ _null_ _null_
int4_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1836 (  int8_accum       PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 20" _null_ _null_ _null_
int8_accum- _null_ )); 
+ DESCR("aggregate transition function");
+ DATA(insert OID = 2746 (  int8_avg_accum       PGNSP PGUID 12 1 0 f f t f i 2 1231 "1231 20" _null_ _null_ _null_
int8_avg_accum- _null_ )); 
  DESCR("aggregate transition function");
  DATA(insert OID = 1837 (  numeric_avg       PGNSP PGUID 12 1 0 f f t f i 1 1700 "1231" _null_ _null_ _null_
numeric_avg- _null_ )); 
  DESCR("AVG aggregate final function");
diff -Nacr pgsql.orig/src/include/utils/builtins.h pgsql/src/include/utils/builtins.h
*** pgsql.orig/src/include/utils/builtins.h    Fri Feb 16 18:06:37 2007
--- pgsql/src/include/utils/builtins.h    Fri Feb 16 18:09:24 2007
***************
*** 841,849 ****
--- 841,851 ----
  extern Datum text_numeric(PG_FUNCTION_ARGS);
  extern Datum numeric_text(PG_FUNCTION_ARGS);
  extern Datum numeric_accum(PG_FUNCTION_ARGS);
+ extern Datum numeric_avg_accum(PG_FUNCTION_ARGS);
  extern Datum int2_accum(PG_FUNCTION_ARGS);
  extern Datum int4_accum(PG_FUNCTION_ARGS);
  extern Datum int8_accum(PG_FUNCTION_ARGS);
+ extern Datum int8_avg_accum(PG_FUNCTION_ARGS);
  extern Datum numeric_avg(PG_FUNCTION_ARGS);
  extern Datum numeric_var_pop(PG_FUNCTION_ARGS);
  extern Datum numeric_var_samp(PG_FUNCTION_ARGS);
diff -Nacr pgsql.orig/src/test/regress/expected/create_aggregate.out
pgsql/src/test/regress/expected/create_aggregate.out
*** pgsql.orig/src/test/regress/expected/create_aggregate.out    Sat Jul 29 13:45:35 2006
--- pgsql/src/test/regress/expected/create_aggregate.out    Sat Feb 17 12:05:39 2007
***************
*** 3,11 ****
  --
  -- all functions CREATEd
  CREATE AGGREGATE newavg (
!    sfunc = int4_accum, basetype = int4, stype = _numeric,
!    finalfunc = numeric_avg,
!    initcond1 = '{0,0,0}'
  );
  -- test comments
  COMMENT ON AGGREGATE newavg_wrong (int4) IS 'an agg comment';
--- 3,11 ----
  --
  -- all functions CREATEd
  CREATE AGGREGATE newavg (
!    sfunc = int4_avg_accum, basetype = int4, stype = _int8,
!    finalfunc = int8_avg,
!    initcond1 = '{0,0}'
  );
  -- test comments
  COMMENT ON AGGREGATE newavg_wrong (int4) IS 'an agg comment';
diff -Nacr pgsql.orig/src/test/regress/sql/create_aggregate.sql pgsql/src/test/regress/sql/create_aggregate.sql
*** pgsql.orig/src/test/regress/sql/create_aggregate.sql    Sat Jul 29 13:45:35 2006
--- pgsql/src/test/regress/sql/create_aggregate.sql    Sat Feb 17 12:00:37 2007
***************
*** 4,12 ****

  -- all functions CREATEd
  CREATE AGGREGATE newavg (
!    sfunc = int4_accum, basetype = int4, stype = _numeric,
!    finalfunc = numeric_avg,
!    initcond1 = '{0,0,0}'
  );

  -- test comments
--- 4,12 ----

  -- all functions CREATEd
  CREATE AGGREGATE newavg (
!    sfunc = int4_avg_accum, basetype = int4, stype = _int8,
!    finalfunc = int8_avg,
!    initcond1 = '{0,0}'
  );

  -- test comments

Re: Avg performance for int8/numeric

From
Bruce Momjian
Date:
Mark Kirkwood wrote:
> Bruce Momjian wrote:
> > I have tested this patch but it generates regression failures.
> >
> > There was some code drift, so I am attaching an updated version of the
> > patch, and the regression diffs.  The 'four' column is an 'int4' so my
> > guess is that somehow the wrong aggregate is being called.
> >
>
> Good catch - I must have neglected to run the regression test after
> amending the number of array arguments for the numeric avg :-(.
>
> Hmmm - this changing the number of array args for avg means we can't mix
> transition functions for variance with final functions for avg - which
> is exactly what the regression suite does with the 'newavg' aggregate.

Yea, I was just looking at this and came to same conclusion.

> I've 'fixed' this by amending the definition of 'newavg' to use the
> transition and final function that 'avg' does. However I found myself
> asking if this lost us the point of that test - so I looked back at the
> older postgres versions (e.g. 7.1.3) and saw that back *then* 'newavg'
> and 'avg' were defined using the same functions...so I think making the
> change as indicated is ok.
>
> I've attached a new patch with this change.

OK, great, will apply.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Avg performance for int8/numeric

From
Bruce Momjian
Date:
Patch applied.  Thanks.

---------------------------------------------------------------------------


Mark Kirkwood wrote:
> Bruce Momjian wrote:
> > I have tested this patch but it generates regression failures.
> >
> > There was some code drift, so I am attaching an updated version of the
> > patch, and the regression diffs.  The 'four' column is an 'int4' so my
> > guess is that somehow the wrong aggregate is being called.
> >
>
> Good catch - I must have neglected to run the regression test after
> amending the number of array arguments for the numeric avg :-(.
>
> Hmmm - this changing the number of array args for avg means we can't mix
> transition functions for variance with final functions for avg - which
> is exactly what the regression suite does with the 'newavg' aggregate.
>
> I've 'fixed' this by amending the definition of 'newavg' to use the
> transition and final function that 'avg' does. However I found myself
> asking if this lost us the point of that test - so I looked back at the
> older postgres versions (e.g. 7.1.3) and saw that back *then* 'newavg'
> and 'avg' were defined using the same functions...so I think making the
> change as indicated is ok.
>
> I've attached a new patch with this change.
>
> Cheers
>
> Mark


--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +