Re: Drop one-argument string_agg? (was Re: [BUGS] string_agg delimiter having no effect with order by) - Mailing list pgsql-hackers

I wrote:
> Hm?  I don't think that an initdb here would have any impact on whether
> we can call the next drop RC1 or not.  We're talking about removing a
> single built-in entry in pg_proc --- it's one of the safest changes we
> could possibly make.

Well, I forgot that an aggregate involves more than one catalog row ;-).
So it's a bit bigger patch than that, but still pretty small and safe.
See attached.

What we are doing here, IMO, is not just changing string_agg() but
instituting a project policy that we are not going to offer built-in
aggregates with the same names and different numbers of arguments ---
otherwise the problem will come right back.  Therefore, the attached
patch adds an opr_sanity regression test that will complain if anyone
tries to add such.  Of course, this isn't preventing users from creating
such aggregates, but it's on their own heads to not get confused if they
do.

This policy also implies that we are never going to allow default
arguments for aggregates, or at least never have any built-in ones
that use such a feature.

By my count the following people had offered an opinion on making
this change:
    for: tgl, josh, badalex, mmoncure
    against: rhaas, thom
Anybody else want to vote, or change their vote after seeing the patch?

            regards, tom lane

Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.522
diff -c -r1.522 func.sgml
*** doc/src/sgml/func.sgml    29 Jul 2010 19:34:40 -0000    1.522
--- doc/src/sgml/func.sgml    4 Aug 2010 22:01:12 -0000
***************
*** 9731,9737 ****
      <thead>
       <row>
        <entry>Function</entry>
!       <entry>Argument Type</entry>
        <entry>Return Type</entry>
        <entry>Description</entry>
       </row>
--- 9731,9737 ----
      <thead>
       <row>
        <entry>Function</entry>
!       <entry>Argument Type(s)</entry>
        <entry>Return Type</entry>
        <entry>Description</entry>
       </row>
***************
*** 9901,9917 ****
          <primary>string_agg</primary>
         </indexterm>
         <function>
!          string_agg(<replaceable class="parameter">expression</replaceable>
!                     [, <replaceable class="parameter">delimiter</replaceable> ] )
         </function>
        </entry>
        <entry>
!        <type>text</type>
        </entry>
        <entry>
         <type>text</type>
        </entry>
!       <entry>input values concatenated into a string, optionally with delimiters</entry>
       </row>

       <row>
--- 9901,9917 ----
          <primary>string_agg</primary>
         </indexterm>
         <function>
!          string_agg(<replaceable class="parameter">expression</replaceable>,
!                     <replaceable class="parameter">delimiter</replaceable>)
         </function>
        </entry>
        <entry>
!        <type>text</type>, <type>text</type>
        </entry>
        <entry>
         <type>text</type>
        </entry>
!       <entry>input values concatenated into a string, separated by delimiter</entry>
       </row>

       <row>
Index: doc/src/sgml/syntax.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v
retrieving revision 1.149
diff -c -r1.149 syntax.sgml
*** doc/src/sgml/syntax.sgml    4 Aug 2010 15:27:57 -0000    1.149
--- doc/src/sgml/syntax.sgml    4 Aug 2010 22:01:12 -0000
***************
*** 1589,1598 ****
  </programlisting>
      not this:
  <programlisting>
! SELECT string_agg(a ORDER BY a, ',') FROM table;  -- not what you want
  </programlisting>
!     The latter syntax will be accepted, but <literal>','</> will be
!     treated as a (useless) sort key.
     </para>

     <para>
--- 1589,1599 ----
  </programlisting>
      not this:
  <programlisting>
! SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect
  </programlisting>
!     The latter is syntactically valid, but it represents a call of a
!     single-argument aggregate function with two <literal>ORDER BY</> keys
!     (the second one being rather useless since it's a constant).
     </para>

     <para>
Index: src/backend/utils/adt/varlena.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/varlena.c,v
retrieving revision 1.177
diff -c -r1.177 varlena.c
*** src/backend/utils/adt/varlena.c    26 Feb 2010 02:01:10 -0000    1.177
--- src/backend/utils/adt/varlena.c    4 Aug 2010 22:01:12 -0000
***************
*** 3320,3326 ****
  /*
   * string_agg - Concatenates values and returns string.
   *
!  * Syntax: string_agg(value text, delimiter text = '') RETURNS text
   *
   * Note: Any NULL values are ignored. The first-call delimiter isn't
   * actually used at all, and on subsequent calls the delimiter precedes
--- 3320,3326 ----
  /*
   * string_agg - Concatenates values and returns string.
   *
!  * Syntax: string_agg(value text, delimiter text) RETURNS text
   *
   * Note: Any NULL values are ignored. The first-call delimiter isn't
   * actually used at all, and on subsequent calls the delimiter precedes
***************
*** 3359,3386 ****

      state = PG_ARGISNULL(0) ? NULL : (StringInfo) PG_GETARG_POINTER(0);

-     /* Append the element unless null. */
-     if (!PG_ARGISNULL(1))
-     {
-         if (state == NULL)
-             state = makeStringAggState(fcinfo);
-         appendStringInfoText(state, PG_GETARG_TEXT_PP(1));        /* value */
-     }
-
-     /*
-      * The transition type for string_agg() is declared to be "internal",
-      * which is a pass-by-value type the same size as a pointer.
-      */
-     PG_RETURN_POINTER(state);
- }
-
- Datum
- string_agg_delim_transfn(PG_FUNCTION_ARGS)
- {
-     StringInfo    state;
-
-     state = PG_ARGISNULL(0) ? NULL : (StringInfo) PG_GETARG_POINTER(0);
-
      /* Append the value unless null. */
      if (!PG_ARGISNULL(1))
      {
--- 3359,3364 ----
Index: src/include/catalog/catversion.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/catversion.h,v
retrieving revision 1.588
diff -c -r1.588 catversion.h
*** src/include/catalog/catversion.h    16 Jul 2010 02:15:54 -0000    1.588
--- src/include/catalog/catversion.h    4 Aug 2010 22:01:12 -0000
***************
*** 53,58 ****
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    201007151

  #endif
--- 53,58 ----
   */

  /*                            yyyymmddN */
! #define CATALOG_VERSION_NO    201008041

  #endif
Index: src/include/catalog/pg_aggregate.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_aggregate.h,v
retrieving revision 1.71
diff -c -r1.71 pg_aggregate.h
*** src/include/catalog/pg_aggregate.h    1 Feb 2010 03:14:43 -0000    1.71
--- src/include/catalog/pg_aggregate.h    4 Aug 2010 22:01:12 -0000
***************
*** 224,231 ****
  DATA(insert ( 2335    array_agg_transfn    array_agg_finalfn        0    2281    _null_ ));

  /* text */
! DATA(insert (3537    string_agg_transfn            string_agg_finalfn    0    2281    _null_ ));
! DATA(insert (3538    string_agg_delim_transfn    string_agg_finalfn    0    2281    _null_ ));

  /*
   * prototypes for functions in pg_aggregate.c
--- 224,230 ----
  DATA(insert ( 2335    array_agg_transfn    array_agg_finalfn        0    2281    _null_ ));

  /* text */
! DATA(insert ( 3538    string_agg_transfn    string_agg_finalfn        0    2281    _null_ ));

  /*
   * prototypes for functions in pg_aggregate.c
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.573
diff -c -r1.573 pg_proc.h
*** src/include/catalog/pg_proc.h    29 Jul 2010 20:09:25 -0000    1.573
--- src/include/catalog/pg_proc.h    4 Aug 2010 22:01:12 -0000
***************
*** 2835,2850 ****
  DESCR("COVAR_SAMP(double, double) aggregate final function");
  DATA(insert OID = 2817 (  float8_corr                PGNSP PGUID 12 1 0 0 f f f t f i 1 0 701 "1022" _null_ _null_
_null__null_ float8_corr _null_ _null_ _null_ )); 
  DESCR("CORR(double, double) aggregate final function");
! DATA(insert OID = 3534 (  string_agg_transfn        PGNSP PGUID 12 1 0 0 f f f f f i 2 0 2281 "2281 25" _null_ _null_
_null__null_ string_agg_transfn _null_ _null_ _null_ )); 
! DESCR("string_agg(text) transition function");
! DATA(insert OID = 3535 (  string_agg_delim_transfn    PGNSP PGUID 12 1 0 0 f f f f f i 3 0 2281 "2281 25 25" _null_
_null__null_ _null_ string_agg_delim_transfn _null_ _null_ _null_ )); 
  DESCR("string_agg(text, text) transition function");
  DATA(insert OID = 3536 (  string_agg_finalfn        PGNSP PGUID 12 1 0 0 f f f f f i 1 0 25 "2281" _null_ _null_
_null__null_ string_agg_finalfn _null_ _null_ _null_ )); 
! DESCR("string_agg final function");
! DATA(insert OID = 3537 (  string_agg                PGNSP PGUID 12 1 0 0 t f f f f i 1 0 25 "25" _null_ _null_ _null_
_null_aggregate_dummy _null_ _null_ _null_ )); 
! DESCR("concatenate aggregate input into an string");
  DATA(insert OID = 3538 (  string_agg                PGNSP PGUID 12 1 0 0 t f f f f i 2 0 25 "25 25" _null_ _null_
_null__null_ aggregate_dummy _null_ _null_ _null_ )); 
! DESCR("concatenate aggregate input into an string with delimiter");

  /* To ASCII conversion */
  DATA(insert OID = 1845 ( to_ascii    PGNSP PGUID 12 1 0 0 f f f t f i 1 0 25 "25" _null_ _null_ _null_ _null_
to_ascii_default_null_ _null_ _null_ )); 
--- 2835,2846 ----
  DESCR("COVAR_SAMP(double, double) aggregate final function");
  DATA(insert OID = 2817 (  float8_corr                PGNSP PGUID 12 1 0 0 f f f t f i 1 0 701 "1022" _null_ _null_
_null__null_ float8_corr _null_ _null_ _null_ )); 
  DESCR("CORR(double, double) aggregate final function");
! DATA(insert OID = 3535 (  string_agg_transfn        PGNSP PGUID 12 1 0 0 f f f f f i 3 0 2281 "2281 25 25" _null_
_null__null_ _null_ string_agg_transfn _null_ _null_ _null_ )); 
  DESCR("string_agg(text, text) transition function");
  DATA(insert OID = 3536 (  string_agg_finalfn        PGNSP PGUID 12 1 0 0 f f f f f i 1 0 25 "2281" _null_ _null_
_null__null_ string_agg_finalfn _null_ _null_ _null_ )); 
! DESCR("string_agg(text, text) final function");
  DATA(insert OID = 3538 (  string_agg                PGNSP PGUID 12 1 0 0 t f f f f i 2 0 25 "25 25" _null_ _null_
_null__null_ aggregate_dummy _null_ _null_ _null_ )); 
! DESCR("concatenate aggregate input into a string");

  /* To ASCII conversion */
  DATA(insert OID = 1845 ( to_ascii    PGNSP PGUID 12 1 0 0 f f f t f i 1 0 25 "25" _null_ _null_ _null_ _null_
to_ascii_default_null_ _null_ _null_ )); 
Index: src/include/utils/builtins.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.352
diff -c -r1.352 builtins.h
*** src/include/utils/builtins.h    22 Jul 2010 01:22:35 -0000    1.352
--- src/include/utils/builtins.h    4 Aug 2010 22:01:12 -0000
***************
*** 729,735 ****
  extern Datum pg_column_size(PG_FUNCTION_ARGS);

  extern Datum string_agg_transfn(PG_FUNCTION_ARGS);
- extern Datum string_agg_delim_transfn(PG_FUNCTION_ARGS);
  extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);

  /* version.c */
--- 729,734 ----
***************
*** 780,788 ****
  extern Datum chr (PG_FUNCTION_ARGS);
  extern Datum repeat(PG_FUNCTION_ARGS);
  extern Datum ascii(PG_FUNCTION_ARGS);
- extern Datum string_agg_transfn(PG_FUNCTION_ARGS);
- extern Datum string_agg_delim_transfn(PG_FUNCTION_ARGS);
- extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);

  /* inet_net_ntop.c */
  extern char *inet_net_ntop(int af, const void *src, int bits,
--- 779,784 ----
Index: src/test/regress/expected/aggregates.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/aggregates.out,v
retrieving revision 1.22
diff -c -r1.22 aggregates.out
*** src/test/regress/expected/aggregates.out    18 Jul 2010 19:37:48 -0000    1.22
--- src/test/regress/expected/aggregates.out    4 Aug 2010 22:01:12 -0000
***************
*** 800,811 ****
  LINE 1: select aggfns(distinct a,a,c order by a,b)
                                                  ^
  -- string_agg tests
- select string_agg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
-   string_agg
- --------------
-  aaaabbbbcccc
- (1 row)
-
  select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
     string_agg
  ----------------
--- 800,805 ----
***************
*** 818,827 ****
   aaaa,bbbb,cccc
  (1 row)

! select string_agg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
   string_agg
  ------------
!  bbbb,cccc
  (1 row)

  select string_agg(a,',') from (values(null),(null)) g(a);
--- 812,821 ----
   aaaa,bbbb,cccc
  (1 row)

! select string_agg(a,'AB') from (values(null),(null),('bbbb'),('cccc')) g(a);
   string_agg
  ------------
!  bbbbABcccc
  (1 row)

  select string_agg(a,',') from (values(null),(null)) g(a);
***************
*** 831,853 ****
  (1 row)

  -- check some implicit casting cases, as per bug #5564
! select string_agg(distinct f1 order by f1) from varchar_tbl;  -- ok
   string_agg
  ------------
!  aababcd
  (1 row)

! select string_agg(distinct f1::text order by f1) from varchar_tbl;  -- not ok
  ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
! LINE 1: select string_agg(distinct f1::text order by f1) from varcha...
!                                                      ^
! select string_agg(distinct f1 order by f1::text) from varchar_tbl;  -- not ok
  ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
! LINE 1: select string_agg(distinct f1 order by f1::text) from varcha...
!                                                ^
! select string_agg(distinct f1::text order by f1::text) from varchar_tbl;  -- ok
   string_agg
  ------------
!  aababcd
  (1 row)

--- 825,847 ----
  (1 row)

  -- check some implicit casting cases, as per bug #5564
! select string_agg(distinct f1, ',' order by f1) from varchar_tbl;  -- ok
   string_agg
  ------------
!  a,ab,abcd
  (1 row)

! select string_agg(distinct f1::text, ',' order by f1) from varchar_tbl;  -- not ok
  ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
! LINE 1: select string_agg(distinct f1::text, ',' order by f1) from v...
!                                                           ^
! select string_agg(distinct f1, ',' order by f1::text) from varchar_tbl;  -- not ok
  ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
! LINE 1: select string_agg(distinct f1, ',' order by f1::text) from v...
!                                                     ^
! select string_agg(distinct f1::text, ',' order by f1::text) from varchar_tbl;  -- ok
   string_agg
  ------------
!  a,ab,abcd
  (1 row)

Index: src/test/regress/expected/opr_sanity.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/opr_sanity.out,v
retrieving revision 1.90
diff -c -r1.90 opr_sanity.out
*** src/test/regress/expected/opr_sanity.out    14 Jan 2010 16:31:09 -0000    1.90
--- src/test/regress/expected/opr_sanity.out    4 Aug 2010 22:01:13 -0000
***************
*** 773,778 ****
--- 773,803 ----
   min     | <       |            1
  (2 rows)

+ -- Check that there are not aggregates with the same name and different
+ -- numbers of arguments.  While not technically wrong, we have a project policy
+ -- to avoid this because it opens the door for confusion in connection with
+ -- ORDER BY: novices frequently put the ORDER BY in the wrong place.
+ -- See the fate of the single-argument form of string_agg() for history.
+ -- The only aggregates that should show up here are count() and count(*).
+ SELECT p1.oid::regprocedure, p2.oid::regprocedure
+ FROM pg_proc AS p1, pg_proc AS p2
+ WHERE p1.oid < p2.oid AND p1.proname = p2.proname AND
+     p1.proisagg AND p2.proisagg AND
+     array_dims(p1.proargtypes) != array_dims(p2.proargtypes)
+ ORDER BY 1;
+      oid      |   oid
+ --------------+---------
+  count("any") | count()
+ (1 row)
+
+ -- For the same reason, aggregates with default arguments are no good.
+ SELECT oid, proname
+ FROM pg_proc AS p
+ WHERE proisagg AND proargdefaults IS NOT NULL;
+  oid | proname
+ -----+---------
+ (0 rows)
+
  -- **************** pg_opfamily ****************
  -- Look for illegal values in pg_opfamily fields
  SELECT p1.oid
Index: src/test/regress/sql/aggregates.sql
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/sql/aggregates.sql,v
retrieving revision 1.18
diff -c -r1.18 aggregates.sql
*** src/test/regress/sql/aggregates.sql    18 Jul 2010 19:37:49 -0000    1.18
--- src/test/regress/sql/aggregates.sql    4 Aug 2010 22:01:13 -0000
***************
*** 357,370 ****
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;

  -- string_agg tests
- select string_agg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
  select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
  select string_agg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
! select string_agg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
  select string_agg(a,',') from (values(null),(null)) g(a);

  -- check some implicit casting cases, as per bug #5564
! select string_agg(distinct f1 order by f1) from varchar_tbl;  -- ok
! select string_agg(distinct f1::text order by f1) from varchar_tbl;  -- not ok
! select string_agg(distinct f1 order by f1::text) from varchar_tbl;  -- not ok
! select string_agg(distinct f1::text order by f1::text) from varchar_tbl;  -- ok
--- 357,369 ----
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;

  -- string_agg tests
  select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
  select string_agg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
! select string_agg(a,'AB') from (values(null),(null),('bbbb'),('cccc')) g(a);
  select string_agg(a,',') from (values(null),(null)) g(a);

  -- check some implicit casting cases, as per bug #5564
! select string_agg(distinct f1, ',' order by f1) from varchar_tbl;  -- ok
! select string_agg(distinct f1::text, ',' order by f1) from varchar_tbl;  -- not ok
! select string_agg(distinct f1, ',' order by f1::text) from varchar_tbl;  -- not ok
! select string_agg(distinct f1::text, ',' order by f1::text) from varchar_tbl;  -- ok
Index: src/test/regress/sql/opr_sanity.sql
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/sql/opr_sanity.sql,v
retrieving revision 1.73
diff -c -r1.73 opr_sanity.sql
*** src/test/regress/sql/opr_sanity.sql    29 Nov 2009 18:53:54 -0000    1.73
--- src/test/regress/sql/opr_sanity.sql    4 Aug 2010 22:01:13 -0000
***************
*** 621,626 ****
--- 621,646 ----
      amopmethod = (SELECT oid FROM pg_am WHERE amname = 'btree')
  ORDER BY 1, 2;

+ -- Check that there are not aggregates with the same name and different
+ -- numbers of arguments.  While not technically wrong, we have a project policy
+ -- to avoid this because it opens the door for confusion in connection with
+ -- ORDER BY: novices frequently put the ORDER BY in the wrong place.
+ -- See the fate of the single-argument form of string_agg() for history.
+ -- The only aggregates that should show up here are count() and count(*).
+
+ SELECT p1.oid::regprocedure, p2.oid::regprocedure
+ FROM pg_proc AS p1, pg_proc AS p2
+ WHERE p1.oid < p2.oid AND p1.proname = p2.proname AND
+     p1.proisagg AND p2.proisagg AND
+     array_dims(p1.proargtypes) != array_dims(p2.proargtypes)
+ ORDER BY 1;
+
+ -- For the same reason, aggregates with default arguments are no good.
+
+ SELECT oid, proname
+ FROM pg_proc AS p
+ WHERE proisagg AND proargdefaults IS NOT NULL;
+
  -- **************** pg_opfamily ****************

  -- Look for illegal values in pg_opfamily fields

pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: ECPG dynamic cursor fix for UPDATE/DELETE ... WHERE CURRENT OF :curname
Next
From: Thom Brown
Date:
Subject: Re: Drop one-argument string_agg? (was Re: [BUGS] string_agg delimiter having no effect with order by)