Re: Compatible defaults for LEAD/LAG - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Compatible defaults for LEAD/LAG
Date
Msg-id 1126241.1600734804@sss.pgh.pa.us
Whole thread Raw
In response to Re: Compatible defaults for LEAD/LAG  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Compatible defaults for LEAD/LAG
List pgsql-hackers
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I see few possibilities how to finish and close this issue:
> 1. use anycompatible type and add note to documentation so expression of
> optional argument can change a result type, and so this is Postgres
> specific - other databases and ANSI SQL disallow this.
> It is the most simple solution, and it is good enough for this case, that
> is not extra important.
> 2. choose the correct type somewhere inside the parser - for these two
> functions.
> 3. introduce and implement some generic solution for this case - it can be
> implemented on C level via some function helper or on syntax level
>    CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval
> a%type) ...
> "defval argname%type" means for caller's expression "CAST(defval AS
> typeof(argname))"

I continue to feel that the spec's definition of this is not so
obviously right that we should jump through hoops to duplicate it.
In fact, I don't even agree that we need a disclaimer in the docs
saying that it's not quite the same.  Only the most nitpicky
language lawyers would ever even notice.

In hopes of moving this along a bit, I experimented with converting
the other functions I listed to use anycompatible.  I soon found that
touching any functions/operators that are listed in operator classes
would open a can of worms far larger than I'd previously supposed.
To maintain consistency, we'd have to propagate the datatype changes
to *every* function/operator associated with those opfamilies --- many
of which only have one any-foo input and thus aren't on my original
list.  (An example here is that extending btree array_ops will require
changing the max(anyarray) and min(anyarray) aggregates too.)  We'd
then end up with a situation that would be rather confusing from a
user's standpoint, in that it'd be quite un-obvious why some array
functions use anyarray while other ones use anycompatiblearray.

So I backed off to just changing the functions/operators that have
no opclass connections, such as array_cat.  Even that has some
downsides --- for example, in the attached patch, it's necessary
to change some polymorphism.sql examples that explicitly reference
array_cat(anyarray).  I wonder whether this change would break a
significant number of user-defined aggregates or operators.

(Note that I think we'd have to resist the temptation to fix that
by letting CREATE AGGREGATE et al accept support functions that
take anyarray/anycompatiblearray (etc) interchangeably.  A lot of
the security analysis that went into CVE-2020-14350 depended on
the assumption that utility commands only do exact lookups of
support functions.  If we tried to be lax about this, we'd
re-introduce the possibility for hostile capture of function
references in extension scripts.)

Another interesting issue, not seen in the attached but which
came up while I was experimenting with the more aggressive patch,
was this failure in the polymorphism test:

 select max(histogram_bounds) from pg_stats where tablename = 'pg_am';
-ERROR:  cannot compare arrays of different element types
+ERROR:  function max(anyarray) does not exist

That's because we don't accept pg_stats.histogram_bounds (of
declared type anyarray) as a valid input for anycompatiblearray.
I wonder if that isn't a bug we need to fix in the anycompatible
patch, independently of anything else.  We'd not be able to deduce
an actual element type from such an input, but we already cannot
do so when we match it to an anyarray parameter.

Anyway, attached find

0001 - updates Vik's original patch to HEAD and tweaks the
grammar in the docs a bit.

0002 - add-on patch to convert array_append, array_prepend,
array_cat, array_position, array_positions, array_remove,
array_replace, and width_bucket to use anycompatiblearray.

I think 0001 is committable, but 0002 is just WIP since
I didn't touch the docs.  I'm slightly discouraged about
whether 0002 is worth proceeding with.  Any thoughts?

            regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 461b748d89..14cb8e2ce6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19621,17 +19621,17 @@ SELECT count(*) FROM sometable;
         <indexterm>
          <primary>lag</primary>
         </indexterm>
-        <function>lag</function> ( <parameter>value</parameter> <type>anyelement</type>
+        <function>lag</function> ( <parameter>value</parameter> <type>anycompatible</type>
           <optional>, <parameter>offset</parameter> <type>integer</type>
-          <optional>, <parameter>default</parameter> <type>anyelement</type> </optional></optional> )
-        <returnvalue>anyelement</returnvalue>
+          <optional>, <parameter>default</parameter> <type>anycompatible</type> </optional></optional> )
+        <returnvalue>anycompatible</returnvalue>
        </para>
        <para>
         Returns <parameter>value</parameter> evaluated at
         the row that is <parameter>offset</parameter>
         rows before the current row within the partition; if there is no such
         row, instead returns <parameter>default</parameter>
-        (which must be of the same type as
+        (which must be of a type compatible with
         <parameter>value</parameter>).
         Both <parameter>offset</parameter> and
         <parameter>default</parameter> are evaluated
@@ -19646,17 +19646,17 @@ SELECT count(*) FROM sometable;
         <indexterm>
          <primary>lead</primary>
         </indexterm>
-        <function>lead</function> ( <parameter>value</parameter> <type>anyelement</type>
+        <function>lead</function> ( <parameter>value</parameter> <type>anycompatible</type>
           <optional>, <parameter>offset</parameter> <type>integer</type>
-          <optional>, <parameter>default</parameter> <type>anyelement</type> </optional></optional> )
-        <returnvalue>anyelement</returnvalue>
+          <optional>, <parameter>default</parameter> <type>anycompatible</type> </optional></optional> )
+        <returnvalue>anycompatible</returnvalue>
        </para>
        <para>
         Returns <parameter>value</parameter> evaluated at
         the row that is <parameter>offset</parameter>
         rows after the current row within the partition; if there is no such
         row, instead returns <parameter>default</parameter>
-        (which must be of the same type as
+        (which must be of a type compatible with
         <parameter>value</parameter>).
         Both <parameter>offset</parameter> and
         <parameter>default</parameter> are evaluated
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f48f5fb4d9..6e7832b0f4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9729,8 +9729,8 @@
   proname => 'lag', prokind => 'w', prorettype => 'anyelement',
   proargtypes => 'anyelement int4', prosrc => 'window_lag_with_offset' },
 { oid => '3108', descr => 'fetch the Nth preceding row value with default',
-  proname => 'lag', prokind => 'w', prorettype => 'anyelement',
-  proargtypes => 'anyelement int4 anyelement',
+  proname => 'lag', prokind => 'w', prorettype => 'anycompatible',
+  proargtypes => 'anycompatible int4 anycompatible',
   prosrc => 'window_lag_with_offset_and_default' },
 { oid => '3109', descr => 'fetch the following row value',
   proname => 'lead', prokind => 'w', prorettype => 'anyelement',
@@ -9739,8 +9739,8 @@
   proname => 'lead', prokind => 'w', prorettype => 'anyelement',
   proargtypes => 'anyelement int4', prosrc => 'window_lead_with_offset' },
 { oid => '3111', descr => 'fetch the Nth following row value with default',
-  proname => 'lead', prokind => 'w', prorettype => 'anyelement',
-  proargtypes => 'anyelement int4 anyelement',
+  proname => 'lead', prokind => 'w', prorettype => 'anycompatible',
+  proargtypes => 'anycompatible int4 anycompatible',
   prosrc => 'window_lead_with_offset_and_default' },
 { oid => '3112', descr => 'fetch the first row value',
   proname => 'first_value', prokind => 'w', prorettype => 'anyelement',
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 21c6cac491..19e2ac518a 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -300,6 +300,21 @@ SELECT lag(ten, four, 0) OVER (PARTITION BY four ORDER BY ten), ten, four FROM t
    0 |   3 |    3
 (10 rows)

+SELECT lag(ten, four, 0.7) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY
four,ten; 
+ lag | ten | four
+-----+-----+------
+   0 |   0 |    0
+   0 |   0 |    0
+   4 |   4 |    0
+ 0.7 |   1 |    1
+   1 |   1 |    1
+   1 |   7 |    1
+   7 |   9 |    1
+ 0.7 |   0 |    2
+ 0.7 |   1 |    3
+ 0.7 |   3 |    3
+(10 rows)
+
 SELECT lead(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;
  lead | ten | four
 ------+-----+------
@@ -345,6 +360,21 @@ SELECT lead(ten * 2, 1, -1) OVER (PARTITION BY four ORDER BY ten), ten, four FRO
    -1 |   3 |    3
 (10 rows)

+SELECT lead(ten * 2, 1, -1.4) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY
four,ten; 
+ lead | ten | four
+------+-----+------
+    0 |   0 |    0
+    8 |   0 |    0
+ -1.4 |   4 |    0
+    2 |   1 |    1
+   14 |   1 |    1
+   18 |   7 |    1
+ -1.4 |   9 |    1
+ -1.4 |   0 |    2
+    6 |   1 |    3
+ -1.4 |   3 |    3
+(10 rows)
+
 SELECT first_value(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;
  first_value | ten | four
 -------------+-----+------
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index 9485aebce8..eae5fa6017 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -63,12 +63,14 @@ SELECT lag(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHER
 SELECT lag(ten, four) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;

 SELECT lag(ten, four, 0) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;
+SELECT lag(ten, four, 0.7) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY
four,ten; 

 SELECT lead(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;

 SELECT lead(ten * 2, 1) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;

 SELECT lead(ten * 2, 1, -1) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;
+SELECT lead(ten * 2, 1, -1.4) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY
four,ten; 

 SELECT first_value(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;

diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 7cc812adda..c3377ebe46 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -168,14 +168,14 @@
   oprcode => 'textnename', oprrest => 'neqsel', oprjoin => 'neqjoinsel' },

 { oid => '349', descr => 'append element onto end of array',
-  oprname => '||', oprleft => 'anyarray', oprright => 'anyelement',
-  oprresult => 'anyarray', oprcode => 'array_append' },
+  oprname => '||', oprleft => 'anycompatiblearray', oprright => 'anycompatible',
+  oprresult => 'anycompatiblearray', oprcode => 'array_append' },
 { oid => '374', descr => 'prepend element onto front of array',
-  oprname => '||', oprleft => 'anyelement', oprright => 'anyarray',
-  oprresult => 'anyarray', oprcode => 'array_prepend' },
+  oprname => '||', oprleft => 'anycompatible', oprright => 'anycompatiblearray',
+  oprresult => 'anycompatiblearray', oprcode => 'array_prepend' },
 { oid => '375', descr => 'concatenate',
-  oprname => '||', oprleft => 'anyarray', oprright => 'anyarray',
-  oprresult => 'anyarray', oprcode => 'array_cat' },
+  oprname => '||', oprleft => 'anycompatiblearray', oprright => 'anycompatiblearray',
+  oprresult => 'anycompatiblearray', oprcode => 'array_cat' },

 { oid => '352', descr => 'equal',
   oprname => '=', oprcanhash => 't', oprleft => 'xid', oprright => 'xid',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6e7832b0f4..b239b5d42d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1549,14 +1549,14 @@
   proname => 'cardinality', prorettype => 'int4', proargtypes => 'anyarray',
   prosrc => 'array_cardinality' },
 { oid => '378', descr => 'append element onto end of array',
-  proname => 'array_append', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyarray anyelement', prosrc => 'array_append' },
+  proname => 'array_append', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_append' },
 { oid => '379', descr => 'prepend element onto front of array',
-  proname => 'array_prepend', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyelement anyarray', prosrc => 'array_prepend' },
+  proname => 'array_prepend', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatible anycompatiblearray', prosrc => 'array_prepend' },
 { oid => '383',
-  proname => 'array_cat', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyarray anyarray', prosrc => 'array_cat' },
+  proname => 'array_cat', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatiblearray anycompatiblearray', prosrc => 'array_cat' },
 { oid => '394', descr => 'split delimited text',
   proname => 'string_to_array', proisstrict => 'f', prorettype => '_text',
   proargtypes => 'text text', prosrc => 'text_to_array' },
@@ -1588,15 +1588,15 @@
   proargtypes => 'anyarray anyarray', prosrc => 'array_smaller' },
 { oid => '3277', descr => 'returns an offset of value in array',
   proname => 'array_position', proisstrict => 'f', prorettype => 'int4',
-  proargtypes => 'anyarray anyelement', prosrc => 'array_position' },
+  proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_position' },
 { oid => '3278',
   descr => 'returns an offset of value in array with start index',
   proname => 'array_position', proisstrict => 'f', prorettype => 'int4',
-  proargtypes => 'anyarray anyelement int4', prosrc => 'array_position_start' },
+  proargtypes => 'anycompatiblearray anycompatible int4', prosrc => 'array_position_start' },
 { oid => '3279',
   descr => 'returns an array of offsets of some value in array',
   proname => 'array_positions', proisstrict => 'f', prorettype => '_int4',
-  proargtypes => 'anyarray anyelement', prosrc => 'array_positions' },
+  proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_positions' },
 { oid => '1191', descr => 'array subscripts generator',
   proname => 'generate_subscripts', prorows => '1000', proretset => 't',
   prorettype => 'int4', proargtypes => 'anyarray int4 bool',
@@ -1621,11 +1621,11 @@
   proargtypes => 'internal', prosrc => 'array_unnest_support' },
 { oid => '3167',
   descr => 'remove any occurrences of an element from an array',
-  proname => 'array_remove', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyarray anyelement', prosrc => 'array_remove' },
+  proname => 'array_remove', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_remove' },
 { oid => '3168', descr => 'replace any occurrences of an element in an array',
-  proname => 'array_replace', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyarray anyelement anyelement', prosrc => 'array_replace' },
+  proname => 'array_replace', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatiblearray anycompatible anycompatible', prosrc => 'array_replace' },
 { oid => '2333', descr => 'aggregate transition function',
   proname => 'array_agg_transfn', proisstrict => 'f', prorettype => 'internal',
   proargtypes => 'internal anynonarray', prosrc => 'array_agg_transfn' },
@@ -1651,7 +1651,8 @@
 { oid => '3218',
   descr => 'bucket number of operand given a sorted array of bucket lower bounds',
   proname => 'width_bucket', prorettype => 'int4',
-  proargtypes => 'anyelement anyarray', prosrc => 'width_bucket_array' },
+  proargtypes => 'anycompatible anycompatiblearray',
+  prosrc => 'width_bucket_array' },
 { oid => '3816', descr => 'array typanalyze',
   proname => 'array_typanalyze', provolatile => 's', prorettype => 'bool',
   proargtypes => 'internal', prosrc => 'array_typanalyze' },
diff --git a/src/test/regress/expected/polymorphism.out b/src/test/regress/expected/polymorphism.out
index 1ff40764d9..2c3bb0a60b 100644
--- a/src/test/regress/expected/polymorphism.out
+++ b/src/test/regress/expected/polymorphism.out
@@ -729,10 +729,10 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl;
 (5 rows)

 -- another sort of polymorphic aggregate
-CREATE AGGREGATE array_cat_accum (anyarray)
+CREATE AGGREGATE array_cat_accum (anycompatiblearray)
 (
     sfunc = array_cat,
-    stype = anyarray,
+    stype = anycompatiblearray,
     initcond = '{}'
 );
 SELECT array_cat_accum(i)
@@ -786,16 +786,16 @@ create aggregate build_group(int8, integer) (
   STYPE = int8[]
 );
 -- check proper resolution of data types for polymorphic transfn/finalfn
-create function first_el(anyarray) returns anyelement as
+create function first_el(anycompatiblearray) returns anycompatible as
 'select $1[1]' language sql strict immutable;
 create aggregate first_el_agg_f8(float8) (
   SFUNC = array_append,
   STYPE = float8[],
   FINALFUNC = first_el
 );
-create aggregate first_el_agg_any(anyelement) (
+create aggregate first_el_agg_any(anycompatible) (
   SFUNC = array_append,
-  STYPE = anyarray,
+  STYPE = anycompatiblearray,
   FINALFUNC = first_el
 );
 select first_el_agg_f8(x::float8) from generate_series(1,10) x;
diff --git a/src/test/regress/sql/polymorphism.sql b/src/test/regress/sql/polymorphism.sql
index e5222f1f81..70a21c8978 100644
--- a/src/test/regress/sql/polymorphism.sql
+++ b/src/test/regress/sql/polymorphism.sql
@@ -498,10 +498,10 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl;

 -- another sort of polymorphic aggregate

-CREATE AGGREGATE array_cat_accum (anyarray)
+CREATE AGGREGATE array_cat_accum (anycompatiblearray)
 (
     sfunc = array_cat,
-    stype = anyarray,
+    stype = anycompatiblearray,
     initcond = '{}'
 );

@@ -549,7 +549,7 @@ create aggregate build_group(int8, integer) (

 -- check proper resolution of data types for polymorphic transfn/finalfn

-create function first_el(anyarray) returns anyelement as
+create function first_el(anycompatiblearray) returns anycompatible as
 'select $1[1]' language sql strict immutable;

 create aggregate first_el_agg_f8(float8) (
@@ -558,9 +558,9 @@ create aggregate first_el_agg_f8(float8) (
   FINALFUNC = first_el
 );

-create aggregate first_el_agg_any(anyelement) (
+create aggregate first_el_agg_any(anycompatible) (
   SFUNC = array_append,
-  STYPE = anyarray,
+  STYPE = anycompatiblearray,
   FINALFUNC = first_el
 );


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Next
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Global snapshots