Re: Compatible defaults for LEAD/LAG - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Compatible defaults for LEAD/LAG
Date
Msg-id 3902163.1598824779@sss.pgh.pa.us
Whole thread Raw
In response to Re: Compatible defaults for LEAD/LAG  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Compatible defaults for LEAD/LAG
List pgsql-hackers
Pavel Stehule <pavel.stehule@gmail.com> writes:
> This is nice example of usage of anycompatible type (that is consistent
> with other things in Postgres), but standard says something else.
> It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
> but Tom doesn't like this patch.
> I am more inclined to think so this feature should be implemented
> differently - there is no strong reason to go against standard or against
> the implementations of other databases (and increase the costs of porting).
> Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument.  The case at hand is what to do
if we have, say,

    select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2".  The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work.  But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise.  It is not normally the case that
any information-losing cast would be applied silently within an
expression.

So this deviation from spec doesn't bother me; we have much bigger ones.

My concern with this patch is what I said upthread: I'm not sure that
we should be adjusting this behavior in a piecemeal fashion.  I looked
through pg_proc to find all the functions that have more than one any*
argument, and found these:

                      oid                      | prorettype 
-----------------------------------------------+------------
 lag(anyelement,integer,anyelement)            | anyelement
 lead(anyelement,integer,anyelement)           | anyelement
 width_bucket(anyelement,anyarray)             | integer
 btarraycmp(anyarray,anyarray)                 | integer
 array_eq(anyarray,anyarray)                   | boolean
 array_ne(anyarray,anyarray)                   | boolean
 array_lt(anyarray,anyarray)                   | boolean
 array_gt(anyarray,anyarray)                   | boolean
 array_le(anyarray,anyarray)                   | boolean
 array_ge(anyarray,anyarray)                   | boolean
 array_append(anyarray,anyelement)             | anyarray
 array_prepend(anyelement,anyarray)            | anyarray
 array_cat(anyarray,anyarray)                  | anyarray
 array_larger(anyarray,anyarray)               | anyarray
 array_smaller(anyarray,anyarray)              | anyarray
 array_position(anyarray,anyelement)           | integer
 array_position(anyarray,anyelement,integer)   | integer
 array_positions(anyarray,anyelement)          | integer[]
 array_remove(anyarray,anyelement)             | anyarray
 array_replace(anyarray,anyelement,anyelement) | anyarray
 arrayoverlap(anyarray,anyarray)               | boolean
 arraycontains(anyarray,anyarray)              | boolean
 arraycontained(anyarray,anyarray)             | boolean
 elem_contained_by_range(anyelement,anyrange)  | boolean
 range_contains_elem(anyrange,anyelement)      | boolean
 range_eq(anyrange,anyrange)                   | boolean
 range_ne(anyrange,anyrange)                   | boolean
 range_overlaps(anyrange,anyrange)             | boolean
 range_contains(anyrange,anyrange)             | boolean
 range_contained_by(anyrange,anyrange)         | boolean
 range_adjacent(anyrange,anyrange)             | boolean
 range_before(anyrange,anyrange)               | boolean
 range_after(anyrange,anyrange)                | boolean
 range_overleft(anyrange,anyrange)             | boolean
 range_overright(anyrange,anyrange)            | boolean
 range_union(anyrange,anyrange)                | anyrange
 range_merge(anyrange,anyrange)                | anyrange
 range_intersect(anyrange,anyrange)            | anyrange
 range_minus(anyrange,anyrange)                | anyrange
 range_cmp(anyrange,anyrange)                  | integer
 range_lt(anyrange,anyrange)                   | boolean
 range_le(anyrange,anyrange)                   | boolean
 range_ge(anyrange,anyrange)                   | boolean
 range_gt(anyrange,anyrange)                   | boolean
 range_gist_same(anyrange,anyrange,internal)   | internal
(45 rows)

Now, there's no point in changing range_eq and the later entries in this
table (the ones taking two anyrange's); our rather lame definition of
anycompatiblerange means that we'd get no benefit from doing so.  But
I think there's a strong case for changing everything before range_eq.
It'd be nice if something like

SELECT array[1] < array[1.1];

would work the same way that "SELECT 1 < 1.1" would.

I checked the other concern that I had about whether the more flexible
polymorphic definitions would create any new ambiguities, and I don't
see any problems with this list.  As functions, none of these names are
overloaded, except with different numbers of arguments so there's no
ambiguity.  Most of the array functions are also operators, but the
competing operators do not take arrays, so it doesn't look like there's
any issue on that side either.

So I think we should be more ambitious and generalize all of these
to use anycompatiblearray etc.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Deprecating postfix and factorial operators in PostgreSQL 13
Next
From: Stephen Frost
Date:
Subject: Re: New default role- 'pg_read_all_data'