Re: Compatible defaults for LEAD/LAG - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Compatible defaults for LEAD/LAG
Date
Msg-id CAFj8pRAFhk_Ogoj7nmotAH-MTEJap35cGRZT9AShXBKUVumLig@mail.gmail.com
Whole thread Raw
In response to Re: Compatible defaults for LEAD/LAG  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Compatible defaults for LEAD/LAG
List pgsql-hackers


ne 30. 8. 2020 v 23:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> This is nice example of usage of anycompatible type (that is consistent
> with other things in Postgres), but standard says something else.
> It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
> but Tom doesn't like this patch.
> I am more inclined to think so this feature should be implemented
> differently - there is no strong reason to go against standard or against
> the implementations of other databases (and increase the costs of porting).
> Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument.  The case at hand is what to do
if we have, say,

        select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2".  The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work.  But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise.  It is not normally the case that
any information-losing cast would be applied silently within an
expression.

postgres=# create table foo(a int);
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1

postgres=# create table foo(a int default 1.1);
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)

It is maybe strange, but it is not an unusual pattern in SQL. I think it can be analogy with default clause

DECLARE a int DEFAULT 1.2;

The default value doesn't change a type of variable. This is maybe too stupid example. There can be other little bit more realistic

CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...
DECLARE x int DEFAULT a;
BEGIN
  ...

I am afraid about performance - if default value can change type, then some other things can stop work - like using index.

For *this* case we don't speak about some operations between two independent operands or function arguments. We are speaking about
the value and about a *default* for the value.


So this deviation from spec doesn't bother me; we have much bigger ones.

ok,  if it is acceptable for other people, I can accept it too - I understand well so it is a corner case and there is some consistency with other Postgres features.

Maybe this difference should be mentioned in documentation.


My concern with this patch is what I said upthread: I'm not sure that
we should be adjusting this behavior in a piecemeal fashion.  I looked
through pg_proc to find all the functions that have more than one any*
argument, and found these:

                      oid                      | prorettype
-----------------------------------------------+------------
 lag(anyelement,integer,anyelement)            | anyelement
 lead(anyelement,integer,anyelement)           | anyelement
 width_bucket(anyelement,anyarray)             | integer
 btarraycmp(anyarray,anyarray)                 | integer
 array_eq(anyarray,anyarray)                   | boolean
 array_ne(anyarray,anyarray)                   | boolean
 array_lt(anyarray,anyarray)                   | boolean
 array_gt(anyarray,anyarray)                   | boolean
 array_le(anyarray,anyarray)                   | boolean
 array_ge(anyarray,anyarray)                   | boolean
 array_append(anyarray,anyelement)             | anyarray
 array_prepend(anyelement,anyarray)            | anyarray
 array_cat(anyarray,anyarray)                  | anyarray
 array_larger(anyarray,anyarray)               | anyarray
 array_smaller(anyarray,anyarray)              | anyarray
 array_position(anyarray,anyelement)           | integer
 array_position(anyarray,anyelement,integer)   | integer
 array_positions(anyarray,anyelement)          | integer[]
 array_remove(anyarray,anyelement)             | anyarray
 array_replace(anyarray,anyelement,anyelement) | anyarray
 arrayoverlap(anyarray,anyarray)               | boolean
 arraycontains(anyarray,anyarray)              | boolean
 arraycontained(anyarray,anyarray)             | boolean
 elem_contained_by_range(anyelement,anyrange)  | boolean
 range_contains_elem(anyrange,anyelement)      | boolean
 range_eq(anyrange,anyrange)                   | boolean
 range_ne(anyrange,anyrange)                   | boolean
 range_overlaps(anyrange,anyrange)             | boolean
 range_contains(anyrange,anyrange)             | boolean
 range_contained_by(anyrange,anyrange)         | boolean
 range_adjacent(anyrange,anyrange)             | boolean
 range_before(anyrange,anyrange)               | boolean
 range_after(anyrange,anyrange)                | boolean
 range_overleft(anyrange,anyrange)             | boolean
 range_overright(anyrange,anyrange)            | boolean
 range_union(anyrange,anyrange)                | anyrange
 range_merge(anyrange,anyrange)                | anyrange
 range_intersect(anyrange,anyrange)            | anyrange
 range_minus(anyrange,anyrange)                | anyrange
 range_cmp(anyrange,anyrange)                  | integer
 range_lt(anyrange,anyrange)                   | boolean
 range_le(anyrange,anyrange)                   | boolean
 range_ge(anyrange,anyrange)                   | boolean
 range_gt(anyrange,anyrange)                   | boolean
 range_gist_same(anyrange,anyrange,internal)   | internal
(45 rows)

Now, there's no point in changing range_eq and the later entries in this
table (the ones taking two anyrange's); our rather lame definition of
anycompatiblerange means that we'd get no benefit from doing so.  But
I think there's a strong case for changing everything before range_eq.
It'd be nice if something like

SELECT array[1] < array[1.1];

would work the same way that "SELECT 1 < 1.1" would.

There it has sense without any discussion. But it is a little bit different than using the default value in the LAG function.


I checked the other concern that I had about whether the more flexible
polymorphic definitions would create any new ambiguities, and I don't
see any problems with this list.  As functions, none of these names are
overloaded, except with different numbers of arguments so there's no
ambiguity.  Most of the array functions are also operators, but the
competing operators do not take arrays, so it doesn't look like there's
any issue on that side either.

So I think we should be more ambitious and generalize all of these
to use anycompatiblearray etc.

+1

Pavel

                        regards, tom lane

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: v13: show extended stats target in \d
Next
From: Rahila
Date:
Subject: Re: More tests with USING INDEX replident and dropped indexes