Re: array support patch phase 1 patch - Mailing list pgsql-patches

From Joe Conway
Subject Re: array support patch phase 1 patch
Date
Msg-id 3E949D12.6030200@joeconway.com
Whole thread Raw
In response to Re: array support patch phase 1 patch  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: array support patch phase 1 patch
List pgsql-patches
Tom Lane wrote:
> I've applied much but not all of this.  Some comments:

Thanks for the review and cleanup!

> I didn't apply any of the aggregate changes, because I'm unsure of how
> we actually want to do that.  AFAICT you've set it up so that the
> polymorphism of the transition function is hidden within the aggregate,
> and one must still create a separate aggregate for each input datatype.
> Couldn't we fix things so that a polymorphic transition function leads
> to a polymorphic aggregate?  I'm not totally sure that makes sense, but
> it seems worth thinking about.

I went back and forth on that issue myself more than once while I was
working on this. I'll take another look at making aggregate polymorphic
as well.

> Also I didn't put in the bool_op stuff.  That seemed pretty messy; in
> particular I didn't care for looking at the operator names to decide
> what to do.  Another problem is that the actual lookup of the scalar
> operators would be schema search path dependent.  I'd feel more
> comfortable with something that created a tighter binding of the array
> operators to the underlying scalar operators.  Not sure how to do it,
> though.

But the lookup would be schema search path dependent if we were given
two scalars, so I don't see this as any different. Would it be better to
use the same operators as the scalars ("=", "<>", ...etc)? It makes
sense to me that "array = element" should apply the "=" operator for the
element data type, across all of the array elements. Maybe this takes us
back to Peter's suggestion:
   expression IN (array)
   expression NOT IN (array)
   expression operator ANY (array)
   expression operator SOME (array)
   (expression) operator (array)
   (expression) operator ALL (array)

In this case operator is taken as the appropriate operator for the
expression type as both left and right arguments, and array is treated
the same as a one column subquery.

> There are a number of remaining annoyances in array type assignment and
> coercion.  Here's one:
>
> create table t1 (p point[]);
> insert into t1 values(array['(3,4)','(4,5)','(6,7)']);
> ERROR:  column "p" is of type point[] but expression is of type text[]
>         You will need to rewrite or cast the expression
> Instead you have to write
> insert into t1 values(array['(3,4)'::point,'(4,5)','(6,7)']);
>
> I'm not sure if there's any way around this, but it certainly reduces the
> usefulness of untyped literals.  Right offhand it seems like maybe we could
> invent an UNKNOWNARRAY type so as to postpone the decision about what the
> array's type should be --- but I'm worried about possible side-effects.
> I think there are places where the existence of "unknown[]" might allow
> the type resolution logic to follow paths we don't want it to follow.
>
> I think there are a lot of places where binary-compatible element types
> and domain element types will not be treated reasonably.  We need to think
> about where and how to handle that.

Yeah, I think this is what led me to the hack in parse_coerce that you
didn't like ;-), but I'm not sure how to better handle it. I'll think a
bit more on it.

>
> Another problem is:
>
> regression=# select distinct array(select * from text_tbl) from foo;
> ERROR:  Unable to identify an equality operator for type text[]
>
> This DISTINCT query would fail anyway of course, for lack of a '<'
> operator for arrays, but it seems like we ought to be able to find the
> polymorphic '=' operator.
>
> We really ought to reimplement array_eq to be less bogus: instead of bit
> equality it ought to be applying the element type's equality operator.

OK. I'll look at these issues again. Should I also look to implement:
   array <> array
   array > array
   array < array
   array >= array
   array <= array

as Hannu suggested?


> I rearranged the way that parse_coerce and function/operator overloading
> resolution handle arrays.  It seems cleaner to me, but take a look and
> see what you think.

OK.


> One thing I didn't like about the implementation of many of these functions
> is that they were willing to repeat catalog lookups on every call.  I fixed
> array_type_coerce to cache lookup results, please see if you can apply the
> technique elsewhere.

OK.

> The selectivity estimation functions you'd assigned to the bool_ops operators
> seemed like pretty bogus choices.  I am not sure we can use scalar estimators
> for these at all --- perhaps a new set of estimators needs to be written.

I figured you wouldn't like that, but I was at a loss as to how to
approach something better. I'll take a shot at writing new estimators
based on the scalar ones.


> AFAICT, SQL99 specifies the syntax "typename ARRAY [ intconst ]" and
> nothing else.  I do not see a reason to accept ARRAY without [],
> ARRAY with empty [], ARRAY with multiple [], etc if the spec doesn't
> make us do so.  The existing Postgres syntax without the word ARRAY
> gets the job done just fine.

I had in my notes this (I think from SQL200x):

6.1 <data type>
<collection type> ::= <array type> | <multiset type>
<array type> ::=
   <data type> ARRAY [ <left bracket or trigraph>
                       <unsigned integer>
                       <right bracket or trigraph> ]

So I thought
   typename ARRAY [ intconst ]
   typename ARRAY
were both per spec. As far as "typename ARRAY [ ]", ARRAY with multiple
[], etc goes, I only included them for consistency, but I don't care if
we leave them out either.

As always, a thorough review and a lot to think about! Thanks for the
help. It will be at least a couple days before I can pick this up again
(my development system lost a cpu fan yesterday, so I'm taking this as
an opportunity to upgrade ;-)), but I'll get on it as soon as I can.

Joe


pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: array support patch phase 1 patch
Next
From: Tom Lane
Date:
Subject: Re: array support patch phase 1 patch