Thread: Is _ a supported way to create a column of array type?

Is _ a supported way to create a column of array type?

From
Piotr Findeisen
Date:
Hi,

As documented to https://www.postgresql.org/docs/11/arrays.html#ARRAYS-DECLARATION one can create column of an array type using `<typename>[]` form.

Internally, array types get a name in the form of `_<typename>`. 

So -- the question: 
Can a user use `_<typename>` to define a column of array type?
Is it supported?

The reason I am asking is that e.g. int4[] and _int4 behave differently.
Although they look the same, the have different pg_attribute.attndims.

I am testing on Postgres 11.2.

================================================
create table t(a int4[], b _int4);

\d t
                  Table "public.t"
 Column |   Type    | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
 a      | integer[] |           |          |
 b      | integer[] |           |          |


SELECT attname,  attndims FROM pg_attribute att JOIN pg_class tbl ON tbl.oid = att.attrelid WHERE tbl.relname = 't';
 attname  | attndims
----------+----------
...
 a        |        1
 b        |        0
================================================

This has also been discussed previously in 2006 in the https://www.postgresql.org/message-id/8C5B026B51B6854CBE88121DBF097A8651DB95%40ehost010-33.exch010.intermedia.net user group thread. However, it was a while ago (so something might have changed since then) and the conclusion from that discussion wasn't fully clear to me.


Best regards,
Piotr



Re: Is _ a supported way to create a column of array type?

From
Tom Lane
Date:
Piotr Findeisen <piotr.findeisen@starburstdata.com> writes:
> Internally, array types get a name in the form of `_<typename>`.

Typically, yes.

> *Can a user use `_<typename>` to define a column of array type?*

Sure ... didn't you try it?

> *Is it supported?*

Not really, because it's not guaranteed that the name looks like that.
There are various corner cases where something else would be generated,
either to avoid a collision, or because truncation is needed.

However, if you've taken the trouble to check what name actually got
assigned to the array type, it's perfectly valid to use that name.

> The reason I am asking is that e.g. int4[] and _int4 behave differently.
> Although they look the same, the have different pg_attribute.attndims.

Yeah.  Nothing really cares about attndims though ... it's vestigial.
Perhaps we should remove it someday.

            regards, tom lane



Re: Is _ a supported way to create a column of array type?

From
Piotr Findeisen
Date:
Hi Tom,

Thanks for your response.

I think I need to provide some context for my question.
I am maintaining Presto connector to Postgres (https://github.com/prestosql/presto/tree/master/presto-postgresql).

When accessing a table in Postgres, we need to map columns' types to appropriate types in Presto.
For mapping arrays, we need to know number of array dimensions.
Currently we read this from pg_attribute.attndims and this does not work for _<type> columns.

1. is there a better way to get array dimensions for a column of array type?
2. is it possible to make pg_attribute.attndims have correct value when column is defined using _<type> form?


Best,
Piotr




On Thu, Apr 25, 2019 at 11:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Piotr Findeisen <piotr.findeisen@starburstdata.com> writes:
> Internally, array types get a name in the form of `_<typename>`.

Typically, yes.

> *Can a user use `_<typename>` to define a column of array type?*

Sure ... didn't you try it?

> *Is it supported?*

Not really, because it's not guaranteed that the name looks like that.
There are various corner cases where something else would be generated,
either to avoid a collision, or because truncation is needed.

However, if you've taken the trouble to check what name actually got
assigned to the array type, it's perfectly valid to use that name.

> The reason I am asking is that e.g. int4[] and _int4 behave differently.
> Although they look the same, the have different pg_attribute.attndims.

Yeah.  Nothing really cares about attndims though ... it's vestigial.
Perhaps we should remove it someday.

                        regards, tom lane

Re: Is _ a supported way to create a column of array type?

From
Tom Lane
Date:
Piotr Findeisen <piotr.findeisen@starburstdata.com> writes:
> I think I need to provide some context for my question.
> ...
> When accessing a table in Postgres, we need to map columns' types to
> appropriate types in Presto.
> For mapping arrays, we need to know number of array dimensions.
> Currently we read this from pg_attribute.attndims and this does not work
> for _<type> columns.

Well, you've got a conceptual problem there, which is exactly the
assumption that attndims is meaningful :-(.

In the first place, the Postgres type system doesn't distinguish
arrays of different numbers of dimensions, ie, int4[][] is not
really different from int4[].  So you can't attach any very strong
meaning to attndims = 2 vs attndims = 1.

In the second place, we don't bother to fill attndims when the
user doesn't write any brackets, which is what would happen with
a type spec of "_int4" rather than "int4[]".  So I guess you could
say that what attndims records is the number of brackets that were
written in the table creation command, but that unfortunately has
got no real semantic significance.

The right way (TM) to decide if a column is of array type is to
look at the pg_type entry its atttypid points at and see if that
is an array type.  Depending on what you want to do, any of these
tests on the pg_type entry might be reasonable:
1. has nonzero typelem.  (This basically means that the column can
   be subscripted, so it includes fixed-length "array" types such
   as "point", which you might not want to accept.)
2. has nonzero typelem and typlen = -1.  (This restricts it to
   varlena arrays, which are the generic kind of array.)
3. has typcategory "A".  (This should be effectively the same
   as method 2, I think, though the backend code doesn't rely
   on typcategory for such decisions.  Conceivably you'd do this
   if you wanted to let users mark weird types as being arrays.)

> 2. is it possible to make pg_attribute.attndims have correct value when
> column is defined using _<type> form?

Even if we wanted to put work into a column that's so vestigial that
taking it out is a reasonable proposal, we would certainly never
back-patch such a change; nor would existing catalog entries change
even if we did.  So you pretty much have to deal with the facts on
the ground, which are that attndims is largely useless.

            regards, tom lane



Re: Is _ a supported way to create a column of array type?

From
Piotr Findeisen
Date:
Hi Tom,

I think I understand now.

Postgres type system does not distinguish between array types with different
number of dimensions. int[], int[][] and int[][][][][][] are all equivalent to the type system.

Number of dimensions is part of the value though and execution takes care of it.
If I subscript an array with wrong "number of brackets" (as in
`select (array[1,2,3])[1][1][1]`) I get NULL. 

Presto type system however distinguishes array(integer), array(array(integer))...
(using Presto therms). And execution is (expectedly) not as flexible.

We can inspect number of brackets that were written in the table creation
command but that's inferring (and enforcing) strong typing for something that is
not strongly typed. Thus, we can fail at execution.

Do you have any plans to support arrays with different number of dimensions
in the type system?

Best,
Piotr

Re: Is _ a supported way to create a column of array type?

From
Joe Conway
Date:
On 4/26/19 6:50 AM, Piotr Findeisen wrote:
> Presto type system however distinguishes array(integer),
> array(array(integer))...
> (using Presto therms). And execution is (expectedly) not as flexible.

Fine point here -- Postgres does not actually have arrays-of-arrays, it
has multi-dimension arrays.

So in other words, you cannot have an array of element type _int4, but
rather you literally have an array of element type int4 with two
dimension, etc..

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: Is _ a supported way to create a column of array type?

From
Tom Lane
Date:
Piotr Findeisen <piotr.findeisen@starburstdata.com> writes:
> Do you have any plans to support arrays with different number of dimensions
> in the type system?

Mmm ... don't hold your breath.  People have speculated about that,
but I haven't seen any actual proposals, and it's hard to see how
we could do it without creating compatibility problems that would
outweigh the value of the feature.

In very late-model Postgres (I think just 11 and up) you can sort
of fake it by using arrays of domains:

regression=# create domain intarray as int4[];
CREATE DOMAIN
regression=# create table foo (f1 intarray[]);
CREATE TABLE
regression=# insert into foo values(array[array[4]]);
ERROR:  column "f1" is of type intarray[] but expression is of type integer[]
LINE 1: insert into foo values(array[array[4]]);
                               ^
HINT:  You will need to rewrite or cast the expression.
regression=# insert into foo values(array[array[4]::intarray]);
INSERT 0 1

But as this example shows, it's not exactly a transparent solution.
It might be possible to make this specific case work better, but
I think you'd inevitably end up needing lots of explicit casts.

            regards, tom lane