Thread: Is _ a supported way to create a column of array type?
Hi,
As documented to https://www.postgresql.org/docs/11/arrays.html#ARRAYS-DECLARATION one can create column of an array type using `<typename>[]` form.
Internally, array types get a name in the form of `_<typename>`.
This is documented https://www.postgresql.org/docs/11/sql-createtype.html#id-1.9.3.94.5.9
So -- the question:
Can a user use `_<typename>` to define a column of array type?
Is it supported?
The reason I am asking is that e.g. int4[] and _int4 behave differently.
Although they look the same, the have different pg_attribute.attndims.
I am testing on Postgres 11.2.
================================================
create table t(a int4[], b _int4);
\d t
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
a | integer[] | | |
b | integer[] | | |
SELECT attname, attndims FROM pg_attribute att JOIN pg_class tbl ON tbl.oid = att.attrelid WHERE tbl.relname = 't';
attname | attndims
----------+----------
...
a | 1
b | 0
================================================
This has also been discussed previously in 2006 in the https://www.postgresql.org/message-id/8C5B026B51B6854CBE88121DBF097A8651DB95%40ehost010-33.exch010.intermedia.net user group thread. However, it was a while ago (so something might have changed since then) and the conclusion from that discussion wasn't fully clear to me.
Best regards,
Piotr
Piotr Findeisen <piotr.findeisen@starburstdata.com> writes: > Internally, array types get a name in the form of `_<typename>`. Typically, yes. > *Can a user use `_<typename>` to define a column of array type?* Sure ... didn't you try it? > *Is it supported?* Not really, because it's not guaranteed that the name looks like that. There are various corner cases where something else would be generated, either to avoid a collision, or because truncation is needed. However, if you've taken the trouble to check what name actually got assigned to the array type, it's perfectly valid to use that name. > The reason I am asking is that e.g. int4[] and _int4 behave differently. > Although they look the same, the have different pg_attribute.attndims. Yeah. Nothing really cares about attndims though ... it's vestigial. Perhaps we should remove it someday. regards, tom lane
Hi Tom,
Thanks for your response.
I think I need to provide some context for my question.
I am maintaining Presto connector to Postgres (https://github.com/prestosql/presto/tree/master/presto-postgresql).
When accessing a table in Postgres, we need to map columns' types to appropriate types in Presto.
For mapping arrays, we need to know number of array dimensions.
Currently we read this from pg_attribute.attndims and this does not work for _<type> columns.
1. is there a better way to get array dimensions for a column of array type?
2. is it possible to make pg_attribute.attndims have correct value when column is defined using _<type> form?
Best,
Piotr
On Thu, Apr 25, 2019 at 11:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Piotr Findeisen <piotr.findeisen@starburstdata.com> writes:
> Internally, array types get a name in the form of `_<typename>`.
Typically, yes.
> *Can a user use `_<typename>` to define a column of array type?*
Sure ... didn't you try it?
> *Is it supported?*
Not really, because it's not guaranteed that the name looks like that.
There are various corner cases where something else would be generated,
either to avoid a collision, or because truncation is needed.
However, if you've taken the trouble to check what name actually got
assigned to the array type, it's perfectly valid to use that name.
> The reason I am asking is that e.g. int4[] and _int4 behave differently.
> Although they look the same, the have different pg_attribute.attndims.
Yeah. Nothing really cares about attndims though ... it's vestigial.
Perhaps we should remove it someday.
regards, tom lane
Piotr Findeisen <piotr.findeisen@starburstdata.com> writes: > I think I need to provide some context for my question. > ... > When accessing a table in Postgres, we need to map columns' types to > appropriate types in Presto. > For mapping arrays, we need to know number of array dimensions. > Currently we read this from pg_attribute.attndims and this does not work > for _<type> columns. Well, you've got a conceptual problem there, which is exactly the assumption that attndims is meaningful :-(. In the first place, the Postgres type system doesn't distinguish arrays of different numbers of dimensions, ie, int4[][] is not really different from int4[]. So you can't attach any very strong meaning to attndims = 2 vs attndims = 1. In the second place, we don't bother to fill attndims when the user doesn't write any brackets, which is what would happen with a type spec of "_int4" rather than "int4[]". So I guess you could say that what attndims records is the number of brackets that were written in the table creation command, but that unfortunately has got no real semantic significance. The right way (TM) to decide if a column is of array type is to look at the pg_type entry its atttypid points at and see if that is an array type. Depending on what you want to do, any of these tests on the pg_type entry might be reasonable: 1. has nonzero typelem. (This basically means that the column can be subscripted, so it includes fixed-length "array" types such as "point", which you might not want to accept.) 2. has nonzero typelem and typlen = -1. (This restricts it to varlena arrays, which are the generic kind of array.) 3. has typcategory "A". (This should be effectively the same as method 2, I think, though the backend code doesn't rely on typcategory for such decisions. Conceivably you'd do this if you wanted to let users mark weird types as being arrays.) > 2. is it possible to make pg_attribute.attndims have correct value when > column is defined using _<type> form? Even if we wanted to put work into a column that's so vestigial that taking it out is a reasonable proposal, we would certainly never back-patch such a change; nor would existing catalog entries change even if we did. So you pretty much have to deal with the facts on the ground, which are that attndims is largely useless. regards, tom lane
Hi Tom,
I think I understand now.
Postgres type system does not distinguish between array types with different
number of dimensions. int[], int[][] and int[][][][][][] are all equivalent to the type system.
Number of dimensions is part of the value though and execution takes care of it.
If I subscript an array with wrong "number of brackets" (as in
`select (array[1,2,3])[1][1][1]`) I get NULL.
Presto type system however distinguishes array(integer), array(array(integer))...
(using Presto therms). And execution is (expectedly) not as flexible.
We can inspect number of brackets that were written in the table creation
command but that's inferring (and enforcing) strong typing for something that is
not strongly typed. Thus, we can fail at execution.
Do you have any plans to support arrays with different number of dimensions
in the type system?
Best,
Piotr
On 4/26/19 6:50 AM, Piotr Findeisen wrote: > Presto type system however distinguishes array(integer), > array(array(integer))... > (using Presto therms). And execution is (expectedly) not as flexible. Fine point here -- Postgres does not actually have arrays-of-arrays, it has multi-dimension arrays. So in other words, you cannot have an array of element type _int4, but rather you literally have an array of element type int4 with two dimension, etc.. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Attachment
Piotr Findeisen <piotr.findeisen@starburstdata.com> writes: > Do you have any plans to support arrays with different number of dimensions > in the type system? Mmm ... don't hold your breath. People have speculated about that, but I haven't seen any actual proposals, and it's hard to see how we could do it without creating compatibility problems that would outweigh the value of the feature. In very late-model Postgres (I think just 11 and up) you can sort of fake it by using arrays of domains: regression=# create domain intarray as int4[]; CREATE DOMAIN regression=# create table foo (f1 intarray[]); CREATE TABLE regression=# insert into foo values(array[array[4]]); ERROR: column "f1" is of type intarray[] but expression is of type integer[] LINE 1: insert into foo values(array[array[4]]); ^ HINT: You will need to rewrite or cast the expression. regression=# insert into foo values(array[array[4]::intarray]); INSERT 0 1 But as this example shows, it's not exactly a transparent solution. It might be possible to make this specific case work better, but I think you'd inevitably end up needing lots of explicit casts. regards, tom lane