Thread: Array type confusion

Array type confusion

From: Peter Eisentraut

The question is how to determine when a type is an array type (and not
using the leading-underscore convention). A comment in pg_type.h says:

/*
 * typelem is 0 if this is not an array type.  If this is an array
 * type, typelem is the OID of the type of the elements of the array
 * (it identifies another row in Table pg_type).
 */

The reverse seems to be false. If typelem is not 0, then the type is not
necessarily an array type. For example, the typelem entries of text,
bpchar, and name point to char (the single-byte variant), while box and
lseg claim to be arrays of "point".

How should this be handled in the context of formatting the types for
reconsumption?


Appendix: The complete list of not-really-array types that have typelem
set is:

bytea      => char
name       => char
int2vector => int2
text       => char
oidvector  => oid
point      => float8
lseg       => point
path       => point
box        => point
filename   => char
line       => point
unknown    => char
bpchar     => char
varchar    => char


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: Array type confusion

From: Tom Lane

Peter Eisentraut <peter_e@gmx.net> writes:
> The question is how to determine when a type is an array type (and not
> using the leading-underscore convention). A comment in pg_type.h says:
>  * typelem is 0 if this is not an array type.  If this is an array
>  * type, typelem is the OID of the type of the elements of the array
>  * (it identifies another row in Table pg_type).
> The reverse seems to be false. If typelem is not 0, then the type is not
> necessarily an array type. For example, the typelem entries of text,
> bpchar, and name point to char (the single-byte variant), while box and
> lseg claim to be arrays of "point".

I don't think that the typelem values presently given in pg_type for
these datatypes are necessarily sacrosanct.  In fact, some of these
demonstrably don't work.

AFAICT, the array-subscripting code supports two cases: genuine arrays
(variable-size, variable number of dimensions, array header data) and
fixed-length pseudo-array types like oidvector.  So, for example,
the fact that oidvector is declared with typelem = oid makes it possible
to write things like "select proargtypes[1] from pg_proc", even though
oidvector is a basic type and not a genuine array.
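To make the fixed-length case concrete, here is a minimal C sketch of subscripting a pseudo-array such as oidvector. There is no array header, so the value is just `arraylen` bytes of `elmlen`-byte elements and element i sits at byte offset i * elmlen. The helper name `fixed_array_ref` is hypothetical, `uint32_t` stands in for Oid, and treating subscripts as 1-D and zero-based is an assumption of this sketch, not a description of array_ref's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* uint32_t stands in for Oid in the usage below */
#include <string.h>

/*
 * Hypothetical helper: fetch element `indx` of a fixed-length
 * pseudo-array occupying `arraylen` bytes with `elmlen`-byte elements.
 * The element count is implied by arraylen / elmlen; nothing else
 * describes the layout.  Subscripts are assumed 1-D and zero-based.
 * Returns 1 and copies the element into *out on success, 0 if the
 * subscript is out of range.
 */
static int fixed_array_ref(const void *array, int arraylen, int elmlen,
                           int indx, void *out)
{
    assert(array != NULL && out != NULL);

    /* Reject negative subscripts and any element that would run past the end. */
    if (elmlen <= 0 || indx < 0 ||
        ((long long) indx + 1) * elmlen > arraylen)
        return 0;

    memcpy(out, (const char *) array + (size_t) indx * (size_t) elmlen,
           (size_t) elmlen);
    return 1;
}
```

For example, with `uint32_t v[4] = {23, 25, 0, 0};`, calling `fixed_array_ref(v, (int) sizeof v, (int) sizeof v[0], 1, &out)` copies 25 into `out`, while subscripts 4 and -1 are rejected. Everything the code needs comes from typlen and the element type's length, which is why this path only makes sense for typlen > 0.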

array_ref tells the two apart by typlen: typlen = -1 means a real
array, and typlen > 0 means one of the pseudo-array types.  It does
not work to subscript a varlena type that's not really an array.
For example, you get bogus results if you try to subscript a text value.
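A minimal sketch of the dispatch rule just described shows why text goes wrong. Once typelem is set, only typlen is consulted, so a varlena scalar such as text (typlen = -1, typelem = char) is sent down the same path as a genuine array. The enum and function names are hypothetical; this illustrates the rule, not the actual array_ref source.

```c
#include <assert.h>

/* Hypothetical labels for the subscripting paths described above. */
typedef enum {
    ROUTE_NOT_SUBSCRIPTABLE,   /* typelem = 0, or a typlen not covered here */
    ROUTE_REAL_ARRAY,          /* typlen = -1: data must start with an array header */
    ROUTE_FIXED_PSEUDO_ARRAY   /* typlen > 0: e.g. oidvector */
} SubscriptRoute;

/*
 * Decide how a subscript would be handled using only typelem and
 * typlen.  Nothing here can tell a real array from any other varlena
 * type that merely has typelem set.
 */
static SubscriptRoute route_subscript(unsigned int typelem, int typlen)
{
    assert(typlen != 0);   /* no valid type has typlen 0 */

    if (typelem == 0)
        return ROUTE_NOT_SUBSCRIPTABLE;
    if (typlen == -1)
        return ROUTE_REAL_ARRAY;
    if (typlen > 0)
        return ROUTE_FIXED_PSEUDO_ARRAY;
    return ROUTE_NOT_SUBSCRIPTABLE;
}
```

With 18 and 23 being the OIDs of char and int4, `route_subscript(18, -1)` (text) and `route_subscript(23, -1)` (_int4) both return `ROUTE_REAL_ARRAY`. A text value has no array header, so reading it as if it had one yields the bogus results mentioned above.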

I believe we need to remove the typelem specifications from these
varlena datatypes:
   17 | bytea
   25 | text
  602 | path
  705 | unknown
 1042 | bpchar
 1043 | varchar

since subscripting them doesn't work and can't work without additional
information provided to array_ref.

If we do that then your type formatter can distinguish "real" array
types as being those with typelem != 0 and typlen < 0.  If typlen > 0
then treat it as a basic type regardless of typelem.
        regards, tom lane