Thread: Supporting NULL elements in arrays

Supporting NULL elements in arrays

From
Tom Lane
Date:
I'm starting to think about what it'll take to allow arrays to contain
elements that are NULL.  The behavioral semantics seem pretty
straightforward, but there are a couple of areas that need discussion.

One trouble spot is what the I/O representation should look like.
Since 8.0, the array input parser has rejected empty elements:

regression=# select '{a,,c}'::text[];
ERROR:  malformed array literal: "{a,,c}"
regression=# select '{a, ,c}'::text[];
ERROR:  malformed array literal: "{a, ,c}"
-- the right way to specify a zero-length string is:
regression=# select '{a,"",c}'::text[];  text
----------{a,"",c}
(1 row)

and so the most straightforward thing to do is define an empty element
as meaning a NULL.  But this might be objected to on a couple of grounds:

1. "Since 8.0" isn't really old enough --- there may well be applications
still out there that think '{a,,c}'::text[] should produce a zero-length
string element and not a NULL element.  (Note: this isn't a hazard for
reloading old dump files, because the array output routine has dumped
empty strings as "" since 7.0 if not before.)

2. Even today, the array documentation athttp://developer.postgresql.org/docs/postgres/arrays.html
doesn't actually *say* that empty elements are disallowed.

I don't see any alternatives that seem better, though, and the
empty-element convention at least has the virtue of being reasonably
compatible with what we did for I/O of composite data types.  Anyone
have a better idea?

Another issue is what to do with the internal representation.  I think
that basically we want to insert a null bitmap just following the
dimension info, with the option to omit the bitmap if there are no
null values.  Now, when the bitmap is present, it would be fairly tedious
to calculate the length of the bitmap to determine the offset to the
actual data; and that's an operation that we'd like to be cheap.  What
I'm thinking of doing is commandeering the whole "flags" field of
ArrayType (which is currently unused and should be always zero), and
redefining it thus:zero: bitmap not present (fully backwards-compatible case)not zero: bitmap is present; value is
offsetto start of data
 
I wouldn't propose doing this if I thought we had any pressing reason
to save some array flag bits for some other purpose; but I can't think
of anything else we might want 'em for.  Did anyone have any pet ideas
this would foreclose?

Also, with respect to the binary I/O representation, I'd suggest replacing
the "flag" word with "zero if no bitmap, 1 if bitmap present".  This would
require both sender and receiver to calculate the bitmap length from the
given dimensions, but it seems best to require that for error-checking
reasons.

Thoughts anyone?
        regards, tom lane


Re: Supporting NULL elements in arrays

From
"Pavel Stehule"
Date:
Hello,

it's great news. My personal opinion about formating NULL values  '{a,"",b}  -- wrong, "" means empty string, isn't
null '{a,,b} '   -- ok, maybe not unique,  '{a, NULL, b}' -- longer, clean NULL is NULL  '{a, "NULL", b}' -- "NULL" is
notnull varchar 'NULL'
 

Flags for array? Maybe bit isHash? So, hash array can be in line

array->null array->hash array

or flag for nor regular array (sparse array), not type unique arrays (array 
can contains different types)

Regards
Pavel Stehule

_________________________________________________________________
Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com. 
http://www.msn.cz/



Re: Supporting NULL elements in arrays

From
Tom Lane
Date:
"Pavel Stehule" <pavel.stehule@hotmail.com> writes:
> it's great news. My personal opinion about formating NULL values
>    '{a, NULL, b}' -- longer, clean NULL is NULL

Unfortunately, that already has a meaning, and it's not that.
        regards, tom lane


Re: Supporting NULL elements in arrays

From
Andrew Dunstan
Date:

Tom Lane wrote:

>"Pavel Stehule" <pavel.stehule@hotmail.com> writes:
>  
>
>>it's great news. My personal opinion about formating NULL values
>>   '{a, NULL, b}' -- longer, clean NULL is NULL
>>    
>>
>
>Unfortunately, that already has a meaning, and it's not that.
>
>
>  
>

What a pity. I don't see any alternative to the empty element proposal - 
it's worth the possible breakage.

cheers

andrew


Re: Supporting NULL elements in arrays

From
"Pollard, Mike"
Date:
> and so the most straightforward thing to do is define an empty element
> as meaning a NULL.  But this might be objected to on a couple of
grounds:

Can you use a default to allow the user to specify the default value for
an element?  May look a little strange, though, if the user specifies a
default array and a default element value, like:

CREATE TABLE ARR(ARR CHAR(30)[] DEFAULT '{"hello", "good bye"}' NULL);

So the first default is the array default; specify NULL if you don't
want one but do want an array element default; the second, if present,
is the array element default.  I'm not sure I like this or not, but it's
an idea.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc




Re: Supporting NULL elements in arrays

From
Harald Fuchs
Date:
In article <4370B990.7090100@dunslane.net>,
Andrew Dunstan <andrew@dunslane.net> writes:

> Tom Lane wrote:

>> "Pavel Stehule" <pavel.stehule@hotmail.com> writes:
>> 
>>> it's great news. My personal opinion about formating NULL values
>>> '{a, NULL, b}' -- longer, clean NULL is NULL
>>> 
>> 
>> Unfortunately, that already has a meaning, and it's not that.
>> 
>> 
>> 

> What a pity. I don't see any alternative to the empty element proposal
> -
> it's worth the possible breakage.

How about '{a, \N, b}'?



Re: Supporting NULL elements in arrays

From
Tom Lane
Date:
"Pollard, Mike" <mpollard@cincom.com> writes:
> Can you use a default to allow the user to specify the default value for
> an element?

There's no mechanism for specifying a default value for individual
elements within an array; and I've never heard one asked for.  Certainly
it's far less interesting than allowing nulls within an array.  So
I don't think we need to give pride of syntax place to that.
        regards, tom lane


Re: Supporting NULL elements in arrays

From
"Pollard, Mike"
Date:
> and so the most straightforward thing to do is define an empty element
> as meaning a NULL.  But this might be objected to on a couple of
grounds:

Can you use a default to allow the user to specify the default value for
an element?  May look a little strange, though, if the user specifies a
default array and a default element value, like:

CREATE TABLE ARR(ARR CHAR(30)[] DEFAULT '{"hello", "good bye"}' NULL);

So the first default is the array default; specify NULL if you don't
want one but do want an array element default; the second, if present,
is the array element default.  I'm not sure I like this or not, but it's
an idea.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc




Re: Supporting NULL elements in arrays

From
Tom Lane
Date:
Harald Fuchs <hf0923x@protecting.net> writes:
>> Tom Lane wrote:
>>> Unfortunately, that already has a meaning, and it's not that.

> How about '{a, \N, b}'?

That's valid syntax too, ie, adopting that syntax would break
applications that are not broken today.

Not to mention that it would be gratuitously different from the notation
for NULLs in composite-type literals.
        regards, tom lane


Re: Supporting NULL elements in arrays

From
Tom Lane
Date:
I wrote:
> ... the most straightforward thing to do is define an empty element
> as meaning a NULL.  But this might be objected to on a couple of grounds:

I just thought of another, potentially fatal objection: it's ambiguous
whether '{}'::text[] should be taken to mean an empty (zero-length)
array or an array containing a single NULL element.

For backwards compatibility it should mean an empty array, but then
there's no way to represent ARRAY(NULL) in data dumps, which won't
do either.

The only workaround that comes to mind is to allow explicit
specification of what's meant:  '[1:1]{}' would be needed to represent
the one-null case.  Ugly.

Ideas anyone?
        regards, tom lane


Re: Supporting NULL elements in arrays

From
mark@mark.mielke.cc
Date:
On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote:
> I wrote:
> > ... the most straightforward thing to do is define an empty element
> > as meaning a NULL.  But this might be objected to on a couple of grounds:
> I just thought of another, potentially fatal objection: it's ambiguous
> whether '{}'::text[] should be taken to mean an empty (zero-length)
> array or an array containing a single NULL element.

To take another perspective on this, though - it isn't possible to have
NULL array elements right now, therefore, there is no precident, and who
is to say that {NULL} <> {}?

For example:
   vhosts=> select ('{1, 3}'::text[])[4];    text   ------
   (1 row)
   vhosts=> select ('{}'::text[])[4];    text   ------
   (1 row)

Perhaps NULL at end of array never needs to be stored, and arrays can
be considered to have an infinite number of NULL values at the end?

For array operations that require a length, such as cross-product, or
whatever, the 'length' of the array, would be the number of elements
before the infinite number of NULL values at the end.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: Supporting NULL elements in arrays

From
"Jim C. Nasby"
Date:
On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote:
> I wrote:
> > ... the most straightforward thing to do is define an empty element
> > as meaning a NULL.  But this might be objected to on a couple of grounds:
> 
> I just thought of another, potentially fatal objection: it's ambiguous
> whether '{}'::text[] should be taken to mean an empty (zero-length)
> array or an array containing a single NULL element.
> 
> For backwards compatibility it should mean an empty array, but then
> there's no way to represent ARRAY(NULL) in data dumps, which won't
> do either.
> 
> The only workaround that comes to mind is to allow explicit
> specification of what's meant:  '[1:1]{}' would be needed to represent
> the one-null case.  Ugly.

Instead of bending over backwards to try and support older cases, would
a compatability mode be possible? Seems that would solve a lot of
problems.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Supporting NULL elements in arrays

From
Joe Conway
Date:
Jim C. Nasby wrote:
> On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote:
>>>... the most straightforward thing to do is define an empty element
>>>as meaning a NULL.  But this might be objected to on a couple of grounds:
>>
>>I just thought of another, potentially fatal objection: it's ambiguous
>>whether '{}'::text[] should be taken to mean an empty (zero-length)
>>array or an array containing a single NULL element.
>>
>>For backwards compatibility it should mean an empty array, but then
>>there's no way to represent ARRAY(NULL) in data dumps, which won't
>>do either.
>>
>>The only workaround that comes to mind is to allow explicit
>>specification of what's meant:  '[1:1]{}' would be needed to represent
>>the one-null case.  Ugly.
> 
> Instead of bending over backwards to try and support older cases, would
> a compatability mode be possible? Seems that would solve a lot of
> problems.

Last time I thought about this problem, that's what I concluded. I don't 
think there is a reasonable and backward compatible solution.

I also think the best non-compatible solution is to require non-numeric 
elements to be delimited (double quotes, configurable?), and use NULL 
unadorned to represent NULL.

Joe


Re: Supporting NULL elements in arrays

From
Christopher Kings-Lynne
Date:
> I also think the best non-compatible solution is to require non-numeric 
> elements to be delimited (double quotes, configurable?), and use NULL 
> unadorned to represent NULL.

I think the ultimate solution should have null values represented by 
NULL...  I mean NULL is NULL :)

Chris



Re: Supporting NULL elements in arrays

From
Marc Munro
Date:
How about representing null array elements with a special-case cast-like
null?  Something like {::null}

__
Marc

On Tue, 2005-11-08 at 23:30 -0400, pgsql-hackers-owner@postgresql.org
wrote:
> Date: Tue, 08 Nov 2005 19:21:34 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: pgsql-hackers@postgreSQL.org
> Subject: Re: Supporting NULL elements in arrays
> Message-ID: <10097.1131495694@sss.pgh.pa.us>
>
> I wrote:
> > ... the most straightforward thing to do is define an empty element
> > as meaning a NULL.  But this might be objected to on a couple of
> grounds:
>
> I just thought of another, potentially fatal objection: it's ambiguous
> whether '{}'::text[] should be taken to mean an empty (zero-length)
> array or an array containing a single NULL element.
>
> For backwards compatibility it should mean an empty array, but then
> there's no way to represent ARRAY(NULL) in data dumps, which won't
> do either.
>
> The only workaround that comes to mind is to allow explicit
> specification of what's meant:  '[1:1]{}' would be needed to represent
> the one-null case.  Ugly.
>
> Ideas anyone?
>
>                         regards, tom lane
>

Re: Supporting NULL elements in arrays

From
Tom Lane
Date:
Marc Munro <marc@bloodnok.com> writes:
> How about representing null array elements with a special-case cast-like
> null?  Something like {::null}

The problem with *any* proposal for let's-spell-NULL-like-this
is that "this" is inevitably a string that is valid as a text
data value.  Making the string weirder may reduce the odds of
a failure but the risk is still there, and meanwhile you're making
the null marker less readable.  (The above has the additional problem
of being indistinguishable from a sub-array.)

I think we really need something that is rejected as syntactically
invalid by the existing array input parser.
        regards, tom lane


Re: Supporting NULL elements in arrays

From
mark@mark.mielke.cc
Date:
On Wed, Nov 09, 2005 at 11:22:37AM -0500, Tom Lane wrote:
> Marc Munro <marc@bloodnok.com> writes:
> > How about representing null array elements with a special-case cast-like
> > null?  Something like {::null}
> 
> The problem with *any* proposal for let's-spell-NULL-like-this
> is that "this" is inevitably a string that is valid as a text
> data value.  Making the string weirder may reduce the odds of
> a failure but the risk is still there, and meanwhile you're making
> the null marker less readable.  (The above has the additional problem
> of being indistinguishable from a sub-array.)
> 
> I think we really need something that is rejected as syntactically
> invalid by the existing array input parser.

Hehehe... Feeling silly here:
   {1,2,}NULL{,4,5,}NULL

What are the feelings on representing it similar to how it would be
in memory?
   {1,2,,4,5,}(,,NULL,,,NULL)

Then, there is always the possibility of including a new array text format
for arrays that contain nulls (Q for quote?):
   Q{1,2,NULL,4,5,NULL}

This would activate a mode that would fully quote any non-numeric,
non-array arguments, allowing NULL to be unambiguous?
   Q{'a','b',NULL,'c','d',NULL}

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: Supporting NULL elements in arrays

From
Ron Mayer
Date:
Joe Conway wrote:
> 
> Last time I thought about this problem, that's what I concluded. I don't 
> think there is a reasonable and backward compatible solution.
> 
> I also think the best non-compatible solution is to require non-numeric 
> elements to be delimited (double quotes, configurable?), and use NULL 
> unadorned to represent NULL.


If we're going non-computable, would something that's a superset
of the SQL Standard's array value constructor be useful; or is
the standard's arrays so limited as to not even address the
things that cause issues for postgresql arrays?

If I read the confusing thing right, I think the standard
does use NULL for nulls in arrays, single-quotes for strings,
etc. like "ARRAY['FOO',null,'BAR']" and unadorned numbers
for numbers in arrays.   That's similar to what I think Joe
suggested, but with single rather than double quotes?


Re: Supporting NULL elements in arrays

From
Tom Lane
Date:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> If we're going non-computable, would something that's a superset
> of the SQL Standard's array value constructor be useful;

I think you've confused ARRAY[] expressions with the I/O representation
for array values.  There are many good reasons why those are not
identical, starting with the fact that string literals inside an ARRAY[]
would be subject to different escaping conventions than an I/O value is.
        regards, tom lane


Re: Supporting NULL elements in arrays

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Jim C. Nasby wrote:
>> Instead of bending over backwards to try and support older cases, would
>> a compatability mode be possible? Seems that would solve a lot of
>> problems.

> Last time I thought about this problem, that's what I concluded. I don't 
> think there is a reasonable and backward compatible solution.

> I also think the best non-compatible solution is to require non-numeric 
> elements to be delimited (double quotes, configurable?), and use NULL 
> unadorned to represent NULL.

After further thought I'm starting to agree with this point of view as
well.  I propose the following details:

1. A null element is represented as the unquoted string NULL  (case-insensitive on input).  Any use of quotes or
backslashes turns it into a simple string value "NULL" instead.  array_out  will need to be careful to quote any string
thatmatches NULL.
 

2. For backwards compatibility, we'll invent a GUC parameter  enable_array_nulls that defeats recognition of NULL in
array_in. (Any better ideas about the name of the parameter?)
 

This isn't ideal because of the compatibility risk, but once we get past
the transition period it's a reasonable definition.
        regards, tom lane


Re: Supporting NULL elements in arrays

From
Tom Lane
Date:
I wrote:
> ... Now, when the bitmap is present, it would be fairly tedious
> to calculate the length of the bitmap to determine the offset to the
> actual data; and that's an operation that we'd like to be cheap.  What
> I'm thinking of doing is commandeering the whole "flags" field of
> ArrayType (which is currently unused and should be always zero), and
> redefining it thus:
>     zero: bitmap not present (fully backwards-compatible case)
>     not zero: bitmap is present; value is offset to start of data
> I wouldn't propose doing this if I thought we had any pressing reason
> to save some array flag bits for some other purpose; but I can't think
> of anything else we might want 'em for.  Did anyone have any pet ideas
> this would foreclose?

On trying to recompile things, I find that contrib/intarray is broken
by this change, because it's using the flags field for its own purposes:

/** flags for gist__int_ops, use ArrayType->flags* which is unused (see array.h)*/
#define LEAFKEY        (1<<31)
#define ISLEAFKEY(x)    ( ((ArrayType*)(x))->flags & LEAFKEY )

It seems likely that intarray is going to need some rather significant
work anyway to deal with null elements, so this seems to me to be not
necessarily a fatal objection.  But why exactly does intarray need to
play games with the contents of an array value?
        regards, tom lane


Re: Supporting NULL elements in arrays

From
Teodor Sigaev
Date:
> On trying to recompile things, I find that contrib/intarray is broken
> by this change, because it's using the flags field for its own purposes:
> 
> /*
>  * flags for gist__int_ops, use ArrayType->flags
>  * which is unused (see array.h)
>  */
> #define LEAFKEY        (1<<31)
> #define ISLEAFKEY(x)    ( ((ArrayType*)(x))->flags & LEAFKEY )
> 
> It seems likely that intarray is going to need some rather significant
> work anyway to deal with null elements, so this seems to me to be not
> necessarily a fatal objection.  But why exactly does intarray need to
> play games with the contents of an array value?

Sorry, intarray was first our module for PgSQL. I'll remove usage of 
ArrayType->flags soon.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/