Supporting NULL elements in arrays - Mailing list pgsql-hackers

From Tom Lane
Subject Supporting NULL elements in arrays
Date
Msg-id 29807.1131408544@sss.pgh.pa.us
Whole thread Raw
Responses Re: Supporting NULL elements in arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Supporting NULL elements in arrays  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I'm starting to think about what it'll take to allow arrays to contain
elements that are NULL.  The behavioral semantics seem pretty
straightforward, but there are a couple of areas that need discussion.

One trouble spot is what the I/O representation should look like.
Since 8.0, the array input parser has rejected empty elements:

regression=# select '{a,,c}'::text[];
ERROR:  malformed array literal: "{a,,c}"
regression=# select '{a, ,c}'::text[];
ERROR:  malformed array literal: "{a, ,c}"
-- the right way to specify a zero-length string is:
regression=# select '{a,"",c}'::text[];  text
----------{a,"",c}
(1 row)

and so the most straightforward thing to do is define an empty element
as meaning a NULL.  But this might be objected to on a couple of grounds:

1. "Since 8.0" isn't really old enough --- there may well be applications
still out there that think '{a,,c}'::text[] should produce a zero-length
string element and not a NULL element.  (Note: this isn't a hazard for
reloading old dump files, because the array output routine has dumped
empty strings as "" since 7.0 if not before.)

2. Even today, the array documentation athttp://developer.postgresql.org/docs/postgres/arrays.html
doesn't actually *say* that empty elements are disallowed.

I don't see any alternatives that seem better, though, and the
empty-element convention at least has the virtue of being reasonably
compatible with what we did for I/O of composite data types.  Anyone
have a better idea?

Another issue is what to do with the internal representation.  I think
that basically we want to insert a null bitmap just following the
dimension info, with the option to omit the bitmap if there are no
null values.  Now, when the bitmap is present, it would be fairly tedious
to calculate the length of the bitmap to determine the offset to the
actual data; and that's an operation that we'd like to be cheap.  What
I'm thinking of doing is commandeering the whole "flags" field of
ArrayType (which is currently unused and should be always zero), and
redefining it thus:zero: bitmap not present (fully backwards-compatible case)not zero: bitmap is present; value is
offsetto start of data
 
I wouldn't propose doing this if I thought we had any pressing reason
to save some array flag bits for some other purpose; but I can't think
of anything else we might want 'em for.  Did anyone have any pet ideas
this would foreclose?

Also, with respect to the binary I/O representation, I'd suggest replacing
the "flag" word with "zero if no bitmap, 1 if bitmap present".  This would
require both sender and receiver to calculate the bitmap length from the
given dimensions, but it seems best to require that for error-checking
reasons.

Thoughts anyone?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: broken comment justification logic in new pgindent
Next
From: Aly Dharshi
Date:
Subject: Re: DTrace?