Thread: Supporting NULL elements in arrays
I'm starting to think about what it'll take to allow arrays to contain elements that are NULL. The behavioral semantics seem pretty straightforward, but there are a couple of areas that need discussion. One trouble spot is what the I/O representation should look like. Since 8.0, the array input parser has rejected empty elements: regression=# select '{a,,c}'::text[]; ERROR: malformed array literal: "{a,,c}" regression=# select '{a, ,c}'::text[]; ERROR: malformed array literal: "{a, ,c}" -- the right way to specify a zero-length string is: regression=# select '{a,"",c}'::text[]; text ----------{a,"",c} (1 row) and so the most straightforward thing to do is define an empty element as meaning a NULL. But this might be objected to on a couple of grounds: 1. "Since 8.0" isn't really old enough --- there may well be applications still out there that think '{a,,c}'::text[] should produce a zero-length string element and not a NULL element. (Note: this isn't a hazard for reloading old dump files, because the array output routine has dumped empty strings as "" since 7.0 if not before.) 2. Even today, the array documentation athttp://developer.postgresql.org/docs/postgres/arrays.html doesn't actually *say* that empty elements are disallowed. I don't see any alternatives that seem better, though, and the empty-element convention at least has the virtue of being reasonably compatible with what we did for I/O of composite data types. Anyone have a better idea? Another issue is what to do with the internal representation. I think that basically we want to insert a null bitmap just following the dimension info, with the option to omit the bitmap if there are no null values. Now, when the bitmap is present, it would be fairly tedious to calculate the length of the bitmap to determine the offset to the actual data; and that's an operation that we'd like to be cheap. What I'm thinking of doing is commandeering the whole "flags" field of ArrayType (which is currently unused and should be always zero), and redefining it thus:zero: bitmap not present (fully backwards-compatible case)not zero: bitmap is present; value is offsetto start of data I wouldn't propose doing this if I thought we had any pressing reason to save some array flag bits for some other purpose; but I can't think of anything else we might want 'em for. Did anyone have any pet ideas this would foreclose? Also, with respect to the binary I/O representation, I'd suggest replacing the "flag" word with "zero if no bitmap, 1 if bitmap present". This would require both sender and receiver to calculate the bitmap length from the given dimensions, but it seems best to require that for error-checking reasons. Thoughts anyone? regards, tom lane
Hello, it's great news. My personal opinion about formating NULL values '{a,"",b} -- wrong, "" means empty string, isn't null '{a,,b} ' -- ok, maybe not unique, '{a, NULL, b}' -- longer, clean NULL is NULL '{a, "NULL", b}' -- "NULL" is notnull varchar 'NULL' Flags for array? Maybe bit isHash? So, hash array can be in line array->null array->hash array or flag for nor regular array (sparse array), not type unique arrays (array can contains different types) Regards Pavel Stehule _________________________________________________________________ Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com. http://www.msn.cz/
"Pavel Stehule" <pavel.stehule@hotmail.com> writes: > it's great news. My personal opinion about formating NULL values > '{a, NULL, b}' -- longer, clean NULL is NULL Unfortunately, that already has a meaning, and it's not that. regards, tom lane
Tom Lane wrote: >"Pavel Stehule" <pavel.stehule@hotmail.com> writes: > > >>it's great news. My personal opinion about formating NULL values >> '{a, NULL, b}' -- longer, clean NULL is NULL >> >> > >Unfortunately, that already has a meaning, and it's not that. > > > > What a pity. I don't see any alternative to the empty element proposal - it's worth the possible breakage. cheers andrew
> and so the most straightforward thing to do is define an empty element > as meaning a NULL. But this might be objected to on a couple of grounds: Can you use a default to allow the user to specify the default value for an element? May look a little strange, though, if the user specifies a default array and a default element value, like: CREATE TABLE ARR(ARR CHAR(30)[] DEFAULT '{"hello", "good bye"}' NULL); So the first default is the array default; specify NULL if you don't want one but do want an array element default; the second, if present, is the array element default. I'm not sure I like this or not, but it's an idea. Mike Pollard SUPRA Server SQL Engineering and Support Cincom Systems, Inc
In article <4370B990.7090100@dunslane.net>, Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> "Pavel Stehule" <pavel.stehule@hotmail.com> writes: >> >>> it's great news. My personal opinion about formating NULL values >>> '{a, NULL, b}' -- longer, clean NULL is NULL >>> >> >> Unfortunately, that already has a meaning, and it's not that. >> >> >> > What a pity. I don't see any alternative to the empty element proposal > - > it's worth the possible breakage. How about '{a, \N, b}'?
"Pollard, Mike" <mpollard@cincom.com> writes: > Can you use a default to allow the user to specify the default value for > an element? There's no mechanism for specifying a default value for individual elements within an array; and I've never heard one asked for. Certainly it's far less interesting than allowing nulls within an array. So I don't think we need to give pride of syntax place to that. regards, tom lane
> and so the most straightforward thing to do is define an empty element > as meaning a NULL. But this might be objected to on a couple of grounds: Can you use a default to allow the user to specify the default value for an element? May look a little strange, though, if the user specifies a default array and a default element value, like: CREATE TABLE ARR(ARR CHAR(30)[] DEFAULT '{"hello", "good bye"}' NULL); So the first default is the array default; specify NULL if you don't want one but do want an array element default; the second, if present, is the array element default. I'm not sure I like this or not, but it's an idea. Mike Pollard SUPRA Server SQL Engineering and Support Cincom Systems, Inc
Harald Fuchs <hf0923x@protecting.net> writes: >> Tom Lane wrote: >>> Unfortunately, that already has a meaning, and it's not that. > How about '{a, \N, b}'? That's valid syntax too, ie, adopting that syntax would break applications that are not broken today. Not to mention that it would be gratuitously different from the notation for NULLs in composite-type literals. regards, tom lane
I wrote: > ... the most straightforward thing to do is define an empty element > as meaning a NULL. But this might be objected to on a couple of grounds: I just thought of another, potentially fatal objection: it's ambiguous whether '{}'::text[] should be taken to mean an empty (zero-length) array or an array containing a single NULL element. For backwards compatibility it should mean an empty array, but then there's no way to represent ARRAY(NULL) in data dumps, which won't do either. The only workaround that comes to mind is to allow explicit specification of what's meant: '[1:1]{}' would be needed to represent the one-null case. Ugly. Ideas anyone? regards, tom lane
On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote: > I wrote: > > ... the most straightforward thing to do is define an empty element > > as meaning a NULL. But this might be objected to on a couple of grounds: > I just thought of another, potentially fatal objection: it's ambiguous > whether '{}'::text[] should be taken to mean an empty (zero-length) > array or an array containing a single NULL element. To take another perspective on this, though - it isn't possible to have NULL array elements right now, therefore, there is no precident, and who is to say that {NULL} <> {}? For example: vhosts=> select ('{1, 3}'::text[])[4]; text ------ (1 row) vhosts=> select ('{}'::text[])[4]; text ------ (1 row) Perhaps NULL at end of array never needs to be stored, and arrays can be considered to have an infinite number of NULL values at the end? For array operations that require a length, such as cross-product, or whatever, the 'length' of the array, would be the number of elements before the infinite number of NULL values at the end. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote: > I wrote: > > ... the most straightforward thing to do is define an empty element > > as meaning a NULL. But this might be objected to on a couple of grounds: > > I just thought of another, potentially fatal objection: it's ambiguous > whether '{}'::text[] should be taken to mean an empty (zero-length) > array or an array containing a single NULL element. > > For backwards compatibility it should mean an empty array, but then > there's no way to represent ARRAY(NULL) in data dumps, which won't > do either. > > The only workaround that comes to mind is to allow explicit > specification of what's meant: '[1:1]{}' would be needed to represent > the one-null case. Ugly. Instead of bending over backwards to try and support older cases, would a compatability mode be possible? Seems that would solve a lot of problems. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Jim C. Nasby wrote: > On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote: >>>... the most straightforward thing to do is define an empty element >>>as meaning a NULL. But this might be objected to on a couple of grounds: >> >>I just thought of another, potentially fatal objection: it's ambiguous >>whether '{}'::text[] should be taken to mean an empty (zero-length) >>array or an array containing a single NULL element. >> >>For backwards compatibility it should mean an empty array, but then >>there's no way to represent ARRAY(NULL) in data dumps, which won't >>do either. >> >>The only workaround that comes to mind is to allow explicit >>specification of what's meant: '[1:1]{}' would be needed to represent >>the one-null case. Ugly. > > Instead of bending over backwards to try and support older cases, would > a compatability mode be possible? Seems that would solve a lot of > problems. Last time I thought about this problem, that's what I concluded. I don't think there is a reasonable and backward compatible solution. I also think the best non-compatible solution is to require non-numeric elements to be delimited (double quotes, configurable?), and use NULL unadorned to represent NULL. Joe
> I also think the best non-compatible solution is to require non-numeric > elements to be delimited (double quotes, configurable?), and use NULL > unadorned to represent NULL. I think the ultimate solution should have null values represented by NULL... I mean NULL is NULL :) Chris
How about representing null array elements with a special-case cast-like null? Something like {::null} __ Marc On Tue, 2005-11-08 at 23:30 -0400, pgsql-hackers-owner@postgresql.org wrote: > Date: Tue, 08 Nov 2005 19:21:34 -0500 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: pgsql-hackers@postgreSQL.org > Subject: Re: Supporting NULL elements in arrays > Message-ID: <10097.1131495694@sss.pgh.pa.us> > > I wrote: > > ... the most straightforward thing to do is define an empty element > > as meaning a NULL. But this might be objected to on a couple of > grounds: > > I just thought of another, potentially fatal objection: it's ambiguous > whether '{}'::text[] should be taken to mean an empty (zero-length) > array or an array containing a single NULL element. > > For backwards compatibility it should mean an empty array, but then > there's no way to represent ARRAY(NULL) in data dumps, which won't > do either. > > The only workaround that comes to mind is to allow explicit > specification of what's meant: '[1:1]{}' would be needed to represent > the one-null case. Ugly. > > Ideas anyone? > > regards, tom lane >
Marc Munro <marc@bloodnok.com> writes: > How about representing null array elements with a special-case cast-like > null? Something like {::null} The problem with *any* proposal for let's-spell-NULL-like-this is that "this" is inevitably a string that is valid as a text data value. Making the string weirder may reduce the odds of a failure but the risk is still there, and meanwhile you're making the null marker less readable. (The above has the additional problem of being indistinguishable from a sub-array.) I think we really need something that is rejected as syntactically invalid by the existing array input parser. regards, tom lane
On Wed, Nov 09, 2005 at 11:22:37AM -0500, Tom Lane wrote: > Marc Munro <marc@bloodnok.com> writes: > > How about representing null array elements with a special-case cast-like > > null? Something like {::null} > > The problem with *any* proposal for let's-spell-NULL-like-this > is that "this" is inevitably a string that is valid as a text > data value. Making the string weirder may reduce the odds of > a failure but the risk is still there, and meanwhile you're making > the null marker less readable. (The above has the additional problem > of being indistinguishable from a sub-array.) > > I think we really need something that is rejected as syntactically > invalid by the existing array input parser. Hehehe... Feeling silly here: {1,2,}NULL{,4,5,}NULL What are the feelings on representing it similar to how it would be in memory? {1,2,,4,5,}(,,NULL,,,NULL) Then, there is always the possibility of including a new array text format for arrays that contain nulls (Q for quote?): Q{1,2,NULL,4,5,NULL} This would activate a mode that would fully quote any non-numeric, non-array arguments, allowing NULL to be unambiguous? Q{'a','b',NULL,'c','d',NULL} Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
Joe Conway wrote: > > Last time I thought about this problem, that's what I concluded. I don't > think there is a reasonable and backward compatible solution. > > I also think the best non-compatible solution is to require non-numeric > elements to be delimited (double quotes, configurable?), and use NULL > unadorned to represent NULL. If we're going non-computable, would something that's a superset of the SQL Standard's array value constructor be useful; or is the standard's arrays so limited as to not even address the things that cause issues for postgresql arrays? If I read the confusing thing right, I think the standard does use NULL for nulls in arrays, single-quotes for strings, etc. like "ARRAY['FOO',null,'BAR']" and unadorned numbers for numbers in arrays. That's similar to what I think Joe suggested, but with single rather than double quotes?
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > If we're going non-computable, would something that's a superset > of the SQL Standard's array value constructor be useful; I think you've confused ARRAY[] expressions with the I/O representation for array values. There are many good reasons why those are not identical, starting with the fact that string literals inside an ARRAY[] would be subject to different escaping conventions than an I/O value is. regards, tom lane
Joe Conway <mail@joeconway.com> writes: > Jim C. Nasby wrote: >> Instead of bending over backwards to try and support older cases, would >> a compatability mode be possible? Seems that would solve a lot of >> problems. > Last time I thought about this problem, that's what I concluded. I don't > think there is a reasonable and backward compatible solution. > I also think the best non-compatible solution is to require non-numeric > elements to be delimited (double quotes, configurable?), and use NULL > unadorned to represent NULL. After further thought I'm starting to agree with this point of view as well. I propose the following details: 1. A null element is represented as the unquoted string NULL (case-insensitive on input). Any use of quotes or backslashes turns it into a simple string value "NULL" instead. array_out will need to be careful to quote any string thatmatches NULL. 2. For backwards compatibility, we'll invent a GUC parameter enable_array_nulls that defeats recognition of NULL in array_in. (Any better ideas about the name of the parameter?) This isn't ideal because of the compatibility risk, but once we get past the transition period it's a reasonable definition. regards, tom lane
I wrote: > ... Now, when the bitmap is present, it would be fairly tedious > to calculate the length of the bitmap to determine the offset to the > actual data; and that's an operation that we'd like to be cheap. What > I'm thinking of doing is commandeering the whole "flags" field of > ArrayType (which is currently unused and should be always zero), and > redefining it thus: > zero: bitmap not present (fully backwards-compatible case) > not zero: bitmap is present; value is offset to start of data > I wouldn't propose doing this if I thought we had any pressing reason > to save some array flag bits for some other purpose; but I can't think > of anything else we might want 'em for. Did anyone have any pet ideas > this would foreclose? On trying to recompile things, I find that contrib/intarray is broken by this change, because it's using the flags field for its own purposes: /** flags for gist__int_ops, use ArrayType->flags* which is unused (see array.h)*/ #define LEAFKEY (1<<31) #define ISLEAFKEY(x) ( ((ArrayType*)(x))->flags & LEAFKEY ) It seems likely that intarray is going to need some rather significant work anyway to deal with null elements, so this seems to me to be not necessarily a fatal objection. But why exactly does intarray need to play games with the contents of an array value? regards, tom lane
> On trying to recompile things, I find that contrib/intarray is broken > by this change, because it's using the flags field for its own purposes: > > /* > * flags for gist__int_ops, use ArrayType->flags > * which is unused (see array.h) > */ > #define LEAFKEY (1<<31) > #define ISLEAFKEY(x) ( ((ArrayType*)(x))->flags & LEAFKEY ) > > It seems likely that intarray is going to need some rather significant > work anyway to deal with null elements, so this seems to me to be not > necessarily a fatal objection. But why exactly does intarray need to > play games with the contents of an array value? Sorry, intarray was first our module for PgSQL. I'll remove usage of ArrayType->flags soon. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/