Thread: Inconsistent behavior on Array & Is Null?

Inconsistent behavior on Array & Is Null?

From
Josh Berkus
Date:
Joe,

First off, pardon me if these are known things which are already fixed in CVS.  
Also, let me again thank you for all the work on Arrays in 7.4; I've been 
able to tremendously simplify quite a number of procedures in my databases 
thanks to the new array support.

Now, the issue:
I'm noticing some inconsistent behavior regarding empty arrays and IS NULL 
status.    For example:

net_test=# select array_upper('{}'::INT[], 1);
 array_upper
-------------

(1 row)

net_test=# select array_upper('{}'::INT[], 1) IS NULL;
 ?column?
----------
 t
(1 row)

net_test=# select '{}'::INT[] IS NULL;
 ?column?
----------
 f
(1 row)

I feel that this is confusing; an empty array should be considered NULL 
everywhere or nowhere.     For that matter, the new array declaration syntax 
does not support empty arrays:

net_test=# select ARRAY[ ]::INT[];
ERROR:  syntax error at or near "]" at character 15
net_test=# select ARRAY[]::INT[];
ERROR:  syntax error at or near "]" at character 14
net_test=#

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco



Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Josh Berkus wrote:
> I'm noticing some inconsistent behavior regarding empty arrays and IS NULL 
> status.    For example:

> net_test=# select array_upper('{}'::INT[], 1) IS NULL;
>  ?column?
> ----------
>  t
> (1 row)

This is correct. There are no dimensions to an empty array by 
definition. The only other way to handle this would be an ERROR. I 
followed the lead of (the pre-existing function) array_dims() when 
creating array_upper() and array_lower().
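
For what it's worth, array_dims() treats an empty array the same way:

regression=# select array_dims('{}'::int[]);
 array_dims
------------

(1 row)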

> net_test=# select '{}'::INT[] IS NULL;
>  ?column?
> ----------
>  f
> (1 row)

This is also correct, and completely orthogonal to the first example. 
There is a difference between an empty array and NULL, just like there 
is between an empty string and NULL.
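
For instance, a quick side-by-side (just to illustrate; nothing array-specific
about it):

regression=# select ''::text is null, null::text is null, '{}'::int[] is null, null::int[] is null;
 ?column? | ?column? | ?column? | ?column?
----------+----------+----------+----------
 f        | t        | f        | t
(1 row)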

> I feel that this is confusing; an empty array should be considered NULL 
> everywhere or nowhere.

As I said above, that makes no more sense than saying '' == NULL

> For that matter, the new array declaration syntax does not support
> empty arrays:
> net_test=# select ARRAY[ ]::INT[];
> ERROR:  syntax error at or near "]" at character 15

This is a known issue, and will not be easily fixed. We discussed it at 
some length last June/July. See especially:

http://archives.postgresql.org/pgsql-hackers/2003-06/msg01174.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01195.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01196.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01298.php

Joe


Re: Inconsistent behavior on Array & Is Null?

From
Josh Berkus
Date:
Joe,

> This is correct. There are no dimensions to an empty array by 
> definition. The only other way to handle this would be an ERROR. I 
> followed the lead of (the pre-existing function) array_dims() when 
> creating array_upper() and array_lower().

What about a 0?    That seems more consistent to me.   If the array is empty, 
its dimensions are not "NULL", meaning "unknown", but in fact zero elements, 
which is a known value.  The way it works now, array_upper on a NULL array 
produces the same results as array_upper on an empty-but-non-null array.

Or is there some concept I'm missing?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco



Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Josh Berkus wrote:
> What about a 0?    That seems more consistent to me.   If the array
> is empty, its dimensions are not "NULL", meaning "unknown", but in
> fact zero elements, which is a known value.

They cannot be 0 because 0 is a real index. They are undefined, because 
an empty array has no dimensions. It is entirely possible to have a real 
array that starts at a lower bound of 0 (or even an upper bound of 0).

regression=# select f[0] from (select 99 || array[1,2,3] as f) as t;
 f
----
 99
(1 row)

regression=# create table a1 (f int[]);
CREATE TABLE
regression=# insert into a1 values('{}');
INSERT 18688045 1
regression=# update a1 set f[0] = 99;
UPDATE 1
regression=# select array_upper(f,1) from a1;
 array_upper
-------------
           0
(1 row)

> The way it works now, array_upper on a NULL array produces the same
> results as array_upper on an empty-but-non-null array.

Sure, and in both cases array_upper is undefined because there are no 
array dimensions to speak of. I guess you might argue that array_upper, 
array_lower, and array_dims should all produce an ERROR on null input 
instead of NULL. But that would have been an un-backward compatible 
change for array_dims at the time array_lower and array_upper were 
created. I don't really believe they should throw an ERROR on an empty 
array though.

Joe


Re: Inconsistent behavior on Array & Is Null?

From
Josh Berkus
Date:
Joe,

> Sure, and in both cases array_upper is undefined because there are no 
> array dimensions to speak of. I guess you might argue that array_upper, 
> array_lower, and array_dims should all produce an ERROR on null input 
> instead of NULL. But that would have been an un-backward compatible 
> change for array_dims at the time array_lower and array_upper were 
> created. I don't really believe they should throw an ERROR on an empty 
> array though.

OK, I understand the logic now.   Thanks.

I guess this is another case where we're haunted by the ANSI committee's 
failure to define both an "unknown" and a "not applicable" value instead of 
the unitary NULL.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco



Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:
Josh Berkus <josh@agliodbs.com> writes:

> Joe,
> 
> > This is correct. There are no dimensions to an empty array by 
> > definition. The only other way to handle this would be an ERROR. I 
> > followed the lead of (the pre-existing function) array_dims() when 
> > creating array_upper() and array_lower().
> 
> What about a 0?    That seems more consistent to me.   If the array is empty, 
> its dimensions are not "NULL", meaning "unknown", but in fact zero elements, 
> which is a known value.  The way it works now, array_upper on a NULL array 
> produces the same results as array_upper on an empty-but-non-null array.
> 
> Or is there some concept I'm missing?

I would certainly second that. Consider all that making it NULL breaks:

length(a) != array_upper(a)-array_lower(a)

array_upper(a||b) == array_upper(a)+length(b)

If someone implements pop and push, it sure makes things weird that push
doesn't always increment the length and pop never decrements the length all
the way down to 0.

Perhaps you're worried that you have to pick an arbitrary lower and upper bound
and, strangely, that the upper bound would actually be one less than the lower
bound, such as [1,0]. However this isn't really any different from the normal
case. All arrays in postgres have arbitrary lower bounds.

Fwiw, the int_aggregate function from int_aggregate.c in the contrib section
makes arrays that don't violate these invariants. For empty arrays the
dimensions are [0,-1].

This isn't hypothetical for me. Whenever I end up replacing int_aggregate with
a standard function that constructs your style arrays my app would break. I
guess I'll have to add a coalesce(...,0) to my array_length() function to work
around it. Which only seems like strong evidence it's the wrong behaviour.
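
Something along these lines, I suppose (untested, but you get the idea):

create or replace function array_length(anyarray) returns int as '
  -- the coalesce() is only needed because array_upper/array_lower are NULL for ''{}''
  select coalesce(array_upper($1, 1) - array_lower($1, 1) + 1, 0)
' language sql;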

-- 
greg



Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Greg Stark wrote:
> length(a) != array_upper(a)-array_lower(a)

[You really meant "array_upper(a) - array_lower(a) + 1" I'd guess]

length(A) is a missing function at the moment; the spec actually calls 
it CARDINALITY. Once available, you would use it to determine array 
length. SQL2003 says:
   The result of <cardinality expression> is the number of elements of
   the result of the <collection value expression>.

So, when A is an empty array, CARDINALITY(A) = 0, by definition.

> array_upper(a||b) == array_upper(a)+length(b)

Same here; this would be:

array_upper(a || b) == array_upper(a) + CARDINALITY(b)

and would work just fine. Note that if array-a is NULL, then the spec 
defines a || b as NULL. See section 6.35:
   2) If <array concatenation> is specified, then let AV1 be the value of
      <array value expression 1> and let AV2 be the value of <array primary>.
      Case:
      a) If either AV1 or AV2 is the null value, then the result of the
         <array concatenation> is the null value.
      b) If the sum of the cardinality of AV1 and the cardinality of AV2 is
         greater than IMDC, then an exception condition is raised:
         data exception — array data, right truncation.
      c) Otherwise, the result is the array comprising every element of AV1
         followed by every element of AV2.

> If someone implements pop and push, it sure makes things weird that push
> doesn't always increment the length and pop never decrements the length all
> the way down to 0.

I have no idea what you're trying to say here. Current behavior 
certainly increments length by one when you push an element (which is 
what "array || element" effectively does). An empty array has length 0 
before pushing an element on to it, and length 1 afterward. Pushing an 
element onto a NULL array yields NULL, which is not explicitly defined 
by the spec (that I can find), but is certainly consistent with the above.
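
For example (trivial, but it shows the length going from 0 to 1):

regression=# select array_upper('{}'::int4[] || 42, 1);
 array_upper
-------------
           1
(1 row)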

As far as array_pop is concerned, we discussed the fact that it makes no 
sense in the context of Postgres arrays -- see the archives from last 
year in May.

> Perhaps you're worried that you have to pick an arbitrary lower and upper bound
> and, strangely, that the upper bound would actually be one less than the lower
> bound, such as [1,0]. However this isn't really any different from the normal
> case. All arrays in postgres have arbitrary lower bounds.

Again, I have no idea what you mean here.


> Fwiw, the int_aggregate function from int_aggregate.c in the contrib section
> makes arrays that don't violate these invariants. For empty arrays the
> dimensions are [0,-1].

Seems rather arbitrary to me. As I said to Josh, an empty array has 
undefined bounds, literally.

> This isn't hypothetical for me. Whenever I end up replacing int_aggregate with
> a standard function that constructs your style arrays my app would break. I
> guess I'll have to add a coalesce(...,0) to my array_length() function to work
> around it. Which only seems like strong evidence it's the wrong behaviour.

Sorry, but these are not "my style" arrays, they are "Berkeley style" ;-).

Anyway, CARDINALITY is what you really need -- hopefully I'll be able to 
find time to address that and some other array items before the 7.5 
freeze. In the meantime, if you have a custom array_length function 
already, why not make it return 0 for empty arrays -- then your problems 
disappear:

create or replace function array_length(anyarray)
returns int as '
  select
   case
    when $1 = ''{}'' then 0
    else array_upper($1, 1) - array_lower($1, 1) + 1
   end
' language sql;

CREATE FUNCTION
regression=# select array_length(array[1,2,3]);
 array_length
--------------
            3
(1 row)

regression=# select array_length('{}'::int4[]);
 array_length
--------------
            0
(1 row)

regression=# select array[1,2,3] || '{}'::int4[];
 ?column?
----------
 {1,2,3}
(1 row)

regression=# select array_upper(array[1,2,3], 1) +
regression-#        array_length('{}'::int4[]);
 ?column?
----------
        3
(1 row)

HTH,

Joe



Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:

Joe Conway <mail@joeconway.com> writes:

> Same here; this would be:
> 
> array_upper(a || b) == array_upper(a) + CARDINALITY(b)
> 
> and would work just fine. Note that if array-a is NULL, then the spec defines a
> || b as NULL. See section 6.35:

Why are you talking about when a is NULL? The original question was for when a
was an empty array. That's not an unknown value, it's known to be an array
containing no elements.

It sounds like this is the same type of confusion that led to Oracle treating
empty strings as NULL, which causes no end of headaches.

> Anyway, CARDINALITY is what you really need -- hopefully I'll be able to find
> time to address that and some other array items before the 7.5 freeze. In the
> meantime, if you have a custom array_length function already, why not make it
> return 0 for empty arrays -- then your problems disappear:
> 
> create or replace function array_length(anyarray)
> returns int as '
>   select
>    case
>     when $1 = ''{}'' then 0
>     else array_upper($1, 1) - array_lower($1, 1) + 1
>    end
> ' language sql;

My argument was that having to write a special case here makes it pretty clear
the idea of equating {} with NULL is the wrong interface.

-- 
greg



Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Greg Stark wrote:

> Joe Conway <mail@joeconway.com> writes:
>>Same here; this would be:
>>
>>array_upper(a || b) == array_upper(a) + CARDINALITY(b)
>>
>>and would work just fine. Note that if array-a is NULL, then the spec defines a
>>|| b as NULL. See section 6.35:
> 
> Why are you talking about when a is NULL? The original question was for when a
> was an empty array. That's not an unknown value, it's known to be an array
> containing no elements.

Did you even look at my examples at the end of the post? I showed your 
example, with an empty array, handled correctly. The mention of a NULL 
array was only for completeness.

> It sounds like this is the same type of confusion that led to Oracle treating
> empty strings as NULL, which causes no end of headaches.

ISTM that you're the one who's confused. There is a very clear 
distinction between a NULL array and an empty array in the present 
implementation. They are *not* treated the same:

regression=# select '{}'::int[], NULL::int[];
 int4 | int4
------+------
 {}   |
(1 row)

regression=# select '{}'::int[] is NULL, NULL::int[] is NULL;
 ?column? | ?column?
----------+----------
 f        | t
(1 row)

You seem to be saying that because the output of certain functions that 
operate on empty arrays is NULL, it somehow implies that the array is 
being treated as NULL -- that's just plain incorrect.

> My argument was that having to write a special case here makes it pretty clear
> the idea of equating {} with NULL is the wrong interface.

But they're not being equated (see above), so I don't see where there's 
an issue.

Joe



Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:
Joe Conway <mail@joeconway.com> writes:

> You seem to be saying that because the output of certain functions that operate
> on empty arrays is NULL, it somehow implies that the array is being treated as
> NULL -- that's just plain incorrect.

Not incorrect, but a sign something weird is going on. It should only happen
if the functions in question really are looking for some property that isn't
known for that peculiar value of their parameters -- iscomputable(n) perhaps,
or sqrt(-1) if we abuse the rules a bit. But array_upper for a non-null array?

array_lower() and array_upper() are returning NULL for a non-null input, the
empty array, even though lower and upper bounds are known just as well as they
are for any other sized array. They are behaving as if there's something
unknown about the empty array that makes it hard to provide a lower bound or
upper bound.

slo=> select array_lower('{}'::int[],1), array_upper('{}'::int[],1);
 array_lower | array_upper
-------------+-------------
             |
(1 row)

I know it's possible to work around this special case, but I'm saying it's odd
to have an irregularity in the interface. What justification is there for
breaking the invariant length = upper-lower+1 ?



Yes I read the examples you gave, but you a) had to work around the nit with a
special case in your function and b) still have corner cases where one of the
invariants I named fails, namely: 

> Same here; this would be:
> 
> array_upper(a || b) == array_upper(a) + CARDINALITY(b)
> 
> and would work just fine. Note that if array-a is NULL, then the spec
> defines a || b as NULL. See section 6.35:

test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from (select array[1,2] as a, array[1,2] as b) as x;
 array_upper | ?column?
-------------+----------
           4 |        4
(1 row)

test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from (select '{}'::int[] as a, array[1,2] as b) as x;
 array_upper | ?column?
-------------+----------
           2 |
(1 row)



-- 
greg



Re: Inconsistent behavior on Array & Is Null?

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> array_lower() and array_upper() are returning NULL for a non-null
> input, the empty array, even though lower and upper bounds are known
> just as well as they are for any other sized array. They are behaving
> as if there's something unknown about the empty array that makes it
> hard to provide a lower bound or upper bound.

Hm.  My idea of the index lower bound is "the smallest index for which
there is an array member" --- so I agree with Joe that it's not very
well defined for an empty array.  We could pick some arbitrary value,
such as zero, for the LB and UB to be reported for an empty array, but
I can't see any way that you could justify them except as arbitrary
choices.

I think that maybe we ought to question these two properties:
    * empty array is different from NULL ... really?  Why?
    * storing a value into an element of a NULL array yields
      a NULL array instead of a singleton array.
IIRC the latter is presently true because we couldn't figure out
just what dimensionality to assign, but it might be easier to agree on
that than to resolve these other arguments...
        regards, tom lane


Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Hm.  My idea of the index lower bound is "the smallest index for which
> there is an array member" --- so I agree with Joe that it's not very
> well defined for an empty array.  

Hm. The problem is that they have other implications. Like the upper bound is
one less than the index an element added to the upper end will get, and
similarly the lower bound is one more than the index that would be assigned to
an element added on the low end.

Currently there is a lower bound and upper bound in the implementation even
for empty arrays. I have empty arrays in my table that have a lower bound of
0, and they behave slightly differently than arrays with lower bounds of 1.

> I think that maybe we ought to question these two properties:
>     * empty array is different from NULL ... really?  Why?
>     * storing a value into an element of a NULL array yields
>       a NULL array instead of a singleton array.

Well that breaks other things. Then lots of functions have to become
non-strict to work properly because they should have valid output when passed
null values. Ick.


I'm leaning towards suggesting that postgres should follow sql-99 here and
normalize all array indexes to have a lower bound of 1. Then array_lower and
array_upper become entirely unnecessary. Instead we just have array_length
which is exactly equivalent to my idea of array_upper.

-- 
greg



Re: Inconsistent behavior on Array & Is Null?

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Well that breaks other things. Then lots of functions have to become
> non-strict to work properly because they should have valid output when passed
> null values. Ick.
> ...
> I'm leaning towards suggesting that postgres should follow sql-99 here and
> normalize all array indexes to have a lower bound of 1.

That would break even more things, no?

On the other hand, it'd get rid of the problem that we presently face
with dump/restore of arrays that don't have lower bound 1.  Because
pg_dump doesn't do anything to mark such values, they'll end up with
lower bound 1 after reload anyway.  The fact that we haven't heard lots
of squawks about that suggests to me that not many people are using such
arrays at present ...
        regards, tom lane


Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Greg Stark wrote:
> array_lower() and array_upper() are returning NULL for a non-null input, the
> empty array, even though lower and upper bounds are known just as well as they
> are for any other sized array. They are behaving as if there's something
> unknown about the empty array that makes it hard to provide a lower bound or
> upper bound.

Sorry, but I still disagree. There *is* something unknown about the 
lower and upper bound of an empty array because there are no bounds.

> I know it's possible to work around this special case, but I'm saying it's odd
> to have an irregularity in the interface. What justification is there for
> breaking the invariant length = upper-lower+1 ?

I don't see the spec defined CARDINALITY as a workaround. It defines 
length as the number of elements in the array. When the array is empty, 
that value is clearly 0. Nothing strange about it.

> Yes I read the examples you gave, but you a) had to work around the nit with a
> special case in your function and b) still have corner cases where one of the
> invariants I named fails, namely: 

> test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b)
>        from (select '{}'::int[] as a, array[1,2] as b) as x;
>  array_upper | ?column? 
> -------------+----------
>            2 |         
> (1 row)

OK, you got me with this corner case. But using what you described as 
the result int_aggregate would give you in this case (-1), you would get 
an even stranger answer (-1 + 2 = 1) that would still need to be worked 
around.

Joe


Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Tom Lane wrote:
> I think that maybe we ought to question these two properties:
>     * empty array is different from NULL ... really?  Why?

I think this makes sense, similar to the difference between '' and NULL.

>     * storing a value into an element of a NULL array yields
>       a NULL array instead of a singleton array.

Same argument. If you think of text as an array of characters, you get 
this analogy (sort of):

regression=# create table s1(f1 int, f2 text);
CREATE TABLE
regression=# insert into s1 values(1, null);
INSERT 164679 1

regression=# select f1, substr(f2, 1, 1) is null from s1;
 f1 | ?column?
----+----------
  1 | t
(1 row)

regression=# update s1 set f2 = 'a' || substr(f2, 2);
UPDATE 1
regression=# select f1, substr(f2, 1, 1) is null from s1;
 f1 | ?column?
----+----------
  1 | t
(1 row)

Joe



Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
>>I'm leaning towards suggesting that postgres should follow sql-99 here and
>>normalize all array indexes to have a lower bound of 1.
> 
> That would break even more things, no?
> 
> On the other hand, it'd get rid of the problem that we presently face
> with dump/restore of arrays that don't have lower bound 1.  Because
> pg_dump doesn't do anything to mark such values, they'll end up with
> lower bound 1 after reload anyway.  The fact that we haven't heard lots
> of squawks about that suggests to me that not many people are using such
> arrays at present ...

The more I think about it, the more I like it. Does everyone else agree 
that a lower bound of 1 complies with the spec?

Joe


Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Greg Stark wrote:
> I'm leaning towards suggesting that postgres should follow sql-99 here and
> normalize all array indexes to have a lower bound of 1. Then array_lower and
> array_upper become entirely unnecessary. Instead we just have array_length
> which is exactly equivalent to my idea of array_upper.
> 

Now we finally have something to agree on ;-)

I do think this is the way to go, but it is a significant hit to 
backward compatibility. Same is true for supporting NULL elements of 
arrays -- maybe we should bite the bullet and make both changes at the 
same time?

Joe



Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> That would break even more things, no?
> 
> On the other hand, it'd get rid of the problem that we presently face
> with dump/restore of arrays that don't have lower bound 1.  Because
> pg_dump doesn't do anything to mark such values, they'll end up with
> lower bound 1 after reload anyway.  The fact that we haven't heard lots
> of squawks about that suggests to me that not many people are using such
> arrays at present ...

You have to be using not only arrays, but the new 7.4 functions provided to
manipulate them. In fact I think you have to be using array_prepend
specifically. But even there, since it's not a mutator, it's really not that
surprising that the elements of the brand new array it returns should have
new indexes.
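
For example (this is just stock 7.4 behaviour as I understand it):

slo=> select array_dims(array_prepend(0, array[1,2,3]));
 array_dims
------------
 [0:3]
(1 row)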

In fact I suspect there are more people with hidden bugs where they depend on
arrays starting at 1. This type of bug is insidious since it's hard to test
for: your application might never generate an array with a lower bound other
than 1 until someone adds some new code using array_prepend somewhere, and all
of a sudden you get strange behaviours from unrelated code.

I can have the honour of being the first squawker like you describe, but my
problem was only evidence that having such non-normalized arrays at all was
surprising. I was using int_aggregate.c which generates non-standard arrays
with lower bounds of 0. My code assumed array_upper()+1 == length. After I
dumped and restored all my counts were off by one.

-- 
greg



Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:
Joe Conway <mail@joeconway.com> writes:

> Greg Stark wrote:
> > I'm leaning towards suggesting that postgres should follow sql-99 here and
> > normalize all array indexes to have a lower bound of 1. Then array_lower and
> > array_upper become entirely unnecessary. Instead we just have array_length
> > which is exactly equivalent to my idea of array_upper.
> >
> 
> Now we finally have something to agree on ;-)
> 
> I do think this is the way to go, but it is a significant hit to backward
> compatibility. Same is true for supporting NULL elements of arrays -- maybe we
> should bite the bullet and make both changes at the same time?

In fact on further thought I think they *have* to be done together.

I forgot that your code did something else cool allowing updates to extend
arrays by directly updating elements outside the current bounds. Ie:

slo=> update test set a = '{}';
UPDATE 1
slo=> update test set a[1] = 1;
UPDATE 1
slo=> select * from test;
  a
-----
 {1}
(1 row)

But if we normalize array indexes to start at 1 then this makes it hard to
fill in the array starting at higher values. For example:

slo=> update test set a = '{}';
UPDATE 1
slo=> update test set a[5] = 5;
UPDATE 1
slo=> select a[5] from test;
 a
---
 5
(1 row)

This really ought to work, it obviously shouldn't allow you to set a[5] and
then surreptitiously move it to a[1]. But nor should it generate an error,
since I may well have a specific meaning for a[5] and may be planning to fill
in a[1]..a[4] later.

The logical thing to do, I think, is pre-fill the entries a[1]..a[4] with
null. This could be implemented by actually storing the NULLs or else storing
some notation that's used to adjust the base of the index to save space.


One thing that can't be made to work like it does now is extending the array
on the low end indefinitely:

slo=> update test set a[1] = 1;
UPDATE 1
slo=> update test set a[0] = 0;
UPDATE 1
slo=> update test set a[-1] = -1;
UPDATE 1
slo=> select * from test;
    a
----------
 {-1,0,1}
(1 row)


If this all looks familiar it's because Perl, and other languages, also behave
this way:

bash-2.05b$ perl -e '@a = (); $a[10]=10; print join(",",@a),"\n"'
,,,,,,,,,,10
bash-2.05b$ perl -e '@a = (); $a[-1]=-1; print join(",",@a),"\n"'
Modification of non-creatable array value attempted, subscript -1 at -e line 1.

-- 
greg



Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:
Joe Conway <mail@joeconway.com> writes:

> Greg Stark wrote:
> > array_lower() and array_upper() are returning NULL for a non-null input, the
> > empty array, even though lower and upper bounds are known just as well as they
> > are for any other sized array. They are behaving as if there's something
> > unknown about the empty array that makes it hard to provide a lower bound or
> > upper bound.
> 
> Sorry, but I still disagree. There *is* something unknown about the lower and
> upper bound of an empty array because there are no bounds.

So I understand your point of view now. But I think the intuitive meaning here
for lower/upper bound as the lowest/highest index where an element is present
is only a part of the picture.

lower and upper bound are also related to other properties like where
array_prepend and array_append place things. And of course the array length.

So to give a practical example, say I was implementing a stack using an array.
I push things on by extending the array by storing in arr[array_upper(arr)+1].
(Incidentally I don't think we actually have any way to shorten an array, do
we?) As soon as I pop off the last element I lose my stack pointer. I need a
special case in my code to handle pushing elements in when the array is empty.
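
Something like this, say (table and column names made up, just to show where
the special case bites):

-- push: fine while the stack is non-empty
update stacks set arr[array_upper(arr,1) + 1] = 42 where id = 1;

-- but once arr = '{}', array_upper() is NULL, so I have to special-case it:
update stacks set arr[coalesce(array_upper(arr,1), 0) + 1] = 42 where id = 1;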

In reality array_append() would work fine. It's only array_upper() that's
going out of its way to make things weird. There's still an upper bound,
array_append() knows it, array_upper() just hides this value from the user.


> I don't see the spec defined CARDINALITY as a workaround. It defines length as
> the number of elements in the array. When the array is empty, that value is
> clearly 0. Nothing strange about it.

The workaround is having to handle that case with a special-case if
statement. If array_lower(), array_upper(), and array_length()/CARDINALITY are
all defined in a consistent way, it doesn't seem like there ought to be any
special cases in the implementations. There should be a simple, rigid
mathematical relationship between them, namely "upper - lower + 1 = length".



> > test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b)
> >        from (select '{}'::int[] as a, array[1,2] as b) as x;
> >  array_upper | ?column?
> > -------------+----------
> >            2 |
> > (1 row)
> 
> OK, you got me with this corner case. But using what you described as the
> result int_aggregate would give you in this case (-1), you would get an even
> stranger answer (-1 + 2 = 1) that would still need to be worked around.

No actually, 1 would be the correct answer, the original array would have
indexes ranging from [0,-1] and the new array would have indexes ranging from
[0,1], ie, two elements. The only strangeness is the unusual lower bound which
isn't the default for postgres arrays constructed from string literals.
Personally I prefer the zero-based indexes but as neither SQL-foo nor
backwards compatibility agree with me here I'll give that up as a lost cause
:)

-- 
greg



Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Greg Stark wrote:
> This really ought to work, it obviously shouldn't allow you to set a[5] and
> then surreptitiously move it to a[1]. But nor should it generate an error,
> since I may well have a specific meaning for a[5] and may be planning to fill
> in a[1]..a[4] later.
> 
> The logical thing to do, I think, is pre-fill the entries a[1]..a[4] with
> null. This could be implemented by actually storing the NULLs or else storing
> some notation that's used to adjust the base of the index to save space.

I agree. I had always envisioned something exactly like that once we 
supported NULL elements. As far as the implementation goes, I think it 
would be very similar to tuples -- a null bitmask that would exist if 
any elements are NULL.

A related question is how to deal with non-existing array elements. 
Until now, you could do:

regression=# select f[0] from (select array[1,2]) as t(f);
 f
---

(1 row)

Even though index 0 does not exist, you get a NULL value returned 
instead of an ERROR. I'd think if we hardwire a lower bound of 1, this 
should produce an ERROR. Similarly:

regression=# select f[3] from (select array[1,2]) as t(f);
 f
---

(1 row)

Should this produce an ERROR instead of returning NULL once existing 
array elements can be NULL?

> One thing that can't be made to work like it does now is extending the array
> on the low end indefinitely:
> 
> slo=> update test set a[1] = 1;
> UPDATE 1
> slo=> update test set a[0] = 0;
> UPDATE 1
> slo=> update test set a[-1] = -1;
> UPDATE 1

Right. In the new world order we're describing, the latter two examples 
would have to produce errors.

Joe



Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:
Joe Conway <mail@joeconway.com> writes:

> Greg Stark wrote:
> > This really ought to work, it obviously shouldn't allow you to set a[5] and
> > then surreptitiously move it to a[1]. But nor should it generate an error,
> > since I may well have a specific meaning for a[5] and may be planning to fill
> > in a[1]..a[4] later.
> > The logical thing to do, I think, is pre-fill the entries a[1]..a[4] with
> > null. This could be implemented by actually storing the NULLs or else storing
> > some notation that's used to adjust the base of the index to save space.
> 
> I agree. I had always envisioned something exactly like that once we supported
> NULL elements. As far as the implementation goes, I think it would be very
> similar to tuples -- a null bitmask that would exist if any elements are NULL.

Well you might still want to store an internal "all indexes below this are
null" marker. That way update foo set a[1000]=1 doesn't require storing even a
bitmap for the first 999 elements. Though that might make maintaining the
bitmap kind of a pain -- then again, maintaining it might be a pain anyways,
because unlike tuples the array size isn't constant.

> A related question is how to deal with non-existing array elements. Until now,
> you could do:

I would have to think about it some more, but my first reaction is that
looking up [0] should generate an error if there can never be a valid entry at
[0]. But looking up indexes above the highest index should return NULL.

There are two broad use cases I see for arrays. Using them to represent tuples
where a[i] means something specific for each i, and using them to represent
sets where order doesn't matter.

In the former case I might want to initialize my column to an empty array and
set only the relevant columns as needed. In that case returning NULL for
entries that haven't been set yet whether they're above the last entry set or
below is most consistent.

In the latter case you really don't want to be looking up anything past the
end and don't want to be storing NULLs at all. So it doesn't really matter
what the behaviour is for referencing elements past the end, but you might
conceivably want to write code like "while (e = a[i++]) ...".




Incidentally I'm using both of these models in my current project. 

I use text[] to represent localized strings, str[1] is always English and
str[2] is always French. When I need to expand to more languages I'll add
str[3] for Spanish or whatever else. It would be a problem if I stored
something in str[2] and then found it in str[1] later. And it could be a bit
awkward to have to prefill str[3] everywhere in the whole database when the
time comes. Having it just silently return NULL would be more convenient.
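
Concretely, it's roughly this shape (simplified, names made up):

create table messages (
    msg_id  serial primary key,
    str     text[]   -- str[1] = English, str[2] = French, str[3] = Spanish later
);

-- this should just return NULL until str[3] gets filled in:
select str[3] from messages where msg_id = 1;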

I also use arrays for sets in a cache table. In that case there would never be
NULLs and the arrays are variable sized. Sometimes with thousands of entries.
The purpose of the cache table is to speed things up so storing the arrays
densely is important.

-- 
greg



Re: Inconsistent behavior on Array & Is Null?

From
Joe Conway
Date:
Greg Stark wrote:
> Joe Conway <mail@joeconway.com> writes:
>>I agree. I had always envisioned something exactly like that once we supported
>>NULL elements. As far as the implementation goes, I think it would be very
>>similar to tuples -- a null bitmask that would exist if any elements are NULL.
> 
> Well you might still want to store an internal "all indexes below this are
> null". That way update foo set a[1000]=1 doesn't require storing even a bitmap
> for the first 999 elements. Though might make maintaining the bitmap kind of a
> pain. Maintaining the bitmap might be kind of a pain anyways though because
> unlike tuples the array size isn't constant.

I don't think it will be worth the complication to do other than a 
straight bitmap -- at least not the first attempt.

>>A related question is how to deal with non-existing array elements. Until now,
>>you could do:
> 
> I would have to think about it some more, but my first reaction is that
> looking up [0] should generate an error if there can never be a valid entry at
> [0]. But looking up indexes above the highest index should return NULL.
> 
> There are two broad use cases I see for arrays. Using them to represent tuples
> where a[i] means something specific for each i, and using them to represent
> sets where order doesn't matter.
> 
> In the former case I might want to initialize my column to an empty array and
> set only the relevant columns as needed. In that case returning NULL for
> entries that haven't been set yet whether they're above the last entry set or
> below is most consistent.

Maybe, but you're still going to need to explicitly set the real upper 
bound element in order for the length/cardinality to be correct. In 
other words, if you really want an array with elements 1 to 1000, but 2 
through 1000 are NULL, you'll need to explicitly set A[1000] = NULL; 
otherwise we'll have no way of knowing that you really want 1000 
elements. Perhaps we'll want some kind of array_init function to create 
an array of a given size filled with all NULL elements (or even some 
arbitrary constant element).
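
Maybe something along these lines as a starting point (just a sketch, not
tested; it fills with a constant since we can't store NULL elements yet):

create or replace function array_init(int, anyelement)
returns anyarray as '
declare
    result alias for $0;  -- variable of the polymorphic return type
begin
    result := ''{}'';
    for i in 1 .. $1 loop
        result := result || $2;  -- append the fill value $1 times
    end loop;
    return result;
end;
' language plpgsql;

-- array_init(4, 0) would then give {0,0,0,0}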

I'd think given the preceding, it would make more sense to throw an 
error whenever trying to access an element greater than the length.

> In the latter case you really don't want to be looking up anything past the
> end and don't want to be storing NULLs at all. So it doesn't really matter
> what the behaviour is for referencing elements past the end, but you might
> conceivably want to write code like "while (e = a[i++]) ...".

See the reasoning above. And if you did somehow wind up with a "real" 
NULL element in this scenario, you'd never know about it. The looping 
could always be:
  while (i++ <= length)
or
  for (i = 1; i <= length; i++)

Joe



Re: Inconsistent behavior on Array & Is Null?

From
Greg Stark
Date:
Joe Conway <mail@joeconway.com> writes:

> I'd think given the preceding, it would make more sense to throw an error
> whenever trying to access an element greater than the length.

For an analogous situation in SQL I would propose

select (select foo from bar where xyz);

If there are no records in bar it returns NULL. Only if there are multiple
records in bar or some sort of error in the subquery does it produce an error.

Does SQL-99 not say anything about this case? It seems like the kind of thing
a standard should specify.

-- 
greg