Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable - Mailing list pgsql-bugs

From David G. Johnston
Subject Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable
Date
Msg-id CAKFQuwZ5bta1T7q9rHh=iQvuERfxSegxVZ-OQ4Wf9cfG02YsEg@mail.gmail.com
Whole thread Raw
In response to Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable  (Linas Vepstas <linasvepstas@gmail.com>)
Responses Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable  (Linas Vepstas <linasvepstas@gmail.com>)
List pgsql-bugs
On Fri, Feb 17, 2017 at 5:17 PM, Linas Vepstas <linasvepstas@gmail.com> wrote:

On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14494
> Logged by:          Linas Vepstas
> Email address:      linasvepstas@gmail.com
> PostgreSQL version: 9.6.1
> Operating system:   Debian unstable
> Description:
>
> Array handling appears to have regressed,

Regressed from what?  Because these don't look like they'd have changed
recently.

I'd have to do
​​
forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings (in C/C++, or in python or in perl, etc): they have a length, and in C/C++, they are null-terminated. There is a concept of a null string: a string of length zero: its a valid string, that you can treat like any other. Its not exceptional, or different in any way.  Standard string ops work on them: they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero, but is still an array, and can be handled using the same kind of code that non-zero length arrays use.  One does not have to write special-case code for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard. Special-case code needs to be written to handle them, since they are exceptional. This doubles the amount of unit-testing required, and doubles the attack surface for bugs. It is an unintuitive and surprising situation, for those of us, like me, who have data that contains arrays, some of which are zero-legnth arrays.

I respect that you might want to design postgres the way way it is currently, but it is counter-intuitive and surprising, and opens the door to surprising behaviors and bugs in user-land.  It doesn't "smell" like a rational design choice; instead, it smells like something arbitrary or accidental.

--linas

 
It is a shared belief (in differing strengths) that our array behavior is less than ideal - but it is functional and the opinions of people already using our software and have working applications based around the present behavior are afforded considerable weight when deciding​ whether to fix something not really broken but also not ideal.  Usually the outcome is that we leave things unchanged.

Also, you speak of an array of having "a length" - but that isn't true in PostgreSQL.  Arrays are multi-dimensional.  So at least at a distance you are constructing a flawed mental model of an array based solely upon experience with one-dimensional arrays.

In any case that only addresses:

select * from foo where array_length(outgoing,1)=0 ;

IOW - you make good points but so does PostgreSQL, and given present information pragmatism trumps ideology.

That said, if you actually supply some real examples of what you are doing you might find people willing to point out potentially better ways to do the same thing in PostgreSQL without having to worry about corner cases.

SQL has a concept of null for which we need to retain as consistent handling as possible.  That covers your other items.  "NULL" is not the same as <empty> no matter what the underlying data type is (arrays, strings, etc...).  This covers the other two "unexpecteds".

I would advise you minimize how often NULL appears in your data.  You already seem comfortable with the concept of "empty" and in many situations an empty something is exactly what is meant to be conveyed by the use of NULL.

If you really want to debate/explore the topic the -general list is the proper choice.  This list should be used for true bug reporting which you have not yet done.  Unintended and undocumented regressions are bugs but so far that hasn't been shown to be the case.  The rest of the exposition is discussion, not bug reporting or exploring.

David J.

pgsql-bugs by date:

Previous
From: bjorn@eventmy.com
Date:
Subject: [BUGS] BUG #14552: tsquery converts AND operator into OR when nested insideOR operations
Next
From: Tom Lane
Date:
Subject: Re: [BUGS] BUG #14552: tsquery converts AND operator into OR when nested inside OR operations