Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable - Mailing list pgsql-bugs

From Linas Vepstas
Subject Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable
Date
Msg-id CAHrUA35ytxJZ1z-Ahiwo8sfybepUgmQOqJi3m2K2jX5DRcBR3Q@mail.gmail.com
Whole thread Raw
In response to Re: [BUGS] BUG #14494: Regression - Null arrays are not queryable  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-bugs
OK. One of these is a change in behavior from earlier versions; I won't be able to double-check to provide details for at least a few weeks. (or have any time for other discussion)

In my case, I am layering a graph DB (a hypergraph DB) on top of SQL. The array is a list of primary keys. Think of a "multi-tree" or a "forest": a bunch of trees (DAG's) but these may share nodes. Thus all rows are nodes, the array is all the children of the node, and leaf-nodes have no children: either an empty array or a null array or a zero-length array, I'm not picky on the representation.

--linas

On Fri, Feb 17, 2017 at 7:08 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Feb 17, 2017 at 5:17 PM, Linas Vepstas <linasvepstas@gmail.com> wrote:

On Fri, Jan 13, 2017 at 3:05 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-01-13 18:38:24 +0000, linasvepstas@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      14494
> Logged by:          Linas Vepstas
> Email address:      linasvepstas@gmail.com
> PostgreSQL version: 9.6.1
> Operating system:   Debian unstable
> Description:
>
> Array handling appears to have regressed,

Regressed from what?  Because these don't look like they'd have changed
recently.

I'd have to do
​​
forensics, but the two other versions I use are 8.4 and 9.1

Let me explain it this way: programmers have a mental model for strings (in C/C++, or in python or in perl, etc): they have a length, and in C/C++, they are null-terminated. There is a concept of a null string: a string of length zero: its a valid string, that you can treat like any other. Its not exceptional, or different in any way.  Standard string ops work on them: they behave like strings of length zero.

The mental model of an array would be similar: an array of length zero, but is still an array, and can be handled using the same kind of code that non-zero length arrays use.  One does not have to write special-case code for the empty array (in must programming languages).

In postgres, zero-length arrays appear to be confusing, non-standard. Special-case code needs to be written to handle them, since they are exceptional. This doubles the amount of unit-testing required, and doubles the attack surface for bugs. It is an unintuitive and surprising situation, for those of us, like me, who have data that contains arrays, some of which are zero-legnth arrays.

I respect that you might want to design postgres the way way it is currently, but it is counter-intuitive and surprising, and opens the door to surprising behaviors and bugs in user-land.  It doesn't "smell" like a rational design choice; instead, it smells like something arbitrary or accidental.

--linas

 
It is a shared belief (in differing strengths) that our array behavior is less than ideal - but it is functional and the opinions of people already using our software and have working applications based around the present behavior are afforded considerable weight when deciding​ whether to fix something not really broken but also not ideal.  Usually the outcome is that we leave things unchanged.

Also, you speak of an array of having "a length" - but that isn't true in PostgreSQL.  Arrays are multi-dimensional.  So at least at a distance you are constructing a flawed mental model of an array based solely upon experience with one-dimensional arrays.

In any case that only addresses:

select * from foo where array_length(outgoing,1)=0 ;

IOW - you make good points but so does PostgreSQL, and given present information pragmatism trumps ideology.

That said, if you actually supply some real examples of what you are doing you might find people willing to point out potentially better ways to do the same thing in PostgreSQL without having to worry about corner cases.

SQL has a concept of null for which we need to retain as consistent handling as possible.  That covers your other items.  "NULL" is not the same as <empty> no matter what the underlying data type is (arrays, strings, etc...).  This covers the other two "unexpecteds".

I would advise you minimize how often NULL appears in your data.  You already seem comfortable with the concept of "empty" and in many situations an empty something is exactly what is meant to be conveyed by the use of NULL.

If you really want to debate/explore the topic the -general list is the proper choice.  This list should be used for true bug reporting which you have not yet done.  Unintended and undocumented regressions are bugs but so far that hasn't been shown to be the case.  The rest of the exposition is discussion, not bug reporting or exploring.

David J.


pgsql-bugs by date:

Previous
From: "Casey Witt"
Date:
Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"
Next
From: Wei Congrui
Date:
Subject: Re: [BUGS] BUG #14549: pl/pgsql parser