Thread: Outer joins and NULLs (old subject "ERROR: failed to find conversion function from key_vals_nn to record[]")

bryn@yugabyte.com wrote:

david.g.johnston@gmail.com wrote:

bryn@yugabyte.com wrote:

Can anybody show me an implementation of a realistic use case that follows proper practice (like "every table must have a primary key", "a foreign key must refer to a primary key", and "joins may be made only 'on' columns one of which has a PK constraint and the other of which has a FK constraint") where using a not nullable data type brings a problem that wouldn't occur if the column were defined with a nullable data type and an explicit "not null" constraint?

Nothing obvious comes to mind. But frankly:

proper practice includes trying to write idiomatic code for the language you are using so others familiar with the language can learn your code more easily. You are violating this to an extreme degree.

I do not think it to be a good trade-off. SQL writers are practical people and the idioms largely avoid any downsides that arise from SQL not being some paragon of language design.

-- "\d genres" shows "gk" with a "not null" constraint, whether I write it
-- or not. And convention seems to say "don't clutter your code by writing it".
create table genres(
  gk  int   primary key,
  gv  text  not null
  );

"Primary Key" is defined to be the application of both UNIQUE and NOT NULL constraints...

Yes, I know what "primary key" implies. I meant only to emphasize that the source column for what the "outer join" projects has a not null constraint, that it doesn't apply to the projection of that column, that this is perfectly understandable, and that this isn't a problem. Never mind.

Peter, your email:

www.postgresql.org/message-id/20220618064453.wtz4hxyeptwdh37z%40hjp.at

ended up in my junk folder. (This happens often, but randomly, with emails sent to pgsql-general—and I can't control it.) I read it quickly. And then when I returned to read it more carefully I managed to delete it—irrevocably.

The discussion has wandered so far from the original topic that it seemed just as well to start a new thread with this.


But it would be a problem if there was an actual type which wouldn't include NULL.

The NOT NULL attribute is an attribute of the column, not the type. When you use the primary key (or any other column marked as NOT NULL) the type of the result is just the type of that column, the NOT NULL is dropped.

It seems to me that my new testcase reveals one clear bug and one outcome that looks to me like a bug. Others might argue that neither is a bug. Of course, I'll be happy to be corrected, especially if I made some typos in my SQL or misinterpreted what I saw.

————————————————————————————————————————————————————————————————————————————————

Thank you for that example. It stimulated me to think harder than I had before and to design a brand new testcase. I didn't need to refer to your example when I wrote what follows.

I'm using the term "view" here as a shorthand for « the results from a "select" ». And I used real views in my testcase to make the programming easier.

I copied my self-contained testcase below so that you can simply run it "as is". (You'll have to add the "drop" statements that you need.)

Here's my interpretation of the testcase output:

Self-evidently, a view does *not* inherit constraints from the columns of its base table(s).

A view on a single table doesn't necessarily inherit the data types of its base table's columns. Rather, the view compilation's analysis is *sometimes* clever enough to notice when a projected column might have a NULL even when the base column doesn't allow NULLs. In this case, if the base column's data type is (in my example) the domain "text_nn", then the corresponding column in the view is given the data type plain "text". My test that uses a single table shows this.

However, the compilation's analysis for a view on a join (at least when it's a two-table "outer join") slavishly inherits the data types from all of the referenced columns—even when the human can easily predict that some projected columns might have NULLs.

It seems to me that this optimistic design choice was unfortunate—and that a pessimistic choice would have been better:

— when the analysis cannot predict the outcome, replace the data type of *every* column that has a "not null domain" data type with the domain's base data type.

But I accept that this cannot be changed now. Might it be possible (in some future PG release) to make the analysis clever enough to deal with the issue at hand (as it already does in my single-table example)?

With "insert-select", you (self-evidently) define the data types and constraints of the target table's columns explicitly, reflecting your analysis of what you expect. Of course, then, the "insert-select" must respect the target table's definition—and there's no paradox. (But see "Finally" below for the "insert-select from self" case.)

But with CTAS, the target table mechanically inherits the data types that the view's columns have been given. Moreover (and this hugely surprises me), the mechanics of CTAS do not respect the constraint that my "text_nn" brings for some column(s): they somehow "tunnel under" it, and yet they still give the resulting table's corresponding column the "text_nn" data type.

What is the reasoning that led to implementing this behavior? (Notice that any subsequent insert into the table, even using "insert-select", *does* respect the constraints that "not null domains" bring, except for the caveat that I expose in "Finally".)

This is the paradox that we've been discussing.

I had expected errors where I now see the "tunneling under". I realize now that I had never tested this explicitly.

It would be foolish, therefore, to define the target table for "insert-select" using "CTAS where false".
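
Here, concretely, is the pattern I mean, sketched against the testcase below ("oj_target_2" is just an ad hoc name):

-- The empty target is created mechanically from the outer-join view.
create table oj_target_2 as
select k, v1, v2
from oj_source_view
where false;

-- Just as "\d oj_target" shows below for the analogous table, the new table's
-- "v2" column inherits the "text_nn" domain rather than plain "text", so the
-- target's definition no longer reflects your own analysis of what you expect.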

B.t.w., and orthogonal to the point here, with CTAS you always have to alter the resulting table manually after the fact to add constraints. But with "insert-select", you can choose whether to define the constraints as part of "create table" or, using "alter table", after the "insert".

If you want the famous advantages of CTAS over "insert-select", you can easily decorate the defining "select list" with appropriate typecasts.
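
For example, a sketch against the same testcase ("oj_target_3" and the constraint name are ad hoc):

-- Cast each projected column to the base data type that you intend the
-- target column to have...
create table oj_target_3 as
select
  k::int   as k,
  v1::text as v1,
  v2::text as v2
from oj_source_view;

-- ...and then add the constraints that express your analysis explicitly.
alter table oj_target_3
  add constraint oj_target_3_pk primary key(k),
  alter column v1 set not null;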

Finally, 

I show that a subsequent "insert" into the table that (automatically) has been given "text_nn" for some column *does* respect the constraint that this brings:

(1) When it defines the to-be-inserted data explicitly with "values".

(2) When the "insert-select" source is a different table from the target.

However, the "text_nn" constraint is *not* respected here:

(3) When the "insert-select" source is the target table itself.

————————————————————————————————————————————————————————————————————————————————

\pset null '~~'

/*
  HERE'S WHAT I SEE AT THE psql PROMPT:

                 Table "u1.st_target"
   Column |  Type   | Collation | Nullable | Default 
  --------+---------+-----------+----------+---------
   k      | integer |           |          | 
   v1     | text_nn |           |          | 
   v2     | text    |           |          | 

                 Table "u1.oj_target"
   Column |  Type   | Collation | Nullable | Default 
  --------+---------+-----------+----------+---------
   k      | integer |           |          | 
   v1     | text_nn |           |          | 
   v2     | text_nn |           |          | 

  psql:testcase.sql:108: ERROR:  domain text_nn does not allow null values
  psql:testcase.sql:116: ERROR:  domain text_nn does not allow null values
    k  |   v1   |   v2    
  -----+--------+---------
    10 | book-1 | genre-1
    20 | book-2 | genre-1
    30 | book-3 | genre-1
    40 | book-4 | genre-2
    50 | book-5 | ~~
   110 | book-1 | genre-1
   120 | book-2 | genre-1
   130 | book-3 | genre-1
   140 | book-4 | genre-2
   150 | book-5 | ~~
*/;

-- Single table
create domain text_nn as text not null;

create table st_source_tab (
  k   int primary key,
  v1  text_nn,
  v2  text_nn);

insert into st_source_tab(k, v1, v2) values
 (10, 'book-1', 'genre-1'),
 (20, 'book-2', 'genre-1'),
 (30, 'book-3', 'genre-1'),
 (40, 'book-4', 'genre-2'),
 (50, 'book-5', '');

create view st_source_view(k, v1, v2) as
select
  k,
  v1,
  case
    when k = 50 then null
    else v2
  end
from st_source_tab;

create table st_target as
select k, v1, v2
from st_source_view;

-- \d st_source_view
-- Same as "st_source_view", of course
-- Notice that "v2" is plain "text". But its source column is "text_nn".
\d st_target

----------------------------------------------------------------------
-- Outer join

create table genres(
  pk     int primary key,
  genre  text_nn);

insert into genres(pk, genre) values
  (1, 'genre-1'),
  (2, 'genre-2'),
  (3, 'genre-3');

create table books(
  pk        int primary key,
  title     text_nn,
  genre_pk  int references genres(pk));

insert into books(pk, title, genre_pk) values
  (10, 'book-1', 1),
  (20, 'book-2', 1),
  (30, 'book-3', 1),
  (40, 'book-4', 2),
  (50, 'book-5', null);

-- Make the column names use-case agnostic and match those of st_source_view.
create view oj_source_view(k, v1, v2) as
select
  b.pk,
  b.title,
  g.genre
from
  books b
  left outer join
  genres g
  on b.genre_pk = g.pk;

create table oj_target as
select k, v1, v2
from oj_source_view;

-- \d oj_source_view
-- Same as "oj_source_view", of course
-- Notice that "v2" is plain "text_nn", just as has been discussed.
\d oj_target

----------------------------------------------------------------------
-- Show that the contents of tables "st_target" and "oj_target"
-- are identical. Unsurprising. Just a sanity check.
do $body$
declare
  differ constant boolean :=
    (
    with
      a as (select * from st_target except select * from oj_target),
      b as (select * from oj_target except select * from st_target)
    select (exists(select 1 from a) or exists(select 1 from b))
    );
begin
  assert not differ, '"st_target" versus "oj_target" test failed';
end;
$body$;

----------------------------------------------------------------------
-- Finally: A BRAND NEW PARADOX.

-- Case 1: "insert" using explicit "values()" clause.
-- Fails with error 23502:
-- domain text_nn does not allow null values
insert into oj_target(k, v1, v2) values
 (99, 'book-99', null);

-- Case 2:
-- "insert-select" from non-self.
-- Fails with error 23502:
-- domain text_nn does not allow null values
insert into oj_target
select (k + 200), v1, v2
from st_target;

-- Case 3:
-- "insert-select" from self.
-- SILENTLY SUCCEEDS!
insert into oj_target
select (k + 100), v1, v2
from oj_target;

-- End result.
select k, v1, v2 from oj_target order by k;

Bryn Llewellyn <bryn@yugabyte.com> writes:
> Self-evidently, a view does *not* inherit constraints from the columns of its base table(s).

Check.

> A view on a single table doesn't necessarily inherit the data types of its base table's columns. Rather, the view
> compilation's analysis is *sometimes* clever enough to notice when a projected column might have a NULL even when the
> base column doesn't allow NULLs. In this case, if the base column's data type is (in my example) the domain "text_nn",
> then the corresponding column in the view is given the data type plain "text". My test that uses a single table shows
> this.

This is nonsense.  The parser does not account for domain constraints in
that way.  It would be incorrect to do so, because then the view's data
types could need to change as a consequence of adding/dropping domain
constraints.  I think that your result is actually just an illustration
of the rules in
    https://www.postgresql.org/docs/current/typeconv-union-case.html
about how the output type of a CASE expression is determined ---
specifically, that domains are smashed to base types as soon as
the CASE arms are discovered to not be all of the same type.
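
For example (assuming the "text_nn" domain from the upthread testcase is in place):

-- All the arms are of the very same domain type, so the domain is preserved.
select pg_typeof(case when true then 'dog'::text_nn else 'cat'::text_nn end);
-- Reports: text_nn

-- One arm is a bare NULL (type "unknown"), so the arms are not all of the same
-- type and the domain is smashed to its base type.
select pg_typeof(case when true then 'dog'::text_nn else null end);
-- Reports: text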

            regards, tom lane



On Sun, Jun 19, 2022 at 2:31 PM Bryn Llewellyn <bryn@yugabyte.com> wrote:
It would be foolish, therefore, to define the target table for "insert-select" using "CTAS where false".

SQL is a strongly typed language where the structure of the query output is determined without any consideration of whether said query returns zero, one, or many rows.  Because of this property it is entirely consistent that a CTAS produces a table even if the query execution produces zero rows.

That CTAS chooses to not try and produce constraints on the newly created table by inferring them from the underlying query seems like a reasonable trade-off between functionality and effort.  It simply stops at "data types" and doesn't care for how any given one is implemented.  That domains have constraints is incidental to the entire design.

Allowing domains to be defined as not null at this point is simply something that we (IMO, the documentation is not this strongly worded) don't support but don't error out upon in the interest of backward compatibility.  It, as you note, has some corner-case bugs.  You can avoid those bugs by simply not using a non-null constraint as suggested.

As for the "same source" optimization: the documentation reads - "For example, this can happen in an outer-join query,...", the bug-fix here is to simply add this situation as a second example.  However, it is reasonably considered correct that a record you just read from a table should be able to be written back to said table unchanged.  The "error" is that we allowed the record to exist in the first place, but we absolved ourselves of responsibility with the caveats on the CREATE DOMAIN page.  Subsequent consequences of that mis-use are likewise attributed to said mis-use and are on the user's head for allowing their code to produce the problematic behavior.

At most we should probably go from saying "Best practice therefore..." to "We no longer support setting a not null constraint on a domain but will not error in the interest of not breaking existing uses that are careful to avoid the problematic corner-cases".

David J.
david.g.johnston@gmail.com wrote:

Allowing domains to be defined as not null at this point is simply something that we don't support but don't error out upon in the interest of backward compatibility. (IMO, the documentation is not this strongly worded.) It, as you note, has some corner-case bugs. You can avoid those bugs by simply not using a non-null constraint as suggested.

...At most we should probably go from saying "Best practice therefore..." to "We no longer support setting a not null constraint on a domain but will not error in the interest of not breaking existing uses that are careful to avoid the problematic corner-cases".

Thank you all for helping me see the problem here. I’d certainly welcome strengthening the doc’s admonition to use wording like "unsupported", "unpredictable results", and "just don’t do this".

I was simply confused. There's no other way to say it. Anyway, my foolish use of domains with "not null" constraints hadn't left the privacy of my own laptop—and I've expunged all those uses now.
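
Just so that it's concrete, here's a minimal sketch of the replacement pattern that the advice implies (the domain and table names are ad hoc): the domain, if I want one at all, carries no "not null" constraint, and each column that must not be null says so explicitly.

create domain text_t as text;

create table books_v2(
  pk     int primary key,
  title  text_t not null,
  genre  text   not null);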

Here’s what I now take from that "create domain" note:

When a datum (either an actual value or a NULL) is copied from a "container" whose data type is the domain "d" to another container with that same data type, the constraints that might have been defined for "d" are not re-checked.

I'm using "container" here as an ad hoc, and somewhat loose, umbrella term of art for any of these:

—row-column intersection in a table (or view)
—an attribute of a composite type instance
—a local variable in a PL/pgSQL subprogram or anonymous block 
—a PL/pgSQL subprogram's formal parameter
—a PL/pgSQL subprogram's return datum
—and so on

I quite like this compact illustration of the paradox. (I expect that everybody has their own favorite.)

create domain text_nn as text not null;

create view null_having_null_not_constraint(t_nn, dt) as
with
  c1(t_nn) as (
    values('dog'::text_nn)),

  c2(t_nn) as (
    select (select t_nn from c1 where null))

select
  t_nn, pg_typeof(t_nn)
from c2;


\pset null '~~~~'
select t_nn, dt from null_having_null_not_constraint;

This is the result:

 t_nn |   dt    
------+---------
 ~~~~ | text_nn

And I quite like this demo of the fact that copying a datum between containers with the same constrained domain data type doesn't re-check the constraints:

do $body$
declare
  t_var_nn text_nn := '';
begin
  t_var_nn := (select t_nn from null_having_null_not_constraint);
  assert (t_var_nn is null), 'Expected "t_var_nn" (paradoxically) to be NULL here';
end;
$body$;

I'll use « the "paradoxical" pattern », below, to denote the fact that you can find a NULL in a container whose datatype has a "not null" constraint.

 Here's the summary of my tests:

|------------------------------------------------|-------------------------------------|
|                                                |                                     |
|  (1) select t_nn, dt from                      |  Follows the "paradoxical" pattern  |
|      null_having_null_not_constraint;          |                                     |
|                                                |                                     |
|------------------------------------------------|-------------------------------------|
|                                                |                                     |
|  (2) select t_nn... into                       |                                     |
|      text_nn local variable                    |  Follows the "paradoxical" pattern  |
|                                                |                                     |
|------------------------------------------------|-------------------------------------|
|                                                |                                     |
|  (3) select t_nn... into                       |                                     |
|      text_nn field in composite type           |  Follows the "paradoxical" pattern  |
|                                                |                                     |
|------------------------------------------------|-------------------------------------|
|                                                |                                     |
|  (4) assign (select t_nn...) to                |                                     |
|      text_nn IN formal                         |  Follows the "paradoxical" pattern  |
|                                                |                                     |
|------------------------------------------------|-------------------------------------|
|                                                |                                     |
|  (5) returning t_nn in function that           |                                     |
|      returns "text_nn"                         |  Follows the "paradoxical" pattern  |
|                                                |                                     |
|------------------------------------------------|-------------------------------------|
|                                                |                                     |
|  (6) select t_nn ... into                      |  UNSURPRISING:                      |
|      unconstrained text local variable         |  "domain text_nn does not           |
|      returning text_nn                         |     allow null values"              |
|                                                |                                     |
|------------------------------------------------|-------------------------------------|
I expect that there are yet more tests that I might do. But there's no point because, as I hope I've understood properly, the "paradoxical" pattern _can_ happen. So it's unimportant to discover every case where it _does_ happen.

Finally, nothing in these discussions has convinced me that there are two kinds of NULL. I continue to believe that "there is no information available" has no nuances. But I do see that there are (at least) two ways that NULL, with this meaning, can be produced:

Either an explicit assignment says "I have no information"; or (as the "scalar subquery where false" and the "outer join" SQLs have shown) the evaluation of a datum that the query produces concludes "I have no information".