Thread: Null row vs. row of nulls in plpgsql

Null row vs. row of nulls in plpgsql

From

Tom Lane

Date:

27 September 2008, 15:56:42

I looked a bit at the bug report here:
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype").  When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere.  For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness.  I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row.  I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes needed
to make it happen.

I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD variables
by storing a HeapTuple instead of a list of per-field variables.  But
I soon found out that the reason to have them is to be able to describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg "SELECT ... INTO x,y,z".

Comments?
        regards, tom lane

Fwd: Null row vs. row of nulls in plpgsql

From

"Oleg Serov"

Date:

27 September 2008, 16:59:56

---------- Forwarded message ----------
From: Oleg Serov <serovov@gmail.com>
Date: 2008/9/27
Subject: Re: Null row vs. row of nulls in plpgsql
To: Tom Lane <tgl@sss.pgh.pa.us>

I'm newbie, but i think that adding bool flag to PLpgSQL_row isnull will handle the problem(like in PLpgSQL_var);

2008/9/27 Tom Lane <tgl@sss.pgh.pa.us>

I looked a bit at the bug report here:
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere. For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness. I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes needed
to make it happen.

I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD variables
by storing a HeapTuple instead of a list of per-field variables. But
I soon found out that the reason to have them is to be able to describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg "SELECT ... INTO x,y,z".

Comments?

regards, tom lane

Re: Null row vs. row of nulls in plpgsql

From

Hannu Krosing

Date:

27 September 2008, 17:43:06

On Sat, 2008-09-27 at 14:56 -0400, Tom Lane wrote:
> I looked a bit at the bug report here:
> http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php
> 
> ISTM that the fundamental problem is that plpgsql doesn't distinguish
> properly between a null row value (eg, "null::somerowtype") and a
> row of null values (eg, "row(null,null,...)::somerowtype").  When that
> code was designed, our main SQL engine was pretty fuzzy about the
> difference too, but now there is a clear semantic distinction.
> 
> For plpgsql's RECORD variables this doesn't seem hard to fix: just
> take out the code in exec_move_row() that manufactures a row of nulls
> when the input is null, and maybe make a few small adjustments
> elsewhere.  For ROW variables there's a bigger problem, because those
> are represented by a list of per-field variables, which doesn't
> immediately offer any way to represent overall nullness.  I think it
> could be dealt with by adding an explicit "the row as a whole is null"
> flag to struct PLpgSQL_row.  I haven't tried to code it though, so I'm
> not sure if there are gotchas or unreasonably large code changes needed
> to make it happen.
> 
> I thought for a little bit about whether we couldn't get rid of ROW
> variables entirely, or at least make them work more like RECORD variables
> by storing a HeapTuple instead of a list of per-field variables.  But
> I soon found out that the reason to have them is to be able to describe
> the assignment target of SQL statements that assign to multiple scalar
> variables, eg "SELECT ... INTO x,y,z".

How hard would it be to have a RECORD that has pointers to those
multiple scalar variables ?

Referring again to my favorite ordinary programming language python, you
can have a very elegant way of assigning a "record" (a tuple in
pythonese) to a set of variables and vice versa

>>> rec = 1,2,3
>>> rec
(1, 2, 3)
>>> a,b,c = rec
>>> a
1
>>> c
3
>>> c,b,a
(3, 2, 1)

In other words, tuples are more or less automatically composed and
decomposed on demand.

I have not yet looked how hard the implementation of this would be for
postgreSQL, but at least the concept should be applicable.

----------------
Hannu

Re: Null row vs. row of nulls in plpgsql

From

Tom Lane

Date:

27 September 2008, 18:01:10

Hannu Krosing <hannu@2ndQuadrant.com> writes:
> Referring again to my favorite ordinary programming language python, you
> can have a very elegant way of assigning a "record" (a tuple in
> pythonese) to a set of variables and vice versa

There are already perfectly good ways to do that in plpgsql, vizrowvar := row(x,y,z);select rowvar.* into x,y,z;
        regards, tom lane

Re: Null row vs. row of nulls in plpgsql

From

Greg Stark

Date:

28 September 2008, 06:06:53

Iirc the reason for this fuzziness came from the SQL spec definition  
of IS NULL for rows. As long as you maintain that level of spec- 
compliance I don't think there are any other important constraints on  
pg behaviour.

greg

--sorry for the top posting but the phone makes it hard to do anything  
else.

On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I looked a bit at the bug report here:
> http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php
>
> ISTM that the fundamental problem is that plpgsql doesn't distinguish
> properly between a null row value (eg, "null::somerowtype") and a
> row of null values (eg, "row(null,null,...)::somerowtype").  When that
> code was designed, our main SQL engine was pretty fuzzy about the
> difference too, but now there is a clear semantic distinction.
>
> For plpgsql's RECORD variables this doesn't seem hard to fix: just
> take out the code in exec_move_row() that manufactures a row of nulls
> when the input is null, and maybe make a few small adjustments
> elsewhere.  For ROW variables there's a bigger problem, because those
> are represented by a list of per-field variables, which doesn't
> immediately offer any way to represent overall nullness.  I think it
> could be dealt with by adding an explicit "the row as a whole is null"
> flag to struct PLpgSQL_row.  I haven't tried to code it though, so I'm
> not sure if there are gotchas or unreasonably large code changes  
> needed
> to make it happen.
>
> I thought for a little bit about whether we couldn't get rid of ROW
> variables entirely, or at least make them work more like RECORD  
> variables
> by storing a HeapTuple instead of a list of per-field variables.  But
> I soon found out that the reason to have them is to be able to  
> describe
> the assignment target of SQL statements that assign to multiple scalar
> variables, eg "SELECT ... INTO x,y,z".
>
> Comments?
>
>            regards, tom lane
>
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Re: Null row vs. row of nulls in plpgsql

From

Hannu Krosing

Date:

28 September 2008, 08:45:15

On Sun, 2008-09-28 at 04:03 +0300, Greg Stark wrote:
> Iirc the reason for this fuzziness came from the SQL spec definition  
> of IS NULL for rows. As long as you maintain that level of spec- 
> compliance I don't think there are any other important constraints on  
> pg behaviour.

What does SQL spec say about recursive IS NULL for rows ? 

Should we check that IS NULL is true for each row element, or must they
actually be NULL's ?

hannu=# select row(null, null) is NULL;?column? 
----------t
(1 row)

hannu=# select row(null, row(null, null)) is NULL;?column? 
----------f
(1 row)

--------------
Hannu

Re: Null row vs. row of nulls in plpgsql

From

Tom Lane

Date:

28 September 2008, 21:28:41

Greg Stark <greg.stark@enterprisedb.com> writes:
> On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ISTM that the fundamental problem is that plpgsql doesn't distinguish
>> properly between a null row value (eg, "null::somerowtype") and a
>> row of null values (eg, "row(null,null,...)::somerowtype").  When that
>> code was designed, our main SQL engine was pretty fuzzy about the
>> difference too, but now there is a clear semantic distinction.

> Iirc the reason for this fuzziness came from the SQL spec definition  
> of IS NULL for rows. As long as you maintain that level of spec- 
> compliance I don't think there are any other important constraints on  
> pg behaviour.

I started to poke into this and found out that it was a bit subtler than
I thought.  It'd be possible to associate a "rowisnull" state value
with a row variable, but the problem is that plpgsql treats the row
fields as independent variables that can be accessed without touching
the row.  In particular you can assign null or nonnull values to
individual fields.  So consider
-- presumably, this'll set rowisnull to TRUE:rowvar := NULL;-- this had better cause rowisnull to become
FALSE:rowvar.field1:= 42;-- does this cause it to become TRUE again?rowvar.field1 := NULL;
 

There are a bunch of implementation problems with making any such
behavior happen, since the row field variables don't currently "know"
that they are members of a row, and indeed it's possible for the same
variable to be a member of more than one row.  But the core issue is
that this interaction seems to fuzz the distinction between "row is
null" and "all the row's elements are null".  In particular, if you
think that rowisnull should be TRUE after the above sequence, then
I think you are saying they are the same thing.  So maybe the spec
authors are smarter than we are.

Thoughts?  What would a consistent behavior look like?
        regards, tom lane

Re: Null row vs. row of nulls in plpgsql

From

"Pavel Stehule"

Date:

29 September 2008, 04:55:45

2008/9/29 Tom Lane <tgl@sss.pgh.pa.us>:
> Greg Stark <greg.stark@enterprisedb.com> writes:
>> On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ISTM that the fundamental problem is that plpgsql doesn't distinguish
>>> properly between a null row value (eg, "null::somerowtype") and a
>>> row of null values (eg, "row(null,null,...)::somerowtype").  When that
>>> code was designed, our main SQL engine was pretty fuzzy about the
>>> difference too, but now there is a clear semantic distinction.
>
>> Iirc the reason for this fuzziness came from the SQL spec definition
>> of IS NULL for rows. As long as you maintain that level of spec-
>> compliance I don't think there are any other important constraints on
>> pg behaviour.
>
> I started to poke into this and found out that it was a bit subtler than
> I thought.  It'd be possible to associate a "rowisnull" state value
> with a row variable, but the problem is that plpgsql treats the row
> fields as independent variables that can be accessed without touching
> the row.  In particular you can assign null or nonnull values to
> individual fields.  So consider
>
>        -- presumably, this'll set rowisnull to TRUE:
>        rowvar := NULL;
>        -- this had better cause rowisnull to become FALSE:
>        rowvar.field1 := 42;
>        -- does this cause it to become TRUE again?
>        rowvar.field1 := NULL;


this sequence is wrong. in SQL rowvar has same behave as pointer. When
you would to fill rowvar you should to call constructor first.

some like
rowvar := NULL; -- null value
rowvar := constructor(null);
rowvar := constructor();
rowvar.field = 42;


regards
Pavel Stehule




>
> There are a bunch of implementation problems with making any such
> behavior happen, since the row field variables don't currently "know"
> that they are members of a row, and indeed it's possible for the same
> variable to be a member of more than one row.  But the core issue is
> that this interaction seems to fuzz the distinction between "row is
> null" and "all the row's elements are null".  In particular, if you
> think that rowisnull should be TRUE after the above sequence, then
> I think you are saying they are the same thing.  So maybe the spec
> authors are smarter than we are.
>
> Thoughts?  What would a consistent behavior look like?
>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: Null row vs. row of nulls in plpgsql

From

Bruce Momjian

Date:

07 January 2009, 23:16:40

I assume this is a TODO, right?

---------------------------------------------------------------------------

Tom Lane wrote:
> I looked a bit at the bug report here:
> http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php
> 
> ISTM that the fundamental problem is that plpgsql doesn't distinguish
> properly between a null row value (eg, "null::somerowtype") and a
> row of null values (eg, "row(null,null,...)::somerowtype").  When that
> code was designed, our main SQL engine was pretty fuzzy about the
> difference too, but now there is a clear semantic distinction.
> 
> For plpgsql's RECORD variables this doesn't seem hard to fix: just
> take out the code in exec_move_row() that manufactures a row of nulls
> when the input is null, and maybe make a few small adjustments
> elsewhere.  For ROW variables there's a bigger problem, because those
> are represented by a list of per-field variables, which doesn't
> immediately offer any way to represent overall nullness.  I think it
> could be dealt with by adding an explicit "the row as a whole is null"
> flag to struct PLpgSQL_row.  I haven't tried to code it though, so I'm
> not sure if there are gotchas or unreasonably large code changes needed
> to make it happen.
> 
> I thought for a little bit about whether we couldn't get rid of ROW
> variables entirely, or at least make them work more like RECORD variables
> by storing a HeapTuple instead of a list of per-field variables.  But
> I soon found out that the reason to have them is to be able to describe
> the assignment target of SQL statements that assign to multiple scalar
> variables, eg "SELECT ... INTO x,y,z".
> 
> Comments?
> 
>             regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: Null row vs. row of nulls in plpgsql

From

Tom Lane

Date:

08 January 2009, 01:19:35

Bruce Momjian <bruce@momjian.us> writes:
> I assume this is a TODO, right?

Yah.
        regards, tom lane