Thread: Null row vs. row of nulls in plpgsql
I looked a bit at the bug report here: http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php ISTM that the fundamental problem is that plpgsql doesn't distinguish properly between a null row value (eg, "null::somerowtype") and a row of null values (eg, "row(null,null,...)::somerowtype"). When that code was designed, our main SQL engine was pretty fuzzy about the difference too, but now there is a clear semantic distinction. For plpgsql's RECORD variables this doesn't seem hard to fix: just take out the code in exec_move_row() that manufactures a row of nulls when the input is null, and maybe make a few small adjustments elsewhere. For ROW variables there's a bigger problem, because those are represented by a list of per-field variables, which doesn't immediately offer any way to represent overall nullness. I think it could be dealt with by adding an explicit "the row as a whole is null" flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm not sure if there are gotchas or unreasonably large code changes needed to make it happen. I thought for a little bit about whether we couldn't get rid of ROW variables entirely, or at least make them work more like RECORD variables by storing a HeapTuple instead of a list of per-field variables. But I soon found out that the reason to have them is to be able to describe the assignment target of SQL statements that assign to multiple scalar variables, eg "SELECT ... INTO x,y,z". Comments? regards, tom lane
---------- Forwarded message ----------
From: Oleg Serov <serovov@gmail.com>
Date: 2008/9/27
Subject: Re: Null row vs. row of nulls in plpgsql
To: Tom Lane <tgl@sss.pgh.pa.us>
From: Oleg Serov <serovov@gmail.com>
Date: 2008/9/27
Subject: Re: Null row vs. row of nulls in plpgsql
To: Tom Lane <tgl@sss.pgh.pa.us>
I'm newbie, but i think that adding bool flag to PLpgSQL_row isnull will handle the problem(like in PLpgSQL_var);
2008/9/27 Tom Lane <tgl@sss.pgh.pa.us>
I looked a bit at the bug report here:
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php
ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.
For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere. For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness. I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes needed
to make it happen.
I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD variables
by storing a HeapTuple instead of a list of per-field variables. But
I soon found out that the reason to have them is to be able to describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg "SELECT ... INTO x,y,z".
Comments?
regards, tom lane
On Sat, 2008-09-27 at 14:56 -0400, Tom Lane wrote: > I looked a bit at the bug report here: > http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php > > ISTM that the fundamental problem is that plpgsql doesn't distinguish > properly between a null row value (eg, "null::somerowtype") and a > row of null values (eg, "row(null,null,...)::somerowtype"). When that > code was designed, our main SQL engine was pretty fuzzy about the > difference too, but now there is a clear semantic distinction. > > For plpgsql's RECORD variables this doesn't seem hard to fix: just > take out the code in exec_move_row() that manufactures a row of nulls > when the input is null, and maybe make a few small adjustments > elsewhere. For ROW variables there's a bigger problem, because those > are represented by a list of per-field variables, which doesn't > immediately offer any way to represent overall nullness. I think it > could be dealt with by adding an explicit "the row as a whole is null" > flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm > not sure if there are gotchas or unreasonably large code changes needed > to make it happen. > > I thought for a little bit about whether we couldn't get rid of ROW > variables entirely, or at least make them work more like RECORD variables > by storing a HeapTuple instead of a list of per-field variables. But > I soon found out that the reason to have them is to be able to describe > the assignment target of SQL statements that assign to multiple scalar > variables, eg "SELECT ... INTO x,y,z". How hard would it be to have a RECORD that has pointers to those multiple scalar variables ? Referring again to my favorite ordinary programming language python, you can have a very elegant way of assigning a "record" (a tuple in pythonese) to a set of variables and vice versa >>> rec = 1,2,3 >>> rec (1, 2, 3) >>> a,b,c = rec >>> a 1 >>> c 3 >>> c,b,a (3, 2, 1) In other words, tuples are more or less automatically composed and decomposed on demand. I have not yet looked how hard the implementation of this would be for postgreSQL, but at least the concept should be applicable. ---------------- Hannu
Hannu Krosing <hannu@2ndQuadrant.com> writes: > Referring again to my favorite ordinary programming language python, you > can have a very elegant way of assigning a "record" (a tuple in > pythonese) to a set of variables and vice versa There are already perfectly good ways to do that in plpgsql, vizrowvar := row(x,y,z);select rowvar.* into x,y,z; regards, tom lane
Iirc the reason for this fuzziness came from the SQL spec definition of IS NULL for rows. As long as you maintain that level of spec- compliance I don't think there are any other important constraints on pg behaviour. greg --sorry for the top posting but the phone makes it hard to do anything else. On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I looked a bit at the bug report here: > http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php > > ISTM that the fundamental problem is that plpgsql doesn't distinguish > properly between a null row value (eg, "null::somerowtype") and a > row of null values (eg, "row(null,null,...)::somerowtype"). When that > code was designed, our main SQL engine was pretty fuzzy about the > difference too, but now there is a clear semantic distinction. > > For plpgsql's RECORD variables this doesn't seem hard to fix: just > take out the code in exec_move_row() that manufactures a row of nulls > when the input is null, and maybe make a few small adjustments > elsewhere. For ROW variables there's a bigger problem, because those > are represented by a list of per-field variables, which doesn't > immediately offer any way to represent overall nullness. I think it > could be dealt with by adding an explicit "the row as a whole is null" > flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm > not sure if there are gotchas or unreasonably large code changes > needed > to make it happen. > > I thought for a little bit about whether we couldn't get rid of ROW > variables entirely, or at least make them work more like RECORD > variables > by storing a HeapTuple instead of a list of per-field variables. But > I soon found out that the reason to have them is to be able to > describe > the assignment target of SQL statements that assign to multiple scalar > variables, eg "SELECT ... INTO x,y,z". > > Comments? > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, 2008-09-28 at 04:03 +0300, Greg Stark wrote: > Iirc the reason for this fuzziness came from the SQL spec definition > of IS NULL for rows. As long as you maintain that level of spec- > compliance I don't think there are any other important constraints on > pg behaviour. What does SQL spec say about recursive IS NULL for rows ? Should we check that IS NULL is true for each row element, or must they actually be NULL's ? hannu=# select row(null, null) is NULL;?column? ----------t (1 row) hannu=# select row(null, row(null, null)) is NULL;?column? ----------f (1 row) -------------- Hannu
Greg Stark <greg.stark@enterprisedb.com> writes: > On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ISTM that the fundamental problem is that plpgsql doesn't distinguish >> properly between a null row value (eg, "null::somerowtype") and a >> row of null values (eg, "row(null,null,...)::somerowtype"). When that >> code was designed, our main SQL engine was pretty fuzzy about the >> difference too, but now there is a clear semantic distinction. > Iirc the reason for this fuzziness came from the SQL spec definition > of IS NULL for rows. As long as you maintain that level of spec- > compliance I don't think there are any other important constraints on > pg behaviour. I started to poke into this and found out that it was a bit subtler than I thought. It'd be possible to associate a "rowisnull" state value with a row variable, but the problem is that plpgsql treats the row fields as independent variables that can be accessed without touching the row. In particular you can assign null or nonnull values to individual fields. So consider -- presumably, this'll set rowisnull to TRUE:rowvar := NULL;-- this had better cause rowisnull to become FALSE:rowvar.field1:= 42;-- does this cause it to become TRUE again?rowvar.field1 := NULL; There are a bunch of implementation problems with making any such behavior happen, since the row field variables don't currently "know" that they are members of a row, and indeed it's possible for the same variable to be a member of more than one row. But the core issue is that this interaction seems to fuzz the distinction between "row is null" and "all the row's elements are null". In particular, if you think that rowisnull should be TRUE after the above sequence, then I think you are saying they are the same thing. So maybe the spec authors are smarter than we are. Thoughts? What would a consistent behavior look like? regards, tom lane
2008/9/29 Tom Lane <tgl@sss.pgh.pa.us>: > Greg Stark <greg.stark@enterprisedb.com> writes: >> On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> ISTM that the fundamental problem is that plpgsql doesn't distinguish >>> properly between a null row value (eg, "null::somerowtype") and a >>> row of null values (eg, "row(null,null,...)::somerowtype"). When that >>> code was designed, our main SQL engine was pretty fuzzy about the >>> difference too, but now there is a clear semantic distinction. > >> Iirc the reason for this fuzziness came from the SQL spec definition >> of IS NULL for rows. As long as you maintain that level of spec- >> compliance I don't think there are any other important constraints on >> pg behaviour. > > I started to poke into this and found out that it was a bit subtler than > I thought. It'd be possible to associate a "rowisnull" state value > with a row variable, but the problem is that plpgsql treats the row > fields as independent variables that can be accessed without touching > the row. In particular you can assign null or nonnull values to > individual fields. So consider > > -- presumably, this'll set rowisnull to TRUE: > rowvar := NULL; > -- this had better cause rowisnull to become FALSE: > rowvar.field1 := 42; > -- does this cause it to become TRUE again? > rowvar.field1 := NULL; this sequence is wrong. in SQL rowvar has same behave as pointer. When you would to fill rowvar you should to call constructor first. some like rowvar := NULL; -- null value rowvar := constructor(null); rowvar := constructor(); rowvar.field = 42; regards Pavel Stehule > > There are a bunch of implementation problems with making any such > behavior happen, since the row field variables don't currently "know" > that they are members of a row, and indeed it's possible for the same > variable to be a member of more than one row. But the core issue is > that this interaction seems to fuzz the distinction between "row is > null" and "all the row's elements are null". In particular, if you > think that rowisnull should be TRUE after the above sequence, then > I think you are saying they are the same thing. So maybe the spec > authors are smarter than we are. > > Thoughts? What would a consistent behavior look like? > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
I assume this is a TODO, right? --------------------------------------------------------------------------- Tom Lane wrote: > I looked a bit at the bug report here: > http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php > > ISTM that the fundamental problem is that plpgsql doesn't distinguish > properly between a null row value (eg, "null::somerowtype") and a > row of null values (eg, "row(null,null,...)::somerowtype"). When that > code was designed, our main SQL engine was pretty fuzzy about the > difference too, but now there is a clear semantic distinction. > > For plpgsql's RECORD variables this doesn't seem hard to fix: just > take out the code in exec_move_row() that manufactures a row of nulls > when the input is null, and maybe make a few small adjustments > elsewhere. For ROW variables there's a bigger problem, because those > are represented by a list of per-field variables, which doesn't > immediately offer any way to represent overall nullness. I think it > could be dealt with by adding an explicit "the row as a whole is null" > flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm > not sure if there are gotchas or unreasonably large code changes needed > to make it happen. > > I thought for a little bit about whether we couldn't get rid of ROW > variables entirely, or at least make them work more like RECORD variables > by storing a HeapTuple instead of a list of per-field variables. But > I soon found out that the reason to have them is to be able to describe > the assignment target of SQL statements that assign to multiple scalar > variables, eg "SELECT ... INTO x,y,z". > > Comments? > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > I assume this is a TODO, right? Yah. regards, tom lane