Thread: Different results in a loop with RECORD vs ROWTYPE...

Different results in a loop with RECORD vs ROWTYPE...

From
Sean Chittenden
Date:
I'm looping through a single column from a set of results.  If I store
the result in a ROWTYPE of the table, I get NULL values back.  If I
store them in a RECORD type, I get the proper values and life is
happy.  I'm not sure why there'd be any difference, but it screams bug
to me and I haven't been able to reproduce it outside of my production
schema.

I haven't been able to bottle up a test sequence that'd allow others
to repeat this out in the wild, unfortunately, but simply changing
between s.t%ROWTYPE and RECORD fixes the problem... and it shouldn't
as far as I can tell.  Here is a munged version of the
environment/function I'm using:

CREATE TABLE s.t (
        a BIGINT NOT NULL,
        b BIGINT NOT NULL,
        c INT NOT NULL DEFAULT 1::INT,
        d BIGINT NOT NULL
);

CREATE FUNCTION s.t_ins()
        RETURNS TRIGGER
        EXTERNAL SECURITY DEFINER
        AS '
DECLARE
        r_t RECORD; -- s.t%ROWTYPE;
BEGIN
        SELECT CURRVAL(''s1.seq1'') INTO NEW.d;
        FOR r_t IN SELECT z.b FROM s.t z WHERE z.c =3D NEW.c LOOP
                RAISE INFO ''b: %, c: %'', r_t.b, NEW.c;
                PERFORM s.f(r_t.b,NEW.c);
        END LOOP;

        RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER t_ins BEFORE INSERT ON s.t FOR EACH ROW
        EXECUTE PROCEDURE s.t_ins();

I suppose the specifics don't matter, but, here's the difference in
output if I change between RECORD and ROWTYPE:

Using RECORD:

psql:file.sql:1429: INFO:  b: 1, c: 4
psql:file.sql:1429: INFO:  b: 2, c: 4
psql:file.sql:1429: INFO:  b: 3, c: 4
INSERT 2164848 1

And using ROWTYPE:

psql:file.sql:1429: INFO:  b: <NULL>, c: 4
psql:file.sql:1429: ERROR:  ExecInsert: Fail to add null value in not null =
attribute b
CONTEXT:  PL/pgSQL function f line 13 at SQL statement
PL/pgSQL function t_ins line 7 at unknown

???  The _only_ difference between the two is changing things from
RECORD to ROWTYPE.  Inbetween each test run, I'm doing a drop database
and recreating everything from scratch so the test should be clean.

Like I've said, I've tried bottling up a test case that someone else
can use to debug this, but I haven't been able to find anything
reproducible or anything in the code that'd suggest what the problem
is.  I'm hoping that someone who's familiar with pl/pgsql's guts can
look at this and go, "hrm...  It's a problem with assigning a single
value into a rowtype or SPI's memory handling is being overly
aggressive in cleaning this up because of ____," or something, but I
haven't been able to reproduce it with simpler test cases.  :(

This is using a snapshot from HEAD on 2003.05.08.  Here's the test
case I've been trying to use to reproduce this, for those interested
(this test proves nothing, only is a more complete version of what's
listed above that can be run verbatim).  Because the below test case
doesn't work, I'm worried that this is a memory issue and not related
to a specific action, which concerns me a great deal.


\c template1
DROP DATABASE test;
CREATE DATABASE test;
\c test pgsql
CREATE FUNCTION public.plpgsql_call_handler () RETURNS language_handler
        AS '$libdir/plpgsql', 'plpgsql_call_handler'
        LANGUAGE c;

CREATE TRUSTED PROCEDURAL LANGUAGE plpgsql HANDLER public.plpgsql_call_hand=
ler;
\c test
CREATE SCHEMA s;
CREATE TABLE s.t (i INT, j INT);
CREATE SEQUENCE s.seq;
INSERT INTO s.t (i) VALUES (1);
INSERT INTO s.t (i) VALUES (2);
INSERT INTO s.t (i) VALUES (3);
CREATE OR REPLACE FUNCTION s.t_ins()
       RETURNS TRIGGER
       EXTERNAL SECURITY DEFINER
       AS '
DECLARE
    r_t s.t%ROWTYPE; -- RECORD;
BEGIN
    SELECT NEXTVAL(''s.seq'') INTO NEW.j;
    FOR r_t IN SELECT a.i FROM s.t a WHERE a.i IS NOT NULL LOOP
        RAISE INFO ''i: %\tj: %'', r_t.i, NEW.j;
        END LOOP;

        RETURN NEW;
END;
' LANGUAGE 'plpgsql';
CREATE TRIGGER t_ins BEFORE INSERT ON s.t FOR EACH ROW EXECUTE PROCEDURE s.=
t_ins();
INSERT INTO s.t (i) VALUES (4);
INSERT INTO s.t (i) VALUES (5);
SELECT * FROM s.t;



Any hints are invaluable.  -sc

--=20
Sean Chittenden

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Sean Chittenden
Date:
> I'm looping through a single column from a set of results.  If I
> store the result in a ROWTYPE of the table, I get NULL values back.
> If I store them in a RECORD type, I get the proper values and life
> is happy.  I'm not sure why there'd be any difference, but it
> screams bug to me and I haven't been able to reproduce it outside of
> my production schema.

Hrm, sorry for replying to myself, but moving the rest of uses of
ROWTYPE to RECORD has solved what I originally thought was a logic
problem with recursive pl/pgsql calls, but was a problem stemming from
using ROWTYPE instead of RECORD.  Using ROWTYPE, I wasn't getting
results back that I was expecting, using RECORD, I was getting the
expected rows back so that I could operate on them in a loop.  ROWTYPE
seems to be broken somehow, somewhere, though I'm not sure in what way
or where.  Apologies in advance for the lack of more details... I'm
not sure where to look to get more details in this instance.

My best guess is there's a bug in exec_assign_value in that it doesn't
handle assigning to PLPGSQL_DTYPE_ROW... but I'm not seeing anything
in the error logs and there aren't any errors being thrown.  So,
failing that, maybe something in exec_move_row or exec_stmt_fors in
pl_exec.c, but I can't see anything that really jumps out other than
that.

Other ideas?  -sc

--=20
Sean Chittenden

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> My best guess is there's a bug in exec_assign_value in that it doesn't
> handle assigning to PLPGSQL_DTYPE_ROW... but I'm not seeing anything
> in the error logs and there aren't any errors being thrown.

Can you give us a test case now that you understand the issue better?

            regards, tom lane

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Sean Chittenden
Date:
> > My best guess is there's a bug in exec_assign_value in that it
> > doesn't handle assigning to PLPGSQL_DTYPE_ROW... but I'm not
> > seeing anything in the error logs and there aren't any errors
> > being thrown.
>
> Can you give us a test case now that you understand the issue
> better?

Sweet!  Yes!  :-] Only took me forever and a day to get it down small
enough without cutting too much.  ::sigh:: What a relief.  Alright, my
best guess is that this is a memory problem.  If you remove column w
from the table s.c, everything works with ROWTYPE.  Why that would
affect this, I have no clue, but it does.  Use a RECORD type, and
you're good to go.  ::shrug:: Anyway, I've attached the test case.
Change to type RECORD if you'd like to verify that things work.
Please feel free to ask questions.  -sc







#### ORIGINAL EMAIL before test case ####

Unfortunately no.  I'm working on trimming away at this package's
schema to try and get to the lowest common denominator... and haven't
been able to come up with anything conclusive.  I got it down to less
than 200 lines, then when I went to trim a little more, I went too far
in my cutting and the backend did the right thing.  What's worse, is I
cut an adjacent table that had nothing to do with the table/procedure
in question.  Anyway, when I went to use my ~ backup file, the backup
file was nearly from the start back at nearly 30K lines.  ::sigh::
Working from the ground up wasn't yielding any useful results either.
:( I'm going to take another shot at it later and manually make
backups, but I've blown my whole day trying to track this down one way
or another and need to finish up some other things 1st.

I fired up gdb though and have traced through the execution of the
backend and have been able to generate two different execution paths
for a RECORD and ROWTYPE.  I went through and read the two paths side
by side, but I don't understand enough of the backend to be able to
use the information in the structures effectively.  My biggest problem
is I don't know what data/variables I should be watching out for and
what's superfluous.  In my simplified test case, things work
correctly.  When I use my regression tests + ROWTYPE, things bomb
miserably.  If I use RECORD and I'm just fine.

Tom, maybe you have some advice with regards to what I can look for.
The select statement is returning the right data, that much I can
verify with gdb.  What I'm lost on is figuring out where the value for
the datum is stored.  I'm wondering if this isn't a BIGINT issue, but
like I said, I haven't been able to reproduce this problem with a
BIGINT data type in my tables.  Because of that inability to reproduce
this on a smaller scale, I'm worried it's a memory management problem.
Internally, the attype is correctly listed as 20 in all of the
structures, from what I've observed... but somewhere along the way,
the datum is being returned as NULL in the case of a ROWTYPE and not a
RECORD.

Any other ideas?  -sc

--
Sean Chittenden

Attachment

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> CREATE TABLE s.c (
>     x BIGINT NOT NULL,
>     y BIGINT NOT NULL,
>     w INT NOT NULL DEFAULT 1::INT
> );

> DECLARE
>     r_c s.c%ROWTYPE; -- RECORD;
> BEGIN
>     FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP
>         PERFORM s.add_y_to_x(r_c.y,NEW.z);

It seems to me that the rowtype of this SELECT's result is (y bigint).
When you declare r_c as RECORD, it adopts that rowtype, and so the
reference to r_c.y in the PERFORM delivers the value you want.  But
when you declare r_c as s.c%ROWTYPE, that is (x bigint, y bigint, w int),
the result of the SELECT's first column is delivered into r_c.x and then
the other two columns are set to null.  So r_c.y is null in the PERFORM.

I think this is basically pilot error, though one could certainly argue
that the system ought to be complaining that the SELECT didn't deliver
enough columns to fill the rowtype variable.  Any thoughts?  Can anyone
report what Oracle's pl/sql does in comparable situations?

            regards, tom lane

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Sean Chittenden
Date:
> > CREATE TABLE s.c (
> >     x BIGINT NOT NULL,
> >     y BIGINT NOT NULL,
> >     w INT NOT NULL DEFAULT 1::INT
> > );
>
> > DECLARE
> >     r_c s.c%ROWTYPE; -- RECORD;
> > BEGIN
> >     FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP
> >         PERFORM s.add_y_to_x(r_c.y,NEW.z);
>
> It seems to me that the rowtype of this SELECT's result is (y bigint).
> When you declare r_c as RECORD, it adopts that rowtype, and so the
> reference to r_c.y in the PERFORM delivers the value you want.  But
> when you declare r_c as s.c%ROWTYPE, that is (x bigint, y bigint, w int),
> the result of the SELECT's first column is delivered into r_c.x and then
> the other two columns are set to null.  So r_c.y is null in the PERFORM.
>
> I think this is basically pilot error, though one could certainly argue
> that the system ought to be complaining that the SELECT didn't deliver
> enough columns to fill the rowtype variable.  Any thoughts?

Oooh, if indeed that is the way that things are implemented, then yes,
that is pilot error.  I should submit some doco to that effect because
that would have been most useful to know upfront.

I was under the impression that a ROWTYPE was basically akin to a C
structure that represented a ROW from the specified table.  Each
column was a pointer to the datum returned by the SELECT.  Therefore,
if r_c is defined as s.c%ROWTYPE, then r_c.x, r_c.y, and r_c.w would
all be initialized to NULLs until the FOR r_c IN SELECT populated the
values of the r_c structure, with r_c.y mapping to d.y.  Granted the
mapping would break down instantly if the SELECT was rewritten as:

    FOR r_c IN SELECT d.y AS x...

but I'd think that'd be a powerful feature that could be easily
abused, but very useful if indeed ROWTYPEs were just pointers to the
returned datums... instead, datums are copied, something I was not
wild to discover.  I thought everything was done by reference in
pl/pgsql.

Are there any pl/pgsql -> C converters?

-sc

--
Sean Chittenden

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Tom Lane
Date:
Sean Chittenden <sean@chittenden.org> writes:
> CREATE TABLE s.c (
> x BIGINT NOT NULL,
> y BIGINT NOT NULL,
> w INT NOT NULL DEFAULT 1::INT
> );
>>
> DECLARE
> r_c s.c%ROWTYPE; -- RECORD;
> BEGIN
> FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP
> PERFORM s.add_y_to_x(r_c.y,NEW.z);

> I was under the impression that a ROWTYPE was basically akin to a C
> structure that represented a ROW from the specified table.

Indeed, but your SELECT doesn't deliver a ROW from the specified table.
It only delivers one column.  If you'd said "SELECT * FROM s.c" then
things would have worked as you expect.  But in the above command, the
column matching is positional, and so it's r_c.x not r_c.y that gets
loaded with the sole column supplied by the SELECT.

I don't think that the choice of positional matching is wrong, and in
any case we couldn't change it without breaking a lot of existing
plpgsql code.  Arguably it should be an error to supply the wrong number
of columns to fill a rowtype result variable, though.

            regards, tom lane

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Josh Berkus
Date:
Tom,

I believe that we should raise an exception if the SELECT statement does not
match the ROWTYPE, though of course we'd have to provide a
backward-compatible GUC in case anyone is counting on the current behavior.

Personally, since %ROWTYPE has no efficiency gain in PL/pgSQL over RECORD, I
seldom use it, though I use RECORD about 200 times per project.

If you're still interested, I will consult my PL/SQL bible this afternoon to
see what Oracle 8i does in this case.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Sean Chittenden
Date:
> > CREATE TABLE s.c (
> > x BIGINT NOT NULL,
> > y BIGINT NOT NULL,
> > w INT NOT NULL DEFAULT 1::INT
> > );
> >>
> > DECLARE
> > r_c s.c%ROWTYPE; -- RECORD;
> > BEGIN
> > FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP
> > PERFORM s.add_y_to_x(r_c.y,NEW.z);
>
> > I was under the impression that a ROWTYPE was basically akin to a C
> > structure that represented a ROW from the specified table.
>
> Indeed, but your SELECT doesn't deliver a ROW from the specified
> table.  It only delivers one column.  If you'd said "SELECT * FROM
> s.c" then things would have worked as you expect.  But in the above
> command, the column matching is positional, and so it's r_c.x not
> r_c.y that gets loaded with the sole column supplied by the SELECT.
>
> I don't think that the choice of positional matching is wrong, and
> in any case we couldn't change it without breaking a lot of existing
> plpgsql code.  Arguably it should be an error to supply the wrong
> number of columns to fill a rowtype result variable, though.

:-/ I would argue differently for the below points, but it's kinda
mute given RECORD does the job.

*) Explicitly using ROWTYPE is like using a statically declared
 language.  In the event that the schema changes, the stored
 procedure would break and with good cause.  Using a RECORD type
 obviates this possibility and things may silently continue to work
 with differing results.

*) SELECT * into a ROWTYPE is an expensive practice and I would argue
 horribly inefficient in that an entire row has been lifted off of
 disk, copied from kernel to user space, and then gets copied into the
 resulting ROWTYPE.  Minimizing this I would think would be of great
 interest to DBAs where memory and disk cycles are precious.  If
 SELECT'ing a single row, or omitting specific columns from a row
 throws off the placement of data into ROWTYPEs because it
 sequentially places data into the ROWTYPE var, the logic of placing
 data into ROWTYPEs is incorrect and that placement of data into a
 ROWTYPE should be done by matching the name of the resulting column
 from the SELECT into the corresponding name of the element in the
 ROWTYPE.  The name of an element in a ROWTYPE and table is guaranteed
 to be unique by their very nature.

Anyway, my thoughts in the event that pl/pgsql gets its famed overhaul
on the script side.

*shrugs* Given the large amount of pl/pgsql code that I've got running
around, I'm slowly (2-3mo completion time it's looking like) working
on have pl/pgsql scripts compiled to C and then .so's automatically
loaded from their sequentially numbered names stored in the
data/plpgsql.  Recompiling the .so between version dumps isn't an
issue because the whole db has to be reimported.  If anyone's got any
advice on the topic, I'm all ears, but that's the direction I'm
working toward to help solve some of the inefficiencies in pl/pgsql.
I've nabbed the gram.y and scan.l files from plpgsql so scripts should
be 100% compatible.  I figure every BSD and likely every linux server
box likely has a compiler on board, same with Slowaris if they're
running postgresql and are looking for speed.  But, if they don't,
then they can use plpgsql instead of plpgsqlc.  FWIW, compiler flags
and variables are stored in system catalogs so those can be changed
(Tom, any preferences on a system catalog name for compiler bits, or
should I convert things to abuse the GUC infrastructure).  I'm
unconvinced that there is a need for a way to recompile a function
other than DROP'ing it and re-CREATE'ing it.  Any need for an ALTER
FUNCTION s.f() RECOMPILE?  Since .so's are mmpap()'ed on almost every
system, this is also of memory interest to folks, never mind the
speed.

-sc

--
Sean Chittenden

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I believe that we should raise an exception if the SELECT statement does not
> match the ROWTYPE, though of course we'd have to provide a
> backward-compatible GUC in case anyone is counting on the current behavior.

Digging in the code, I see this comment in exec_move_row:

     * NOTE: this code used to demand row->nfields == tup->t_data->t_natts,
     * but that's wrong.  The tuple might have more fields than we
     * expected if it's from an inheritance-child table of the current
     * table, or it might have fewer if the table has had columns added by
     * ALTER TABLE. Ignore extra columns and assume NULL for missing
     * columns, the same as heap_getattr would do.

So blindly restoring the column-count check will break things.  I think
that the above considerations only apply to one of the call sites of
exec_move_row, so we could make the other ones apply the check, but
clearly it's a little more ticklish than one would think.

> If you're still interested, I will consult my PL/SQL bible this afternoon to
> see what Oracle 8i does in this case.

Since plpgsql is generally supposed to be a slavish imitation of Oracle,
it would be good to know what they do...

            regards, tom lane

Re: Different results in a loop with RECORD vs ROWTYPE...

From
Josh Berkus
Date:
Tom,

> > If you're still interested, I will consult my PL/SQL bible this afterno=
on=20
to=20
> > see what Oracle 8i does in this case.
>=20
> Since plpgsql is generally supposed to be a slavish imitation of Oracle,
> it would be good to know what they do...

Unfortunately, PL/PQL Programming, 2nd Edition, does not make it clear what=
=20
happens if you select the wrong columns for a given %ROWTYPE.  Like PL/pgSQ=
L,=20
PL/SQL Programming encourages the use of RECORDs rather than %ROWTYPE.

I'll post to the SQL list to see if any of the Oracle hackers there knows.

--=20
-Josh Berkus
 Aglio Database Solutions
 San Francisco