Thread: Different results in a loop with RECORD vs ROWTYPE...
I'm looping through a single column from a set of results. If I store the result in a ROWTYPE of the table, I get NULL values back. If I store them in a RECORD type, I get the proper values and life is happy. I'm not sure why there'd be any difference, but it screams bug to me and I haven't been able to reproduce it outside of my production schema. I haven't been able to bottle up a test sequence that'd allow others to repeat this out in the wild, unfortunately, but simply changing between s.t%ROWTYPE and RECORD fixes the problem... and it shouldn't as far as I can tell. Here is a munged version of the environment/function I'm using: CREATE TABLE s.t ( a BIGINT NOT NULL, b BIGINT NOT NULL, c INT NOT NULL DEFAULT 1::INT, d BIGINT NOT NULL ); CREATE FUNCTION s.t_ins() RETURNS TRIGGER EXTERNAL SECURITY DEFINER AS ' DECLARE r_t RECORD; -- s.t%ROWTYPE; BEGIN SELECT CURRVAL(''s1.seq1'') INTO NEW.d; FOR r_t IN SELECT z.b FROM s.t z WHERE z.c =3D NEW.c LOOP RAISE INFO ''b: %, c: %'', r_t.b, NEW.c; PERFORM s.f(r_t.b,NEW.c); END LOOP; RETURN NEW; END; ' LANGUAGE 'plpgsql'; CREATE TRIGGER t_ins BEFORE INSERT ON s.t FOR EACH ROW EXECUTE PROCEDURE s.t_ins(); I suppose the specifics don't matter, but, here's the difference in output if I change between RECORD and ROWTYPE: Using RECORD: psql:file.sql:1429: INFO: b: 1, c: 4 psql:file.sql:1429: INFO: b: 2, c: 4 psql:file.sql:1429: INFO: b: 3, c: 4 INSERT 2164848 1 And using ROWTYPE: psql:file.sql:1429: INFO: b: <NULL>, c: 4 psql:file.sql:1429: ERROR: ExecInsert: Fail to add null value in not null = attribute b CONTEXT: PL/pgSQL function f line 13 at SQL statement PL/pgSQL function t_ins line 7 at unknown ??? The _only_ difference between the two is changing things from RECORD to ROWTYPE. Inbetween each test run, I'm doing a drop database and recreating everything from scratch so the test should be clean. Like I've said, I've tried bottling up a test case that someone else can use to debug this, but I haven't been able to find anything reproducible or anything in the code that'd suggest what the problem is. I'm hoping that someone who's familiar with pl/pgsql's guts can look at this and go, "hrm... It's a problem with assigning a single value into a rowtype or SPI's memory handling is being overly aggressive in cleaning this up because of ____," or something, but I haven't been able to reproduce it with simpler test cases. :( This is using a snapshot from HEAD on 2003.05.08. Here's the test case I've been trying to use to reproduce this, for those interested (this test proves nothing, only is a more complete version of what's listed above that can be run verbatim). Because the below test case doesn't work, I'm worried that this is a memory issue and not related to a specific action, which concerns me a great deal. \c template1 DROP DATABASE test; CREATE DATABASE test; \c test pgsql CREATE FUNCTION public.plpgsql_call_handler () RETURNS language_handler AS '$libdir/plpgsql', 'plpgsql_call_handler' LANGUAGE c; CREATE TRUSTED PROCEDURAL LANGUAGE plpgsql HANDLER public.plpgsql_call_hand= ler; \c test CREATE SCHEMA s; CREATE TABLE s.t (i INT, j INT); CREATE SEQUENCE s.seq; INSERT INTO s.t (i) VALUES (1); INSERT INTO s.t (i) VALUES (2); INSERT INTO s.t (i) VALUES (3); CREATE OR REPLACE FUNCTION s.t_ins() RETURNS TRIGGER EXTERNAL SECURITY DEFINER AS ' DECLARE r_t s.t%ROWTYPE; -- RECORD; BEGIN SELECT NEXTVAL(''s.seq'') INTO NEW.j; FOR r_t IN SELECT a.i FROM s.t a WHERE a.i IS NOT NULL LOOP RAISE INFO ''i: %\tj: %'', r_t.i, NEW.j; END LOOP; RETURN NEW; END; ' LANGUAGE 'plpgsql'; CREATE TRIGGER t_ins BEFORE INSERT ON s.t FOR EACH ROW EXECUTE PROCEDURE s.= t_ins(); INSERT INTO s.t (i) VALUES (4); INSERT INTO s.t (i) VALUES (5); SELECT * FROM s.t; Any hints are invaluable. -sc --=20 Sean Chittenden
> I'm looping through a single column from a set of results. If I > store the result in a ROWTYPE of the table, I get NULL values back. > If I store them in a RECORD type, I get the proper values and life > is happy. I'm not sure why there'd be any difference, but it > screams bug to me and I haven't been able to reproduce it outside of > my production schema. Hrm, sorry for replying to myself, but moving the rest of uses of ROWTYPE to RECORD has solved what I originally thought was a logic problem with recursive pl/pgsql calls, but was a problem stemming from using ROWTYPE instead of RECORD. Using ROWTYPE, I wasn't getting results back that I was expecting, using RECORD, I was getting the expected rows back so that I could operate on them in a loop. ROWTYPE seems to be broken somehow, somewhere, though I'm not sure in what way or where. Apologies in advance for the lack of more details... I'm not sure where to look to get more details in this instance. My best guess is there's a bug in exec_assign_value in that it doesn't handle assigning to PLPGSQL_DTYPE_ROW... but I'm not seeing anything in the error logs and there aren't any errors being thrown. So, failing that, maybe something in exec_move_row or exec_stmt_fors in pl_exec.c, but I can't see anything that really jumps out other than that. Other ideas? -sc --=20 Sean Chittenden
Sean Chittenden <sean@chittenden.org> writes: > My best guess is there's a bug in exec_assign_value in that it doesn't > handle assigning to PLPGSQL_DTYPE_ROW... but I'm not seeing anything > in the error logs and there aren't any errors being thrown. Can you give us a test case now that you understand the issue better? regards, tom lane
> > My best guess is there's a bug in exec_assign_value in that it > > doesn't handle assigning to PLPGSQL_DTYPE_ROW... but I'm not > > seeing anything in the error logs and there aren't any errors > > being thrown. > > Can you give us a test case now that you understand the issue > better? Sweet! Yes! :-] Only took me forever and a day to get it down small enough without cutting too much. ::sigh:: What a relief. Alright, my best guess is that this is a memory problem. If you remove column w from the table s.c, everything works with ROWTYPE. Why that would affect this, I have no clue, but it does. Use a RECORD type, and you're good to go. ::shrug:: Anyway, I've attached the test case. Change to type RECORD if you'd like to verify that things work. Please feel free to ask questions. -sc #### ORIGINAL EMAIL before test case #### Unfortunately no. I'm working on trimming away at this package's schema to try and get to the lowest common denominator... and haven't been able to come up with anything conclusive. I got it down to less than 200 lines, then when I went to trim a little more, I went too far in my cutting and the backend did the right thing. What's worse, is I cut an adjacent table that had nothing to do with the table/procedure in question. Anyway, when I went to use my ~ backup file, the backup file was nearly from the start back at nearly 30K lines. ::sigh:: Working from the ground up wasn't yielding any useful results either. :( I'm going to take another shot at it later and manually make backups, but I've blown my whole day trying to track this down one way or another and need to finish up some other things 1st. I fired up gdb though and have traced through the execution of the backend and have been able to generate two different execution paths for a RECORD and ROWTYPE. I went through and read the two paths side by side, but I don't understand enough of the backend to be able to use the information in the structures effectively. My biggest problem is I don't know what data/variables I should be watching out for and what's superfluous. In my simplified test case, things work correctly. When I use my regression tests + ROWTYPE, things bomb miserably. If I use RECORD and I'm just fine. Tom, maybe you have some advice with regards to what I can look for. The select statement is returning the right data, that much I can verify with gdb. What I'm lost on is figuring out where the value for the datum is stored. I'm wondering if this isn't a BIGINT issue, but like I said, I haven't been able to reproduce this problem with a BIGINT data type in my tables. Because of that inability to reproduce this on a smaller scale, I'm worried it's a memory management problem. Internally, the attype is correctly listed as 20 in all of the structures, from what I've observed... but somewhere along the way, the datum is being returned as NULL in the case of a ROWTYPE and not a RECORD. Any other ideas? -sc -- Sean Chittenden
Attachment
Sean Chittenden <sean@chittenden.org> writes: > CREATE TABLE s.c ( > x BIGINT NOT NULL, > y BIGINT NOT NULL, > w INT NOT NULL DEFAULT 1::INT > ); > DECLARE > r_c s.c%ROWTYPE; -- RECORD; > BEGIN > FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP > PERFORM s.add_y_to_x(r_c.y,NEW.z); It seems to me that the rowtype of this SELECT's result is (y bigint). When you declare r_c as RECORD, it adopts that rowtype, and so the reference to r_c.y in the PERFORM delivers the value you want. But when you declare r_c as s.c%ROWTYPE, that is (x bigint, y bigint, w int), the result of the SELECT's first column is delivered into r_c.x and then the other two columns are set to null. So r_c.y is null in the PERFORM. I think this is basically pilot error, though one could certainly argue that the system ought to be complaining that the SELECT didn't deliver enough columns to fill the rowtype variable. Any thoughts? Can anyone report what Oracle's pl/sql does in comparable situations? regards, tom lane
> > CREATE TABLE s.c ( > > x BIGINT NOT NULL, > > y BIGINT NOT NULL, > > w INT NOT NULL DEFAULT 1::INT > > ); > > > DECLARE > > r_c s.c%ROWTYPE; -- RECORD; > > BEGIN > > FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP > > PERFORM s.add_y_to_x(r_c.y,NEW.z); > > It seems to me that the rowtype of this SELECT's result is (y bigint). > When you declare r_c as RECORD, it adopts that rowtype, and so the > reference to r_c.y in the PERFORM delivers the value you want. But > when you declare r_c as s.c%ROWTYPE, that is (x bigint, y bigint, w int), > the result of the SELECT's first column is delivered into r_c.x and then > the other two columns are set to null. So r_c.y is null in the PERFORM. > > I think this is basically pilot error, though one could certainly argue > that the system ought to be complaining that the SELECT didn't deliver > enough columns to fill the rowtype variable. Any thoughts? Oooh, if indeed that is the way that things are implemented, then yes, that is pilot error. I should submit some doco to that effect because that would have been most useful to know upfront. I was under the impression that a ROWTYPE was basically akin to a C structure that represented a ROW from the specified table. Each column was a pointer to the datum returned by the SELECT. Therefore, if r_c is defined as s.c%ROWTYPE, then r_c.x, r_c.y, and r_c.w would all be initialized to NULLs until the FOR r_c IN SELECT populated the values of the r_c structure, with r_c.y mapping to d.y. Granted the mapping would break down instantly if the SELECT was rewritten as: FOR r_c IN SELECT d.y AS x... but I'd think that'd be a powerful feature that could be easily abused, but very useful if indeed ROWTYPEs were just pointers to the returned datums... instead, datums are copied, something I was not wild to discover. I thought everything was done by reference in pl/pgsql. Are there any pl/pgsql -> C converters? -sc -- Sean Chittenden
Sean Chittenden <sean@chittenden.org> writes: > CREATE TABLE s.c ( > x BIGINT NOT NULL, > y BIGINT NOT NULL, > w INT NOT NULL DEFAULT 1::INT > ); >> > DECLARE > r_c s.c%ROWTYPE; -- RECORD; > BEGIN > FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP > PERFORM s.add_y_to_x(r_c.y,NEW.z); > I was under the impression that a ROWTYPE was basically akin to a C > structure that represented a ROW from the specified table. Indeed, but your SELECT doesn't deliver a ROW from the specified table. It only delivers one column. If you'd said "SELECT * FROM s.c" then things would have worked as you expect. But in the above command, the column matching is positional, and so it's r_c.x not r_c.y that gets loaded with the sole column supplied by the SELECT. I don't think that the choice of positional matching is wrong, and in any case we couldn't change it without breaking a lot of existing plpgsql code. Arguably it should be an error to supply the wrong number of columns to fill a rowtype result variable, though. regards, tom lane
Tom, I believe that we should raise an exception if the SELECT statement does not match the ROWTYPE, though of course we'd have to provide a backward-compatible GUC in case anyone is counting on the current behavior. Personally, since %ROWTYPE has no efficiency gain in PL/pgSQL over RECORD, I seldom use it, though I use RECORD about 200 times per project. If you're still interested, I will consult my PL/SQL bible this afternoon to see what Oracle 8i does in this case. -- Josh Berkus Aglio Database Solutions San Francisco
> > CREATE TABLE s.c ( > > x BIGINT NOT NULL, > > y BIGINT NOT NULL, > > w INT NOT NULL DEFAULT 1::INT > > ); > >> > > DECLARE > > r_c s.c%ROWTYPE; -- RECORD; > > BEGIN > > FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP > > PERFORM s.add_y_to_x(r_c.y,NEW.z); > > > I was under the impression that a ROWTYPE was basically akin to a C > > structure that represented a ROW from the specified table. > > Indeed, but your SELECT doesn't deliver a ROW from the specified > table. It only delivers one column. If you'd said "SELECT * FROM > s.c" then things would have worked as you expect. But in the above > command, the column matching is positional, and so it's r_c.x not > r_c.y that gets loaded with the sole column supplied by the SELECT. > > I don't think that the choice of positional matching is wrong, and > in any case we couldn't change it without breaking a lot of existing > plpgsql code. Arguably it should be an error to supply the wrong > number of columns to fill a rowtype result variable, though. :-/ I would argue differently for the below points, but it's kinda mute given RECORD does the job. *) Explicitly using ROWTYPE is like using a statically declared language. In the event that the schema changes, the stored procedure would break and with good cause. Using a RECORD type obviates this possibility and things may silently continue to work with differing results. *) SELECT * into a ROWTYPE is an expensive practice and I would argue horribly inefficient in that an entire row has been lifted off of disk, copied from kernel to user space, and then gets copied into the resulting ROWTYPE. Minimizing this I would think would be of great interest to DBAs where memory and disk cycles are precious. If SELECT'ing a single row, or omitting specific columns from a row throws off the placement of data into ROWTYPEs because it sequentially places data into the ROWTYPE var, the logic of placing data into ROWTYPEs is incorrect and that placement of data into a ROWTYPE should be done by matching the name of the resulting column from the SELECT into the corresponding name of the element in the ROWTYPE. The name of an element in a ROWTYPE and table is guaranteed to be unique by their very nature. Anyway, my thoughts in the event that pl/pgsql gets its famed overhaul on the script side. *shrugs* Given the large amount of pl/pgsql code that I've got running around, I'm slowly (2-3mo completion time it's looking like) working on have pl/pgsql scripts compiled to C and then .so's automatically loaded from their sequentially numbered names stored in the data/plpgsql. Recompiling the .so between version dumps isn't an issue because the whole db has to be reimported. If anyone's got any advice on the topic, I'm all ears, but that's the direction I'm working toward to help solve some of the inefficiencies in pl/pgsql. I've nabbed the gram.y and scan.l files from plpgsql so scripts should be 100% compatible. I figure every BSD and likely every linux server box likely has a compiler on board, same with Slowaris if they're running postgresql and are looking for speed. But, if they don't, then they can use plpgsql instead of plpgsqlc. FWIW, compiler flags and variables are stored in system catalogs so those can be changed (Tom, any preferences on a system catalog name for compiler bits, or should I convert things to abuse the GUC infrastructure). I'm unconvinced that there is a need for a way to recompile a function other than DROP'ing it and re-CREATE'ing it. Any need for an ALTER FUNCTION s.f() RECOMPILE? Since .so's are mmpap()'ed on almost every system, this is also of memory interest to folks, never mind the speed. -sc -- Sean Chittenden
Josh Berkus <josh@agliodbs.com> writes: > I believe that we should raise an exception if the SELECT statement does not > match the ROWTYPE, though of course we'd have to provide a > backward-compatible GUC in case anyone is counting on the current behavior. Digging in the code, I see this comment in exec_move_row: * NOTE: this code used to demand row->nfields == tup->t_data->t_natts, * but that's wrong. The tuple might have more fields than we * expected if it's from an inheritance-child table of the current * table, or it might have fewer if the table has had columns added by * ALTER TABLE. Ignore extra columns and assume NULL for missing * columns, the same as heap_getattr would do. So blindly restoring the column-count check will break things. I think that the above considerations only apply to one of the call sites of exec_move_row, so we could make the other ones apply the check, but clearly it's a little more ticklish than one would think. > If you're still interested, I will consult my PL/SQL bible this afternoon to > see what Oracle 8i does in this case. Since plpgsql is generally supposed to be a slavish imitation of Oracle, it would be good to know what they do... regards, tom lane
Tom, > > If you're still interested, I will consult my PL/SQL bible this afterno= on=20 to=20 > > see what Oracle 8i does in this case. >=20 > Since plpgsql is generally supposed to be a slavish imitation of Oracle, > it would be good to know what they do... Unfortunately, PL/PQL Programming, 2nd Edition, does not make it clear what= =20 happens if you select the wrong columns for a given %ROWTYPE. Like PL/pgSQ= L,=20 PL/SQL Programming encourages the use of RECORDs rather than %ROWTYPE. I'll post to the SQL list to see if any of the Oracle hackers there knows. --=20 -Josh Berkus Aglio Database Solutions San Francisco