Thread: Struggling with set-returning functions, seeking advice

Struggling with set-returning functions, seeking advice

From
Jason Topaz
Date:
Greetings.  I've successfully gotten some code to work, but wondering
whether anybody can offer advice on a better way to do this.  I've
read everything I can think of, including the recent paper on
PostgreSQL 7.3 set-returning functions.  Also, can anybody comment
whether I have found a PostgreSQL bug?  (though I highly doubt
it... I'm a newbie!).  Here's the story...

I have data in a table:

db=> SELECT * FROM ranges;
 id | startindex | endindex | otherdata
----+------------+----------+-----------
  1 |          5 |        9 |        42
  2 |          4 |        6 |        93
(2 rows)


I'm trying to create a view that transforms this table into multiple
rows
per source row - one for each "index" between startindex and last
index.  It also needs to do calculations based on the index and
otherdata.  (For sample purposes, I am just saying the calculation is
index+otherdata).  So the view I would like to see is:

db=> SELECT * FROM ranges_setview;
 id | index | calc
----+-------+------
  1 |     5 |   47
  1 |     6 |   48
  1 |     7 |   49
  1 |     8 |   50
  1 |     9 |   51
  2 |     4 |   97
  2 |     5 |   98
  2 |     6 |   99
(8 rows)


I did get this to work with the enclosed code using a set-returning
function in plpgsql.  But I have quite a few questions. Can anybody help
out?

1) Note that my first attempt (view "ranges_setview_broken", referring
   to a pgplsql set function "make_rows") fails.  But when I make a
   second function with identical signature, but in language 'sql'
   (it's just a passthrough to my original pgplsql function), suddenly
   the server error goes away.  This seems strange to me.  Is there
   a reason my pgplsql function should behave differently than the
   sql function?  Or is this a PostgreSQL bug?  version() is:

     PostgreSQL 7.3.2 on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gcc (GCC) 3.2.2 20030213 (Red Hat Linux 8.0 3.2.2-1)

2) Any advice on efficiency?  When using the view I create, my
   function make_rows gets called twice - with the same parameters -
   for every source row in table "ranges", since I extract two fields
   from the returned data.  Confirmed this with "RAISE INFO".  It
   seems like there should be a way to call make_rows only once per
   source row, but I can't figure out how to organize the query to
   do this.

   I would love for the make_rows() call to appear in the FROM clause,
   but each time I try, PostgreSQL seems to be unhappy with the fact
   that the input parameters need to come from another target table.
   Hmm... any way to rig this up with a correlated subquery?


3) Any recommendations for a completely different way to approach this
   table transformation?  Incidentally, I have simplified a lot for
   purposes of this posting, but the real application involves
transforming
   time-series data (stored in an array field) into a expanded table
   where there is one row per original element in the array.

4) The approach I'm using - putting set-value functions as target
   columns of the select statement - is described in the manual as
   deprecated and likely to go away.  What other way can I accomplish
   my task?  (maybe somebody's clever answer to quetion #2 or #3 will
   solve this problem anyway).


CREATE TABLE ranges (
    id INTEGER PRIMARY KEY,
    startindex INTEGER NOT NULL,
    endindex INTEGER NOT NULL,
    otherdata INTEGER NOT NULL
);

INSERT INTO ranges VALUES(1, 5, 9, 42);
INSERT INTO ranges VALUES(2, 4, 6, 93);


CREATE TYPE myrecord_type AS (
    index INTEGER,
    calc INTEGER
);

-- Generate a table with rows having 'index' numbered
--  from $1 to $2, and 'calc' set to 'index' + $3
CREATE OR REPLACE FUNCTION make_rows(INTEGER, INTEGER, INTEGER)
  RETURNS SETOF myrecord_type
  LANGUAGE 'plpgsql'
  AS '
    DECLARE
      rec myrecord_type%ROWTYPE;
    BEGIN
      FOR index IN $1..$2 LOOP
    SELECT index, index + $3 INTO rec;
        RETURN NEXT rec;
      END LOOP;
      RETURN;
    END
  ';


-- Using this view generates an error:
--   "Set-valued function called in context that cannot accept a set"

CREATE OR REPLACE VIEW ranges_setview_broken AS (
    SELECT
      id,
      (make_rows(startindex, endindex, otherdata)).index,
      (make_rows(startindex, endindex, otherdata)).calc
    FROM ranges
);

-- SQL wrapper function passing through to plpgsql function

CREATE OR REPLACE FUNCTION make_rows_sql(INTEGER, INTEGER, INTEGER)
  RETURNS SETOF myrecord_type
  LANGUAGE 'sql'
  AS 'select * from make_rows($1, $2, $3)';


-- Identical view to last time, but using wrapper
-- instead of original function.  Works fine.

CREATE OR REPLACE VIEW ranges_setview_notbroken AS (
    SELECT
          id,
      (make_rows_sql(startindex, endindex, otherdata)).index,
      (make_rows_sql(startindex, endindex, otherdata)).calc
    FROM ranges
);


Thanks in advance for any pointers!

--
Jason Topaz
topaz@panix.com



Re: Struggling with set-returning functions, seeking advice

From
Tom Lane
Date:
Jason Topaz <topaz@panix.com> writes:
> 1) Note that my first attempt (view "ranges_setview_broken", referring
>    to a pgplsql set function "make_rows") fails.  But when I make a
>    second function with identical signature, but in language 'sql'
>    (it's just a passthrough to my original pgplsql function), suddenly
>    the server error goes away.  This seems strange to me.

It's an implementation artifact: plpgsql uses a different implementation
method to return sets than sql does.  sql's method works both in SELECT
lists and in FROM, plpgsql's only works in FROM.

The trouble with a set function in FROM is that it can't take any
parameters that are extracted from other tables in the query.  There
has been some talk of fixing that by implementing SQL99's LATERAL()
syntax, but we haven't yet wrapped our heads around exactly what would
be involved there.  In the meantime, the way you're doing it is probably
as good as you're going to get.

            regards, tom lane

Re: Struggling with set-returning functions, seeking advice

From
Joe Conway
Date:
Jason Topaz wrote:
> 3) Any recommendations for a completely different way to approach this
>    table transformation?  Incidentally, I have simplified a lot for
>    purposes of this posting, but the real application involves
> transforming
>    time-series data (stored in an array field) into a expanded table
>    where there is one row per original element in the array.
>

In addition to Tom's comments I'd add that for your real application, if
you are familiar with S-PLUS or R, you might find that it is easier to
do what you want using PL/R. The S language is a good fit if you have
arrays you need to manipulate.

I typically solve this kind of problem by passing any WHERE clause
criteria as arguments, build and execute the SQL statement, manipulate
that data, and then return just the final results via a FROM clause PL/R
function.

You can get PL/R here:
   http://www.joeconway.com/plr/

HTH,

Joe