Thread: Is This A Set Based Solution?

Is This A Set Based Solution?

From
Stefan Berglund
Date:
Hi-

Below is a small test case that illustrates what I'm attempting which is
to provide a comma separated list of numbers to a procedure which
subsequently uses this list in a join with another table.

My questions are is this a set based solution and is this the best
approach in terms of using the data types and methods afforded by
PostgreSQL?  I'm mostly inquiring about the double FOR loop which just
doesn't feel right to me and I'd also like to feel that I'm generally on
the right track before converting the other 400 procedures from SQL
Server 2000 to PostgreSQL.

CREATE TYPE fn_return_int4 AS (N int);

CREATE TABLE test_table (
  id SMALLINT not null,
  tname varchar(50) not null);

INSERT INTO test_table
      SELECT 1, 'Adams'
UNION SELECT 2, 'Baker'
UNION SELECT 3, 'Chrysler'
UNION SELECT 4, 'Douglas'
UNION SELECT 5, 'Everyman';

CREATE OR REPLACE FUNCTION fn_Split_List (
  pList TEXT) RETURNS SETOF fn_return_int4 AS $fn_Split_List$

DECLARE
  v_row fn_return_int4%rowtype;
  v_list alias for $1;
  v_delim text := ',';
  v_arr text[];

BEGIN
  v_arr := string_to_array(v_list, v_delim);
  FOR i IN array_lower(v_arr, 1)..array_upper(v_arr, 1) LOOP
    FOR v_row IN SELECT v_arr[i] LOOP
      RETURN NEXT v_row;
    END LOOP;
  END LOOP;
RETURN;
END;
$fn_Split_List$ LANGUAGE plpgsql;

SELECT *
FROM
  fn_Split_List('5,1,3') SL INNER JOIN
  test_table T ON SL.N=T.ID;

I did discover that I was able to define the function with a native type
but then the usage looked a little odd:

SELECT *
FROM
  fn_Split_List('5,1,3') SL INNER JOIN
  test_table T ON SL=T.ID;

Stefan Berglund
www.horseshowtime.com
Online Show Entry - Instant Internet Horse Show Schedules and Results
stefan@horseshowtime.com
tel  714.968.9112   fax  714.968.5940

Re: Is This A Set Based Solution?

From
Oleg Bartunov
Date:
I don't know if you could change your schema. but I'd consider your
problem as a overlapping arrays task and use contrib/intarray for that.

Oleg
On Fri, 9 Mar 2007, Stefan Berglund wrote:

> Hi-
>
> Below is a small test case that illustrates what I'm attempting which is
> to provide a comma separated list of numbers to a procedure which
> subsequently uses this list in a join with another table.
>
> My questions are is this a set based solution and is this the best
> approach in terms of using the data types and methods afforded by
> PostgreSQL?  I'm mostly inquiring about the double FOR loop which just
> doesn't feel right to me and I'd also like to feel that I'm generally on
> the right track before converting the other 400 procedures from SQL
> Server 2000 to PostgreSQL.
>
> CREATE TYPE fn_return_int4 AS (N int);
>
> CREATE TABLE test_table (
>  id SMALLINT not null,
>  tname varchar(50) not null);
>
> INSERT INTO test_table
>      SELECT 1, 'Adams'
> UNION SELECT 2, 'Baker'
> UNION SELECT 3, 'Chrysler'
> UNION SELECT 4, 'Douglas'
> UNION SELECT 5, 'Everyman';
>
> CREATE OR REPLACE FUNCTION fn_Split_List (
>  pList TEXT) RETURNS SETOF fn_return_int4 AS $fn_Split_List$
>
> DECLARE
>  v_row fn_return_int4%rowtype;
>  v_list alias for $1;
>  v_delim text := ',';
>  v_arr text[];
>
> BEGIN
>  v_arr := string_to_array(v_list, v_delim);
>  FOR i IN array_lower(v_arr, 1)..array_upper(v_arr, 1) LOOP
>    FOR v_row IN SELECT v_arr[i] LOOP
>      RETURN NEXT v_row;
>    END LOOP;
>  END LOOP;
> RETURN;
> END;
> $fn_Split_List$ LANGUAGE plpgsql;
>
> SELECT *
> FROM
>  fn_Split_List('5,1,3') SL INNER JOIN
>  test_table T ON SL.N=T.ID;
>
> I did discover that I was able to define the function with a native type
> but then the usage looked a little odd:
>
> SELECT *
> FROM
>  fn_Split_List('5,1,3') SL INNER JOIN
>  test_table T ON SL=T.ID;
>
> Stefan Berglund
> www.horseshowtime.com
> Online Show Entry - Instant Internet Horse Show Schedules and Results
> stefan@horseshowtime.com
> tel  714.968.9112   fax  714.968.5940
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Is This A Set Based Solution?

From
Tom Lane
Date:
Stefan Berglund <stefan_berglund@msn.com> writes:
> Below is a small test case that illustrates what I'm attempting which is
> to provide a comma separated list of numbers to a procedure which
> subsequently uses this list in a join with another table.

> My questions are is this a set based solution and is this the best
> approach in terms of using the data types and methods afforded by
> PostgreSQL?  I'm mostly inquiring about the double FOR loop which just
> doesn't feel right to me ...

It looks pretty ugly to me too, but you haven't explained your problem
clearly enough for anyone to be able to recommend a better solution path.
Why do you feel you need to do this?  What is the context?

            regards, tom lane

Re: Is This A Set Based Solution?

From
Stefan Berglund
Date:
On Sat, 10 Mar 2007 00:37:08 -0500, tgl@sss.pgh.pa.us (Tom Lane) wrote:
 in <7887.1173505028@sss.pgh.pa.us>

>Stefan Berglund <stefan_berglund@msn.com> writes:
>> Below is a small test case that illustrates what I'm attempting which is
>> to provide a comma separated list of numbers to a procedure which
>> subsequently uses this list in a join with another table.
>
>> My questions are is this a set based solution and is this the best
>> approach in terms of using the data types and methods afforded by
>> PostgreSQL?  I'm mostly inquiring about the double FOR loop which just
>> doesn't feel right to me ...
>
>It looks pretty ugly to me too, but you haven't explained your problem
>clearly enough for anyone to be able to recommend a better solution path.
>Why do you feel you need to do this?  What is the context?

I've been lurking for several thousand posts and I'm flattered that
you've responded but I'm also a little flustered that I failed to
communicate so I'll try again.

In SQL Server I was able to pass a string of IDs such as '1,5,3' to a
procedure that would create a set of tuples where each tuple was one of
those ids:

Row 1 : 1
Row 2: 5
Row 3: 3

I could then use this table in a join with another table.

What I want to do is to create a function that takes a comma separated
string of numbers and produces a table (where each row is one of those
numbers) that can be joined to other tables as in the example first
provided.

fn_Split_List is supposed to take a list of numbers and return a table
of rows of those numbers.

I hope this better explains what I'm trying to do but somehow from your
reaction I get the feeling that I'm missing something really basic?

---
This posting is provided "AS IS" with no warranties and no guarantees either express or implied.

Stefan Berglund

Re: Is This A Set Based Solution?

From
Stefan Berglund
Date:
On Sat, 10 Mar 2007 08:26:32 +0300 (MSK), oleg@sai.msu.su (Oleg
Bartunov) wrote:
 in <Pine.LNX.4.64.0703100824300.400@sn.sai.msu.ru>

>I don't know if you could change your schema. but I'd consider your
>problem as a overlapping arrays task and use contrib/intarray for that.
>
>Oleg

I can very definitely change my schema at this point.  I'm refactoring
an application from SQL Server to PostgreSQL and I'm doing a lot of
exploring trying to find the best fits.  I'll see what I can glean from
you've indicated but that sounds like what I'm looking for.  The string
converts easily to an array but then what's the best way to get from an
array to a table?

---
This posting is provided "AS IS" with no warranties and no guarantees either express or implied.

Stefan Berglund

Re: Is This A Set Based Solution?

From
Tom Lane
Date:
Stefan Berglund <sorry.no.koolaid@for.me> writes:
> On Sat, 10 Mar 2007 00:37:08 -0500, tgl@sss.pgh.pa.us (Tom Lane) wrote:
>> It looks pretty ugly to me too, but you haven't explained your problem
>> clearly enough for anyone to be able to recommend a better solution path.
>> Why do you feel you need to do this?  What is the context?

> What I want to do is to create a function that takes a comma separated
> string of numbers and produces a table (where each row is one of those
> numbers) that can be joined to other tables as in the example first
> provided.

That was what you said before.  The question is why you need to do that.
It strikes me that having such a requirement is a symptom of poor data
representation choices.  Perhaps an array would be better, or maybe you
ought to refactor your table layout altogether.  But, as I said, you
haven't provided any info that would let someone give advice at that
level.

            regards, tom lane

Re: Is This A Set Based Solution?

From
Stefan Berglund
Date:
On Mon, 12 Mar 2007 10:41:21 -0400, tgl@sss.pgh.pa.us (Tom Lane) wrote:
 in <15437.1173710481@sss.pgh.pa.us>

>Stefan Berglund <sorry.no.koolaid@for.me> writes:
>> On Sat, 10 Mar 2007 00:37:08 -0500, tgl@sss.pgh.pa.us (Tom Lane) wrote:
>>> It looks pretty ugly to me too, but you haven't explained your problem
>>> clearly enough for anyone to be able to recommend a better solution path.
>>> Why do you feel you need to do this?  What is the context?
>
>> What I want to do is to create a function that takes a comma separated
>> string of numbers and produces a table (where each row is one of those
>> numbers) that can be joined to other tables as in the example first
>> provided.
>
>That was what you said before.  The question is why you need to do that.
>It strikes me that having such a requirement is a symptom of poor data
>representation choices.  Perhaps an array would be better, or maybe you
>ought to refactor your table layout altogether.  But, as I said, you
>haven't provided any info that would let someone give advice at that
>level.

Perhaps it is a case of poor data representation choices and that is
exactly why I posted originally - because I wasn't sure if that was the
best way of doing what I want to do:

I have an app where the user makes multiple selections from a list.  I
can either construct a huge WHERE clause such as SELECT blah blah FROM
foo WHERE (ID = 53016 OR ID = 27 OR ID = 292 OR ID = 512) or I could
alternatively pass the string of IDs ('53016,27,292,512') to a table
returning function which TABLE is then JOINed with the table I wish to
query instead of using the unwieldy WHERE clause.  The latter strikes me
as a far more scalable method since it eliminates having to use dynamic
SQL to construct the ridiculously long WHERE clause which will no doubt
ultimately bump up against parser length restrictions or some such.

I didn't find any examples that showed JOINing an array with a table.
How do other developers solve this basic problem and why does my
approach seem so foreign?

SELECT blah blah
FROM
  fn_Split_List('53016,27,292,512') SL INNER JOIN
  foo T ON SL.N=T.ID;

or

SELECT blah blah
FROM foo
WHERE (ID = 53016 OR ID = 27 OR ID = 292 OR ID = 512)
---
Stefan Berglund

Re: Is This A Set Based Solution?

From
George Weaver
Date:
Stefan Berglund wrote:

> foo WHERE (ID = 53016 OR ID = 27 OR ID = 292 OR ID = 512) or I could
> alternatively pass the string of IDs ('53016,27,292,512') to a table
> returning function which TABLE is then JOINed with the table I wish to

Stefan,

The user selections will be in some sort of list.  Could you not use WHERE
ID IN (the list)?

Regards,
George


Re: Is This A Set Based Solution?

From
Bruno Wolff III
Date:
On Mon, Mar 12, 2007 at 11:15:01 -0700,
  Stefan Berglund <sorry.no.koolaid@for.me> wrote:
>
> I have an app where the user makes multiple selections from a list.  I
> can either construct a huge WHERE clause such as SELECT blah blah FROM
> foo WHERE (ID = 53016 OR ID = 27 OR ID = 292 OR ID = 512) or I could
> alternatively pass the string of IDs ('53016,27,292,512') to a table
> returning function which TABLE is then JOINed with the table I wish to
> query instead of using the unwieldy WHERE clause.  The latter strikes me
> as a far more scalable method since it eliminates having to use dynamic
> SQL to construct the ridiculously long WHERE clause which will no doubt
> ultimately bump up against parser length restrictions or some such.

How big is huge?
If the list of IDs is in the 1000s or higher, then it may be better to
load the data into a temp table and ANALYSE it before running your query.
Otherwise, for smaller lists the IN suggestion should work well in recent
versions.

Re: Is This A Set Based Solution?

From
Tino Wildenhain
Date:
Bruno Wolff III schrieb:
> On Mon, Mar 12, 2007 at 11:15:01 -0700,
>   Stefan Berglund <sorry.no.koolaid@for.me> wrote:
>> I have an app where the user makes multiple selections from a list.  I
>> can either construct a huge WHERE clause such as SELECT blah blah FROM
>> foo WHERE (ID = 53016 OR ID = 27 OR ID = 292 OR ID = 512) or I could
>> alternatively pass the string of IDs ('53016,27,292,512') to a table
>> returning function which TABLE is then JOINed with the table I wish to
>> query instead of using the unwieldy WHERE clause.  The latter strikes me
>> as a far more scalable method since it eliminates having to use dynamic
>> SQL to construct the ridiculously long WHERE clause which will no doubt
>> ultimately bump up against parser length restrictions or some such.
>
> How big is huge?
> If the list of IDs is in the 1000s or higher, then it may be better to
> load the data into a temp table and ANALYSE it before running your query.
> Otherwise, for smaller lists the IN suggestion should work well in recent
> versions.

Show me a user which really clicks on 1000 or more checkboxes on a
webpage or similar ;)
I'd think around 20 values is plenty.

Regards
Tino

Re: Is This A Set Based Solution?

From
Jorge Godoy
Date:
Tino Wildenhain <tino@wildenhain.de> writes:

> Show me a user which really clicks on 1000 or more checkboxes on a
> webpage or similar ;)
> I'd think around 20 values is plenty.

On the other hand, show me a page with 1000 or more checkboxes to be clicked
at once and I'd show a developer / designer that needs a new career... :-)

--
Jorge Godoy      <jgodoy@gmail.com>

Re: Is This A Set Based Solution?

From
Stefan Berglund
Date:
On Thu, 15 Mar 2007 09:47:27 -0500, gweaver@shaw.ca (George Weaver)
wrote:
 in <008001c76710$da487db0$6400a8c0@Dell4500>

>
>Stefan Berglund wrote:
>
>> foo WHERE (ID = 53016 OR ID = 27 OR ID = 292 OR ID = 512) or I could
>> alternatively pass the string of IDs ('53016,27,292,512') to a table
>> returning function which TABLE is then JOINed with the table I wish to
>
>The user selections will be in some sort of list.  Could you not use WHERE
>ID IN (the list)?

Coming from SQL Server where that is not allowed, it didn't occur to me
that PostgreSQL would allow a substitutable parameter in the IN clause.
However, it seems that it can't be done in this fashion without using
dynamic SQL unless I'm missing something.

I tried this:

create or replace function foo(plist TEXT)
  RETURNS SETOF Show_Entries as $$

  SELECT *
  FROM Show_Entries
  WHERE Show_ID = 1250 AND Show_Number IN ($1);

$$ LANGUAGE sql;

When I use select * from foo('101,110,115,120'); I get no results.  When
I use select * from foo(101,110,115,120); I get the correct results.

At any rate, I'm happy with what I've come up with and so far
performance is excellent:

CREATE TABLE test_table (
  id int not null,
  tname varchar(50) not null);

INSERT INTO test_table
      SELECT 1, 'Adams'
UNION SELECT 2, 'Baker'
UNION SELECT 3, 'Chrysler'
UNION SELECT 4, 'Douglas'
UNION SELECT 5, 'Everyman';

CREATE OR REPLACE FUNCTION foo (
  pList TEXT) RETURNS SETOF INTEGER AS $foo$

DECLARE
  v_arr text[];

BEGIN
  v_arr := string_to_array($1, ',');
  FOR i IN array_lower(v_arr, 1)..array_upper(v_arr, 1) LOOP
    RETURN NEXT v_arr[i]::int;
  END LOOP;
RETURN;
END;
$foo$ LANGUAGE plpgsql;

SELECT *
FROM
  foo('5,1,3') SL INNER JOIN
  test_table T ON SL=T.ID;

SELECT * FROM foo('52001,17,22,42,47') ORDER BY foo;

---
Stefan Berglund

Re: Is This A Set Based Solution?

From
Stefan Berglund
Date:
On Thu, 15 Mar 2007 15:46:33 -0500, bruno@wolff.to (Bruno Wolff III)
wrote:
 in <20070315204633.GA2156@wolff.to>

>On Mon, Mar 12, 2007 at 11:15:01 -0700,
>  Stefan Berglund <sorry.no.koolaid@for.me> wrote:
>>
>> I have an app where the user makes multiple selections from a list.  I
>> can either construct a huge WHERE clause such as SELECT blah blah FROM
>> foo WHERE (ID = 53016 OR ID = 27 OR ID = 292 OR ID = 512) or I could
>> alternatively pass the string of IDs ('53016,27,292,512') to a table
>> returning function which TABLE is then JOINed with the table I wish to
>> query instead of using the unwieldy WHERE clause.  The latter strikes me
>> as a far more scalable method since it eliminates having to use dynamic
>> SQL to construct the ridiculously long WHERE clause which will no doubt
>> ultimately bump up against parser length restrictions or some such.
>
>How big is huge?
>If the list of IDs is in the 1000s or higher, then it may be better to
>load the data into a temp table and ANALYSE it before running your query.
>Otherwise, for smaller lists the IN suggestion should work well in recent
>versions.

Sorry, huge was an exaggeration.  I doubt it would ever approach 1000 -
more like a couple hundred.  I'll look at it a little closer.

---
Stefan Berglund

Re: Is This A Set Based Solution?

From
Stefan Berglund
Date:
On Fri, 16 Mar 2007 09:38:52 -0300, jgodoy@gmail.com (Jorge Godoy)
wrote:
 in <877ithjrc3.fsf@gmail.com>

>Tino Wildenhain <tino@wildenhain.de> writes:
>
>> Show me a user which really clicks on 1000 or more checkboxes on a
>> webpage or similar ;)
>> I'd think around 20 values is plenty.
>
>On the other hand, show me a page with 1000 or more checkboxes to be clicked
>at once and I'd show a developer / designer that needs a new career... :-)

Just to allay your fears, a fairly typical scenario might have the user
presented with a list of from twenty to fifty names of horse trainers
depending on the size of the show.  Since each trainer can have anywhere
from one to thirty or forty horses in their barn you can do the math to
see that the list of IDs passed from the app to the database can be
anywhere from a single ID up to possibly thousands of IDs.

---
Stefan Berglund

Re: Is This A Set Based Solution?

From
Stefan Berglund
Date:
On Sat, 10 Mar 2007 08:26:32 +0300 (MSK), oleg@sai.msu.su (Oleg
Bartunov) wrote:
 in <Pine.LNX.4.64.0703100824300.400@sn.sai.msu.ru>

>I don't know if you could change your schema. but I'd consider your
>problem as a overlapping arrays task and use contrib/intarray for that.

That's a nice piece of work, Oleg, and extremely quick.  I played with
it and pored over the docs but it just seems to keep coming back to the
fact that all of the array type manipulations are column based as
opposed to row based.  In fact, this from section 8.10.5 of the docs
confirms it:

Tip
Arrays are not sets; searching for specific array elements may be a sign
of database misdesign. Consider using a separate table with a row for
each item that would be an array element. This will be easier to search,
and is likely to scale up better to large numbers of elements.

I was able to clean up the function I originally posted removing the
extraneous LOOP and I'm more than happy with the performance.  What's
funny is that the function as it now stands is what I initially obtained
by googling, but I mistakenly added the extra loop.  :-)

What I finally came up with is here:
<r7viv2hr16uvmsb4tti398rr8oce4e1nvm@4ax.com>

---
Stefan Berglund

Re: Is This A Set Based Solution?

From
Martijn van Oosterhout
Date:
On Thu, Mar 15, 2007 at 10:26:48AM -0700, Stefan Berglund wrote:
> that PostgreSQL would allow a substitutable parameter in the IN clause.
> However, it seems that it can't be done in this fashion without using
> dynamic SQL unless I'm missing something.

The substitutable list has to be an array, not a text value. So
if the parameter is specified as ARRAY OF INTEGER, you can call it
like:

SELECT foo(ARRAY[1,2,3,45]);
SELECT foo('{1,2,3,4,56}');

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Is This A Set Based Solution?

From
Tom Lane
Date:
Stefan Berglund <sorry.no.koolaid@for.me> writes:
> I tried this:

> create or replace function foo(plist TEXT)
>   RETURNS SETOF Show_Entries as $$

>   SELECT *
>   FROM Show_Entries
>   WHERE Show_ID = 1250 AND Show_Number IN ($1);

> $$ LANGUAGE sql;

> When I use select * from foo('101,110,115,120'); I get no results.  When
> I use select * from foo(101,110,115,120); I get the correct results.

Just for the record, the reason that didn't work is that Postgres saw it
as a comparison to a single scalar IN-list item.  What you had was
effectively

   WHERE Show_ID = 1250 AND Show_Number::text IN ('101,110,115,120');

which of course will fail to find any rows.

In recent releases (8.2 for sure, don't remember if 8.1 can do this
efficiently) you could instead do

create or replace function foo(plist int[])
  RETURNS SETOF Show_Entries as $$

  SELECT *
  FROM Show_Entries
  WHERE Show_ID = 1250 AND Show_Number IN ($1);

$$ LANGUAGE sql;

select * from foo(array[101,110,115,120]);

            regards, tom lane