Thread: SQL advice needed

SQL advice needed

From
Torsten Förtsch
Date:
Hi,

I have a volatile function that returns multiple rows. It may also
return nothing. Now, I want to write an SQL statement that calls this
function until it returns an empty result set and returns all the rows.

So, in principle I want to:

WITH RECURSIVE
t AS (
  SELECT * FROM xx()
  UNION ALL
  SELECT * FROM xx()
)
SELECT * FROM t;

But that's not recursive because the union all part lacks a reference to t.

Next I tried this:

WITH RECURSIVE
t AS (
  SELECT * FROM xx()
  UNION ALL
  SELECT * FROM xx() WHERE EXISTS (SELECT 1 FROM t)
)
SELECT * FROM t;

But the reference to t is not allowed in a subquery.

What's the best (or at least a working) way to achieve what I want?

I can do it in plpgsql. But that would mean to accumulate the complete
result in memory first, right? I need to avoid that.

Thanks,
Torsten


Re: SQL advice needed

From
Merlin Moncure
Date:
On Mon, Mar 17, 2014 at 3:21 PM, Torsten Förtsch
<torsten.foertsch@gmx.net> wrote:
> Hi,
>
> I have a volatile function that returns multiple rows. It may also
> return nothing. Now, I want to write an SQL statement that calls this
> function until it returns an empty result set and returns all the rows.
>
> So, in principle I want to:
>
> WITH RECURSIVE
> t AS (
>   SELECT * FROM xx()
>   UNION ALL
>   SELECT * FROM xx()
> )
> SELECT * FROM t;
>
> But that's not recursive because the union all part lacks a reference to t.
>
> Next I tried this:
>
> WITH RECURSIVE
> t AS (
>   SELECT * FROM xx()
>   UNION ALL
>   SELECT * FROM xx() WHERE EXISTS (SELECT 1 FROM t)
> )
> SELECT * FROM t;
>
> But the reference to t is not allowed in a subquery.
>
> What's the best (or at least a working) way to achieve what I want?
>
> I can do it in plpgsql. But that would mean to accumulate the complete
> result in memory first, right? I need to avoid that.

I would test that assumption.   This is better handled in loop IMO.

LOOP
  RETURN QUERY SELECT * FROM xx();
  IF NOT found
  THEN
    RETURN;
  END IF;
END LOOP;

merlin


Re: SQL advice needed

From
David Johnston
Date:
Torsten Förtsch wrote
> Hi,
>
> I have a volatile function that returns multiple rows. It may also
> return nothing. Now, I want to write an SQL statement that calls this
> function until it returns an empty result set and returns all the rows.
>
> What's the best (or at least a working) way to achieve what I want?
>
> I can do it in plpgsql. But that would mean to accumulate the complete
> result in memory first, right? I need to avoid that.

You are describing procedural logic.  If you need intermediate steps before
"returns all the rows" then either those intermediate steps stay in memory
OR you stick them on a table somewhere and, when your procedure is done,
send back a cursor over that, possibly temporary, table.

I don't think abusing WITH/RECURSIVE is going to be viable.

You should also consider whether you can do what you need using set-logic
(i.e., pure SQL).  At worse it will be a learning exercise and a performance
comparator.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/SQL-advice-needed-tp5796431p5796436.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: SQL advice needed

From
Torsten Förtsch
Date:
On 17/03/14 21:42, Merlin Moncure wrote:
>> I can do it in plpgsql. But that would mean to accumulate the complete
>> > result in memory first, right? I need to avoid that.
> I would test that assumption.   This is better handled in loop IMO.
>
> LOOP
>   RETURN QUERY SELECT * FROM xx();
>   IF NOT found
>   THEN
>     RETURN;
>   END IF;
> END LOOP;

At least according to the manual it is stored in memory:

<cite>
Note: The current implementation of RETURN NEXT and RETURN QUERY stores
the entire result set before returning from the function, as discussed
above. That means that if a PL/pgSQL function produces a very large
result set, performance might be poor: data will be written to disk to
avoid memory exhaustion, but the function itself will not return until
the entire result set has been generated. A future version of PL/pgSQL
might allow users to define set-returning functions that do not have
this limitation. Currently, the point at which data begins being written
to disk is controlled by the work_mem configuration variable.
Administrators who have sufficient memory to store larger result sets in
memory should consider increasing this parameter.
</cite>

I didn't test that, though.

Torsten


Re: SQL advice needed

From
Merlin Moncure
Date:
On Mon, Mar 17, 2014 at 4:20 PM, Torsten Förtsch
<torsten.foertsch@gmx.net> wrote:
> On 17/03/14 21:42, Merlin Moncure wrote:
>>> I can do it in plpgsql. But that would mean to accumulate the complete
>>> > result in memory first, right? I need to avoid that.
>> I would test that assumption.   This is better handled in loop IMO.
>>
>> LOOP
>>   RETURN QUERY SELECT * FROM xx();
>>   IF NOT found
>>   THEN
>>     RETURN;
>>   END IF;
>> END LOOP;
>
> At least according to the manual it is stored in memory:
>
> <cite>
> Note: The current implementation of RETURN NEXT and RETURN QUERY stores
> the entire result set before returning from the function, as discussed
> above. That means that if a PL/pgSQL function produces a very large
> result set, performance might be poor: data will be written to disk to
> avoid memory exhaustion, but the function itself will not return until
> the entire result set has been generated. A future version of PL/pgSQL
> might allow users to define set-returning functions that do not have
> this limitation. Currently, the point at which data begins being written
> to disk is controlled by the work_mem configuration variable.
> Administrators who have sufficient memory to store larger result sets in
> memory should consider increasing this parameter.
> </cite>
>
> I didn't test that, though.

The manual says describes the exact opposite of what you said you
thought it does -- large result sets are paged out to disk, not stored
in memory (this is a feature).  CTEs use a similar tactic so it's a
wash.  The performance overhead of a tuplestore is probably not as bad
as you think -- just test it out some simulated results and monitor
performance.  Either way, work_mem controls it.  It's generally
dangerous to crank work_mem to huge values but it's ok to set it
temporarily via SET to huge values (say to 1GB) for a query
particularly if you know that it's only getting issued by one caller
at a time.

merlin