Thread: [HACKERS] Dynamic result sets from procedures

[HACKERS] Dynamic result sets from procedures

From

Peter Eisentraut

Date:

01 November 2017, 03:08:47

This patch is more of a demo of what could be done, not my primary
focus, but if there is interest and some assistance, maybe we can make
something out of it.  This patch also goes on top of "SQL procedures"
version 1.

The purpose is to return multiple result sets from a procedure.  This
is, I think, a common request when coming from MS SQL and DB2.  MS SQL
has a completely different procedure syntax, but this proposal is
compatible with DB2, which as usual was the model for the SQL standard.
So this is what it can do:

CREATE PROCEDURE pdrstest1()
LANGUAGE SQL
AS $$
DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
$$;

CALL pdrstest1();

and that returns those two result sets to the client.

That's all it does for now.  Things get more complex when you consider
nested calls.  The SQL standard describes additional facilities how an
outer procedure can accept a called procedure's result sets, or not.  In
the thread on transaction control, I mentioned that we might need some
kind of procedure call stack.  Something like that would be needed here
as well.  There are also probably some namespacing issues around the
cursors that need more investigation.

A more mundane issue is how we get psql to print multiple result sets.
I have included here a patch that does that, and you can see that new
result sets start popping up in the regression tests already.  There is
also one need error that needs further investigation.

We need to think about how the \timing option should work in such
scenarios.  Right now it does

start timer
run query
fetch result
stop timer
print result

If we had multiple result sets, the most natural flow would be

start timer
run query
while result sets
    fetch result
    print result
stop timer
print time

but that would include the printing time in the total time, which the
current code explicitly does not.  We could also temporarily save the
result sets, like

start timer
run query
while result sets
    fetch result
stop timer
foreach result set
    print result

but that would have a lot more overhead, potentially.

Thoughts?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Dynamic result sets from procedures

From

Craig Ringer

Date:

01 November 2017, 08:35:28

On 1 November 2017 at 05:08, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

> CREATE PROCEDURE pdrstest1()
> LANGUAGE SQL
> AS $$
> DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
> DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
> $$;
>
> CALL pdrstest1();

FWIW, this is similar to the model already used by PgJDBC to emulate
multiple result sets, though the current support in the driver is
rather crude. It detects a REFCURSOR in an output parameter / result
set and transparently FETCHes the result set, making it look to the
client app like it's a nested result set.

This shouldn't conflict with what you're doing because the driver does
not follow the JDBC standard behaviour of using
Statement.getMoreResults() and Statement.getResultSet() for multiple
result sets. That's currently only used by PgJDBC when fetching result
sets from batch query executions. Instead, the multiple result set
emulation requires the caller to 'getObject' the 'refcursor' field's
result-object, then cast it to ResultSet, and treat it as a new
(nested) result set.

True multiple result sets would be exposed in PgJDBC via getMoreResults().

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic result sets from procedures

From

Pavel Stehule

Date:

01 November 2017, 16:23:47

2017-10-31 22:08 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:

This patch is more of a demo of what could be done, not my primary
focus, but if there is interest and some assistance, maybe we can make
something out of it. This patch also goes on top of "SQL procedures"
version 1.

The purpose is to return multiple result sets from a procedure. This
is, I think, a common request when coming from MS SQL and DB2. MS SQL
has a completely different procedure syntax, but this proposal is
compatible with DB2, which as usual was the model for the SQL standard.
So this is what it can do:

CREATE PROCEDURE pdrstest1()
LANGUAGE SQL
AS $$
DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
$$;

CALL pdrstest1();

and that returns those two result sets to the client.

That's all it does for now. Things get more complex when you consider
nested calls. The SQL standard describes additional facilities how an
outer procedure can accept a called procedure's result sets, or not. In
the thread on transaction control, I mentioned that we might need some
kind of procedure call stack. Something like that would be needed here
as well. There are also probably some namespacing issues around the
cursors that need more investigation.

A more mundane issue is how we get psql to print multiple result sets.
I have included here a patch that does that, and you can see that new
result sets start popping up in the regression tests already. There is
also one need error that needs further investigation.

We need to think about how the \timing option should work in such
scenarios. Right now it does

start timer
run query
fetch result
stop timer
print result

If we had multiple result sets, the most natural flow would be

start timer
run query
while result sets
fetch result
print result
stop timer
print time

but that would include the printing time in the total time, which the
current code explicitly does not. We could also temporarily save the
result sets, like

start timer
run query
while result sets
fetch result
stop timer
foreach result set
print result

but that would have a lot more overhead, potentially.

Thoughts?

Has the total time sense in this case?

should not be total time related to any fetched result?

Regards

Pavel

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic result sets from procedures

From

Robert Haas

Date:

02 November 2017, 08:40:52

On Wed, Nov 1, 2017 at 2:38 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> So this is what it can do:
>
> CREATE PROCEDURE pdrstest1()
> LANGUAGE SQL
> AS $$
> DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
> DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
> $$;
>
> CALL pdrstest1();
>
> and that returns those two result sets to the client.

That seems like it is at least arguably a wire protocol break.  Today,
if you send a string containing only one command, you will only get
one answer.

I'm not saying that makes this change utterly unacceptable or anything
-- but I wonder how much application code it will break, and whether
any steps need to be taken to reduce breakage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic result sets from procedures

From

"Daniel Verite"

Date:

03 November 2017, 02:40:31

    Peter Eisentraut wrote:

> CREATE PROCEDURE pdrstest1()
> LANGUAGE SQL
> AS $$
> DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
> DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
> $$;
>
> CALL pdrstest1();
>
> and that returns those two result sets to the client.

If applied to plpgsql, to return a dynamic result, the following
does work:

CREATE PROCEDURE test()
LANGUAGE plpgsql
AS $$
DECLAREquery text:= 'SELECT 1 AS col1, 2 AS col2';
BEGINEXECUTE 'DECLARE c CURSOR WITH RETURN FOR ' || query;
END;
$$;

This method could be used, for instance, to build a pivot with dynamic
columns in a single client-server round-trip, which is not possible today
with the query-calling-functions interface.
More generally, I guess this should help in the whole class of situations
where the client needs polymorphic results, which is awesome.

But instead of having procedures not return anything,
couldn't they return whatever resultset(s) they want to
("no resultset" being just a particular case of "anything"),
so that we could leave out cursors and simply write:

CREATE PROCEDURE test()
LANGUAGE plpgsql
AS $$ RETURN QUERY    EXECUTE 'SELECT 1 AS col1, 2 AS col2';
END;
$$;

Or is that not possible or not desirable?

Similarly, for the SQL language, I wonder if the above example
could be simplified to:

CREATE PROCEDURE pdrstest1()
LANGUAGE SQL
AS $$SELECT * FROM cp_test2;SELECT * FROM cp_test3;
$$;
by which the two result sets would go back to the client again
without declaring explicit cursors.
Currently, it does not error out and no result set is sent.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic result sets from procedures

From

Peter Eisentraut

Date:

03 November 2017, 23:48:20

On 11/1/17 06:23, Pavel Stehule wrote:
>     We need to think about how the \timing option should work in such
>     scenarios.  Right now it does

> Has the total time sense  in this case?
> 
> should not be total time related to any fetched result?

The \timing option in psql measures from the time you send off the
statement until you get all the results back.  I don't see a fundamental
reason to change that if a statement happens to produce multiple
results.  The results already come over the write and are processed by
libpq.  So the time is already measured.  It's just that psql doesn't
print the non-last result sets.

We don't have any way to measure from psql how long each individual
result set took to compose.  For that you will need deeper tools like
EXPLAIN ANALYZE and some way to process that data.  That is way beyond
what \timing currently does.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic result sets from procedures

From

Peter Eisentraut

Date:

03 November 2017, 23:50:28

On 11/1/17 22:40, Robert Haas wrote:
> That seems like it is at least arguably a wire protocol break.  Today,
> if you send a string containing only one command, you will only get
> one answer.

The wire protocol already supports this.  And the wire protocol doesn't
really know about statements, only about a command string that might
contain multiple commands.  So I don't think anything would break.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic result sets from procedures

From

Peter Eisentraut

Date:

03 November 2017, 23:55:03

On 11/2/17 16:40, Daniel Verite wrote:
> But instead of having procedures not return anything,
> couldn't they return whatever resultset(s) they want to
> ("no resultset" being just a particular case of "anything"),
> so that we could leave out cursors and simply write:

We could in general design this any way we want.  I'm just going by
what's in the SQL standard and in existing implementations.

> CREATE PROCEDURE test()
> LANGUAGE plpgsql
> AS $$
>   RETURN QUERY    EXECUTE 'SELECT 1 AS col1, 2 AS col2';
> END;
> $$;
> 
> Or is that not possible or not desirable?

RETURN means the execution ends there, so how would you return multiple
result sets?

> Similarly, for the SQL language, I wonder if the above example
> could be simplified to:
> 
> CREATE PROCEDURE pdrstest1()
> LANGUAGE SQL
> AS $$
>  SELECT * FROM cp_test2;
>  SELECT * FROM cp_test3;
> $$;
> by which the two result sets would go back to the client again
> without declaring explicit cursors.
> Currently, it does not error out and no result set is sent.

But maybe you don't want to return all those results, so you'd need a
way to designate which ones, e.g.,

AS $$
SELECT set_config('something', 'value');
SELECT * FROM interesting_table;  -- return only this one
SELECT set_config('something', 'oldvalue');
$$;

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic result sets from procedures

From

"Daniel Verite"

Date:

04 November 2017, 20:59:41

    Peter Eisentraut wrote:

> > CREATE PROCEDURE test()
> > LANGUAGE plpgsql
> > AS $$
> >   RETURN QUERY    EXECUTE 'SELECT 1 AS col1, 2 AS col2';
> > END;
> > $$;
> >
> > Or is that not possible or not desirable?
>
> RETURN means the execution ends there, so how would you return multiple
> result sets?

RETURN alone yes, but RETURN QUERY continues the execution, appending
rows to the single result set of the function. In the case of a
procedure, I guess each RETURN QUERY could generate an independant
result set.

> But maybe you don't want to return all those results, so you'd need a
> way to designate which ones, e.g.,
>
> AS $$
> SELECT set_config('something', 'value');
> SELECT * FROM interesting_table;  -- return only this one
> SELECT set_config('something', 'oldvalue');
> $$;

Yes, in that case, lacking PERFORM in SQL, nothing simple comes to
mind on how to return certain results and not others.
But if it was in an SQL function, it wouldn't return the rows of
"interesting_table" either. I think it would be justified to say to just
use plpgsql for that kind of sequence.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic result sets from procedures

From

"Daniel Verite"

Date:

16 November 2017, 17:27:48

Peter Eisentraut wrote:

> There is also one need error that needs further investigation.

I've looked at this bit in the regression tests about \gexec:

--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -232,11 +232,7 @@ union allselect 'drop table gexec_test', 'select ''2000-01-01''::date as party_over'\gexecselect 1
asones 
- ones
-------
-    1
-(1 row)
-
+ERROR:  DECLARE CURSOR can only be used in transaction blocks

This can be interpreted as two separate errors:

* \gexec ignores the first result

postgres=# select 'select 1','select 2' \gexec?column?
----------2
(1 row)

* \gexec fails with FETCH_COUNT
 postgres=# \set FETCH_COUNT 1 postgres=# select 'select 1','select 2' \gexec ERROR:  DECLARE CURSOR can only be used
intransaction blocks  ?column?  ----------  2 (1 row) 

The two issues are due to SendQuery() being reentered
for the gexec'd queries when it hasn't finished yet with the
main query.
I believe that just collecting all results of \gexec before
executing any of them would solve both errors.

Also doing a bit more testing I've seen these other issues:

* combining multiple result sets and FETCH_COUNT doesn't work:
 postgres=# \set FETCH_COUNT 1 postgres=# select 1 \; select 2; postgres=#


* last error is not recorded for \errverbose :
 postgres=# select foo; ERROR:  column "foo" does not exist LINE 1: select foo;     ^ postgres=# \errverbose There is
noprevious error. 

* memory leaks on PGResults.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: [HACKERS] Dynamic result sets from procedures

From

Michael Paquier

Date:

30 November 2017, 10:37:35

On Thu, Nov 16, 2017 at 8:27 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
>         Peter Eisentraut wrote:
>
>> There is also one need error that needs further investigation.
>
> I've looked at this bit in the regression tests about \gexec:
>
> --- a/src/test/regress/expected/psql.out
> +++ b/src/test/regress/expected/psql.out
> @@ -232,11 +232,7 @@ union all
>  select 'drop table gexec_test', 'select ''2000-01-01''::date as party_over'
>  \gexec
>  select 1 as ones
> - ones
> -------
> -    1
> -(1 row)
> -
> +ERROR:  DECLARE CURSOR can only be used in transaction blocks
>
> This can be interpreted as two separate errors:
>
> * \gexec ignores the first result
>
> postgres=# select 'select 1','select 2' \gexec
>  ?column?
> ----------
>         2
> (1 row)
>
> * \gexec fails with FETCH_COUNT
>
>   postgres=# \set FETCH_COUNT 1
>   postgres=# select 'select 1','select 2' \gexec
>   ERROR:  DECLARE CURSOR can only be used in transaction blocks
>    ?column?
>   ----------
>           2
>   (1 row)
>
> The two issues are due to SendQuery() being reentered
> for the gexec'd queries when it hasn't finished yet with the
> main query.
> I believe that just collecting all results of \gexec before
> executing any of them would solve both errors.
>
> Also doing a bit more testing I've seen these other issues:
>
> * combining multiple result sets and FETCH_COUNT doesn't work:
>
>   postgres=# \set FETCH_COUNT 1
>   postgres=# select 1 \; select 2;
>   postgres=#
>
>
> * last error is not recorded for \errverbose :
>
>   postgres=# select foo;
>   ERROR:  column "foo" does not exist
>   LINE 1: select foo;
>                  ^
>   postgres=# \errverbose
>   There is no previous error.
>
> * memory leaks on PGResults.

Peter, are you planning to answer to those complains? For now I am
switching the patch as returned with feedback as this thread has no
activity for two weeks.
-- 
Michael