Thread: [HACKERS] Dynamic result sets from procedures
This patch is more of a demo of what could be done, not my primary focus, but if there is interest and some assistance, maybe we can make something out of it. This patch also goes on top of "SQL procedures" version 1. The purpose is to return multiple result sets from a procedure. This is, I think, a common request when coming from MS SQL and DB2. MS SQL has a completely different procedure syntax, but this proposal is compatible with DB2, which as usual was the model for the SQL standard. So this is what it can do: CREATE PROCEDURE pdrstest1() LANGUAGE SQL AS $$ DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2; DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3; $$; CALL pdrstest1(); and that returns those two result sets to the client. That's all it does for now. Things get more complex when you consider nested calls. The SQL standard describes additional facilities how an outer procedure can accept a called procedure's result sets, or not. In the thread on transaction control, I mentioned that we might need some kind of procedure call stack. Something like that would be needed here as well. There are also probably some namespacing issues around the cursors that need more investigation. A more mundane issue is how we get psql to print multiple result sets. I have included here a patch that does that, and you can see that new result sets start popping up in the regression tests already. There is also one need error that needs further investigation. We need to think about how the \timing option should work in such scenarios. Right now it does start timer run query fetch result stop timer print result If we had multiple result sets, the most natural flow would be start timer run query while result sets fetch result print result stop timer print time but that would include the printing time in the total time, which the current code explicitly does not. We could also temporarily save the result sets, like start timer run query while result sets fetch result stop timer foreach result set print result but that would have a lot more overhead, potentially. Thoughts? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 1 November 2017 at 05:08, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > CREATE PROCEDURE pdrstest1() > LANGUAGE SQL > AS $$ > DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2; > DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3; > $$; > > CALL pdrstest1(); FWIW, this is similar to the model already used by PgJDBC to emulate multiple result sets, though the current support in the driver is rather crude. It detects a REFCURSOR in an output parameter / result set and transparently FETCHes the result set, making it look to the client app like it's a nested result set. This shouldn't conflict with what you're doing because the driver does not follow the JDBC standard behaviour of using Statement.getMoreResults() and Statement.getResultSet() for multiple result sets. That's currently only used by PgJDBC when fetching result sets from batch query executions. Instead, the multiple result set emulation requires the caller to 'getObject' the 'refcursor' field's result-object, then cast it to ResultSet, and treat it as a new (nested) result set. True multiple result sets would be exposed in PgJDBC via getMoreResults(). -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
2017-10-31 22:08 GMT+01:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.com>:
This patch is more of a demo of what could be done, not my primary
focus, but if there is interest and some assistance, maybe we can make
something out of it. This patch also goes on top of "SQL procedures"
version 1.
The purpose is to return multiple result sets from a procedure. This
is, I think, a common request when coming from MS SQL and DB2. MS SQL
has a completely different procedure syntax, but this proposal is
compatible with DB2, which as usual was the model for the SQL standard.
So this is what it can do:
CREATE PROCEDURE pdrstest1()
LANGUAGE SQL
AS $$
DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2;
DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3;
$$;
CALL pdrstest1();
and that returns those two result sets to the client.
That's all it does for now. Things get more complex when you consider
nested calls. The SQL standard describes additional facilities how an
outer procedure can accept a called procedure's result sets, or not. In
the thread on transaction control, I mentioned that we might need some
kind of procedure call stack. Something like that would be needed here
as well. There are also probably some namespacing issues around the
cursors that need more investigation.
A more mundane issue is how we get psql to print multiple result sets.
I have included here a patch that does that, and you can see that new
result sets start popping up in the regression tests already. There is
also one need error that needs further investigation.
We need to think about how the \timing option should work in such
scenarios. Right now it does
start timer
run query
fetch result
stop timer
print result
If we had multiple result sets, the most natural flow would be
start timer
run query
while result sets
fetch result
print result
stop timer
print time
but that would include the printing time in the total time, which the
current code explicitly does not. We could also temporarily save the
result sets, like
start timer
run query
while result sets
fetch result
stop timer
foreach result set
print result
but that would have a lot more overhead, potentially.
Thoughts?
Has the total time sense in this case?
should not be total time related to any fetched result?
Regards
Pavel
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Nov 1, 2017 at 2:38 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > So this is what it can do: > > CREATE PROCEDURE pdrstest1() > LANGUAGE SQL > AS $$ > DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2; > DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3; > $$; > > CALL pdrstest1(); > > and that returns those two result sets to the client. That seems like it is at least arguably a wire protocol break. Today, if you send a string containing only one command, you will only get one answer. I'm not saying that makes this change utterly unacceptable or anything -- but I wonder how much application code it will break, and whether any steps need to be taken to reduce breakage. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Peter Eisentraut wrote: > CREATE PROCEDURE pdrstest1() > LANGUAGE SQL > AS $$ > DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM cp_test2; > DECLARE c2 CURSOR WITH RETURN FOR SELECT * FROM cp_test3; > $$; > > CALL pdrstest1(); > > and that returns those two result sets to the client. If applied to plpgsql, to return a dynamic result, the following does work: CREATE PROCEDURE test() LANGUAGE plpgsql AS $$ DECLAREquery text:= 'SELECT 1 AS col1, 2 AS col2'; BEGINEXECUTE 'DECLARE c CURSOR WITH RETURN FOR ' || query; END; $$; This method could be used, for instance, to build a pivot with dynamic columns in a single client-server round-trip, which is not possible today with the query-calling-functions interface. More generally, I guess this should help in the whole class of situations where the client needs polymorphic results, which is awesome. But instead of having procedures not return anything, couldn't they return whatever resultset(s) they want to ("no resultset" being just a particular case of "anything"), so that we could leave out cursors and simply write: CREATE PROCEDURE test() LANGUAGE plpgsql AS $$ RETURN QUERY EXECUTE 'SELECT 1 AS col1, 2 AS col2'; END; $$; Or is that not possible or not desirable? Similarly, for the SQL language, I wonder if the above example could be simplified to: CREATE PROCEDURE pdrstest1() LANGUAGE SQL AS $$SELECT * FROM cp_test2;SELECT * FROM cp_test3; $$; by which the two result sets would go back to the client again without declaring explicit cursors. Currently, it does not error out and no result set is sent. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 11/1/17 06:23, Pavel Stehule wrote: > We need to think about how the \timing option should work in such > scenarios. Right now it does > Has the total time sense in this case? > > should not be total time related to any fetched result? The \timing option in psql measures from the time you send off the statement until you get all the results back. I don't see a fundamental reason to change that if a statement happens to produce multiple results. The results already come over the write and are processed by libpq. So the time is already measured. It's just that psql doesn't print the non-last result sets. We don't have any way to measure from psql how long each individual result set took to compose. For that you will need deeper tools like EXPLAIN ANALYZE and some way to process that data. That is way beyond what \timing currently does. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 11/1/17 22:40, Robert Haas wrote: > That seems like it is at least arguably a wire protocol break. Today, > if you send a string containing only one command, you will only get > one answer. The wire protocol already supports this. And the wire protocol doesn't really know about statements, only about a command string that might contain multiple commands. So I don't think anything would break. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 11/2/17 16:40, Daniel Verite wrote: > But instead of having procedures not return anything, > couldn't they return whatever resultset(s) they want to > ("no resultset" being just a particular case of "anything"), > so that we could leave out cursors and simply write: We could in general design this any way we want. I'm just going by what's in the SQL standard and in existing implementations. > CREATE PROCEDURE test() > LANGUAGE plpgsql > AS $$ > RETURN QUERY EXECUTE 'SELECT 1 AS col1, 2 AS col2'; > END; > $$; > > Or is that not possible or not desirable? RETURN means the execution ends there, so how would you return multiple result sets? > Similarly, for the SQL language, I wonder if the above example > could be simplified to: > > CREATE PROCEDURE pdrstest1() > LANGUAGE SQL > AS $$ > SELECT * FROM cp_test2; > SELECT * FROM cp_test3; > $$; > by which the two result sets would go back to the client again > without declaring explicit cursors. > Currently, it does not error out and no result set is sent. But maybe you don't want to return all those results, so you'd need a way to designate which ones, e.g., AS $$ SELECT set_config('something', 'value'); SELECT * FROM interesting_table; -- return only this one SELECT set_config('something', 'oldvalue'); $$; -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Peter Eisentraut wrote: > > CREATE PROCEDURE test() > > LANGUAGE plpgsql > > AS $$ > > RETURN QUERY EXECUTE 'SELECT 1 AS col1, 2 AS col2'; > > END; > > $$; > > > > Or is that not possible or not desirable? > > RETURN means the execution ends there, so how would you return multiple > result sets? RETURN alone yes, but RETURN QUERY continues the execution, appending rows to the single result set of the function. In the case of a procedure, I guess each RETURN QUERY could generate an independant result set. > But maybe you don't want to return all those results, so you'd need a > way to designate which ones, e.g., > > AS $$ > SELECT set_config('something', 'value'); > SELECT * FROM interesting_table; -- return only this one > SELECT set_config('something', 'oldvalue'); > $$; Yes, in that case, lacking PERFORM in SQL, nothing simple comes to mind on how to return certain results and not others. But if it was in an SQL function, it wouldn't return the rows of "interesting_table" either. I think it would be justified to say to just use plpgsql for that kind of sequence. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Peter Eisentraut wrote: > There is also one need error that needs further investigation. I've looked at this bit in the regression tests about \gexec: --- a/src/test/regress/expected/psql.out +++ b/src/test/regress/expected/psql.out @@ -232,11 +232,7 @@ union allselect 'drop table gexec_test', 'select ''2000-01-01''::date as party_over'\gexecselect 1 asones - ones ------- - 1 -(1 row) - +ERROR: DECLARE CURSOR can only be used in transaction blocks This can be interpreted as two separate errors: * \gexec ignores the first result postgres=# select 'select 1','select 2' \gexec?column? ----------2 (1 row) * \gexec fails with FETCH_COUNT postgres=# \set FETCH_COUNT 1 postgres=# select 'select 1','select 2' \gexec ERROR: DECLARE CURSOR can only be used intransaction blocks ?column? ---------- 2 (1 row) The two issues are due to SendQuery() being reentered for the gexec'd queries when it hasn't finished yet with the main query. I believe that just collecting all results of \gexec before executing any of them would solve both errors. Also doing a bit more testing I've seen these other issues: * combining multiple result sets and FETCH_COUNT doesn't work: postgres=# \set FETCH_COUNT 1 postgres=# select 1 \; select 2; postgres=# * last error is not recorded for \errverbose : postgres=# select foo; ERROR: column "foo" does not exist LINE 1: select foo; ^ postgres=# \errverbose There is noprevious error. * memory leaks on PGResults. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
On Thu, Nov 16, 2017 at 8:27 PM, Daniel Verite <daniel@manitou-mail.org> wrote: > Peter Eisentraut wrote: > >> There is also one need error that needs further investigation. > > I've looked at this bit in the regression tests about \gexec: > > --- a/src/test/regress/expected/psql.out > +++ b/src/test/regress/expected/psql.out > @@ -232,11 +232,7 @@ union all > select 'drop table gexec_test', 'select ''2000-01-01''::date as party_over' > \gexec > select 1 as ones > - ones > ------- > - 1 > -(1 row) > - > +ERROR: DECLARE CURSOR can only be used in transaction blocks > > This can be interpreted as two separate errors: > > * \gexec ignores the first result > > postgres=# select 'select 1','select 2' \gexec > ?column? > ---------- > 2 > (1 row) > > * \gexec fails with FETCH_COUNT > > postgres=# \set FETCH_COUNT 1 > postgres=# select 'select 1','select 2' \gexec > ERROR: DECLARE CURSOR can only be used in transaction blocks > ?column? > ---------- > 2 > (1 row) > > The two issues are due to SendQuery() being reentered > for the gexec'd queries when it hasn't finished yet with the > main query. > I believe that just collecting all results of \gexec before > executing any of them would solve both errors. > > Also doing a bit more testing I've seen these other issues: > > * combining multiple result sets and FETCH_COUNT doesn't work: > > postgres=# \set FETCH_COUNT 1 > postgres=# select 1 \; select 2; > postgres=# > > > * last error is not recorded for \errverbose : > > postgres=# select foo; > ERROR: column "foo" does not exist > LINE 1: select foo; > ^ > postgres=# \errverbose > There is no previous error. > > * memory leaks on PGResults. Peter, are you planning to answer to those complains? For now I am switching the patch as returned with feedback as this thread has no activity for two weeks. -- Michael