Thread: dynamic result sets support in extended query protocol
I want to progress work on stored procedures returning multiple result sets.  Examples of how this could work on the SQL side have previously been shown [0].  We also have ongoing work to make psql show multiple result sets [1].  This appears to work fine in the simple query protocol.  But the extended query protocol doesn't support multiple result sets at the moment [2].  Supporting it there would be desirable in order to use parameter binding, and also because one of the higher-level goals is to support the use case of stored procedures returning multiple result sets via JDBC.

[0]: https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
[1]: https://commitfest.postgresql.org/29/2096/
[2]: https://www.postgresql.org/message-id/9507.1534370765%40sss.pgh.pa.us

(Terminology: I'm calling this project "dynamic result sets", which includes several concepts: 1) multiple result sets; 2) those result sets can have different structures; 3) the structure of the result sets is decided at run time, not declared in the schema, procedure definition, etc.)

One possibility I rejected was to invent a third query protocol besides the simple and extended ones.  This wouldn't really match the requirements of JDBC and similar APIs, because their APIs for sending queries don't indicate whether dynamic result sets are expected or required; you only indicate that later by how you process the result sets.  So we really need to use the existing ways of sending off the queries.  Also, avoiding a third query protocol is probably desirable in general, to avoid extra code and APIs.

So here is my sketch of how this functionality could be woven into the extended query protocol.  I'll go through how the existing protocol exchange works and then point out the additions that I have in mind.

These additions could be enabled by a _pq_ startup parameter sent by the client.
Alternatively, it might also work without that, because the client would just reject protocol messages it doesn't understand, but that's probably less desirable behavior.

So here is how it goes:

C: Parse
S: ParseComplete

At this point, the server would know whether the statement it has parsed can produce dynamic result sets.  For a stored procedure, this would be declared with the procedure definition, so when the CALL statement is parsed, this can be noticed.  I don't actually plan any other cases, but for the sake of discussion, perhaps some variant of EXPLAIN could also return multiple result sets, and that could also be detected from parsing the EXPLAIN invocation.

At this point a client would usually do

C: Describe (statement)
S: ParameterDescription
S: RowDescription

New would be that the server would now also respond with a new message, say,

S: DynamicResultInfo

that indicates that dynamic result sets will follow later.  The message would otherwise be empty.  (We could perhaps include the number of result sets, but this might not actually be useful, and perhaps it's better not to spend effort on counting things that don't need to be counted.)

(If we don't guard this by a _pq_ startup parameter from the client, an old client would now error out because of an unexpected protocol message.)

Now the normal bind and execute sequence follows:

C: Bind
S: BindComplete
(C: Describe (portal))
(S: RowDescription)
C: Execute
S: ... (DataRows)
S: CommandComplete

In the case of a CALL with output parameters, this "primary" result set contains one row with the output parameters (existing behavior).

Now, if the client has seen DynamicResultInfo earlier, it should go into a new subsequence to get the remaining result sets, like this (naming obviously to be refined):

C: NextResult
S: NextResultReady
C: Describe (portal)
S: RowDescription
C: Execute
....
S: CommandComplete
C: NextResult
...
C: NextResult
S: NoNextResult
C: Sync
S: ReadyForQuery

I think this would all have to use the unnamed portal, but perhaps there could be other uses with named portals.  Some details to be worked out.

One could perhaps also do without the DynamicResultInfo message and just put extra information into the CommandComplete message indicating "there are more result sets after this one".

(Following the model from the simple query protocol, CommandComplete really means one result set complete, not the whole top-level command.  ReadyForQuery means the whole command is complete.  This is perhaps debatable, and interesting questions could also arise when considering what should happen in the simple query protocol when a query string consists of multiple commands each returning multiple result sets.  But it doesn't really seem sensible to cater to that.)

One thing that's missing in this sequence is a way to specify the desired output format (text/binary) for each result set.  This could be added to the NextResult message, but at that point the client doesn't yet know the number of columns in the result set, so we could only do it globally.  Then again, since the result sets are dynamic, it's less likely that a client would be coded to set per-column output codes.  Then again, I would hate to bake such a restriction into the protocol, because someone is going to try.  (I suspect what would be more useful in practice is to designate output formats per data type.)  So if we wanted to have this fully featured, it might have to look something like this:

C: NextResult
S: NextResultReady
C: Describe (dynamic) (new message subkind)
S: RowDescription
C: Bind (zero parameters, optionally format codes)
S: BindComplete
C: Describe (portal)
S: RowDescription
C: Execute
...

While this looks more complicated, client libraries could reuse existing code that starts processing with a Bind message and continues to CommandComplete, and then just loops back around.
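To make the shape of this loop concrete, here is a small, self-contained Java sketch that drives the proposed NextResult subsequence against a scripted queue of server replies.  The message names (NextResultReady, NoNextResult, etc.) are the placeholder names from the sketch above, not part of any implemented protocol, and the class itself is purely illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class DynamicResultLoop {
    // Hypothetical message tags taken from the naming sketch in this thread.
    enum Msg { NEXT_RESULT_READY, ROW_DESCRIPTION, DATA_ROW, COMMAND_COMPLETE, NO_NEXT_RESULT }

    // Drive the NextResult subsequence against scripted server replies:
    // keep "sending" NextResult until the server answers NoNextResult,
    // and count how many dynamic result sets were consumed.
    static int consumeDynamicResults(Deque<Msg> server) {
        int resultSets = 0;
        while (true) {
            // C: NextResult (the client's request is implicit here; we
            // only model the server's scripted replies)
            Msg reply = server.poll();
            if (reply == Msg.NO_NEXT_RESULT) {
                return resultSets; // C: Sync / S: ReadyForQuery would follow
            }
            if (reply != Msg.NEXT_RESULT_READY) {
                throw new IllegalStateException("expected NextResultReady, got " + reply);
            }
            // C: Describe (portal), answered by S: RowDescription
            if (server.poll() != Msg.ROW_DESCRIPTION) {
                throw new IllegalStateException("expected RowDescription");
            }
            // C: Execute, answered by zero or more DataRows and CommandComplete
            while (server.peek() == Msg.DATA_ROW) {
                server.poll();
            }
            if (server.poll() != Msg.COMMAND_COMPLETE) {
                throw new IllegalStateException("expected CommandComplete");
            }
            resultSets++;
        }
    }

    public static void main(String[] args) {
        Deque<Msg> server = new ArrayDeque<>(List.of(
                Msg.NEXT_RESULT_READY, Msg.ROW_DESCRIPTION,
                Msg.DATA_ROW, Msg.DATA_ROW, Msg.COMMAND_COMPLETE,
                Msg.NEXT_RESULT_READY, Msg.ROW_DESCRIPTION,
                Msg.DATA_ROW, Msg.COMMAND_COMPLETE,
                Msg.NO_NEXT_RESULT));
        System.out.println(consumeDynamicResults(server)); // prints 2
    }
}
```

The point of the simulation is only the control flow a client library would need: repeat NextResult until NoNextResult arrives, handling each intervening result set with the existing Describe/Execute machinery.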
The mapping of this to libpq in a simple case could look like this:

PQsendQueryParams(conn, "CALL ...", ...);
PQgetResult(...);   // gets output parameters
PQnextResult(...);  // new: sends NextResult+Bind
PQgetResult(...);   // and repeat

Again, it's not clear here how to declare the result column output formats.  Since libpq doesn't appear to expose the Bind message separately, I'm not sure what to do here.

In JDBC, the NextResult message would correspond to the Statement.getMoreResults() method.  It will need a bit of conceptual adjustment, because the first result set sent on the protocol is actually the output parameters, which the JDBC API returns separately from a ResultSet.  So the initial CallableStatement.execute() call will need to process the primary result set and then send NextResult and obtain the first dynamic result as the first ResultSet for its API, but that can be handled internally.

Thoughts so far?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
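As a toy illustration of that JDBC-side adjustment, here is a Java sketch of how a driver could hide the primary OUT-parameter result set behind the JDBC surface.  This is not pgjdbc code: everything except the JDBC-style method names (execute, getMoreResults, getResultSet) is invented for illustration, and result sets are modeled as plain string lists.

```java
import java.util.List;

// Toy model of the adjustment: the first protocol result set carries the
// OUT parameters, so execute() consumes it internally (as if the driver
// had already sent NextResult), and getMoreResults()/getResultSet() only
// ever expose the dynamic result sets.
public class CallModel {
    private final List<List<String>> protocolResults; // index 0 = OUT parameters
    private List<String> outParameters;
    private int cursor; // index of the current dynamic result set

    CallModel(List<List<String>> protocolResults) {
        this.protocolResults = protocolResults;
    }

    // Like CallableStatement.execute(): read the primary (OUT-parameter)
    // result, then position on the first dynamic result set.  Returns
    // whether a dynamic result set is available.
    boolean execute() {
        outParameters = protocolResults.get(0);
        cursor = 1;
        return cursor < protocolResults.size();
    }

    List<String> getOutParameters() {
        return outParameters;
    }

    List<String> getResultSet() {
        return protocolResults.get(cursor);
    }

    // Like Statement.getMoreResults(): advance to the next dynamic result
    // set, returning whether one exists.
    boolean getMoreResults() {
        cursor++;
        return cursor < protocolResults.size();
    }
}
```

With this shape, application code sees the familiar execute()/getMoreResults() loop while the OUT parameters stay available separately, matching the conceptual adjustment described above.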
Are you proposing to bump up the protocol version (either major or minor)?  I am asking because it seems you are going to introduce some new message types.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On 2020-10-08 10:23, Tatsuo Ishii wrote:
> Are you proposing to bump up the protocol version (either major or
> minor)?  I am asking because it seems you are going to introduce some
> new message types.

It wouldn't be a new major version.  It could either be a new minor version, or it would be guarded by a _pq_ protocol message to enable this functionality from the client, as described.  Or both?  We haven't done this sort of thing a lot, so some discussion on the details might be necessary.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/8/20 3:46 AM, Peter Eisentraut wrote:
> [...]
> Thoughts so far?

Exciting stuff.  But I'm a bit concerned about the sequence of result sets.  The JDBC docco for CallableStatement says:

    A CallableStatement can return one ResultSet object or multiple
    ResultSet objects.  Multiple ResultSet objects are handled using
    operations inherited from Statement.

    For maximum portability, a call's ResultSet objects and update
    counts should be processed prior to getting the values of output
    parameters.
And this is more or less in line with the pattern that I've seen when converting SPs from other systems - the OUT params are usually set at the end with things like status flags and error messages.

If the OUT parameter resultset has to come first (which is how I read your proposal - please correct me if I'm wrong) we'll have to stack up all the resultsets until the SP returns, then send the OUT params, then send the remaining resultsets.  That seems ... suboptimal.  The alternative would be to send the OUT params last.  That might result in the driver needing to do some lookahead and caching, but I don't think it's unmanageable.  Of course, your protocol would also need changing.

cheers

andrew

--
Andrew Dunstan
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, 9 Oct 2020 at 13:33, Andrew Dunstan <andrew@dunslane.net> wrote:
On 10/8/20 3:46 AM, Peter Eisentraut wrote:
> [...]
Currently the JDBC driver does NOT do:

> At this point a client would usually do
>
> C: Describe (statement)
> S: ParameterDescription
> S: RowDescription

We do not do the Describe until we use a named statement and decide that the extra round trip is worth it.  Making this assumption will cause a performance regression on all queries.
If we are going to make a protocol change, there are a number of other things the drivers want: https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md
Thanks,
Dave
Hi,
On 2020-10-08 09:46:38 +0200, Peter Eisentraut wrote:
> New would be that the server would now also respond with a new message, say,
>
> S: DynamicResultInfo
> Now, if the client has seen DynamicResultInfo earlier, it should now go into
> a new subsequence to get the remaining result sets, like this (naming
> obviously to be refined):
Hm. Isn't this going to be a lot more latency sensitive than we'd like?
This would basically require at least one additional roundtrip for
everything that *potentially* could return multiple result sets, even if
no additional results are returned, right? And it'd add at least one
additional roundtrip for every result set that's actually sent.
Is there really a good reason for forcing the client to issue
NextResult, Describe, Execute for each of the dynamic result sets? It's
not like there's really a case for allowing the clients to skip them,
right? Why aren't we sending something more like
S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
...
S: CommandComplete
C: Sync
gated by a _pq_ parameter, of course.
> I think this would all have to use the unnamed portal, but perhaps there
> could be other uses with named portals. Some details to be worked out.
Which'd avoid this too, but:
> One thing that's missing in this sequence is a way to specify the desired
> output format (text/binary) for each result set.
Is a good point. I personally think avoiding the back and forth is more
important though. But if we could address both at the same time...
> (I suspect what would be more useful in practice is to designate
> output formats per data type.)
Yea, that'd be *really* useful. It sucks that we basically require
multiple round trips to make realistic use of the binary data for the
few types where it's a huge win (e.g. bytea).
Greetings,
Andres Freund
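The single-roundtrip stream proposed here is straightforward to consume client-side. A toy sketch (Python, purely illustrative; not libpq code): since a RowDescription follows each delimiter, the client can treat every RowDescription ('T') as the start of a new result set, accumulate DataRows ('D') into the current one, and stop at CommandComplete ('C'), with no extra request messages between sets.

```python
# Toy sketch of a client consuming the proposed multi-result stream.
# Tags follow the wire protocol's one-letter message types:
# 'T' = RowDescription, 'D' = DataRow, 'C' = CommandComplete.

def split_result_sets(stream):
    """stream: iterable of (tag, payload) pairs as read off the socket."""
    result_sets = []
    for tag, payload in stream:
        if tag == 'T':                      # new result set begins
            result_sets.append({'desc': payload, 'rows': []})
        elif tag == 'D':                    # row belongs to the current set
            result_sets[-1]['rows'].append(payload)
        elif tag == 'C':                    # whole top-level command is done
            break
    return result_sets

stream = [
    ('T', 'desc1'), ('D', 'r1'), ('D', 'r2'),
    ('T', 'desc2'), ('D', 'r3'),
    ('C', 'CALL'),
]
sets = split_result_sets(stream)
assert [len(s['rows']) for s in sets] == [2, 1]
```

The point of the sketch is that the client never has to write anything between result sets; the single Sync at the end covers the whole exchange.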
Hi,
On Fri, 9 Oct 2020 at 14:46, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-10-08 09:46:38 +0200, Peter Eisentraut wrote:
> New would be that the server would now also respond with a new message, say,
>
> S: DynamicResultInfo
> Now, if the client has seen DynamicResultInfo earlier, it should now go into
> a new subsequence to get the remaining result sets, like this (naming
> obviously to be refined):
Hm. Isn't this going to be a lot more latency sensitive than we'd like?
This would basically require at least one additional roundtrip for
everything that *potentially* could return multiple result sets, even if
no additional results are returned, right? And it'd add at least one
additional roundtrip for every result set that's actually sent.
Agreed as mentioned.
Is there really a good reason for forcing the client to issue
NextResult, Describe, Execute for each of the dynamic result sets? It's
not like there's really a case for allowing the clients to skip them,
right? Why aren't we sending something more like
S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
S: CommandPartiallyComplete
S: RowDescription
S: DataRow...
...
S: CommandComplete
C: Sync
gated by a _pq_ parameter, of course.
> I think this would all have to use the unnamed portal, but perhaps there
> could be other uses with named portals. Some details to be worked out.
Which'd avoid this too, but:
> One thing that's missing in this sequence is a way to specify the desired
> output format (text/binary) for each result set.
Is a good point. I personally think avoiding the back and forth is more
important though. But if we could address both at the same time...
> (I suspect what would be more useful in practice is to designate
> output formats per data type.)
Yea, that'd be *really* useful. It sucks that we basically require
multiple round trips to make realistic use of the binary data for the
few types where it's a huge win (e.g. bytea).
Yes!!! Ideally in the startup message.
Dave
Hi,
On 2020-10-09 14:49:11 -0400, Dave Cramer wrote:
> On Fri, 9 Oct 2020 at 14:46, Andres Freund <andres@anarazel.de> wrote:
> > > (I suspect what would be more useful in practice is to designate
> > > output formats per data type.)
> >
> > Yea, that'd be *really* useful. It sucks that we basically require
> > multiple round trips to make realistic use of the binary data for the
> > few types where it's a huge win (e.g. bytea).
>
> Yes!!! Ideally in the startup message.
I don't think startup is a good choice. For one, it's size limited. But
more importantly, before having successfully established a connection,
there's really no way the driver can know which types it should list as
to be sent in binary (consider e.g. some postgis types, which'd greatly
benefit from being sent in binary, but also just version dependent
stuff).
The hard part around this really is whether and how to deal with changes
in type definitions. From types just being created - comparatively
simple - to extensions being dropped and recreated, with oids
potentially being reused.
Greetings,
Andres Freund
On Fri, 9 Oct 2020 at 14:59, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-10-09 14:49:11 -0400, Dave Cramer wrote:
> On Fri, 9 Oct 2020 at 14:46, Andres Freund <andres@anarazel.de> wrote:
> > > (I suspect what would be more useful in practice is to designate
> > > output formats per data type.)
> >
> > Yea, that'd be *really* useful. It sucks that we basically require
> > multiple round trips to make realistic use of the binary data for the
> > few types where it's a huge win (e.g. bytea).
> >
>
> Yes!!! Ideally in the startup message.
I don't think startup is a good choice. For one, it's size limited. But
more importantly, before having successfully established a connection,
there's really no way the driver can know which types it should list as
to be sent in binary (consider e.g. some postgis types, which'd greatly
benefit from being sent in binary, but also just version dependent
stuff).
For the most part we know exactly which types we want in binary for 99% of queries.
The hard part around this really is whether and how to deal with changes
in type definitions. From types just being created - comparatively
simple - to extensions being dropped and recreated, with oids
potentially being reused.
Fair point, but this is going to be much more complex than just sending most of the results in binary, which would speed up the overwhelming majority of queries.
Dave Cramer
On 2020-10-09 21:02, Dave Cramer wrote:
> For the most part we know exactly which types we want in binary for 99%
> of queries.
>
> The hard part around this really is whether and how to deal with changes
> in type definitions. From types just being created - comparatively
> simple - to extensions being dropped and recreated, with oids
> potentially being reused.
>
> Fair point but this is going to be much more complex than just sending
> most of the results in binary which would speed up the overwhelming
> majority of queries
I've been studying in more detail how the JDBC driver handles binary
format use. Having some kind of message "use binary for these types"
would match its requirements quite exactly. (I have also studied
npgsql, but it appears to work quite differently. More input from there
and other places with similar requirements would be welcome.) The
question as mentioned above is how to deal with type changes. Let's
work through a couple of options.
We could send the type/format list with every query. For example, we
could extend/enhance/alter the Bind message so that instead of a
format-per-column it sends a format-per-type. But then you'd need to
send the complete type list every time. The JDBC driver currently has
20+ types already hardcoded and more optionally, so you'd send 100+
bytes for every query, plus required effort for encoding and decoding.
That seems unattractive.
Or we send the type/format list once near the beginning of the session.
Then we need to deal with types being recreated or updated etc.
The first option is that we "lock" the types against changes (ignoring
whether that's actually possible right now). That would mean you
couldn't update an affected type/extension while a JDBC session is
active. That's no good. (Imagine connection pools with hours of server
lifetime.)
Another option is that we invalidate the session when a thus-registered
type changes. Also no good. (We don't want an extension upgrade
suddenly breaking all open connections.)
Finally, we could do it on a best-effort basis. We use binary format
for registered types, until there is some invalidation event for the
type, at which point we revert to default/text format until the end of a
session (or until another protocol message arrives re-registering the
type). This should work, because the result row descriptor contains the
actual format type, and there is no guarantee that it's the same one
that was requested.
So how about that last option? I imagine a new protocol message, say,
TypeFormats, that contains a number of type/format pairs. The message
would typically be sent right after the first ReadyForQuery, gets no
response. It could also be sent at any other time, but I expect that to
be less used in practice. Binary format is used for registered types if
they have binary format support functions, otherwise text continues to
be used. There is no error response for types without binary support.
(There should probably be an error response for registering a type that
does not exist.)
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
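As a thought experiment, a TypeFormats message along these lines could follow the layout conventions of existing protocol messages: a one-byte message type, an int32 length (counting itself), an int16 pair count, then (int32 type OID, int16 format code) pairs. Everything here is invented for illustration: the '?' type byte is a placeholder (no such message exists in the protocol today).

```python
import struct

# Hypothetical wire encoding for the proposed TypeFormats message.
# Layout mimics existing protocol messages: type byte, int32 length
# (including itself), int16 pair count, then (oid, format) pairs.
# The b'?' type byte is a pure placeholder, not a real message type.

def encode_type_formats(pairs):
    body = struct.pack('!h', len(pairs))
    for oid, fmt in pairs:                 # fmt: 0 = text, 1 = binary
        body += struct.pack('!ih', oid, fmt)
    return b'?' + struct.pack('!i', len(body) + 4) + body

# e.g. ask for bytea (OID 17) and int8 (OID 20) in binary:
msg = encode_type_formats([(17, 1), (20, 1)])
assert len(msg) == 1 + 4 + 2 + 2 * 6       # 19 bytes for two registrations
```

At six bytes per registered type, even the 20+ types the JDBC driver hardcodes would cost on the order of 130 bytes once per session, versus the 100+ bytes per query of the format-per-type Bind variant discussed above.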
On Tue, 20 Oct 2020 at 05:57, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2020-10-09 21:02, Dave Cramer wrote:
> For the most part we know exactly which types we want in binary for 99%
> of queries.
>
> The hard part around this really is whether and how to deal with changes
> in type definitions. From types just being created - comparatively
> simple - to extensions being dropped and recreated, with oids
> potentially being reused.
>
>
> Fair point but this is going to be much more complex than just sending
> most of the results in binary which would speed up the overwhelming
> majority of queries
I've been studying in more detail how the JDBC driver handles binary
format use. Having some kind of message "use binary for these types"
would match its requirements quite exactly. (I have also studied
npgsql, but it appears to work quite differently. More input from there
and other places with similar requirements would be welcome.) The
question as mentioned above is how to deal with type changes. Let's
work through a couple of options.
I've added Vladimir (pgjdbc), Shay (npgsql) and Mark Paluch (r2dbc) to this discussion.
I'm sure there are others but I'm not acquainted with them
We could send the type/format list with every query. For example, we
could extend/enhance/alter the Bind message so that instead of a
format-per-column it sends a format-per-type. But then you'd need to
send the complete type list every time. The JDBC driver currently has
20+ types already hardcoded and more optionally, so you'd send 100+
bytes for every query, plus required effort for encoding and decoding.
That seems unattractive.
Or we send the type/format list once near the beginning of the session.
Then we need to deal with types being recreated or updated etc.
The first option is that we "lock" the types against changes (ignoring
whether that's actually possible right now). That would mean you
couldn't update an affected type/extension while a JDBC session is
active. That's no good. (Imagine connection pools with hours of server
lifetime.)
Another option is that we invalidate the session when a thus-registered
type changes. Also no good. (We don't want an extension upgrade
suddenly breaking all open connections.)
Agreed the first 2 options are not viable.
Finally, we could do it on a best-effort basis. We use binary format
for registered types, until there is some invalidation event for the
type, at which point we revert to default/text format until the end of a
session (or until another protocol message arrives re-registering the
type).
Does the driver tell the server what registered types it wants in binary?
This should work, because the result row descriptor contains the
actual format type, and there is no guarantee that it's the same one
that was requested.
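This "trust the row descriptor" behavior is cheap to implement client-side, since the per-column format code is already part of every RowDescription. A sketch of pulling those codes out of a RowDescription message body (field layout per the existing protocol documentation: name as a C string, then table OID, column number, type OID, type size, type modifier, and format code):

```python
import struct

# Sketch: extract the per-column format codes from a RowDescription body,
# so the client decodes each column by what the server actually chose
# (0 = text, 1 = binary), not by what it registered earlier.

def row_description_formats(body):
    nfields, = struct.unpack_from('!h', body, 0)
    offset, formats = 2, []
    for _ in range(nfields):
        end = body.index(b'\x00', offset)   # field name is a NUL-terminated string
        offset = end + 1
        # table OID, attnum, type OID, typlen, typmod, format code
        _tbl, _att, _typ, _len, _mod, fmt = struct.unpack_from('!ihihih', body, offset)
        offset += 18
        formats.append(fmt)
    return formats

# demo body: two columns, text (oid 25) then bytea (oid 17) in binary
demo = struct.pack('!h', 2)
demo += b'a\x00' + struct.pack('!ihihih', 0, 0, 25, -1, -1, 0)
demo += b'b\x00' + struct.pack('!ihihih', 0, 0, 17, -1, -1, 1)
assert row_description_formats(demo) == [0, 1]
```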
So how about that last option? I imagine a new protocol message, say,
TypeFormats, that contains a number of type/format pairs. The message
would typically be sent right after the first ReadyForQuery, gets no
response.
This seems a bit hard to control. How long do you wait for no response?
It could also be sent at any other time, but I expect that to
be less used in practice. Binary format is used for registered types if
they have binary format support functions, otherwise text continues to
be used. There is no error response for types without binary support.
(There should probably be an error response for registering a type that
does not exist.)
I'm not sure we (pgjdbc) want all types with binary support functions sent automatically. It turns out that decoding binary is sometimes slower than decoding the text, and the on-wire overhead isn't significant. Timestamps/dates with timezone are also interesting, as the binary output does not include the timezone.
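On the timezone point: in binary format, timestamptz is transmitted as a single int64 of microseconds since 2000-01-01 00:00:00 UTC (with the default integer_datetimes), so the zone offset that the text format renders from the session's TimeZone setting is simply not on the wire. A minimal decoder sketch:

```python
import struct
import datetime

# PostgreSQL's binary timestamptz: int64 microseconds since the Postgres
# epoch, 2000-01-01 00:00:00 UTC. The client receives a UTC instant and
# must apply any display timezone itself.

PG_EPOCH = datetime.datetime(2000, 1, 1, tzinfo=datetime.timezone.utc)

def decode_timestamptz(buf):
    micros, = struct.unpack('!q', buf)
    return PG_EPOCH + datetime.timedelta(microseconds=micros)

assert decode_timestamptz(struct.pack('!q', 0)) == PG_EPOCH
```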
The notion of a status change message is appealing however. I used the term status change on purpose as there are other server changes we would like to be made aware of. For instance if someone changes the search path, we would like to know. I'm sort of expanding the scope here but if we are imagining ... :)
Dave
Very interesting conversation, thanks for including me Dave. Here are some thoughts from the Npgsql perspective,
Re the binary vs. text discussion... A long time ago, Npgsql became a "binary-only" driver, meaning that it never sends or receives values in text encoding, and practically always uses the extended protocol. This was because in most (all?) cases, encoding/decoding binary is more efficient, and maintaining two encoders/decoders (one for text, one for binary) made less and less sense. So by default, Npgsql just requests "all binary" in all Bind messages it sends (there's an API for the user to request text, in which case they get pure strings which they're responsible for parsing). Binary handling is implemented for almost all PG types which support it, and I've hardly seen any complaints about this for the last few years. I'd be interested in any arguments against this decision (Dave, when have you seen that decoding binary is slower than decoding text?).
Given the above, allowing the client to specify in advance which types should be in binary sounds good, but wouldn't help Npgsql much (since by default it already requests binary for everything). It would slightly help in allowing binary-unsupported types to automatically come back as text without manual user API calls, but as I wrote above this is an extremely rare scenario that people don't care much about.
> Is there really a good reason for forcing the client to issue NextResult, Describe, Execute for each of the dynamic result sets?
I very much agree - it should be possible to execute a procedure and consume all results in a single roundtrip, otherwise this is quite a perf killer.
Peter, from your original message:
> Following the model from the simple query protocol, CommandComplete really means one result set complete, not the whole top-level command. ReadyForQuery means the whole command is complete. This is perhaps debatable, and interesting questions could also arise when considering what should happen in the simple query protocol when a query string consists of multiple commands each returning multiple result sets. But it doesn't really seem sensible to cater to that
Npgsql implements batching of multiple statements via the extended protocol in a similar way. In other words, the .NET API allows users to pack multiple SQL statements and execute them in one roundtrip, and Npgsql does this by sending Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync. So CommandComplete signals completion of a single statement in the batch, whereas ReadyForQuery signals completion of the entire batch. This means that the "interesting questions" mentioned above are possibly relevant to the extended protocol as well.
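The batch shape described here can be sketched as a client-side message queue (illustrative only; a real driver also tracks statement/portal names, parameter formats, and result handling):

```python
# Sketch: build the extended-protocol message queue for a batch, per the
# Parse/Bind/Describe/Execute-per-statement, single-Sync pattern above.
# Statement/portal naming and parameters are omitted for brevity.

def batch_messages(statements):
    queue = []
    for sql in statements:
        queue += [('Parse', sql), ('Bind', sql), ('Describe', sql), ('Execute', sql)]
    queue.append(('Sync', None))   # one Sync => one roundtrip, one ReadyForQuery
    return queue

msgs = batch_messages(["SELECT 1", "SELECT 2"])
assert msgs[-1] == ('Sync', None)
```

The server then sends one CommandComplete per statement and a single ReadyForQuery for the batch, which is exactly the CommandComplete/ReadyForQuery split discussed above.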
Regarding decoding binary vs text performance: There can be a significant performance cost to fetching the binary format over the text format for types such as text. See https://www.postgresql.org/message-id/CAMovtNoHFod2jMAKQjjxv209PCTJx5Kc66anwWvX0mEiaXwgmA%40mail.gmail.com for the previous discussion.
From the pgx driver (https://github.com/jackc/pgx) perspective:
A "use binary for these types" message sent once at the beginning of the session would not only be helpful for dynamic result sets but could simplify use of the extended protocol in general.
Upthread someone posted a page pgjdbc detailing desired changes to the backend protocol (https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md). I concur with almost everything there, but in particular the first suggestion of the backend automatically converting binary values like it does text values would be huge. That combined with the "use binary for these types" message could greatly simplify the driver side work in using the binary format.
CommandComplete vs ReadyForQuery -- pgx does the same as Npgsql in that it bundles batches multiple queries together in the extended protocol and uses CommandComplete for statement completion and ReadyForQuery for batch completion.
On Tue, Oct 20, 2020 at 9:28 AM Shay Rojansky <roji@roji.org> wrote:
Hi,
On 2020-10-20 18:55:41 -0500, Jack Christensen wrote:
> Upthread someone posted a page pgjdbc detailing desired changes to the
> backend protocol (
> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md).
A lot of the stuff on there seems way beyond what can be achieved in
something incrementally added to the protocol. Fair enough in an article
about "v4" of the protocol. But I don't think we are - nor should we be
- talking about a full new protocol version here. Instead we are talking
about extending the protocol, where the extensions are opt-in.
Greetings,
Andres Freund
On Tue, 20 Oct 2020 at 20:09, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-10-20 18:55:41 -0500, Jack Christensen wrote:
> Upthread someone posted a page pgjdbc detailing desired changes to the
> backend protocol (
> https://github.com/pgjdbc/pgjdbc/blob/master/backend_protocol_v4_wanted_features.md).
A lot of the stuff on there seems way beyond what can be achieved in
something incrementally added to the protocol. Fair enough in an article
about "v4" of the protocol. But I don't think we are - nor should we be
- talking about a full new protocol version here. Instead we are talking
about extending the protocol, where the extensions are opt-in.
You are correct, we are not talking about a whole new protocol, but why not?
Seems to me we would have a lot more latitude to get it right if we didn't have this limitation.
Dave
Hi,
On 2020-10-20 20:17:45 -0400, Dave Cramer wrote:
> You are correct we are not talking about a whole new protocol, but why not?
> Seems to me we would have a lot more latitude to get it right if we didn't
> have this limitation.
A new protocol will face a much bigger adoption hurdle, and there's much
stuff that we'll want to do that we'll have a hard time ever getting off
the ground. Whereas opt-in extensions are much easier to get off the
ground.
Greetings,
Andres Freund
On 2020-10-20 12:24, Dave Cramer wrote:
> Finally, we could do it on a best-effort basis. We use binary format
> for registered types, until there is some invalidation event for the
> type, at which point we revert to default/text format until the end of a
> session (or until another protocol message arrives re-registering the
> type).
>
> Does the driver tell the server what registered types it wants in binary?
Yes, the driver tells the server, "whenever you send these types, send
them in binary" (all other types keep sending in text).
> This should work, because the result row descriptor contains the
> actual format type, and there is no guarantee that it's the same one
> that was requested.
>
> So how about that last option? I imagine a new protocol message, say,
> TypeFormats, that contains a number of type/format pairs. The message
> would typically be sent right after the first ReadyForQuery, gets no
> response.
>
> This seems a bit hard to control. How long do you wait for no response?
In this design, you don't need a response.
> It could also be sent at any other time, but I expect that to
> be less used in practice. Binary format is used for registered types if
> they have binary format support functions, otherwise text continues to
> be used. There is no error response for types without binary support.
> (There should probably be an error response for registering a type that
> does not exist.)
>
> I'm not sure we (pgjdbc) want all types with binary support functions
> sent automatically. Turns out that decoding binary is sometimes slower
> than decoding the text and the on wire overhead isn't significant.
> Timestamps/dates with timezone are also interesting as the binary output
> does not include the timezone.
In this design, you pick the types you want.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-10-09 20:46, Andres Freund wrote:
> Is there really a good reason for forcing the client to issue
> NextResult, Describe, Execute for each of the dynamic result sets? It's
> not like there's really a case for allowing the clients to skip them,
> right? Why aren't we sending something more like
>
> S: CommandPartiallyComplete
> S: RowDescription
> S: DataRow...
> S: CommandPartiallyComplete
> S: RowDescription
> S: DataRow...
> ...
> S: CommandComplete
> C: Sync
I want to post my current patch, to keep this discussion moving. There
are still a number of pieces to pull together, but what I have is a
self-contained functioning prototype.
The interesting thing about the above message sequence is that the
"CommandPartiallyComplete" isn't actually necessary. Since an Execute
message normally does not issue a RowDescription response, the
appearance of one is already enough to mark the beginning of a new
result set. Moreover, libpq already handles this correctly, so we
wouldn't need to change it at all. We might still want to add a new
protocol message, for clarity perhaps, and that would probably only be a
few lines of code on either side, but that would only serve for
additional error checking and wouldn't actually be needed to identify
what's going on.
What else we need:
- Think about what should happen if the Execute message specifies a row
count, and what should happen during subsequent Execute messages on the
same portal. I suspect that there isn't a particularly elegant answer,
but we need to pick some behavior.
- Some way for psql to display multiple result sets. Proposals have
been made in [0] and [1]. (You need either patch or one like it for the
regression tests in this patch to pass.)
- Session-level default result formats setting, proposed in [2]. Not
strictly necessary, but it would be most sensible to coordinate these two.
- We don't have a way to test the extended query protocol. I have
attached my test program, but we might want to think about something
more permanent. Proposals for this have already been made in [3].
- Right now, this only supports returning dynamic result sets from a
top-level CALL. Specifications for passing dynamic result sets from one
procedure to a calling procedure exist in the SQL standard and could be
added later. (All the SQL additions in this patch are per SQL standard.
DB2 appears to be the closest existing implementation.)
[0]: https://www.postgresql.org/message-id/flat/4580ff7b-d610-eaeb-e06f-4d686896b93b%402ndquadrant.com
[1]: https://commitfest.postgresql.org/29/2096/
[2]: https://commitfest.postgresql.org/31/2812/
[3]: https://www.postgresql.org/message-id/4f733cca-5e07-e167-8b38-05b5c9066d04%402ndQuadrant.com
--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Attachment
Here is an updated patch with some merge conflicts resolved, to keep it fresh. It's still pending in the commit fest from last time. My focus right now is to work on the "psql - add SHOW_ALL_RESULTS option" patch (https://commitfest.postgresql.org/33/2096/) first, which is pretty much a prerequisite to this one. The attached patch set contains a minimal variant of that patch in 0001 and 0002, just to get this working, but disregard those for the purposes of code review. The 0003 patch contains comprehensive documentation and test changes that can explain the feature in its current form.
Attachment
On Tue, Jun 29, 2021 at 7:10 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> Here is an updated patch with some merge conflicts resolved, to keep it
> fresh. It's still pending in the commit fest from last time.
>
> My focus right now is to work on the "psql - add SHOW_ALL_RESULTS
> option" patch (https://commitfest.postgresql.org/33/2096/) first, which
> is pretty much a prerequisite to this one. The attached patch set
> contains a minimal variant of that patch in 0001 and 0002, just to get
> this working, but disregard those for the purposes of code review.
>
> The 0003 patch contains comprehensive documentation and test changes
> that can explain the feature in its current form.
One of the patches, v3-0003-Dynamic-result-sets-from-procedures.patch,
does not apply on HEAD; please post an updated patch for it:
Hunk #1 FAILED at 57.
1 out of 1 hunk FAILED -- saving rejects to file
src/include/commands/defrem.h.rej
Regards,
Vignesh
rebased patch set
On 22.07.21 08:06, vignesh C wrote:
> On Tue, Jun 29, 2021 at 7:10 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>>
>> Here is an updated patch with some merge conflicts resolved, to keep it
>> fresh. It's still pending in the commit fest from last time.
>>
>> My focus right now is to work on the "psql - add SHOW_ALL_RESULTS
>> option" patch (https://commitfest.postgresql.org/33/2096/) first, which
>> is pretty much a prerequisite to this one. The attached patch set
>> contains a minimal variant of that patch in 0001 and 0002, just to get
>> this working, but disregard those for the purposes of code review.
>>
>> The 0003 patch contains comprehensive documentation and test changes
>> that can explain the feature in its current form.
>
> One of the patch v3-0003-Dynamic-result-sets-from-procedures.patch
> does not apply on HEAD, please post an updated patch for it:
> Hunk #1 FAILED at 57.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/include/commands/defrem.h.rej
>
> Regards,
> Vignesh
Attachment
On Mon, Aug 30, 2021 at 1:23 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
rebased patch set
On 22.07.21 08:06, vignesh C wrote:
> On Tue, Jun 29, 2021 at 7:10 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>>
>> Here is an updated patch with some merge conflicts resolved, to keep it
>> fresh. It's still pending in the commit fest from last time.
>>
>> My focus right now is to work on the "psql - add SHOW_ALL_RESULTS
>> option" patch (https://commitfest.postgresql.org/33/2096/) first, which
>> is pretty much a prerequisite to this one. The attached patch set
>> contains a minimal variant of that patch in 0001 and 0002, just to get
>> this working, but disregard those for the purposes of code review.
>>
>> The 0003 patch contains comprehensive documentation and test changes
>> that can explain the feature in its current form.
>
> One of the patch v3-0003-Dynamic-result-sets-from-procedures.patch
> does not apply on HEAD, please post an updated patch for it:
> Hunk #1 FAILED at 57.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/include/commands/defrem.h.rej
>
> Regards,
> Vignesh
>
>
Hi,
+ <term><literal>WITH RETURN</literal></term>
+ <term><literal>WITHOUT RETURN</literal></term>
+ <listitem>
+ <para>
+ This option is only valid for cursors defined inside a procedure.
Since there are two options listed, I think using 'These options are' would be better.
For CurrentProcedure(),
+ return InvalidOid;
+ else
+ return llast_oid(procedure_stack);
The word 'else' can be omitted.
Cheers
Hi,
On Mon, Aug 30, 2021 at 02:11:34PM -0700, Zhihong Yu wrote:
> On Mon, Aug 30, 2021 at 1:23 PM Peter Eisentraut <
> peter.eisentraut@enterprisedb.com> wrote:
>
> > rebased patch set
>
> + <term><literal>WITH RETURN</literal></term>
> + <term><literal>WITHOUT RETURN</literal></term>
> + <listitem>
> + <para>
> + This option is only valid for cursors defined inside a procedure.
>
> Since there are two options listed, I think using 'These options are' would
> be better.
>
> For CurrentProcedure(),
>
> + return InvalidOid;
> + else
> + return llast_oid(procedure_stack);
>
> The word 'else' can be omitted.
The cfbot reports that the patch doesn't apply anymore:
http://cfbot.cputube.org/patch_36_2911.log.
Since you mentioned that this patch depends on the SHOW_ALL_RESULTS psql
patch which is still being worked on, I'm not expecting much activity
here until the prerequirements are done. It also seems better to mark
this patch as Waiting on Author as further reviews are probably not
really needed for now.
On 12.01.22 11:20, Julien Rouhaud wrote:
> Since you mentioned that this patch depends on the SHOW_ALL_RESULTS psql patch
> which is still being worked on, I'm not expecting much activity here until the
> prerequirements are done. It also seems better to mark this patch as Waiting
> on Author as further reviews are probably not really needed for now.

Well, a review on the general architecture and approach would have been useful. But I understand that without the psql work, it's difficult for a reviewer to even get started on this patch. It's also similarly difficult for me to keep updating it. So I'll set it to Returned with feedback for now and take it off the table. I want to get back to it when the prerequisites are more settled.
On 01.02.22 15:40, Peter Eisentraut wrote:
> On 12.01.22 11:20, Julien Rouhaud wrote:
>> Since you mentioned that this patch depends on the SHOW_ALL_RESULTS
>> psql patch
>> which is still being worked on, I'm not expecting much activity here
>> until the
>> prerequirements are done. It also seems better to mark this patch as
>> Waiting
>> on Author as further reviews are probably not really needed for now.
>
> Well, a review on the general architecture and approach would have been
> useful. But I understand that without the psql work, it's difficult for
> a reviewer to even get started on this patch. It's also similarly
> difficult for me to keep updating it. So I'll set it to Returned with
> feedback for now and take it off the table. I want to get back to it
> when the prerequisites are more settled.

Now that the psql support for multiple result sets exists, I want to revive this patch. It's the same as the last posted version, except now it doesn't require any psql changes or any weird test modifications anymore.

(Old news: This patch allows declaring a cursor WITH RETURN in a procedure to make the cursor's data be returned as a result of the CALL invocation. The procedure needs to be declared with the DYNAMIC RESULT SETS attribute.)
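For readers joining the thread, the feature as described could be exercised with SQL along these lines (a hedged sketch: the table and procedure names are invented, and the exact syntax is whatever the patch defines, following the SQL standard):

```sql
-- Hypothetical example based on the feature description above.
CREATE TABLE t1 (a int, b text);

CREATE PROCEDURE p1()
LANGUAGE SQL
DYNAMIC RESULT SETS 1
AS $$
DECLARE c1 CURSOR WITH RETURN FOR SELECT * FROM t1;
$$;

-- The contents of cursor c1 come back as an additional result set
-- of the CALL, alongside any output parameters.
CALL p1();
```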
Attachment
Hi
On Fri, 14 Oct 2022 at 9:12, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
On 01.02.22 15:40, Peter Eisentraut wrote:
> On 12.01.22 11:20, Julien Rouhaud wrote:
>> Since you mentioned that this patch depends on the SHOW_ALL_RESULTS
>> psql patch
>> which is still being worked on, I'm not expecting much activity here
>> until the
>> prerequirements are done. It also seems better to mark this patch as
>> Waiting
>> on Author as further reviews are probably not really needed for now.
>
> Well, a review on the general architecture and approach would have been
> useful. But I understand that without the psql work, it's difficult for
> a reviewer to even get started on this patch. It's also similarly
> difficult for me to keep updating it. So I'll set it to Returned with
> feedback for now and take it off the table. I want to get back to it
> when the prerequisites are more settled.
Now that the psql support for multiple result sets exists, I want to
revive this patch. It's the same as the last posted version, except now
it doesn't require any psql changes or any weird test modifications anymore.
(Old news: This patch allows declaring a cursor WITH RETURN in a
procedure to make the cursor's data be returned as a result of the CALL
invocation. The procedure needs to be declared with the DYNAMIC RESULT
SETS attribute.)
I did a quick test of this patch, and it is working pretty well.
I have two ideas.
1. There could be a possibility to set "dynamic result sets" to unknown. The behaviour of the "dynamic result sets" option is a little bit confusing. I expect the number of result sets to be exactly the same as this number, but the warning is raised only when this number is exceeded. With this implementation, the correct name would be something like "max dynamic result sets". At this moment I find the "dynamic result sets" feature confusing, and because the effect is just a warning, I don't see a strong benefit. I can see a benefit if I can declare that CALL will be without dynamic result sets, or with an exact number of dynamic result sets, or with an unknown number of dynamic result sets. And if the result is not as expected, then an exception should be raised (not a warning).
2. Unfortunately, it doesn't work nicely with pagers. It starts a pager for one result and waits for the end, then starts a pager for the second result and waits for the end. There is no possibility to see all results at one time. The current behavior is correct, but I don't think it is user friendly. I think I can teach pspg to support multiple documents, but I need a more robust protocol and some separators - minimally an empty line (though some ASCII control character could be safer). As a second step we could introduce a new psql option like PSQL_MULTI_PAGER that can be used when more than one result set is possible.
Regards
Pavel
On 14.10.22 19:22, Pavel Stehule wrote:
> 1. there can be possibility to set "dynamic result sets" to unknown. The
> behaviour of the "dynamic result sets" option is a little bit confusing.
> I expect the number of result sets should be exactly the same as this
> number. But the warning is raised only when this number is acrossed. For
> this implementation the correct name should be like "max dynamic result
> sets" or some like this. At this moment, I see this feature "dynamic
> result sets" more confusing, and because the effect is just a warning,
> then I don't see a strong benefit. I can see some benefit if I can
> declare so CALL will be without dynamic result sets, or with exact
> number of dynamic result sets or with unknown number of dynamic result
> sets. And if the result is not expected, then an exception should be
> raised (not warning).

All of this is specified by the SQL standard. (What I mean by that is that if we want to deviate from it, we should have strong reasons beyond "it seems a bit odd".)

> 2. Unfortunately, it doesn't work nicely with pagers. It starts a pager
> for one result, and waits for the end, and starts pager for the second
> result, and waits for the end. There is not a possibility to see all
> results at one time. The current behavior is correct, but I don't think
> it is user friendly. I think I can teach pspg to support multiple
> documents. But I need a more robust protocol and some separators -
> minimally an empty line (but some ascii control char can be safer). As
> second step we can introduce new psql option like PSQL_MULTI_PAGER, that
> can be used when possible result sets is higher than 1

I think that is unrelated to this patch. Multiple result sets already exist, and libpq and psql handle them. This patch introduces another way in which multiple result sets can be produced on the server, but it doesn't touch the client side.
So your concerns should be added either as a new feature or possibly as a bug against existing psql functionality.
On 14.10.22 09:11, Peter Eisentraut wrote:
> Now that the psql support for multiple result sets exists, I want to
> revive this patch. It's the same as the last posted version, except now
> it doesn't require any psql changes or any weird test modifications
> anymore.
>
> (Old news: This patch allows declaring a cursor WITH RETURN in a
> procedure to make the cursor's data be returned as a result of the CALL
> invocation. The procedure needs to be declared with the DYNAMIC RESULT
> SETS attribute.)

I added tests using the new psql \bind command to test this functionality in the extended query protocol, which showed that this got broken since I first wrote this patch. This "blame" is on the pipeline mode in libpq patch (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659). I need to spend more time on this and figure out how to repair it. In the meantime, here is an updated patch set with the current status.
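For readers unfamiliar with it, psql's \bind metacommand makes a query go through the extended query protocol instead of the simple one, which is how such a test can exercise this code path. A sketch of what such a test input might look like (the procedure name is invented; \bind itself is the new psql metacommand mentioned above):

```
-- \bind switches the following \g execution to the extended protocol;
-- parameters, if any, are listed after \bind.
CALL p1() \bind \g
```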
Attachment
On 2022-Nov-22, Peter Eisentraut wrote:
> I added tests using the new psql \bind command to test this functionality in
> the extended query protocol, which showed that this got broken since I first
> wrote this patch. This "blame" is on the pipeline mode in libpq patch
> (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659). I need to spend more time on
> this and figure out how to repair it. In the meantime, here is an updated
> patch set with the current status.

I looked at this a little bit to understand why it fails with \bind. As you say, it does interact badly with pipeline mode -- more precisely, it collides with the queue handling that was added for pipeline.

The problem is that in extended query mode, we "advance" the queue in PQgetResult when asyncStatus is READY -- fe-exec.c line 2110 ff. But the protocol relies on returning READY when the second RowDescription message is received (fe-protocol3.c line 319), so libpq gets confused and everything blows up. libpq needs the queue to stay put until all the results from that query have been consumed.

If you comment out the pqCommandQueueAdvance() in fe-exec.c line 2124, your example works correctly and no longer throws a libpq error (but of course, other things break). I suppose that in order for this to work, we would have to find another way to "advance" the queue that doesn't rely on the status being PGASYNC_READY.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
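To see why the queue must stay put, consider how a client consumes the results. This sketch uses standard libpq calls; it assumes a server with the patch applied and an invented procedure name, and omits connection setup and error handling, so it is illustrative only:

```c
/* Sketch: consuming multiple result sets from one CALL over the
 * extended query protocol. PQgetResult returns one PGresult per
 * result set; the command-queue entry for this query is only done
 * once PQgetResult returns NULL, not at the first PGRES_TUPLES_OK. */
#include <stdio.h>
#include <libpq-fe.h>

void
consume_all_results(PGconn *conn)
{
    PGresult *res;

    if (!PQsendQueryParams(conn, "CALL p1()",
                           0, NULL, NULL, NULL, NULL, 0))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return;
    }

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("result set with %d rows, %d columns\n",
                   PQntuples(res), PQnfields(res));
        PQclear(res);
    }
}
```

Advancing the queue at the first READY state cuts this loop short after the primary result set, which matches the failure described above.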
On 2022-Dec-21, Alvaro Herrera wrote:
> I suppose that in order for this to work, we would have to find another
> way to "advance" the queue that doesn't rely on the status being
> PGASYNC_READY.

I think the way to make this work is to increase the coupling between fe-exec.c and fe-protocol3.c by making the queue advance occur when CommandComplete is received. This is likely more correct protocol-wise than what we're doing now: we would consider the command as done when the server tells us it is done, rather than relying on internal libpq state.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On 2022-Nov-22, Peter Eisentraut wrote:
> I added tests using the new psql \bind command to test this functionality in
> the extended query protocol, which showed that this got broken since I first
> wrote this patch. This "blame" is on the pipeline mode in libpq patch
> (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659). I need to spend more time on
> this and figure out how to repair it.

Applying this patch, your test queries seem to work correctly.

This is quite WIP, especially because there's a couple of scenarios uncovered by tests that I'd like to ensure correctness about, but if you would like to continue adding tests for extended query and dynamic result sets, it may be helpful.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close proximity, with no apparent sense of irony. I doubt that Larry himself could have managed it." (ncm, http://lwn.net/Articles/174769/)
Attachment
On 30.01.23 14:06, Alvaro Herrera wrote:
> On 2022-Nov-22, Peter Eisentraut wrote:
>
>> I added tests using the new psql \bind command to test this functionality in
>> the extended query protocol, which showed that this got broken since I first
>> wrote this patch. This "blame" is on the pipeline mode in libpq patch
>> (acb7e4eb6b1c614c68a62fb3a6a5bba1af0a2659). I need to spend more time on
>> this and figure out how to repair it.
>
> Applying this patch, your test queries seem to work correctly.

Great!

> This is quite WIP, especially because there's a couple of scenarios
> uncovered by tests that I'd like to ensure correctness about, but if you
> would like to continue adding tests for extended query and dynamic
> result sets, it may be helpful.

I should note that it is debatable whether my patch extends the extended query protocol or just uses it within its existing spec but in new ways. It just happened to work in old libpq versions without any changes. So you should keep that in mind as you refine your patch, since the way the protocol has been extended/creatively-used is still subject to review.
On 31.01.23 12:07, Peter Eisentraut wrote:
>> Applying this patch, your test queries seem to work correctly.
>
> Great!
>
>> This is quite WIP, especially because there's a couple of scenarios
>> uncovered by tests that I'd like to ensure correctness about, but if you
>> would like to continue adding tests for extended query and dynamic
>> result sets, it may be helpful.
>
> I should note that it is debatable whether my patch extends the extended
> query protocol or just uses it within its existing spec but in new ways.
> It just happened to work in old libpq versions without any changes. So
> you should keep that in mind as you refine your patch, since the way the
> protocol has been extended/creatively-used is still subject to review.

After some consideration, I have an idea how to proceed with this. I have split my original patch into two incremental patches.

The first patch implements the original feature, but just for the simple query protocol. (The simple query protocol already supports multiple result sets.) Attempting to return dynamic result sets using the extended query protocol will result in an error.

The second patch then adds the extended query protocol support back in, but it still has the issues with libpq that we are discussing.

I think this way we could have a chance to get the first part into PG16 or early into PG17, and then the second part can be worked on with less stress. This would also allow us to consider a minor protocol version bump, and the handling of binary format for dynamic result sets (like https://commitfest.postgresql.org/42/3777/), and maybe some other issues.

The attached patches are the same as before, rebased over master and split up as described. I haven't done any significant work on the contents, but I will try to get the 0001 patch into a more polished state soon.
Attachment
On 20.02.23 13:58, Peter Eisentraut wrote:
> The attached patches are the same as before, rebased over master and
> split up as described. I haven't done any significant work on the
> contents, but I will try to get the 0001 patch into a more polished
> state soon.

I've done a bit of work on this patch, mainly cleaned up and expanded the tests, and also added DO support, which is something that had been requested (meaning you can return result sets from DO with this facility). Here is a new version.