Thread: CommandStatus from insert returning when using a portal.
Greetings,
With a simple insert such as
INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id
if a portal is used to fetch the results, the CommandStatus is not returned on the Execute, only when the portal is closed. After looking at this more, it is really returned after all of the data is read, which is consistent with what happens when you don't use a portal; however, it would be much more useful if we received the CommandStatus after the insert completed and before the data.
Obviously I am biased by the JDBC API, which would like to have PreparedStatement.execute() return the number of rows inserted without having to wait to read all of the rows returned.
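For context, this is roughly the JDBC usage I have in mind (just a sketch; the connection URL and table are only the example from above):

import java.sql.*;

public class InsertReturningCount {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/test")) {
            // The driver rewrites this to "... RETURNING id" because of RETURN_GENERATED_KEYS.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO test_table (cnt) VALUES (1), (2)",
                    Statement.RETURN_GENERATED_KEYS)) {
                int inserted = ps.executeUpdate();   // ideally known without draining the rows first
                System.out.println("rows inserted: " + inserted);
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    while (keys.next()) {
                        System.out.println("generated id: " + keys.getInt(1));
                    }
                }
            }
        }
    }
}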
Dave Cramer
On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer <davecramer@gmail.com> wrote:
INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id

if a portal is used to get the results then the CommandStatus
IIUC the portal is not optional if you include the RETURNING clause.
There is no CommandStatus message in the protocol, the desired information is part of the command tag returned in the CommandComplete message. You get that at the end of the command, which has been defined as when any portal produced by the command has been fully executed.
You probably should add your desire to the Version 4 protocol ToDo on the wiki.
If that ever becomes an active project, working through the details of that list for desirability and feasibility would be the first thing to happen.
David J.
On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer <davecramer@gmail.com> wrote:
>
> With a simple insert such as
>
> INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id
>
> if a portal is used to get the results then the CommandStatus is not
> returned on the execute only when the portal is closed. After looking at
> this more it is really after all of the data is read which is consistent
> if you don't use a portal, however it would be much more useful if we
> received the CommandStatus after the insert was completed and before the
> data
>
> Obviously I am biased by the JDBC API which would like to have
>
> PreparedStatement.execute() return the number of rows inserted without
> having to wait to read all of the rows returned

I believe if the RETURNING clause is used, the protocol-level behaviour of INSERT is expected to match that of SELECT. If the SELECT command behaves like that (resultset followed by CommandStatus), then I'd say the INSERT RETURNING is behaving as expected.

Best regards,
Gurjeet
http://Gurje.et
Dave Cramer
On Wed, 12 Jul 2023 at 16:31, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Jul 12, 2023 at 1:03 PM Dave Cramer <davecramer@gmail.com> wrote:
INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id

if a portal is used to get the results then the CommandStatus

IIUC the portal is not optional if you including the RETURNING clause.
From my testing it isn't required.
There is no CommandStatus message in the protocol, the desired information is part of the command tag returned in the CommandComplete message. You get that at the end of the command, which has been defined as when any portal produced by the command has been fully executed.
I could argue that the insert is fully completed whether I read the data or not.
You probably should add your desire to the Version 4 protocol ToDo on the wiki.
thx, will do.
Dave
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Umm ... you do realize that we return the rows on-the-fly? The server does not know how many rows got inserted/returned until it's run the query to completion, at which point all the data has already been sent to the client. There isn't any way to return the rowcount before the data, and it wouldn't be some trivial protocol adjustment to make that work differently. (What it *would* be is expensive, because we'd have to store those rows somewhere.)

regards, tom lane
On Wed, 12 Jul 2023 at 17:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned
Umm ... you do realize that we return the rows on-the-fly?
I do realize that.
The server does not know how many rows got inserted/returned
Well I haven't looked at the code, but it seems unintuitive that adding the returning clause changes the semantics of insert.
until it's run the query to completion, at which point all
the data has already been sent to the client. There isn't
any way to return the rowcount before the data, and it wouldn't
be some trivial protocol adjustment to make that work differently.
(What it *would* be is expensive, because we'd have to store
those rows somewhere.)
I wasn't asking for that, I just want the number of rows inserted.
Thanks,
Dave
On Wed, Jul 12, 2023 at 2:59 PM Dave Cramer <davecramer@gmail.com> wrote:
On Wed, 12 Jul 2023 at 17:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned
Umm ... you do realize that we return the rows on-the-fly?

I do realize that.

The server does not know how many rows got inserted/returned

Well I haven't looked at the code, but it seems unintuitive that adding the returning clause changes the semantics of insert.
It doesn't have to - the insertions are always "as rows are produced", it is just that in the non-returning case the final row can be sent to /dev/null instead of the client (IOW, there is always some destination). In both cases the total number of rows inserted is only reliably known when the top executor node requests a new tuple and its immediate predecessor says "no more rows present".
David J.
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned

Huh ... just how *is* PreparedStatement.execute() supposed to behave when the statement is an INSERT RETURNING?

execute() -> true
getResultSet() -> the rows
getMoreResults() -> false
getUpdateCount() -> number inserted?

It seems that would fit the portal's behavior easily enough. Or is the JDBC spec insisting on some other order?

Regards,
-Chap
On Wed, 12 Jul 2023 at 20:00, <chap@anastigmatix.net> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned
Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?
It's really executeUpdate which is supposed to return the number of rows updated.
Without a cursor it returns right away, as all of the results are returned by the server. However, with a cursor you have to wait until you fetch the rows before you can get the CommandComplete message, which btw is wrong, as it returns INSERT 0 0 instead of INSERT 2 0.
Dave
On Wed, Jul 12, 2023 at 5:57 PM Dave Cramer <davecramer@gmail.com> wrote:
On Wed, 12 Jul 2023 at 20:00, <chap@anastigmatix.net> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned
Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?

It's really executeUpdate which is supposed to return the number of rows updated.
Right, and executeUpdate is the wrong API method to use, in the PostgreSQL world, when executing insert/update/delete with the non-SQL-standard returning clause. executeQuery is the method to use. And execute() should behave as if executeQuery was called, i.e., return true, which it is capable of doing since it has resultSet data that it needs to handle.
The addition of returning turns the insert/update/delete into a select in terms of effective client-seen behavior.
ISTM that you are trying to make user-error less painful. While that is laudable, it apparently isn't practical. They can either discard the results and get a count by omitting returning, or obtain the result and derive the count by counting rows alongside whatever else they needed the returned data for.
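For illustration, a rough sketch of that second option (assuming an open Connection conn and the test_table from upthread; not meant as anything more than an example):

// Run the explicit INSERT ... RETURNING as a query and derive the inserted-row
// count by counting the rows while consuming the generated ids.
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id")) {
    int inserted = 0;
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            int id = rs.getInt(1);   // the generated key, used however the caller needs
            inserted++;
        }
    }
    System.out.println("rows inserted: " + inserted);
}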
David J.
On Wed, 12 Jul 2023 at 21:31, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Wed, Jul 12, 2023 at 5:57 PM Dave Cramer <davecramer@gmail.com> wrote:
On Wed, 12 Jul 2023 at 20:00, <chap@anastigmatix.net> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned
Huh ... just how *is* PreparedStatement.execute() supposed
to behave when the statement is an INSERT RETURNING?

It's really executeUpdate which is supposed to return the number of rows updated.

Right, and executeUpdate is the wrong API method to use, in the PostgreSQL world, when executing insert/update/delete with the non-SQL-standard returning clause. executeQuery is the method to use. And execute() should behave as if executeQuery was called, i.e., return true, which it is capable of doing since it has resultSet data that it needs to handle.

The addition of returning turns the insert/update/delete into a select in terms of effective client-seen behavior.

ISTM that you are trying to make user-error less painful. While that is laudable it apparently isn't practical. They can either discard the results and get a count by omitting returning or obtain the result and derive the count by counting rows alongside whatever else they needed the returned data for.
Any comment on why the CommandComplete is incorrect ?
It returns INSERT 0 0 if a cursor is used
Dave
On Thursday, July 13, 2023, Dave Cramer <davecramer@gmail.com> wrote:
Any comment on why the CommandComplete is incorrect ?

It returns INSERT 0 0 if a cursor is used
Looking at DECLARE it is surprising that what you describe is even possible. Can you share a psql reproducer?
David J.
On Thu, 13 Jul 2023 at 10:24, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thursday, July 13, 2023, Dave Cramer <davecramer@gmail.com> wrote:
Any comment on why the CommandComplete is incorrect ?

It returns INSERT 0 0 if a cursor is used

Looking at DECLARE it is surprising that what you describe is even possible. Can you share a psql reproducer?
apologies, we are using a portal, not a cursor.
Dave Cramer
On Thu, Jul 13, 2023 at 6:07 PM Dave Cramer <davecramer@gmail.com> wrote:
On Thu, 13 Jul 2023 at 10:24, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Thursday, July 13, 2023, Dave Cramer <davecramer@gmail.com> wrote:
Any comment on why the CommandComplete is incorrect ?

It returns INSERT 0 0 if a cursor is used

Looking at DECLARE it is surprising that what you describe is even possible. Can you share a psql reproducer?

apologies, we are using a portal, not a cursor.
Still the same basic request of providing a reproducer - ideally in psql.
IIUC a portal has to be used for a prepared (extended query mode) result set returning query. v16 can now handle parameter binding so:
postgres=# \bind 4
postgres=# insert into ins values ($1) returning id;
id
----
4
(1 row)
INSERT 0 1
Which gives the expected non-zero command tag row count result.
David J.
On 2023-07-12 20:57, Dave Cramer wrote:
> Without a cursor it returns right away as all of the results are
> returned
> by the server. However with cursor you have to wait until you fetch the
> rows before you can get the CommandComplete message which btw is wrong
> as
> it returns INSERT 0 0 instead of INSERT 2 0

To make sure I am following, was this describing a comparison of two different ways in Java, using JDBC, to perform the same operation, one of which behaves as desired while the other doesn't? If so, for my curiosity, what do both ways look like in Java?

Or was it a comparison of two different operations, say one an INSERT RETURNING and the other something else?

Regards,
-Chap
David,
I will try to get a tcpdump file. Doing this in libpq seems challenging as I'm not aware of how to create a portal in psql.
Chap
The only difference is that one instance uses a portal to fetch the results; the other (correct one) is a normal insert where all of the rows are returned immediately.
This is a reproducer in Java:
conn.prepareStatement("DROP TABLE IF EXISTS test_table").execute();
conn.prepareStatement("CREATE TABLE IF NOT EXISTS test_table (id SERIAL PRIMARY KEY, cnt INT NOT NULL)").execute();
for (var fetchSize : List.of(0, 1, 2, 3)) { System.out.println("FetchSize=" + fetchSize);
try (var stmt = conn.prepareStatement("INSERT INTO test_table (cnt) VALUES (1), (2) RETURNING id", RETURN_GENERATED_KEYS)) { stmt.setFetchSize(fetchSize);
var ret = stmt.executeUpdate(); System.out.println("executeUpdate result: " + ret);
var rs = stmt.getGeneratedKeys(); System.out.print("ids: "); while (rs.next()) { System.out.print(rs.getInt(1) + " "); } System.out.print("\n\n"); }
}
Dave
On Fri, 14 Jul 2023 at 12:07, <chap@anastigmatix.net> wrote:
On 2023-07-12 20:57, Dave Cramer wrote:
> Without a cursor it returns right away as all of the results are
> returned
> by the server. However with cursor you have to wait until you fetch the
> rows before you can get the CommandComplete message which btw is wrong
> as
> it returns INSERT 0 0 instead of INSERT 2 0
To make sure I am following, was this describing a comparison of
two different ways in Java, using JDBC, to perform the same operation,
one of which behaves as desired while the other doesn't? If so, for
my curiosity, what do both ways look like in Java?
Or was it a comparison of two different operations, say one
an INSERT RETURNING and the other something else?
Regards,
-Chap
On Fri, Jul 14, 2023 at 9:30 AM Dave Cramer <davecramer@gmail.com> wrote:
David,

I will try to get a tcpdump file. Doing this in libpq seems challenging as I'm not aware of how to create a portal in psql.
Yeah, apparently psql does something special (like ignoring it...) with its FETCH_COUNT variable (set to 2 below, as evidenced by the first query) for the insert returning case. As documented, since the command itself is not select or values, the test in is_select_command returns false and the branch:
else if (pset.fetch_count <= 0 || pset.gexec_flag ||
pset.crosstab_flag || !is_select_command(query))
{
/* Default fetch-it-all-and-print mode */
is chosen.
Fixing that test in some manner and recompiling psql seems like it should be the easiest way to produce a core-only test case.
postgres=# select * from (Values (1),(2),(30000),(40000)) vals (v);
 v
---
1
2
30000
40000
(4 rows)
postgres=# \bind 5 6 70000 80000
postgres=# insert into ins values ($1),($2),($3),($4) returning id;
id
-------
5
6
70000
80000
(4 rows)
INSERT 0 4
I was hoping to see the INSERT RETURNING query have a 4 width header instead of 7.
David J.
On Fri, Jul 14, 2023 at 9:50 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
Fixing that test in some manner and recompiling psql seems like it should be the easiest way to produce a core-only test case.
Apparently not - since it (ExecQueryUsingCursor) literally wraps the query in a DECLARE CURSOR SQL Command which prohibits INSERT...
I suppose we'd have to write a psql equivalent of ExecQueryUsingPortal that iterates over the result via fetch to make this work... probably more than I'm willing to try.
David J.
See attached pcap file
after the execute of the portal it returns INSERT 0 0
Dave Cramer
On Fri, 14 Jul 2023 at 12:57, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Fri, Jul 14, 2023 at 9:50 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
Fixing that test in some manner and recompiling psql seems like it should be the easiest way to produce a core-only test case.

Apparently not - since it (ExecQueryUsingCursor) literally wraps the query in a DECLARE CURSOR SQL Command which prohibits INSERT...

I suppose we'd have to write a psql equivalent of ExecQueryUsingPortal that iterates over via fetch to make this work...probably more than I'm willing to try.

David J.
On 2023-07-14 12:58, Dave Cramer wrote:
> See attached pcap file

So if the fetch count is zero and no portal is needed, or if the fetch count exceeds the row count and the command completion follows directly with no suspension of the portal, then it comes with the correct count, but if the portal gets suspended, then the later command completion reports a zero count?

Regards,
-Chap
On Fri, 14 Jul 2023 at 13:39, <chap@anastigmatix.net> wrote:
On 2023-07-14 12:58, Dave Cramer wrote:
> See attached pcap file
So if the fetch count is zero and no portal is needed,
or if the fetch count exceeds the row count and the command
completion follows directly with no suspension of the portal, then
it comes with the correct count, but if the portal gets suspended,
then the later command completion reports a zero count?
Seems so, yes.
Dave
On Fri, Jul 14, 2023 at 10:39 AM <chap@anastigmatix.net> wrote:
On 2023-07-14 12:58, Dave Cramer wrote:
> See attached pcap file
So if the fetch count is zero and no portal is needed,
or if the fetch count exceeds the row count and the command
completion follows directly with no suspension of the portal, then
it comes with the correct count, but if the portal gets suspended,
then the later command completion reports a zero count?
I cannot really understand that output other than to confirm that all queries had returning and one of them showed INSERT 0 0
Is there some magic set of arguments I should be using besides: tcpdump -Ar filename ?
Because of the returning they all need a portal so far as the server is concerned and the server will obligingly send the contents of the portal back to the client.
I can definitely believe a bug exists in the intersection of a non-SELECT query and a less-than-complete fetch count specification. There doesn't seem to be any place in the core testing framework to actually test out the interaction though... I even tried using plpgsql, which lets me open/execute a plpgsql cursor with insert returning (which SQL prohibits), but we can't get access to the command tag itself there. The ROW_COUNT variable likely tracks actual rows fetched or seen (in the case of MOVE).
What I kinda suspect might be happening with a portal suspend is that the suspension loop only ends when the final fetch attempt finds zero rows to return, and thus the final count ends up being zero instead of the cumulative sum over all portal scans.
David J.
On 2023-07-12 21:30, David G. Johnston wrote:
> Right, and executeUpdate is the wrong API method to use, in the
> PostgreSQL
> world, when executing insert/update/delete with the non-SQL-standard
> returning clause. ... ISTM that you are trying to make user-error less
> painful.

In Dave's Java reproducer, no user-error has been made, because the user supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the RETURNING clause has been added by the JDBC driver. So the user expects executeUpdate to be the right method, and return the row count, and getGeneratedKeys() to then return the rows.

I've seen a possibly even more interesting result using pgjdbc-ng with protocol.trace=true:

FetchSize=0
<P<D<S
> 1.>t.>T$>Z*
<B<E<S
> 2.>D.>D.>C.>Z*
executeUpdate result: 2
ids: 1 2

FetchSize=1
<B<E<H
> 2.>D.>s*
executeUpdate result: -1
ids: 3 <E<H
> D.>s*
4 <E<H
> C*
<C<S
> 3.>Z*

FetchSize=2
<B<E<H
> 2.>D.>D.>s*
executeUpdate result: -1
ids: 5 6 <E<H
> C*
<C<S
> 3.>Z*

FetchSize=3
<B<E<H
> 2.>D.>D.>C*
<C<S
> 3.>Z*
executeUpdate result: 2
ids: 7 8

Unless there's some interleaving of trace and stdout messages happening here, I think pgjdbc-ng is not even collecting all the returned rows in the suspended-cursor case before executeUpdate returns, but keeping the cursor around for getGeneratedKeys() to use, so executeUpdate returns -1 before even having seen the later command complete, and would still do that even if the command complete message had the right count.

Regards,
-Chap
On 2023-07-14 14:19, David G. Johnston wrote:
> Is there some magic set of arguments I should be using besides: tcpdump
> -Ar
> filename ?

I opened it with Wireshark, which has a pgsql protocol decoder.

Regards,
-Chap
On Fri, 14 Jul 2023 at 14:34, <chap@anastigmatix.net> wrote:
On 2023-07-12 21:30, David G. Johnston wrote:
> Right, and executeUpdate is the wrong API method to use, in the
> PostgreSQL
> world, when executing insert/update/delete with the non-SQL-standard
> returning clause. ... ISTM that you are trying to make user-error less
> painful.
In Dave's Java reproducer, no user-error has been made, because the user
supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
RETURNING clause has been added by the JDBC driver. So the user expects
executeUpdate to be the right method, and return the row count, and
getGeneratedKeys() to then return the rows.
I've seen a possibly even more interesting result using pgjdbc-ng with
protocol.trace=true:
FetchSize=0
<P<D<S
> 1.>t.>T$>Z*
<B<E<S
> 2.>D.>D.>C.>Z*
executeUpdate result: 2
ids: 1 2
FetchSize=1
<B<E<H
> 2.>D.>s*
executeUpdate result: -1
ids: 3 <E<H
> D.>s*
4 <E<H
> C*
<C<S
> 3.>Z*
FetchSize=2
<B<E<H
> 2.>D.>D.>s*
executeUpdate result: -1
ids: 5 6 <E<H
> C*
<C<S
> 3.>Z*
FetchSize=3
<B<E<H
> 2.>D.>D.>C*
<C<S
> 3.>Z*
executeUpdate result: 2
ids: 7 8
Unless there's some interleaving of trace and stdout messages happening
here, I think pgjdbc-ng is not even collecting all the returned rows
in the suspended-cursor case before executeUpdate returns, but keeping
the cursor around for getGeneratedKeys() to use, so executeUpdate
returns -1 before even having seen the later command complete, and would
still do that even if the command complete message had the right count.
My guess is that pgjdbc-ng sees the -1 and doesn't bother looking any further
Either way pgjdbc-ng is a dead project so I'm not so concerned about it.
Dave
Regards,
-Chap
On Fri, Jul 14, 2023 at 11:34 AM <chap@anastigmatix.net> wrote:
On 2023-07-12 21:30, David G. Johnston wrote:
> Right, and executeUpdate is the wrong API method to use, in the
> PostgreSQL
> world, when executing insert/update/delete with the non-SQL-standard
> returning clause. ... ISTM that you are trying to make user-error less
> painful.
In Dave's Java reproducer, no user-error has been made, because the user
supplied a plain INSERT with the RETURN_GENERATED_KEYS option, and the
RETURNING clause has been added by the JDBC driver. So the user expects
executeUpdate to be the right method, and return the row count, and
getGeneratedKeys() to then return the rows.
That makes more sense, though I don't understand how the original desire of having the count appear before the actual rows would materially benefit that feature.
I agree that the documented contract of the insert command tag says it reports the size of the entire tuple store maintained by the server during the transaction instead of just the most recent count on subsequent fetches.
David J.
On 2023-07-14 14:19, David G. Johnston wrote:
> Because of the returning they all need a portal so far as the server is
> concerned and the server will obligingly send the contents of the
> portal
> back to the client.

Dave's pcap file, for the fetch count 0 case, does not show any portal name used in the bind, describe, or execute messages, or any portal close message issued by the client afterward. The server may be using a portal in that case, but it seems more transparent to the client when fetch count is zero.

Perhaps an easy rule would be, if the driver itself adds RETURNING because of a RETURN_GENERATED_KEYS option, it should also force the fetch count to zero and collect all the returned rows before executeUpdate returns, and then it will have the right count to return?

It seems that any approach leaving any of the rows unfetched at the time executeUpdate returns might violate a caller's expectation that the whole outcome is known by that point.

Regards,
-Chap
On Fri, 14 Jul 2023 at 15:40, <chap@anastigmatix.net> wrote:
On 2023-07-14 14:19, David G. Johnston wrote:
> Because of the returning they all need a portal so far as the server is
> concerned and the server will obligingly send the contents of the
> portal
> back to the client.
Dave's pcap file, for the fetch count 0 case, does not show any
portal name used in the bind, describe, or execute messages, or
any portal close message issued by the client afterward. The server
may be using a portal in that case, but it seems more transparent
to the client when fetch count is zero.
There is no portal for fetch count 0.
Perhaps an easy rule would be, if the driver itself adds RETURNING
because of a RETURN_GENERATED_KEYS option, it should also force the
fetch count to zero and collect all the returned rows before
executeUpdate returns, and then it will have the right count
to return?
Well that kind of negates the whole point of using a cursor in the case where you have a large result set.
The rows are subsequently fetched in rs.next()
Solves one problem, but creates another.
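For the record, the large-result pattern I have in mind looks roughly like this (a sketch only; as I understand it pgJDBC only switches to cursor-based fetching when autocommit is off, the ResultSet is forward-only, and a non-zero fetch size is set):

// With these settings the driver executes through a portal and the RETURNING
// rows arrive in batches as rs.next() is called, instead of being buffered
// up front.
conn.setAutoCommit(false);
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO test_table (cnt) SELECT g FROM generate_series(1, 1000000) g RETURNING id")) {
    ps.setFetchSize(100);                 // fetch 100 rows per Execute
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            long id = rs.getLong(1);      // process each generated id incrementally
        }
    }
}
conn.commit();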
Dave
"David G. Johnston" <david.g.johnston@gmail.com> writes: > I agree that the documented contract of the insert command tag says it > reports the size of the entire tuple store maintained by the server during > the transaction instead of just the most recent count on subsequent fetches. Where do you see that documented, exactly? I looked in the protocol chapter and didn't find anything definite either way. I'm quite prepared to believe there are bugs here, since this whole set of behaviors is unreachable via libpq: you can't get it to send an Execute with a count other than zero (ie, fetch all), nor is it prepared to deal with the PortalSuspended messages it'd get if it did. I think that the behavior is arising from this bit in PortalRun: switch (portal->strategy) { ... case PORTAL_ONE_RETURNING: ... /* * If we have not yet run the command, do so, storing its * results in the portal's tuplestore. But we don't do that * for the PORTAL_ONE_SELECT case. */ if (portal->strategy != PORTAL_ONE_SELECT && !portal->holdStore) FillPortalStore(portal, isTopLevel); /* * Now fetch desired portion of results. */ nprocessed = PortalRunSelect(portal, true, count, dest); /* * If the portal result contains a command tag and the caller * gave us a pointer to store it, copy it and update the * rowcount. */ if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN) { CopyQueryCompletion(qc, &portal->qc); ------>>> qc->nprocessed = nprocessed; } /* Mark portal not active */ portal->status = PORTAL_READY; /* * Since it's a forward fetch, say DONE iff atEnd is now true. */ result = portal->atEnd; The marked line is, seemingly intentionally, overriding the portal's rowcount (which ought to count the whole query result) with the number of rows processed in the current fetch. Chap's theory that that's always zero when this is being driven by JDBC seems plausible, since the query status won't be returned to the client unless we detect end-of-portal (otherwise we just send PortalSuspended). It seems plausible to me that we should just remove that marked line. Not sure about the compatibility implications, though. regards, tom lane
On 2023-07-14 15:49, Dave Cramer wrote:
> On Fri, 14 Jul 2023 at 15:40, <chap@anastigmatix.net> wrote:
>> Perhaps an easy rule would be, if the driver itself adds RETURNING
>> because of a RETURN_GENERATED_KEYS option, it should also force the
>> fetch count to zero and collect all the returned rows before
>> executeUpdate returns, and then it will have the right count
>> to return?
>
> Well that kind of negates the whole point of using a cursor in the case
> where you have a large result set.
>
> The rows are subsequently fetched in rs.next()

I guess it comes down, again, to the question of what kind of thing the API client thinks it is doing, when it issues an INSERT with the RETURN_GENERATED_KEYS option.

An API client issuing a plain INSERT knows it is the kind of thing for which executeUpdate() is appropriate, and the complete success or failure outcome is known when that returns.

An API client issuing its own explicit INSERT RETURNING knows it is the kind of thing for which executeQuery() is appropriate, and that some processing (and possibly ereporting) may be left to occur while working through the ResultSet.

But now how about this odd hybrid, where the API client wrote a plain INSERT, but added the RETURN_GENERATED_KEYS option? The rewritten query is the kind of thing the server and the driver need to treat as a query, but to the API client it still appears the kind of thing for which executeUpdate() is suited. The generated keys can then be examined--in the form of a ResultSet--but one obtained with the special method getGeneratedKeys(), rather than the usual way of getting the ResultSet from a query.

Should the client then assume the semantics of executeUpdate have changed for this case, the outcome isn't fully known yet, and errors could come to light while examining the returned keys? Or should it still assume that executeUpdate has its usual meaning, all the work is done by that point, and the ResultSet from getGeneratedKeys() is simply a convenient, familiar interface for examining what came back?

I'm not sure if there's a firm answer to that one way or the other in the formal JDBC spec, but the latter seems perhaps safer to me.

Regards,
-Chap
On Fri, 14 Jul 2023 at 16:32, <chap@anastigmatix.net> wrote:
On 2023-07-14 15:49, Dave Cramer wrote:
> On Fri, 14 Jul 2023 at 15:40, <chap@anastigmatix.net> wrote:
>> Perhaps an easy rule would be, if the driver itself adds RETURNING
>> because of a RETURN_GENERATED_KEYS option, it should also force the
>> fetch count to zero and collect all the returned rows before
>> executeUpdate returns, and then it will have the right count
>> to return?
>
> Well that kind of negates the whole point of using a cursor in the case
> where you have a large result set.
>
> The rows are subsequently fetched in rs.next()
I guess it comes down, again, to the question of what kind of thing
the API client thinks it is doing, when it issues an INSERT with
the RETURN_GENERATED_KEYS option.
An API client issuing a plain INSERT knows it is the kind of thing
for which executeUpdate() is appropriate, and the complete success
or failure outcome is known when that returns.
An API client issuing its own explicit INSERT RETURNING knows it
is the kind of thing for which executeQuery() is appropriate, and
that some processing (and possibly ereporting) may be left to
occur while working through the ResultSet.
But now how about this odd hybrid, where the API client wrote
a plain INSERT, but added the RETURN_GENERATED_KEYS option?
The rewritten query is the kind of thing the server and the
driver need to treat as a query, but to the API client it still
appears the kind of thing for which executeUpdate() is suited.
The generated keys can then be examined--in the form of a
ResultSet--but one obtained with the special method
getGeneratedKeys(), rather than the usual way of getting the
ResultSet from a query.
Should the client then assume the semantics of executeUpdate
have changed for this case, the outcome isn't fully known yet,
and errors could come to light while examining the returned
keys? Or should it still assume that executeUpdate has its
usual meaning, all the work is done by that point, and the
ResultSet from getGeneratedKeys() is simply a convenient,
familiar interface for examining what came back?
The fly in the ointment here is when they setFetchSize and we decide to use a Portal under the covers.
I"m willing to document around this since it looks pretty unlikely that changing the behaviour in the server is a non-starter.
I'm not sure if there's a firm answer to that one way or the
other in the formal JDBC spec, but the latter seems perhaps
safer to me.
I'll leave the user to decide their own fate.
Dave
Regards,
-Chap
On 2023-07-14 17:02, Dave Cramer wrote:
> The fly in the ointment here is when they setFetchSize and we decide to
> use a Portal under the covers.

A person might language-lawyer about whether setFetchSize even applies to the kind of thing done with executeUpdate. Hmm ... the apidoc for setFetchSize says "Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement."

So ... it appears to apply to any "ResultSet objects generated by this Statement", and getGeneratedKeys returns a ResultSet, so maybe setFetchSize should apply to it.

OTOH, setFetchSize has @since 1.2, and getGeneratedKeys @since 1.4. At the time setFetchSize was born, the only way you could get a ResultSet was from the kind of thing you'd use executeQuery for. So when getGeneratedKeys was later added, a way of getting a ResultSet after an executeUpdate, did they consciously intend it to come under the jurisdiction of existing apidoc that concerned the fetch size of a ResultSet you wanted from executeQuery? Full employment for language lawyers.

Moreover, the apidoc does say the fetch size is "a hint", and also that it applies "when more rows are needed" from the ResultSet. So it's technically not a misbehavior to disregard the hint, and you're not even disregarding the hint if you fetch all the rows at once, because then more rows can't be needed. :)

Regards,
-Chap
On Fri, Jul 14, 2023 at 12:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> I agree that the documented contract of the insert command tag says it
> reports the size of the entire tuple store maintained by the server during
> the transaction instead of just the most recent count on subsequent fetches.
Where do you see that documented, exactly? I looked in the protocol
chapter and didn't find anything definite either way.
On successful completion, an INSERT command returns a command tag of the form
INSERT oid count
The count is the number of rows inserted or updated.
It doesn't, nor should, have any qualifications about not applying to the returning case and definitely shouldn't change based upon use of FETCH on the unnamed portal.
I'm quite prepared to believe there are bugs here, since this whole
set of behaviors is unreachable via libpq: you can't get it to send
an Execute with a count other than zero (ie, fetch all), nor is it
prepared to deal with the PortalSuspended messages it'd get if it did.
I think that the behavior is arising from this bit in PortalRun:
if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
{
CopyQueryCompletion(qc, &portal->qc);
------>>> qc->nprocessed = nprocessed;
}
I came to the same conclusion. The original introduction of that line replaced string munging "SELECT " + nprocessed; so this code never even considered INSERT as being in scope and indeed wanted to return the per-run value in a fetch-list context.
I think I see why simple removal of that line is sufficient as the copied-from &portal->qc already has the result of executing the underlying insert query. That said, the paranoid approach would seem to be to keep the assignment but only use it when we aren't dealing with the returning case.
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..5e75141f0b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -775,7 +775,8 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
{
CopyQueryCompletion(qc, &portal->qc);
- qc->nprocessed = nprocessed;
+ if (portal->strategy != PORTAL_ONE_RETURNING)
+ qc->nprocessed = nprocessed;
}
/* Mark portal not active */
It seems plausible to me that we should just remove that marked line.
Not sure about the compatibility implications, though.
I believe it is a bug worth fixing; it would save driver writers the processing time of deriving a count when the command tag is supposed to be providing it to them, using compute time the server has already spent anyway.
David J.
On 2023-07-14 17:31, Chapman Flack wrote:
> So when getGeneratedKeys was later added, a way of getting a ResultSet
> after an executeUpdate, did they consciously intend it to come under
> the jurisdiction of existing apidoc that concerned the fetch size of
> a ResultSet you wanted from executeQuery?
> ...
> Moreover, the apidoc does say the fetch size is "a hint", and also that
> it applies "when more rows are needed" from the ResultSet.
>
> So it's technically not a misbehavior to disregard the hint, and you're
> not even disregarding the hint if you fetch all the rows at once, because
> then more rows can't be needed. :)

... and just to complete the thought, the apidoc for executeUpdate leaves no wiggle room for what that method returns: for DML, it has to be the row count. So if the only way to get the accurate row count is to fetch all the RETURN_GENERATED_KEYS rows at once, either to count them locally or to find the count in the completion message that follows them, that mandate seems stronger than any hint from setFetchSize.

If someone really does want to do a huge INSERT and get the generated values back in increments, it might be clearer to write an explicit INSERT RETURNING and issue it with executeQuery, where everything will work as expected.

I am also thinking someone might possibly allocate one Statement to use for some number of executeQuery and executeUpdate calls, and might call setFetchSize as a hint for the queries, but not expect it to have effects spilling over to executeUpdate.

Regards,
-Chap
On Fri, Jul 14, 2023 at 3:12 PM Chapman Flack <chap@anastigmatix.net> wrote:
If someone really does want to do a huge INSERT and get the generated
values back in increments, it might be clearer to write an explicit
INSERT RETURNING and issue it with executeQuery, where everything will
work as expected.
For PostgreSQL this is even more so (i.e., huge means count > 1), since the order of rows in the returning clause is not promised to be related to the order of the rows as seen in the supplied insert command. A manual insert returning should ask for not only any auto-generated column but also the set of columns that provide the unique natural key.
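A rough sketch of what I mean, with an illustrative accounts table (generated id plus a unique email as the natural key; assumes an open Connection conn and the usual java.sql/java.util imports):

// Return both the generated key and the natural key so each id can be matched
// back to the row that was supplied, since the order of RETURNING rows is not
// promised to follow the order of the VALUES list.
Map<String, Integer> idByEmail = new HashMap<>();
try (PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO accounts (email, name) VALUES (?, ?), (?, ?) RETURNING id, email")) {
    ps.setString(1, "a@example.com");
    ps.setString(2, "Alice");
    ps.setString(3, "b@example.com");
    ps.setString(4, "Bob");
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            idByEmail.put(rs.getString("email"), rs.getInt("id"));
        }
    }
}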
David J.
On 2023-07-14 18:22, David G. Johnston wrote:
> For PostgreSQL this is even moreso (i.e, huge means count > 1) since the
> order of rows in the returning clause is not promised to be related to the
> order of the rows as seen in the supplied insert command. A manual insert
> returning should ask for not only any auto-generated column but also the
> set of columns that provide the unique natural key.

Yikes! That sounds like something that (if it's feasible) the driver's rewriting for RETURN_GENERATED_KEYS should try to do ... the driver is already expected to be smart enough to know which columns the generated keys are ... should it also try to rewrite the query in some way to get a meaningful order of the results?

But then ... the apidoc for getGeneratedKeys is completely silent on the ordering anyway. I get the feeling this whole corner of the JDBC API may have been thought out only as far as issuing a single-row INSERT at a time and getting its assigned keys back.

Regards,
-Chap
On 2023-Jul-14, Dave Cramer wrote:
> David,
>
> I will try to get a tcpdump file. Doing this in libpq seems challenging as
> I'm not aware of how to create a portal in psql.

You can probably have a look at src/test/modules/libpq_pipeline/libpq_pipeline.c as a basis to write some test code for this.

--
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you want to have good ideas, you must have many ideas. Most of them
will be wrong, and what you have to learn is which ones to throw away."
(Linus Pauling)
On Wed, Jul 12, 2023 at 2:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dave Cramer <davecramer@gmail.com> writes:
> Obviously I am biased by the JDBC API which would like to have
> PreparedStatement.execute() return the number of rows inserted
> without having to wait to read all of the rows returned
Umm ... you do realize that we return the rows on-the-fly?
The server does not know how many rows got inserted/returned
until it's run the query to completion, at which point all
the data has already been sent to the client
Doesn't this code contradict that statement?
src/backend/tcop/pquery.c
/*
* If we have not yet run the command, do so, storing its
* results in the portal's tuplestore. But we don't do that
* for the PORTAL_ONE_SELECT case.
*/
if (portal->strategy != PORTAL_ONE_SELECT && !portal->holdStore)
FillPortalStore(portal, isTopLevel);
/*
* Now fetch desired portion of results.
*/
nprocessed = PortalRunSelect(portal, true, count, dest);
Not sure we'd want to lock ourselves into this implementation but at least as it stands now we could send a message with the portal size after calling FillPortalStore and prior to calling PortalRunSelect.
David J.