Re: Disk buffering of resultsets - Mailing list pgsql-jdbc

From: Vitalii Tymchyshyn
Subject: Re: Disk buffering of resultsets
Msg-id: CABWW-d3jpnTLeY3mrRJtNW+2NNN21kPh=rXDXowOkB5qYMyVfw@mail.gmail.com
In response to: Re: Disk buffering of resultsets (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-jdbc
Hello.

Regarding routing, I meant exactly that: attaching the exception to the original statement the result belongs to, and throwing it as soon as someone next calls a method on that original statement.
Regarding threading, I tend to agree with you. It can be revisited in Java 8, where there is a default executor that can be used.

The primary problem, as I see it, is OOMs on large result sets. Currently the workaround is to use fetchSize, but it works only in very limited scenarios.
I can see two ways here: improve fetchSize handling (potentially setting the default fetchSize to some value like 10000), or store result sets out of heap.
One more thing to remember is time to first row. It would be great to get the first row quickly, without first reading, parsing, and storing all 1000000 rows somewhere.
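To make the "limited scenarios" point concrete, here is a hypothetical sketch of the kind of conditions under which a driver can stream with a server-side cursor instead of buffering the whole result set in heap. The helper name and the exact rule set are assumptions for illustration, loosely modeled on the documented PgJDBC behavior (autocommit off, forward-only result set, positive fetch size), not the driver's actual code:

```java
import java.sql.ResultSet;

public class FetchModeSketch {
    /**
     * Hypothetical helper: decide whether cursor-based (batched) fetching
     * can be used instead of buffering all rows in heap. Illustrative only.
     */
    static boolean canUseCursorFetch(boolean autoCommit, int fetchSize, int resultSetType) {
        // A cursor dies at commit, so autocommit must be off; the caller must
        // have asked for batched fetching (fetchSize > 0); and scrolling
        // backwards over a cursor is not supported in this sketch.
        return !autoCommit
                && fetchSize > 0
                && resultSetType == ResultSet.TYPE_FORWARD_ONLY;
    }

    public static void main(String[] args) {
        // Typical "streaming" configuration: cursor fetch applies.
        System.out.println(canUseCursorFetch(false, 10000, ResultSet.TYPE_FORWARD_ONLY));
        // Autocommit on: the whole result set gets buffered instead.
        System.out.println(canUseCursorFetch(true, 10000, ResultSet.TYPE_FORWARD_ONLY));
    }
}
```

This is exactly why fetchSize alone does not help the common case: the default configuration (autocommit on, fetchSize 0) falls through to full buffering.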

Best regards, Vitalii Tymchyshyn


2014-10-05 21:53 GMT-04:00 Craig Ringer <craig@2ndquadrant.com>:
On 10/05/2014 03:16 AM, Vitalii Tymchyshyn wrote:
> Well, the exception in this case should be "routed" to the statement
> that ran the problematic query. The next one should get something only if
> the connection became unusable as a result.

You can't do that; the logic flow and timing are all wrong.

The opportunity to throw an exception at the right place is gone and
past by the time you get to this point. You can't go back in the code
and throw an exception at some prior point of execution.

If the statement still exists, though, you can attach the exception that
would have been thrown to it, so that it is thrown the next time somebody
calls a method on that statement or its result set.

>> It also
>> makes no sense, as when you execute a new statement, the resultset of
>> the prior statement is automatically closed.
>
> Do they? I think they are closed only for the same statement object.
> Different statement may try to reuse the connection.

Yay, specification reading time.

https://jcp.org/aboutJava/communityprocess/final/jsr221/

The JDBC spec is one of the less awful Java specs, thankfully. (Whatever
you do, do not attempt to read the JSF2 specification).

Short version: you're right, you can have multiple open statements, each
with a valid open resultset. The JDBC implementation is free to decide
how it does this based on the capabilities of the database.

We're allowed to close all statements and result sets at commit time,
either implicit autocommit or explicit commit. If a user wants to keep a
resultset past that time they must set it as a holdable resultset with
the HOLD_CURSORS_OVER_COMMIT flag.
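For reference, requesting a holdable result set looks like this in standard JDBC. The helper method is only a sketch (it needs a live Connection to be useful); the constants are the two holdability modes java.sql.ResultSet defines:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HoldableSketch {
    /**
     * Sketch: create a statement whose result sets survive commit.
     * Requires a live Connection, so this method is illustrative only.
     */
    static Statement holdableStatement(Connection conn) throws SQLException {
        return conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY,
                ResultSet.HOLD_CURSORS_OVER_COMMIT); // keep cursor open across commit
    }

    public static void main(String[] args) {
        // The two holdability modes the spec defines:
        System.out.println(ResultSet.HOLD_CURSORS_OVER_COMMIT);
        System.out.println(ResultSet.CLOSE_CURSORS_AT_COMMIT);
    }
}
```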

Detail of relevant spec sections:

13.1.1 "Creating statements":

> Each Connection object can create multiple Statement objects that may
> be used concurrently by the program.

13.1.4 (PDF page 117):

> An application calls the method Statement.close to indicate that it has finished
> processing a statement. All Statement objects will be closed when the connection
> that created them is closed. [...]
>
> Closing a Statement object will close and invalidate any instances of ResultSet
> produced by that Statement object. [...]

and 15.1 "Result Sets", particularly 15.1.3 "ResultSet Holdability":

> Calling the method Connection.commit can close the ResultSet objects
> that have been created during the current transaction.

and 15.2.5 "Closing a resultset object":

A ResultSet object is explicitly closed when:
* the close method on the ResultSet is executed, thereby releasing any
external resources
* the Statement or Connection object that produced the ResultSet is
explicitly closed

A ResultSet object is implicitly closed when:
* the associated Statement object is re-executed
* the ResultSet is created with a holdability of
CLOSE_CURSORS_AT_COMMIT and an implicit or explicit commit occurs

Also, note that PgJDBC declares that:

"For Select statements, the statement is complete when the associated
result set is closed."

so in autocommit we're allowed to keep a transaction open with a cursor
streaming results until the resultset is closed.


>> > - do it in background (a little conflicting to postponing, but more
>> > thinking is needed). Return first fetchSize rows and start copying
>> > network to disk in background thread.
>>
>> You don't want to go there. It's horribly complicated to work with
>> background threads portably in the JDBC driver. For an example, see the
>> recent discussion of Timer handling.
>
> But we've already got separate threads. Why can't we have some more?

The threading we're already doing is causing issues.

Threading is very different in Java SE and Java EE environments. Lots of
care is required to cope with driver unloads/reloads, avoiding
classloader leaks, etc.
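To illustrate the kind of care meant here (the same concerns as in the recent Timer discussion), this is a sketch of the precautions a driver-owned background thread typically needs in a container environment. The factory name is invented; the two settings are the relevant part:

```java
// Sketch of precautions for background threads in a JDBC driver running
// inside a Java EE container: clear the context classloader so the thread
// does not pin a webapp classloader across redeploys, and mark it daemon
// so it cannot keep the JVM alive. The factory name is illustrative.
public class DriverThreadSketch {
    static Thread newDriverThread(Runnable task, String name) {
        Thread t = new Thread(task, name);
        t.setDaemon(true);             // never block JVM shutdown
        t.setContextClassLoader(null); // avoid classloader leaks on redeploy
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = newDriverThread(
                () -> System.out.println("background task ran"),
                "driver-cleanup");
        t.start();
        t.join();
        System.out.println(t.isDaemon());
    }
}
```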

>> I'm pretty sure this is already possible in PgJDBC (when you set a fetch
>> size) though I'd need to write a test case and do some code reading to
>> be totally sure.
>
> As far as I understand it's not, and by implementing this we could solve
> a lot of issues for large result sets.

I think we might have a different idea of what "this" is.

Perhaps it would be helpful if you described the underlying problem
you're trying to solve? We've kind of started at the solution, without
really defining the problem the solution is for.

From what I can tell I think you might be trying to make holdable
resultsets in autocommit mode more efficient by implementing lazy
resultset fetching.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
