Re: Disk buffering of resultsets - Mailing list pgsql-jdbc

From Craig Ringer
Subject Re: Disk buffering of resultsets
Date
Msg-id 5432019C.7000401@2ndquadrant.com
Whole thread Raw
In response to Re: Disk buffering of resultsets  (Vitalii Tymchyshyn <vit@tym.im>)
List pgsql-jdbc
On 10/06/2014 10:22 AM, Vitalii Tymchyshyn wrote:
> Hello.
>
> As of routing I meant exactly attaching to original statement result
> belongs to and throwing an error as soon as someone get to correct point
> calling this original statement.
> As of threading I tend to agree with you. It can be revised in Java 8
> where there is a default executor that can be used.
>
> The primary problem as I can see it are OOMs on large result sets.
> Currently the workaround is to use fetchSize, but it works only in very
> limited scenarios.

Specifically, autocommit=off and only one active Statement. Right?

The PostgreSQL backend its self supports multiple open portals, but only
within the same open transaction, and if any statement causes an error
all portals are invalidated. So we can't really use that if you want
multiple *independent* statements.

The usual answer would be WITH HOLD cursors (or a portal-level
equivalent). There's a server-side impact to them, but it's probably the
first thing to try.

Failing that, as you say, we'd have to fetch the whole result set to the
client and store it off-heap, which is where the abstraction I've talked
about on this mailing list thread comes in.

An issue here is that PgJDBC currently has some restrictions around
multiple application threads using a single Connection at the same time.
If you're working with multiple statements you're quite possibly also
working with multiple threads, right?

> I can see two ways here: improve fetchSize handling (potentially setting
> default fetchSize to some value like 10000) or storing result sets out
> of heap.
> One more thing to remember is time to get first row. It would be great
> to have first fast enough  without reading, parsing and storing
> somewhere all the 1000000 rows.

... which is where the issues with threading come in, because you want
to fetch some results, return them to the client, and continue
processing the rest of the results in a helper thread in the driver.

I think we'd have to get the client app involved more directly in that,
by exposing partial fetches more directly to the client and allowing it
to ask us (possibly via a separate thread) to consume more results and
append them to the result set. We'd have to deal with the resulting
locking issues carefully.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-jdbc by date:

Previous
From: Vitalii Tymchyshyn
Date:
Subject: Re: Disk buffering of resultsets
Next
From: Swapna Shetty
Date:
Subject: Network error