Home > mailing lists

Re: Disk buffering of resultsets - Mailing list pgsql-jdbc

From	Craig Ringer
Subject	Re: Disk buffering of resultsets
Date	October 6, 2014 05:42:51
Msg-id	5432019C.7000401@2ndquadrant.com Whole thread Raw
In response to	Re: Disk buffering of resultsets (Vitalii Tymchyshyn <vit@tym.im>)
List	pgsql-jdbc

Tree view

On 10/06/2014 10:22 AM, Vitalii Tymchyshyn wrote:
> Hello.
>
> As of routing I meant exactly attaching to original statement result
> belongs to and throwing an error as soon as someone get to correct point
> calling this original statement.
> As of threading I tend to agree with you. It can be revised in Java 8
> where there is a default executor that can be used.
>
> The primary problem as I can see it are OOMs on large result sets.
> Currently the workaround is to use fetchSize, but it works only in very
> limited scenarios.

Specifically, autocommit=off and only one active Statement. Right?

The PostgreSQL backend its self supports multiple open portals, but only
within the same open transaction, and if any statement causes an error
all portals are invalidated. So we can't really use that if you want
multiple *independent* statements.

The usual answer would be WITH HOLD cursors (or a portal-level
equivalent). There's a server-side impact to them, but it's probably the
first thing to try.

Failing that, as you say, we'd have to fetch the whole result set to the
client and store it off-heap, which is where the abstraction I've talked
about on this mailing list thread comes in.

An issue here is that PgJDBC currently has some restrictions around
multiple application threads using a single Connection at the same time.
If you're working with multiple statements you're quite possibly also
working with multiple threads, right?

> I can see two ways here: improve fetchSize handling (potentially setting
> default fetchSize to some value like 10000) or storing result sets out
> of heap.
> One more thing to remember is time to get first row. It would be great
> to have first fast enough  without reading, parsing and storing
> somewhere all the 1000000 rows.

... which is where the issues with threading come in, because you want
to fetch some results, return them to the client, and continue
processing the rest of the results in a helper thread in the driver.

I think we'd have to get the client app involved more directly in that,
by exposing partial fetches more directly to the client and allowing it
to ask us (possibly via a separate thread) to consume more results and
append them to the result set. We'd have to deal with the resulting
locking issues carefully.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-jdbc by date:

From: Vitalii Tymchyshyn
Date: 06 October 2014, 05:22:10
Subject: Re: Disk buffering of resultsets

From: Swapna Shetty
Date: 08 October 2014, 22:50:30
Subject: Network error

Re: Disk buffering of resultsets - Mailing list pgsql-jdbc

Previous

Next