Streaming ResultSet rows instead of buffering them in memory - Mailing list pgsql-jdbc

From Mikko Tiihonen
Subject Streaming ResultSet rows instead of buffering them in memory
Date
Msg-id HE1PR07MB33557AB067C76483280315D288F80@HE1PR07MB3355.eurprd07.prod.outlook.com
Whole thread Raw
Responses Re: Streaming ResultSet rows instead of buffering them in memory
List pgsql-jdbc
Hi,

I'd like to propose that the pgjdbc is modified to avoid buffering of ResultSets to memory by default, whenever it is
safeto do so. 
I created a WIP PR that works for many basic scenarios https://github.com/pgjdbc/pgjdbc/pull/1735

I tried to do a very minimal change that should still be maintainable in the long term. Basically I just moved the
protocolparsing variables into a state object and drag it along when processing next row of ResultSet. The effect on
applicationsis that queries have an effective fetch size of 1, without any of the downsides of such setting. 

The motivation for the patch is to speed up queries and lower memory usage. The queries become faster indirectly since
thequery processing is parallelized. Now we do in sequence: 
1) backend returns results and jdbc driver parses and buffers result set (which can take a while)
2) application consumes result set

With this patch the steps occur concurrently
1) backend returns results
2) application consumes result set with the jdbc driver parsing rows from tcp buffer on demand

The cursors used by enabling a fetch size do a similar thing, but
- it tells the backend to optimize the query plan for fetching only the first few rows instead of the full result set
- each batch of rows require a round-trip between backend and application during which neither do any useful work
- small fetch size causes many round-trips while large fetch size uses memory and increases latency

The bad thing is that if the application is not consuming the result set fast enough the backend cannot release its
locksas fast. But this only effects large results that do not fit entirely into the tcp buffers.  

The WIP code sometimes fails if the ResultSet is not properly closed before the next statement is executed. For many
teststhis happens in the tearDown phase. I also think that currently things might not work correctly if application
triggersqueries on the connection while processing the results, for example by requesting metadata. In that case we
haveno other option than to go back to buffering to clear the connection for reuse. 

-Mikko


pgsql-jdbc by date:

Previous
From: Vladimir Sitnikov
Date:
Subject: Re: pgjdbc: Maven -> Gradle, Java 6 => 7, JUnit 4 => 5
Next
From: Philippe Marschall
Date:
Subject: Re: pgjdbc: Maven -> Gradle, Java 6 => 7, JUnit 4 => 5