Thread: Queries with large ResultSets

Queries with large ResultSets

From: Jack Orenstein
If I'm reading the source code correctly, the PostgreSQL 7.3.4 JDBC
driver creates a ResultSet by evaluating the entire query, and putting
all the rows into a Vector. ResultSet.next() iterates over the
Vector. Is that in fact what is happening?

If so, this isn't very good for processing large ResultSets. The only
other open-source driver I could find is jxDBCon
(http://jxdbcon.sourceforge.net). Does anyone here have experience
with it?

One obvious application-level workaround is to add ORDER BY <primary
key> and LIMIT to the query, retrieve batches, keeping track of the
last primary key value, etc. That has lots of problems -- changing the
results of the overall scan (since each query runs at a different
time), and doing an index scan instead of a seq scan.

I'm considering modifying the driver, but that would put me behind on
my development schedule.

Jack Orenstein




Re: Queries with large ResultSets

From: Oliver Jowett
Jack Orenstein wrote:
> If I'm reading the source code correctly, the PostgreSQL 7.3.4 JDBC
> driver creates a ResultSet by evaluating the entire query, and putting
> all the rows into a Vector. ResultSet.next() iterates over the
> Vector. Is that in fact what is happening?

That's probably true for the 7.3.4 driver, which is quite old now.

The latest 7.4 drivers (which will work against a 7.3 server) will use a 
cursor to fetch the resultset incrementally when:
 - the statement fetchsize is non-zero; and
 - autocommit is off; and
 - the statement is TYPE_FORWARD_ONLY; and
 - the query is a single SELECT
(see the sketch below)
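
For illustration, a minimal sketch that meets all four conditions (the
connection URL, credentials, table name, and fetch size of 50 are
placeholders):

   import java.sql.*;

   public class CursorFetchSketch {
       public static void main(String[] args) throws Exception {
           Class.forName("org.postgresql.Driver");
           Connection conn = DriverManager.getConnection(
               "jdbc:postgresql://localhost/test", "user", "password");
           conn.setAutoCommit(false);                   // autocommit off
           Statement stmt = conn.createStatement(
               ResultSet.TYPE_FORWARD_ONLY,             // forward-only
               ResultSet.CONCUR_READ_ONLY);
           stmt.setFetchSize(50);                       // non-zero fetchsize
           // a single SELECT, so the driver can run it via a cursor
           ResultSet rs = stmt.executeQuery("SELECT * FROM big_table");
           while (rs.next()) {
               // rows arrive from the backend 50 at a time
           }
           rs.close();
           stmt.close();
           conn.commit();
           conn.close();
       }
   }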
 

There's also been a patch posted recently (check the archives) that 
allows use of cursors with scrollable resultsets.

-O


Re: Queries with large ResultSets

From: Kris Jurka

On Wed, 19 May 2004, Jack Orenstein wrote:

> If I'm reading the source code correctly, the PostgreSQL 7.3.4 JDBC
> driver creates a ResultSet by evaluating the entire query, and putting
> all the rows into a Vector. ResultSet.next() iterates over the
> Vector. Is that in fact what is happening?
>

Yes, in the 7.3 driver.  The 7.4 driver (which works quite well against a
7.3 database) will use a cursor-based fetch to retrieve a chunk of rows at
a time.  There are a number of restrictions on using this method (it only
works with autocommit off and FORWARD_ONLY result sets).  To enable it,
use Statement.setFetchSize to indicate how many rows at a time you want
retrieved.  There is also a patch to the 7.5 driver to retrieve subsets of
the resultset for the scrollable variety, and hopefully I will get that in
soon.

Kris Jurka

Re: Queries with large ResultSets

From: Andrea Aime
On Thursday 20 May 2004 18:01, Kris Jurka wrote:
> On Wed, 19 May 2004, Jack Orenstein wrote:
> > If I'm reading the source code correctly, the PostgreSQL 7.3.4 JDBC
> > driver creates a ResultSet by evaluating the entire query, and putting
> > all the rows into a Vector. ResultSet.next() iterates over the
> > Vector. Is that in fact what is happening?
>
> Yes, in the 7.3 driver.  The 7.4 driver (which works quite well against a
> 7.3 database) will use a cursor-based fetch to retrieve a chunk of rows at
> a time.  There are a number of restrictions on using this method (it only
> works with autocommit off and FORWARD_ONLY result sets).  To enable it,
> use Statement.setFetchSize to indicate how many rows at a time you want
> retrieved.  There is also a patch to the 7.5 driver to retrieve subsets of
> the resultset for the scrollable variety, and hopefully I will get that in
> soon.

Ugh... those limitations are really frightening; they mean we cannot fetch
big quantities of data outside of a transaction... This is a problem for
application servers like GeoServer that keep a connection pool and also
need to fetch big quantities of data outside a transaction... Any hope of
seeing this fixed soon? Is it a driver problem or a server limitation?

Best regards
Andrea Aime

Re: Queries with large ResultSets

From: Tom Lane
Andrea Aime <andrea.aime@aliceposta.it> writes:
> Ugh... those limitations are really frightening; they mean we cannot fetch
> big quantities of data outside of a transaction...

You were expecting something different?  Postgres does not do *anything*
outside of a transaction.

            regards, tom lane

Re: Queries with large ResultSets

From: Andrea Aime
On Thursday 20 May 2004 20:44, Tom Lane wrote:
> Andrea Aime <andrea.aime@aliceposta.it> writes:
> > Ugh... those limitations are really frightening; they mean we cannot
> > fetch big quantities of data outside of a transaction...
>
> You were expecting something different?  Postgres does not do *anything*
> outside of a transaction.

Every other database I have some experience with (SQL Server, SAPDB) lets
you use cursors regardless of the transaction. Working with a database in
autocommit mode doesn't mean you don't need to load huge quantities of
data... on the contrary, I would expect loading a huge amount of data
inside a transaction to be more costly because of transaction isolation.
Anyway, that's not the point; the real problem is that our generic JDBC
code won't work properly with Postgres because of this "feature", so we
will have to subclass everything that deals with the database in order to
get the correct behaviour.

If I am in autocommit = false mode, I will have to call connection.commit()
after every write command to the database... isolation-wise, to simulate a
true autocommit, will I have to call connection.commit() after reads as
well if I raise the isolation level above READ_COMMITTED?

Best regards
Andrea Aime


Re: Queries with large ResultSets

From: Dave Cramer
You can use a holdable cursor to keep a cursor open outside of a
transaction, but beware: Postgres has to store the result somewhere, and
the cursor is not updatable.
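
For example (a sketch only: big_cur and big_table are made-up names, and
it assumes java.sql.* imports and an open Connection conn in autocommit
mode):

   Statement stmt = conn.createStatement();
   // With autocommit on, the DECLARE commits immediately; WITH HOLD makes
   // the backend keep the cursor's contents past that commit.
   stmt.execute(
       "DECLARE big_cur CURSOR WITH HOLD FOR SELECT * FROM big_table");
   int fetched;
   do {
       fetched = 0;
       ResultSet rs = stmt.executeQuery("FETCH FORWARD 100 FROM big_cur");
       while (rs.next()) {
           fetched++;
           // process row
       }
       rs.close();
   } while (fetched > 0);
   stmt.execute("CLOSE big_cur");  // frees the stored result on the backend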

Dave
On Thu, 2004-05-20 at 15:27, Andrea Aime wrote:
> On Thursday 20 May 2004 20:44, Tom Lane wrote:
> > Andrea Aime <andrea.aime@aliceposta.it> writes:
> > > Ugh... those limitations are really frightening; they mean we cannot
> > > fetch big quantities of data outside of a transaction...
> >
> > You were expecting something different?  Postgres does not do *anything*
> > outside of a transaction.
>
> Every other database I have some experience with (SQL Server, SAPDB) lets
> you use cursors regardless of the transaction. Working with a database in
> autocommit mode doesn't mean you don't need to load huge quantities of
> data... on the contrary, I would expect loading a huge amount of data
> inside a transaction to be more costly because of transaction isolation.
> Anyway, that's not the point; the real problem is that our generic JDBC
> code won't work properly with Postgres because of this "feature", so we
> will have to subclass everything that deals with the database in order to
> get the correct behaviour.
>
> If I am in autocommit = false mode, I will have to call connection.commit()
> after every write command to the database... isolation-wise, to simulate a
> true autocommit, will I have to call connection.commit() after reads as
> well if I raise the isolation level above READ_COMMITTED?
>
> Best regards
> Andrea Aime
>
>
--
Dave Cramer
519 939 0336
ICQ # 14675561


Re: Queries with large ResultSets

From: Andrea Aime
On Thursday 20 May 2004 22:03, Dave Cramer wrote:
> You can use a holdable cursor, and get a cursor outside of a
> transaction, but beware, postgres has to store this somewhere, and it's
> not update-able

Does "store it somewhere" mean that it has to write the result of the
query to persistent storage and then provide scrolling over it? That's not
the best way to do things, since in GeoServer's case the query result can
be several megabytes (GIS data)...
Anyway, how do I get such a cursor using the JDBC driver?
Moreover, would the hack of calling commit() at the end of every operation
in order to simulate an autocommit connection work?

Best regards
Andrea Aime

Re: Queries with large ResultSets

From: Oliver Jowett
Andrea Aime wrote:

> Ugh... those limitations are really frightening; they mean we cannot fetch
> big quantities of data outside of a transaction... This is a problem for
> application servers like GeoServer that keep a connection pool and also
> need to fetch big quantities of data outside a transaction... Any hope of
> seeing this fixed soon? Is it a driver problem or a server limitation?

Cursors are implicitly closed at the end of a transaction unless they are
declared WITH HOLD. Declaring a cursor WITH HOLD has an associated cost
on the backend (namely it will copy the cursor's contents at the end of
the transaction). If autocommit is on, you have an implicit transaction
around every query, so it doesn't make sense to use a non-holdable
cursor with autocommit on -- you'd never be able to fetch any results.

This could be controllable via the JDBC3 resultset holdability methods,
but currently it isn't and all resultsets effectively default to
ResultSet.CLOSE_CURSORS_AT_COMMIT.
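
For reference, the JDBC3 request would look like the sketch below; as
noted above, the current driver effectively ignores it:

   // JDBC3 holdability API; today the driver behaves as though
   // CLOSE_CURSORS_AT_COMMIT were always in effect.
   Statement stmt = conn.createStatement(
       ResultSet.TYPE_FORWARD_ONLY,
       ResultSet.CONCUR_READ_ONLY,
       ResultSet.HOLD_CURSORS_OVER_COMMIT);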

I don't think you want a holdable cursor for this case anyway since the
backend would end up doing a lot of unnecessary copying results around.
If you're accessing big quantities of data, the overhead of an explicit
commit() after you're done with the resultset is going to be
insignificant compared to the cost of actually transferring and handling
that data. Use something like this:

   connection.setAutoCommit(false);
   PreparedStatement stmt = connection.prepareStatement("SELECT ....");
   stmt.setFetchSize(50); // non-zero, so rows are fetched incrementally
   ResultSet rs = stmt.executeQuery();
   while (rs.next()) {
      // process data
   }
   rs.close();
   stmt.close();
   connection.commit();

-O

Re: Queries with large ResultSets

From: Andrea Aime
On Friday 21 May 2004 00:54, Oliver Jowett wrote:
> Andrea Aime wrote:

...

> I don't think you want a holdable cursor for this case anyway since the
> backend would end up doing a lot of unnecessary copying results around.
> If you're accessing big quantities of data, the overhead of an explicit
> commit() after you're done with the resultset is going to be
> insignificant compared to the cost of actually transferring and handling
> that data. Use something like this:
>
>    connection.setAutoCommit(false);
>    PreparedStatement stmt = connection.prepareStatement("SELECT ....");
>    ResultSet rs = stmt.executeQuery();
>    while (rs.next()) {
>       // process data
>    }
>    rs.close();
>    connection.commit();
>
> -O

I see... well, that's what I will need to do, it seems. Thank you for the
advice :-)

Best regards
Andrea Aime


Re: Queries with large ResultSets

From: Dave Cramer
There's some confusion as to whether a cursor is materialized even
inside a transaction. It could be that the results of complicated queries
get stored on disk too.

Tom ?

Dave
On Thu, 2004-05-20 at 18:54, Oliver Jowett wrote:
> Andrea Aime wrote:
>
> > Ugh... those limitations are really frightening; they mean we cannot fetch
> > big quantities of data outside of a transaction... This is a problem for
> > application servers like GeoServer that keep a connection pool and also
> > need to fetch big quantities of data outside a transaction... Any hope of
> > seeing this fixed soon? Is it a driver problem or a server limitation?
>
> Cursors are implicitly closed at the end of a transaction unless they are
> declared WITH HOLD. Declaring a cursor WITH HOLD has an associated cost
> on the backend (namely it will copy the cursor's contents at the end of
> the transaction). If autocommit is on, you have an implicit transaction
> around every query, so it doesn't make sense to use a non-holdable
> cursor with autocommit on -- you'd never be able to fetch any results.
>
> This could be controllable via the JDBC3 resultset holdability methods,
> but currently it isn't and all resultsets effectively default to
> ResultSet.CLOSE_CURSORS_AT_COMMIT.
>
> I don't think you want a holdable cursor for this case anyway since the
> backend would end up doing a lot of unnecessary copying results around.
> If you're accessing big quantities of data, the overhead of an explicit
> commit() after you're done with the resultset is going to be
> insignificant compared to the cost of actually transferring and handling
> that data. Use something like this:
>
>    connection.setAutoCommit(false);
>    PreparedStatement stmt = connection.prepareStatement("SELECT ....");
>    stmt.setFetchSize(50); // non-zero, so rows are fetched incrementally
>    ResultSet rs = stmt.executeQuery();
>    while (rs.next()) {
>       // process data
>    }
>    rs.close();
>    stmt.close();
>    connection.commit();
>
> -O
>
--
Dave Cramer
519 939 0336
ICQ # 14675561


Re: Queries with large ResultSets

From: Oliver Jowett
Dave Cramer wrote:
> There's some confusion as to whether a cursor is materialized even
> inside a transaction. It could be that complicated queries will be
> stored on the disk too.

The docs cover this. Executive summary: SCROLL cursors might be more
expensive than NO SCROLL cursors, depending on the query. It's a
separate issue from WITH HOLD / WITHOUT HOLD as far as I know.

http://www.postgresql.org/docs/7.4/static/sql-declare.html

-O

Re: Queries with large ResultSets

From: Tom Lane
Dave Cramer <pg@fastcrypt.com> writes:
> There's some confusion as to whether a cursor is materialized even
> inside a transaction. It could be that complicated queries will be
> stored on the disk too.

It depends on the query and on the cursor options.

If you say neither SCROLL nor WITH HOLD, then the result isn't materialized
anywhere; it's just computed and delivered incrementally in response to
FETCH commands.

If you specify SCROLL and the query plan isn't one that's amenable to
being run backwards, then we materialize the result (ie, save aside each
row the first time it is read from the underlying query) so that we can
support FETCH BACKWARD.  By and large, only the simplest seqscan or
indexscan plans (no joins, aggregates, etc) are capable of being run
backwards and so can handle SCROLL without overhead.  You can use
"EXPLAIN DECLARE CURSOR" to see whether a particular query can do this
--- look at whether a Materialize node gets stuck atop the plan when
you add SCROLL.
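
A sketch of that check from JDBC (the cursor name and query are made up;
whether a Materialize node appears depends on the plan):

   conn.setAutoCommit(false);
   Statement stmt = conn.createStatement();
   ResultSet plan = stmt.executeQuery(
       "EXPLAIN DECLARE c SCROLL CURSOR FOR SELECT * FROM big_table");
   while (plan.next()) {
       // a Materialize node atop the plan means SCROLL will save rows aside
       System.out.println(plan.getString(1));
   }
   plan.close();
   conn.rollback();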

If you specify WITH HOLD and don't close the cursor before transaction
end, then at transaction end the result is materialized: we read the
entire query output (including any rows you already read) and save it
aside to support future FETCHes.

            regards, tom lane