Thread: Queries with large ResultSets
If I'm reading the source code correctly, the PostgreSQL 7.3.4 JDBC driver creates a ResultSet by evaluating the entire query and putting all the rows into a Vector. ResultSet.next() iterates over the Vector. Is that in fact what is happening? If so, this isn't very good for processing large ResultSets.

The only other open-source driver I could find is jxDBCon (http://jxdbcon.sourceforge.net). Does anyone here have experience with it?

One obvious application-level workaround is to add ORDER BY <primary key> and LIMIT to the query, retrieve batches, keep track of the last primary key value, etc. That has lots of problems: it changes the results of the overall scan (since each query runs at a different time), and it does an index scan instead of a seq scan. I'm considering modifying the driver, but that would put me behind on my development schedule.

Jack Orenstein
Jack Orenstein wrote:
> If I'm reading the source code correctly, the PostgreSQL 7.3.4 JDBC
> driver creates a ResultSet by evaluating the entire query, and putting
> all the rows into a Vector. ResultSet.next() iterates over the
> Vector. Is that in fact what is happening?

That's probably true for the 7.3.4 driver, which is quite old now. The latest 7.4 drivers (which will work against a 7.3 server) will use a cursor to fetch the resultset incrementally when:

- the statement fetchsize is non-zero; and
- autocommit is off; and
- the statement is TYPE_FORWARD_ONLY; and
- the query is a single SELECT.

There's also been a patch posted recently (check the archives) that allows use of cursors with scrollable resultsets.

-O
On Wed, 19 May 2004, Jack Orenstein wrote:
> If I'm reading the source code correctly, the PostgreSQL 7.3.4 JDBC
> driver creates a ResultSet by evaluating the entire query, and putting
> all the rows into a Vector. ResultSet.next() iterates over the
> Vector. Is that in fact what is happening?

Yes, in the 7.3 driver. The 7.4 driver (which works quite well against a 7.3 database) will use a cursor-based fetch to retrieve a chunk of rows at a time. There are a number of restrictions required to be able to use this method (it only works with autocommit off and FORWARD_ONLY result sets). To enable this, use Statement.setFetchSize to indicate how many rows at a time you want retrieved. Also there is a patch to the 7.5 driver to be able to retrieve subsets of the resultset for the scrollable variety, and hopefully I should get that in soon.

Kris Jurka
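[A sketch of the setup Kris describes, assuming the 7.4 driver. The connection URL, credentials, and table name are hypothetical, and the fetch size of 50 is arbitrary; the example needs a live server to run.]

```java
import java.sql.*;

public class CursorFetchSketch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details; adjust as needed.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");

        // The conditions for the driver's cursor-based fetch:
        conn.setAutoCommit(false);                  // 1. autocommit off
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,        // 2. forward-only resultset
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(50);                      // 3. non-zero fetch size

        // 4. a single SELECT statement
        ResultSet rs = stmt.executeQuery("SELECT * FROM big_table");
        while (rs.next()) {
            // rows arrive from the backend cursor in chunks of 50,
            // so the whole resultset is never held in memory at once
        }
        rs.close();
        stmt.close();
        conn.commit();
        conn.close();
    }
}
```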
At 18:01 on Thursday, 20 May 2004, Kris Jurka wrote:
> Yes, in the 7.3 driver. The 7.4 driver (which works quite well against a
> 7.3 database) will use a cursor based fetch to retrieve a chunk of rows at
> a time. There are a number of restrictions required to be able to use
> this method (It only works with autocommit off and FORWARD_ONLY result
> sets). To enable this use Statement.setFetchSize to indicate how many
> rows at a time you want retrieved.

Ugh... those limitations are really frightening; this means we cannot fetch big quantities of data outside of a transaction. This is a problem with application servers like GeoServer that keep a connection pool and need to fetch big quantities of data outside a transaction too... any hope to see this fixed soon? Is it a driver problem or a server limitation?

Best regards
Andrea Aime
Andrea Aime <andrea.aime@aliceposta.it> writes:
> Ugh... those limitations are really frightening, this means we cannot fetch
> big quantities of data outside of a transaction...

You were expecting something different? Postgres does not do *anything* outside of a transaction.

regards, tom lane
At 20:44 on Thursday, 20 May 2004, Tom Lane wrote:
> Andrea Aime <andrea.aime@aliceposta.it> writes:
> > Ugh... those limitations are really frightening, this means we cannot
> > fetch big quantities of data outside of a transaction...
>
> You were expecting something different? Postgres does not do *anything*
> outside of a transaction.

Every other database I have some experience with (SQL Server, SapDB) allows you to use cursors regardless of the transaction. Working with a database in autocommit mode doesn't mean you don't need to load huge quantities of data... on the contrary, I would expect it to be more costly to load a huge amount of data inside a transaction because of transaction isolation.

Anyway, that's not the point; the real problem is that our generic JDBC code won't work properly with Postgres because of this "feature", so we will have to subclass everything that deals with the database in order to get the correct behaviour.

If I am in autocommit = false mode, I will have to call connection.commit() after every write command to the database... isolation-wise, to simulate a true autocommit, will I have to call connection.commit() also after reads if I raise the isolation level above READ_COMMITTED?

Best regards
Andrea Aime
You can use a holdable cursor and get a cursor outside of a transaction, but beware: postgres has to store this somewhere, and it's not updateable.

Dave

On Thu, 2004-05-20 at 15:27, Andrea Aime wrote:
> Every other database I have some experience with (SQL Server, SapDB) allows
> you to use cursors regardless of the transaction. If you are working with a
> database in autocommit mode that doesn't mean that you don't need to load
> huge quantities of data... [...]
>
> If I am in autocommit = false mode, I will have to call connection.commit()
> after every write command to the database... isolation-wise, to simulate a
> true autocommit, will I have to call connection.commit() also after reads if
> I raise the isolation level above READ_COMMITTED?
>
> Best regards
> Andrea Aime

--
Dave Cramer
519 939 0336
ICQ # 14675561
At 22:03 on Thursday, 20 May 2004, Dave Cramer wrote:
> You can use a holdable cursor, and get a cursor outside of a
> transaction, but beware, postgres has to store this somewhere, and it's
> not update-able

"Store this somewhere" means that it has to write the result of the query to persistent storage and then provide scrolling over it? That's not the best way to do things, since in the case of GeoServer the query result can be several megabytes (GIS data)... Anyway, how do I get such a cursor using the JDBC driver? Moreover, would the hack of calling commit() at the end of every operation in order to simulate an autocommit connection work?

Best regards
Andrea Aime
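[On the "how do I get such a cursor" question: the driver at this point exposes no holdability API, so one option is to issue the cursor commands as raw SQL. A sketch, with hypothetical connection details, cursor name, and table name:]

```java
import java.sql.*;

public class HoldableCursorSketch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details; adjust as needed.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");
        Statement stmt = conn.createStatement();

        // Declare the cursor inside a transaction; WITH HOLD makes it
        // survive the commit (the backend materializes its contents then).
        conn.setAutoCommit(false);
        stmt.execute("DECLARE bigdata CURSOR WITH HOLD FOR SELECT * FROM big_table");
        conn.commit();
        conn.setAutoCommit(true);

        // Fetch in chunks, outside any explicit transaction.
        int rowsFetched;
        do {
            ResultSet rs = stmt.executeQuery("FETCH FORWARD 100 FROM bigdata");
            rowsFetched = 0;
            while (rs.next()) {
                rowsFetched++;
                // process row
            }
            rs.close();
        } while (rowsFetched > 0);

        // Holdable cursors persist until closed, so close explicitly.
        stmt.execute("CLOSE bigdata");
        stmt.close();
        conn.close();
    }
}
```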
Andrea Aime wrote:
> Ugh... those limitations are really frightening, this means we cannot fetch
> big quantities of data outside of a transaction... this is a problem with
> application servers like GeoServer that keep a connection pool and
> need to fetch big quantities of data also outside a transaction... any hope
> to see this fixed soon? Is it a driver problem or a server limitation?

Cursors are implicitly closed at the end of a transaction unless they are declared WITH HOLD. Declaring a cursor WITH HOLD has an associated cost on the backend (namely, it will copy the cursor's contents at the end of the transaction). If autocommit is on, you have an implicit transaction around every query, so it doesn't make sense to use a non-holdable cursor with autocommit on -- you'd never be able to fetch any results.

This could be controllable via the JDBC3 resultset holdability methods, but currently it isn't, and all resultsets effectively default to ResultSet.CLOSE_CURSORS_AT_COMMIT.

I don't think you want a holdable cursor for this case anyway, since the backend would end up doing a lot of unnecessary copying of results around. If you're accessing big quantities of data, the overhead of an explicit commit() after you're done with the resultset is going to be insignificant compared to the cost of actually transferring and handling that data. Use something like this:

    connection.setAutoCommit(false);
    PreparedStatement stmt = connection.prepareStatement("SELECT ....");
    stmt.setFetchSize(50); // non-zero fetch size, so the driver fetches via a cursor
    ResultSet rs = stmt.executeQuery();
    while (rs.next()) {
        // process data
    }
    rs.close();
    connection.commit();

-O
At 00:54 on Friday, 21 May 2004, Oliver Jowett wrote:
> I don't think you want a holdable cursor for this case anyway since the
> backend would end up doing a lot of unnecessary copying results around.
> If you're accessing big quantities of data, the overhead of an explicit
> commit() after you're done with the resultset is going to be
> insignificant compared to the cost of actually transferring and handling
> that data.

I see... well, it seems that's what I will need to do. Thank you for the advice :-)

Best regards
Andrea Aime
There's some confusion as to whether a cursor is materialized even inside a transaction. It could be that complicated queries will be stored on disk too. Tom?

Dave

On Thu, 2004-05-20 at 18:54, Oliver Jowett wrote:
> Cursors are implicitly closed at the end of a transaction unless they are
> declared WITH HOLD. Declaring a cursor WITH HOLD has an associated cost
> on the backend (namely it will copy the cursor's contents at the end of
> the transaction). [...]
>
> I don't think you want a holdable cursor for this case anyway since the
> backend would end up doing a lot of unnecessary copying results around.
> If you're accessing big quantities of data, the overhead of an explicit
> commit() after you're done with the resultset is going to be
> insignificant compared to the cost of actually transferring and handling
> that data.

--
Dave Cramer
519 939 0336
ICQ # 14675561
Dave Cramer wrote:
> There's some confusion as to whether a cursor is materialized even
> inside a transaction. It could be that complicated queries will be
> stored on the disk too.

The docs cover this. Executive summary: SCROLL cursors might be more expensive than NO SCROLL cursors, depending on the query. It's a separate issue from WITH HOLD / WITHOUT HOLD as far as I know.

http://www.postgresql.org/docs/7.4/static/sql-declare.html

-O
Dave Cramer <pg@fastcrypt.com> writes:
> There's some confusion as to whether a cursor is materialized even
> inside a transaction. It could be that complicated queries will be
> stored on the disk too.

It depends on the query and on the cursor options. If you don't say SCROLL nor WITH HOLD, then the result isn't materialized anywhere; it's just computed and delivered incrementally in response to FETCH commands.

If you specify SCROLL and the query plan isn't one that's amenable to being run backwards, then we materialize the result (ie, save aside each row the first time it is read from the underlying query) so that we can support FETCH BACKWARD. By and large, only the simplest seqscan or indexscan plans (no joins, aggregates, etc) are capable of being run backwards and so can handle SCROLL without overhead. You can use "EXPLAIN DECLARE CURSOR" to see whether a particular query can do this --- look at whether a Materialize node gets stuck atop the plan when you add SCROLL.

If you specify WITH HOLD and don't close the cursor before transaction end, then at transaction end the result is materialized: we read the entire query output (including any rows you already read) and save it aside to support future FETCHes.

regards, tom lane
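[Tom's EXPLAIN check might look something like this; table names are hypothetical, and the exact plan shapes depend on the data and the planner:]

```sql
-- A simple scan can usually be run backwards; expect no Materialize node:
EXPLAIN DECLARE c1 SCROLL CURSOR FOR SELECT * FROM big_table;

-- A join generally cannot; look for a Materialize node stuck atop the plan,
-- indicating the rows will be saved aside to support FETCH BACKWARD:
EXPLAIN DECLARE c2 SCROLL CURSOR FOR
    SELECT * FROM big_table t1 JOIN other_table t2 ON t1.id = t2.id;
```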