Thread: ResultSet memory usage
Hi. I'm new to this list, so please be gentle.

I've been using PostgreSQL with a Java application that needs to SELECT (without LIMIT) from a table with a lot of rows. I tested the application when the table in question had over 2 million rows, and the JVM kept running out of memory (of course I could tune the heap size settings). I don't actually need all 2 million rows; the application logic selects some of the first rows (in a manner that I couldn't implement in the SQL query) and then stops reading the ResultSet.

I checked the code of the JDBC driver. As far as I can understand org/postgresql/jdbc2/ResultSet.java, the whole result set is read into memory at once. Is there any practical way to tune the driver to read the rows a little at a time? Any plans to enhance the driver to do that? Would it be a big job to write such a patch without prior knowledge of the internals of PostgreSQL?

Timo
Timo,

Unfortunately this is an artifact of the way the backend works: once the request is made, the backend is going to give us all the data. You can mitigate this using cursors: http://www.postgresql.org/idocs/index.php?sql-declare.html

Dave
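For what it's worth, a rough and untested sketch of what that cursor approach can look like when driven through JDBC is below. The connection URL, credentials, cursor name, the "people"/"age" identifiers, and the batch size of 100 are all invented for illustration; the point is only the DECLARE / FETCH / CLOSE pattern inside a transaction.

// Untested sketch: fetch a big result set in batches via an explicit cursor.
// All table/column names, the URL, and credentials here are made up.
import java.sql.*;

public class CursorSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "user", "password");
        con.setAutoCommit(false);   // cursors only exist inside a transaction
        Statement stmt = con.createStatement();
        stmt.execute("DECLARE big_cursor CURSOR FOR SELECT * FROM people ORDER BY age");

        boolean done = false;
        while (!done) {
            // Each FETCH transfers only one batch, so the client never holds
            // the whole table in memory at once.
            ResultSet rs = stmt.executeQuery("FETCH FORWARD 100 FROM big_cursor");
            int rows = 0;
            while (rs.next() && !done) {
                rows++;
                // ... apply the application-side selection here and set
                // done = true once enough rows have been seen
            }
            rs.close();
            if (rows == 0) {
                done = true;        // cursor exhausted
            }
        }
        stmt.execute("CLOSE big_cursor");
        con.commit();
        stmt.close();
        con.close();
    }
}

Because each FETCH transfers only one batch, the loop can stop as soon as the application has seen enough rows, without the driver ever buffering the full two million.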
Timo-

A possible workaround: if you only need to grab a few rows, is there some way to make those rows float to the top using an "order by" and then apply "limit", so you don't have to deal with the huge ResultSet?

-Nick
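As a minimal illustration of this suggestion (untested; "people" and "age" are invented names, "con" is assumed to be an already-open java.sql.Connection, and 1000 stands in for whatever bound turns out to be "big enough"):

Statement st = con.createStatement();
ResultSet rs = st.executeQuery("SELECT * FROM people ORDER BY age LIMIT 1000");
while (rs.next()) {
    // ... only these rows are ever sent by the backend
}
rs.close();
st.close();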
> A possible workaround: if you only need to grab a few rows, is there some
> way to make those rows float to the top using an "order by" and then apply
> "limit", so you don't have to deal with the huge ResultSet?

I'm using order by, but the point is that I can only make an educated guess for the limit parameter. And I can't calculate a "big enough" value.

I need to get the first N entries with duplicates removed based on one (or two) unique column(s). I can't use distinct, since I also need to select other columns that shouldn't be affected by "distinct". I've thought about subselects, etc., but so far the best/cleanest approach I've come up with is to use a HashSet for the unique column values on the Java end. The downside is that I need to transfer a lot of unnecessary rows from the database to the application, and with PostgreSQL that means all rows.

Timo
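A rough sketch of the client-side filtering described here might look like the fragment below (untested; the "people"/"inst"/"lastname"/"age" identifiers are invented, "con" is assumed to be an open java.sql.Connection, and java.sql.* / java.util.* imports are assumed):

int n = 10;                       // number of distinct entries wanted
Set seen = new HashSet();         // unique-column values already accepted
List result = new ArrayList();
Statement st = con.createStatement();
ResultSet rs = st.executeQuery("SELECT inst, lastname, age FROM people ORDER BY age");
while (rs.next() && result.size() < n) {
    String key = rs.getString("inst");
    if (seen.add(key)) {          // add() returns false for duplicate keys
        result.add(rs.getString("lastname"));
    }
}
rs.close();
st.close();
// With the current driver the backend still sends every row and the driver
// buffers them all, which is exactly the memory problem described above --
// the loop merely stops reading early.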
> Unfortunately this is an artifact of the way the backend works: once the
> request is made, the backend is going to give us all the data.

Couldn't that still be implemented in the JDBC driver by reading from the socket a little at a time? Or would that introduce bigger problems, like a catastrophic slowdown of other connections to the backend?

Timo
Hmmm... I'm not exactly sure what postgres would do with a blocked socket, but yes, I suppose it could be done. Cursors work just fine in this instance, though.

Dave
On Fri, Jan 11, 2002 at 06:05:40PM +0200, Timo Savola wrote:
> I need to get the first N entries with duplicates removed based on one (or
> two) unique column(s). I can't use distinct, since I also need to select
> other columns that shouldn't be affected by "distinct".

Hmm, PostgreSQL has a non-SQL-standard extension, 'distinct on (expr)'; I think it might do exactly what you want (n.b. I haven't been following this whole thread, just saw this comment):

test=# select distinct inst from people order by inst;
    inst
-------------
 BCM
 BCM/Rice
 MD Anderson
 Rice
 UH
 UTH
 UTMB
(7 rows)

test=# select distinct inst, lastname from people order by inst;
    inst     |   lastname
-------------+---------------
 BCM         | Beck
 BCM         | Chiu
<snip>
 UTH         | Rodin
 UTMB        | Gorenstein
 UTMB        | Luxon
(74 rows)

test=# select distinct on (inst) inst, lastname from people order by inst;
    inst     |  lastname
-------------+------------
 BCM         | Beck
 BCM/Rice    | Ma
 MD Anderson | C. MacLeod
 Rice        | Stewart
 UH          | Fox
 UTH         | Brandt
 UTMB        | Gorenstein
(7 rows)

test=# select distinct on (inst) inst, lastname from people order by inst, lastname;
    inst     |  lastname
-------------+------------
 BCM         | Beck
 BCM/Rice    | Ma
 MD Anderson | Aldaz
 Rice        | Bennett
 UH          | Eick
 UTH         | Boerwinkle
 UTMB        | Gorenstein
(7 rows)

test=# select distinct on (inst) inst, lastname from people order by inst, lastname limit 3;
    inst     | lastname
-------------+----------
 BCM         | Beck
 BCM/Rice    | Ma
 MD Anderson | Aldaz
(3 rows)

test=#

Ross
> Couldn't that still be implemented in the JDBC driver by reading from
> the socket a little at a time? Or would that introduce bigger problems,
> like a catastrophic slowdown of other connections to the backend?

It can't. Consider the following example:

ResultSet l_rset1 = l_stmt1.executeQuery("select a from foo");
while (l_rset1.next()) {
    ResultSet l_rset2 = l_stmt2.executeQuery(
        "select b from bar where a = " + l_rset1.getInt(1));
    while (l_rset2.next()) {
        // do something useful
    }
}

This example shows that you need to be able to use the socket to execute other queries while you are still fetching from the first. Doing what you suggest would break many applications that use JDBC.

thanks,
--Barry
Timo,

This question has been answered before on this list, so you can check the mail archives for a good answer. But in short, the only way to do this today is to either use the LIMIT clause in the SQL or to use the SQL cursor statements such as DECLARE CURSOR, FETCH, CLOSE, etc.

The behavior you are seeing is how the database talks to client applications. When a query is executed, the entire result is sent to the client at once and must be read by the client before any other operations can be performed. There is a todo item on the JDBC todo list to begin implicitly using cursors to work around this limitation, but that won't happen before 7.3.

thanks,
--Barry
> I need to get the first N entries with duplicates removed based on one (or
> two) unique column(s). I can't use distinct, since I also need to select
> other columns that shouldn't be affected by "distinct".

I can think of (at least) two approaches that might suit your need:

1. Get the keys, then get the data

ResultSet aSet1 = aStmt1.executeQuery("SELECT DISTINCT a FROM t WHERE a LIKE " + a);
while (aSet1.next() && !gotEverythingNeeded) {
    ResultSet aSet2 = aStmt2.executeQuery("SELECT * FROM t WHERE a = " + aSet1.getString(1));
    while (aSet2.next() && !gotEverythingNeeded) {
        // ... Get the data
    }
    aSet2.close();
}

2. Get it in small pieces

If the keys aren't unique enough to avoid the second query getting too big, you can use the LIMIT keyword to sorta kinda implement your own cursor:

ResultSet aSet = aStmt.executeQuery("SELECT COUNT(*) FROM t");
aSet.next();
int noOfRows = aSet.getInt(1);
int offset = 0;
int rowsPerFetch = 10;
while (offset < noOfRows && !gotEverythingNeeded) {
    aSet = aStmt.executeQuery(
        "SELECT * " +
        "FROM t " +
        "ORDER BY a,b,c " +
        "LIMIT " + rowsPerFetch + " " +
        "OFFSET " + offset);
    while (aSet.next()) {
        // ... Get the data
    }
    aSet.close();
    offset += rowsPerFetch;
}

Please note I haven't tested the code; it needs to be refined for your specific needs, you need to close result sets, etc. It's meant to give ideas on approaches to the problem.

Yours,
///Jens Carlberg
> Hmm, PostgreSQL has a non-SQL-standard extension, 'distinct on (expr)';
> I think it might do exactly what you want (n.b. I haven't been following
> this whole thread, just saw this comment):
>
> test=# select distinct on (inst) inst, lastname from people order by inst, lastname limit 3;

I need to order by a column other than the unique one (inst), so this approach doesn't really help me. However, I managed to accomplish the right ordering and the usage of limit like this:

select * from people where id in
    (select distinct on (inst) id from people order by inst, age)
order by age limit 10;

This is much slower than my original approach, "select * from people order by age" followed by filtering out all duplicates afterwards. But it should help with memory shortages on the client side.

Timo
> If the keys aren't unique enough to avoid the second query getting too
> big, you can use the LIMIT keyword to sorta kinda implement your own
> cursor:

The contents of the table may change from select to select. I think real cursors work best in this case.

Timo
> Cursors work just fine in this instance, though.

I wrote a version of my DB layer that uses cursors. It seems to be quite fast. A lot faster than simply selecting all of the unused rows, anyway. Thanks.

Timo