Thread: ResultSet memory usage

ResultSet memory usage

From: Timo Savola
Hi. I'm new to this list, so please be gentle.

I've been using PostgreSQL with a Java application that needs to SELECT
(without LIMIT) from a table with a lot of rows. I tested the
application when the table in question had over 2 million rows, and the
JVM kept running out of memory (of course I could tune the heap size
settings). I don't actually need all the 2 million rows; the application
logic selects some of the first rows (in a manner that I couldn't
implement in the SQL query) and then stops reading the ResultSet.

I checked the code of the JDBC driver. As far as I can understand
org/postgresql/jdbc2/ResultSet.java, the whole result set is read into
memory at once. Is there any practical way to tune the driver to read
the rows a little at a time? Any plans to enhance the driver to do that?
Would it be a big job to write such a patch without prior knowledge of
the internals of PostgreSQL?

Timo



Re: ResultSet memory usage

From: Dave Cramer
Timo,

Unfortunately this is an artifact of the way the backend works: once the
request is made, the backend is going to give us all the data.

You can mitigate this using cursors:
http://www.postgresql.org/idocs/index.php?sql-declare.html
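As a rough sketch of what the cursor approach could look like from the JDBC side (the table name, cursor name, and batch size here are made up for illustration, and the exact driver behavior should be checked against the documentation):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Drive a server-side cursor by hand: DECLARE it inside a transaction,
// FETCH a batch at a time, and CLOSE it when done. Only a batch of rows
// is ever held in client memory.
public class CursorExample {

    // Build the SQL for each step so the strings can be checked in isolation.
    static String declare(String cursor, String query) {
        return "DECLARE " + cursor + " CURSOR FOR " + query;
    }

    static String fetch(String cursor, int rows) {
        return "FETCH " + rows + " FROM " + cursor;
    }

    static String close(String cursor) {
        return "CLOSE " + cursor;
    }

    // Read rows in batches of `batch` until the cursor is exhausted or the
    // application decides it has seen enough.
    static void readSome(Connection conn, String query, int batch) throws Exception {
        conn.setAutoCommit(false);  // cursors live inside a transaction
        Statement stmt = conn.createStatement();
        stmt.executeUpdate(declare("mycursor", query));
        boolean done = false;
        while (!done) {
            ResultSet rs = stmt.executeQuery(fetch("mycursor", batch));
            int seen = 0;
            while (rs.next()) {
                seen++;
                // ... examine the row; set done = true to stop early
            }
            if (seen < batch) done = true;  // fewer rows than asked: exhausted
            rs.close();
        }
        stmt.executeUpdate(close("mycursor"));
        conn.commit();
    }
}
```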

Dave




Re: ResultSet memory usage

From: Nick Fankhauser
Timo-

A possible workaround: if you only need to grab a few rows, is there some way
to make those rows float to the top using an "order by" and then apply "limit"
so you don't have to deal with the huge ResultSet?

-Nick





Re: ResultSet memory usage

From: Timo Savola
> A possible workaround- If you only need to grab a few rows is there some way
> to make those rows float to the top using an "order by" & then apply "limit"
> so you don't have to deal with the huge ResultSet?

I'm using order by, but the point is that I can only make an educated
guess for the limit parameter. And I can't calculate a "big enough"
value.

I need to get the N first entries with duplicates removed based on one (or
two) unique column(s). I can't use distinct since I also need to select
other columns that shouldn't be affected by "distinct". I've thought
about subselects, etc., but so far the best/cleanest approach I've come
up with is to use a HashSet for the unique column values on the Java
end. The downside is that I need to transfer a lot of unnecessary rows
to the application, and with PostgreSQL that means all rows.
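For illustration, the HashSet filtering described above might look roughly like the following sketch (the row representation and the key column are hypothetical stand-ins for rows read from a real ResultSet):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Client-side duplicate removal: keep the first N rows whose key column
// has not been seen before, then stop reading. Rows are modeled here as
// simple string arrays with the unique column in position 0.
public class FirstNDistinct {

    static List<String[]> firstN(List<String[]> rows, int n) {
        HashSet<String> seen = new HashSet<>();
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            String key = row[0];             // the unique column
            if (seen.add(key)) {             // true only the first time
                kept.add(row);
                if (kept.size() == n) break; // stop reading early
            }
        }
        return kept;
    }
}
```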

Timo



Re: ResultSet memory usage

From: Timo Savola
> Unfortunately this is an artifact of the way the backend works, once the
> request is made the backend is going to give us all the data.

Couldn't that still be implemented in the JDBC driver by reading from
the socket a little at a time? Or would that introduce bigger problems,
like catastrophic slowdown of other connections to the backend?

Timo



Re: ResultSet memory usage

From: Dave Cramer
Hmmm... I'm not exactly sure what postgres would do with a blocked
socket, but yes I suppose it could be.

Cursors work just fine in this instance, though

Dave




Re: ResultSet memory usage

From: Ross J. Reedstrom
On Fri, Jan 11, 2002 at 06:05:40PM +0200, Timo Savola wrote:
> > A possible workaround- If you only need to grab a few rows is there some way
> > to make those rows float to the top using an "order by" & then apply "limit"
> > so you don't have to deal with the huge ResultSet?
>
> I'm using order by, but the point is that I can only make an educated
> guess for the limit parameter. And I can't calculate a "big enough"
> value.
>
> I need to get N first entries with duplicates removed based on one (or
> two) unique column(s). I can't use distinct since I need to select also
> other columns that shouldn't be affected by "distinct". I've thought
> about subselects, etc. but so far the best/cleanest approach I've come
> up with is to use a HashSet for the unique column values on the Java
> end. The down side is that I need to transfer a lot of unnecessary rows
> from to the application, and with PostgreSQL that means all rows.

Hmm, PostgreSQL has a non-SQL-standard extension, 'distinct on (expr)'.
I think it might do exactly what you want (n.b. I haven't been following
this whole thread, just saw this comment):

test=# select distinct inst from people order by inst;
    inst
-------------
 BCM
 BCM/Rice
 MD Anderson
 Rice
 UH
 UTH
 UTMB
(7 rows)

test=# select distinct inst, lastname from people order by inst;
    inst     |   lastname
-------------+---------------
 BCM         | Beck
 BCM         | Chiu
<snip>
 UTH         | Rodin
 UTMB        | Gorenstein
 UTMB        | Luxon
(74 rows)

test=#  select distinct on (inst) inst, lastname from people order by inst;
    inst     |  lastname
-------------+------------
 BCM         | Beck
 BCM/Rice    | Ma
 MD Anderson | C. MacLeod
 Rice        | Stewart
 UH          | Fox
 UTH         | Brandt
 UTMB        | Gorenstein
(7 rows)

test=# select distinct on (inst) inst, lastname from people order by inst, lastname;
    inst     |  lastname
-------------+------------
 BCM         | Beck
 BCM/Rice    | Ma
 MD Anderson | Aldaz
 Rice        | Bennett
 UH          | Eick
 UTH         | Boerwinkle
 UTMB        | Gorenstein
(7 rows)

test=# select distinct on (inst) inst, lastname from people order by inst, lastname limit 3;
    inst     | lastname
-------------+----------
 BCM         | Beck
 BCM/Rice    | Ma
 MD Anderson | Aldaz
(3 rows)

test=#

Ross

Re: ResultSet memory usage

From: Barry Lind
 > Couldn't that still be implemented in the JDBC driver by reading from
 > the socket a little at a time? Or would that introduce bigger problems,
 > like catastrophic slowdown of other connections to the backend?

It can't.  Consider the following example:

ResultSet l_rset1 = l_stmt1.executeQuery("select a from foo");
while (l_rset1.next()) {
    ResultSet l_rset2 = l_stmt2.executeQuery(
        "select b from bar where a = " + l_rset1.getInt(1));
    while (l_rset2.next()) {
        // do something useful
    }
}

This example shows that you need to be able to use the socket to execute
other queries while you are still fetching from the first. Doing what
you suggest would break many applications that use JDBC.

thanks,
--Barry




Re: ResultSet memory usage

From: Barry Lind
Timo,

This question has been answered before on this list so you can check the
mail archives for a good answer.  But in short, the only way to do this
today would be to either use the LIMIT clause in the SQL or to use the
SQL cursor statements such as DECLARE CURSOR, FETCH, CLOSE, etc.

The behavior you are seeing is how the database talks to client
applications. When a query is executed, the entire result is sent to the
client at once and must be read by the client before any other
operations can be performed. There is a todo item on the JDBC todo
list to begin implicitly using CURSORS to work around this limitation,
but that won't happen before 7.3.

thanks,
--Barry




Re: ResultSet memory usage

From: Jens Carlberg
> I need to get N first entries with duplicates removed based on one (or
> two) unique column(s). I can't use distinct since I need to select also
> other columns that shouldn't be affected by "distinct".

I can think of (at least) two approaches that might suit your need:

1. Get the keys, then get the data

ResultSet aSet1 =
  aStmt1.executeQuery("SELECT DISTINCT a FROM t WHERE a LIKE " + a);

while (aSet1.next() && !gotEverythingNeeded) {
  ResultSet aSet2 =
    aStmt2.executeQuery("SELECT * FROM t WHERE a = " + aSet1.getString(1));

  while (aSet2.next() && !gotEverythingNeeded) {
    // ... get the data
  }
}


2. Get it in small pieces

If the keys aren't unique enough to keep the second query from getting too
big, you can use the LIMIT keyword to sorta kinda implement your own
cursor:

ResultSet aSet = aStmt.executeQuery("SELECT COUNT(*) FROM t");
aSet.next();
int noOfRows = aSet.getInt(1);

int offset = 0;
int rowsPerFetch = 10;

while (offset < noOfRows && !gotEverythingNeeded) {
  aSet = aStmt.executeQuery(
    "SELECT * "+
    "FROM t "+
    "ORDER BY a,b,c "+
    "LIMIT "+rowsPerFetch+" "+
    "OFFSET "+offset);

  while (aSet.next()) {
    // ... Get the data
  }

  aSet.close();
  offset += rowsPerFetch;
}


Please note I haven't tested the code; it needs to be refined for your
specific needs (you need to close the result sets, etc.). It's meant to
give ideas on approaches to the problem.

Yours,
///Jens Carlberg

Re: ResultSet memory usage

From: Timo Savola
> Hmm, PostgreSQL has a non-SQL-standard extension: 'distinct on (expr)':
> I think it might do exactly what you want (n.b. I haven't been following
> this whole thread, just say this comment)
>
> test=# select distinct on (inst) inst, lastname from people
>        order by inst, lastname limit 3;

I need to order by a column other than the unique one (inst), so this
approach doesn't really help me. However, I managed to get the
right ordering and the use of limit like this:

    select * from people
    where id in
        (select distinct on (inst) id from people order by inst, age)
    order by age
    limit 10;

This is much slower than my original approach ("select * from people
order by age" and then filtering out all duplicates afterwards), but it
should help with memory shortages on the client side.

Timo



Re: ResultSet memory usage

From: Timo Savola
> If the keys aren't unique enough to keep the second query from getting too
> big, you can use the LIMIT keyword to sorta kinda implement your own
> cursor:

The contents of the table may change from select to select. I think real
cursors work best in this case.

Timo


Re: ResultSet memory usage

From: Timo Savola
> Cursors work just fine in this instance, though

I wrote a version of my DB layer that uses cursors. It seems to be quite
fast. A lot faster than simply selecting all of the unused rows, anyway.
Thanks.

Timo