Thread: Retrieving ResultSets with millions of rows - OutOfMemoryError

Retrieving ResultSets with millions of rows - OutOfMemoryError

From: Doug Fields
Hello,

I've just come to the startling realization that the 7.2 JDBC code loads
every single row in a ResultSet into a Vector. (core/QueryExecutor.java)

Unfortunately, I run some queries which return millions of rows, each of
which could well be 100-1000 bytes or more.

Hence, I get an OutOfMemoryError.

For some queries, there's an obvious workaround: issue the query with an
appropriate suffix "ORDER BY <something> LIMIT <something> OFFSET
<something>" several times in succession. This will, of course, work mostly
fine assuming it's a simple single-table query with an appropriate index
(such as a primary key).
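
Schematically, that workaround is a loop like the following (the table and
column names are invented for illustration, and conn is an open
java.sql.Connection):

// Paging workaround: re-run the query with a sliding OFFSET.
// "bigtable" and "id" are placeholder names; 10000 is an arbitrary page size.
Statement stmt = conn.createStatement();
int pageSize = 10000;
for (int offset = 0; ; offset += pageSize) {
    ResultSet rs = stmt.executeQuery(
        "SELECT * FROM bigtable ORDER BY id" +
        " LIMIT " + pageSize + " OFFSET " + offset);
    boolean sawRow = false;
    while (rs.next()) {
        sawRow = true;
        // process the current row here
    }
    rs.close();
    if (!sawRow) {
        break;      // past the last page
    }
}
stmt.close();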

However, some of my queries are complex one-to-many joins with no
particular ordering (or no index for quick ordering). That trick would be
much harder to apply to them, and/or would incur extreme database overhead
from running the query hundreds of times (for example, if I were to handle
10,000 rows at a time).

Is there any way to get a ResultSet which is actually live-streamed, one
that keeps no knowledge of previous rows or the row count? It seems utterly
wasteful to me to load millions of rows into a Vector (try an ArrayList
next time, for minor speedups) when all I want to do is stream the results.
I never need the total number of rows ahead of time, nor any rows before
(or after) the "current one" from ResultSet.next().

I can't imagine I'm the first person using JDBC against tables with tens of
millions of rows who needs to access significant portions of them at a time.

Thanks,

Doug


Re: Retrieving ResultSets with millions of rows - OutOfMemoryError

From: Dave Cramer
Doug,

Ya, there is. It's a bit of a hack, but the only way around it is to use
a cursor. Eventually the driver will do this automatically, but for now
you will have to do it manually, like so:

begin;
declare cursor1 CURSOR FOR SELECT ...;
fetch n from cursor1;   -- repeat until it returns no rows
end;
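
From JDBC the same thing looks roughly like this. It's just a sketch,
untested against the 7.2 driver; the query (the made-up "bigtable") and the
fetch size of 1000 are placeholders you'd substitute your own values for:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorStream {
    // Stream a large result through a server-side cursor, 1000 rows at a
    // time, instead of letting the driver buffer the whole result in memory.
    public static void streamRows(Connection conn) throws Exception {
        conn.setAutoCommit(false);  // DECLARE only works inside a transaction
        Statement stmt = conn.createStatement();
        stmt.execute("DECLARE cursor1 CURSOR FOR SELECT * FROM bigtable");
        while (true) {
            ResultSet rs = stmt.executeQuery("FETCH 1000 FROM cursor1");
            boolean sawRow = false;
            while (rs.next()) {
                sawRow = true;
                // process the current row here
            }
            rs.close();
            if (!sawRow) {
                break;              // cursor is exhausted
            }
        }
        stmt.execute("CLOSE cursor1");
        conn.commit();              // ends the transaction, same as END
        stmt.close();
    }
}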

Dave




Re: Retrieving ResultSets with millions of rows - OutOfMemoryError

From: Doug Fields
>Ya, there is, it's a bit of a hack but the only way around it is to use
>a cursor. Eventually the driver will do this automatically, but for now
>you will have to do it manually

That's what I was afraid you were going to say. Thanks for the response.

Question: Do I have to do anything unusual with the JDBC Connection and/or
Statement to put it in a certain mode? I'm not well versed in the
interactions between the JDBC API and database transactions (begin/commit,
etc.). I also use the Protomatter connection pool; what do I have to do
with the connection when I release it back to the pool?

Also, a slightly tangential question: If I'm only doing reads in the
transaction (the cursor stuff), is there any performance difference if I
COMMIT or ROLLBACK?

Thanks,

Doug




Re: Retrieving ResultSets with millions of rows - OutOfMemoryError

From: Dave Cramer
On Fri, 2002-09-13 at 20:45, Doug Fields wrote:
> Question: Do I have to do anything unusual with the JDBC Connection and/or
> Statement to put it in a certain mode? I'm not much on the interactions
> between the JDBC API and the database transactions (begin/commit, etc.). I
> also use the Protomatter connection pool; what do I have to do with the
> connection when I release it back to the pool?
Yes: turn autocommit off, and then, to be safe, turn it back on before
returning it to the pool.
>
> Also, a slightly tangential question: If I'm only doing reads in the
> transaction (the cursor stuff), is there any performance difference if I
> COMMIT or ROLLBACK?
With autocommit off you don't have to do either explicitly; just issue an
end; (which PostgreSQL treats as a synonym for commit;). For a read-only
transaction the choice makes no practical difference anyway.
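
Putting it together with the pool, it looks roughly like this. The pool
calls here (getConnection/releaseConnection) are schematic stand-ins, not
Protomatter's actual API, and "bigtable" is an invented name:

// Acquire, do the cursor work inside one transaction, restore, release.
Connection conn = pool.getConnection();
try {
    conn.setAutoCommit(false);   // the cursor has to live inside a transaction
    Statement stmt = conn.createStatement();
    stmt.execute("DECLARE cursor1 CURSOR FOR SELECT * FROM bigtable");
    // ... FETCH loop as in the earlier sketch ...
    stmt.execute("CLOSE cursor1");
    stmt.execute("END");         // END is a synonym for COMMIT
    stmt.close();
} finally {
    conn.setAutoCommit(true);    // restore the default before the pool reuses it
    pool.releaseConnection(conn);
}

Dave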