Thread: Help with retrieving large results sets and memory usage.

From: Peter.Rupp@ual.com
Date:
Hi everybody,
I have several general and specific questions regarding the protocol
that postgresql uses to send large result sets to client processes over
a TCP network connection.

I'm connecting to a 7.3.4 PostgreSQL database running on HP-UX 11i. I'm
connecting over the network in two ways: 1) from Java, using the type-4
JDBC driver built in the PostgreSQL distribution, and 2) from Python,
using the C-based PyGreSQL interface module (pg), also supplied and
built from the 7.3.4 distribution.

With either interface, I notice that when I select rows from a table,
the client interface appears to retrieve the entire result set at once
and load it into memory on the client machine. When I recently had to
retrieve all rows in a table (the data totalled about 110 MB), the
memory usage in the JVM jumped to roughly 330 MB; in Python, it went to
440-450 MB. I observed the memory usage with a fairly sophisticated
performance monitoring tool (HP's glance/perfview).

I understand that all interfaces need to keep a copy of the data around
somewhere, so this is understandable. However, there are times when I
will need to select a large amount of data, where there could be
thousands or millions of rows and the data returned could easily exceed
the memory size of our machine.

As a work-around to this issue, I looked at:

1) Breaking the SQL statement up into multiple calls, attempting to
limit the size of the result set retrieved. However, this proves very
difficult (if not impossible) for several of our tables.

2) Perhaps the PostgreSQL network protocol and/or the 'postmaster'
daemon allow for retrieval of a portion of a result set at a time? In
other words, is there a way for the client JDBC or PyGreSQL module to
bring only a portion of the result set into memory? That is, read say
500 rows (or 50 MB of data) from the network and hand this off to the
client program; when the program needs more, the adaptor reads the next
500 rows, or something to that effect.


3) I'm not sure whether JDBC (or any other DB connection API) allows
for the row/size buffering I describe in #2; perhaps this is an
efficiency left to the implementation. However, I'm proficient enough
in C and Java to add this functionality, if the PostgreSQL engine
itself will allow it. Do you folks publish the network semantics and
protocols used to make SQL requests and retrieve their results? Any
documentation about byte-stream structures, etc. would be really
helpful; I could look at the source code, but if anybody can summarize
this, it would help me a lot.


Thanks very much for your assistance.

Best regards,
==pete==
P. A. Rupp
United Airlines Corp. (WHQKT)
voice: 847-700-3226
email: peter.rupp@ual.com




Re: Help with retrieving large results sets and memory usage.

From: Tom Lane
Date:
Peter.Rupp@ual.com writes:
> When using either interface, I notice that when I select rows from a
> table, it appears that the client interface retrieves  the entire
> results set at once...and loads it into memory on the client machine.   

The standard answer is to use a cursor and FETCH a few rows at a time.
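A minimal sketch of that pattern (`bigrows` and `my_big_table` are
placeholder names; a cursor must be used inside a transaction block):

```sql
BEGIN;
DECLARE bigrows CURSOR FOR SELECT * FROM my_big_table;
FETCH 500 FROM bigrows;   -- repeat until it returns no rows
CLOSE bigrows;
COMMIT;
```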

>     Do you folks publish the network semantics and protocols used to
> make sql requests and retrieve their results?

http://developer.postgresql.org/docs/postgres/protocol.html
        regards, tom lane