large table problem - Mailing list pgsql-general

From: Jason Nerothin
Subject: large table problem
Msg-id: f42b58b90704201036w2d1b66c6m9a923a4ff982b960@mail.gmail.com
Responses: Re: large table problem (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: large table problem (Kris Jurka <books@ejurka.com>)
List: pgsql-general
I'm trying to work my way around a large query problem.

In my system, I've created a number of large materialized views that hold the output of some computationally expensive stored procedures run against other large tables. They are intended to serve as staging tables for the next phase of computation, so that I can call (the presumably efficient):

select * from <my_mv_table>;

and then have at the data on the other side of a Java application server.

These tables range anywhere from 400,000 to >32,000,000 rows, though individual records are only 5 integers wide.

Not too unexpectedly, the app server (EJB3/JPA) is choking on the queries, which are "unnamed native queries" in Java parlance. Work-around attempt 1 was to call the JDBC driver directly, but the cursor doesn't release the memory held by the ResultSet once I've passed over it (OutOfMemoryError), and the documentation suggests that cursor behavior is a little buggy in the current Postgres driver. (The docs suggest implementing a custom stored procedure to provide iteration.) Attempt 2, now underway, is to pass LIMIT and OFFSET values to the query, which Postgres handles quite effectively as long as the OFFSET value is less than the total number of rows in the table. When the value is greater than <num_rows>, the query hangs for minutes.
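
For reference, a stripped-down sketch of what attempt 1 looks like (the connection URL, table name, and fetch size below are placeholders; the driver docs say cursor-based fetching needs autocommit off, a forward-only/read-only result set, and a positive fetch size):

import java.sql.*;

public class CursorFetch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");  // placeholder URL
        // Per the driver docs: autocommit must be off and a fetch size set,
        // otherwise the driver materializes the whole ResultSet in memory.
        conn.setAutoCommit(false);
        PreparedStatement stmt = conn.prepareStatement(
                "SELECT * FROM my_mv_table",
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(10000);  // rows pulled from the server per round trip
        ResultSet rs = stmt.executeQuery();
        while (rs.next()) {
            int a = rs.getInt(1);  // process one row at a time
        }
        rs.close();
        stmt.close();
        conn.commit();
        conn.close();
    }
}

In theory only one batch of rows should be held in memory at a time with those settings, but that is exactly the behavior I'm not seeing reliably.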

So my question is: does Postgres keep any metadata around about un-indexed table sizes? A select count(*) from <my_table> can itself take a minute to process. If I had ready access to that information, I could kluge up my code with something like:

num_rows = getNumRows();
while( offset < num_rows ){
        processData( select( offset, window_size ) );
        offset += window_size;
}
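
If an estimate is good enough, something along these lines could pull the planner's row-count estimate from pg_class.reltuples instead of running count(*). It's only as fresh as the last VACUUM/ANALYZE, so the loop shouldn't trust it as an exact bound (the connection URL, table name, and processData() call are placeholders):

import java.sql.*;

public class WindowedScan {
    // The planner's row-count estimate; only as accurate as the last VACUUM/ANALYZE.
    static long estimateNumRows(Connection conn, String table) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT reltuples::bigint FROM pg_class WHERE relname = ?");
        ps.setString(1, table);
        ResultSet rs = ps.executeQuery();
        long n = rs.next() ? rs.getLong(1) : 0L;
        rs.close();
        ps.close();
        return n;
    }

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password");  // placeholder URL
        long windowSize = 100000L;
        long numRows = estimateNumRows(conn, "my_mv_table");
        for (long offset = 0; offset < numRows; offset += windowSize) {
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT * FROM my_mv_table LIMIT ? OFFSET ?");
            ps.setLong(1, windowSize);
            ps.setLong(2, offset);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // processData(rs);  // hypothetical per-window handler
            }
            rs.close();
            ps.close();
        }
        conn.close();
    }
}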

At the moment the best option I have is to write a stored procedure to populate a table <mv_sizes>, but not only is this a pain in the patoot, it just seems sick and wrong.

Am I missing something painfully obvious?

Jason
--
========================================================
Jason Nerothin
Programmer/Analyst IV - Database Administration
UCLA-DOE Institute for Genomics & Proteomics
Howard Hughes Medical Institute
========================================================
611 C.E. Young Drive East   | Tel: (310) 206-3907
105 Boyer Hall, Box 951570  | Fax: (310) 206-3914
Los Angeles, CA 90095. USA | Mail: jason@mbi.ucla.edu
========================================================
http://www.mbi.ucla.edu/~jason
========================================================
