Thread: Does the psql executable support a "fetch many" approach when dumping large queries to stdout?
Does the psql executable support a "fetch many" approach when dumping large queries to stdout?
From
"Beaver, John E"
Date:
Does the psql executable have any ability to do a "fetch many", using a server-side named cursor, when returning results?It seems like it tries to retrieve the query entirely to local memory before printing to standard out. Specifically, I've tried running the following command on my desktop, which returns about 70 million lines: echo "select [thirteen columns, all floating point and boolean types] from go_prediction_view_mouse gpv order by combined_scoredesc nulls last" | psql -U [username] -h [remote host] [database] > mouse_predictions.tab The command increases in memory usage until about 12GB, when it stops and there is significant disk activity (assume pagingto disk). Closing a few big programs immediately causes it to increase its memory usage accordingly. After about 50minutes, I killed it. If I instead run the Python program below, which simply performs the query using a fetchmany() call to retrieve a few hundredthousand tuples at a time through a named server-side cursor, the program remains under about 20 MB of memory usagethroughout and finishes in about 35 minutes. I know that the query used here could have been a COPY statement, which I assume would be better-behaved, but I'm more concernedabout the case in which the query is more complex. The local (OSX 10.6.2) version of Postgres is 8.4.2, and the server's (Ubuntu 4.x) version of Postgres is 8.3.5. ------ Python source to use a named server-side cursor to dump a large number of rows to a file ------ import psycopg2 import psycopg2.extensions #---Modify this section---# species = 'mouse' query = 'select go_term_ref, gene_remote_id, function_verified_exactly, function_verified_with_parent_go, function_verified_with_child_go,combined_score, obozinski_score, lee_score, mostafavi_score, guan_score, kim_score, joshi_score,tasan_score, tasan_revised_score, qi_score, leone_score from go_prediction_view_' + species + ' gpv order bycombined_score desc nulls last' outputFilePath = [*output file path*] connectionString = [*connection string*] queryBufferSize = 10000 def processRow(row): # not important #---End modify this section----# #---Everything below should be genetic---# conn = psycopg2.connect(connectionString); cur = conn.cursor('temp_dump_cursor') cur.arraysize = queryBufferSize def resultIter(cursor): 'An iterator that uses fetchmany to keep memory usage down' done = False while not done: results = cursor.fetchmany(queryBufferSize) if not results or len(results) == 0: done = True for result in results: yield result with open(outputFilePath, 'w') as file: print 'Starting ' + query + '...' cur.execute(query) i = 0 for row in resultIter(cur): i += 1 row = processRow(row) file.write('\t'.join(row) + '\n') if i % queryBufferSize == 0: print 'Wrote ' + str(i) + ' lines.' print 'finished. Total of ' + str(i) + ' lines.' cur.close() conn.close()
Re: *** PROBABLY SPAM *** Does the psql executable support a "fetch many" approach when dumping large queries to stdout?
From
"Pierre C"
Date:
> Does the psql executable have any ability to do a "fetch many", using a > server-side named cursor, when returning results? It seems like it tries > to retrieve the query entirely to local memory before printing to > standard out. I think it grabs the whole result set to calculate the display column widths. I think there is an option to tweak this but don't remember which, have a look at the psql commands (\?), formatting section.
Re: Does the psql executable support a "fetch many" approach when dumping large queries to stdout?
From
John Beaver
Date:
Ah, you're right. Thanks Hannu, that's a good solution.<br /><br /> Hannu Krosing wrote: <blockquote cite="mid:1270246688.6482.263.camel@hvost"type="cite"><pre wrap="">On Fri, 2010-04-02 at 16:28 -0400, Beaver, John E wrote: ... </pre><blockquote type="cite"><pre wrap="">I know that the query used here could have been a COPY statement, which I assumewouldbe better-behaved, but I'm more concerned about the case in which the query is more complex. </pre></blockquote><prewrap=""> COPY can copy out results of a SELECT query as well. </pre></blockquote><br /><pre class="moz-signature" cols="72">-- John E. Beaver Bioinformatics Developer Harvard Medical School</pre>
Re: *** PROBABLY SPAM *** Does the psql executable support a "fetch many" approach when dumping large queries to stdout?
From
John Beaver
Date:
That makes sense. I'll just use a COPY statement instead like Hannu suggests.<br /><br /> Pierre C wrote: <blockquote cite="mid:op.vajyiejneorkce@immo.peufeu.com"type="cite"><blockquote type="cite"><pre wrap="">Does the psql executable haveany ability to do a "fetch many", using a server-side named cursor, when returning results? It seems like it tries to retrieve the query entirely to local memory before printing to standard out. </pre></blockquote><pre wrap=""> I think it grabs the whole result set to calculate the display column widths. I think there is an option to tweak this but don't remember which, have a look at the psql commands (\?), formatting section. </pre></blockquote><br /><pre class="moz-signature" cols="72">-- John E. Beaver Bioinformatics Developer Harvard Medical School</pre>
Re: Does the psql executable support a "fetch many" approach when dumping large queries to stdout?
From
Hannu Krosing
Date:
On Fri, 2010-04-02 at 16:28 -0400, Beaver, John E wrote: ... > I know that the query used here could have been a COPY statement, which I assume would > be better-behaved, but I'm more concerned about the case in which the query is more complex. COPY can copy out results of a SELECT query as well. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training
Re: *** PROBABLY SPAM *** Does the psql executable support a "fetch many" approach when dumping large queries to stdout?
From
Bruce Momjian
Date:
Pierre C wrote: > > Does the psql executable have any ability to do a "fetch many", using a > > server-side named cursor, when returning results? It seems like it tries > > to retrieve the query entirely to local memory before printing to > > standard out. > > I think it grabs the whole result set to calculate the display column > widths. I think there is an option to tweak this but don't remember which, > have a look at the psql commands (\?), formatting section. See the FETCH_COUNT variable mentioned in the psql manual page. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com