Re: problem with lost connection while running long PL/R query - Mailing list pgsql-general
From: David M. Kaplan
Subject: Re: problem with lost connection while running long PL/R query
Msg-id: 5195E7CF.3020208@ird.fr
In response to: Re: problem with lost connection while running long PL/R query (Joe Conway <mail@joeconway.com>)
List: pgsql-general
Hi Joe,

Thanks for responding; you are clearly the expert on this sort of problem. My current function already pages through the data with a cursor, precisely to avoid out-of-memory problems, which is why I am surprised and stumped that this can happen at all. It does return all the data at once, but a single call to the function works fine, so I can't see why four calls wouldn't.

I am currently planning some test runs using memory.profile() to see whether each successive call to the PL/R function is somehow accumulating memory usage somewhere. Perhaps I am not properly closing a query or something like that? I am attaching my code in case you have some insight.

To give you a basic idea of what I am trying to do: I have separately developed a classification model for the state of a "system" based on the data in the postgresql table, and I want to apply that model to each line of the table. I loop over the cursor and predict the state for batches of 10,000 lines at a time.

Thanks again for the help.

Cheers,
David

On 05/16/2013 11:40 PM, Joe Conway wrote:
> On 05/16/2013 08:40 AM, Tom Lane wrote:
>> "David M. Kaplan" <david.kaplan@ird.fr> writes:
>>> Thanks for the help. You have definitely identified the problem,
>>> but I am still looking for a solution that works for me. I tried
>>> setting vm.overcommit_memory=2, but this just made the query
>>> crash quicker than before, though without killing the entire
>>> connection to the database. I imagine that this means that I
>>> really am trying to use more memory than the system can handle?
>>> I am wondering if there is a way to tell postgresql to flush a
>>> set of table lines out to disk so that the memory they are using
>>> can be liberated.
>>
>> Assuming you don't have work_mem set to something unreasonably
>> large, it seems likely that the excessive memory consumption is
>> inside your PL/R function, and not the fault of Postgres per se.
>> You might try asking in some R-related forums about how to reduce
>> the code's memory usage.
>
> The two "classic" approaches to this with PL/R are either to create a
> custom aggregate with the PL/R function as the final function (i.e. work
> on one group at a time), or to use the SPI cursor functionality within
> the PL/R function and page your data with a cursor. Not all forms of
> analysis lend themselves to these approaches, but perhaps yours does.
>
> Ultimately I would like to implement a form of R data.frame that does
> the paging with a cursor transparently for you, but I have not been
> able to find the time so far.
>
> Joe
>
> --
> Joe Conway
> credativ LLC: http://www.credativ.us
> Linux, PostgreSQL, and general Open Source
> Training, Service, Consulting, & 24x7 Support

--
**********************************
David M. Kaplan
Charge de Recherche 1

Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95

http://www.umr-eme.org/team/dkaplan/
http://www.amped.ird.fr/
**********************************
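The SPI cursor paging that Joe describes can be sketched in PL/R roughly as follows. This is a minimal illustration only: the function name, the table name `system_data`, and the stand-in model-prediction step are hypothetical placeholders, not the actual code attached to this message.

```sql
-- Minimal sketch: page through a table with the PL/R SPI cursor API,
-- processing 10,000 rows per batch so the full table never has to
-- sit in R memory at once.
CREATE OR REPLACE FUNCTION classify_states() RETURNS integer AS $$
  plan <- pg.spi.prepare("SELECT * FROM system_data")  -- hypothetical table
  curs <- pg.spi.cursor_open("state_curs", plan)
  total <- 0L
  repeat {
    batch <- pg.spi.cursor_fetch(curs, TRUE, as.integer(10000))
    if (is.null(batch) || nrow(batch) == 0) break
    # ... apply the previously fitted classification model to `batch`
    # here (e.g. predict(model, batch)) and store the results ...
    total <- total + nrow(batch)
    rm(batch); gc()  # release the batch before fetching the next one
  }
  pg.spi.cursor_close(curs)
  total
$$ LANGUAGE plr;
```

If memory still grows across successive calls under this pattern, logging gc() or memory.profile() output from inside the loop (for instance via pg.thrownotice()) is one way to see whether allocations are accumulating between batches.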