Re: problem with lost connection while running long PL/R query - Mailing list pgsql-general

From David M. Kaplan
Subject Re: problem with lost connection while running long PL/R query
Msg-id 5195E7CF.3020208@ird.fr
In response to Re: problem with lost connection while running long PL/R query  (Joe Conway <mail@joeconway.com>)
List pgsql-general
Hi Joe,

Thanks for responding, as you would clearly be the expert on this sort of
problem.  My current function does page through the data using a cursor
precisely to avoid out-of-memory problems, which is why I am somewhat
surprised and stumped as to how this can be happening.  It does return
all the data at once, but a single call to the function works, so I
can't see why four calls wouldn't.  I am currently planning to do some
test runs using memory.profile() to see if each successive call to the
PL/R function is somehow accumulating memory somewhere.  Perhaps I am
not properly closing a query or something like that?
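
The kind of check I have in mind is roughly the following (a sketch only;
the batch step is a placeholder for the real prediction code):

```r
# Sketch: snapshot R's memory state around one batch to see whether
# successive calls leave allocations behind. gc() forces a collection
# and returns a matrix of used cells/Mb; reset = TRUE zeroes the
# "max used" statistics so growth per call is visible.
before <- gc(reset = TRUE)
# ... run one batch of predictions here (placeholder) ...
after <- gc()
print(after[, "used"] - before[, "used"])  # growth per call, by node type
```

If the "used" numbers climb steadily across calls, something in the loop
is holding references; if they stay flat, the leak is likely outside R.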

I am attaching my code.  Perhaps you will have some insight.  To give
you a basic idea of what I am trying to do, I have separately developed
a classification model for the state of a "system" based on the data in
the postgresql table.  I want to apply that model to each line of the
table.  I loop over the cursor and predict the state for batches of
10,000 lines at a time.
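
For context, the shape of that loop is roughly the following. This is a
simplified sketch, not the attached code; the table name, the model
object, and the function signature are all placeholders:

```sql
CREATE OR REPLACE FUNCTION predict_states() RETURNS SETOF text AS $$
  # Simplified PL/R sketch: page through the table with an SPI cursor
  # and classify 10,000 rows per fetch. "obs_table" and "model" are
  # placeholders for the real table and the fitted classifier.
  plan <- pg.spi.prepare("SELECT * FROM obs_table")
  curs <- pg.spi.cursor_open("obs_curs", plan)
  out <- character(0)
  repeat {
    batch <- pg.spi.cursor_fetch(curs, TRUE, as.integer(10000))
    if (is.null(batch) || nrow(batch) == 0) break
    out <- c(out, as.character(predict(model, newdata = batch)))
    rm(batch)   # drop the batch before fetching the next one
  }
  pg.spi.cursor_close(curs)
  return(out)
$$ LANGUAGE plr;
```

Note that even with the cursor, the accumulated `out` vector holds the
full result set in memory, which is what I meant by the function
returning all the data at once.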

Thanks again for the help.

Cheers,
David



On 05/16/2013 11:40 PM, Joe Conway wrote:
>
> On 05/16/2013 08:40 AM, Tom Lane wrote:
>> "David M. Kaplan" <david.kaplan@ird.fr> writes:
>>> Thanks for the help.  You have definitely identified the problem,
>>> but I am still looking for a solution that works for me.  I tried
>>> setting vm.overcommit_memory=2, but this just made the query
>>> crash quicker than before, though without killing the entire
>>> connection to the database.  I imagine that this means that I
>>> really am trying to use more memory than the system can handle?
>>> I am wondering if there is a way to tell postgresql to flush a
>>> set of table lines out to disk so that the memory they are using
>>> can be liberated.
>> Assuming you don't have work_mem set to something unreasonably
>> large, it seems likely that the excessive memory consumption is
>> inside your PL/R function, and not the fault of Postgres per se.
>> You might try asking in some R-related forums about how to reduce
>> the code's memory usage.
> The two "classic" approaches to this with PL/R are either create a
> custom aggregate with the PL/R as the final function (i.e. work on one
> group at a time) or use the SPI cursor functionality within the PL/R
> function and page your data using a cursor. Not all forms of analysis
> lend themselves to these approaches, but perhaps yours does.
>
> Ultimately I would like to implement a form of R data.frame that does
> the paging with a cursor transparently for you, but I have not been
> able to find the time so far.
>
> Joe
>
>
> --
> Joe Conway
> credativ LLC: http://www.credativ.us
> Linux, PostgreSQL, and general Open Source
> Training, Service, Consulting, & 24x7 Support

--
**********************************
David M. Kaplan
Charge de Recherche 1

Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95

http://www.umr-eme.org/team/dkaplan/
http://www.amped.ird.fr/
**********************************


