Re: Understanding memory usage - Mailing list psycopg

From Daniele Varrazzo
Subject Re: Understanding memory usage
Date
Msg-id CA+mi_8YSo_gp0McvkD6ZURWQg=kyKdfkCTxAj-vXoCn+_jgVXQ@mail.gmail.com
In response to Re: Understanding memory usage  (Damiano Albani <damiano.albani@gmail.com>)
Responses Re: Understanding memory usage
List psycopg
On Wed, Oct 30, 2013 at 7:27 PM, Damiano Albani
<damiano.albani@gmail.com> wrote:
> On Wed, Oct 30, 2013 at 7:27 PM, Daniele Varrazzo
> <daniele.varrazzo@gmail.com> wrote:
>>
>>
>> What do you mean by "freed"? Have you deleted the cursor and made sure the
>> gc reclaimed it? The cursor doesn't destroy its internal data until it is
>> deleted or another query is run (because after fetchall() you can invoke
>> scroll(0) and return the data to Python again). And of course, when the
>> data returned by fetch() is released depends on how the client uses it.
>
>
> By "freed", I mean doing like in the bug report #78:
>
>     del data
>     areadcur.close()
>     acon.close()
>     del areadcur
>     del acon
>

Please provide a complete and repeatable script (not as the reporter of
#78 did: repeatable means don't fetch from a table I can't see; use
generate_series() and repeat() to build a synthetic dataset).

>> After a big query you may see memory usage go down as soon as you
>> execute "select 1", because the result is replaced by a smaller
>> one.
>
>
> That's not the result that I get. Doing a query returning 2 million rows
> followed by a "SELECT 1" has no effect on RSS memory usage in my case.

I've run some tests and I see what you mean: after returning 1M 100-byte
strings *to Python*, the VmRSS doesn't go down from its peak, even after
deleting the cursor and the objects and running gc.collect().
What I also see is:

- If the dataset remains in the cursor (execute() called, but not
fetchall()), the memory shrinks as expected.
- Using 10 strings of 10MB each also shows the expected memory usage.
- Repeating the test with 1M 100-byte strings more than once doesn't
bring the memory usage higher than the first run (see the sketch below).
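
A sketch of those variations, under the same assumptions as the script
above (the run() helper is hypothetical):

    import gc
    import psycopg2

    def run(query, fetch=True):
        # execute a query, optionally return the whole result set to
        # Python, then tear everything down and collect garbage
        conn = psycopg2.connect("dbname=test")
        cur = conn.cursor()
        cur.execute(query)
        if fetch:
            data = cur.fetchall()
            del data
        cur.close()
        conn.close()
        del cur, conn
        gc.collect()

    # 1M rows x 100 bytes, fetched: VmRSS stays at its peak
    run("select repeat('x', 100) from generate_series(1, 1000000)")
    # same dataset left in the cursor (no fetchall): memory shrinks
    run("select repeat('x', 100) from generate_series(1, 1000000)",
        fetch=False)
    # 10 rows x 10MB each: memory is reclaimed as expected
    run("select repeat('x', 10000000) from generate_series(1, 10)")
    # repeating the first test doesn't raise the peak further
    run("select repeat('x', 100) from generate_series(1, 1000000)")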


>> The only "problem" you may attribute to Psycopg is if you find unbounded
>> memory usage. If you run some piece of code in a loop and see memory
>> increasing linearly, you have found a leak. Otherwise you can attribute
>> the artefacts you see to the Python GC.
>
>
> Indeed, there's no memory leak that I can see. But don't you find it
> strange that Python / Psycopg memory management differs between two
> roughly equivalent queries:
>
> a query returning 20 rows × 10 MB each
> a query returning 2 million rows × 100 bytes each
>
> As far as I could test, in my environment, they're clearly not equal in
> terms of side effects.
> For the first, I can reclaim the memory after getting the results. For the
> second, I can't.

I would easily expect a much bigger overhead in building millions of
Python objects compared to building 20: not only the 37 bytes of overhead
each string carries (per sys.getsizeof()), but also the cost for the GC
of managing objects in the millions.
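
For instance, on a 64-bit CPython 2.x build (exact figures vary with
version and platform):

    >>> import sys
    >>> sys.getsizeof('')         # fixed per-string overhead
    37
    >>> sys.getsizeof('x' * 100)  # 100 payload bytes + overhead
    137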

What I *suspect* is that you are seeing the combined effects of the OS
VM management, the C allocator, the Python allocator (which runs on top
of the C one), and the Python GC. I am not expert enough in any of these
areas to give good insight into how everything works together.


-- Daniele

