Re: [HACKERS] aggregation memory leak and fix - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [HACKERS] aggregation memory leak and fix
Date
Msg-id 199903192223.RAA06691@candle.pha.pa.us
Whole thread Raw
In response to aggregation memory leak and fix  (Erik Riedel <riedel+@CMU.EDU>)
Responses Re: [HACKERS] aggregation memory leak and fix
List pgsql-hackers
> when I run this against 6.4.2, the postgres process grows to upwards of
> 1 GB of memory (at which point something overflows and it dumps core) -
> I watch it grow through 200 MB, 400 MB, 800 MB, dies somewhere near 1 GB
> of allocated memory).
> 
> If I take out a few of the "sum" expressions it gets better, removing
> sum_disk_price and sum_charge causes it to be only 600 MB and the query
> actually (eventually) completes.  Takes about 10 minutes on my 500 MHz
> machine with 256 MB core and 4 GB of swap.

Wow, that is large.

> The problem seems to be the memory allocation mechanism.  Looking at a
> call trace, it is doing some kind of "sub query" plan for each row in
> the database.  That means it does ExecEval and postquel_function and
> postquel_execute and all their friends for each row in the database. 
> Allocating a couple hundred bytes for each one.

I will admit we really haven't looked at executor issues recently.  We
have had few bug reports in that area, so we normally have just left it
alone.  I was aware of some memory allocation issues with aggregates,
but I thought I fixed them in 6.5.  Obviously not.

> The problem is that none of these allocations are freed - they seem to
> depend on the AllocSet to free them at the end of the transaction.  This
> means it isn't a "true" leak, because the bytes are all freed at the
> (very) end of the transaction, but it does mean that the process grows
> to unreasonable size in the meantime.  There is no need for this,
> because the individual expression results are aggregated as it goes
> along, so the intermediate nodes can be freed.

Yes, but a terrible over-allocation.

> I spent half a day last week chasing down the offending palloc() calls
> and execution stacks sufficiently that I think I found the right places
> to put pfree() calls.
> 
> As a result, I have changes in the files:
> 
> src/backend/executor/execUtils.c
> src/backend/executor/nodeResult.c 
> src/backend/executor/nodeAgg.c 
> src/backend/executor/execMain.c 
> 
> patches to these files are attached at the end of this message.  These
> files are based on the 6.5.0 snapshot downloaded from ftp.postgreql.org
> on 11 March 1999.
> 
> Apologies for sending patches to a non-released version.  If anyone has
> problems applying the patches, I can send the full files (I wanted to
> avoid sending a 100K shell archive to the list).  If anyone cares about
> reproducing my exact problem with the above table, I can provide the 100
> MB pg_dump file for download as well.

No apologies necessary.  Glad to have someone digging into that area of
the code.  We will gladly apply your patches to 6.5.  However, I request
that you send context diffs(diff -c).  Normal diffs are just too
error-prone in application.   Send them, and I will apply them right
away.

> Secondary Issue:  the reason I did not use the 6.4.2 code to make my
> changes is because the AllocSet calls in that one were particularly
> egregious - they only had the skeleton of the allocsets code that exists
> in the 6.5 snapshots, so they were calling malloc() for all of the 8 and
> 16 byte allocations that the above query causes.

Glad you used 6.5.  Makes it easier to merge them into our next release.


> Using the fixed code reduces the maximum memory requirement on the above
> query to about 210 MB, and reduces the runtime to (an acceptable) 1.5
> minutes - a factor of more than 6x improvement on my 256 MB machine.
> 
> Now the biggest part of the execution time is in the sort before the
> aggregation (which isn't strictly needed, but that is an optimization
> for another day).

Not sure why that is there?  Perhaps for GROUP BY processing?

> 
> Open Issue: there is still a small "leak" that I couldn't eliminate, I
> think I chased it down to the constvalue allocated in
> execQual::ExecTargetList(), but I couldn't figure out where to properly
> free it.  8 bytes leaked was much better than 750 bytes, so I stopped
> banging my head on that particular item.

Can you give me the exact line?  Is it the palloc(1)?


> Secondary Open Issue: what I did have to do to get down to 210 MB of
> core was reduce the minimum allocation size in AllocSet to 8 bytes from
> 16 bytes.  That reduces the 8 byte leak above to a true 8 byte, rather
> than a 16 byte leak.  Otherwise, I think the size was 280 MB (still a
> big improvement on 1000+ MB).  I only changed this in my code and I am
> not including a changed mcxt.c for that.

Maybe Jan, our memory optimizer, can discuss the 8 vs. 16 byte issue.



--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


pgsql-hackers by date:

Previous
From: Michael Davis
Date:
Subject: RE: [HACKERS] Associative Operators? (Was: Re: [NOVICE] Out of f rying pan, into fire)
Next
From: Michael Davis
Date:
Subject: Additional requests for version 6.5