aggregation memory leak and fix - Mailing list pgsql-hackers
From | Erik Riedel |
---|---|
Subject | aggregation memory leak and fix |
Date | |
Msg-id | wqwfdpu00gNtImTPUm@andrew.cmu.edu Whole thread Raw |
Responses |
Re: [HACKERS] aggregation memory leak and fix
Re: [HACKERS] aggregation memory leak and fix Re: [HACKERS] aggregation memory leak and fix Re: [HACKERS] aggregation memory leak and fix |
List | pgsql-hackers |
Platform: Alpha, Digital UNIX 4.0D Software: PostgreSQL 6.4.2 and 6.5 snaphot (11 March 1999) I have a table as follows: Table = lineitem +------------------------+----------------------------------+-------+ | Field | Type | Length| +------------------------+----------------------------------+-------+ | l_orderkey | int4 not null | 4 | | l_partkey | int4 not null | 4 | | l_suppkey | int4 not null | 4 | | l_linenumber | int4 not null | 4 | | l_quantity | float4 not null | 4 | | l_extendedprice | float4 not null | 4 | | l_discount | float4 not null | 4 | | l_tax | float4 not null | 4 | | l_returnflag | char() not null | 1 | | l_linestatus | char() not null | 1 | | l_shipdate | date | 4 | | l_commitdate | date | 4 | | l_receiptdate | date | 4 | | l_shipinstruct | char() not null | 25 | | l_shipmode | char() not null | 10 | | l_comment | char() not null | 44 | +------------------------+----------------------------------+-------+ Index: lineitem_index_ that ends up having on the order of 500,000 rows (about 100 MB on disk). I then run an aggregation query as: -- -- Query 1 -- select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice*(1-l_discount)) as sum_disc_price, sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= ('1998-12-01'::datetime - interval '90 day')::date group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus; when I run this against 6.4.2, the postgres process grows to upwards of 1 GB of memory (at which point something overflows and it dumps core) - I watch it grow through 200 MB, 400 MB, 800 MB, dies somewhere near 1 GB of allocated memory). If I take out a few of the "sum" expressions it gets better, removing sum_disk_price and sum_charge causes it to be only 600 MB and the query actually (eventually) completes. Takes about 10 minutes on my 500 MHz machine with 256 MB core and 4 GB of swap. The problem seems to be the memory allocation mechanism. Looking at a call trace, it is doing some kind of "sub query" plan for each row in the database. That means it does ExecEval and postquel_function and postquel_execute and all their friends for each row in the database. Allocating a couple hundred bytes for each one. The problem is that none of these allocations are freed - they seem to depend on the AllocSet to free them at the end of the transaction. This means it isn't a "true" leak, because the bytes are all freed at the (very) end of the transaction, but it does mean that the process grows to unreasonable size in the meantime. There is no need for this, because the individual expression results are aggregated as it goes along, so the intermediate nodes can be freed. I spent half a day last week chasing down the offending palloc() calls and execution stacks sufficiently that I think I found the right places to put pfree() calls. As a result, I have changes in the files: src/backend/executor/execUtils.c src/backend/executor/nodeResult.c src/backend/executor/nodeAgg.c src/backend/executor/execMain.c patches to these files are attached at the end of this message. These files are based on the 6.5.0 snapshot downloaded from ftp.postgreql.org on 11 March 1999. Apologies for sending patches to a non-released version. If anyone has problems applying the patches, I can send the full files (I wanted to avoid sending a 100K shell archive to the list). If anyone cares about reproducing my exact problem with the above table, I can provide the 100 MB pg_dump file for download as well. Secondary Issue: the reason I did not use the 6.4.2 code to make my changes is because the AllocSet calls in that one were particularly egregious - they only had the skeleton of the allocsets code that exists in the 6.5 snapshots, so they were calling malloc() for all of the 8 and 16 byte allocations that the above query causes. Using the fixed code reduces the maximum memory requirement on the above query to about 210 MB, and reduces the runtime to (an acceptable) 1.5 minutes - a factor of more than 6x improvement on my 256 MB machine. Now the biggest part of the execution time is in the sort before the aggregation (which isn't strictly needed, but that is an optimization for another day). Open Issue: there is still a small "leak" that I couldn't eliminate, I think I chased it down to the constvalue allocated in execQual::ExecTargetList(), but I couldn't figure out where to properly free it. 8 bytes leaked was much better than 750 bytes, so I stopped banging my head on that particular item. Secondary Open Issue: what I did have to do to get down to 210 MB of core was reduce the minimum allocation size in AllocSet to 8 bytes from 16 bytes. That reduces the 8 byte leak above to a true 8 byte, rather than a 16 byte leak. Otherwise, I think the size was 280 MB (still a big improvement on 1000+ MB). I only changed this in my code and I am not including a changed mcxt.c for that. I hope my changes are understandable/reasonable. Enjoy. Erik Riedel Carnegie Mellon University www.cs.cmu.edu/~riedel --------------[aggregation_memory_patch.sh]----------------------- #! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create: # execMain.c.diff # execUtils.c.diff # nodeAgg.c.diff # nodeResult.c.diff # This archive created: Fri Mar 19 15:47:17 1999 export PATH; PATH=/bin:/usr/bin:$PATH if test -f 'execMain.c.diff' thenecho shar: "will not over-write existing file 'execMain.c.diff'" else cat << \SHAR_EOF > 'execMain.c.diff' 583c . 398a . 396a/* XXX - clean up some more from ExecutorStart() - er1p */if (NULL == estate->es_snapshot) { /* nothing to free */}else { if (estate->es_snapshot->xcnt > 0) { pfree(estate->es_snapshot->xip); } pfree(estate->es_snapshot);} if (NULL == estate->es_param_exec_vals) { /* nothing to free */} else { pfree(estate->es_param_exec_vals); estate->es_param_exec_vals= NULL;} . SHAR_EOF fi if test -f 'execUtils.c.diff' thenecho shar: "will not over-write existing file 'execUtils.c.diff'" else cat << \SHAR_EOF > 'execUtils.c.diff' 368a } /* ----------------* ExecFreeExprContext* ----------------*/ void ExecFreeExprContext(CommonState *commonstate) {ExprContext *econtext; /* ---------------- * get expression context. if NULL then this node has * none so we just return. * ----------------*/econtext = commonstate->cs_ExprContext;if (econtext == NULL) return; /* ---------------- * clean up memory used. * ---------------- */pfree(econtext);commonstate->cs_ExprContext = NULL; } /* ----------------* ExecFreeTypeInfo* ----------------*/ void ExecFreeTypeInfo(CommonState *commonstate) {TupleDesc tupDesc; tupDesc = commonstate->cs_ResultTupleSlot->ttc_tupleDescriptor;if (tupDesc == NULL) return; /* ---------------- * clean up memory used. * ---------------- */FreeTupleDesc(tupDesc);commonstate->cs_ResultTupleSlot->ttc_tupleDescriptor= NULL; . 274a . SHAR_EOF fi if test -f 'nodeAgg.c.diff' thenecho shar: "will not over-write existing file 'nodeAgg.c.diff'" else cat << \SHAR_EOF > 'nodeAgg.c.diff' 376a pfree(oldVal); /* XXX - new, let's free the old datum - er1p */ . 374a oldVal = value1[aggno]; /* XXX - save so we can free later - er1p */ . 112aDatum oldVal = (Datum) NULL; /* XXX - so that we can save and free on each iteration - er1p */ . SHAR_EOF fi if test -f 'nodeResult.c.diff' thenecho shar: "will not over-write existing file 'nodeResult.c.diff'" else cat << \SHAR_EOF > 'nodeResult.c.diff' 278apfree(resstate); node->resstate = NULL; /* XXX - new for us - er1p */ . 265aExecFreeExprContext(&resstate->cstate); /* XXX - new for us - er1p */ExecFreeTypeInfo(&resstate->cstate); /* XXX - newfor us - er1p */ . SHAR_EOF fi exit 0 # End of shell archive
pgsql-hackers by date: