Thread: Huge amount of memory consumed during transaction
Hi,
I'm running into a problem with PostgreSQL 8.2.4 (running on 32-bit Debian Etch/2x dual core C2D/8GB mem). The thing is that I have a huge transaction that does 2 things: 1) delete about 300,000 rows from a table with about 15 million rows, and 2) do some (heavy) calculations and re-insert a little more than 300,000 new rows.
My problem is that this consumes huge amounts of memory. The transaction runs for about 20 minutes, and during that transaction memory usage peaks at about 2GB. The more rows that are involved in this transaction, the higher the peak memory requirements.
Lately we increased our shared_buffers to 1.5GB, and during this transaction we reached the process memory limit, causing an out of memory error and a rollback of the transaction:
BEGIN
DELETE 299980
ERROR: out of memory
DETAIL: Failed on request of size 4194304.
ROLLBACK
DROP SEQUENCE
real 19m45.797s
user 0m0.024s
sys 0m0.000s
On my development machine, which has less than 2GB of memory, I cannot even finish the transaction.
Is there a way to tell PG to start swapping to disk instead of using RAM during such a transaction?
Thanks in advance for any help
henk de wit wrote:
> Hi,
>
> I'm running into a problem with PostgreSQL 8.2.4 (running on 32-bit
> Debian Etch/2x dual core C2D/8GB mem). The thing is that I have a
> huge transaction that does 2 things: 1) delete about 300,000 rows
> from a table with about 15 million rows, and 2) do some (heavy)
> calculations and re-insert a little more than 300,000 new rows.
>
> My problem is that this consumes huge amounts of memory.

What exactly consumes all your memory? I'm assuming it's not just
straight SQL.

--
Richard Huxton
Archonet Ltd
On 10/11/07, henk de wit <henk53602@hotmail.com> wrote:
> Hi,
>
> I'm running into a problem with PostgreSQL 8.2.4 (running on 32-bit Debian
> Etch/2x dual core C2D/8GB mem). The thing is that I have a huge transaction
> that does 2 things: 1) delete about 300,000 rows from a table with about 15
> million rows, and 2) do some (heavy) calculations and re-insert a little
> more than 300,000 new rows.
>
> My problem is that this consumes huge amounts of memory. The transaction
> runs for about 20 minutes and during that transaction memory usage peaks at
> about 2GB. The more rows that are involved in this transaction, the higher
> the peak memory requirements.

How is the memory consumed? How are you measuring it? I assume you
mean the postgres process that is running the query uses the memory.
If so, which tool(s) are you using and what's the output that shows it
being used?

I believe that large transactions with foreign keys are known to cause
this problem.

> Lately we increased our shared_buffers to 1.5GB, and during this transaction
> we reached the process memory limit, causing an out of memory error and a
> rollback of the transaction:

How much memory does this machine have? You do realize that
shared_buffers are not a generic postgresql memory pool, but explicitly
used to hold data from the discs. If you need to sort and materialize
data, that is done with memory allocated from the heap. If you've given
all your memory to shared_buffers, there might not be any left. How
much swap have you got configured?

Lastly, what does explain <your query here> say?
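For reference, a minimal sketch of the distinction drawn above (the values these commands report depend entirely on your configuration):

    SHOW shared_buffers;  -- shared disk-page cache, sized once at server start
    SHOW work_mem;        -- per sort/hash allowance, taken from each backend's own heap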
henk de wit <henk53602@hotmail.com> writes:
> ERROR: out of memory
> DETAIL: Failed on request of size 4194304.

This error should have produced a map of per-context memory use in the
postmaster log. Please show us that.

regards, tom lane
On Oct 11, 2007, at 9:51 AM, Tom Lane wrote:
> henk de wit <henk53602@hotmail.com> writes:
>> ERROR: out of memory
>> DETAIL: Failed on request of size 4194304.
>
> This error should have produced a map of per-context memory use in the
> postmaster log. Please show us that.

Tom, are there any docs anywhere that explain how to interpret those
per-context memory dumps? For example, when I see an autovacuum context
listed, is it safe to assume that the error came from an autovac
operation, etc.?

Erik Jones
Software Developer | Emma
Erik Jones <erik@myemma.com> writes:
> Tom, are there any docs anywhere that explain how to interpret those
> per-context memory dumps?

No, not really. What you have to do is grovel around in the code and
see where contexts with particular names might get created.

> For example, when I see an autovacuum context listed is it safe to
> assume that the error came from an autovac operation, etc.?

Probably, but I haven't looked.

regards, tom lane
> How is the memory consumed? How are you measuring it? I assume you
> mean the postgres process that is running the query uses the memory.
> If so, which tool(s) are you using and what's the output that shows it
> being used?
It's periodically measured and recorded by a script; the relevant parts are:
GET_VSZ="ps aux | grep $REQ_GREP | grep -v grep | grep -v $$ | awk '{print \$5}'
| sort -n | tail -n1";
GET_RSS="ps aux | grep $REQ_GREP | grep -v grep | grep -v $$ | awk '{print \$6}'
| sort -n | tail -n1";
From this I draw graphs using Cacti. I just checked a recent transaction: during this transaction, which involved about 900,000 rows, VSZ peaked at 2.36GB, with RSS then peaking at 2.27GB. This memory usage is on top of shared_buffers, which has been set back to 320MB. Shortly after the transaction finished, memory usage indeed dropped back to a nearly steady 320MB.
(Btw, I mistyped the number of rows involved in the original post; the 2GB memory usage is for 900,000 rows, not 300,000.)
After some more digging, I found out that the immense increase in memory usage started fairly recently (but before the increase of my shared_buffers; that just caused the out of memory error).
E.g. for a transaction with 300,000 rows involved a few weeks back, memory usage stayed at a rather moderate 546MB/408MB (VSZ/RSS, including 320MB for shared_buffers), and for some 800,000 rows memory usage peaked at 'only' 631MB/598MB. When I draw a graph of "rows involved" vs "memory usage" there is a direct relation; apart from a few exceptions, it's clear that the more rows are involved, the more memory is consumed.
I'll have to check what exactly was changed in the PG installation recently, but nevertheless, even with the more moderate memory consumption it is clear that PG eventually runs out of memory as more and more rows are involved.
> I believe that large transactions with foreign keys are known to cause
> this problem.
As far as I can see, there are no (or almost no) foreign keys involved in the transaction I'm having problems with.
> How much memory does this machine have?
It's in the original post: 8GB ;)
> If you've given all your memory to shared_buffers, there might not be
> any left.
I have of course not given all memory to shared_buffers. I tried to apply the rule of thumb of setting it to 1/4 of total memory; to be a little conservative, even a little less than that. 1/4 of 8GB is 2GB, so I started with 1.5GB. All other queries and small transactions run fine (we're talking about thousands upon thousands of queries, hundreds of different ones); it's this huge transaction that occupies so much memory.
> Lastly, what does explain <your query here> say?
I can't really test that easily now and it'll be a huge explain anyway (the query is almost 500 lines :X). I'll try to get one though.
> This error should have produced a map of per-context memory use in the
> postmaster log. Please show us that.
I'm not exactly sure what to look for in the log. I'll do my best though and see what I can come up with.
henk de wit <henk53602@hotmail.com> writes:
> I'm not exactly sure what to look for in the log. I'll do my best
> though and see what I can come up with.

It'll be a bunch of lines like

TopMemoryContext: 49832 total in 6 blocks; 8528 free (6 chunks); 41304 used

immediately in front of the out-of-memory ERROR report.

regards, tom lane
henk de wit <henk53602@hotmail.com> writes:
> I indeed found them in the logs. Here they are:

It looks to me like you have work_mem set optimistically large. This
query seems to be doing *many* large sorts and hashes:

> HashBatchContext: 262144236 total in 42 blocks; 3977832 free (40 chunks); 258166404 used
> TupleSort: 9429016 total in 11 blocks; 1242544 free (16 chunks); 8186472 used
> HashBatchContext: 262144236 total in 42 blocks; 3977832 free (40 chunks); 258166404 used
> TupleSort: 9429016 total in 11 blocks; 674376 free (20 chunks); 8754640 used
> TupleSort: 9429016 total in 11 blocks; 245496 free (9 chunks); 9183520 used
> TupleSort: 17817624 total in 12 blocks; 3007648 free (14 chunks); 14809976 used
> TupleSort: 276878852 total in 44 blocks; 243209288 free (1727136 chunks); 33669564 used
> TupleSort: 37740568 total in 14 blocks; 5139552 free (21 chunks); 32601016 used
> HashBatchContext: 2105428 total in 9 blocks; 271912 free (7 chunks); 1833516 used
> HashBatchContext: 4202580 total in 10 blocks; 927408 free (13 chunks); 3275172 used
> TupleSort: 75489304 total in 18 blocks; 7909776 free (29 chunks); 67579528 used
> TupleSort: 9429016 total in 11 blocks; 155224 free (16 chunks); 9273792 used
> TupleSort: 46129176 total in 15 blocks; 5787984 free (19 chunks); 40341192 used
> TupleSort: 62906392 total in 17 blocks; 8340448 free (16 chunks); 54565944 used
> HashBatchContext: 2105428 total in 9 blocks; 271912 free (7 chunks); 1833516 used
> TupleSort: 134209560 total in 24 blocks; 4506232 free (41 chunks); 129703328 used
> TupleSort: 18866200 total in 12 blocks; 2182552 free (17 chunks); 16683648 used
> HashBatchContext: 2105428 total in 9 blocks; 271912 free (7 chunks); 1833516 used
> HashBatchContext: 4202580 total in 10 blocks; 927408 free (13 chunks); 3275172 used
> TupleSort: 37740568 total in 14 blocks; 1239480 free (21 chunks); 36501088 used
> TupleSort: 4710424 total in 10 blocks; 307496 free (15 chunks); 4402928 used
> TupleSort: 27254808 total in 13 blocks; 6921864 free (17 chunks); 20332944 used
> TupleSort: 134209560 total in 25 blocks; 6873024 free (39 chunks); 127336536 used
> TupleSort: 39837720 total in 15 blocks; 3136080 free (34 chunks); 36701640 used

and you just plain don't have enough memory for that large a multiple
of work_mem.

regards, tom lane
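To put that diagnosis in concrete terms, here is a minimal sketch (hypothetical tables, not the 500-line query from the thread) of how a single query can hold several work_mem allowances at once; every Sort or Hash node that appears in the plan may claim up to work_mem on its own:

    CREATE TEMP TABLE t1 (k int);
    CREATE TEMP TABLE t2 (k int);
    CREATE TEMP TABLE t3 (k int);
    SET work_mem = '32MB';  -- the cap applies per operation, not per query
    EXPLAIN
    SELECT t1.k, count(*)
    FROM t1
    JOIN t2 USING (k)       -- if planned as a hash join: one work_mem allowance
    JOIN t3 USING (k)       -- a second hash join: another allowance
    GROUP BY t1.k           -- a hash aggregate may claim one as well
    ORDER BY count(*);      -- a sort node claims yet another

With work_mem at 256MB, the two dozen sort and hash contexts in the dump above could in the worst case claim several gigabytes between them, well beyond what a 32-bit process can address.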
> It looks to me like you have work_mem set optimistically large. This
> query seems to be doing *many* large sorts and hashes:
I have work_mem set to 256MB. Reading the PG documentation, I now realize that "several sort or hash operations might be running in parallel". So this is most likely the problem, although I don't really understand why memory never seems to increase for any of the other queries (those not executed in a transaction); some of these are at least the size of the query that is giving problems.
Btw, is there some way to determine up front how many sort or hash operations will be running in parallel for a given query?
Regards
On Oct 12, 2007, at 4:09 PM, henk de wit wrote:
> > It looks to me like you have work_mem set optimistically large. This
> > query seems to be doing *many* large sorts and hashes:
>
> I have work_mem set to 256MB. Reading the PG documentation, I now
> realize that "several sort or hash operations might be running in
> parallel". So this is most likely the problem, although I don't really
> understand why memory never seems to increase for any of the other
> queries (those not executed in a transaction); some of these are at
> least the size of the query that is giving problems.

Wow. That's inordinately high. I'd recommend dropping that to 32-43MB.

> Btw, is there some way to determine up front how many sort or hash
> operations will be running in parallel for a given query?

Explain is your friend in that respect.

Erik Jones
> > I have work_mem set to 256MB.
> Wow. That's inordinately high. I'd recommend dropping that to 32-43MB.
Ok, it seems I was totally wrong with the work_mem setting. I'll adjust it to a saner level. Thanks a lot for the advice everyone!
> Explain is your friend in that respect.
It shows all the operators, but it doesn't really say that these will all actually run in parallel, right? Of course, I guess it would give a good idea of what the upper bound is.
regards
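One possible way to make that adjustment for just the big transaction, rather than globally, is a per-transaction override (a sketch; the table and values are hypothetical stand-ins for the real statements). SET LOCAL changes the setting only for the current transaction and reverts automatically at COMMIT or ROLLBACK:

    BEGIN;
    SET LOCAL work_mem = '32MB';                -- applies only inside this transaction
    DELETE FROM big_table WHERE batch_id = 42;  -- hypothetical statement
    -- ... heavy calculations and re-insert ...
    COMMIT;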
On Oct 12, 2007, at 4:48 PM, henk de wit wrote:
> > > I have work_mem set to 256MB.
> >
> > Wow. That's inordinately high. I'd recommend dropping that to 32-43MB.
>
> Ok, it seems I was totally wrong with the work_mem setting. I'll adjust
> it to a saner level. Thanks a lot for the advice everyone!
>
> > Explain is your friend in that respect.
>
> It shows all the operators, but it doesn't really say that these will
> all actually run in parallel, right? Of course, I guess it would give a
> good idea of what the upper bound is.

You can determine what runs in parallel based on the indentation of the
output. Items at the same indentation level under the same "parent"
line will run in parallel.

Erik Jones
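A small illustration of reading that indentation (hypothetical tables; the actual plan text will vary with the data and version):

    CREATE TEMP TABLE a (id int, val int);
    CREATE TEMP TABLE b (id int);
    EXPLAIN SELECT * FROM a JOIN b USING (id) ORDER BY a.val;
    -- Sort
    --   Sort Key: a.val
    --   ->  Hash Join
    --         Hash Cond: (a.id = b.id)
    --         ->  Seq Scan on a
    --         ->  Hash
    --               ->  Seq Scan on b

Here the Seq Scan on a and the Hash built from b sit at the same level under the Hash Join, so both can hold memory at the same time, and the Sort above them adds its own work_mem allowance on top.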
henk de wit wrote:
> > How is the memory consumed? How are you measuring it? I assume you
> > mean the postgres process that is running the query uses the memory.
> > If so, which tool(s) are you using and what's the output that shows it
> > being used?
>
> It's periodically measured and recorded by a script; the relevant parts are:
>
> GET_VSZ="ps aux | grep $REQ_GREP | grep -v grep | grep -v $$ | awk '{print \$5}'
> | sort -n | tail -n1";
> GET_RSS="ps aux | grep $REQ_GREP | grep -v grep | grep -v $$ | awk '{print \$6}'
> | sort -n | tail -n1";

Huh, this seems really ugly, have you tried something like just

$ ps -o cmd:50,vsz,rss -C postmaster
CMD                                                  VSZ   RSS
/pgsql/install/00head/bin/postmaster               51788  3992
postgres: writer process                           51788  1060
postgres: wal writer process                       51788   940
postgres: autovacuum launcher process              51924  1236
postgres: stats collector process                  22256   896

--
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> henk de wit wrote:
>>> How is the memory consumed? How are you measuring it? I assume you
>>> mean the postgres process that is running the query uses the memory.
>>> If so, which tool(s) are you using and what's the output that shows it
>>> being used?
>>
>> It's periodically measured and recorded by a script; the relevant parts are:
>>
>> GET_VSZ="ps aux | grep $REQ_GREP | grep -v grep | grep -v $$ | awk '{print \$5}'
>> | sort -n | tail -n1";
>> GET_RSS="ps aux | grep $REQ_GREP | grep -v grep | grep -v $$ | awk '{print \$6}'
>> | sort -n | tail -n1";

On many variants of Unix, this is going to lead to a totally misleading
number. The first problem is that shared buffers will be counted
multiple times (once for each PG process). The second problem is that,
depending on platform, a specific page of shared memory is counted
against a process only after it first touches that page. This means
that when you run a big seqscan, or anything else that touches a lot of
buffers, the reported size of the process gradually increases from just
its local memory space to its local memory space plus the total size of
the Postgres shared buffer arena. This change in the reported size is
*utterly meaningless* in terms of actual memory consumption.

It's not easy to get useful measurements from "ps" when dealing with
heavy memory sharing. There have been some discussions recently of
alternative tools that let you get a clearer picture; check the PG list
archives.

regards, tom lane