Thread: ERROR: Out of memory - when connecting to database
Hi,

we have several instances of the following error in the server log:

2010-11-08 18:44:18 CET 5177 1 @ ERROR: out of memory
2010-11-08 18:44:18 CET 5177 2 @ DETAIL: Failed on request of size 16384.

It's always the first log message from the backend. We're trying to trace it down. Whether it's always a connection attempt to the same database or not - I don't know at the moment.

Sometimes the error message is preceded by memory stats, which are included below.

Other relevant data:
Linux, PostgreSQL 8.2.10
RAM 28GB

max_connections = 2048
shared_buffers = 2048MB
temp_buffers = 32MB
max_prepared_transactions = 0
max_fsm_pages = 10000000
max_fsm_relations = 100000

There are approx. 1200 concurrent database connections (active backends). I know it's too many; we're trying to reduce the number, but it's not that easy because of the large number of databases and heavy use of listen/notify, so a connection pooler doesn't help...

What can cause this error? Which parameter should be raised?

Thanks,

Kuba

Messages preceding the ERROR: out of memory message:

TopMemoryContext: 581920 total in 32 blocks; 13760 free (15 chunks); 568160 used
TopTransactionContext: 85077936 total in 20 blocks; 4523352 free (9 chunks); 80554584 used
Analyze Index: 1358288 total in 3 blocks; 7408 free (0 chunks); 1350880 used
ExecutorState: 8192 total in 1 blocks; 4928 free (0 chunks); 3264 used
ExprContext: 13664019952 total in 564677 blocks; 968292944 free (117296 chunks); 12695727008 used
Analyze Column: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Autovacuum context: 57344 total in 3 blocks; 35624 free (13 chunks); 21720 used
CFuncHash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
MbProcContext: 17408 total in 2 blocks; 17184 free (8 chunks); 224 used
smgr relation table: 24576 total in 2 blocks; 5648 free (4 chunks); 18928 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used
CacheMemoryContext: 800552 total in 19 blocks; 162536 free (2 chunks); 638016 used
pg_toast_1152341368_index: 2048 total in 1 blocks; 680 free (0 chunks); 1368 used
index_name1: 2048 total in 1 blocks; 512 free (0 chunks); 1536 used
index_name2: 2048 total in 1 blocks; 528 free (0 chunks); 1520 used
...
Per-database table: 253952 total in 5 blocks; 120064 free (21 chunks); 133888 used
Per-database table: 253952 total in 5 blocks; 138544 free (22 chunks); 115408 used
Per-database table: 253952 total in 5 blocks; 138544 free (22 chunks); 115408 used
Per-database table: 57344 total in 3 blocks; 34352 free (11 chunks); 22992 used
...
Databases hash: 122880 total in 4 blocks; 66912 free (13 chunks); 55968 used
MdSmgr: 8192 total in 1 blocks; 1664 free (0 chunks); 6528 used
LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Rendezvous variable hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PLpgSQL function cache: 8192 total in 1 blocks; 3744 free (0 chunks); 4448 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
Postmaster: 253952 total in 5 blocks; 65400 free (8 chunks); 188552 used
ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used
On Mon, Nov 08, 2010 at 07:19:43PM +0100, Jakub Ouhrabka wrote:
> we have several instances of the following error in the server log:
>
> 2010-11-08 18:44:18 CET 5177 1 @ ERROR: out of memory
> 2010-11-08 18:44:18 CET 5177 2 @ DETAIL: Failed on request of size 16384.
>
> Other relevant data:
> Linux, PostgreSQL 8.2.10
> RAM 28GB
>
> max_connections = 2048
> shared_buffers = 2048MB
> temp_buffers = 32MB
> max_prepared_transactions = 0
> max_fsm_pages = 10000000
> max_fsm_relations = 100000

is it 32bit or 64bit machine?

what's the work_mem?

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On 8.11.2010 19:52, hubert depesz lubaczewski wrote:
> is it 32bit or 64bit machine?

64-bit.

> what's the work_mem?

64MB.

Kuba
Replying to my own mail. Maybe we've found the root cause:

In one database there was a table with 200k records where each record contained a 15kB bytea field. Auto-ANALYZE was running on that table continuously (with statistics target 500). When we avoided the auto-ANALYZE via

UPDATE table SET bytea_column = null;
CLUSTER table;

the problem with ERROR: out of memory went away.

Could it be that the failed connections were issued by autovacuum?

Thanks,

Kuba
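A possibly less invasive workaround than nulling the column would be to lower the statistics target for just that column, so that ANALYZE keeps far fewer of the wide values around. A rough, untested sketch (the table and column names are placeholders for the real objects):

-- problem_table / bytea_column are placeholder names
ALTER TABLE problem_table ALTER COLUMN bytea_column SET STATISTICS 10;
ANALYZE VERBOSE problem_table;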
On Mon, Nov 08, 2010 at 08:04:32PM +0100, Jakub Ouhrabka wrote:
>> is it 32bit or 64bit machine?
>
> 64bit
>
>> what's the work_mem?
>
> 64MB

that's *way* too much with 24GB of ram and > 1k connections. please lower it to 32MB or even less.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
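For a rough sense of scale, a pessimistic worst-case estimate (it assumes every backend runs a sort or hash at the same time, which is not the case here since most backends only wait on NOTIFY):

  1200 backends x 64 MB work_mem        ~ 75 GB of potential demand
  28 GB RAM - 2 GB shared_buffers       ~ 26 GB actually available

and a single complex query can allocate several work_mem-sized areas at once (one per sort or hash node), so the practical ceiling is lower still.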
>>> what's the work_mem?
>>
>> 64MB
>
> that's *way* too much with 24GB of ram and > 1k connections. please
> lower it to 32MB or even less.

Thanks for your reply. You are generally right, but in our case most of the backends are only waiting for notify, so they are not consuming any work_mem. The server is not swapping. On the other hand, some queries profit from it, so we think it's fine as is.

Regards,

Kuba
> Date: Mon, 8 Nov 2010 20:05:23 +0100
> From: kuba@comgate.cz
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] ERROR: Out of memory - when connecting to database
>
> Replaying to my own mail. Maybe we've found the root cause:
>
> In one database there was a table with 200k records where each record
> contained 15kB bytea field. Auto-ANALYZE was running on that table
> continuously (with statistics target 500). When we avoid the
> auto-ANALYZE via UPDATE table set bytea_column = null; CLUSTER table;
> the problem with ERROR: out of memory went away.
Run pgtune against your configuration and it will tell you what is recommended.
Do you really have 2048 connections to the box?
If yes, maybe you need to run pgbouncer as a concentrator, with 2048 client connections into pgbouncer and
100 connections to Postgres? That would reduce the resources used significantly.
Chris
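For reference, a minimal pgbouncer.ini along those lines might look like the sketch below (addresses, paths and pool sizes are illustrative placeholders, not tested values). One caveat: LISTEN/NOTIFY needs a dedicated server connection, so only session pooling is usable, which limits how much a pooler saves in this particular setup, as Kuba already noted.

; all values below are placeholders for illustration only
[databases]
* = host=127.0.0.1 port=5432

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 2048
default_pool_size = 100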
2010/11/8 Jakub Ouhrabka <kuba@comgate.cz>:
> Replying to my own mail. Maybe we've found the root cause:
>
> In one database there was a table with 200k records where each record
> contained a 15kB bytea field. Auto-ANALYZE was running on that table
> continuously (with statistics target 500). When we avoided the auto-ANALYZE
> via UPDATE table SET bytea_column = null; CLUSTER table; the problem with
> ERROR: out of memory went away.
>
> Could it be that the failed connections were issued by autovacuum?

I don't think so. Probably it uses a different plan with different memory requirements. This is a relatively typical situation when statistics are out of date, combined with HASH JOIN or HASH AGG; these two operations can get effectively unlimited memory. Send the plans of your queries for both cases.

Regards

Pavel Stehule
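What Pavel is asking for could be captured along these lines (the query below is a made-up placeholder for the real workload); running it both with fresh and with stale statistics shows whether the planner switches to a memory-hungry Hash Join or HashAggregate:

-- big_table / other_table / col1 are placeholder names
EXPLAIN ANALYZE
SELECT t1.col1, count(*)
FROM   big_table t1
JOIN   other_table t2 ON t2.col1 = t1.col1
GROUP  BY t1.col1;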
Jakub Ouhrabka <kuba@comgate.cz> writes:
> Could it be that the failed connections were issued by autovacuum?

They clearly were: notice the reference to "Autovacuum context" in the memory map. I think you are right to suspect that auto-analyze was getting blown out by the wide bytea columns. Did you have any expression indexes involving those columns?

			regards, tom lane
> They clearly were: notice the reference to "Autovacuum context" in the
> memory map. I think you are right to suspect that auto-analyze was
> getting blown out by the wide bytea columns. Did you have any
> expression indexes involving those columns?

Yes, there are two unique btree indexes:

(col1, col2, col3, md5(array_to_string(col_bytea_arr, ''::text))) where col4 is not null
(col1, col2, col3, md5(array_to_string(col_bytea_arr, ''::text))) where col4 is null

What is the cause of the out of memory then?

Thank you,

Kuba
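Spelled out as full DDL, the two partial unique indexes would look roughly like this (the table and index names are invented placeholders; only the column list and predicates come from the description above):

-- tbl and the index names are placeholders
CREATE UNIQUE INDEX tbl_md5_col4_notnull_idx ON tbl
    (col1, col2, col3, md5(array_to_string(col_bytea_arr, ''::text)))
    WHERE col4 IS NOT NULL;

CREATE UNIQUE INDEX tbl_md5_col4_null_idx ON tbl
    (col1, col2, col3, md5(array_to_string(col_bytea_arr, ''::text)))
    WHERE col4 IS NULL;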
Jakub Ouhrabka <kuba@comgate.cz> writes:
>> They clearly were: notice the reference to "Autovacuum context" in the
>> memory map. I think you are right to suspect that auto-analyze was
>> getting blown out by the wide bytea columns. Did you have any
>> expression indexes involving those columns?
>
> Yes, there are two unique btree indexes:
>
> (col1, col2, col3, md5(array_to_string(col_bytea_arr, ''::text))) where col4 is not null
> (col1, col2, col3, md5(array_to_string(col_bytea_arr, ''::text))) where col4 is null
>
> What is the cause of the out of memory then?

Hmm. I suspect that evaluation of that md5() call is resulting in memory leakage, but not sure why or where exactly.

			regards, tom lane
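One way to confirm this outside of autovacuum might be to trigger the same code path by hand and watch the backend's memory use grow, e.g. in top (the table name is a placeholder; the target mirrors the statistics target 500 mentioned earlier):

-- problem_table is a placeholder for the table with the wide bytea column
SET default_statistics_target = 500;
ANALYZE VERBOSE problem_table;

If the leak is in evaluating the index expression during ANALYZE, that backend's memory should climb steadily, just as in the autovacuum case.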