Thread: Running out of memory at vacuum

Running out of memory at vacuum

From
Ioana Danes
Date:
Hi all,

I have a production database that sometimes runs out of memory at nightly vacuum.

The application runs typically with around 40 postgres connections but there are times when the connections increase because of some queries going on. The reason is that the operations are slow, the terminals time out and try to reconnect using new connections. Some time ago I started to have problems with too many connections being open so I lowered the limit to 300 connections. It was all good until recently, when even with 180 connections I am running out of memory on vacuum... So the connections increase to 180 and the system still runs properly for another 2 days, but then the nightly vacuum runs out of memory.
The fix is to restart postgres ... If I only close the connections the problem is still there, so I need to restart postgres.
If I don't restart postgres then the system will eventually run out of memory on queries...
Another important thing is that during the vacuum at 1am nothing else is going on on that server, so all the connections are idle.

2013-05-14 06:53:51.449 CST,"postgres","abrazo",8741,"[local]",519233dc.2225,3,"VACUUM",2013-05-14 06:53:48 CST,174/67143,0,ERROR,53200,"out of memory","Failed on request of size 668141820.",,,,,"vacuum;",,,"psql"

OS:

SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2

SUSE-compiled PostgreSQL version:
                                                          version

----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.0.3 on x86_64-suse-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973],
64-bit



Postgres configuration parameters:

max_connections = 300
shared_buffers = 2GB
maintenance_work_mem = 2GB
effective_cache_size = 4GB
work_mem = 2MB
wal_buffers = 1MB


checkpoint_segments = 16
autovacuum = on


wal_level = hot_standby
archive_mode = on
archive_command = 'test -f /cbnDBscripts/tmp/PITR_primarydb_stop_backup || rsync --timeout=30 -atz %p
spsnidb1:/data01/wal_files/%f'
archive_timeout = 60
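
The failed request in the error above is about 637 MB, within the configured 2GB maintenance_work_mem. The effective values can be cross-checked from psql with something like the following (only a sketch; it reads pg_settings and uses just the parameter names listed above):

-- Sketch: confirm the memory-related settings actually in effect on the running server.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_connections', 'shared_buffers', 'maintenance_work_mem',
               'work_mem', 'effective_cache_size');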


free
                  total       used       free     shared    buffers     cached
Mem:       7880512    7825136      55376          0      72376    4537644
-/+ buffers/cache:    3215116    4665396
Swap:      2097148          0    2097148

/etc/sysctl.conf

kernel.shmmax=2684354560
kernel.shmall=2684354560
vm.swappiness=0
vm.overcommit_memory=2
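
With vm.overcommit_memory=2 the kernel enforces a hard commit limit of swap plus RAM times vm.overcommit_ratio; the ratio is not shown above, so the default of 50% is an assumption. A rough sketch of that arithmetic, using the figures from free:

-- Rough commit-limit arithmetic (sketch); assumes the default vm.overcommit_ratio = 50
-- and ignores hugepages. CommitLimit = swap + RAM * ratio/100, values in kB from free above.
SELECT 2097148 + 7880512 * 50 / 100 AS commit_limit_kb;   -- about 6,037,404 kB, i.e. ~5.8 GB

That limit is shared by shared_buffers, every backend, and the failed ~637 MB request from vacuum, which is consistent with the allocation being refused even while free still shows cached memory available.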

postgres log:
TopMemoryContext: 179088 total in 18 blocks; 8608 free (13 chunks); 170480 used
  TopTransactionContext: 24576 total in 2 blocks; 21888 free (43 chunks); 2688 used
  Attopt cache: 57344 total in 3 blocks; 25824 free (5 chunks); 31520 used
  Type information cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
  MessageContext: 8192 total in 1 blocks; 4952 free (1 chunks); 3240 used
  Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  smgr relation table: 57344 total in 3 blocks; 21984 free (10 chunks); 35360 used
  TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
  Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
  PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
    PortalHeapMemory: 2048 total in 1 blocks; 2016 free (3 chunks); 32 used
      Vacuum: 24576 total in 2 blocks; 15392 free (0 chunks); 9184 used
  Relcache by OID: 24576 total in 2 blocks; 6592 free (3 chunks); 17984 used
  CacheMemoryContext: 2390704 total in 22 blocks; 641048 free (4 chunks); 1749656 used
    pk_cbntransaction: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    rel_transactionretailerid_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    rel_transaction_promotion_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    reference_64_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    reference_49_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    reference_46_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    reference_28_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_transaction_date: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
    ix_referenceid_transac: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_onlinetick_cbntrans: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_drawid_transaction: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_cbntransaction_terminalid_transactiondate: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    ix_cbntransaction_purchaseid: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_cardactivity: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_business_type: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_207856_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_bank: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_audit_1: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
    pg_toast_207843_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_itshipment: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
    pk_itpack: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
    pg_toast_25392890_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_drawpatterns: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
    pg_toast_50580_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    rel_draw_gameid_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    reference_29_fk: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_draw: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_62768_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_39412_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    ix_a_gamewinner_2006_7: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2006_6: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2006_5: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2006_4: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2006_2: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2006_1: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_gamewinner_2006: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_39404_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    ix_a_gamewinner_2005_7: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2005_6: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2005_5: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2005_4: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2005_2: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_a_gamewinner_2005_1: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_gamewinner_2005: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_39428_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_cbntransaction_2008: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_cashdrawerbalance_1: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_cashdrawerbalance: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_62658_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_cashdrawer: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_account: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    ix_cashdrawertransaction_cashdrawerid_transdate: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_cashdrawertransaction: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_62672_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    ix_cashdrawerbalanceitem_1: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_62666_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_22502278_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_26623904_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_26623898_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_dbversion: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
    pg_toast_39501_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_matchgameplay_2009: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_39436_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_matchgameplay_2008: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_39396_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_matchgameplay_2007: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_39388_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_matchgameplay_2006: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_39380_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_matchgameplay_2005: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_accountforms: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_validzreporttypes: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_63590_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    ix_validtransactiontype_userdesc: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_validtransactiontype: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_validtransactionstatus: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pk_address: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_62598_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_62590_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_activitycontext: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_25232699_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_25232693_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_26623892_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_26623886_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_cast_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_507675_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_cbntransaction_2013: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_21922437_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_dbcompare_tables: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_507667_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pk_a_cbntransaction_2012: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_499794_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_inherits_relid_seqno_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_inherits_parent_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_11474_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_11469_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_11464_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_2604_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_attrdef_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    pg_toast_11459_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_11454_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_largeobject_loid_pn_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_default_acl_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_2396_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_shdescription_o_c_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_shdepend_reference_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_shdepend_depender_index: 2048 total in 1 blocks; 440 free (0 chunks); 1608 used
    pg_pltemplate_name_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_tablespace_spcname_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_2964_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_depend_reference_index: 2048 total in 1 blocks; 456 free (0 chunks); 1592 used
    pg_depend_depender_index: 2048 total in 1 blocks; 456 free (0 chunks); 1592 used
    pg_toast_2609_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_description_o_c_o_index: 2048 total in 1 blocks; 504 free (0 chunks); 1544 used
    pg_toast_2620_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_trigger_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_trigger_tgconstraint_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_2618_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_rewrite_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_largeobject_metadata_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_amproc_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_amop_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_am_oid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_am_name_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_toast_1262_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_2606_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_constraint_contypid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_constraint_conrelid_index: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
    pg_constraint_conname_nsp_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_11484_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_11479_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_1255_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_toast_2619_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
    pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
    pg_db_role_setting_databaseid_rol_index: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
    pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1448 free (4 chunks); 1624 used
    pg_foreign_data_wrapper_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_class_relname_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_foreign_server_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1448 free (4 chunks); 1624 used
    pg_cast_source_target_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_ts_config_map_index: 3072 total in 2 blocks; 1496 free (4 chunks); 1576 used
    pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_foreign_data_wrapper_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
    pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
    pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_user_mapping_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1496 free (4 chunks); 1576 used
    pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_operator_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_namespace_nspname_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_amop_opr_fam_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1496 free (4 chunks); 1576 used
    pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1496 free (4 chunks); 1576 used
    pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
    pg_proc_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_namespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
    pg_foreign_server_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_user_mapping_user_server_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_tablespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
    pg_database_datname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
    pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
    pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
  MdSmgr: 24576 total in 2 blocks; 9504 free (0 chunks); 15072 used
  LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
  Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
  ErrorContext: 8192 total in 1 blocks; 8160 free (4 chunks); 32 used


Any suggestions are greatly appreciated

Thanks,
ioana



Re: Running out of memory at vacuum

From
Jeff Janes
Date:
On Tuesday, May 14, 2013, Ioana Danes wrote:
Hi all,

I have a production database that sometimes runs out of memory at nightly vacuum.

The application runs typically with around 40 postgres connections but there are times when the connections increase because of some queries going on.

You should consider not allowing that to happen.  If 40 connections is the sweet spot for your hardware, then you can't change reality simply by changing the connection pool setting.
 
The reason is that the operations are slow, the terminals time out and try to reconnect using new connections.


So, "The beatings will continue until morale improves".  You should consider not allowing that to happen, either.  If the original operation is slow, why would trying it again on a new connection (while leaving the original one behind to clog things up) be any better?


 
Some time ago I started to have problems with too many connections being open so I lowered the limit to 300 connections. It was all good until recently, when even with 180 connections I am running out of memory on vacuum... So the connections increase to 180 and the system still runs properly for another 2 days, but then the nightly vacuum runs out of memory.
The fix is to restart postgres ... If I only close the connections the problem is still there, so I need to restart postgres.

How are you closing the connections?

 
If I don't restart postgres then the system will eventually run out of memory on queries...
Another important thing is that during the vacuum at 1am nothing else is going on on that server, so all the connections are idle.

Truly idle, or idle in transaction, or not even that?  If the "abandoned" connections have completed whatever long running thing caused them to be abandoned, then they should no longer exist.  What are the largest processes according to top?
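
On 9.0, pg_stat_activity reports idleness in the current_query column rather than the later state column, so a rough sketch of a query answering this would be:

-- Sketch: break down connections by what they are doing (9.0 uses current_query markers).
SELECT CASE WHEN current_query = '<IDLE>' THEN 'idle'
            WHEN current_query = '<IDLE> in transaction' THEN 'idle in transaction'
            ELSE 'active' END AS status,
       count(*)
FROM pg_stat_activity
GROUP BY 1;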

You seem to have a series of unfortunate events here, but I think you are tackling them from the wrong end.  What are these slow queries that take up a lot of memory, and why can't they be improved?  Given that you have slow queries that take a lot of memory, why does your app respond to this by launching a Denial of Service attack against its own database, and why do you let it do that?

Cheers,

Jeff

Re: Running out of memory at vacuum

From
Ioana Danes
Date:
Hi Jeff,

On Tuesday, May 14, 2013, Ioana Danes wrote:
Hi all,

I have a production database that sometimes runs out of memory at nightly vacuum.

The application runs typically with around 40 postgres connections but there are times when the connections increase because of some queries going on.

You should consider not allowing that to happen.  If 40 connections is the sweet spot for your hardware, then you can't change reality simply by changing the connection pool setting.
 
Yes I agree

The reason is that the operations are slow, the terminals time out and try to reconnect using new connections.


So, "The beatings will continue until morale improves".  You should consider not allowing that to happen, either.  If the original operation is slow, why would trying it again on a new connection (while leaving the original one behind to clog things up) be any better?

Yes, I agree; it is just hard to distinguish between real and false requests for new connections (high sales vs. bad queries).
 
Some time ago I started to have problems with too many connections being open so I lowered the limit to 300 connections. It was all good until recently, when even with 180 connections I am running out of memory on vacuum... So the connections increase to 180 and the system still runs properly for another 2 days, but then the nightly vacuum runs out of memory.
The fix is to restart postgres ... If I only close the connections the problem is still there, so I need to restart postgres.

How are you closing the connections?

I restart the application server. The problem is that max_idle connections was set to 1000 on the JDBC connection pool, so once the spike happened the app would run with 300 connections, 250 or so of them IDLE most of the time. I am fixing that.
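
A server-side alternative to restarting the whole application, only a sketch and assuming the stranded sessions really are idle, would be to terminate them from psql:

-- Sketch: terminate backends that 9.0 reports as idle; procpid is the 9.0-era column name.
SELECT pg_terminate_backend(procpid)
FROM pg_stat_activity
WHERE current_query = '<IDLE>'
  AND procpid <> pg_backend_pid();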
 
If I don't restart postgres then the system will eventually run out of memory on queries...
Another important thing is that during the vacuum at 1am nothing else is going on on that server, so all the connections are idle.

Truly idle, or idle in transaction, or not even that?  If the "abandoned" connections have completed whatever long running thing caused them to be abandoned, then they should no longer exist.  What are the largest processes according to top?
They were IDLE (not IDLE IN TRANSACTION)

You seem to have a series of unfortunate events here, but I think you are tackling them from the wrong end.  What are these slow queries that take up a lot of memory, and why can't they be improved?  Given that you have slow queries that take a lot of memory, why does your app respond to this by launching a Denial of Service attack against its own database, and why do you let it do that?

I never said that I am ignoring the bad queries or the bad processes. Sometimes it just takes time to catch and fix them; that is just the reality...
The problem is not always obvious and easy to reproduce.


Thank you very much for your response,
ioana

Re: Running out of memory at vacuum

From
Jeff Janes
Date:
On Thu, May 16, 2013 at 6:35 AM, Ioana Danes <ioanasoftware@yahoo.ca> wrote:
Hi Jeff,

On Tuesday, May 14, 2013, Ioana Danes wrote:

The fix is to restart postgres ... If I only close the connections the problem is still there, so I need to restart postgres.

How are you closing the connections?

I restart the application server. The problem is that max_idle connections was set to 1000 on the JDBC connection pool, so once the spike happened the app would run with 300 connections, 250 or so of them IDLE most of the time. I am fixing that.


Hi Ioana, thanks for the responses.  Does restarting the app server successfully cause all of those connections to terminate?  If so, and yet you still have memory problems, then there is still the mystery of where the memory is going.

Cheers,

Jeff


Re: Running out of memory at vacuum

From
Ioana Danes
Date:
Hi Jeff,

Yes, stop/start of the application server does close all the connections to the database.
Lately I did restart postgres too every time that happened. It did happen in the past, last year sometime, that I tried just closing the app and it was not enough. I might be mixing up different scenarios though, because I did have another issue where, by mistake, max_connections was set to 1000 and the server ran out of memory for good reason. So it might have happened in that case, not now.

I will keep you updated.

Thank you,
Ioana

From: Jeff Janes <jeff.janes@gmail.com>
To: Ioana Danes <ioanasoftware@yahoo.ca>
Cc: PostgreSQL General <pgsql-general@postgresql.org>
Sent: Thursday, May 16, 2013 9:56:07 AM
Subject: Re: [GENERAL] Running out of memory at vacuum

On Thu, May 16, 2013 at 6:35 AM, Ioana Danes <ioanasoftware@yahoo.ca> wrote:
Hi Jeff,

On Tuesday, May 14, 2013, Ioana Danes wrote:

The fix is to restart postgres ... If I only close the connections the problem is still there, so I need to restart postgres.

How are you closing the connections?

I restart the application server. The problem is that max_idle connections was set to 1000 on the JDBC connection pool, so once the spike happened the app would run with 300 connections, 250 or so of them IDLE most of the time. I am fixing that.


Hi Ioana, thanks for the responses.  Does restarting the app server successfully cause all of those connections to terminate?  If so, and yet you still have memory problems, then there is still the mystery of where the memory is going.

Cheers,

Jeff




Re: Running out of memory at vacuum

From
Tony Dare
Date:
On 05/16/2013 07:13 AM, Ioana Danes wrote:
Hi Jeff,

Yes, stop/start of the application server does close all the connections to the database.
Lately I did restart postgres too every time that happened. It did happen in the past, last year sometime, that I tried just closing the app and it was not enough. I might be mixing up different scenarios though, because I did have another issue where, by mistake, max_connections was set to 1000 and the server ran out of memory for good reason. So it might have happened in that case, not now.

I will keep you updated.

Thank you,
Ioana

From: Jeff Janes <jeff.janes@gmail.com>
To: Ioana Danes <ioanasoftware@yahoo.ca>
Cc: PostgreSQL General <pgsql-general@postgresql.org>
Sent: Thursday, May 16, 2013 9:56:07 AM
Subject: Re: [GENERAL] Running out of memory at vacuum

On Thu, May 16, 2013 at 6:35 AM, Ioana Danes <ioanasoftware@yahoo.ca> wrote:
Hi Jeff,

On Tuesday, May 14, 2013, Ioana Danes wrote:

The fix is to restart postgres ... If I only close the connections the problem is still there, so I need to restart postgres.

How are you closing the connections?

I restart the application server. The problem is that max_idle connections was set to 1000 on the JDBC connection pool, so once the spike happened the app would run with 300 connections, 250 or so of them IDLE most of the time. I am fixing that.


Hi Ioana, thanks for the responses.  Does restarting the app server successfully cause all of those connections to terminate?  If so, and yet you still have memory problems, then there is still the mystery of where the memory is going.

Cheers,

Jeff




From http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

work_mem, maintenance_work_mem

If you do a lot of complex sorts, and have a lot of memory, then increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.

This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves doing merge sorts of 8 tables, that requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory.
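
One practical consequence of that arithmetic is to keep the global work_mem small (as it already is in this thread, at 2MB) and raise it only inside the sessions that actually need a large sort. A hedged sketch, where 50MB is the wiki's illustrative figure rather than a recommendation for this system:

-- Sketch: raise work_mem for one heavy query only; SET LOCAL reverts at COMMIT/ROLLBACK,
-- so the small global setting keeps applying to all the other connections.
BEGIN;
SET LOCAL work_mem = '50MB';
-- ... run the large sort/aggregate query here ...
COMMIT;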

maintenance_work_mem is used for operations like vacuum. Using extremely large values here doesn't help very much, and because you essentially need to reserve that memory for when vacuum kicks in, it takes memory away from more useful purposes. Something in the 256MB range has anecdotally been a reasonably large setting here.
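
Applied to this thread, that points at the 2GB maintenance_work_mem on an 8GB box with strict overcommit as the setting most worth revisiting. A sketch of trying the more modest figure for the nightly job only, without editing postgresql.conf:

-- Sketch: run the nightly vacuum with a smaller maintenance_work_mem for this session only.
SET maintenance_work_mem = '256MB';
VACUUM;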