Thread: PostgreSQL 14.4 ERROR: out of memory issues

PostgreSQL 14.4 ERROR: out of memory issues

From: Aleš Zelený
Hello,

we are experiencing out-of-memory issues after a PostgreSQL upgrade from 14.3 to 14.4 on CentOS 7.

OS: CentOS 7

No OOM killer messages in the syslog. SWAP is disabled.

-bash-4.2$ sysctl -q vm.overcommit_memory
vm.overcommit_memory = 2

[root@sts-uat-pgsql100 ~]# swapon -vs
[root@sts-uat-pgsql100 ~]#

[root@sts-uat-pgsql100 ~]# sysctl -q vm.overcommit_ratio
vm.overcommit_ratio = 95


-bash-4.2$ uname -r
3.10.0-1160.42.2.el7.x86_64

-bash-4.2$ free -h
              total        used        free      shared  buff/cache   available
Mem:            39G        7,8G        220M         10G         31G         20G
Swap:            0B          0B          0B
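
With vm.overcommit_memory=2 and no swap, the kernel starts refusing allocations once Committed_AS reaches CommitLimit (here roughly 95% of RAM), well before the OOM killer would act, which matches the fork() failures below. A quick way to watch the remaining headroom (just a sketch, the values come straight from /proc/meminfo):

-bash-4.2$ grep -E 'CommitLimit|Committed_AS' /proc/meminfo
-bash-4.2$ sysctl vm.overcommit_memory vm.overcommit_ratio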



postgres=# show shared_buffers ;
 shared_buffers
----------------
 10GB
(1 row)

postgres=# show  work_mem ;
 work_mem
----------
 4MB
(1 row)

postgres=# show maintenance_work_mem ;
 maintenance_work_mem
----------------------
 512MB
(1 row)

postgres=# show max_parallel_workers;
 max_parallel_workers
----------------------
 8
(1 row)

postgres=# show max_parallel_maintenance_workers ;
 max_parallel_maintenance_workers
----------------------------------
 2
(1 row)

postgres=# select count(*) from pg_stat_activity ;
 count
-------
    93
(1 row)

postgres=# show max_connections ;
 max_connections
-----------------
 1000
(1 row)

No changes were made on the application side before the out-of-memory errors appeared.



So far, it has happened three times (during a single week) since the 14.3 -> 14.4 upgrade; before 14.4 we hadn't suffered from such an issue.

Questions:
1) Can we safely downgrade from 14.4 to 14.3 by shutting down the instance and reinstalling the 14.3 PG packages (to prove that the issue disappears)?
2) What is the best way to diagnose what is the root cause?

What we have started to collect:
- vmstat -nwt (per second)
- vmstat -nawt (per second)

~ once a minute:
- df for /dev/shm
- /proc/meminfo
- /proc/<pid>/stat for all PG processes
- /proc/<pid>/cmdline for all PG processes
- /proc/<pid>/status for all PG processes
- /proc/<pid>/smaps for all PG processes
- list from pg_stat_activity
- pmap -x for all PG processes
- ps aux listing
- smem -u --abbreviate
- smem -m --abbreviate
- smem -w -k

Except for the pmap output, all of these metrics were already being collected before the last out-of-memory incident.
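
For completeness, the per-minute collection is essentially a loop like the sketch below (the output directory and the use of pgrep to find the PostgreSQL processes are simplifications, not the exact script we run):

#!/bin/bash
# Sketch: snapshot memory-related data for all PostgreSQL processes once a minute.
OUT=/var/tmp/pg_mem_snapshots; mkdir -p "$OUT"
while true; do
  ts=$(date +%Y%m%dT%H%M%S)
  df /dev/shm        > "$OUT/df_shm.$ts"
  cat /proc/meminfo  > "$OUT/meminfo.$ts"
  ps aux             > "$OUT/ps_aux.$ts"
  for pid in $(pgrep -u postgres); do
    for f in stat cmdline status smaps; do
      cat "/proc/$pid/$f" > "$OUT/$f.$pid.$ts" 2>/dev/null
    done
    pmap -x "$pid" > "$OUT/pmap.$pid.$ts" 2>/dev/null
  done
  psql -X -d postgres -c "SELECT now(), * FROM pg_stat_activity;" > "$OUT/activity.$ts" 2>/dev/null
  sleep 60
done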


Error messages:
----------------------
postgresql-Fri.log:2022-07-01 06:49:53 CEST [4476]: [20-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
postgresql-Fri.log:2022-07-01 06:49:53 CEST [4476]: [21-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
postgresql-Fri.log:2022-07-01 06:50:02 CEST [4476]: [22-1] user=,db=,host=,app= LOG:  could not fork new process for connection: Cannot allocate memory
postgresql-Fri.log:2022-07-01 06:50:02 CEST [4476]: [23-1] user=,db=,host=,app= LOG:  could not fork new process for connection: Cannot allocate memory
postgresql-Fri.log:2022-07-01 06:50:02 CEST [4476]: [24-1] user=,db=,host=,app= LOG:  could not fork new process for connection: Cannot allocate memory
...
2022-07-02 14:48:07 CEST [3930]: [2-1] user=,db=,host=,app= LOG:  automatic vacuum of table "prematch.monitoring.tab_replication_clock": index scans: 0
        pages: 0 removed, 36 remain, 0 skipped due to pins, 34 skipped frozen
        tuples: 0 removed, 203 remain, 201 are dead but not yet removable, oldest xmin: 269822444
        index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
        I/O timings: read: 0.000 ms, write: 0.000 ms
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        buffer usage: 67 hits, 0 misses, 0 dirtied
        WAL usage: 1 records, 0 full page images, 245 bytes
        system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
TopMemoryContext: 279360 total in 10 blocks; 15384 free (9 chunks); 263976 used
  TopTransactionContext: 8192 total in 1 blocks; 7280 free (12 chunks); 912 used
  TOAST to main relid map: 65536 total in 4 blocks; 35168 free (15 chunks); 30368 used
  AV worker: 16384 total in 2 blocks; 10080 free (10 chunks); 6304 used
    Autovacuum Portal: 8192 total in 1 blocks; 7720 free (0 chunks); 472 used
      Vacuum: 8192 total in 1 blocks; 7768 free (0 chunks); 424 used
  Operator class cache: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  smgr relation table: 16384 total in 2 blocks; 4592 free (2 chunks); 11792 used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1448 free (0 chunks); 6744 used
  TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
  Portal hash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  TopPortalContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  Relcache by OID: 16384 total in 2 blocks; 3504 free (2 chunks); 12880 used
  CacheMemoryContext: 524288 total in 7 blocks; 123552 free (4 chunks); 400736 used
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: tab_queue_tmp_pkey
    index info: 2048 total in 2 blocks; 952 free (0 chunks); 1096 used: pk_replicationclock_databasename
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: pg_attrdef_adrelid_adnum_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pg_index_indrelid_index
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: pg_db_role_setting_databaseid_rol_index
    index info: 3072 total in 2 blocks; 1160 free (2 chunks); 1912 used: pg_opclass_am_name_nsp_index
    index info: 2048 total in 2 blocks; 952 free (2 chunks); 1096 used: pg_foreign_data_wrapper_name_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_enum_oid_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_class_relname_nsp_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_foreign_server_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_publication_pubname_index
    index info: 3072 total in 2 blocks; 1128 free (1 chunks); 1944 used: pg_statistic_relid_att_inh_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_cast_source_target_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_language_name_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_transform_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_collation_oid_index
    index info: 3072 total in 2 blocks; 1096 free (2 chunks); 1976 used: pg_amop_fam_strat_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pg_index_indexrelid_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_ts_template_tmplname_index
    index info: 3072 total in 2 blocks; 1160 free (2 chunks); 1912 used: pg_ts_config_map_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pg_opclass_oid_index
    index info: 2048 total in 2 blocks; 952 free (2 chunks); 1096 used: pg_foreign_data_wrapper_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_event_trigger_evtname_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_statistic_ext_name_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_publication_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_ts_dict_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_event_trigger_oid_index
    index info: 3072 total in 2 blocks; 1096 free (2 chunks); 1976 used: pg_conversion_default_index
    index info: 3072 total in 2 blocks; 1096 free (2 chunks); 1976 used: pg_operator_oprname_l_r_n_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_trigger_tgrelid_tgname_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_enum_typid_label_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_ts_config_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_user_mapping_oid_index
    index info: 3072 total in 2 blocks; 1160 free (2 chunks); 1912 used: pg_opfamily_am_name_nsp_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_foreign_table_relid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_type_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_aggregate_fnoid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_constraint_oid_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_rewrite_rel_rulename_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_ts_parser_prsname_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_ts_config_cfgname_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_ts_parser_oid_index
    index info: 2048 total in 2 blocks; 656 free (2 chunks); 1392 used: pg_publication_rel_prrelid_prpubid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_operator_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_namespace_nspname_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_ts_template_oid_index
    index info: 3072 total in 2 blocks; 1160 free (2 chunks); 1912 used: pg_amop_opr_fam_index
    index info: 3072 total in 2 blocks; 1128 free (1 chunks); 1944 used: pg_default_acl_role_nsp_obj_index
    index info: 3072 total in 2 blocks; 1160 free (2 chunks); 1912 used: pg_collation_name_enc_nsp_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_publication_rel_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_range_rngtypid_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_ts_dict_dictname_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_type_typname_nsp_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_opfamily_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_statistic_ext_oid_index
    index info: 2048 total in 2 blocks; 952 free (2 chunks); 1096 used: pg_statistic_ext_data_stxoid_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pg_class_oid_index
    index info: 3072 total in 2 blocks; 1160 free (2 chunks); 1912 used: pg_proc_proname_args_nsp_index
    index info: 2048 total in 2 blocks; 952 free (2 chunks); 1096 used: pg_partitioned_table_partrelid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_range_rngmultitypid_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_transform_type_lang_index
    index info: 2048 total in 2 blocks; 448 free (1 chunks); 1600 used: pg_attribute_relid_attnum_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_proc_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_language_oid_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pg_namespace_oid_index
    index info: 3072 total in 2 blocks; 776 free (1 chunks); 2296 used: pg_amproc_fam_proc_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_foreign_server_name_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_attribute_relid_attnam_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_conversion_oid_index
    index info: 2048 total in 2 blocks; 656 free (2 chunks); 1392 used: pg_user_mapping_user_server_index
    index info: 2048 total in 2 blocks; 656 free (2 chunks); 1392 used: pg_subscription_rel_srrelid_srsubid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_sequence_seqrelid_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_conversion_name_nsp_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pg_authid_oid_index
    index info: 2048 total in 2 blocks; 656 free (2 chunks); 1392 used: pg_auth_members_member_role_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_subscription_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_tablespace_oid_index
    index info: 3072 total in 2 blocks; 1160 free (2 chunks); 1912 used: pg_shseclabel_object_index
    index info: 2048 total in 2 blocks; 952 free (2 chunks); 1096 used: pg_replication_origin_roname_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pg_database_datname_index
    index info: 2048 total in 2 blocks; 688 free (2 chunks); 1360 used: pg_subscription_subname_index
    index info: 2048 total in 2 blocks; 952 free (2 chunks); 1096 used: pg_replication_origin_roiident_index
    index info: 2048 total in 2 blocks; 656 free (2 chunks); 1392 used: pg_auth_members_role_member_index
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pg_database_oid_index
    index info: 1024 total in 1 blocks; 0 free (0 chunks); 1024 used: pg_authid_rolname_index
  WAL record construction: 49768 total in 2 blocks; 6360 free (0 chunks); 43408 used
  PrivateRefCount: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  MdSmgr: 8192 total in 1 blocks; 7808 free (2 chunks); 384 used
  LOCALLOCK hash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  plpgsql_check - profiler context: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
    plpgsql_check function execution statistics: 8192 total in 1 blocks; 2488 free (0 chunks); 5704 used
    plpgsql_check function profiler local chunks: 8192 total in 1 blocks; 2488 free (0 chunks); 5704 used
    plpgsql_check function profiler local cache: 8192 total in 1 blocks; 2488 free (0 chunks); 5704 used
  plpgsql_check function cache: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  Rendezvous variable hash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  PLpgSQL function hash: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  Timezones: 104120 total in 2 blocks; 2616 free (0 chunks); 101504 used
  Postmaster: 24576 total in 2 blocks; 23824 free (18 chunks); 752 used
    ident parser context: 1024 total in 1 blocks; 760 free (0 chunks); 264 used
    hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks); 21136 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"

2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:  background worker "parallel worker" (PID 4303) was terminated by signal 6: Aborted


A day later, the same issue happened again, producing an identical memory context dump and the same out-of-memory error during the automatic vacuum of "prematch.replication.tab_queue_tmp".

And a few days later again:
2022-07-08 14:54:22 CEST [4791]: [81186-1] user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000' terminate called after throwing an instance of 'std::bad_alloc' terminate called after throwing an instance of 'std::bad_alloc' what(): what(): std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1] user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251) was terminated by signal 6: Aborted
2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:  terminating any other active server processes


Thanks Ales Zeleny

Re: PostgreSQL 14.4 ERROR: out of memory issues

From: Rob Sargent

> On Jul 11, 2022, at 2:50 AM, Aleš Zelený <zeleny.ales@gmail.com> wrote:
>
> 
> Hello,
>
> we are experiencing out-of-memory issues after a PostgreSQL upgrade from 14.3 to 14.4 on CentOS 7.
>
> OS: CentOS 7
>
> No OOM killer messages in the syslog. SWAP is disabled.
>
> -bash-4.2$ sysctl -q vm.overcommit_memory
> vm.overcommit_memory = 2
>
> [root@sts-uat-pgsql100 ~]# swapon -vs
> [root@sts-uat-pgsql100 ~]#
>
> [root@sts-uat-pgsql100 ~]# sysctl -q vm.overcommit_ratio
> vm.overcommit_ratio = 95
>
>
> -bash-4.2$ uname -r
> 3.10.0-1160.42.2.el7.x86_64
>
> -bash-4.2$ free -h
>               total        used        free      shared  buff/cache   available
> Mem:            39G        7,8G        220M         10G         31G         20G
> Swap:            0B          0B          0B
>
>
>
> postgres=# show shared_buffers ;
>  shared_buffers
> ----------------
>  10GB
> (1 row)
>
> postgres=# show  work_mem ;
>  work_mem
> ----------
>  4MB
> (1 row)
>
> postgres=# show maintenance_work_mem ;
>  maintenance_work_mem
> ----------------------
>  512MB
> (1 row)
>
> postgres=# show max_parallel_workers;
>  max_parallel_workers
> ----------------------
>  8
> (1 row)
>
> postgres=# show max_parallel_maintenance_workers ;
>  max_parallel_maintenance_workers
> ----------------------------------
>  2
> (1 row)
>
> postgres=# select count(*) from pg_stat_activity ;
>  count
> -------
>     93
> (1 row)
>
> postgres=# show max_connections ;
>  max_connections
> -----------------
>  1000
> (1 row)
>
> No changes on the application side were made before the out-of-memory happened.
>
>
>
> So far, it has happened three times (during a single week) from the 14.3 -> 14.4 upgrade, before 14.4 we haven't
> suffered from such an issue.
>
> Questions:
> 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the instance and reinstalling 14.3 PG packages (to
> prove, that the issue disappear)?

> 2) What is the best way to diagnose what is the root cause?
You can turn on SQL logging to see which query is causing the crash.
Has any new code added anything with
   a.id != b.id
on any large table?




> What we have started to collect:

When you think you see it happening, you can query the
system catalogs (the pg_stat* views) for the currently running queries.
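
For example, something along these lines shows what every non-idle backend is running at that moment (standard columns of the pg_stat_activity view):

psql -X -d postgres -c "
  SELECT pid, usename, application_name, state,
         now() - query_start AS running_for,
         left(query, 120)    AS query
    FROM pg_stat_activity
   WHERE state <> 'idle'
   ORDER BY query_start;"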


> - vmstat -nwt (per second)
> - vmstat -nawt (per second)




Re: PostgreSQL 14.4 ERROR: out of memory issues

From: Pavel Stehule
Hi

It looks like a memory leak

  ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"

2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:  background worker "parallel worker" (PID 4303) was terminated by signal 6: Aborted


But what you see is probably just a side effect. Do you use some extension written in C++? This is a C++ message. This process killed Postgres.
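
One way to check which shared libraries a suspect backend has mapped (libstdc++ showing up there means some extension, or a library it pulls in, is C++ code; the PID below is just an example taken from the log):

grep -o '/[^ ]*\.so[^ ]*' /proc/4791/maps | sort -u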

It can be related to some operation over prematch.replication.tab_queue_tmp because in both cases an autovacuum was started.
 


    hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks); 21136 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"

I am not sure, but it looks like a corrupted table prematch.replication.tab_queue_tmp.


And a few days later again:
2022-07-08 14:54:22 CEST [4791]: [81186-1] user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000' terminate called after throwing an instance of 'std::bad_alloc' terminate called after throwing an instance of 'std::bad_alloc' what(): what(): std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1] user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251) was terminated by signal 6: Aborted
2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:  terminating any other active server processes


Thanks Ales Zeleny

Re: PostgreSQL 14.4 ERROR: out of memory issues

From: Aleš Zelený
Hello,

On Mon, Jul 11, 2022 at 14:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

It looks like a memory leak

As we are collecting some data for the Postgres processes, I've found processes with growing anonymous memory (RssAnon) allocations:

Some examples:
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31522 RssAnon:  742600 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31522 RssAnon:  742600 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31522 RssAnon:  742852 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31522 RssAnon:  743112 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31522 RssAnon:  743112 kB
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31647 RssAnon:  684632 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31647 RssAnon:  684632 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31647 RssAnon:  684676 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31647 RssAnon:  684700 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31647 RssAnon:  684824 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB

Other processes use less memory:
2022-07-08T14:52:08+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:53:02+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:53:54+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 4217 RssAnon:    2532 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 4217 RssAnon:    2532 kB



And Shared memory:
2022-07-08T14:59:12+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:20+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:28+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:37+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:45+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:53+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB

while other processes use fairly little shared memory:

2022-07-08T14:55:25+0200 Name: postmaster Pid: 8521 RssShmem:    1988 kB
2022-07-08T14:55:33+0200 Name: postmaster Pid: 8521 RssShmem:    1988 kB
2022-07-08T14:55:40+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:55:49+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:55:57+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:56:06+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:56:15+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB

Which processes are better candidates for investigation, the ones using more anonymous memory or the ones using more shared memory?
I can do the (tedious) work to get a list of statements that a given PID executed (at least the statements that appear in the minute snapshots).
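
A possible sketch for a first pass, joining the /proc figures with pg_stat_activity on the fly (it assumes psql can connect with enough privileges to see all sessions; RssShmem is mostly shared_buffers pages a backend has touched, so RssAnon is probably the more telling number):

# Top 20 postgres processes by RssAnon, with the statement pg_stat_activity reports for them.
for pid in $(pgrep -u postgres); do
  anon=$(awk '/^RssAnon/ {print $2}' /proc/$pid/status 2>/dev/null)
  [ -n "$anon" ] || continue
  query=$(psql -X -At -d postgres -c \
    "SELECT coalesce(left(query, 80), '') FROM pg_stat_activity WHERE pid = $pid;")
  printf '%10s kB  pid=%-7s %s\n' "$anon" "$pid" "$query"
done | sort -rn | head -20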


  ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"

2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:  background worker "parallel worker" (PID 4303) was terminated by signal 6: Aborted


But what you see is probably just a side effect. Do you use some extension written in C++? This is a C++ message. This process killed Postgres.

These extensions are installed (no changes from 14.3):

                                            List of installed extensions
        Name        | Version |   Schema   |                              Description                              
--------------------+---------+------------+------------------------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 dblink             | 1.2     | public     | connect to other PostgreSQL databases from within a database
 file_fdw           | 1.0     | public     | foreign-data wrapper for flat file access
 hstore             | 1.8     | public     | data type for storing sets of (key, value) pairs
 hypopg             | 1.3.1   | public     | Hypothetical indexes for PostgreSQL
 pageinspect        | 1.9     | public     | inspect the contents of database pages at a low level
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.9     | public     | track planning and execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 plpgsql_check      | 2.1     | public     | extended check for plpgsql functions
 postgres_fdw       | 1.1     | public     | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | public     | functions that manipulate whole tables, including crosstab
(15 rows)


 

It can be related to some operation over prematch.replication.tab_queue_tmp because in both cases an autovacuum was started.

On 14.3 (and also older major versions), there were no such issues. This table is heavily updated, so the autovacuum is frequent here.
 


    hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks); 21136 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"

I am not sure, but it looks like a corrupted table prematch.replication.tab_queue_tmp.

$ /usr/pgsql-14/bin/pg_amcheck -v -d prematch -t replication.tab_queue_tmp --heapallindexed
pg_amcheck: including database "prematch"
pg_amcheck: in database "prematch": using amcheck version "1.3" in schema "public"
pg_amcheck: checking heap table "prematch.replication.tab_queue_tmp"
pg_amcheck: checking btree index "prematch.replication.tab_queue_tmp_pkey"
$ echo $?
0
 
Something else I can check?


Thanks Ales



And a few days later again:
2022-07-08 14:54:22 CEST [4791]: [81186-1] user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000' terminate called after throwing an instance of 'std::bad_alloc' terminate called after throwing an instance of 'std::bad_alloc' what(): what(): std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1] user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251) was terminated by signal 6: Aborted
2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:  terminating any other active server processes


Thanks Ales Zeleny

Re: PostgreSQL 14.4 ERROR: out of memory issues

From: Michael Paquier
On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> So far, it has happened three times (during a single week) from the 14.3 ->
> 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
>
> Questions:
> 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the instance
> and reinstalling 14.3 PG packages (to prove, that the issue disappear)?
> 2) What is the best way to diagnose what is the root cause?

Hmm.  14.4 has nothing in its release notes that would point to a
change in the vacuum or autovacuum's code paths:
https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4

There is nothing specific after a look at the changes as of, and I am
not grabbing anything that would imply a change in memory context
handling either:
`git log --stat REL_14_3..REL_14_4`
`git diff REL_14_3..REL_14_4 -- *.c`

Saying that, you should be able to downgrade safely as there are no
changes in WAL format or such that would break things.  Saying that,
the corruption issue caused by CONCURRENTLY is something you'd still
have to face.

> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of
> memory
> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed
> on request of size 152094068 in memory context "TopTransactionContext".
> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>  automatic vacuum of table "prematch.replication.tab_queue_tmp"

This is the interesting part.  Do you happen to use logical
replication in a custom C++ plugin?

> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
> not fork worker process: Cannot allocate memory
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
>
> DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
> terminate
> called after throwing an instance of 'std::bad_alloc' terminate called
> after throwing an instance of 'std::bad_alloc' what(): what():
> std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
> user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251)
> was terminated by signal 6: Aborted
> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
>  terminating any other active server processes

Looks like something is going wrong in the memory handling of one of
your C++ extensions here.  If you can isolate an issue using a query
without any custom code, that would be a Postgres problem, but I think
that you are missing a trick in it.
--
Michael


Re: PostgreSQL 14.4 ERROR: out of memory issues

From: Aleš Zelený

On Wed, Jul 13, 2022 at 2:20, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> So far, it has happened three times (during a single week) from the 14.3 ->
> 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
>
> Questions:
> 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the instance
> and reinstalling 14.3 PG packages (to prove, that the issue disappear)?
> 2) What is the best way to diagnose what is the root cause?

Hmm.  14.4 has nothing in its release notes that would point to a
change in the vacuum or autovacuum's code paths:
https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4

There is nothing specific after a look at the changes as of, and I am
not grabbing anything that would imply a change in memory context
handling either:
`git log --stat REL_14_3..REL_14_4`
`git diff REL_14_3..REL_14_4 -- *.c`

Saying that, you should be able to downgrade safely as there are no
changes in WAL format or such that would break things.  Saying that,
the corruption issue caused by CONCURRENTLY is something you'd still
have to face.


Thanks, good to know that, we can use it for a test case, since we already hit the CONCURRENTLY bug on 14.3.

> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of
> memory
> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed
> on request of size 152094068 in memory context "TopTransactionContext".
> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>  automatic vacuum of table "prematch.replication.tab_queue_tmp"

This is the interesting part.  Do you happen to use logical
replication in a custom C++ plugin?

We are using logical replication to other instances (pg_output) and decoderbufs
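
For reference, which output plugin each slot on this cluster actually uses can be checked directly in pg_replication_slots, for example:

psql -X -d postgres -c "SELECT slot_name, plugin, slot_type, active, active_pid FROM pg_replication_slots;"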
 
> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
> not fork worker process: Cannot allocate memory
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
>
> DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
> terminate
> called after throwing an instance of 'std::bad_alloc' terminate called
> after throwing an instance of 'std::bad_alloc' what(): what():
> std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
> user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251)
> was terminated by signal 6: Aborted
> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
>  terminating any other active server processes

Looks like something is going wrong in the memory handling of one of
your C++ extensions here.  If you can isolate an issue using a query
without any custom code, that would be a Postgres problem, but I think
that you are missing a trick in it.

Here are extensions installed in the database served by the cluster:
prematch=# \dx
                                            List of installed extensions
        Name        | Version |   Schema   |                              Description                              
--------------------+---------+------------+------------------------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 dblink             | 1.2     | public     | connect to other PostgreSQL databases from within a database
 file_fdw           | 1.0     | public     | foreign-data wrapper for flat file access
 hstore             | 1.8     | public     | data type for storing sets of (key, value) pairs
 hypopg             | 1.3.1   | public     | Hypothetical indexes for PostgreSQL
 pageinspect        | 1.9     | public     | inspect the contents of database pages at a low level
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.9     | public     | track planning and execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 plpgsql_check      | 2.1     | public     | extended check for plpgsql functions
 postgres_fdw       | 1.1     | public     | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | public     | functions that manipulate whole tables, including crosstab
(15 rows)

When we started experiencing these issues, based on Prometheus node-exporter statistics, I've begun collecting /proc/<pid>/status for all PG instance processes in an infinite loop.
From this, I can extract some data. RssAnon memory is what I can see growing, from several hundred kB to 1-2 GB over some period (depending on workload, usually about 24 hours).
The sessions whose RssAnon memory grows belong to PHP client applications; with multiple connections they might consume 10 GB of memory in total. It is the same application as before (I've asked about deployed changes, but none was released close to the upgrade time).
Comparing with Prometheus server data, RssAnon seems to directly follow the node_memory_Active_anon_bytes node-exporter metric. If the "suspected" sessions disconnect, node_memory_Active_anon_bytes drops.
I've checked the node_memory_Active_anon_bytes metric before the upgrade: with the same application workload it was pretty stable on PG 13.8 and later on 14.3, but immediately after the upgrade node_memory_Active_anon_bytes started to grow, and once the commit limit was reached, the fork()/out-of-memory issues appeared.

So for the application user running the PHP code, I've set log_min_duration_statement = 0, but no unexpected statements or arguments (some functions have int array arguments) were found in the logfile.

Checking RssAnon from /proc/<pid>/status, I've found some points where RssAnon memory usage grew very steeply within a minute, but no "suspicious" queries/arguments were found in the instance logfile.
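
The collected snapshots can be post-processed roughly like this to rank backends by RssAnon growth (a sketch; it assumes the status files are named status.<pid>.<timestamp> so that the names sort chronologically):

# Print RssAnon growth per PID over the sampled period, largest first.
cd /var/tmp/pg_mem_snapshots
for pid in $(ls status.*.* 2>/dev/null | cut -d. -f2 | sort -un); do
  first=$(awk '/^RssAnon/ {print $2}' "$(ls status.$pid.* | head -1)")
  last=$(awk '/^RssAnon/ {print $2}'  "$(ls status.$pid.* | tail -1)")
  [ -n "$first" ] && [ -n "$last" ] || continue
  echo "$((last - first)) kB growth  pid=$pid  ($first kB -> $last kB)"
done | sort -rn | head -20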
 
Any hint on how to find the root cause would be appreciated, since so far I've failed to isolate the issue into a reproducible test case.
At least I hope that watching the RssAnon process memory is an appropriate metric; if not, let me know and I'll try to update the monitoring to get to the root cause.

I can imagine a workaround with the client application reconnecting regularly..., but up to 14.3 it worked, so I'd like to fix the issue either on our application side or on the PG side if it is a PG problem.

Kind regards Ales Zeleny

--
Michael

Re: PostgreSQL 14.4 ERROR: out of memory issues

From: Pavel Stehule


On Thu, Jul 14, 2022 at 21:26, Aleš Zelený <zeleny.ales@gmail.com> wrote:

On Wed, Jul 13, 2022 at 2:20, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> So far, it has happened three times (during a single week) from the 14.3 ->
> 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
>
> Questions:
> 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the instance
> and reinstalling 14.3 PG packages (to prove, that the issue disappear)?
> 2) What is the best way to diagnose what is the root cause?

Hmm.  14.4 has nothing in its release notes that would point to a
change in the vacuum or autovacuum's code paths:
https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4

There is nothing specific after a look at the changes as of, and I am
not grabbing anything that would imply a change in memory context
handling either:
`git log --stat REL_14_3..REL_14_4`
`git diff REL_14_3..REL_14_4 -- *.c`

Saying that, you should be able to downgrade safely as there are no
changes in WAL format or such that would break things.  Saying that,
the corruption issue caused by CONCURRENTLY is something you'd still
have to face.


Thanks, good to know that, we can use it for a test case, since we already hit the CONCURRENTLY bug on 14.3.

> 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of
> memory
> 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed
> on request of size 152094068 in memory context "TopTransactionContext".
> 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>  automatic vacuum of table "prematch.replication.tab_queue_tmp"

This is the interesting part.  Do you happen to use logical
replication in a custom C++ plugin?

We are using logical replication to other instances (pg_output) and decoderbufs
 
> 2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could
> not fork worker process: Cannot allocate memory
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
>
> DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000'
> terminate
> called after throwing an instance of 'std::bad_alloc' terminate called
> after throwing an instance of 'std::bad_alloc' what(): what():
> std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1]
> user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251)
> was terminated by signal 6: Aborted
> 2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:
>  terminating any other active server processes

Looks like something is going wrong in the memory handling of one of
your C++ extensions here.  If you can isolate an issue using a query
without any custom code, that would be a Postgres problem, but I think
that you are missing a trick in it.

Here are extensions installed in the database served by the cluster:
prematch=# \dx
                                            List of installed extensions
        Name        | Version |   Schema   |                              Description                              
--------------------+---------+------------+------------------------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 dblink             | 1.2     | public     | connect to other PostgreSQL databases from within a database
 file_fdw           | 1.0     | public     | foreign-data wrapper for flat file access
 hstore             | 1.8     | public     | data type for storing sets of (key, value) pairs
 hypopg             | 1.3.1   | public     | Hypothetical indexes for PostgreSQL
 pageinspect        | 1.9     | public     | inspect the contents of database pages at a low level
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.9     | public     | track planning and execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 plpgsql_check      | 2.1     | public     | extended check for plpgsql functions
 postgres_fdw       | 1.1     | public     | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | public     | functions that manipulate whole tables, including crosstab
(15 rows)

When we started experiencing these issues, based on Prometheus node-exporter statistics, I've begun collecting /proc/<pid>/status for all PG instance processes in an infinite loop.
From that data it looks like RssAnon is the memory that keeps growing, from several hundred kB to 1-2 GB over some period (depending on workload, usually about 24 hours).
The sessions whose RssAnon memory grows belong to PHP client applications; with multiple connections they can consume about 10 GB of memory in total. It is the same application as before (I've asked about deployed changes, but none was released close to the upgrade time).
Comparing with Prometheus server data, RssAnon seems to directly follow the node_memory_Active_anon_bytes node-exporter metric. If the "suspected" sessions disconnect, node_memory_Active_anon_bytes drops.
I've checked the node_memory_Active_anon_bytes metric before the upgrade, and with the same application workload it was pretty stable on PG 13.8 and later on 14.3, but immediately after the upgrade node_memory_Active_anon_bytes started to grow, and once the commit limit was reached, the fork()/out-of-memory issues appeared.
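Since the failures show up as fork() being unable to allocate memory once the commit limit is reached, the kernel side can be cross-checked directly from /proc/meminfo (with vm.overcommit_memory = 2 the kernel starts rejecting allocations once Committed_AS gets close to CommitLimit):

    grep -E 'CommitLimit|Committed_AS' /proc/meminfo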

So for the application user running the PHP code, I've set log_min_duration_statement = 0, but no unexpected statements or arguments (some functions take an int array argument) were found in the logfile.
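Scoping that setting to the application role rather than to the whole instance is typically done with ALTER ROLE; a sketch, assuming the app_feed application role that appears in the process listings later in the thread:

    psql -c "ALTER ROLE app_feed SET log_min_duration_statement = 0;"
    # new sessions of that role then log every statement; revert with
    # psql -c "ALTER ROLE app_feed RESET log_min_duration_statement;"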

Checking RssAnon from /proc/<pid>/status, I've found some points where RssAnon memory usage grew very steeply within a minute, but no "suspicious" queries/arguments were found in the instance logfile.
 
Any hint on how to find the root cause would be appreciated, since so far I've failed to isolate the issue into a reproducible test case.
At least I hope that watching the RssAnon process memory is an appropriate metric; if not, let me know and I'll try to update the monitoring to get to the root cause.

I can imagine a workaround with regular client application reconnects..., but up to 14.3 it worked, so I'd like to fix the issue either on our application side or on the PG side if it is a PG problem.

Theoretically, there could be a memory leak in libpq; there were some changes there.

Regards

Pavel


Kind regards Ales Zeleny

--
Michael

Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Tomas Vondra
Date:

On 7/14/22 21:25, Aleš Zelený wrote:
> 
> On Wed, Jul 13, 2022, at 2:20, Michael Paquier <michael@paquier.xyz> wrote:
> 
>     On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
>     > So far, it has happened three times (during a single week) from
>     the 14.3 ->
>     > 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
>     >
>     > Questions:
>     > 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the
>     instance
>     > and reinstalling 14.3 PG packages (to prove, that the issue
>     disappear)?
>     > 2) What is the best way to diagnose what is the root cause?
> 
>     Hmm.  14.4 has nothing in its release notes that would point to a
>     change in the vacuum or autovacuum's code paths:
>     https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
> 
>     There is nothing specific after a look at the changes as of, and I am
>     not grabbing anything that would imply a change in memory context
>     handling either:
>     `git log --stat REL_14_3..REL_14_4`
>     `git diff REL_14_3..REL_14_4 -- *.c`
> 
>     Saying that, you should be able to downgrade safely as there are no
>     changes in WAL format or such that would break things.  Saying that,
>     the corruption issue caused by CONCURRENTLY is something you'd still
>     have to face.
> 
> 
> Thanks, good to know that, we can use it for a test case, since we
> already hit the CONCURRENTLY bug on 14.3.
> 
>     > 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app=
>     ERROR:  out of
>     > memory
>     > 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app=
>     DETAIL:  Failed
>     > on request of size 152094068 in memory context
>     "TopTransactionContext".
>     > 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
>     >  automatic vacuum of table "prematch.replication.tab_queue_tmp"
> 
>     This is the interesting part.  Do you happen to use logical
>     replication in a custom C++ plugin?
> 
> 
> We are using logical replication to other instances (pgoutput) and
> decoderbufs https://github.com/debezium/postgres-decoderbufs for other
> applications.
>  

This is probably just a red herring - std::bad_alloc is what the process
that runs into the overcommit limit gets. But the real issue (e.g.
memory leak) is likely somewhere else - different part of the code,
different process ...

> ...
> 
> Checking the RssAnon from proc/pid/status I've found some points where
> RssAnon memory usage grew very steep for a minute, but no "suspicious"
> queries/arguments were found in the instance logfile.
>  
> Any hint, on how to get the root cause would be appreciated since so far
> I've failed to isolate the issue reproducible testcase.
> At least I hope that looking for the RssAnon process memory is an
> appropriate metric, if not, let me know and I'll try to update the
> monitoring to get the root cause.
> 
> I can imagine a workaround with client application regular reconnect...,
> but u to 14.3 it works, so I'd like to fix the issue either on our
> application side or at PG side if it is a PG problem.
> 

I think it'd be interesting to get memory context stats from the
processes consuming a lot of memory. If you know which processes are
suspect (and it seems you know, because if a reconnect helps it's the
backend handling the connection), you can attach a debugger and do

   $ gdb -p $PID
   call MemoryContextStats(TopMemoryContext)

which will log info about memory contexts, just like autovacuum.
Hopefully that tells us which memory context is bloated, and that might
point us to a particular part of the code.
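A non-interactive variant of the same call, plus a debugger-free alternative that 14.x already provides, might look like this (a sketch; it assumes gdb and the postgres debug symbols are installed and that $PID is the suspect backend):

   # attach, print the context stats (they go to the backend's stderr,
   # i.e. normally into the server log) and detach again
   gdb -batch -p $PID -ex 'call MemoryContextStats(TopMemoryContext)'

   # since PostgreSQL 14 a superuser can request the same dump via SQL
   psql -c "SELECT pg_log_backend_memory_contexts($PID);"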


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Aleš Zelený
Date:


On Thu, Jul 14, 2022, at 23:11, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:


...

I think it'd be interesting to get memory context stats from the
processes consuming a lot of memory. If you know which processes are
suspect (and it seems you know, because if a reconnect helps it's the
backend handling the connection), you can attach a debugger and do

   $ gdb -p $PID
   call MemoryContextStats(TopMemoryContext)

which will log info about memory contexts, just like autovacuum.
Hopefully that tells us which memory context is bloated, and that might
point us to a particular part of the code.

If the RssAnon memory is a good indicator, I can then determine the backends and dump their memory contexts.
It'll take me some time since I'm out of the office on vacation, but I'll manage it somehow.

Thanks for all the hints!

Aleš 


Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Aleš Zelený
Date:
Hello,




On Thu, Jul 14, 2022, at 23:31, Aleš Zelený <zeleny.ales@gmail.com> wrote:


On Thu, Jul 14, 2022, at 23:11, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:


...

I think it'd be interesting to get memory context stats from the
processes consuming a lot of memory. If you know which processes are
suspect (and it seems you know, because if a reconnect helps it's the
backend handling the connection), you can attach a debugger and do

   $ gdb -p $PID
   call MemoryContextStats(TopMemoryContext)

which will log info about memory contexts, just like autovacuum.
Hopefully that tells us which memory context is bloated, and that might
point us to a particular part of the code.

After some time, I've found a process consuming over 1 GB of memory:
-bash-4.2$ grep RssAnon /proc/*/status | sort -nk2 | tail
/proc/12156/status:RssAnon:  902984 kB
/proc/8303/status:RssAnon:  912848 kB
/proc/31187/status:RssAnon:  920424 kB
/proc/16469/status:RssAnon:  922864 kB
/proc/13769/status:RssAnon:  925824 kB
/proc/15449/status:RssAnon:  932452 kB
/proc/26144/status:RssAnon:  936892 kB
/proc/22570/status:RssAnon:  937920 kB
/proc/2910/status:RssAnon: 1039124 kB
/proc/17048/status:RssAnon: 1053952 kB

Here are memory contexts for PID 17048:

TopMemoryContext: 422592 total in 14 blocks; 42536 free (169 chunks); 380056 used
  pgstat TabStatusArray lookup hash table: 16384 total in 2 blocks; 7520 free (3 chunks); 8864 used
  PL/pgSQL function: 16384 total in 2 blocks; 5832 free (1 chunks); 10552 used: feeds_api.update_match(bigint,date,time without time zone)
  PL/pgSQL function: 16384 total in 2 blocks; 2080 free (1 chunks); 14304 used: opportunities.tr_taboddstopchangeshistory_oddstopchangestime()
  PL/pgSQL function: 16384 total in 2 blocks; 8576 free (1 chunks); 7808 used: opportunities.tr_tabodds_risklimits()
  PL/pgSQL function: 16384 total in 2 blocks; 2072 free (1 chunks); 14312 used: opportunities.tr_taboddshistory_oddshistorytime()
  PL/pgSQL function: 32768 total in 3 blocks; 6488 free (1 chunks); 26280 used: opportunities.fix_last_minute_odds(bigint,bigint)
  PL/pgSQL function: 65536 total in 4 blocks; 26624 free (1 chunks); 38912 used: opportunities.create_odds_tip_longname_lang(bigint,integer,integer,integer)
  PL/pgSQL function: 16384 total in 2 blocks; 3472 free (1 chunks); 12912 used: feeds_api.create_odds(bigint,integer,numeric,numeric,numeric,numeric,numeric,integer,jsonb)
  PL/pgSQL function: 65536 total in 4 blocks; 29648 free (2 chunks); 35888 used: feeds_api.create_opportunity(integer,bigint,bigint,integer,integer,integer,integer,integer,integer,i...
  PL/pgSQL function: 16384 total in 2 blocks; 7576 free (1 chunks); 8808 used: feeds.tr_tabopptysparentdata_ismain()
  PL/pgSQL function: 32768 total in 3 blocks; 11720 free (1 chunks); 21048 used: feeds_api.set_game_data(integer,bigint,bigint,character varying)
  PL/pgSQL function: 32768 total in 3 blocks; 7712 free (1 chunks); 25056 used: opportunities_api.set_oppty_info(bigint,boolean)
  PL/pgSQL function: 8192 total in 1 blocks; 3488 free (1 chunks); 4704 used: feeds_api.set_oppty_info_number(bigint)
  PL/pgSQL function: 16384 total in 2 blocks; 6616 free (1 chunks); 9768 used: b5_offer.tr_tabexternalopportunities_checknotnull()
  PL/pgSQL function: 32768 total in 3 blocks; 17928 free (5 chunks); 14840 used: opportunities.generate_league2oppty_correlations(bigint)
  PL/pgSQL function: 131072 total in 5 blocks; 61648 free (6 chunks); 69424 used: opportunities.generate_league2league_correlations(bigint)
  PL/pgSQL function: 16384 total in 2 blocks; 5616 free (1 chunks); 10768 used: opportunities.tr_tabopptytree_addcorrelations()
  PL/pgSQL function: 65536 total in 4 blocks; 19864 free (1 chunks); 45672 used: opportunities.replication_151_tab_oppty_tree()
  PL/pgSQL function: 16384 total in 2 blocks; 7568 free (1 chunks); 8816 used: greep.tr_shared_changedat()
  PL/pgSQL function: 32768 total in 3 blocks; 2112 free (1 chunks); 30656 used: feeds_api.create_match(integer,integer,integer,jsonb,bigint,integer,integer,integer,integer,jsonb,in...
  HandleParallelMessages: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  pgqs_localhash: 21448 total in 2 blocks; 2616 free (0 chunks); 18832 used
  PL/pgSQL function: 16384 total in 2 blocks; 2072 free (1 chunks); 14312 used: opportunities.tr_tabopptyhistory_opptyhistorytime()
  PL/pgSQL function: 16384 total in 2 blocks; 2360 free (1 chunks); 14024 used: opportunities.tr_tabopportunitiesevaluationqueue_validtime()
  PL/pgSQL function: 16384 total in 2 blocks; 2512 free (1 chunks); 13872 used: opportunities.tr_tabopportunitiesevaluationqueue_evaluatedtime()
  PL/pgSQL function: 16384 total in 2 blocks; 2360 free (1 chunks); 14024 used: opportunities.tr_tabopportunitiesevaluationqueue_addedtime()
  PL/pgSQL function: 16384 total in 2 blocks; 7512 free (1 chunks); 8872 used: feeds_api.evaluate_opportunities(bigint[],integer,integer,integer,integer,integer)
  Btree proof lookup cache: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  PL/pgSQL function: 8192 total in 1 blocks; 1384 free (1 chunks); 6808 used: feeds_api.change_odds_is_winning(bigint[],integer)
  PL/pgSQL function: 32768 total in 3 blocks; 3160 free (1 chunks); 29608 used: opportunities.tr_tabopportunity_resultchange()
  PL/pgSQL function: 16384 total in 2 blocks; 4424 free (1 chunks); 11960 used: opportunities.tr_tabopportunity_opptynamechange()
  PL/pgSQL function: 16384 total in 2 blocks; 7448 free (1 chunks); 8936 used: opportunities.tr_tabopportunity_marketingstatechange()
  PL/pgSQL function: 65536 total in 4 blocks; 27224 free (2 chunks); 38312 used: opportunities.tr_tabopportunity_history()
  PL/pgSQL function: 32768 total in 3 blocks; 16224 free (3 chunks); 16544 used: opportunities.tr_tabopportunity_datetimechange()
  PL/pgSQL function: 32768 total in 3 blocks; 15440 free (1 chunks); 17328 used: opportunities.tr_tabopportunity_addcorrelation()
  PL/pgSQL function: 32768 total in 3 blocks; 15672 free (1 chunks); 17096 used: opportunities.tr_tabopportunities_notifycanchanges()
  Sequence values: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  PL/pgSQL function: 262144 total in 6 blocks; 80744 free (0 chunks); 181400 used: opportunities.replication_151_tab_opportunities()
  PL/pgSQL function: 16384 total in 2 blocks; 8904 free (5 chunks); 7480 used: betsys.lastuser_change()
  PL/pgSQL function: 16384 total in 2 blocks; 2448 free (1 chunks); 13936 used: opportunities.tr_tabopportunities_opptyevaltime()
  PL/pgSQL function: 65536 total in 4 blocks; 29064 free (1 chunks); 36472 used: opportunities.tr_tabopportunities_checkstatechange()
  PL/pgSQL function: 16384 total in 2 blocks; 7568 free (1 chunks); 8816 used: greep.tr_shared_changedat()
  PL/pgSQL function: 32768 total in 3 blocks; 15512 free (1 chunks); 17256 used: opportunities.tr_tabopportunities_opptyenddatetime()
  PL/pgSQL function: 32768 total in 3 blocks; 16008 free (4 chunks); 16760 used: opportunities.tr_tabopportunity_checkstatechange_au()
  PL/pgSQL function: 8192 total in 1 blocks; 1752 free (1 chunks); 6440 used: feeds_api.change_opportunities_state(bigint[],integer)
  PL/pgSQL function: 65536 total in 4 blocks; 27640 free (1 chunks); 37896 used: opportunities.tr_tabodds_aggregs()
  PL/pgSQL function: 16384 total in 2 blocks; 7128 free (1 chunks); 9256 used: opportunities.tr_tabodds_notifyoddsblocked()
  PL/pgSQL function: 65536 total in 4 blocks; 31200 free (1 chunks); 34336 used: opportunities.tr_tabodds_history()
  Record information cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  PL/pgSQL function: 131072 total in 5 blocks; 65672 free (2 chunks); 65400 used: opportunities.replication_151_tab_odds()
  RI compare cache: 16384 total in 2 blocks; 6656 free (3 chunks); 9728 used
  RI query cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  RI constraint cache: 40648 total in 2 blocks; 2616 free (0 chunks); 38032 used
  PL/pgSQL function: 16384 total in 2 blocks; 7920 free (1 chunks); 8464 used: opportunities.tr_tabodds_fixoddsvalue()
  PLpgSQL cast info: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
    PLpgSQL cast cache: 8192 total in 1 blocks; 1576 free (0 chunks); 6616 used
  PL/pgSQL function: 16384 total in 2 blocks; 7568 free (5 chunks); 8816 used: greep.tr_shared_changedat()
  CFuncHash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  TableSpace cache: 8192 total in 1 blocks; 2088 free (0 chunks); 6104 used
  Operator lookup cache: 40960 total in 3 blocks; 18888 free (10 chunks); 22072 used
  Type information cache: 24376 total in 2 blocks; 2616 free (0 chunks); 21760 used
  Function stat entries: 16384 total in 2 blocks; 4576 free (2 chunks); 11808 used
  Prepared Queries: 16384 total in 2 blocks; 6656 free (3 chunks); 9728 used
  RowDescriptionContext: 8192 total in 1 blocks; 6888 free (0 chunks); 1304 used
  MessageContext: 8192 total in 1 blocks; 6888 free (1 chunks); 1304 used
  Operator class cache: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  smgr relation table: 65536 total in 4 blocks; 20768 free (14 chunks); 44768 used
  TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
  Portal hash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  TopPortalContext: 8192 total in 1 blocks; 7928 free (2 chunks); 264 used
  Relcache by OID: 32768 total in 3 blocks; 13608 free (6 chunks); 19160 used
  CacheMemoryContext: 2919800 total in 48 blocks; 218712 free (4 chunks); 2701088 used
    CachedPlanSource: 4096 total in 3 blocks; 1496 free (0 chunks); 2600 used: SELECT * FROM feeds_api.change_odds_is_winning(_id_odds := $1, _odds_is_winning := $2)
      CachedPlanQuery: 4096 total in 3 blocks; 1568 free (3 chunks); 2528 used
    CachedPlan: 4096 total in 3 blocks; 136 free (0 chunks); 3960 used: INSERT INTO replication.tab_queue_tmp (id_queue_data)       VALUES (currval('replication.tab_queue_d...
    CachedPlan: 4096 total in 3 blocks; 136 free (0 chunks); 3960 used: INSERT INTO replication.tab_queue_tmp (id_queue_data)       VALUES (currval('replication.tab_queue_d...
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: tab_queue_tmp_pkey
    CachedPlan: 4096 total in 3 blocks; 136 free (0 chunks); 3960 used: INSERT INTO replication.tab_queue_tmp (id_queue_data)                         VALUES (currval('repli...
    CachedPlan: 4096 total in 3 blocks; 136 free (0 chunks); 3960 used: INSERT INTO replication.tab_queue_tmp (id_queue_data)     VALUES (currval('replication.tab_queue_dat...
    CachedPlan: 4096 total in 3 blocks; 136 free (0 chunks); 3960 used: INSERT INTO replication.tab_queue_tmp (id_queue_data)                         VALUES (currval('repli...
    CachedPlan: 4096 total in 3 blocks; 136 free (0 chunks); 3960 used: INSERT INTO replication.tab_queue_tmp (id_queue_data)     VALUES (currval('replication.tab_queue_dat...
    CachedPlan: 4096 total in 3 blocks; 136 free (0 chunks); 3960 used: INSERT INTO replication.tab_queue_tmp (id_queue_data)     VALUES (currval('replication.tab_queue_dat...
    index info: 4096 total in 3 blocks; 2264 free (2 chunks); 1832 used: ix_tabopportunitiesevaluationqueue_idopportunity_pk_active_2
    index info: 4096 total in 3 blocks; 2024 free (2 chunks); 2072 used: ix_tabopportunitiesevaluationqueue_idopportunity_pk_active_1
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunitiesevaluationqueue_idodds
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: pk_tabopportunitiesevaluationqueue_idopportunitiesevaluationque
    CachedPlan: 16384 total in 5 blocks; 3656 free (0 chunks); 12728 used: INSERT INTO opportunities.tab_opportunities_evaluation_queue (id_odds, evaluation_type, id_opportuni...
    CachedPlan: 8192 total in 4 blocks; 4000 free (0 chunks); 4192 used: INSERT INTO replication.tab_queue_data (queue_data, id_action_type, id_table, id_replication_group) ...
    CachedPlan: 8192 total in 4 blocks; 4000 free (0 chunks); 4192 used: INSERT INTO replication.tab_queue_data (queue_data, id_action_type, id_table, id_replication_group) ...
    CachedPlan: 8192 total in 4 blocks; 4000 free (0 chunks); 4192 used: INSERT INTO replication.tab_queue_data (queue_data, id_action_type, id_table, id_replication_group) ...
    CachedPlan: 8192 total in 4 blocks; 4000 free (0 chunks); 4192 used: INSERT INTO replication.tab_queue_data (queue_data, id_action_type, id_table, id_replication_group) ...
    CachedPlan: 8192 total in 4 blocks; 4000 free (0 chunks); 4192 used: INSERT INTO replication.tab_queue_data (queue_data, id_action_type, id_table, id_replication_group) ...
    CachedPlan: 8192 total in 4 blocks; 4000 free (0 chunks); 4192 used: INSERT INTO replication.tab_queue_data (queue_data, id_action_type, id_table, id_replication_group) ...
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: i_replication_tabqueuedata_idreplicationgroup
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: tab_queue_data_pkey
    CachedPlan: 8192 total in 4 blocks; 4000 free (0 chunks); 4192 used: INSERT INTO replication.tab_queue_data (queue_data, id_action_type, id_table, id_replication_group) ...
    CachedPlan: 16384 total in 5 blocks; 8000 free (2 chunks); 8384 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
    CachedPlan: 134336 total in 10 blocks; 42112 free (0 chunks); 92224 used: WITH opp AS MATERIALIZED         (             SELECT DISTINCT o2.id_opportunity             FROM  o...
    CachedPlan: 65536 total in 7 blocks; 31968 free (0 chunks); 33568 used: SELECT lc.contestant_name_lang                                             FROM opportunities.tab_op...
    CachedPlan: 65536 total in 7 blocks; 31968 free (0 chunks); 33568 used: SELECT lc.contestant_name_lang                                             FROM opportunities.tab_op...
    CachedPlan: 16384 total in 5 blocks; 8000 free (2 chunks); 8384 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
    CachedPlan: 16384 total in 5 blocks; 8000 free (2 chunks); 8384 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
    CachedPlan: 16384 total in 5 blocks; 8000 free (2 chunks); 8384 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
    CachedPlan: 16384 total in 5 blocks; 8000 free (2 chunks); 8384 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
    CachedPlan: 16384 total in 5 blocks; 8000 free (2 chunks); 8384 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
    CachedPlan: 8192 total in 4 blocks; 3672 free (0 chunks); 4520 used: EXISTS         (             SELECT               FROM opportunities.tab_oppty_tree              WHE...
    CachedPlan: 65536 total in 7 blocks; 32224 free (0 chunks); 33312 used: SELECT CASE        WHEN SUBSTRING(op.marketing_states FROM 1 FOR 1) = '1' AND op.oppty_end_date + op...
    CachedPlanSource: 4096 total in 3 blocks; 1384 free (1 chunks); 2712 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
      CachedPlanQuery: 8192 total in 4 blocks; 1504 free (0 chunks); 6688 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlan: 4096 total in 3 blocks; 392 free (0 chunks); 3704 used: INSERT INTO opportunities.tab_oppty_tree     (         id_opportunity,         id_opportunity_parent...
    CachedPlan: 35840 total in 6 blocks; 15344 free (1 chunks); 20496 used: INSERT INTO opportunities.tab_opportunities     (         id_event,         id_oppty_type,         i...
    CachedPlanSource: 4096 total in 3 blocks; 1384 free (1 chunks); 2712 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
      CachedPlanQuery: 8192 total in 4 blocks; 1504 free (0 chunks); 6688 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlan: 8192 total in 4 blocks; 3664 free (1 chunks); 4528 used: NOT EXISTS         (             SELECT             FROM opportunities.tab_oppty_tree             WH...
    CachedPlan: 16384 total in 5 blocks; 7960 free (2 chunks); 8424 used: UPDATE opportunities.tab_opportunities        SET oppty_info_number = _info_number,            id_op...
    CachedPlanSource: 4096 total in 3 blocks; 1384 free (1 chunks); 2712 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
      CachedPlanQuery: 8192 total in 4 blocks; 1504 free (3 chunks); 6688 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlan: 8192 total in 4 blocks; 616 free (2 chunks); 7576 used: SELECT         oppty_info_number,         id_event,         id_oppty_type                           ...
    CachedPlanSource: 4096 total in 3 blocks; 1384 free (1 chunks); 2712 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
      CachedPlanQuery: 8192 total in 4 blocks; 1504 free (0 chunks); 6688 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlan: 134408 total in 10 blocks; 51528 free (0 chunks); 82880 used: INSERT INTO opportunities.tab_correlations (id_opportunity, id_opportunity_2)     SELECT         crr...
    CachedPlan: 277504 total in 9 blocks; 15680 free (0 chunks); 261824 used: INSERT INTO opportunities.tab_correlations (id_opportunity, id_opportunity_2)     -- id_league from ...
    CachedPlan: 134144 total in 8 blocks; 30024 free (0 chunks); 104120 used: SELECT         nm1.contestant_name,         COALESCE(nm2.contestant_name, nm1.contestant_name),     ...
    CachedPlan: 32768 total in 6 blocks; 2064 free (0 chunks); 30704 used: SELECT evn.id_league     FROM opportunities.tab_opportunities    AS opp     JOIN opportunities.tab_e...
    CachedPlan: 16384 total in 5 blocks; 6880 free (0 chunks); 9504 used: NEW.id_opportunity <> NEW.id_opportunity_parent OR         EXISTS         (             SELECT      ...
    CachedPlan: 16384 total in 5 blocks; 4952 free (1 chunks); 11432 used: (TG_OP='INSERT' AND CASE WHEN get_session_var('id_data_provider') = '12' THEN FALSE ELSE (SELECT CAS...
    CachedPlanSource: 4096 total in 3 blocks; 1384 free (1 chunks); 2712 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
      CachedPlanQuery: 8192 total in 4 blocks; 1504 free (0 chunks); 6688 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlanSource: 4096 total in 3 blocks; 1384 free (1 chunks); 2712 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
      CachedPlanQuery: 8192 total in 4 blocks; 1504 free (0 chunks); 6688 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlan: 4096 total in 3 blocks; 392 free (0 chunks); 3704 used: INSERT INTO opportunities.tab_oppty_tree     (         id_opportunity,         id_opportunity_parent...
    CachedPlan: 33792 total in 6 blocks; 14192 free (0 chunks); 19600 used: INSERT INTO opportunities.tab_opportunities     (         id_event,         id_oppty_type,         o...
    CachedPlan: 16384 total in 5 blocks; 8000 free (2 chunks); 8384 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
    CachedPlanSource: 4096 total in 3 blocks; 1384 free (1 chunks); 2712 used: SELECT 1 FROM ONLY "opportunities"."tab_opportunities" x WHERE "id_opportunity" OPERATOR(pg_catalog....
      CachedPlanQuery: 8192 total in 4 blocks; 1504 free (0 chunks); 6688 used
    SPI Plan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlan: 32768 total in 6 blocks; 9352 free (0 chunks); 23416 used: EXISTS (         SELECT         FROM  new_items AS ni         JOIN  old_items AS oi ON oi.id_opportu...
    CachedPlan: 8192 total in 4 blocks; 3120 free (1 chunks); 5072 used: OLD.oppty_name IS DISTINCT FROM NEW.oppty_name     AND EXISTS     (         SELECT         FROM oppo...
    CachedPlan: 9216 total in 4 blocks; 1408 free (1 chunks); 7808 used: NEW.oppty_end_date + NEW.oppty_end_time > NOW()         AND (COALESCE(OLD.oppty_end_date + OLD.oppty...
    index info: 2048 total in 2 blocks; 392 free (0 chunks); 1656 used: ix_tabopptytree_idopportunityparent_temporary
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: pk_tabopptytree_idopportunity
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopptytree_idopportunityparent
    index info: 3072 total in 2 blocks; 808 free (1 chunks); 2264 used: pg_statistic_relid_att_inh_index
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: ix_tabopportunities_opptybetradarid_iddataprovider
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_opptyevaltimetz
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_opptyendtimestamptz
    index info: 2048 total in 2 blocks; 184 free (0 chunks); 1864 used: ix_tabopportunities_opptyenddatetime
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_idconstestant4
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_idconstestant3
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_opptyexternalcompleteid
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_opptybetradarid
    index info: 4096 total in 3 blocks; 2024 free (2 chunks); 2072 used: ix_tabopportunities_opptyenddatetime_idopptystate
    index info: 2048 total in 2 blocks; 392 free (0 chunks); 1656 used: ix_tabopportunities_opptyenddatetime_obsolete
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_opptyenddate
    index info: 2048 total in 2 blocks; 264 free (0 chunks); 1784 used: ix_tabopportunities_lastminute
    index info: 2048 total in 2 blocks; 824 free (0 chunks); 1224 used: ix_tabopportunities_info
    index info: 3072 total in 2 blocks; 808 free (1 chunks); 2264 used: ix_tabopportunities_idopptytype_vsportsid_idevent
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_idopportunitytype
    index info: 3072 total in 2 blocks; 808 free (1 chunks); 2264 used: ix_tabopportunities_idopportunitystate_opptyenddate_idevent
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: ix_tabopportunities_idopportunitystate_opptyenddate
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_idopportunitystate
    index info: 3072 total in 2 blocks; 808 free (1 chunks); 2264 used: ix_tabopportunities_idopportunity_opptyenddate_opptyinfonumber
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: ix_tabopportunities_idevent_opptyenddate
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_idconstestant2
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_tabopportunities_idconstestant1
    index info: 2048 total in 2 blocks; 416 free (1 chunks); 1632 used: ix_tabopportunities_enddateinfonumber
    index info: 2048 total in 2 blocks; 360 free (0 chunks); 1688 used: idx_tabopportunities_opptyspecialvalues_season_id
    index info: 2048 total in 2 blocks; 360 free (0 chunks); 1688 used: idx_tabopportunities_opptyspecialvalues_match_id
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: pk_tabopportunities_idopportunity
    CachedPlan: 16384 total in 5 blocks; 6712 free (0 chunks); 9672 used: UPDATE opportunities.lst_oppty_numbers     SET id_opportunity = _id_opportunity     WHERE id_oppty_n...
    CachedPlan: 8192 total in 4 blocks; 3704 free (1 chunks); 4488 used: UPDATE opportunities.lst_oppty_numbers                    SET id_opportunity  = NULL                ...
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: pg_toast_2619_index
    index info: 2048 total in 2 blocks; 584 free (0 chunks); 1464 used: ix_lstopptynumbers_freeopptynumbers
    index info: 2048 total in 2 blocks; 792 free (0 chunks); 1256 used: ix_lstopptynumbers_idopportunity
    1455 more child contexts containing 6947040 total in 4141 blocks; 2312976 free (668 chunks); 4634064 used
  WAL record construction: 49768 total in 2 blocks; 6360 free (0 chunks); 43408 used
  PrivateRefCount: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  MdSmgr: 16384 total in 2 blocks; 5624 free (20 chunks); 10760 used
  LOCALLOCK hash: 32768 total in 3 blocks; 12712 free (8 chunks); 20056 used
  plpgsql_check - profiler context: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
    plpgsql_check function execution statistics: 8192 total in 1 blocks; 2488 free (0 chunks); 5704 used
    plpgsql_check function profiler local chunks: 8192 total in 1 blocks; 2488 free (0 chunks); 5704 used
    plpgsql_check function profiler local cache: 8192 total in 1 blocks; 2488 free (0 chunks); 5704 used
  plpgsql_check function cache: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  Rendezvous variable hash: 8192 total in 1 blocks; 552 free (0 chunks); 7640 used
  PLpgSQL function hash: 40864 total in 3 blocks; 2616 free (0 chunks); 38248 used
  Timezones: 104120 total in 2 blocks; 2616 free (0 chunks); 101504 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (8 chunks); 264 used
Grand total: 14312808 bytes in 4752 blocks; 3920880 free (1043 chunks); 10391928 used


I hope this will help.

Some additional information:

-bash-4.2$ cat /proc/17048/status
Name: postmaster
Umask: 0077
State: S (sleeping)
Tgid: 17048
Ngid: 0
Pid: 17048
PPid: 4476
TracerPid: 0
Uid: 26 26 26 26
Gid: 26 26 26 26
FDSize: 1024
Groups: 26
VmPeak: 12212748 kB
VmSize: 12212284 kB
VmLck:       0 kB
VmPin:       0 kB
VmHWM: 9421940 kB
VmRSS: 9421940 kB
RssAnon: 1054764 kB
RssFile:   15796 kB
RssShmem: 8351380 kB
VmData: 1055404 kB
VmStk:     132 kB
VmExe:    7816 kB
VmLib:   85312 kB
VmPTE:   23264 kB
VmSwap:       0 kB
Threads: 1
SigQ: 0/160197
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000400000
SigIgn: 0000000001701800
SigCgt: 00000001a0006287
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ffffff
Cpus_allowed_list: 0-23
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 4748002
nonvoluntary_ctxt_switches: 12237

-bash-4.2$ pmap 17048
17048:   postgres: db-prematch: app_feed prematch 172.25.0.231(47890) idle
0000000000400000   7816K r-x-- postgres
0000000000da1000      4K r---- postgres
0000000000da2000     68K rw--- postgres
0000000000db3000    200K rw---   [ anon ]
0000000002d15000    504K rw---   [ anon ]
0000000002d93000 838092K rw---   [ anon ]
00007fd999777000 180232K rw---   [ anon ]
00007fd9a8d75000  32772K rw---   [ anon ]
00007fd9ac5d2000   1024K rw-s- PostgreSQL.380355184
00007fd9ac6d2000    196K rw-s- PostgreSQL.38763112
00007fd9ac703000    148K r-x-- libtinfo.so.5.9
00007fd9ac728000   2048K ----- libtinfo.so.5.9
00007fd9ac928000     16K r---- libtinfo.so.5.9
00007fd9ac92c000      4K rw--- libtinfo.so.5.9
00007fd9ac92d000    212K r-x-- libedit.so.0.0.42
00007fd9ac962000   2044K ----- libedit.so.0.0.42
00007fd9acb61000     12K r---- libedit.so.0.0.42
00007fd9acb64000      4K rw--- libedit.so.0.0.42
00007fd9acb65000     20K rw---   [ anon ]
00007fd9acb6a000     28K r-x-- libffi.so.6.0.1
00007fd9acb71000   2044K ----- libffi.so.6.0.1
00007fd9acd70000      4K r---- libffi.so.6.0.1
00007fd9acd71000      4K rw--- libffi.so.6.0.1
00007fd9acd72000  45952K r-x-- libLLVM-5.0.so
00007fd9afa52000   2048K ----- libLLVM-5.0.so
00007fd9afc52000   3072K r---- libLLVM-5.0.so
00007fd9aff52000    444K rw--- libLLVM-5.0.so
00007fd9affc1000    372K rw---   [ anon ]
00007fd9b001e000    116K r-x-- llvmjit.so
00007fd9b003b000   2048K ----- llvmjit.so
00007fd9b023b000      4K r---- llvmjit.so
00007fd9b023c000      4K rw--- llvmjit.so
00007fd9b023d000   1028K rw---   [ anon ]
00007fd9b033e000     48K r-x-- libnss_files-2.17.so
00007fd9b034a000   2044K ----- libnss_files-2.17.so
00007fd9b0549000      4K r---- libnss_files-2.17.so
00007fd9b054a000      4K rw--- libnss_files-2.17.so
00007fd9b054b000     24K rw---   [ anon ]
00007fd9b0551000 10827040K rw-s- zero (deleted)
00007fdc45299000     32K r-x-- libprotobuf-c.so.1
00007fdc452a1000   2044K ----- libprotobuf-c.so.1
00007fdc454a0000      4K r---- libprotobuf-c.so.1
00007fdc454a1000      4K rw--- libprotobuf-c.so.1
00007fdc454a2000     24K r-x-- decoderbufs.so
00007fdc454a8000   2048K ----- decoderbufs.so
00007fdc456a8000      4K r---- decoderbufs.so
00007fdc456a9000      4K rw--- decoderbufs.so
00007fdc456aa000    184K r-x-- plpgsql.so
00007fdc456d8000   2044K ----- plpgsql.so
00007fdc458d7000      8K r---- plpgsql.so
00007fdc458d9000      4K rw--- plpgsql.so
00007fdc458da000    192K r-x-- plpgsql_check.so
00007fdc4590a000   2044K ----- plpgsql_check.so
00007fdc45b09000      4K r---- plpgsql_check.so
00007fdc45b0a000      4K rw--- plpgsql_check.so
00007fdc45b0b000     24K r-x-- pg_wait_sampling.so
00007fdc45b11000   2044K ----- pg_wait_sampling.so
00007fdc45d10000      4K r---- pg_wait_sampling.so
00007fdc45d11000      4K rw--- pg_wait_sampling.so
00007fdc45d12000     20K r-x-- pg_stat_kcache.so
00007fdc45d17000   2044K ----- pg_stat_kcache.so
00007fdc45f16000      4K r---- pg_stat_kcache.so
00007fdc45f17000      4K rw--- pg_stat_kcache.so
00007fdc45f18000     16K rw---   [ anon ]
00007fdc45f1c000     32K r-x-- pg_qualstats.so
00007fdc45f24000   2044K ----- pg_qualstats.so
00007fdc46123000      4K r---- pg_qualstats.so
00007fdc46124000      4K rw--- pg_qualstats.so
00007fdc46125000     32K r-x-- pg_stat_statements.so
00007fdc4612d000   2044K ----- pg_stat_statements.so
00007fdc4632c000      4K r---- pg_stat_statements.so
00007fdc4632d000      4K rw--- pg_stat_statements.so
00007fdc4632e000 103692K r---- locale-archive
00007fdc4c871000      8K r-x-- libfreebl3.so
00007fdc4c873000   2044K ----- libfreebl3.so
00007fdc4ca72000      4K r---- libfreebl3.so
00007fdc4ca73000      4K rw--- libfreebl3.so
00007fdc4ca74000     60K r-x-- libbz2.so.1.0.6
00007fdc4ca83000   2044K ----- libbz2.so.1.0.6
00007fdc4cc82000      4K r---- libbz2.so.1.0.6
00007fdc4cc83000      4K rw--- libbz2.so.1.0.6
00007fdc4cc84000     92K r-x-- libelf-0.176.so
00007fdc4cc9b000   2044K ----- libelf-0.176.so
00007fdc4ce9a000      4K r---- libelf-0.176.so
00007fdc4ce9b000      4K rw--- libelf-0.176.so
00007fdc4ce9c000    384K r-x-- libpcre.so.1.2.0
00007fdc4cefc000   2048K ----- libpcre.so.1.2.0
00007fdc4d0fc000      4K r---- libpcre.so.1.2.0
00007fdc4d0fd000      4K rw--- libpcre.so.1.2.0
00007fdc4d0fe000     16K r-x-- libattr.so.1.1.0
00007fdc4d102000   2044K ----- libattr.so.1.1.0
00007fdc4d301000      4K r---- libattr.so.1.1.0
00007fdc4d302000      4K rw--- libattr.so.1.1.0
00007fdc4d303000     32K r-x-- libcrypt-2.17.so
00007fdc4d30b000   2044K ----- libcrypt-2.17.so
00007fdc4d50a000      4K r---- libcrypt-2.17.so
00007fdc4d50b000      4K rw--- libcrypt-2.17.so
00007fdc4d50c000    184K rw---   [ anon ]
00007fdc4d53a000     16K r-x-- libcap-ng.so.0.0.0
00007fdc4d53e000   2048K ----- libcap-ng.so.0.0.0
00007fdc4d73e000      4K r---- libcap-ng.so.0.0.0
00007fdc4d73f000      4K rw--- libcap-ng.so.0.0.0
00007fdc4d740000    312K r-x-- libdw-0.176.so
00007fdc4d78e000   2048K ----- libdw-0.176.so
00007fdc4d98e000      8K r---- libdw-0.176.so
00007fdc4d990000      4K rw--- libdw-0.176.so
00007fdc4d991000     16K r-x-- libgpg-error.so.0.10.0
00007fdc4d995000   2044K ----- libgpg-error.so.0.10.0
00007fdc4db94000      4K r---- libgpg-error.so.0.10.0
00007fdc4db95000      4K rw--- libgpg-error.so.0.10.0
00007fdc4db96000    500K r-x-- libgcrypt.so.11.8.2
00007fdc4dc13000   2044K ----- libgcrypt.so.11.8.2
00007fdc4de12000      4K r---- libgcrypt.so.11.8.2
00007fdc4de13000     12K rw--- libgcrypt.so.11.8.2
00007fdc4de16000      4K rw---   [ anon ]
00007fdc4de17000    144K r-x-- libselinux.so.1
00007fdc4de3b000   2044K ----- libselinux.so.1
00007fdc4e03a000      4K r---- libselinux.so.1
00007fdc4e03b000      4K rw--- libselinux.so.1
00007fdc4e03c000      8K rw---   [ anon ]
00007fdc4e03e000     16K r-x-- libcap.so.2.22
00007fdc4e042000   2044K ----- libcap.so.2.22
00007fdc4e241000      4K r---- libcap.so.2.22
00007fdc4e242000      4K rw--- libcap.so.2.22
00007fdc4e243000  20296K r-x-- libicudata.so.50.2
00007fdc4f615000   2044K ----- libicudata.so.50.2
00007fdc4f814000      4K r---- libicudata.so.50.2
00007fdc4f815000      4K rw--- libicudata.so.50.2
00007fdc4f816000     84K r-x-- libgcc_s-4.8.5-20150702.so.1
00007fdc4f82b000   2044K ----- libgcc_s-4.8.5-20150702.so.1
00007fdc4fa2a000      4K r---- libgcc_s-4.8.5-20150702.so.1
00007fdc4fa2b000      4K rw--- libgcc_s-4.8.5-20150702.so.1
00007fdc4fa2c000    932K r-x-- libstdc++.so.6.0.19
00007fdc4fb15000   2044K ----- libstdc++.so.6.0.19
00007fdc4fd14000     32K r---- libstdc++.so.6.0.19
00007fdc4fd1c000      8K rw--- libstdc++.so.6.0.19
00007fdc4fd1e000     84K rw---   [ anon ]
00007fdc4fd33000    232K r-x-- libnspr4.so
00007fdc4fd6d000   2044K ----- libnspr4.so
00007fdc4ff6c000      4K r---- libnspr4.so
00007fdc4ff6d000      8K rw--- libnspr4.so
00007fdc4ff6f000      8K rw---   [ anon ]
00007fdc4ff71000     16K r-x-- libplc4.so
00007fdc4ff75000   2044K ----- libplc4.so
00007fdc50174000      4K r---- libplc4.so
00007fdc50175000      4K rw--- libplc4.so
00007fdc50176000     12K r-x-- libplds4.so
00007fdc50179000   2044K ----- libplds4.so
00007fdc50378000      4K r---- libplds4.so
00007fdc50379000      4K rw--- libplds4.so
00007fdc5037a000    164K r-x-- libnssutil3.so
00007fdc503a3000   2044K ----- libnssutil3.so
00007fdc505a2000     28K r---- libnssutil3.so
00007fdc505a9000      4K rw--- libnssutil3.so
00007fdc505aa000   1176K r-x-- libnss3.so
00007fdc506d0000   2048K ----- libnss3.so
00007fdc508d0000     20K r---- libnss3.so
00007fdc508d5000      8K rw--- libnss3.so
00007fdc508d7000      8K rw---   [ anon ]
00007fdc508d9000    148K r-x-- libsmime3.so
00007fdc508fe000   2044K ----- libsmime3.so
00007fdc50afd000     12K r---- libsmime3.so
00007fdc50b00000      4K rw--- libsmime3.so
00007fdc50b01000    332K r-x-- libssl3.so
00007fdc50b54000   2048K ----- libssl3.so
00007fdc50d54000     16K r---- libssl3.so
00007fdc50d58000      4K rw--- libssl3.so
00007fdc50d59000      4K rw---   [ anon ]
00007fdc50d5a000    112K r-x-- libsasl2.so.3.0.0
00007fdc50d76000   2044K ----- libsasl2.so.3.0.0
00007fdc50f75000      4K r---- libsasl2.so.3.0.0
00007fdc50f76000      4K rw--- libsasl2.so.3.0.0
00007fdc50f77000     56K r-x-- liblber-2.4.so.2.10.7
00007fdc50f85000   2044K ----- liblber-2.4.so.2.10.7
00007fdc51184000      4K r---- liblber-2.4.so.2.10.7
00007fdc51185000      4K rw--- liblber-2.4.so.2.10.7
00007fdc51186000     88K r-x-- libresolv-2.17.so
00007fdc5119c000   2048K ----- libresolv-2.17.so
00007fdc5139c000      4K r---- libresolv-2.17.so
00007fdc5139d000      4K rw--- libresolv-2.17.so
00007fdc5139e000      8K rw---   [ anon ]
00007fdc513a0000     12K r-x-- libkeyutils.so.1.5
00007fdc513a3000   2044K ----- libkeyutils.so.1.5
00007fdc515a2000      4K r---- libkeyutils.so.1.5
00007fdc515a3000      4K rw--- libkeyutils.so.1.5
00007fdc515a4000     56K r-x-- libkrb5support.so.0.1
00007fdc515b2000   2048K ----- libkrb5support.so.0.1
00007fdc517b2000      4K r---- libkrb5support.so.0.1
00007fdc517b3000      4K rw--- libkrb5support.so.0.1
00007fdc517b4000    196K r-x-- libk5crypto.so.3.1
00007fdc517e5000   2044K ----- libk5crypto.so.3.1
00007fdc519e4000      8K r---- libk5crypto.so.3.1
00007fdc519e6000      4K rw--- libk5crypto.so.3.1
00007fdc519e7000     12K r-x-- libcom_err.so.2.1
00007fdc519ea000   2044K ----- libcom_err.so.2.1
00007fdc51be9000      4K r---- libcom_err.so.2.1
00007fdc51bea000      4K rw--- libcom_err.so.2.1
00007fdc51beb000    868K r-x-- libkrb5.so.3.3
00007fdc51cc4000   2044K ----- libkrb5.so.3.3
00007fdc51ec3000     56K r---- libkrb5.so.3.3
00007fdc51ed1000     12K rw--- libkrb5.so.3.3
00007fdc51ed4000    120K r-x-- libaudit.so.1.0.0
00007fdc51ef2000   2044K ----- libaudit.so.1.0.0
00007fdc520f1000      4K r---- libaudit.so.1.0.0
00007fdc520f2000      4K rw--- libaudit.so.1.0.0
00007fdc520f3000     40K rw---   [ anon ]
00007fdc520fd000    148K r-x-- liblzma.so.5.2.2
00007fdc52122000   2044K ----- liblzma.so.5.2.2
00007fdc52321000      4K r---- liblzma.so.5.2.2
00007fdc52322000      4K rw--- liblzma.so.5.2.2
00007fdc52323000     84K r-x-- libz.so.1.2.7
00007fdc52338000   2044K ----- libz.so.1.2.7
00007fdc52537000      4K r---- libz.so.1.2.7
00007fdc52538000      4K rw--- libz.so.1.2.7
00007fdc52539000   1804K r-x-- libc-2.17.so
00007fdc526fc000   2048K ----- libc-2.17.so
00007fdc528fc000     16K r---- libc-2.17.so
00007fdc52900000      8K rw--- libc-2.17.so
00007fdc52902000     20K rw---   [ anon ]
00007fdc52907000    188K r-x-- libsystemd.so.0.6.0
00007fdc52936000   2048K ----- libsystemd.so.0.6.0
00007fdc52b36000      4K r---- libsystemd.so.0.6.0
00007fdc52b37000      4K rw--- libsystemd.so.0.6.0
00007fdc52b38000   1424K r-x-- libicuuc.so.50.2
00007fdc52c9c000   2048K ----- libicuuc.so.50.2
00007fdc52e9c000     64K r---- libicuuc.so.50.2
00007fdc52eac000      4K rw--- libicuuc.so.50.2
00007fdc52ead000     16K rw---   [ anon ]
00007fdc52eb1000   1984K r-x-- libicui18n.so.50.2
00007fdc530a1000   2048K ----- libicui18n.so.50.2
00007fdc532a1000     48K r---- libicui18n.so.50.2
00007fdc532ad000      8K rw--- libicui18n.so.50.2
00007fdc532af000      4K rw---   [ anon ]
00007fdc532b0000    328K r-x-- libldap-2.4.so.2.10.7
00007fdc53302000   2048K ----- libldap-2.4.so.2.10.7
00007fdc53502000      8K r---- libldap-2.4.so.2.10.7
00007fdc53504000      4K rw--- libldap-2.4.so.2.10.7
00007fdc53505000   1028K r-x-- libm-2.17.so
00007fdc53606000   2044K ----- libm-2.17.so
00007fdc53805000      4K r---- libm-2.17.so
00007fdc53806000      4K rw--- libm-2.17.so
00007fdc53807000      8K r-x-- libdl-2.17.so
00007fdc53809000   2048K ----- libdl-2.17.so
00007fdc53a09000      4K r---- libdl-2.17.so
00007fdc53a0a000      4K rw--- libdl-2.17.so
00007fdc53a0b000     28K r-x-- librt-2.17.so
00007fdc53a12000   2044K ----- librt-2.17.so
00007fdc53c11000      4K r---- librt-2.17.so
00007fdc53c12000      4K rw--- librt-2.17.so
00007fdc53c13000     92K r-x-- libpthread-2.17.so
00007fdc53c2a000   2044K ----- libpthread-2.17.so
00007fdc53e29000      4K r---- libpthread-2.17.so
00007fdc53e2a000      4K rw--- libpthread-2.17.so
00007fdc53e2b000     16K rw---   [ anon ]
00007fdc53e2f000    296K r-x-- libgssapi_krb5.so.2.2
00007fdc53e79000   2048K ----- libgssapi_krb5.so.2.2
00007fdc54079000      4K r---- libgssapi_krb5.so.2.2
00007fdc5407a000      8K rw--- libgssapi_krb5.so.2.2
00007fdc5407c000   2264K r-x-- libcrypto.so.1.0.2k
00007fdc542b2000   2048K ----- libcrypto.so.1.0.2k
00007fdc544b2000    112K r---- libcrypto.so.1.0.2k
00007fdc544ce000     52K rw--- libcrypto.so.1.0.2k
00007fdc544db000     16K rw---   [ anon ]
00007fdc544df000    412K r-x-- libssl.so.1.0.2k
00007fdc54546000   2048K ----- libssl.so.1.0.2k
00007fdc54746000     16K r---- libssl.so.1.0.2k
00007fdc5474a000     28K rw--- libssl.so.1.0.2k
00007fdc54751000     52K r-x-- libpam.so.0.83.1
00007fdc5475e000   2048K ----- libpam.so.0.83.1
00007fdc5495e000      4K r---- libpam.so.0.83.1
00007fdc5495f000      4K rw--- libpam.so.0.83.1
00007fdc54960000   1404K r-x-- libxml2.so.2.9.1
00007fdc54abf000   2044K ----- libxml2.so.2.9.1
00007fdc54cbe000     32K r---- libxml2.so.2.9.1
00007fdc54cc6000      8K rw--- libxml2.so.2.9.1
00007fdc54cc8000      8K rw---   [ anon ]
00007fdc54cca000     80K r-x-- liblz4.so.1.7.5
00007fdc54cde000   2044K ----- liblz4.so.1.7.5
00007fdc54edd000      4K r---- liblz4.so.1.7.5
00007fdc54ede000      4K rw--- liblz4.so.1.7.5
00007fdc54edf000    136K r-x-- ld-2.17.so
00007fdc54f1a000   1588K rw---   [ anon ]
00007fdc550a7000    212K rw-s- PostgreSQL.4009629762
00007fdc550dc000    104K rw---   [ anon ]
00007fdc550fc000      8K r-x--   [ anon ]
00007fdc550fe000      4K rw-s-   [ shmid=0x67418000 ]
00007fdc550ff000      4K rw---   [ anon ]
00007fdc55100000      4K r---- ld-2.17.so
00007fdc55101000      4K rw--- ld-2.17.so
00007fdc55102000      4K rw---   [ anon ]
00007fffd382b000    132K rw---   [ stack ]
00007fffd38f0000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total         12212288K

-bash-4.2$ smem -u
User     Count     Swap      USS      PSS      RSS
postgres   326        0 10503276 20969859 175615624
-bash-4.2$ smem
^C
  PID User     Command                         Swap      USS      PSS      RSS
27250 postgres postgres: db-prematch: arch        0      224      246     2592
27251 postgres postgres: db-prematch: stat        0      512      538     2716
27253 postgres postgres: db-prematch: logi        0      548      567     3480
27249 postgres postgres: db-prematch: auto        0      664      696     3956
27248 postgres postgres: db-prematch: walw        0      244      878    19296
...
27246 postgres postgres: db-prematch: chec        0    88308   851490 10271612
12156 postgres postgres: db-prematch: app_        0   902776  1400234  9066116
 8303 postgres postgres: db-prematch: app_        0   913768  1411286  9067356
16469 postgres postgres: db-prematch: app_        0   923668  1417847  9071168
31187 postgres postgres: db-prematch: app_        0   923024  1425936  9196732
13769 postgres postgres: db-prematch: app_        0   925820  1437147  9283344
26144 postgres postgres: db-prematch: app_        0   937440  1437802  9180720
15449 postgres postgres: db-prematch: app_        0   932776  1440671  9262056
22570 postgres postgres: db-prematch: app_        0   938680  1447851  9292176
 2910 postgres postgres: db-prematch: app_        0  1039452  1536980  9183668
17048 postgres postgres: db-prematch: app_        0  1054244  1564436  9425556

-bash-4.2$ smem -w
Area                           Used      Cache   Noncache
firmware/hardware                 0          0          0
kernel image                      0          0          0
kernel dynamic memory      18242380   16741684    1500696
userspace memory           21358228   10746068   10612160
free memory                 1433148    1433148          0

-bash-4.2$ smem -m
Map                                       PIDs   AVGPSS      PSS
/dev/shm/PostgreSQL.4009629762             316        0        0
/usr/lib64/gconv/gconv-modules.cache        18        0        0
/usr/lib64/libattr.so.1.1.0                318        0        0
...
/usr/pgsql-14/lib/libpq.so.5.14             15       27      415
/usr/lib64/libkrb5.so.3.3                  319        1      445
/usr/pgsql-14/lib/pg_qualstats.so          318        1      572
/usr/pgsql-14/lib/pg_stat_kcache.so        318        1      572
/usr/lib64/libpython2.7.so.1.0               1      945      945
/usr/pgsql-14/lib/pg_stat_statements.so    318        3     1212
/usr/lib64/libpthread-2.17.so              320        4     1304
/usr/lib64/ld-2.17.so                      322        4     1373
/usr/pgsql-14/lib/plpgsql.so               318        4     1376
/usr/lib64/libcrypto.so.1.0.2k             319        4     1499
/usr/lib64/libc-2.17.so                    322        9     3179
[stack]                                    322       26     8498
/usr/pgsql-14/bin/postgres                 318       57    18281
/usr/lib64/llvm5.0/lib/libLLVM-5.0.so       10     4098    40984
<anonymous>                                322     6557  2111438
[heap]                                     322    25093  8080218
/dev/zero                                  316    33834 10691660


Kind Regards Ales Zeleny
 

If the RssAnon memory is a good indicator, I can then identify the affected backends and dump their memory contexts.
It will take me some time since I'm out of the office on vacation, but I'll manage it one way or another.
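For the record, a minimal sketch of how I plan to dump them on 14.x (it assumes superuser access; PID 17048 used below is just an example):

# ask backend 17048 to dump its memory contexts into the server log
psql -c "SELECT pg_log_backend_memory_contexts(17048);"
# the per-context lines then appear in the server log entries of that backend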

Thanks to all for the hints!

Aleš 

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Tom Lane
Date:
Aleš Zelený <zeleny.ales@gmail.com> writes:
> after some time, I've found a process consuming over 1GB of memory"
> -bash-4.2$ grep RssAnon /proc/*/status | sort -nk2 | tail
> /proc/17048/status:RssAnon: 1053952 kB

> Here are memory contexts for PID 17048:

> TopMemoryContext: 422592 total in 14 blocks; 42536 free (169 chunks);
> 380056 used
...
> Grand total: 14312808 bytes in 4752 blocks; 3920880 free (1043 chunks);
> 10391928 used

OK, so PG's normal memory consumption is only ~14MB.  Where'd the
rest of it go?

> -bash-4.2$ pmap 17048
...
> 0000000002d93000 838092K rw---   [ anon ]
> 00007fd999777000 180232K rw---   [ anon ]
> 00007fd9a8d75000  32772K rw---   [ anon ]
...
Probably the 838M chunk is shared memory?  Is that within hailing
distance of your shared_buffers setting?
...
> 00007fd9b0551000 10827040K rw-s- zero (deleted)
...
And here we have the culprit, evidently ... but what the dickens
is it?  I can't think of any mechanism within Postgres that would
create such an allocation.

            regards, tom lane



Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Aleš Zelený
Date:

po 18. 7. 2022 v 16:25 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Aleš Zelený <zeleny.ales@gmail.com> writes:
> after some time, I've found a process consuming over 1GB of memory"
> -bash-4.2$ grep RssAnon /proc/*/status | sort -nk2 | tail
> /proc/17048/status:RssAnon: 1053952 kB

> Here are memory contexts for PID 17048:

> TopMemoryContext: 422592 total in 14 blocks; 42536 free (169 chunks);
> 380056 used
...
> Grand total: 14312808 bytes in 4752 blocks; 3920880 free (1043 chunks);
> 10391928 used

OK, so PG's normal memory consumption is only ~14MB.  Where'd the
rest of it go?

> -bash-4.2$ pmap 17048
...
> 0000000002d93000 838092K rw---   [ anon ]
> 00007fd999777000 180232K rw---   [ anon ]
> 00007fd9a8d75000  32772K rw---   [ anon ]

and a few hours later it even grew:

-bash-4.2$ cat 20220718_200230.pmap.17048.log | grep anon
0000000000db3000    200K rw---   [ anon ]
0000000002d15000    504K rw---   [ anon ]
0000000002d93000 934476K rw---   [ anon ]
00007fd989776000 311304K rw---   [ anon ]
00007fd9a8d75000  32772K rw---   [ anon ]
00007fd9acb65000     20K rw---   [ anon ]
00007fd9affc1000    372K rw---   [ anon ]

From previous observations I know that the process's RssAnon memory grows over time, sometimes in steps, but generally the growth is linear until the process finishes or we run out of memory and the cluster is reinitialized by the postmaster.
There was no such behavior on older versions, up to and including 14.3.
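For what it is worth, this is roughly the sampling loop I use (a sketch; the PID and interval are just examples):

# sample RssAnon of one backend once a minute to watch the growth over time
while :; do
    printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')"
    grep RssAnon /proc/17048/status
    sleep 60
done >> rssanon_17048.log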

...
Probably the 838M chunk is shared memory?  Is that within hailing
distance of your shared_buffers setting?

postgres=# show shared_buffers ;
 shared_buffers
----------------
 10GB
(1 row)

...
> 00007fd9b0551000 10827040K rw-s- zero (deleted)

This 10 GB matches the 10 GB configured as shared buffers.
 
...
And here we have the culprit, evidently ... but what the dickens
is it?  I can't think of any mechanism within Postgres that would
create such an allocation.

                        regards, tom lane
 
Kind regards
Ales Zeleny

Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Pavel Stehule
Date:


po 18. 7. 2022 v 20:26 odesílatel Aleš Zelený <zeleny.ales@gmail.com> napsal:

po 18. 7. 2022 v 16:25 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Aleš Zelený <zeleny.ales@gmail.com> writes:
> after some time, I've found a process consuming over 1GB of memory"
> -bash-4.2$ grep RssAnon /proc/*/status | sort -nk2 | tail
> /proc/17048/status:RssAnon: 1053952 kB

> Here are memory contexts for PID 17048:

> TopMemoryContext: 422592 total in 14 blocks; 42536 free (169 chunks);
> 380056 used
...
> Grand total: 14312808 bytes in 4752 blocks; 3920880 free (1043 chunks);
> 10391928 used

OK, so PG's normal memory consumption is only ~14MB.  Where'd the
rest of it go?

> -bash-4.2$ pmap 17048
...
> 0000000002d93000 838092K rw---   [ anon ]
> 00007fd999777000 180232K rw---   [ anon ]
> 00007fd9a8d75000  32772K rw---   [ anon ]

and a few hours later it even grew:

-bash-4.2$ cat 20220718_200230.pmap.17048.log | grep anon
0000000000db3000    200K rw---   [ anon ]
0000000002d15000    504K rw---   [ anon ]
0000000002d93000 934476K rw---   [ anon ]
00007fd989776000 311304K rw---   [ anon ]
00007fd9a8d75000  32772K rw---   [ anon ]
00007fd9acb65000     20K rw---   [ anon ]
00007fd9affc1000    372K rw---   [ anon ]

From previous observations I know that the process's RssAnon memory grows over time, sometimes in steps, but generally the growth is linear until the process finishes or we run out of memory and the cluster is reinitialized by the postmaster.
There was no such behavior on older versions, up to and including 14.3.

...
Probably the 838M chunk is shared memory?  Is that within hailing
distance of your shared_buffers setting?

postgres=# show shared_buffers ;
 shared_buffers
----------------
 10GB
(1 row)

...
> 00007fd9b0551000 10827040K rw-s- zero (deleted)

This 10 GB matches the 10 GB configured as shared buffers.
 
...
And here we have the culprit, evidently ... but what the dickens
is it?  I can't think of any mechanism within Postgres that would
create such an allocation.

I checked the code, and there is a lot of usage of the malloc function.

static void *
GenerationAlloc(MemoryContext context, Size size)
{
    GenerationContext *set = (GenerationContext *) context;
    GenerationBlock *block;
    GenerationChunk *chunk;
    Size        chunk_size = MAXALIGN(size);
    Size        required_size = chunk_size + Generation_CHUNKHDRSZ;

    /* is it an over-sized chunk? if yes, allocate special block */
    if (chunk_size > set->allocChunkLimit)
    {
        Size        blksize = required_size + Generation_BLOCKHDRSZ;

        block = (GenerationBlock *) malloc(blksize);
        if (block == NULL)
            return NULL;

Could some memory be lost in this allocation?

Regards

Pavel

 

                        regards, tom lane
 
Kind regards
Ales Zeleny

Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Tom Lane
Date:
Aleš Zelený <zeleny.ales@gmail.com> writes:
> postgres=# show shared_buffers ;
>  shared_buffers
> ----------------
>  10GB
> (1 row)

Oh!  In that case, there is absolutely nothing to see here.
This chunk:

> 00007fd9b0551000 10827040K rw-s- zero (deleted)

must be your shared-memory region, and it's accounting for just
about all of the process's memory space.  There is no reason
to suspect a leak.

I think you've been fooled by the fact that "top" and some other
tools frequently don't report a shared-memory page as part of a
process's usage until that process touches that particular page.
Thus, the reported memory usage of a Postgres backend will grow
over time as it randomly happens to touch different buffers within
the shared buffer arena.  That does not constitute a leak, but
it does make such tools next door to useless for detecting
actual leaks :-(.  You can only believe there's a leak if the
reported usage doesn't level off after reaching the vicinity of
your shared memory size.
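One quick way to see that split per backend is the Rss* breakdown the kernel already provides (a sketch; PID 17048 discussed above is just an example):

# RssShmem counts the shared_buffers pages this backend has touched so far,
# RssAnon is its truly private memory - the part relevant for leak hunting
grep -E 'Rss(Anon|File|Shmem)' /proc/17048/status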

So back to why you're getting these out-of-memory failures:
we still don't have much info about that.  I wonder whether
there is something watching the total usage reported by "top"
and taking action based on that entirely-misleading number.
The kernel itself should be aware that there's no extra memory
pressure from N backends all using the same shared memory
segment, but it can be hard for outside tools to know that.

At this point I suspect that PG 14.3 vs 14.4 is a red herring,
and what you should be looking into is what else you updated
at about the same time, particularly in the line of container
management tools or the like (if you use any).

            regards, tom lane



Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Aleš Zelený
Date:
Hello,

po 18. 7. 2022 v 21:04 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Aleš Zelený <zeleny.ales@gmail.com> writes:
> postgres=# show shared_buffers ;
>  shared_buffers
> ----------------
>  10GB
> (1 row)

Oh!  In that case, there is absolutely nothing to see here.
This chunk:

> 00007fd9b0551000 10827040K rw-s- zero (deleted)

must be your shared-memory region, and it's accounting for just
about all of the process's memory space.  There is no reason
to suspect a leak.

I agree that this 10 GB region is expected to be the shared buffers (judging by the size), and therefore there is no memory leak here.

 
I think you've been fooled by the fact that "top" and some other
tools frequently don't report a shared-memory page as part of a
process's usage until that process touches that particular page.
Thus, the reported memory usage of a Postgres backend will grow
over time as it randomly happens to touch different buffers within
the shared buffer arena.  That does not constitute a leak, but
it does make such tools next door to useless for detecting
actual leaks :-(.  You can only believe there's a leak if the
reported usage doesn't level off after reaching the vicinity of
your shared memory size.

On the other hand, if we have 10 GB of shared buffers plus ~14 MB reported by TopMemoryContext (even allowing that /proc/<pid>/status only accounts for shared memory pages once they are touched), why is the process VM peak ~12 GB?
-bash-4.2$ cat 20220718_200230.status.17048.log
Name: postmaster
Umask: 0077
State: S (sleeping)
Tgid: 17048
Ngid: 0
Pid: 17048
PPid: 4476
TracerPid: 0
Uid: 26 26 26 26
Gid: 26 26 26 26
FDSize: 1024
Groups: 26
VmPeak: 12477752 kB
VmSize: 12439740 kB
VmLck:       0 kB
VmPin:       0 kB
VmHWM: 9811560 kB
VmRSS: 9811560 kB
RssAnon: 1283964 kB
RssFile:   15784 kB
RssShmem: 8511812 kB
VmData: 1282860 kB
VmStk:     132 kB
VmExe:    7816 kB
VmLib:   85312 kB
VmPTE:   23716 kB
VmSwap:       0 kB
Threads: 1
SigQ: 0/160197
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000400000
SigIgn: 0000000001701800
SigCgt: 00000001a0006287
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: ffffff
Cpus_allowed_list: 0-23
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000003
Mems_allowed_list: 0-1
voluntary_ctxt_switches: 5273344
nonvoluntary_ctxt_switches: 13651

I'd expect VmPeak to be close to the 10 GB of shared_buffers (assuming all shared memory pages were touched) plus ~14 MB of private memory, but:
VmPeak 12477752 kB - 10485760 kB shared_buffers = 1991992 kB, which is somewhat more than the RssAnon value of 1283964 kB reported for the process, but fairly close.

pmap taken in the same time:

-bash-4.2$ cat 20220718_200230.pmap.17048.log | grep -e anon -e deleted
0000000000db3000    200K rw---   [ anon ]
0000000002d15000    504K rw---   [ anon ]
0000000002d93000 934476K rw---   [ anon ]  <<<= this is suspected to be responsible for the OutOfMemory -> failing fork(), cluster reinitialized
00007fd989776000 311304K rw---   [ anon ]  <<<= together with this one
00007fd9a8d75000  32772K rw---   [ anon ]
00007fd9acb65000     20K rw---   [ anon ]
00007fd9affc1000    372K rw---   [ anon ]
00007fd9b023d000   1028K rw---   [ anon ]
00007fd9b054b000     24K rw---   [ anon ]
00007fd9b0551000 10827040K rw-s- zero (deleted)   <<<= expected to be shared_buffers
00007fdc45f18000     16K rw---   [ anon ]
00007fdc4d50c000    184K rw---   [ anon ]
00007fdc4de16000      4K rw---   [ anon ]
00007fdc4e03c000      8K rw---   [ anon ]
00007fdc4fd1e000     84K rw---   [ anon ]
00007fdc4ff6f000      8K rw---   [ anon ]
00007fdc508d7000      8K rw---   [ anon ]
00007fdc50d59000      4K rw---   [ anon ]
00007fdc5139e000      8K rw---   [ anon ]
00007fdc520f3000     40K rw---   [ anon ]
00007fdc52902000     20K rw---   [ anon ]
00007fdc52ead000     16K rw---   [ anon ]
00007fdc532af000      4K rw---   [ anon ]
00007fdc53e2b000     16K rw---   [ anon ]
00007fdc544db000     16K rw---   [ anon ]
00007fdc54cc8000      8K rw---   [ anon ]
00007fdc54f1a000   1588K rw---   [ anon ]
00007fdc550dc000    104K rw---   [ anon ]
00007fdc550fc000      8K r-x--   [ anon ]
00007fdc550ff000      4K rw---   [ anon ]
00007fdc55102000      4K rw---   [ anon ]
00007fffd38f0000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]

 
The machine has 40 GB of RAM. It is a dedicated DB server; only a single PG instance runs there. We have run out of memory again since my last reply, but right now it looks healthy:

-bash-4.2$ psql -c "select count(*) from pg_stat_activity"
 count
-------
   336
(1 row)

-bash-4.2$ sudo smem -u
User     Count     Swap      USS      PSS      RSS
chrony       1        0      512      515      872
dbus         1        0     1064     1074     1432
postfix      2        0     2752     3122     5540
polkitd      1        0     9152     9276     9816
postgres-exporter     1        0    11912    11912    11916
zelenya      9        0    34320    35550    43648
sql_exporter     1        0    46332    46332    46340
root        28        0   267920   274273   298388
postgres   336        0  5904884 16399946 143527180

-bash-4.2$ free -h
              total        used        free      shared  buff/cache   available
Mem:            39G        7,3G        1,2G         10G         30G         21G
Swap:            0B          0B          0B


The processes suspected of being responsible for the memory exhaustion (judging by the RssAnon values from the /proc/<pid>/status files) always belong to a single user:

-bash-4.2$ psql -c "select count(*) from pg_stat_activity where usename='app_feed'"
 count
-------
    67
(1 row)

So the 21 GB reported as available (under normal conditions used by the kernel as file system page cache) is not enough once 20 of these processes each allocate 1 GB of private memory: the OS CommitLimit is reached and the out-of-memory errors appear.
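For the record, this is roughly how I pick out those backends (a sketch that reuses the grep shown earlier and assumes the role name app_feed):

# rank the app_feed backends by private (RssAnon) memory
psql -At -c "SELECT pid FROM pg_stat_activity WHERE usename = 'app_feed'" \
  | while read -r pid; do grep -H RssAnon "/proc/$pid/status"; done 2>/dev/null \
  | sort -nk2 | tail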


So back to why you're getting these out-of-memory failures:
we still don't have much info about that.  I wonder whether
there is something watching the total usage reported by "top"
and taking action based on that entirely-misleading number.
The kernel itself should be aware that there's no extra memory
pressure from N backends all using the same shared memory
segment, but it can be hard for outside tools to know that.

There is no tool taking action when high memory usage is detected. The OS kernel grants all the memory requested by processes until a fork() fails (usually when the autovacuum launcher attempts to spawn a new autovacuum worker, since these are the most frequently started new processes). The failure comes from fork() itself, due to a lack of memory for new process creation.
 
2022-07-01 06:51:02 CEST [4476]: [26-1] user=,db=,host=,app= LOG:  could not fork autovacuum worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [4476]: [27-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [4476]: [28-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
2022-07-01 06:51:03 CEST [18882]: [1332-1] user=app_feed,db=prematch,host=172.25.0.231,app=[unknown] ERROR:  could not resize shared memory segment "/PostgreSQL.1824635724" to 204064 bytes: No space left on device

So the OS kernel was short of available memory.

Memory Commit limit on the system:
-bash-4.2$ sysctl -q vm.overcommit_memory
vm.overcommit_memory = 2
-bash-4.2$ sysctl -q vm.overcommit_ratio
vm.overcommit_ratio = 95

swap is disabled:
-bash-4.2$ swapon -s
-bash-4.2$

CommitLimit = RAM * overcommit_ratio / 100 + swap ~ 37 GB (ignoring total_huge_TLB, as huge pages are not used), as can be seen in meminfo:

-bash-4.2$ cat /proc/meminfo
MemTotal:       41033756 kB
MemFree:         1549608 kB
MemAvailable:   24782516 kB
Buffers:            8468 kB
Cached:         33667252 kB
SwapCached:            0 kB
Active:         24600744 kB
Inactive:       12524428 kB
Active(anon):   14131536 kB
Inactive(anon):   130116 kB
Active(file):   10469208 kB
Inactive(file): 12394312 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              1876 kB
Writeback:            16 kB
AnonPages:       3449376 kB
Mapped:         10796800 kB
Shmem:          10811896 kB
Slab:             987832 kB
SReclaimable:     813916 kB
SUnreclaim:       173916 kB
KernelStack:       13648 kB
PageTables:       823236 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    38982068 kB
Committed_AS:   19261792 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      256724 kB
VmallocChunk:   34338043900 kB
Percpu:             5632 kB
HardwareCorrupted:     0 kB
AnonHugePages:     65536 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      110464 kB
DirectMap2M:     6180864 kB
DirectMap1G:    37748736 kB
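A quick cross-check of that formula against the numbers above (a sketch; huge pages are ignored since none are configured):

# CommitLimit ~= MemTotal * vm.overcommit_ratio / 100 + SwapTotal
#             ~= 41033756 kB * 95 / 100 + 0 kB = 38982068 kB, matching the line above
awk '/^(MemTotal|SwapTotal|CommitLimit|Committed_AS):/ {print}' /proc/meminfo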

From the node-exporter metrics I can see that the component that grows over time, until the commit limit is reached and the cluster is reinitialized, is "Active(anon)". Diving into /proc/<pid>/status, I can see the RssAnon memory of multiple processes belonging to the app_feed DB user growing over time, and the sum of those increments matches the "Active(anon)" growth seen in the global view in /proc/meminfo.
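The comparison is easy to repeat (a sketch; the sum will not match Active(anon) exactly, since other processes and anonymous huge pages contribute as well, but the growth trend is what matters):

# sum RssAnon over all postgres processes and compare with the system-wide value
for pid in $(pgrep -u postgres); do
    awk '/^RssAnon:/ {print $2}' "/proc/$pid/status" 2>/dev/null
done | awk '{s += $1} END {printf "sum RssAnon: %d kB\n", s}'
grep 'Active(anon)' /proc/meminfo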

This observed growth, together with the anonymous memory segments (on top of the shared buffers mentioned above), is the reason why I suspect an issue in PG 14.4 after the upgrade.


At this point I suspect that PG 14.3 vs 14.4 is a red herring,
and what you should be looking into is what else you updated
at about the same time, particularly in the line of container
management tools or the like (if you use any).

As mentioned, the machine is a dedicated DB server running a single PG instance. No containers are in place and there is no cgroup setup (neither memory nor CPU limits); it is a clean CentOS 7 install + epel-release + PG packages (node_exporter and postgres_exporter for monitoring).

The only change on that machine was the upgrade from PG 14.3 to PG 14.4 (using PGDG RPM repo packages), and from that moment the memory usage and out-of-memory issues started, the first time less than 24 hours after the upgrade.

I'm far from sure it is a Postgres memory leak; my primary intention is to provide as much data as I can to help identify the root cause of this issue. When I'm back from my vacation (2 weeks), I'd like to downgrade to 14.3 (and we will have to refrain from using index concurrently operations) to see whether such memory allocations disappear or not.

Any hints or tips on how to analyze this better (what OS data we should collect over time), ideally avoiding the downgrade exercise, are welcome.

Kind regards Ales Zeleny

Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Aleš Zelený
Date:
Hello,

I did some testing, and the result is that on PG 12 there are no such problems. PG 13 and PG 14.3 tests will follow, but based on monitoring the processes' RssAnon memory I've found that only processes belonging to a single user suffer from the growing memory allocation (it looks like a memory leak to me).
For the given DB user, I've set log_min_duration_statement to 0 and afterwards analyzed the Postgres server log for all the statements in the time window when I spotted significant memory growth for a single process within a short time (a few minutes).
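For completeness, this can be scoped to just the suspect role instead of the whole cluster (a sketch; it assumes a superuser session and the role name app_feed used above, and takes effect for new sessions of that role):

# log every statement of the suspect role only
psql -c "ALTER ROLE app_feed SET log_min_duration_statement = 0;"
# ... collect logs while the memory grows, then revert:
psql -c "ALTER ROLE app_feed RESET log_min_duration_statement;"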
This correlation pointed my attention to a SQL function that is called frequently, and I've done some testing with it.



CREATE OR REPLACE FUNCTION tescase (_id_data_provider integer, _external_id bigint DEFAULT NULL::bigint, _external_complete_id character varying DEFAULT NULL::character varying)
 RETURNS TABLE(id_gm bigint, id_opp bigint, id_opp_state integer)
 LANGUAGE sql
AS $function$
SELECT ... simple join of two tables...
     WHERE opd.id_data_provider = _id_data_provider
       AND CASE WHEN _external_id IS NULL
                THEN external_id IS NULL
                ELSE external_id = _external_id
           END
       AND CASE WHEN _external_complete_id IS NULL
                THEN _external_complete_id IS NULL
                ELSE external_complete_id = _external_complete_id
           END;
$function$
;

It is a somewhat creative construct to me, but it works. The key here is that if I replace at least one of the "CASEd" WHERE conditions, it no longer seems to suffer from the memory leak issue.

Finally, I've found that even with the function as is, if I disable JIT before the test (SET jit = off;) and call the function 100k times, the RssAnon memory of the given process stays stable at only 3612 kB, whereas with JIT enabled (the default setting on the server suffering from the memory leak) the RssAnon memory of the given process grows linearly over time (I canceled the test when it reached 5 GB).
The tests ran on real application data that I cannot share, but if I get a chance to prepare a synthetic test case, I think it will be time to submit this as a bug.
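Roughly, the test had the following shape (a sketch only: the argument values are placeholders, the real calls use application data I cannot share, and whether JIT actually fires depends on the plan cost):

# run 100k calls in a single backend, once with JIT enabled and once with jit = off,
# while watching RssAnon of that backend's PID in /proc/<pid>/status
{ echo "SET jit = off;"            # drop this line for the JIT-enabled run
  for i in $(seq 1 100000); do
      echo "SELECT * FROM tescase(1, NULL, NULL);"
  done; } | psql -q -o /dev/null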

In the smaps for the given process it is possible to identify the address of a heap-allocated memory region which grows with every execution (while the sizes of the other mapped segments stay static). I have no clue whether this helps, but it looks like this:

while :; do date; sed -n '/^02d93000.*/,/^VmFlags/p' /proc/31600/smaps; sleep 2; done

Út srp  2 17:10:18 CEST 2022
02d93000-20fe1000 rw-p 00000000 00:00 0                                  [heap]
Size:             493880 kB
Rss:              492784 kB
Pss:              492784 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    492784 kB
Referenced:       492560 kB
Anonymous:        492784 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mp me ac sd
Út srp  2 17:10:20 CEST 2022
02d93000-21003000 rw-p 00000000 00:00 0                                  [heap]
Size:             494016 kB
Rss:              493660 kB
Pss:              493660 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    493660 kB
Referenced:       493436 kB
Anonymous:        493660 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mp me ac sd

Thanks for any hints or comments.
Ales

Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Tom Lane
Date:
Aleš Zelený <zeleny.ales@gmail.com> writes:
> CREATE OR REPLACE FUNCTION tescase (_id_data_provider integer, _external_id
> bigint DEFAULT NULL::bigint, _external_complete_id character varying
> DEFAULT NULL::character varying)
>  RETURNS TABLE(id_gm bigint, id_opp bigint, id_opp_state integer)
>  LANGUAGE sql
> AS $function$
> SELECT ... simple join of two tables...
>      WHERE opd.id_data_provider = _id_data_provider
>        AND CASE WHEN _external_id IS NULL
>                 THEN external_id IS NULL
>                 ELSE external_id = _external_id
>            END
>        AND CASE WHEN _external_complete_id IS NULL
>                 THEN _external_complete_id IS NULL
>                 ELSE external_complete_id = _external_complete_id
>            END;
> $function$
> ;

> It is a kind of creative construct for me, but it works. The key here is
> that if I replace at least one of the "CASEd" where conditions, it seems
> not to suffer from the memory leak issue.

> Finally, I've found, that even having the function as is and before the
> test disabling JIT (SET jit = off;) and calling the function 100k times,
> RssAnon memory for the given process is stable and only 3612 kB, while when
> JIT is enabled (the default setting on the server suffering from the memory
> leak, RssAnon memory for the given process growth in a linear manner over
> time (canceled when it reached 5GB).

Ah.  I bet this is another instance of the known memory leakage problems
with JIT inlining [1].  Per Andres' comments in that thread, it seems
hard to solve properly.  For now all I can recommend is to disable that.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/20201001021609.GC8476%40telsasoft.com
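A minimal way to apply that recommendation (a sketch; both settings are standard GUCs, and since whether disabling only the inlining step is enough depends on the actual leak, the first option is the conservative one):

psql -c "ALTER SYSTEM SET jit = off;"                    # disable JIT entirely
# or keep JIT compilation but skip inlining:
psql -c "ALTER SYSTEM SET jit_inline_above_cost = -1;"
psql -c "SELECT pg_reload_conf();"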



Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Aleš Zelený
Date:
Hello,

thanks for the information and the link!

Ales

čt 4. 8. 2022 v 1:05 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Aleš Zelený <zeleny.ales@gmail.com> writes:
> CREATE OR REPLACE FUNCTION tescase (_id_data_provider integer, _external_id
> bigint DEFAULT NULL::bigint, _external_complete_id character varying
> DEFAULT NULL::character varying)
>  RETURNS TABLE(id_gm bigint, id_opp bigint, id_opp_state integer)
>  LANGUAGE sql
> AS $function$
> SELECT ... simple join of two tables...
>      WHERE opd.id_data_provider = _id_data_provider
>        AND CASE WHEN _external_id IS NULL
>                 THEN external_id IS NULL
>                 ELSE external_id = _external_id
>            END
>        AND CASE WHEN _external_complete_id IS NULL
>                 THEN _external_complete_id IS NULL
>                 ELSE external_complete_id = _external_complete_id
>            END;
> $function$
> ;

> It is a kind of creative construct for me, but it works. The key here is
> that if I replace at least one of the "CASEd" where conditions, it seems
> not to suffer from the memory leak issue.

> Finally, I've found, that even having the function as is and before the
> test disabling JIT (SET jit = off;) and calling the function 100k times,
> RssAnon memory for the given process is stable and only 3612 kB, while when
> JIT is enabled (the default setting on the server suffering from the memory
> leak, RssAnon memory for the given process growth in a linear manner over
> time (canceled when it reached 5GB).

Ah.  I bet this is another instance of the known memory leakage problems
with JIT inlining [1].  Per Andres' comments in that thread, it seems
hard to solve properly.  For now all I can recommend is to disable that.

                        regards, tom lane

[1] https://www.postgresql.org/message-id/flat/20201001021609.GC8476%40telsasoft.com

Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Joe Conway
Date:
On 8/4/22 08:34, Aleš Zelený wrote:
>> SELECT ... simple join of two tables...
>>      WHERE opd.id_data_provider = _id_data_provider
>>        AND CASE WHEN _external_id IS NULL
>>                 THEN external_id IS NULL
>>                 ELSE external_id = _external_id
>>            END
>>        AND CASE WHEN _external_complete_id IS NULL
>>                 THEN _external_complete_id IS NULL

Unrelated to your question, but shouldn't that actually read:

        AND CASE WHEN _external_complete_id IS NULL
                 THEN external_complete_id IS NULL
                     ^^^

>>                 ELSE external_complete_id = _external_complete_id
>>            END;
>> $function$

-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Aleš Zelený
Date:
Hello,

the construct surprised me when I first saw it in the function, but it is correct and works as expected - it allows writing the function in SQL instead of PL/pgSQL while ensuring that a NULL parameter value makes the condition evaluate to true instead of filtering the result set.
What is the issue, and is there a better solution you would propose?

Kind regards Ales Zeleny


čt 4. 8. 2022 v 23:15 odesílatel Joe Conway <mail@joeconway.com> napsal:
On 8/4/22 08:34, Aleš Zelený wrote:
>> SELECT ... simple join of two tables...
>>      WHERE opd.id_data_provider = _id_data_provider
>>        AND CASE WHEN _external_id IS NULL
>>                 THEN external_id IS NULL
>>                 ELSE external_id = _external_id
>>            END
>>        AND CASE WHEN _external_complete_id IS NULL
>>                 THEN _external_complete_id IS NULL

Unrelated to your question, but shouldn't that actually read:

        AND CASE WHEN _external_complete_id IS NULL
                 THEN external_complete_id IS NULL
                     ^^^

>>                 ELSE external_complete_id = _external_complete_id
>>            END;
>> $function$

--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: PostgreSQL 14.4 ERROR: out of memory issues

From
"Peter J. Holzer"
Date:
On 2022-08-05 17:47:02 +0200, Aleš Zelený wrote:
> the construct surprised me when I saw it in the function the first time, but it
> is correct and works as expected - it allows writing the function as SQL
> instead of PLPGSQL while it ensures that for a parameter null value it
> evaluates to true instead of filtering the resultset.
> What is the issue

The problem Joe spotted is that a variable name is misspelled in one
place. It should (probably) be external_complete_id, not
_external_complete_id).

> better solution proposal?

I think you can make that clearer by using IS [NOT] DISTINCT FROM:

    SELECT ... simple join of two tables...
    WHERE opd.id_data_provider = _id_data_provider
        AND external_id IS NOT DISTINCT FROM _external_id
        AND external_complete_id IS NOT DISTINCT FROM _external_complete_id
    ;

However, your version may be faster, as last time I looked (it's been
some time) the optimizer wasn't especially good at handling DISTINCT
FROM (probably because it's so rarely used).

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


Re: PostgreSQL 14.4 ERROR: out of memory issues

From
Aleš Zelený
Date:
Hello,

The problem Joe spotted is that a variable name is misspelled in one
place. It should (probably) be external_complete_id, not
_external_complete_id). 

Oh, copy-paste issue :-)
 
> better solution proposal?

I think you can make that clearer by using IS [NOT] DISTINCT FROM:

    SELECT ... simple join of two tables...
    WHERE opd.id_data_provider = _id_data_provider
        AND external_id IS NOT DISTINCT FROM _external_id
        AND external_complete_id IS NOT DISTINCT FROM _external_complete_id
    ;

However, your version may be faster, as last time I looked (it's been
some time) the optimizer wasn't especially good at handling DISTINCT
FROM (probably because it's so rarely used).

IS NOT DISTINCT FROM was the original version, but it ignores indexes (130 ms vs 0.3 ms), so you are absolutely correct.

Kind regards
Ales Zeleny