Re: PostgreSQL 14.4 ERROR: out of memory issues - Mailing list pgsql-general

From Aleš Zelený
Subject Re: PostgreSQL 14.4 ERROR: out of memory issues
Date
Msg-id CAODqTUbChP0c=Jnx7+GRqkSK3OeBW+C47iyzEf8zAvDYJ8_RQQ@mail.gmail.com
Whole thread Raw
In response to Re: PostgreSQL 14.4 ERROR: out of memory issues  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-general
Hello,

po 11. 7. 2022 v 14:47 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

It's looks like memory leak

As we are collecting some data for Postgres processes, I've found processes using Rss Anonymous memory allocations:

Some examples:
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31522 RssAnon:  742600 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31522 RssAnon:  742600 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31522 RssAnon:  742852 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31522 RssAnon:  743112 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31522 RssAnon:  743112 kB
2022-07-08T14:50:24+0200 Name: postmaster Pid: 31647 RssAnon:  684632 kB
2022-07-08T14:51:13+0200 Name: postmaster Pid: 31647 RssAnon:  684632 kB
2022-07-08T14:52:05+0200 Name: postmaster Pid: 31647 RssAnon:  684676 kB
2022-07-08T14:52:58+0200 Name: postmaster Pid: 31647 RssAnon:  684700 kB
2022-07-08T14:53:50+0200 Name: postmaster Pid: 31647 RssAnon:  684824 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 7866 RssAnon: 1180960 kB

other processes uses less memory:
2022-07-08T14:52:08+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:53:02+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:53:54+0200 Name: postmaster Pid: 3869 RssAnon:    3256 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 4217 RssAnon:    2532 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 4217 RssAnon:    2532 kB



And Shared memory:
2022-07-08T14:59:12+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:20+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:28+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:37+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:45+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:59:53+0200 Name: postmaster Pid: 5719 RssShmem:  908264 kB
2022-07-08T14:50:27+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:51:16+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:52:08+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:01+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB
2022-07-08T14:53:53+0200 Name: postmaster Pid: 5721 RssShmem: 1531656 kB

while other processes uses failry less of shared mem:

2022-07-08T14:55:25+0200 Name: postmaster Pid: 8521 RssShmem:    1988 kB
2022-07-08T14:55:33+0200 Name: postmaster Pid: 8521 RssShmem:    1988 kB
2022-07-08T14:55:40+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:55:49+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:55:57+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:56:06+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB
2022-07-08T14:56:15+0200 Name: postmaster Pid: 8521 RssShmem:    2104 kB

Wich processes are better candidates for investigation, the ones using more Anonymous memory or the ones using larger shared memory?
I can do the (tedious) work to get a list of statements that a given PID executed (at least the statements that appear in the minute snapshots).


  ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"

2022-07-02 14:48:47 CEST [4476]: [42-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
2022-07-02 14:48:47 CEST [4476]: [43-1] user=,db=,host=,app= LOG:  could not fork worker process: Cannot allocate memory
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
2022-07-02 14:48:47 CEST [4476]: [44-1] user=,db=,host=,app= LOG:  background worker "parallel worker" (PID 4303) was terminated by signal 6: Aborted


But what you see is probably just side effect. Do you use some extension in C++? This is C++ message. This process kills Postgres.

These extensions are installed (and no change from 14.3):

                                            List of installed extensions
        Name        | Version |   Schema   |                              Description                              
--------------------+---------+------------+------------------------------------------------------------------------
 amcheck            | 1.3     | public     | functions for verifying relation integrity
 dblink             | 1.2     | public     | connect to other PostgreSQL databases from within a database
 file_fdw           | 1.0     | public     | foreign-data wrapper for flat file access
 hstore             | 1.8     | public     | data type for storing sets of (key, value) pairs
 hypopg             | 1.3.1   | public     | Hypothetical indexes for PostgreSQL
 pageinspect        | 1.9     | public     | inspect the contents of database pages at a low level
 pg_buffercache     | 1.3     | public     | examine the shared buffer cache
 pg_stat_kcache     | 2.2.0   | public     | Kernel statistics gathering
 pg_stat_statements | 1.9     | public     | track planning and execution statistics of all SQL statements executed
 pgcrypto           | 1.3     | public     | cryptographic functions
 pgstattuple        | 1.5     | public     | show tuple-level statistics
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 plpgsql_check      | 2.1     | public     | extended check for plpgsql functions
 postgres_fdw       | 1.1     | public     | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | public     | functions that manipulate whole tables, including crosstab
(15 rows)


 

It can be related to some operation over prematch.replication.tab_queue_tmp because in both cases an atovacuum was started.

On 14.3 (and also older major versions), there were no such issues. This table is heavily updated, so the autovacuum is frequent here.
 


132    hba parser context: 25600 total in 6 blocks; 4464 free (8 chunks); 21136 used
133  ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
134Grand total: 1456224 bytes in 195 blocks; 378824 free (165 chunks); 1077400 used
1352022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app= ERROR:  out of memory
1362022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app= DETAIL:  Failed on request of size 152094068 in memory context "TopTransactionContext".
1372022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:  automatic vacuum of table "prematch.replication.tab_queue_tmp"

I am not sure, but it looks like corrupted table  prematch.replication.tab_queue_tmp

$ /usr/pgsql-14/bin/pg_amcheck -v -d prematch -t replication.tab_queue_tmp --heapallindexed
pg_amcheck: including database "prematch"
pg_amcheck: in database "prematch": using amcheck version "1.3" in schema "public"
pg_amcheck: checking heap table "prematch.replication.tab_queue_tmp"
pg_amcheck: checking btree index "prematch.replication.tab_queue_tmp_pkey"
$ echo $?
0
 
Something else I can check?


Thanks Ales



And a few days later again:
2022-07-08 14:54:22 CEST [4791]: [81186-1] user=app_evaluation,db=prematch,host=172.25.0.80,app=Evaluation_STS_UAT DETAIL: parameters: $1 = '1', $2 = '1748010445', $3 = '0', $4 = '1000' terminate called after throwing an instance of 'std::bad_alloc' terminate called after throwing an instance of 'std::bad_alloc' what(): what(): std::bad_allocstd::bad_alloc 2022-07-08 14:54:23 CEST [4476]: [49-1] user=,db=,host=,app= LOG: background worker "parallel worker" (PID 25251) was terminated by signal 6: Aborted
2022-07-08 14:54:23 CEST [4476]: [51-1] user=,db=,host=,app= LOG:  terminating any other active server processes


Thanks Ales Zeleny

pgsql-general by date:

Previous
From: Rob Sargent
Date:
Subject: Re: Get the table creation DDL
Next
From: Jean Baro
Date:
Subject: Is there a official benchmark comparing PG versions?