Thread: huge memory of Postgresql backend process
Hi,
We just switched from Oracle to PG v13. All of these tables are partitioned (256 HASH partitions). When we run UPDATE/DELETE … WHERE, the backend memory grows very quickly to more than 130 MB, even though there is not much data in these tables. When we switched to PG v14.5 and tested again, we still saw a similar memory consumption issue. For example, for one PostgreSQL JDBC client:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ nMaj nMin WCHAN COMMAND
551701 postgres 20 0 45.6g 131468 11664 S 0.0 0.1 0:03.11 0 35k do_epoll_w+ postgres:
When we rebuild the tables as non-partitioned tables with the same columns and the same data size, the "RES" memory is only about 25 MB. For both the partitioned and the non-partitioned tables the UPDATE/DELETE statements run fast; the only difference is the large gap in memory consumption. And even after the session has been idle for a long time, the backend process does NOT release the memory back to the OS. Is this expected?
Server memory is 128 GB, shared_buffers = 45 GB, effective_cache_size = 80 GB. Both PG server versions, 13.4 and 14.5, show the same issue. The PostgreSQL JDBC driver is 42.3.4.
Thanks,
James
"James Pang (chaolpan)" <chaolpan@cisco.com> writes: > We just switch from Oracle to PGV13, all of these tables got partition (256 HASH partition) , when UPDATE/DELETE ...WHERE . The memory increase verify quickly until >130M. not too much data in these tables. When we switch to PGV14.5, test again, still see similar memory consumption issue. I see no particular reason to think this is a bug. If you have a lot of partitions, there is going to be a lot of metadata for them. Moreover, HASH partitioning is effectively incapable of being pruned, so that every query is going to touch every partition. (IMO, hash partitioning is basically never a good idea.) regards, tom lane
When we rebuilt the partitioned tables as non-partitioned tables with the same data size, the same test load used only 24-25 MB per backend process. auto_explain shows that both the DELETE and the UPDATE use partition pruning, so the SQL runs very fast, yet we still see the process memory grow quickly. With hundreds of connections doing similar UPDATE/DELETE work against the partitioned tables, the server's physical memory was used up and a lot of "out of memory" errors were dumped to the PG log; however, the dumped memory context stats showed only tens of MB of memory in use, instead of 160 MB.

It looks like the PostgreSQL backend allocates one large chunk of memory up front, based on some estimation rule, when it runs UPDATE/DELETE against partitioned tables, and does not actually use that much of it. We also found that the memory is never given back to the operating system, even after the backend process has been idle for a long time. In the output of the OS pmap command, almost all of the memory shows up as " xxxxx rw--- [ anon ]". Could this be memory over-allocated from the OS compared with the real usage?

Thanks,

James
On Fri, 9 Sept 2022 at 02:08, James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> It looks like the PostgreSQL backend allocates one large chunk of memory up front, based on some estimation rule,
> when it runs UPDATE/DELETE against partitioned tables, and does not actually use that much of it. We also found that
> the memory is never given back to the operating system, even after the backend process has been idle for a long time.
> In the output of the OS pmap command, almost all of the memory shows up as " xxxxx rw--- [ anon ]". Could this be
> memory over-allocated from the OS compared with the real usage?

I see you've been asking this question in various places around the mailing lists. It seems what you're complaining about is mentioned in our documents under [1]. Namely:

"Another reason to be concerned about having a large number of partitions is that the server's memory consumption may grow significantly over time, especially if many sessions touch large numbers of partitions. That's because each partition requires its metadata to be loaded into the local memory of each session that touches it."

What that section of the documents does not clearly mention is that this "metadata" is kept for the life of the backend process. This is probably implied by the "significantly over time", but maybe we could have been more clear that we don't free the relcache entries.

James, if you're having OOM problems due to this then the solution is to reduce the number of partitions in the partitioned table, or change things so that you don't query as many partitions from as many backends. The number of relcache entries stored for partitions will be number_of_partitions_you_query_in_the_backend * the_number_of_backends. So you'll either want to review the number of partitions or the number of backends (aka max_connections).

David

[1] https://www.postgresql.org/docs/13/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES
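As a back-of-envelope sketch of that arithmetic (the partitioned table name below is a placeholder; pg_inherits is the catalog that records partition membership):

-- how many partitions one partitioned table has
SELECT count(*) AS partitions
FROM pg_inherits
WHERE inhparent = 'my_partitioned_table'::regclass;

-- the number of relcache entries held across the cluster is then roughly
--   partitions_touched_per_backend * number_of_backends,
-- e.g. 256 partitions * 500 backends = 128,000 entries, each kept for the
-- life of its backend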
On Fri, 9 Sept 2022 at 01:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Moreover, HASH partitioning is effectively incapable of being
> pruned, so that every query is going to touch every partition.
> (IMO, hash partitioning is basically never a good idea.)

I think that might have only briefly been true during the v11 cycle, but by the time the release came we had hash partitioning and partition pruning.

David
On Fri, Sep 9, 2022 at 6:42 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, 9 Sept 2022 at 01:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Moreover, HASH partitioning is effectively incapable of being
> > pruned, so that every query is going to touch every partition.
> > (IMO, hash partitioning is basically never a good idea.)
>
> I think that might have only briefly been true during the v11 cycle,
> but by the time the release came we had hash partitioning and partition pruning.

That is correct.

create table hp (a int, b text) partition by hash (a, b);
create table hp0 partition of hp for values with (modulus 2, remainder 0);
create table hp1 partition of hp for values with (modulus 2, remainder 1);
insert into hp values (1, 'abc');

select tableoid::regclass, * from hp;
 tableoid | a |  b
----------+---+-----
 hp0      | 1 | abc
(1 row)

explain select * from hp where a = 1 and b = 'abc';
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on hp0 hp  (cost=0.00..29.05 rows=1 width=36)
   Filter: ((a = 1) AND (b = 'abc'::text))
(2 rows)

Maybe James is thinking that the reason for high memory usage is the same when using PG v13 as it is when using v14. v13 can't handle UPDATE/DELETE on partitioned tables as well as v14 does, though only for queries where partition pruning isn't being used.

It's true though that the backend-lifetime caching of partition metadata can add up over many backends and many partitions accessed in each, so your advice of lowering the number of backends or the number of partitions will help.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Thanks for your suggestion. Yes, we reduced the partition count from 256 to 128, and then to 64; the memory went down from 170 MB to 91 MB and then to 60 MB.

The load comes from a PostgreSQL JDBC client with preparedStatementCacheQueries at the default of 256. Can reducing that JDBC parameter help shrink the "cached plan context"?

Thanks,

James
Hi,

On Fri, Sep 9, 2022 at 5:19 PM James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> Thanks for your suggestion. Yes, we reduced the partition count from 256 to 128, and then to 64; the memory went down
> from 170 MB to 91 MB and then to 60 MB.
> The load comes from a PostgreSQL JDBC client with preparedStatementCacheQueries at the default of 256. Can reducing
> that JDBC parameter help shrink the "cached plan context"?

Interesting that you mention "cached plan context". Cached plans for queries over partitioned tables tend to be big (in terms of the memory they take up in the plan cache) because the planner can't use partition pruning. One workaround might be to forcefully prevent plan caching by setting plan_cache_mode to 'force_custom_plan', which makes a backend rebuild the plan on every execution of a prepared statement.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
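A minimal sketch of that workaround (plan_cache_mode is a standard GUC; the role name below is a made-up example):

-- make this session re-plan prepared statements on every execution, so no
-- large generic plan covering all partitions is kept in the plan cache
SET plan_cache_mode = force_custom_plan;

-- or persist it for the application's role (hypothetical role name)
ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;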
Hi,

We plan to convert one large partitioned table into multiple non-partitioned tables, for example one 256-way HASH-partitioned table into 256 small tables. The application will then query the small tables directly to get a quick SQL response time. Does that help reduce the backend process memory used for "metadata"? I would expect it to reduce the partition-related caches in the backend process, but the table/index count stays the same.

Thanks,

James
Hi,

With 256-way hash-partitioned tables we see a huge backend memory increase when using JDBC driver client prepared statements; even after reducing the partition count from 256 to 64, INSERT/UPDATE/DELETE SQL against 27 partitioned tables still consumes about 60 MB. We plan to convert one large partitioned table into multiple non-partitioned tables, for example one 256-way HASH-partitioned table into 256 small tables. The application will then query the small tables directly to get a quick SQL response time. Does that help reduce the backend process memory used for "metadata"? I would expect it to reduce the partition-related caches in the backend process, but the table/index count stays the same.

Thanks,

James
Hi,

Based on a test in our environment, we use only one prepared statement: PREPARE updmetadata(varchar,varchar,varchar,varchar,bigint,varchar) AS UPDATE xxxxx SET xxxx = $1, xxxxx = $2, KEYURL = $3, xxxx = $4 WHERE xxxx = $5 AND xxxxx = $6 ($6 and $5 are the primary key), and it runs fast.

Start a new connection: RES = 12894k

PID      USER     PR  NI  VIRT   RES    SHR    S  %CPU  %MEM  TIME+    COMMAND
1837663  postgres 20  0   45.5g  16000  11936  S  0.0   0.0   0:00.02  postgres: xxxxxxxxxxx(50048) idle

Create a server-side prepared statement for this SQL and run it several times; once we see the server-side cached statement, the memory increases quickly to 72800k:

PID      USER     PR  NI  VIRT   RES    SHR    S  %CPU  %MEM  TIME+    COMMAND
1837663  postgres 20  0   45.6g  72800  12000  S  0.0   0.1   0:00.17  postgres: xxxxxxxxx(50048) idle

After that, further executions of the prepared statement cause no more changes. Adding a new prepared SQL statement increases memory by several MB.

The memory allocated for a server-side prepared SQL statement depends on the partition count: with 256 partitions it initially asks for about 60 MB in one go, and then several MB for each following SQL statement, depending on the statement. Is this kind of memory allocation expected? Or could the allocation be fine-tuned for the "many connections" use case to avoid out-of-memory issues?

Another interesting point: when using psql .. -h localhost, the memory allocation is much lower (only several MB) for the same test, with the same SQL and the same table; a remote connection needs much more memory instead.

Thanks,

James
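A minimal sketch of reproducing this kind of observation from psql (the table, columns, and 256-way layout here are hypothetical stand-ins, not the poster's actual schema); on v14 the generic_plans/custom_plans columns of pg_prepared_statements show when a cached generic plan has been built, and the backend's RES in top can be watched between steps:

CREATE TABLE t (id bigint, val text) PARTITION BY HASH (id);
DO $$
BEGIN
  FOR i IN 0..255 LOOP
    EXECUTE format(
      'CREATE TABLE t_p%s PARTITION OF t FOR VALUES WITH (MODULUS 256, REMAINDER %s)',
      i, i);
  END LOOP;
END $$;

PREPARE upd(text, bigint) AS UPDATE t SET val = $1 WHERE id = $2;
EXECUTE upd('x', 1);  -- repeat at least six times; after the fifth execution the
                      -- planner may switch to a cached generic plan
SELECT name, generic_plans, custom_plans FROM pg_prepared_statements;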
On Sat, 10 Sept 2022 at 22:50, James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> We plan to convert one large partitioned table into multiple non-partitioned tables, for example one 256-way
> HASH-partitioned table into 256 small tables. The application will then query the small tables directly to get a
> quick SQL response time. Does that help reduce the backend process memory used for "metadata"? I would expect it to
> reduce the partition-related caches in the backend process, but the table/index count stays the same.

Not really. The metadata (aka relcache) we're talking about is per relation and it is loaded into the backend when a relation is first accessed in a backend. Both tables and partitions are relations. A partition may use slightly more memory in the relcache for storing the partition constraint. However, that's probably quite insignificant compared to the other data stored in relcache.

The key here is likely how many relations are being accessed from a given backend. HASH partitioning does tend to lend itself to many partitions being accessed in a short space of time. That's quite different from, say, having a RANGE partitioned table on time-series data with one partition per month. Your workload might only access 1 partition per month, in that case. You've not really explained your use case, so it's hard to know if HASH partitioning is best suited for your schema or not.

David
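A small sketch of the time-series shape described above (table and column names are hypothetical):

CREATE TABLE events (created_at timestamptz NOT NULL, payload text)
    PARTITION BY RANGE (created_at);
CREATE TABLE events_2022_09 PARTITION OF events
    FOR VALUES FROM ('2022-09-01') TO ('2022-10-01');
CREATE TABLE events_2022_10 PARTITION OF events
    FOR VALUES FROM ('2022-10-01') TO ('2022-11-01');

-- a query bounded to one month is pruned to a single partition, so the
-- backend only loads relcache metadata for that one partition
EXPLAIN SELECT * FROM events
WHERE created_at >= '2022-09-01' AND created_at < '2022-10-01';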
On Sat, 10 Sept 2022 at 22:53, James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> With 256-way hash-partitioned tables we see a huge backend memory increase when using JDBC driver client prepared
> statements; even after reducing the partition count from 256 to 64, INSERT/UPDATE/DELETE SQL against 27 partitioned
> tables still consumes about 60 MB. We plan to convert one large partitioned table into multiple non-partitioned
> tables, for example one 256-way HASH-partitioned table into 256 small tables. The application will then query the
> small tables directly to get a quick SQL response time. Does that help reduce the backend process memory used for
> "metadata"? I would expect it to reduce the partition-related caches in the backend process, but the table/index
> count stays the same.

There are two things to consider here. The "metadata", aka relcache, is something that's stored for every table or partition you access from a backend. When you're using prepared statements, you're also storing query plans for pre-prepared queries in the backend memory. Plans, especially UPDATE/DELETE plans on partitioned tables, tend to be larger than plans for non-partitioned tables, so the plans for partitioned tables will consume more memory. If you start to query non-partitioned tables then these plans are likely to get smaller, and therefore your backend is likely to consume less memory.

You may also want to look at [1]. In particular:

"Server-prepared statements consume memory both on the client and the server, so pgJDBC limits the number of server-prepared statements per connection. It can be configured via preparedStatementCacheQueries (default 256, the number of queries known to pgJDBC), and preparedStatementCacheSizeMiB (default 5, that is the client side cache size in megabytes per connection). Only a subset of statement cache is server-prepared as some of the statements might fail to reach prepareThreshold."

It's likely that if you're querying individual partitions then you'll hit that 256 query limit more quickly, since you'll have more unique queries running against the backend. You might just want to consider passing in some value less than 256 and still query partitioned tables instead.

David

[1] https://jdbc.postgresql.org/documentation/server-prepare/#server-prepared-statements
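For illustration, a connection URL that lowers those two pgJDBC settings (host, port, and database name are placeholders; the parameter names are the ones quoted from the pgJDBC documentation above):

jdbc:postgresql://dbhost:5432/mydb?preparedStatementCacheQueries=64&preparedStatementCacheSizeMiB=2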
On Tue, 13 Sept 2022 at 20:50, James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> The memory allocated for a server-side prepared SQL statement depends on the partition count: with 256 partitions it
> initially asks for about 60 MB in one go, and then several MB for each following SQL statement, depending on the
> statement. Is this kind of memory allocation expected? Or could the allocation be fine-tuned for the "many
> connections" use case to avoid out-of-memory issues?
> Another interesting point: when using psql .. -h localhost, the memory allocation is much lower (only several MB) for
> the same test, with the same SQL and the same table; a remote connection needs much more memory instead.

It's important to understand that JDBC is probably using PREPAREd statements. If you're just using psql to execute the queries directly, then no query plan is stored in the backend memory after the execution of the query has completed. With PREPAREd statements, the plan will be stored after the first EXECUTE and only released when you DEALLOCATE the cached plan or close the connection.

In PostgreSQL 14 you can use "select * from pg_get_backend_memory_contexts();" to get a better understanding of the memory allocations within the backend you're querying from. Look out for rows with name set to CachedPlanSource and CachedPlanQuery.

David
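A sketch of that check on v14 (pg_backend_memory_contexts is the system view backed by that function; it reports only the current backend's memory contexts):

-- total memory held by cached plans in this backend (PostgreSQL 14+)
SELECT name, count(*) AS contexts,
       pg_size_pretty(sum(total_bytes)) AS total
FROM pg_backend_memory_contexts
WHERE name IN ('CachedPlanSource', 'CachedPlanQuery')
GROUP BY name;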