Thread: huge memory of Postgresql backend process
Hi,
We just switched from Oracle to PG v13. All of these tables are partitioned (256 HASH partitions). When we run UPDATE/DELETE … WHERE, the backend memory grows very quickly to more than 130 MB, even though there is not much data in these tables. When we switched to PG v14.5 and tested again, we still saw a similar memory consumption issue. For example, for one PostgreSQL JDBC client:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ nMaj nMin WCHAN COMMAND
551701 postgres 20 0 45.6g 131468 11664 S 0.0 0.1 0:03.11 0 35k do_epoll_w+ postgres:
When we rebuild the tables as non-partitioned tables with the same columns and the same data size, the "RES" memory is only about 25 MB. For both the partitioned and the non-partitioned tables the UPDATE/DELETE statements run fast; the only difference is the large gap in memory consumption. And even after the session has been idle for a long time, the backend process does NOT release the memory back to the OS. Is this expected?
Server memory is 128 GB, shared_buffers = 45 GB, effective_cache_size = 80 GB. Both PG server versions, 13.4 and 14.5, show the same issue. The PostgreSQL JDBC driver is 42.3.4.
Thanks,
James
"James Pang (chaolpan)" <chaolpan@cisco.com> writes: > We just switch from Oracle to PGV13, all of these tables got partition (256 HASH partition) , when UPDATE/DELETE ...WHERE . The memory increase verify quickly until >130M. not too much data in these tables. When we switch to PGV14.5, test again, still see similar memory consumption issue. I see no particular reason to think this is a bug. If you have a lot of partitions, there is going to be a lot of metadata for them. Moreover, HASH partitioning is effectively incapable of being pruned, so that every query is going to touch every partition. (IMO, hash partitioning is basically never a good idea.) regards, tom lane
When we rebuilt the partitioned tables as non-partitioned tables with the same data size, the same test load used only 24-25 MB per backend process. auto_explain shows that both the DELETE and the UPDATE use partition pruning, so the SQL runs very fast, yet we still see the process memory grow quickly. With hundreds of connections doing similar UPDATE/DELETE work against the partitioned tables, the server's physical memory was used up and a lot of "out of memory" errors were dumped to the PG log; however, the dumped memory context stats showed only tens of MB of memory in use, instead of 160 MB.

It looks like the PostgreSQL backend allocates one large chunk of memory up front, based on some estimation rule, when it runs UPDATE/DELETE against partitioned tables, and does not actually use that much of it. We also found that the memory is never given back to the operating system, even after the backend process has been idle for a long time. In the output of the OS pmap command, almost all of the memory shows up as " xxxxx rw--- [ anon ]". Could this be memory over-allocated from the OS compared with the real usage?

Thanks,

James
On Fri, 9 Sept 2022 at 02:08, James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> It looks like the PostgreSQL backend allocates one large chunk of memory up front, based on some estimation rule,
> when it runs UPDATE/DELETE against partitioned tables, and does not actually use that much of it. We also found that
> the memory is never given back to the operating system, even after the backend process has been idle for a long time.
> In the output of the OS pmap command, almost all of the memory shows up as " xxxxx rw--- [ anon ]". Could this be
> memory over-allocated from the OS compared with the real usage?

I see you've been asking this question in various places around the mailing lists. It seems what you're complaining about is mentioned in our documents under [1]. Namely:

"Another reason to be concerned about having a large number of partitions is that the server's memory consumption may grow significantly over time, especially if many sessions touch large numbers of partitions. That's because each partition requires its metadata to be loaded into the local memory of each session that touches it."

What that section of the documents does not clearly mention is that this "metadata" is kept for the life of the backend process. This is probably implied by the "significantly over time", but maybe we could have been more clear that we don't free the relcache entries.

James, if you're having OOM problems due to this then the solution is to reduce the number of partitions in the partitioned table, or change things so that you don't query as many partitions from as many backends. The number of relcache entries stored for partitions will be number_of_partitions_you_query_in_the_backend * the_number_of_backends. So you'll either want to review the number of partitions or the number of backends (aka max_connections).

David

[1] https://www.postgresql.org/docs/13/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES
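As a back-of-envelope sketch of that arithmetic (the partitioned table name below is a placeholder; pg_inherits is the catalog that records partition membership):

-- how many partitions one partitioned table has
SELECT count(*) AS partitions
FROM pg_inherits
WHERE inhparent = 'my_partitioned_table'::regclass;

-- the number of relcache entries held across the cluster is then roughly
--   partitions_touched_per_backend * number_of_backends,
-- e.g. 256 partitions * 500 backends = 128,000 entries, each kept for the
-- life of its backend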
On Fri, 9 Sept 2022 at 01:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Moreover, HASH partitioning is effectively incapable of being
> pruned, so that every query is going to touch every partition.
> (IMO, hash partitioning is basically never a good idea.)

I think that might have only briefly been true during the v11 cycle, but by the time the release came we had hash partitioning and partition pruning.

David
On Fri, Sep 9, 2022 at 6:42 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, 9 Sept 2022 at 01:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Moreover, HASH partitioning is effectively incapable of being
> > pruned, so that every query is going to touch every partition.
> > (IMO, hash partitioning is basically never a good idea.)
>
> I think that might have only briefly been true during the v11 cycle,
> but by the time the release came we had hash partitioning and partition pruning.

That is correct.

create table hp (a int, b text) partition by hash (a, b);
create table hp0 partition of hp for values with (modulus 2, remainder 0);
create table hp1 partition of hp for values with (modulus 2, remainder 1);
insert into hp values (1, 'abc');

select tableoid::regclass, * from hp;
 tableoid | a |  b
----------+---+-----
 hp0      | 1 | abc
(1 row)

explain select * from hp where a = 1 and b = 'abc';
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on hp0 hp  (cost=0.00..29.05 rows=1 width=36)
   Filter: ((a = 1) AND (b = 'abc'::text))
(2 rows)

Maybe James is thinking that the reason for high memory usage is the same when using PG v13 as it is when using v14. v13 can't handle UPDATE/DELETE on partitioned tables as well as v14 does, though only for queries where partition pruning isn't being used.

It's true though that the backend-lifetime caching of partition metadata can add up over many backends and many partitions accessed in each, so your advice of lowering the number of backends or the number of partitions will help.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Thanks for your suggestion. Yes, we reduced the partition count from 256 to 128, and then to 64; the memory went down from 170 MB to 91 MB and then to 60 MB.

The load comes from a PostgreSQL JDBC client with preparedStatementCacheQueries at the default of 256. Can reducing that JDBC parameter help shrink the "cached plan context"?

Thanks,

James
Hi,

On Fri, Sep 9, 2022 at 5:19 PM James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> Thanks for your suggestion. Yes, we reduced the partition count from 256 to 128, and then to 64; the memory went down
> from 170 MB to 91 MB and then to 60 MB.
> The load comes from a PostgreSQL JDBC client with preparedStatementCacheQueries at the default of 256. Can reducing
> that JDBC parameter help shrink the "cached plan context"?

Interesting that you mention "cached plan context". Cached plans for queries over partitioned tables tend to be big (in terms of the memory they take up in the plan cache) because the planner can't use partition pruning. One workaround might be to forcefully prevent plan caching by setting plan_cache_mode to 'force_custom_plan', which makes a backend rebuild the plan on every execution of a prepared statement.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
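A minimal sketch of that workaround (plan_cache_mode is a standard GUC; the role name below is a made-up example):

-- make this session re-plan prepared statements on every execution, so no
-- large generic plan covering all partitions is kept in the plan cache
SET plan_cache_mode = force_custom_plan;

-- or persist it for the application's role (hypothetical role name)
ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;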
Hi,

We plan to convert one large partitioned table into multiple non-partitioned tables, for example one 256-way HASH-partitioned table into 256 small tables. The application will then query the small tables directly to get a quick SQL response time. Does that help reduce the backend process memory used for "metadata"? I would expect it to reduce the partition-related caches in the backend process, but the table/index count stays the same.

Thanks,

James
Hi,

With 256-way hash-partitioned tables we see a huge backend memory increase when using JDBC driver client prepared statements; even after reducing the partition count from 256 to 64, INSERT/UPDATE/DELETE SQL against 27 partitioned tables still consumes about 60 MB. We plan to convert one large partitioned table into multiple non-partitioned tables, for example one 256-way HASH-partitioned table into 256 small tables. The application will then query the small tables directly to get a quick SQL response time. Does that help reduce the backend process memory used for "metadata"? I would expect it to reduce the partition-related caches in the backend process, but the table/index count stays the same.

Thanks,

James
Hi,

Based on a test in our environment, we use only one prepared statement: PREPARE updmetadata(varchar,varchar,varchar,varchar,bigint,varchar) AS UPDATE xxxxx SET xxxx = $1, xxxxx = $2, KEYURL = $3, xxxx = $4 WHERE xxxx = $5 AND xxxxx = $6 ($6 and $5 are the primary key), and it runs fast.

Start a new connection: RES = 12894k

PID      USER     PR  NI  VIRT   RES    SHR    S  %CPU  %MEM  TIME+    COMMAND
1837663  postgres 20  0   45.5g  16000  11936  S  0.0   0.0   0:00.02  postgres: xxxxxxxxxxx(50048) idle

Create a server-side prepared statement for this SQL and run it several times; once we see the server-side cached statement, the memory increases quickly to 72800k:

PID      USER     PR  NI  VIRT   RES    SHR    S  %CPU  %MEM  TIME+    COMMAND
1837663  postgres 20  0   45.6g  72800  12000  S  0.0   0.1   0:00.17  postgres: xxxxxxxxx(50048) idle

After that, further executions of the prepared statement cause no more changes. Adding a new prepared SQL statement increases memory by several MB.

The memory allocated for a server-side prepared SQL statement depends on the partition count: with 256 partitions it initially asks for about 60 MB in one go, and then several MB for each following SQL statement, depending on the statement. Is this kind of memory allocation expected? Or could the allocation be fine-tuned for the "many connections" use case to avoid out-of-memory issues?

Another interesting point: when using psql .. -h localhost, the memory allocation is much lower (only several MB) for the same test, with the same SQL and the same table; a remote connection needs much more memory instead.

Thanks,

James
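A minimal sketch of reproducing this kind of observation from psql (the table, columns, and 256-way layout here are hypothetical stand-ins, not the poster's actual schema); on v14 the generic_plans/custom_plans columns of pg_prepared_statements show when a cached generic plan has been built, and the backend's RES in top can be watched between steps:

CREATE TABLE t (id bigint, val text) PARTITION BY HASH (id);
DO $$
BEGIN
  FOR i IN 0..255 LOOP
    EXECUTE format(
      'CREATE TABLE t_p%s PARTITION OF t FOR VALUES WITH (MODULUS 256, REMAINDER %s)',
      i, i);
  END LOOP;
END $$;

PREPARE upd(text, bigint) AS UPDATE t SET val = $1 WHERE id = $2;
EXECUTE upd('x', 1);  -- repeat at least six times; after the fifth execution the
                      -- planner may switch to a cached generic plan
SELECT name, generic_plans, custom_plans FROM pg_prepared_statements;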
On Sat, 10 Sept 2022 at 22:50, James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> We plan to convert one large partitioned table into multiple non-partitioned tables, for example one 256-way
> HASH-partitioned table into 256 small tables. The application will then query the small tables directly to get a
> quick SQL response time. Does that help reduce the backend process memory used for "metadata"? I would expect it to
> reduce the partition-related caches in the backend process, but the table/index count stays the same.

Not really. The metadata (aka relcache) we're talking about is per relation and it is loaded into the backend when a relation is first accessed in a backend. Both tables and partitions are relations. A partition may use slightly more memory in the relcache for storing the partition constraint. However, that's probably quite insignificant compared to the other data stored in relcache.

The key here is likely how many relations are being accessed from a given backend. HASH partitioning does tend to lend itself to many partitions being accessed in a short space of time. That's quite different from, say, having a RANGE partitioned table on time-series data with one partition per month. Your workload might only access 1 partition per month, in that case. You've not really explained your use case, so it's hard to know if HASH partitioning is best suited for your schema or not.

David
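A small sketch of the time-series shape described above (table and column names are hypothetical):

CREATE TABLE events (created_at timestamptz NOT NULL, payload text)
    PARTITION BY RANGE (created_at);
CREATE TABLE events_2022_09 PARTITION OF events
    FOR VALUES FROM ('2022-09-01') TO ('2022-10-01');
CREATE TABLE events_2022_10 PARTITION OF events
    FOR VALUES FROM ('2022-10-01') TO ('2022-11-01');

-- a query bounded to one month is pruned to a single partition, so the
-- backend only loads relcache metadata for that one partition
EXPLAIN SELECT * FROM events
WHERE created_at >= '2022-09-01' AND created_at < '2022-10-01';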
On Sat, 10 Sept 2022 at 22:53, James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> With 256-way hash-partitioned tables we see a huge backend memory increase when using JDBC driver client prepared
> statements; even after reducing the partition count from 256 to 64, INSERT/UPDATE/DELETE SQL against 27 partitioned
> tables still consumes about 60 MB. We plan to convert one large partitioned table into multiple non-partitioned
> tables, for example one 256-way HASH-partitioned table into 256 small tables. The application will then query the
> small tables directly to get a quick SQL response time. Does that help reduce the backend process memory used for
> "metadata"? I would expect it to reduce the partition-related caches in the backend process, but the table/index
> count stays the same.

There are two things to consider here. The "metadata", aka relcache, is something that's stored for every table or partition you access from a backend. When you're using prepared statements, you're also storing query plans for pre-prepared queries in the backend memory. Plans, especially UPDATE/DELETE plans on partitioned tables, tend to be larger than plans for non-partitioned tables, so the plans for partitioned tables will consume more memory. If you start to query non-partitioned tables then these plans are likely to get smaller, and therefore your backend is likely to consume less memory.

You may also want to look at [1]. In particular:

"Server-prepared statements consume memory both on the client and the server, so pgJDBC limits the number of server-prepared statements per connection. It can be configured via preparedStatementCacheQueries (default 256, the number of queries known to pgJDBC), and preparedStatementCacheSizeMiB (default 5, that is the client side cache size in megabytes per connection). Only a subset of statement cache is server-prepared as some of the statements might fail to reach prepareThreshold."

It's likely that if you're querying individual partitions then you'll hit that 256 query limit more quickly, since you'll have more unique queries running against the backend. You might just want to consider passing in some value less than 256 and still query partitioned tables instead.

David

[1] https://jdbc.postgresql.org/documentation/server-prepare/#server-prepared-statements
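For illustration, a connection URL that lowers those two pgJDBC settings (host, port, and database name are placeholders; the parameter names are the ones quoted from the pgJDBC documentation above):

jdbc:postgresql://dbhost:5432/mydb?preparedStatementCacheQueries=64&preparedStatementCacheSizeMiB=2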
On Tue, 13 Sept 2022 at 20:50, James Pang (chaolpan) <chaolpan@cisco.com> wrote:
> The memory allocated for a server-side prepared SQL statement depends on the partition count: with 256 partitions it
> initially asks for about 60 MB in one go, and then several MB for each following SQL statement, depending on the
> statement. Is this kind of memory allocation expected? Or could the allocation be fine-tuned for the "many
> connections" use case to avoid out-of-memory issues?
> Another interesting point: when using psql .. -h localhost, the memory allocation is much lower (only several MB) for
> the same test, with the same SQL and the same table; a remote connection needs much more memory instead.

It's important to understand that JDBC is probably using PREPAREd statements. If you're just using psql to execute the queries directly, then no query plan is stored in the backend memory after the execution of the query has completed. With PREPAREd statements, the plan will be stored after the first EXECUTE and only released when you DEALLOCATE the cached plan or close the connection.

In PostgreSQL 14 you can use "select * from pg_get_backend_memory_contexts();" to get a better understanding of the memory allocations within the backend you're querying from. Look out for rows with name set to CachedPlanSource and CachedPlanQuery.

David
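A sketch of that check on v14 (pg_backend_memory_contexts is the system view backed by that function; it reports only the current backend's memory contexts):

-- total memory held by cached plans in this backend (PostgreSQL 14+)
SELECT name, count(*) AS contexts,
       pg_size_pretty(sum(total_bytes)) AS total
FROM pg_backend_memory_contexts
WHERE name IN ('CachedPlanSource', 'CachedPlanQuery')
GROUP BY name;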