Thread: Postgres server 12.2 crash with process exited abnormally and possibly corrupted shared memory

Hi Team,

 

I am using PostgreSQL server v12.2 on CentOS Linux release 7.3.1611 (Core).

 

My application works fine with non-partitioned tables, but recently we have been trying to adopt partitioned tables for a few of the application tables.

So we have created list partitions on 6 tables: 2 of the 6 tables have 24 partitions each, and the other 4 have 500 list partitions each. After partitioning, when we run our application, memory utilization climbs to 100%, and once it reaches 100% the Postgres server crashes with the following error:

 

2020-06-09 00:47:41.876 IDT pid:9701 xid:0 ip:10.247.150.107 WARNING:  terminating connection because of crash of another server process

2020-06-09 00:47:41.876 IDT pid:9701 xid:0 ip:10.247.150.107 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2020-06-09 00:47:44.606 IDT pid:9701 xid:0 ip: HINT:  In a moment you should be able to reconnect to the database and repeat your command.

2020-06-09 00:47:41.876 IDT pid:9961 xid:0 ip:10.247.150.107 FATAL:  the database system is in recovery mode

 

Prior to the partitioning changes, the Postgres server utilized about 30% of total memory. Does the number of partitions impact memory utilization?

 

My configuration is as follows:

 

PG version: 12.2
RAM on server: 78GB
effective_cache_size = 44GB
shared_buffers = 21GB
max_connections = 2000
work_mem = 9MB
max_worker_processes = 24

 

Thanks & Regards,

Ishan Joshi

 


On Tue, Jun 9, 2020 at 8:35 AM Ishan Joshi <Ishan.Joshi@amdocs.com> wrote:

I am using PostgreSQL server v12.2 on CentOS Linux release 7.3.1611 (Core).

My application works fine with non-partitioned tables, but recently we have been trying to adopt partitioned tables for a few of the application tables.

So we have created list partitions on 6 tables: 2 of the 6 tables have 24 partitions each, and the other 4 have 500 list partitions each. After partitioning, when we run our application, memory utilization climbs to 100%, and once it reaches 100% the Postgres server crashes with the error shown above.



How many rows did these tables have before partitioning? Why did you decide to partition? Do these list partitions allow for plan-time pruning? Do they support partition-wise joins? work_mem can be used for each node of the plan, and if you are getting parallel scans of many tables or indexes where you previously had one, that could be an issue.
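
For example, plan-time pruning is easy to check with EXPLAIN. A minimal sketch, with placeholder table and column names:

    -- Placeholder names; substitute your partitioned table and key column:
    EXPLAIN SELECT * FROM my_partitioned_table WHERE part_key = 42;
    -- If plan-time pruning works, the plan shows only the matching
    -- partition rather than an Append over all of them.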

2000 for max_connections strikes me as quite high. Consider the use of a connection pooler like pgbouncer or pgpool, such that Postgres can be run with max_connections more like 2-5x your number of CPUs, and those connections get re-used as needed. There is some fixed memory overhead for each potential connection.
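
(As a quick sanity check on whether you actually need anywhere near 2000 connections, you can compare the configured limit with real usage at peak load, for example:)

    -- How many of the allowed connections are actually in use right now:
    SELECT count(*) FILTER (WHERE state = 'active') AS active,
           count(*) AS total,
           current_setting('max_connections') AS max_allowed
    FROM pg_stat_activity;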

Hi Michael,

 

Thanks for your response.

 

Please find answers for your questions

How many rows did these tables have before partitioning? → We start the test with 0 rows in the partitioned tables.

Why did you decide to partition? → These are heavy tables, with a high number of DML operations performed on them and a high number of rows generated every hour.

Do these list partitions allow for plan-time pruning? → We have tuned the application queries to utilize partition pruning. We still have 2-3 queries that do not use partition pruning, and we are working on those.

Do they support partition-wise joins? → Most of the queries query a single table. We have changed our queries so that they can utilize the partition key.

work_mem can be used for each node of the plan, and if you are getting parallel scans of many tables or indexes where you previously had one, that could be an issue. → Some queries are scanning indexes on all the partitions.
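
(A rough worst-case illustration of why that last point matters: if a single query ends up with a sort or hash node per partition across 500 partitions, it could in principle consume up to 500 × 9MB ≈ 4.4GB of work_mem on its own, and fewer than twenty such backends out of the 2000 allowed connections would exhaust the server's 78GB of RAM.)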

 

The current work_mem is set to 9MB.

cpu_tuple_cost = 0.03
seq_page_cost = 0.7
random_page_cost = 1
huge_pages = off
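
(For reference, the effective values can be confirmed from the server itself:)

    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('work_mem', 'cpu_tuple_cost', 'seq_page_cost',
                    'random_page_cost', 'huge_pages');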

 

 

Thanks & Regards,

Ishan Joshi

 


On Wed, Jun 10, 2020 at 12:05 AM Ishan Joshi <Ishan.Joshi@amdocs.com> wrote:

How many rows did these tables have before partitioning? → We start the test with 0 rows in the partitioned tables.


Partitions are far from free, and pruning is great but not guaranteed. How many total rows do you currently have, or foresee having, in the biggest table (all partitions combined, or without partitioning) within the next 1-3 years? 1 million? 100 million? 2 billion? The point is that you may have partitioned before it was prudent to do so. Just because something can be done doesn't mean it should be. What sort of key are you partitioning on?

Also, what do you mean about tests? The planner makes some assumptions about empty tables, since stats may indicate they are empty when they are not. If your tests involve many empty tables, they may give rather different performance than real life, where there are few or no empty partitions.
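
(The planner's notion of table size comes from pg_class. A quick way to see what it currently believes about each partition, again with a placeholder name pattern:)

    SELECT relname, reltuples, relpages
      FROM pg_class
     WHERE relname LIKE 'my_partitioned_table_p%';
    -- Never-analyzed partitions show reltuples = 0, which is the same
    -- thing a genuinely empty partition shows.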


Hi Michael,

 

We have one table with about 2.5 million rows inserted and updated each hour, and another table with about 1 million rows per hour. We have added a dedicated column to these tables that holds a value from 1 to 500, and this column is used as the partition key.

The idea behind the 500 partitions is to store 1 year of data in the table, with the partition key value advancing every 20 hours (500 × 20 hours ≈ 417 days).

As the data in these tables is huge, we are taking the partitioning approach, and list partitioning in particular, to make maintenance on these tables easier.
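
(A minimal sketch of the scheme as described, with hypothetical table and column names; the real DDL differs:)

    CREATE TABLE app_events (
        part_key   int NOT NULL,  -- dedicated column, values 1..500,
                                  -- advancing roughly every 20 hours
        payload    text,
        created_at timestamptz
    ) PARTITION BY LIST (part_key);

    CREATE TABLE app_events_p1 PARTITION OF app_events FOR VALUES IN (1);
    -- ...one partition per key value, up to app_events_p500.

    -- The maintenance win: old data can be removed as a cheap
    -- partition-level operation instead of a bulk DELETE, e.g.:
    -- ALTER TABLE app_events DETACH PARTITION app_events_p1;
    -- DROP TABLE app_events_p1;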

 

This is not just a test: we already partition these tables in Oracle, and as we migrate to Postgres, we are enabling the same feature in Postgres as well.

 

We want to see the impact starting from size 0 and understand the behavior over the next 72 hours under heavy load.

 

I just executed the same environment with 100 partitions on these tables; the run went for 12 hours with constant 70% RAM utilization and 50% CPU utilization.

 

So I suspect the number of partitions is the issue behind the memory utilization.

 

Thanks & Regards,

Ishan Joshi

 


Hi,

 

To add more detail: we have now started testing individual tables. We tried two tables with 24 partitions each, and this increased memory utilization from 30% to 50-60%. Then we added another table with 500 partitions, and our test crashed after a few hours, with memory utilization at 100%.

 

Last night I executed a test with 200 partitions on the same table; over a 15-hour run, memory utilization was stable at 80%. Based on this result, I have started a longer 3-day test run with the same 200 partitions.

 

But again, the main suspect is the number of partitions.

 

How can we check the memory utilization of a specific table object or a specific query?
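
(To illustrate the kind of per-query check meant here: EXPLAIN (ANALYZE, BUFFERS) reports buffer usage per plan node and the memory consumed by sort nodes, each of which may use up to work_mem. Placeholder query, reusing the hypothetical app_events table from above:)

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM app_events ORDER BY created_at;
    -- Sort nodes report their memory use, e.g.
    -- "Sort Method: quicksort  Memory: 25kB", and BUFFERS adds
    -- shared-buffer hits/reads for each node.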

 

 

Thanks & Regards,

Ishan Joshi

 
