My Experiment of PG crash when dealing with huge amount of data - Mailing list pgsql-general

From 高健
Subject My Experiment of PG crash when dealing with huge amount of data
Date
Msg-id CAL454F3M19-Qgbfj3fp8_=w3CWWy5AWGKNTP=sD33EfFDv6JnA@mail.gmail.com
Whole thread Raw
Responses Re: My Experiment of PG crash when dealing with huge amount of data  (hxreno1 <hxreno1@gmail.com>)
Re: My Experiment of PG crash when dealing with huge amount of data  (Michael Paquier <michael.paquier@gmail.com>)
Re: My Experiment of PG crash when dealing with huge amount of data  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
Hello:

I  have done the following experiment to test :

PG's activity when dealing with data which is bigger in size than total memory of the whole os system.

The result is:
PG says:
---------------------------- 
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
The connection to the server was lost. Attempting reset: Failed.
!> 
-----------------------------

In log, I can see the following:
LOG:  background writer process (PID 3221) was terminated by signal 9: Killed

But why it need to kill the bgwriter ? 
And that PG will use as much resource as it can to finish a sql dml , till it used almost all the resource and then crash?  

I used the default  checkpoint_segments(3) and shard_buffers(32MB) settings. And my machine has only 1024MB memory.
The process is as :
1) create table whose one tuple is more than 1KB:

postgres=# create table test01(id integer, val char(1024)); 

2) insert into the table 2457600 records, which makes totally more than 2400MB:

postgres=# insert into test01 values(generate_series(1,2457600),repeat( chr(int4(random()*26)+65),1024));

It really took a few minutes, When the sql statement run, I can see that the server process is consuming 80% of total memory of the os.

----------------------------------------------------------------------------------------------------------------------------------

[root@server ~]# ps aux | grep post

root      3180  0.0  0.0 105296   712 pts/1    S    16:31   0:00 su - postgres

postgres  3181  0.0  0.0  70304   676 pts/1    S+   16:31   0:00 -bash

postgres  3219  0.0  0.2 113644  2864 pts/1    S    16:32   0:00 /usr/local/pgsql/bin/postgres -D /gao/data

postgres  3221  0.4  3.0 113724 35252 ?        Ss   16:32   0:01 postgres: writer process                  

postgres  3222  0.2  0.1 113644  1616 ?        Ds   16:32   0:00 postgres: wal writer process              

postgres  3223  0.0  0.0 114380  1148 ?        Ss   16:32   0:00 postgres: autovacuum launcher process     

postgres  3224  0.0  0.0  73332   472 ?        Ss   16:32   0:00 postgres: stats collector process         

root      3252  0.0  0.0 105296   712 pts/2    S    16:32   0:00 su - postgres

postgres  3253  0.0  0.0  70304   676 pts/2    S    16:32   0:00 -bash

postgres  3285  0.0  0.0  83488   740 pts/2    S+   16:32   0:00 ./psql

postgres  3286 14.8 80.2 2598332 924308 ?      Ds   16:32   0:35 postgres: postgres postgres [local] INSERT

root      3333  0.0  0.0  65424   812 pts/3    S+   16:36   0:00 grep post

--------------------------------------------------------

After a while, I found in the PG's log , the following information:
--------------------------------------------------------

LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (11 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (15 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (23 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (13 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  background writer process (PID 3221) was terminated by signal 9: Killed
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
FATAL:  the database system is in recovery mode
LOG:  database system was interrupted; last known up at 2013-08-30 16:36:42 CST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/B7657BD0
LOG:  redo starts at 0/B60FE2B8
LOG:  unexpected pageaddr 0/B044C000 in log file 0, segment 184, offset 4505600
LOG:  redo done at 0/B844B940
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

-------------------------------------------------------------------------------------

Best Regards

pgsql-general by date:

Previous
From: Mistina Michal
Date:
Subject: pgpool-ha not found in yum postgres repository
Next
From: Andreas Joseph Krogh
Date:
Subject: Re: Why is NULL = unbounded for rangetypes?