Re: My Experiment of PG crash when dealing with huge amount of data - Mailing list pgsql-general
From: hxreno1
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Msg-id: 52206C6C.4050405@gmail.com
In response to: My Experiment of PG crash when dealing with huge amount of data (高健 <luckyjackgao@gmail.com>)
List: pgsql-general
This is most likely the operating system's OOM killer terminating the PG process; check syslog.

On Fri 30 Aug 2013 05:10:42 PM CST, 高健 wrote:
> Hello:
>
> I have done the following experiment to test:
>
> PG's behavior when dealing with data that is bigger in size than the
> total memory of the whole OS.
>
> The result is that PG says:
> ----------------------------
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> The connection to the server was lost. Attempting reset: Failed.
> !>
> -----------------------------
>
> In the log, I can see the following:
> LOG: background writer process (PID 3221) was terminated by signal 9: Killed
>
> But why did it need to kill the bgwriter?
> And will PG use as much of the system's resources as it can to finish a
> SQL DML statement, until it has used almost everything and then crashes?
>
> I used the default checkpoint_segments (3) and shared_buffers (32MB)
> settings, and my machine has only 1024MB of memory.
> The process was as follows:
>
> 1) Create a table whose tuples are each more than 1KB:
>
> postgres=# create table test01(id integer, val char(1024));
>
> 2) Insert 2457600 records into the table, which makes more than 2400MB in total:
>
> postgres=# insert into test01
> values(generate_series(1,2457600),repeat(chr(int4(random()*26)+65),1024));
>
> It really took a few minutes. While the statement was running, I could see
> that the server process was consuming 80% of the total memory of the OS.
> ----------------------------------------------------------------------------------------------------------------------------------
>
> [root@server ~]# ps aux | grep post
>
> root      3180  0.0  0.0 105296    712 pts/1  S   16:31  0:00 su - postgres
> postgres  3181  0.0  0.0  70304    676 pts/1  S+  16:31  0:00 -bash
> postgres  3219  0.0  0.2 113644   2864 pts/1  S   16:32  0:00 /usr/local/pgsql/bin/postgres -D /gao/data
> postgres  3221  0.4  3.0 113724  35252 ?      Ss  16:32  0:01 postgres: writer process
> postgres  3222  0.2  0.1 113644   1616 ?      Ds  16:32  0:00 postgres: wal writer process
> postgres  3223  0.0  0.0 114380   1148 ?      Ss  16:32  0:00 postgres: autovacuum launcher process
> postgres  3224  0.0  0.0  73332    472 ?      Ss  16:32  0:00 postgres: stats collector process
> root      3252  0.0  0.0 105296    712 pts/2  S   16:32  0:00 su - postgres
> postgres  3253  0.0  0.0  70304    676 pts/2  S   16:32  0:00 -bash
> postgres  3285  0.0  0.0  83488    740 pts/2  S+  16:32  0:00 ./psql
> postgres  3286 14.8 80.2 2598332 924308 ?     Ds  16:32  0:35 postgres: postgres postgres [local] INSERT
> root      3333  0.0  0.0  65424    812 pts/3  S+  16:36  0:00 grep post
> --------------------------------------------------------
>
> After a while, I found the following information in PG's log:
> --------------------------------------------------------
>
> LOG: autovacuum launcher started
> LOG: database system is ready to accept connections
> LOG: checkpoints are occurring too frequently (2 seconds apart)
> HINT: Consider increasing the configuration parameter "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (1 second apart)
> HINT: Consider increasing the configuration parameter "checkpoint_segments".
> [... the same LOG/HINT pair repeats many more times, at intervals ranging
> from 1 to 23 seconds ...]
> LOG: background writer process (PID 3221) was terminated by signal 9: Killed
> LOG: terminating any other active server processes
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> [... the same WARNING/DETAIL/HINT block repeats three more times ...]
> LOG: all server processes terminated; reinitializing
> FATAL: the database system is in recovery mode
> LOG: database system was interrupted; last known up at 2013-08-30 16:36:42 CST
> LOG: database system was not properly shut down; automatic recovery in progress
> LOG: consistent recovery state reached at 0/B7657BD0
> LOG: redo starts at 0/B60FE2B8
> LOG: unexpected pageaddr 0/B044C000 in log file 0, segment 184, offset 4505600
> LOG: redo done at 0/B844B940
> LOG: autovacuum launcher started
> LOG: database system is ready to accept connections
>
> -------------------------------------------------------------------------------------
>
> Best Regards
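To keep the OOM killer away from the database in the first place, the usual advice for a dedicated Linux database server is to disable heuristic memory overcommit. A sketch with illustrative values:

```
# /etc/sysctl.conf -- apply with: sysctl -p
vm.overcommit_memory = 2     # strict accounting: allocations fail instead of OOM kills
vm.overcommit_ratio = 80     # illustrative; percent of RAM counted as allocatable
```

With strict accounting, an over-large allocation fails with an out-of-memory error inside PostgreSQL, which it can handle gracefully, rather than a kernel-delivered SIGKILL.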
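The "checkpoints are occurring too frequently" messages are a separate issue from the crash: the 2400MB bulk insert generates WAL far faster than the default 3 checkpoint_segments allow. A sketch of the postgresql.conf change the HINT suggests, with illustrative values (on PostgreSQL 9.5 and later, checkpoint_segments was replaced by max_wal_size):

```
# postgresql.conf -- illustrative values for a bulk-load workload
checkpoint_segments = 16            # default 3; each WAL segment is 16MB
checkpoint_completion_target = 0.9  # spread checkpoint I/O over the interval
```

This only reduces checkpoint churn and log noise; it does not change the memory pressure that triggered the OOM kill.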
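To confirm the OOM-killer diagnosis, search the kernel log. A minimal sketch, assuming a Linux host; log file paths differ across distributions, and the sample line below is illustrative rather than a capture from this system:

```shell
# Typical ways to look for OOM-killer activity (paths vary by distro):
#   dmesg | grep -i -E 'out of memory|oom-killer'
#   grep -i oom /var/log/messages    # RHEL/CentOS
#   grep -i oom /var/log/syslog      # Debian/Ubuntu

# Illustrative example of the kind of line the kernel logs when it kills a process:
sample='Out of memory: Kill process 3221 (postgres) score 900 or sacrifice child'
echo "$sample" | grep -c -i 'out of memory'
```

A SIGKILL from the kernel matches the "terminated by signal 9: Killed" line in the quoted log: the backend gets no chance to clean up, so the postmaster restarts all server processes.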