Thread: My Experiment of PG crash when dealing with huge amount of data
postgres=# create table test01(id integer, val char(1024));
2) insert into the table 2457600 records, which makes totally more than 2400MB:
postgres=# insert into test01 values(generate_series(1,2457600),repeat( chr(int4(random()*26)+65),1024));
It really took a few minutes, When the sql statement run, I can see that the server process is consuming 80% of total memory of the os.
----------------------------------------------------------------------------------------------------------------------------------
[root@server ~]# ps aux | grep post
root 3180 0.0 0.0 105296 712 pts/1 S 16:31 0:00 su - postgres
postgres 3181 0.0 0.0 70304 676 pts/1 S+ 16:31 0:00 -bash
postgres 3219 0.0 0.2 113644 2864 pts/1 S 16:32 0:00 /usr/local/pgsql/bin/postgres -D /gao/data
postgres 3221 0.4 3.0 113724 35252 ? Ss 16:32 0:01 postgres: writer process
postgres 3222 0.2 0.1 113644 1616 ? Ds 16:32 0:00 postgres: wal writer process
postgres 3223 0.0 0.0 114380 1148 ? Ss 16:32 0:00 postgres: autovacuum launcher process
postgres 3224 0.0 0.0 73332 472 ? Ss 16:32 0:00 postgres: stats collector process
root 3252 0.0 0.0 105296 712 pts/2 S 16:32 0:00 su - postgres
postgres 3253 0.0 0.0 70304 676 pts/2 S 16:32 0:00 -bash
postgres 3285 0.0 0.0 83488 740 pts/2 S+ 16:32 0:00 ./psql
postgres 3286 14.8 80.2 2598332 924308 ? Ds 16:32 0:35 postgres: postgres postgres [local] INSERT
root 3333 0.0 0.0 65424 812 pts/3 S+ 16:36 0:00 grep post
This should be the operating system OOM kills pg process,check syslog On Fri 30 Aug 2013 05:10:42 PM CST, 高健 wrote: > Hello: > > I have done the following experiment to test : > > PG's activity when dealing with data which is bigger in size than > total memory of the whole os system. > > The result is: > PG says: > ---------------------------- > WARNING: terminating connection because of crash of another server > process > DETAIL: The postmaster has commanded this server process to roll back > the current transaction and exit, because another server process > exited abnormally and possibly corrupted shared memory. > HINT: In a moment you should be able to reconnect to the database and > repeat your command. > The connection to the server was lost. Attempting reset: Failed. > !> > ----------------------------- > > In log, I can see the following: > LOG: background writer process (PID 3221) was terminated by signal 9: > Killed > > But why it need to kill the bgwriter ? > And that PG will use as much resource as it can to finish a sql dml , > till it used almost all the resource and then crash? > > I used the default checkpoint_segments(3) and shard_buffers(32MB) > settings. And my machine has only 1024MB memory. > The process is as : > 1) create table whose one tuple is more than 1KB: > > postgres=# create table test01(id integer, val char(1024)); > > 2) insert into the table 2457600 records, which makes totally more > than 2400MB: > > postgres=# insert into test01 > values(generate_series(1,2457600),repeat( > chr(int4(random()*26)+65),1024)); > > It really took a few minutes, When the sql statement run, I can see > that the server process is consuming 80% of total memory of the os. > > ---------------------------------------------------------------------------------------------------------------------------------- > > [root@server ~]# ps aux | grep post > > root 3180 0.0 0.0 105296 712 pts/1 S 16:31 0:00 su - > postgres > > postgres 3181 0.0 0.0 70304 676 pts/1 S+ 16:31 0:00 -bash > > postgres 3219 0.0 0.2 113644 2864 pts/1 S 16:32 0:00 > /usr/local/pgsql/bin/postgres -D /gao/data > > postgres 3221 0.4 3.0 113724 35252 ? Ss 16:32 0:01 > postgres: writer process > > postgres 3222 0.2 0.1 113644 1616 ? Ds 16:32 0:00 > postgres: wal writer process > > postgres 3223 0.0 0.0 114380 1148 ? Ss 16:32 0:00 > postgres: autovacuum launcher process > > postgres 3224 0.0 0.0 73332 472 ? Ss 16:32 0:00 > postgres: stats collector process > > root 3252 0.0 0.0 105296 712 pts/2 S 16:32 0:00 su - > postgres > > postgres 3253 0.0 0.0 70304 676 pts/2 S 16:32 0:00 -bash > > postgres 3285 0.0 0.0 83488 740 pts/2 S+ 16:32 0:00 ./psql > > postgres 3286 14.8 80.2 2598332 924308 ? Ds 16:32 0:35 > postgres: postgres postgres [local] INSERT > > root 3333 0.0 0.0 65424 812 pts/3 S+ 16:36 0:00 grep post > > -------------------------------------------------------- > > After a while, I found in the PG's log , the following information: > -------------------------------------------------------- > > LOG: autovacuum launcher started > LOG: database system is ready to accept connections > LOG: checkpoints are occurring too frequently (2 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (1 second apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (2 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (1 second apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (2 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (1 second apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (2 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (4 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (10 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (9 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (7 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (9 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (7 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (5 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (5 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (10 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (11 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (15 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (23 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (7 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (8 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (13 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (7 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (8 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (8 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (8 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (4 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (5 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: checkpoints are occurring too frequently (9 seconds apart) > HINT: Consider increasing the configuration parameter > "checkpoint_segments". > LOG: background writer process (PID 3221) was terminated by signal 9: > Killed > LOG: terminating any other active server processes > WARNING: terminating connection because of crash of another server > process > DETAIL: The postmaster has commanded this server process to roll back > the current transaction and exit, because another server process > exited abnormally and possibly corrupted shared memory. > HINT: In a moment you should be able to reconnect to the database and > repeat your command. > WARNING: terminating connection because of crash of another server > process > DETAIL: The postmaster has commanded this server process to roll back > the current transaction and exit, because another server process > exited abnormally and possibly corrupted shared memory. > HINT: In a moment you should be able to reconnect to the database and > repeat your command. > WARNING: terminating connection because of crash of another server > process > DETAIL: The postmaster has commanded this server process to roll back > the current transaction and exit, because another server process > exited abnormally and possibly corrupted shared memory. > HINT: In a moment you should be able to reconnect to the database and > repeat your command. > WARNING: terminating connection because of crash of another server > process > DETAIL: The postmaster has commanded this server process to roll back > the current transaction and exit, because another server process > exited abnormally and possibly corrupted shared memory. > HINT: In a moment you should be able to reconnect to the database and > repeat your command. > LOG: all server processes terminated; reinitializing > FATAL: the database system is in recovery mode > LOG: database system was interrupted; last known up at 2013-08-30 > 16:36:42 CST > LOG: database system was not properly shut down; automatic recovery > in progress > LOG: consistent recovery state reached at 0/B7657BD0 > LOG: redo starts at 0/B60FE2B8 > LOG: unexpected pageaddr 0/B044C000 in log file 0, segment 184, > offset 4505600 > LOG: redo done at 0/B844B940 > LOG: autovacuum launcher started > LOG: database system is ready to accept connections > > ------------------------------------------------------------------------------------- > > Best Regards >
On Fri, Aug 30, 2013 at 6:10 PM, 高健 <luckyjackgao@gmail.com> wrote: > In log, I can see the following: > LOG: background writer process (PID 3221) was terminated by signal 9: > Killed Assuming that no users on your server manually killed this process, or that no maintenance task you implemented did that, this looks like the Linux OOM killer because of a memory overcommit. Have a look here for more details: http://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT So have a look at dmesg to confirm that, then you could use one of the strategies described in the docs. Also, as you have been doing a bulk INSERT, you should as well increase temporarily checkpoint_segments to reduce the pressure on the background writer by reducing the number of checkpoints happening. This will also make your data load faster. -- Michael
On Fri, Aug 30, 2013 at 2:10 AM, 高健 <luckyjackgao@gmail.com> wrote: > > > postgres=# insert into test01 values(generate_series(1,2457600),repeat( > chr(int4(random()*26)+65),1024)); The construct "values (srf1,srf2)" will generate its entire result set in memory up front, it will not "stream" its results to the insert statement on the fly. To spare memory, you would want to use something like: insert into test01 select generate_series, repeat(chr(int4(random()*26)+65),1024) from generate_series(1,2457600); Cheers, Jeff
>insert into test01 select generate_series,
>repeat(chr(int4(random()*26)+65),1024) from
>generate_series(1,2457600);
On Fri, Aug 30, 2013 at 2:10 AM, 高健 <luckyjackgao@gmail.com> wrote:The construct "values (srf1,srf2)" will generate its entire result set
>
>
> postgres=# insert into test01 values(generate_series(1,2457600),repeat(
> chr(int4(random()*26)+65),1024));
in memory up front, it will not "stream" its results to the insert
statement on the fly.
To spare memory, you would want to use something like:
insert into test01 select generate_series,
repeat(chr(int4(random()*26)+65),1024) from
generate_series(1,2457600);
Cheers,
Jeff
=?UTF-8?B?6auY5YGl?= <luckyjackgao@gmail.com> writes: > If data grows rapidly, maybe our customer will use too much memory , Is > ulimit command a good idea for PG? There's no received wisdom saying that it is. There's a fairly widespread consensus that disabling OOM kill can be a good idea, but I don't recall that many people have tried setting specific ulimits on server processes. Keep in mind that exceeding a ulimit would cause queries to fail outright (whether the server was under much load or not), versus just getting slower if the server starts to swap under too much load. I can imagine situations where that would be considered a good tradeoff, but it's hardly right for everyone. regards, tom lane
On Sun, Sep 1, 2013 at 6:25 PM, 高健 <luckyjackgao@gmail.com> wrote: >>To spare memory, you would want to use something like: > >>insert into test01 select generate_series, >>repeat(chr(int4(random()*26)+65),1024) from >>generate_series(1,2457600); > > Thanks a lot! > > What I am worrying about is that: > If data grows rapidly, maybe our customer will use too much memory , The size of the data has little to do with it. Take your example as an example. The database could have been nearly empty before you started running that query. A hostile or adventurous user can craft queries that will exhaust the server's memory without ever needing any particular amount of data in data_directory, except maybe in the temp tablespace. So it is a matter of what kind of users you have, not how much data you anticipate having on disk. The parts of PostgreSQL that might blow up memory based on ordinary disk-based tables are pretty well protected by shared_buffers, temp_buffers, work_mem, maintenance_work_mem, etc. already. It is the things that don't directly map to data already on disk which are probably more vulnerable. > Is > ulimit command a good idea for PG? I've used ulimit -v on a test server (which was intentionally used to test things to limits of destruction), and was happy with the results. It seemed like it would error out the offending process, or just the offending statement, in a graceful way; rather than having random processes other than the culprit be brutally killed by OOM, or having the machine just swap itself into uselessness. I'd be reluctant to use it on production just on spec that something bad *might* happen without it, but if I started experiencing problems caused by a single rogue process using outrageous amounts of memory, that would be one of my first stops. Experimentally, shared memory does count against the -v limit, and the limit has to be set rather higher than shared_buffers, or else your database won't even start. Cheers, Jeff
On Sun, Sep 1, 2013 at 6:25 PM, 高健 <luckyjackgao@gmail.com> wrote:The size of the data has little to do with it. Take your example as
>>To spare memory, you would want to use something like:
>
>>insert into test01 select generate_series,
>>repeat(chr(int4(random()*26)+65),1024) from
>>generate_series(1,2457600);
>
> Thanks a lot!
>
> What I am worrying about is that:
> If data grows rapidly, maybe our customer will use too much memory ,
an example. The database could have been nearly empty before you
started running that query. A hostile or adventurous user can craft
queries that will exhaust the server's memory without ever needing any
particular amount of data in data_directory, except maybe in the temp
tablespace.
So it is a matter of what kind of users you have, not how much data
you anticipate having on disk.
The parts of PostgreSQL that might blow up memory based on ordinary
disk-based tables are pretty well protected by shared_buffers,
temp_buffers, work_mem, maintenance_work_mem, etc. already. It is the
things that don't directly map to data already on disk which are
probably more vulnerable.I've used ulimit -v on a test server (which was intentionally used to
> Is
> ulimit command a good idea for PG?
test things to limits of destruction), and was happy with the results.
It seemed like it would error out the offending process, or just the
offending statement, in a graceful way; rather than having random
processes other than the culprit be brutally killed by OOM, or having
the machine just swap itself into uselessness. I'd be reluctant to
use it on production just on spec that something bad *might* happen
without it, but if I started experiencing problems caused by a single
rogue process using outrageous amounts of memory, that would be one of
my first stops.
Experimentally, shared memory does count against the -v limit, and the
limit has to be set rather higher than shared_buffers, or else your
database won't even start.
Cheers,
Jeff
Thanks, I'll consider it carefully.Best Regards2013/9/3 Jeff Janes <jeff.janes@gmail.com>On Sun, Sep 1, 2013 at 6:25 PM, 高健 <luckyjackgao@gmail.com> wrote:The size of the data has little to do with it. Take your example as
>>To spare memory, you would want to use something like:
>
>>insert into test01 select generate_series,
>>repeat(chr(int4(random()*26)+65),1024) from
>>generate_series(1,2457600);
>
> Thanks a lot!
>
> What I am worrying about is that:
> If data grows rapidly, maybe our customer will use too much memory ,
an example. The database could have been nearly empty before you
started running that query. A hostile or adventurous user can craft
queries that will exhaust the server's memory without ever needing any
particular amount of data in data_directory, except maybe in the temp
tablespace.
So it is a matter of what kind of users you have, not how much data
you anticipate having on disk.
The parts of PostgreSQL that might blow up memory based on ordinary
disk-based tables are pretty well protected by shared_buffers,
temp_buffers, work_mem, maintenance_work_mem, etc. already. It is the
things that don't directly map to data already on disk which are
probably more vulnerable.I've used ulimit -v on a test server (which was intentionally used to
> Is
> ulimit command a good idea for PG?
test things to limits of destruction), and was happy with the results.
It seemed like it would error out the offending process, or just the
offending statement, in a graceful way; rather than having random
processes other than the culprit be brutally killed by OOM, or having
the machine just swap itself into uselessness. I'd be reluctant to
use it on production just on spec that something bad *might* happen
without it, but if I started experiencing problems caused by a single
rogue process using outrageous amounts of memory, that would be one of
my first stops.
Experimentally, shared memory does count against the -v limit, and the
limit has to be set rather higher than shared_buffers, or else your
database won't even start.
Cheers,
Jeff