Thread: My Experiment of PG crash when dealing with huge amount of data

My Experiment of PG crash when dealing with huge amount of data

From
高健
Date:
Hello:

I  have done the following experiment to test :

PG's activity when dealing with data which is bigger in size than total memory of the whole os system.

The result is:
PG says:
---------------------------- 
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
The connection to the server was lost. Attempting reset: Failed.
!> 
-----------------------------

In log, I can see the following:
LOG:  background writer process (PID 3221) was terminated by signal 9: Killed

But why it need to kill the bgwriter ? 
And that PG will use as much resource as it can to finish a sql dml , till it used almost all the resource and then crash?  

I used the default  checkpoint_segments(3) and shard_buffers(32MB) settings. And my machine has only 1024MB memory.
The process is as :
1) create table whose one tuple is more than 1KB:

postgres=# create table test01(id integer, val char(1024)); 

2) insert into the table 2457600 records, which makes totally more than 2400MB:

postgres=# insert into test01 values(generate_series(1,2457600),repeat( chr(int4(random()*26)+65),1024));

It really took a few minutes, When the sql statement run, I can see that the server process is consuming 80% of total memory of the os.

----------------------------------------------------------------------------------------------------------------------------------

[root@server ~]# ps aux | grep post

root      3180  0.0  0.0 105296   712 pts/1    S    16:31   0:00 su - postgres

postgres  3181  0.0  0.0  70304   676 pts/1    S+   16:31   0:00 -bash

postgres  3219  0.0  0.2 113644  2864 pts/1    S    16:32   0:00 /usr/local/pgsql/bin/postgres -D /gao/data

postgres  3221  0.4  3.0 113724 35252 ?        Ss   16:32   0:01 postgres: writer process                  

postgres  3222  0.2  0.1 113644  1616 ?        Ds   16:32   0:00 postgres: wal writer process              

postgres  3223  0.0  0.0 114380  1148 ?        Ss   16:32   0:00 postgres: autovacuum launcher process     

postgres  3224  0.0  0.0  73332   472 ?        Ss   16:32   0:00 postgres: stats collector process         

root      3252  0.0  0.0 105296   712 pts/2    S    16:32   0:00 su - postgres

postgres  3253  0.0  0.0  70304   676 pts/2    S    16:32   0:00 -bash

postgres  3285  0.0  0.0  83488   740 pts/2    S+   16:32   0:00 ./psql

postgres  3286 14.8 80.2 2598332 924308 ?      Ds   16:32   0:35 postgres: postgres postgres [local] INSERT

root      3333  0.0  0.0  65424   812 pts/3    S+   16:36   0:00 grep post

--------------------------------------------------------

After a while, I found in the PG's log , the following information:
--------------------------------------------------------

LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (10 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (11 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (15 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (23 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (13 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (7 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (8 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (4 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (5 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  background writer process (PID 3221) was terminated by signal 9: Killed
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
FATAL:  the database system is in recovery mode
LOG:  database system was interrupted; last known up at 2013-08-30 16:36:42 CST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/B7657BD0
LOG:  redo starts at 0/B60FE2B8
LOG:  unexpected pageaddr 0/B044C000 in log file 0, segment 184, offset 4505600
LOG:  redo done at 0/B844B940
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections

-------------------------------------------------------------------------------------

Best Regards

Re: My Experiment of PG crash when dealing with huge amount of data

From
hxreno1
Date:
This should be the operating system  OOM kills pg process,check syslog

On Fri 30 Aug 2013 05:10:42 PM CST, 高健 wrote:
> Hello:
>
> I  have done the following experiment to test :
>
> PG's activity when dealing with data which is bigger in size than
> total memory of the whole os system.
>
> The result is:
> PG says:
> ----------------------------
> WARNING:  terminating connection because of crash of another server
> process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> The connection to the server was lost. Attempting reset: Failed.
> !>
> -----------------------------
>
> In log, I can see the following:
> LOG:  background writer process (PID 3221) was terminated by signal 9:
> Killed
>
> But why it need to kill the bgwriter ?
> And that PG will use as much resource as it can to finish a sql dml ,
> till it used almost all the resource and then crash?
>
> I used the default  checkpoint_segments(3) and shard_buffers(32MB)
> settings. And my machine has only 1024MB memory.
> The process is as :
> 1) create table whose one tuple is more than 1KB:
>
> postgres=# create table test01(id integer, val char(1024));
>
> 2) insert into the table 2457600 records, which makes totally more
> than 2400MB:
>
> postgres=# insert into test01
> values(generate_series(1,2457600),repeat(
> chr(int4(random()*26)+65),1024));
>
> It really took a few minutes, When the sql statement run, I can see
> that the server process is consuming 80% of total memory of the os.
>
>
----------------------------------------------------------------------------------------------------------------------------------
>
> [root@server ~]# ps aux | grep post
>
> root      3180  0.0  0.0 105296   712 pts/1    S    16:31   0:00 su -
> postgres
>
> postgres  3181  0.0  0.0  70304   676 pts/1    S+   16:31   0:00 -bash
>
> postgres  3219  0.0  0.2 113644  2864 pts/1    S    16:32   0:00
> /usr/local/pgsql/bin/postgres -D /gao/data
>
> postgres  3221  0.4  3.0 113724 35252 ?        Ss   16:32   0:01
> postgres: writer process
>
> postgres  3222  0.2  0.1 113644  1616 ?        Ds   16:32   0:00
> postgres: wal writer process
>
> postgres  3223  0.0  0.0 114380  1148 ?        Ss   16:32   0:00
> postgres: autovacuum launcher process
>
> postgres  3224  0.0  0.0  73332   472 ?        Ss   16:32   0:00
> postgres: stats collector process
>
> root      3252  0.0  0.0 105296   712 pts/2    S    16:32   0:00 su -
> postgres
>
> postgres  3253  0.0  0.0  70304   676 pts/2    S    16:32   0:00 -bash
>
> postgres  3285  0.0  0.0  83488   740 pts/2    S+   16:32   0:00 ./psql
>
> postgres  3286 14.8 80.2 2598332 924308 ?      Ds   16:32   0:35
> postgres: postgres postgres [local] INSERT
>
> root      3333  0.0  0.0  65424   812 pts/3    S+   16:36   0:00 grep post
>
> --------------------------------------------------------
>
> After a while, I found in the PG's log , the following information:
> --------------------------------------------------------
>
> LOG:  autovacuum launcher started
> LOG:  database system is ready to accept connections
> LOG:  checkpoints are occurring too frequently (2 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (1 second apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (2 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (1 second apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (2 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (1 second apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (2 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (4 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (10 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (9 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (7 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (9 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (7 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (5 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (5 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (10 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (11 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (15 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (23 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (7 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (8 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (13 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (7 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (8 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (8 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (8 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (4 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (5 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  checkpoints are occurring too frequently (9 seconds apart)
> HINT:  Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG:  background writer process (PID 3221) was terminated by signal 9:
> Killed
> LOG:  terminating any other active server processes
> WARNING:  terminating connection because of crash of another server
> process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> WARNING:  terminating connection because of crash of another server
> process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> WARNING:  terminating connection because of crash of another server
> process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> WARNING:  terminating connection because of crash of another server
> process
> DETAIL:  The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> LOG:  all server processes terminated; reinitializing
> FATAL:  the database system is in recovery mode
> LOG:  database system was interrupted; last known up at 2013-08-30
> 16:36:42 CST
> LOG:  database system was not properly shut down; automatic recovery
> in progress
> LOG:  consistent recovery state reached at 0/B7657BD0
> LOG:  redo starts at 0/B60FE2B8
> LOG:  unexpected pageaddr 0/B044C000 in log file 0, segment 184,
> offset 4505600
> LOG:  redo done at 0/B844B940
> LOG:  autovacuum launcher started
> LOG:  database system is ready to accept connections
>
> -------------------------------------------------------------------------------------
>
> Best Regards
>




Re: My Experiment of PG crash when dealing with huge amount of data

From
Michael Paquier
Date:
On Fri, Aug 30, 2013 at 6:10 PM, 高健 <luckyjackgao@gmail.com> wrote:
> In log, I can see the following:
> LOG:  background writer process (PID 3221) was terminated by signal 9:
> Killed
Assuming that no users on your server manually killed this process, or
that no maintenance task you implemented did that, this looks like the
Linux OOM killer because of a memory overcommit. Have a look here for
more details:
http://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
So have a look at dmesg to confirm that, then you could use one of the
strategies described in the docs. Also, as you have been doing a bulk
INSERT, you should as well increase temporarily checkpoint_segments to
reduce the pressure on the background writer by reducing the number of
checkpoints happening. This will also make your data load faster.
--
Michael


Re: My Experiment of PG crash when dealing with huge amount of data

From
Jeff Janes
Date:
On Fri, Aug 30, 2013 at 2:10 AM, 高健 <luckyjackgao@gmail.com> wrote:
>
>
> postgres=# insert into test01 values(generate_series(1,2457600),repeat(
> chr(int4(random()*26)+65),1024));

The construct "values (srf1,srf2)" will generate its entire result set
in memory up front, it will not "stream" its results to the insert
statement on the fly.

To spare memory, you would want to use something like:

insert into test01 select generate_series,
repeat(chr(int4(random()*26)+65),1024) from
generate_series(1,2457600);

Cheers,

Jeff


Re: My Experiment of PG crash when dealing with huge amount of data

From
高健
Date:
>To spare memory, you would want to use something like:

>insert into test01 select generate_series,
>repeat(chr(int4(random()*26)+65),1024) from
>generate_series(1,2457600);

Thanks a lot!

What I am worrying about is that:  
If data grows rapidly, maybe our customer will use too much memory , Is ulimit  command a good idea for PG?

Best Regards



2013/9/1 Jeff Janes <jeff.janes@gmail.com>
On Fri, Aug 30, 2013 at 2:10 AM, 高健 <luckyjackgao@gmail.com> wrote:
>
>
> postgres=# insert into test01 values(generate_series(1,2457600),repeat(
> chr(int4(random()*26)+65),1024));

The construct "values (srf1,srf2)" will generate its entire result set
in memory up front, it will not "stream" its results to the insert
statement on the fly.

To spare memory, you would want to use something like:

insert into test01 select generate_series,
repeat(chr(int4(random()*26)+65),1024) from
generate_series(1,2457600);

Cheers,

Jeff

Re: My Experiment of PG crash when dealing with huge amount of data

From
Tom Lane
Date:
=?UTF-8?B?6auY5YGl?= <luckyjackgao@gmail.com> writes:
> If data grows rapidly, maybe our customer will use too much memory , Is
> ulimit  command a good idea for PG?

There's no received wisdom saying that it is.  There's a fairly widespread
consensus that disabling OOM kill can be a good idea, but I don't recall
that many people have tried setting specific ulimits on server processes.

Keep in mind that exceeding a ulimit would cause queries to fail outright
(whether the server was under much load or not), versus just getting
slower if the server starts to swap under too much load.  I can imagine
situations where that would be considered a good tradeoff, but it's hardly
right for everyone.

            regards, tom lane


Re: My Experiment of PG crash when dealing with huge amount of data

From
Jeff Janes
Date:
On Sun, Sep 1, 2013 at 6:25 PM, 高健 <luckyjackgao@gmail.com> wrote:
>>To spare memory, you would want to use something like:
>
>>insert into test01 select generate_series,
>>repeat(chr(int4(random()*26)+65),1024) from
>>generate_series(1,2457600);
>
> Thanks a lot!
>
> What I am worrying about is that:
> If data grows rapidly, maybe our customer will use too much memory ,


The size of the data has little to do with it.  Take your example as
an example.  The database could have been nearly empty before you
started running that query.  A hostile or adventurous user can craft
queries that will exhaust the server's memory without ever needing any
particular amount of data in data_directory, except maybe in the temp
tablespace.

So it is a matter of what kind of users you have, not how much data
you anticipate having on disk.

The parts of PostgreSQL that might blow up memory based on ordinary
disk-based tables are pretty well protected by shared_buffers,
temp_buffers, work_mem, maintenance_work_mem, etc. already.  It is the
things that don't directly map to data already on disk which are
probably more vulnerable.

> Is
> ulimit  command a good idea for PG?

I've used ulimit -v on a test server (which was intentionally used to
test things to limits of destruction), and was happy with the results.
 It seemed like it would error out the offending process, or just the
offending statement, in a graceful way; rather than having random
processes other than the culprit be brutally killed by OOM, or having
the machine just swap itself into uselessness.   I'd be reluctant to
use it on production just on spec that something bad *might* happen
without it, but if I started experiencing problems caused by a single
rogue process using outrageous amounts of memory, that would be one of
my first stops.

Experimentally, shared memory does count against the -v limit, and the
limit has to be set rather higher than shared_buffers, or else your
database won't even start.

Cheers,

Jeff


Re: My Experiment of PG crash when dealing with huge amount of data

From
高健
Date:
Thanks, I'll consider it carefully.
 
Best Regards

2013/9/3 Jeff Janes <jeff.janes@gmail.com>
On Sun, Sep 1, 2013 at 6:25 PM, 高健 <luckyjackgao@gmail.com> wrote:
>>To spare memory, you would want to use something like:
>
>>insert into test01 select generate_series,
>>repeat(chr(int4(random()*26)+65),1024) from
>>generate_series(1,2457600);
>
> Thanks a lot!
>
> What I am worrying about is that:
> If data grows rapidly, maybe our customer will use too much memory ,


The size of the data has little to do with it.  Take your example as
an example.  The database could have been nearly empty before you
started running that query.  A hostile or adventurous user can craft
queries that will exhaust the server's memory without ever needing any
particular amount of data in data_directory, except maybe in the temp
tablespace.

So it is a matter of what kind of users you have, not how much data
you anticipate having on disk.

The parts of PostgreSQL that might blow up memory based on ordinary
disk-based tables are pretty well protected by shared_buffers,
temp_buffers, work_mem, maintenance_work_mem, etc. already.  It is the
things that don't directly map to data already on disk which are
probably more vulnerable.

> Is
> ulimit  command a good idea for PG?

I've used ulimit -v on a test server (which was intentionally used to
test things to limits of destruction), and was happy with the results.
 It seemed like it would error out the offending process, or just the
offending statement, in a graceful way; rather than having random
processes other than the culprit be brutally killed by OOM, or having
the machine just swap itself into uselessness.   I'd be reluctant to
use it on production just on spec that something bad *might* happen
without it, but if I started experiencing problems caused by a single
rogue process using outrageous amounts of memory, that would be one of
my first stops.

Experimentally, shared memory does count against the -v limit, and the
limit has to be set rather higher than shared_buffers, or else your
database won't even start.

Cheers,

Jeff

Re: My Experiment of PG crash when dealing with huge amount of data

From
高健
Date:
Hello:

Sorry for disturbing again.
Some of my friends told me about cgroups, So I tried it first.
I found that cgroups can work for  task such as  wget. 
But it can't work for my postgres process.

[root@cent6 Desktop]# cat /etc/cgconfig.conf
#
#  Copyright IBM Corporation. 2007
#
#  Authors: Balbir Singh <balbir@linux.vnet.ibm.com>
#  This program is free software; you can redistribute it and/or modify it
#  under the terms of version 2.1 of the GNU Lesser General Public License
#  as published by the Free Software Foundation.
#
#  This program is distributed in the hope that it would be useful, but
#  WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See man cgconfig.conf for further details.
#
# By default, mount all controllers to /cgroup/<controller>

mount {
cpuset = /cgroup/cpuset;
cpu = /cgroup/cpu;
cpuacct = /cgroup/cpuacct;
memory = /cgroup/memory;
devices = /cgroup/devices;
freezer = /cgroup/freezer;
net_cls = /cgroup/net_cls;
blkio = /cgroup/blkio;
}

group test1 {
    perm {
          task{
              uid=postgres;
              gid=postgres;
          }
          
          admin{
             uid=root;
             gid=root; 
          }

    } memory {
       memory.limit_in_bytes=500M;
    }
}

[root@cent6 Desktop]# 

[root@cent6 Desktop]# service cgconfig status
Running
[root@cent6 Desktop]# 

When I start postgres and run the above sql statement, It still consume too much memory. As if cgroups does not work.

Best Regards


2013/9/3 高健 <luckyjackgao@gmail.com>
Thanks, I'll consider it carefully.
 
Best Regards

2013/9/3 Jeff Janes <jeff.janes@gmail.com>
On Sun, Sep 1, 2013 at 6:25 PM, 高健 <luckyjackgao@gmail.com> wrote:
>>To spare memory, you would want to use something like:
>
>>insert into test01 select generate_series,
>>repeat(chr(int4(random()*26)+65),1024) from
>>generate_series(1,2457600);
>
> Thanks a lot!
>
> What I am worrying about is that:
> If data grows rapidly, maybe our customer will use too much memory ,


The size of the data has little to do with it.  Take your example as
an example.  The database could have been nearly empty before you
started running that query.  A hostile or adventurous user can craft
queries that will exhaust the server's memory without ever needing any
particular amount of data in data_directory, except maybe in the temp
tablespace.

So it is a matter of what kind of users you have, not how much data
you anticipate having on disk.

The parts of PostgreSQL that might blow up memory based on ordinary
disk-based tables are pretty well protected by shared_buffers,
temp_buffers, work_mem, maintenance_work_mem, etc. already.  It is the
things that don't directly map to data already on disk which are
probably more vulnerable.

> Is
> ulimit  command a good idea for PG?

I've used ulimit -v on a test server (which was intentionally used to
test things to limits of destruction), and was happy with the results.
 It seemed like it would error out the offending process, or just the
offending statement, in a graceful way; rather than having random
processes other than the culprit be brutally killed by OOM, or having
the machine just swap itself into uselessness.   I'd be reluctant to
use it on production just on spec that something bad *might* happen
without it, but if I started experiencing problems caused by a single
rogue process using outrageous amounts of memory, that would be one of
my first stops.

Experimentally, shared memory does count against the -v limit, and the
limit has to be set rather higher than shared_buffers, or else your
database won't even start.

Cheers,

Jeff