Thread: memory leak under heavy load?
Hi,

I think I've encountered a bug in PostgreSQL 8.1. That said, I'm not really into submitting it to -bugs, as I have no way to reproduce it reliably.

Basically: I have a server with dual Opterons, 4 GB of memory, and 2 GB of swap, everything running under CentOS 4.2. PostgreSQL 8.1 was compiled from source using:
./configure \
--prefix=/home/pgdba/work \
--without-debug \
--disable-debug \
--with-pgport=5810 \
--with-tcl \
--with-perl \
--with-python \
--without-krb4 \
--without-krb5 \
--without-pam \
--without-rendezvous \
--with-openssl \
--with-readline \
--with-zlib \
--with-gnu-ld
postgresql.conf looks like this (I removed the commented lines):
listen_addresses = '*'
max_connections = 250
superuser_reserved_connections = 10
password_encryption = on
shared_buffers = 50000
temp_buffers = 1000
max_prepared_transactions = 250
work_mem = 10240
maintenance_work_mem = 131072
max_fsm_pages = 500000
max_fsm_relations = 5000
fsync = off
wal_buffers = 100
commit_delay = 1000
commit_siblings = 5
checkpoint_segments = 100
effective_cache_size = 196608
random_page_cost = 1.5
default_statistics_target = 50
log_destination = 'stderr'
redirect_stderr = on
log_directory = '/home/pgdba/logs/'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1440
log_rotation_size = 502400
log_min_duration_statement = 5000
log_connections = on
log_duration = off
log_line_prefix = '[%t] [%p] <%u@%d> '
log_statement = 'none'
stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on
autovacuum = off
autovacuum_naptime = 60
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 500
autovacuum_vacuum_scale_factor = 0.4
autovacuum_analyze_scale_factor = 0.2
check_function_bodies = on
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
custom_variable_classes = 'plperl'
plperl.use_strict = true
Everything works nicely, but: I ran a loop of about 400,000 inserts wrapped in transactions, two inserts per transaction. In total that was nearly 200,000 transactions, all very fast, with no (or very little) time between them. The inserts went to two distinct tables, and (maybe this is important) about 99% of them failed with a "unique index violation".

What I saw is that the postmaster process started to "eat" memory. It allocated *all* memory (both RAM and swap), and then died. Load on the machine jumped to around 20.

It's very strange to me, since the next such run didn't break Postgres, but then another one did. I am unable to replay the scenario with a 100% guarantee that it will crash the backend.

As for the inserts: there were no triggers, no rules, not even foreign keys. The encoding is UTF-8.

Has anybody else encountered this? What can I do to make it possible to fix the problem?

And yes, I know I should use COPY instead of INSERTs, and bigger transactions, but I would like to have the problem fixed, not worked around. Actually, I do have a workaround in place for now, and it works: every 100 transactions I disconnect and reconnect. That way it keeps running.
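The workaround amounts to something like this (a sketch; the table, credentials, and batch size here are illustrative, not the original ones):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Sketch of the workaround: reconnect every 100 transactions so the
# backend process (and whatever memory it accumulates) is thrown away.
my $dbh;
my $txns = 0;

sub fresh_handle {
    $dbh->disconnect() if $dbh;
    $dbh = DBI->connect("dbi:Pg:dbname=leak", "pgdba", "",
        { AutoCommit => 0, RaiseError => 0, PrintError => 0 })
        or die "cannot connect\n";
}

fresh_handle();
for my $n (1 .. 400_000) {
    # Most of these fail with a unique-index violation, as in the real run.
    $dbh->do("INSERT INTO users (username) VALUES (?)", undef, "user$n");
    $dbh->commit();
    fresh_handle() if ++$txns % 100 == 0;   # drop the backend, start clean
}
$dbh->disconnect();
```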
depesz
On 11/29/05, hubert depesz lubaczewski <depesz@gmail.com> wrote:
> i think i've encountered a bug in postgresql 8.1.

Now I'm nearly positive it's a bug. I created the database this way:

CREATE DATABASE leak;
\c leak
CREATE TABLE users (id serial PRIMARY KEY, username TEXT NOT NULL DEFAULT '', password TEXT);

Then I made a list of "usernames" from "aaaa" to "czzz", keeping each one with 97% probability (so about 3% are missing), and COPYed this list into users.
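The generator itself wasn't posted; a sketch that does the same thing (output is one username per line, suitable for COPY; the filename is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the unposted generator: emit "aaaa" .. "czzz", keeping each
# username with 97% probability, one per line for COPY.
open my $fh, '>', 'usernames.txt' or die "cannot open: $!";
for (my $u = "aaaa"; ; $u++) {       # Perl string auto-increment
    print {$fh} "$u\n" if rand() < 0.97;
    last if $u eq "czzz";
}
close $fh;
```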
Then:

CREATE UNIQUE INDEX xxx ON users (username);
VACUUM VERBOSE ANALYZE;

After all of this I ran this script:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# AutoCommit off; errors are ignored on purpose -- most INSERTs are
# expected to fail with a unique-index violation.
my $dbh = DBI->connect("dbi:Pg:dbname=leak", "pgdba", "",
    { AutoCommit => 0, RaiseError => 0, PrintError => 0 })
    or die "cannot connect\n";

for (1 .. 1000) {
    my $i = 0;
    my $q = "zzz";    # string auto-increments: zzz -> aaaa -> ... -> fzzz
    while (1) {
        $dbh->rollback();    # clear any aborted-transaction state
        $dbh->do("INSERT INTO users (username) VALUES ('$q')");
        $dbh->commit();
        $q++;
        last if $q eq 'fzzz';
        $i++;
        system("ps uxf") if 0 == $i % 1000;    # watch backend memory
    }
    print "one iteration done\n";
    <>;    # wait for Enter before the next iteration
}
$dbh->disconnect();
Technically, I think memory usage should not increase after the first "iteration", but it does. It grows in small amounts, but it does grow. Can anybody test/confirm the problem?
depesz
Are you sure this isn't just PostgreSQL caching data? A complete test case would help, too (i.e., whatever you used to generate the initial data).

--
Jim C. Nasby, Sr. Engineering Consultant    jnasby@pervasive.com
Pervasive Software    http://pervasive.com
On 11/29/05, Jim C. Nasby <jnasby@pervasive.com> wrote:
> Are you sure this isn't just PostgreSQL caching data?

I am not sure, but what bothers me is this: I set a limit of 50000 shared buffers, which works out to roughly 400 MB (50000 x 8 kB pages). How did it end up using 6 GB?

depesz
I have observed memory leaks as well... Is it happening on the Postgres side? I can send the valgrind logs.
From: pgsql-general-owner@postgresql.org on behalf of hubert depesz lubaczewski
Sent: Wed 11/30/2005 12:59 PM
To: Jim C. Nasby
Cc: PostgreSQL General
Subject: Re: [GENERAL] memory leak under heavy load?
> > Are you sure this isn't just PostgreSQL caching data?
>
> i am not sure. but what bothers me is:
> i setup limit of shared memory to 50000 buffers - which gives estimate
> 400 megabytes. how come it ended up using 6GB ?
>
> depesz
Probably best to open up a bug report.

--
Jim C. Nasby, Sr. Engineering Consultant    jnasby@pervasive.com
Pervasive Software    http://pervasive.com
> i think i've encountered a bug in postgresql 8.1.
> ...
> what i say is that postmaster user started to "eat" memory.
> it allocated *all* memory (both ram and swap), and then died.
> load on the machine jumped to something around 20.

I noticed a similar occurrence. We have a high-load PostgreSQL database -- not a ridiculous number of inserts or updates, but a huge variety of diverse queries on some 200 tables. We had noticed load averages of 3-4 on our database for the past couple of days. Then, this morning, Postgres got killed twice by the Linux out-of-memory process killer. (Also on a dual Opteron with 4 GB of memory.)

We were showing 3.5 GB of memory allocated to *something*, but stopping Postgres completely for a few seconds didn't lower the number. It wasn't held by any process, which leads me to believe that it's a kernel bug. One reboot later, everything is rosy: load hovers around 1.2, there's enough free memory to have a 2.5 GB buffer cache, and swap is untouched.

PostgreSQL 7.4 had run on this box flawlessly for six months -- bad RAM forced us to take it down -- then again for another month until we upgraded to 8.1 last week. Like the original poster, we're set up for ~500 MB of shared memory; certainly not enough to make the kernel kill -9 the postmaster. The kernel is 2.6.11-gentoo-r6, same as before the upgrade.

Also, this didn't happen in our test environment, which uses a similar but x86 server. Perhaps this is AMD64 related?

--Will Glynn
Freedom Healthcare
Will Glynn <wglynn@freedomhealthcare.org> wrote:
> Postgres completely for a few seconds didn't lower the number. It wasn't
> taken by any process, which leads me to believe that it's a kernel bug.

If it was a shared memory segment allocated a particular way (I *think* it's "shm_open", I'm not 100% sure), it's not erroneous for the kernel to leave it behind after all processes are gone; see http://lists.debian.org/debian-apache/2004/06/msg00188.html .

If Postgres needs this much shared memory and wants it to go away on a crash, I think (again, I'm a neophyte at this still; I haven't even fixed mod_bt for this yet) that an mmap()ed file is the way to go... but then don't you need enough hard drive space to back your shared memory? I don't know; this whole thing is confusing...

- Tyler
On Fri, Dec 02, 2005 at 03:53:20PM -0800, Tyler MacDonald wrote:
> If it was a shared memory segment allocated a particular way (I *think*
> it's "shm_open", I'm not 100% sure), it's not erroneous for the kernel
> to leave it behind after all processes are gone...

But this shouldn't be an issue here. If you set the IPC_RMID flag then the kernel should remove the segment when all users go away. This is standard IPC behaviour and is documented in the manpage...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
> But this shouldn't be an issue here. If you set the IPC_RMID flag then
> the kernel should remove the segment when all users go away. This is
> standard IPC behaviour and is documented in the manpage...

Would you please tell me where to find the manpage and how to set the IPC_RMID flag?

--
Kathy Lo
On Thu, Dec 08, 2005 at 12:29:11PM +0800, Kathy Lo wrote:
> Would you please tell me where to find the manpage and how to set the
> IPC_RMID flag?

See the shmctl() manpage:

    int shmctl(int shmid, int cmd, struct shmid_ds *buf);

One of the command ids is IPC_RMID.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
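For illustration, the same System V calls are also available as Perl built-ins (fitting the rest of this thread); a minimal sketch using a throwaway private segment:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRWXU);

# Create a small private shared memory segment, then mark it with
# IPC_RMID: the kernel removes it once the last attached process detaches.
my $id = shmget(IPC_PRIVATE, 4096, S_IRWXU);
defined $id or die "shmget: $!";
shmctl($id, IPC_RMID, 0) or die "shmctl: $!";
print "segment $id marked for removal\n";
```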
On 12/8/05, Martijn van Oosterhout <kleptog@svana.org> wrote:
> See the shmctl() manpage:
>
>     int shmctl(int shmid, int cmd, struct shmid_ds *buf);
>
> One of the command ids is IPC_RMID.

Do I need to change the source code of PostgreSQL if I want to set the IPC_RMID flag to solve this problem?

--
Kathy Lo
On Fri, Dec 09, 2005 at 12:15:44PM +0800, Kathy Lo wrote:
> Do I need to change the source code of postgresql if I want to set
> IPC_RMID flag to solve this problem?

No, because it's completely unrelated. The amount of shared memory doesn't vary while the system is running, and it gets removed once you restart Postgres. So it can't cause you to run out of memory (unless you allocated a truly huge amount of memory that way, but that's bad for other reasons). At worst it gets pushed into swap until you next start Postgres.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/