Thread: Error while trying to back up database: out of memory
Hi.
First of all, sorry for my English. I hope you'll understand me.
We have a fairly large database: about 10 GB, with two tables holding binary data (~7 GB and ~2 GB, with 1-5 MB rows; no row is bigger than 6 MB).
We have 8 GB of RAM, enough swap, and 32-bit RHEL4 (I know, this is not very good).
We're required to take nightly per-table backups. I wrote a Python script for this, and it worked for a year.
But now I'm getting the following error:
pg_dump: WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
pg_dump: Dumping the contents of table "binary1" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.binary1 (pkid, obj, filename, contenttype) TO stdout;
pg_dump: [archiver (db)] connection to database "database" failed: FATAL: the database system is in recovery mode
<database now in recovery mode and no longer accepts connections>
Sometimes it happens with binary2, and sometimes it does not happen at all.
It seems the postmaster eats a LOT of RAM during the backup, but I can't understand why or how to fix it. Please help.
Here are the lines from the postgres log:
<....>
Sep 22 01:00:59 db1 postgres[10553]: [2-1] user=,db= LOG: server process (PID 17490) was terminated by signal 9: Killed
Sep 22 01:01:06 db1 postgres[10553]: [3-1] user=,db= LOG: terminating any other active server processes
Sep 22 01:01:06 db1 postgres[14485]: [224-1] user=user,db=database WARNING: terminating connection because of crash of another server process
Sep 22 01:01:06 db1 postgres[14462]: [38-1] user=user,db=database WARNING: terminating connection because of crash of another server process
Sep 22 01:01:06 db1 postgres[14485]: [224-2] user=u,db=d DETAIL: The postmaster has commanded this server process to roll back the current
Sep 22 01:01:06 db1 postgres[14485]: [224-3] transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Sep 22 01:01:06 db1 postgres[18037]: [256633-1] user=u,db=d WARNING: terminating connection because of crash of another server process
Sep 22 01:01:06 db1 postgres[14899]: [493508-1] user=u,db=d WARNING: terminating connection because of crash of another server process
Sep 22 01:01:06 db1 postgres[14899]: [493508-2] user=u,db=d DETAIL: The postmaster has commanded this server process to roll back the current
Sep 22 01:01:06 db1 postgres[17352]: [50-1] user=u,db=d WARNING: terminating connection because of crash of another server process
Sep 22 01:01:06 db1 postgres[14899]: [493508-3] transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Sep 22 01:01:06 db1 postgres[14462]: [38-2] user=u,db=d DETAIL: The postmaster has commanded this server process to roll back the current transaction
Sep 22 01:01:06 db1 postgres[18037]: [256633-2] user=u,db=d DETAIL: The postmaster has commanded this server process to roll back the current
Sep 22 01:01:06 db1 postgres[14485]: [224-4] user=u,db=d HINT: In a moment you should be able to reconnect to the database and repeat your command.
Sep 22 01:01:06 db1 postgres[18346]: [3-1] user=u,db=d WARNING: terminating connection because of crash of another server process
Sep 22 01:01:06 db1 postgres[17098]: [5-1] user=u,db=d WARNING: terminating connection because of crash of another server process
Sep 22 01:01:06 db1 postgres[17098]: [5-2] user=u,db=d DETAIL: The postmaster has commanded this server process to roll back the current transaction and
Sep 22 01:01:06 db1 postgres[17098]: [5-3] exit, because another server process exited abnormally and possibly corrupted shared memory.
<...>
And here is dmesg:
Free pages: 1523292kB (1510016kB HighMem)
Active:439981 inactive:1048491 dirty:355684 writeback:0 unstable:0 free:380823 slab:44677 mapped:85734 pagetables:1714
DMA free:12588kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:823719 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:688kB min:928kB low:1856kB high:2784kB active:3068kB inactive:3044kB present:901120kB pages_scanned:7524 all_unreclaimable? yes
protections[]: 0 0 0
HighMem free:1510016kB min:512kB low:1024kB high:1536kB active:1756856kB inactive:4190920kB present:7995392kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 1*4kB 3*8kB 3*16kB 3*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12588kB
Normal: 0*4kB 0*8kB 7*16kB 0*32kB 1*64kB 0*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 688kB
HighMem: 190092*4kB 37922*8kB 5006*16kB 5751*32kB 2746*64kB 48*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1510016kB
Swap cache: add 1759101, delete 1758447, find 1522743/1668224, race 3+238
158914 bounce buffer pages
Free swap: 5417376kB
2228224 pages of RAM
1867710 pages of HIGHMEM
150044 reserved pages
1200940 pages shared
660 pages swap cached
Out of Memory: Killed process 18141 (postmaster).
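A note on reading the dmesg output above: the Normal zone reports free:688kB against min:928kB, while HighMem still has about 1.5 GB free and most of the swap is unused. On a 32-bit kernel this pattern may point to exhaustion of low memory rather than a shortage of total RAM, though nothing in this thread confirms that diagnosis. A minimal Python sketch that flags zones whose free memory is under their min watermark (the helper name zones_below_min is hypothetical, and the sample is trimmed from the lines above):

```python
import re

# Matches per-zone summary lines such as "Normal free:688kB min:928kB ..."
ZONE_RE = re.compile(r"^(\w+) free:(\d+)kB min:(\d+)kB")

def zones_below_min(dmesg_text):
    """Return the names of zones whose free kB is below their min watermark."""
    low = []
    for line in dmesg_text.splitlines():
        m = ZONE_RE.match(line.strip())
        if m and int(m.group(2)) < int(m.group(3)):
            low.append(m.group(1))
    return low

sample = """\
DMA free:12588kB min:16kB low:32kB high:48kB
Normal free:688kB min:928kB low:1856kB high:2784kB
HighMem free:1510016kB min:512kB low:1024kB high:1536kB
"""

print(zones_below_min(sample))  # prints ['Normal']
```

Only the per-zone summary lines are matched; the buddy-allocator lines ("DMA: 1*4kB ...") are ignored because they have a colon after the zone name.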
============================================
#!/usr/bin/env python
import os
import datetime
import sys
import _pg as pg
## settings
###########################################################
host = "localhost"
user = "user"
dbs = ("database", ) # list of databases to backup
backup_path_tables = "/bla-bla-bla/dumps/"
###########################################################
## end of settings
# NOTE: 'password' is used below but is not defined in the script as posted.
date = datetime.date.today().strftime("%Y_%m_%d")
# 1: dump tables
for db in dbs:
    dbh = pg.connect(db, host, -1, None, None, user, password)
    ret = dbh.query("SELECT tablename FROM pg_tables WHERE schemaname='public';")
    for row in ret.dictresult():
        table = row['tablename']
        # skip the 'binobject' table; dump every other table to its own file
        if table != 'binobject':
            print("Dumping table '%s' from db '%s' " % (table, db))
            os.system("pg_dump -F p " + db + " -U " + user + " -t " + table + " > " + backup_path_tables + "/" + db + "_" + table + "_" + date + ".sql")
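The script above ignores the exit status of pg_dump, so a failed dump still leaves a (possibly truncated) .sql file behind without any signal to the caller. A minimal sketch of one way to build the same command as an argument list and return its exit code, assuming pg_dump is on the PATH; the function names build_dump_command, dump_path and dump_table are hypothetical and not part of the original script:

```python
import os
import subprocess

def build_dump_command(db, user, table):
    """Same flags as the original script: plain format, one table."""
    return ["pg_dump", "-F", "p", "-U", user, "-t", table, db]

def dump_path(backup_dir, db, table, date):
    """Output file named <db>_<table>_<date>.sql inside backup_dir."""
    return os.path.join(backup_dir, "%s_%s_%s.sql" % (db, table, date))

def dump_table(db, user, table, backup_dir, date):
    """Run pg_dump for one table; returns pg_dump's exit code (0 on success)."""
    with open(dump_path(backup_dir, db, table, date), "wb") as out:
        return subprocess.call(build_dump_command(db, user, table), stdout=out)
```

A caller could then log or retry whenever dump_table returns a non-zero value. Passing a list instead of a concatenated shell string also avoids quoting problems with unusual table names.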
--
Vladimir Rusinov
http://greenmice.info/
"Vladimir Rusinov" <vladimir@greenmice.info> writes:
> But now I'm getting following error:
> pg_dump: WARNING: terminating connection because of crash of another server
> process

As a rule of thumb, you should disable OOM kill on any server system.

However, you might want to look into why the system's aggregate memory
requirements have now increased from what they used to be. It seems
unlikely that this is pg_dump's fault per se, if you're running a
reasonably recent PG release. (There were some memory leaks inside
pg_dump, a long time ago...)

regards, tom lane
As Tom mentioned, it sounds like you're being bitten by the OOM killer. If, for some reason, you cannot run with it turned off, then add a really large swap space to delay the onset of sudden death by OOM for as long as possible. Is it possible your work_mem is set too high?
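On the work_mem question: in PostgreSQL, work_mem is a limit per sort or hash operation, not per server, so many backends running queries at once can together use far more than the configured value. A rough back-of-the-envelope sketch in Python; the function name and all figures are hypothetical and are not taken from the poster's configuration:

```python
def worst_case_work_mem_mb(work_mem_mb, backends, ops_per_query):
    """Upper bound on memory used for sorts/hashes if every backend
    runs a query with ops_per_query such operations at the same time."""
    return work_mem_mb * backends * ops_per_query

# Hypothetical figures: work_mem = 64 MB, 100 backends, 2 sorts per query
print(worst_case_work_mem_mb(64, 100, 2))  # prints 12800
```

This is only an upper bound for illustration; real usage depends on which queries actually sort or hash and how much data they process.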
On Mon, Sep 22, 2008 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Vladimir Rusinov" <vladimir@greenmice.info> writes:
>> But now I'm getting following error:
>> pg_dump: WARNING: terminating connection because of crash of another server
>> process
>
> As a rule of thumb, you should disable OOM kill on any server system.

This document describes a few solutions potentially better than outright disabling:
http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html .
(I don't know whether those solutions actually work or not, but may be worth trying by the look of it.)

Peter

> However, you might want to look into why the system's aggregate memory
> requirements have now increased from what they used to be. It seems
> unlikely that this is pg_dump's fault per se, if you're running a
> reasonably recent PG release. (There were some memory leaks inside
> pg_dump, a long time ago...)
>
> regards, tom lane
On Sun, Sep 28, 2008 at 2:18 AM, Peter Kovacs <peter.kovacs.1.0rc@gmail.com> wrote:
> On Mon, Sep 22, 2008 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Vladimir Rusinov" <vladimir@greenmice.info> writes:
>>> But now I'm getting following error:
>>> pg_dump: WARNING: terminating connection because of crash of another server
>>> process
>>
>> As a rule of thumb, you should disable OOM kill on any server system.
>
> This document describes a few solutions potentially better than
> outright disabling:
> http://www.redhat.com/archives/taroon-list/2007-August/msg00006.html .
> (I don't know whether those solutions actually work or not, but may be
> worth trying by the look of it.)

While there are better solutions for other types of servers, like web servers and what not, for PostgreSQL servers, overcommit isn't usually needed, and OOM killer / overcommit can both be disabled.
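The thread does not say how to disable overcommit. On Linux 2.6 kernels the usual knob is the vm.overcommit_memory sysctl, which the PostgreSQL documentation suggests setting to 2 (strict accounting) with "sysctl -w vm.overcommit_memory=2". Whether that applies cleanly to this RHEL4 system is an assumption. A small Python sketch to report the current mode; the helper names are hypothetical:

```python
# Meaning of the values of /proc/sys/vm/overcommit_memory on Linux
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}

def describe_overcommit(raw_value):
    """Translate the raw file content (e.g. '0\\n') into a description."""
    mode = int(raw_value.strip())
    return OVERCOMMIT_MODES.get(mode, "unknown mode %d" % mode)

def read_overcommit(path="/proc/sys/vm/overcommit_memory"):
    """Read and describe the current setting (Linux only)."""
    with open(path) as f:
        return describe_overcommit(f.read())
```

On a Linux host, print(read_overcommit()) shows which mode is active. To keep a changed value across reboots, it is normally added to /etc/sysctl.conf.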