Thread: Problem with data corruption and psql memory usage
Hello!

I'm new to PostgreSQL and ran an import of about 2.8 million rows with normal INSERT commands.

Config was (differences from the default config):

    listen_addresses = '*'
    temp_buffers = 20MB              # min 800kB
    work_mem = 20MB                  # min 64kB
    maintenance_work_mem = 32MB      # min 1MB
    fsync = off                      # turns forced synchronization on or off
    full_page_writes = off
    wal_buffers = 20MB

It crashed with a core dump (ulimit -c 0):

    LOG: server process (PID 12720) was terminated by signal 11
    LOG: terminating any other active server processes
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    HINT: In a moment you should be able to reconnect to the database and repeat your command.
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Afterwards I got the following error messages:

    WARNING: index "table_pkey" contains 2572948 row versions, but table contains 2572949 row versions
    HINT: Rebuild the index with REINDEX.
    LOG: server process (PID 13794) was terminated by signal 11
    LOG: terminating any other active server processes
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    HINT: In a moment you should be able to reconnect to the database and repeat your command.
    LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output error
    ERROR: storage sync failed on magnetic disk: Input/output error
    ERROR: could not access status of transaction 808464434
    DETAIL: Could not open file "pg_clog/0303": No such file or directory.

Afterwards I got:

    ERROR: could not access status of transaction 5526085

There were also some core dumps afterwards, for which I have a stack trace:

    #0 0x0807d241 in heap_deform_tuple ()
    #1 0x08095b8c in toast_delete ()
    #2 0x0809432e in heap_delete ()
    #3 0x0814bfa4 in ExecutorRun ()
    #4 0x081d7ece in FreeQueryDesc ()
    #5 0x081d80c1 in FreeQueryDesc ()
    #6 0x081d8979 in PortalRun ()
    #7 0x081d4480 in pg_parse_query ()
    #8 0x081d5a57 in PostgresMain ()
    #9 0x081ad4fe in ClosePostmasterPorts ()
    #10 0x081ae307 in PostmasterMain ()
    #11 0x0816dec0 in main ()

So my questions are:

1.) Are my settings too aggressive (fsync=off, full_page_writes=off)?
2.) Should PostgreSQL still recover after a crash with these 2 settings in effect, or is data corruption to be expected with them?
3.) Any ideas about the cause of the core dumps?

Write access was only from one session at a time. From other sessions I only ran select count(*) from table.

Afterwards I cleaned up the tables, did a pg_dumpall/restore session and an initdb, and reverted these 2 settings, and everything went fine.

I also had a problem with psql:

    psql < file.sql

=> psql took around 2GB of virtual memory, with heavy swapping. After Ctrl-C and restarting, it worked well. Any ideas?

The machine is stable, so I would say that a hardware failure is not the problem.

PostgreSQL version is 8.2.3 on FC6.

Thank you for the answer.

Ciao,
Gerhard

--
http://www.wiesinger.com/
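A note on the two transaction IDs in the errors above: the sketch below is an illustration, not something posted in the thread. It shows how an xid maps to a pg_clog segment file name, and what the same 32-bit value looks like when read as raw bytes. The constants assume a default PostgreSQL 8.x build (8 kB blocks, 4 transaction statuses per clog byte, 32 pages per segment), and the byte order assumes a little-endian x86 machine. The helper names are made up for this example.

```python
import struct

# Assumed defaults for a stock PostgreSQL 8.x build (not stated in the thread).
BLCKSZ = 8192                  # block size in bytes
CLOG_XACTS_PER_BYTE = 4        # two status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32    # pages per pg_clog segment file

XACTS_PER_SEGMENT = BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT


def clog_segment_name(xid: int) -> str:
    """Return the pg_clog file name that would hold the status of this xid."""
    return "%04X" % (xid // XACTS_PER_SEGMENT)


def xid_as_ascii(xid: int) -> str:
    """Show the xid's four bytes (little-endian) as printable ASCII, '.' otherwise."""
    raw = struct.pack("<I", xid)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    # The two xids reported in the error messages above.
    for xid in (808464434, 5526085):
        print(xid, hex(xid), clog_segment_name(xid), repr(xid_as_ascii(xid)))
```

The first xid lands in segment 0303, which matches the missing file in the error. Both values also happen to consist almost entirely of printable characters when viewed as bytes. An xid that reads like text is often taken as a hint that a tuple header was overwritten with unrelated data, although this alone does not say whether software or hardware did the overwriting.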
Gerhard Wiesinger <lists@wiesinger.com> writes:
> LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output
> error

[ raised eyebrow... ] I think your machine is flakier than you believe. This error is particularly damning, but the general pattern of weird failures all over the place seems to me to fit the idea of hardware problems much better than any other explanation. FC6 and PG 8.2.3 are both pretty darn stable for most people, so there's *something* wrong with your installation, and unstable hardware is the first thing that comes to mind.

regards, tom lane
Hello Tom!

I don't think this is a hardware problem. Machine runs 24/7 for around 4 years without any problems, daily backup with GBs of data to it, uptimes to the next kernel security patch, etc.

The only problem I could believe is:
I'm running the FC7 test packages of postgresql in FC6 and maybe there is a slight glibc library conflict or any other incompatibility.

Ciao,
Gerhard

--
http://www.wiesinger.com/

On Wed, 9 May 2007, Tom Lane wrote:

> Gerhard Wiesinger <lists@wiesinger.com> writes:
>> LOG: could not fsync segment 0 of relation 1663/16386/42726: Input/output
>> error
>
> [ raised eyebrow... ] I think your machine is flakier than you believe.
> This error is particularly damning, but the general pattern of weird
> failures all over the place seems to me to fit the idea of hardware
> problems much better than any other explanation. FC6 and PG 8.2.3 are
> both pretty darn stable for most people, so there's *something* wrong
> with your installation, and unstable hardware is the first thing that
> comes to mind.
>
> regards, tom lane
Gerhard Wiesinger <lists@wiesinger.com> writes:
> The only problem I could believe is:
> I'm running the FC7 test packages of postgresql in FC6 and maybe there is
> a slight glibc library conflict or any other incompatibility.

Hmm, I'd be suspicious of that too. You'd be well advised to take the FC7 SRPM and rebuild it locally to ensure you don't have any conflicts of that sort.

regards, tom lane
On Wed, 2007-05-09 at 11:18, Gerhard Wiesinger wrote:
> Hello Tom!
>
> I don't think this is a hardware problem. Machine runs 24/7 for around 4
> years without any problems, daily backup with GBs of data to it,
> uptimes to the next kernel security patch, etc.
>
> The only problem I could believe is:
> I'm running the FC7 test packages of postgresql in FC6 and maybe there is
> a slight glibc library conflict or any other incompatibility.

While I agree with Tom that you should look at recompiling the fc7 packages to fc6, hardware does break in strange ways sometimes. A piece of dust in just the right place, a bit of heat sink compound that finally migrated onto a circuit trace.

I'd test the hardware to be sure.
Hello Tom,

A late answer, but an answer :-) :

In the end it was a very strange hardware problem: a very small part of the RAM was defective, but the kernel never crashed. I also saw very strange behavior when verifying rpm packages with rpm -V. At first I suspected the hard disk. But then I flushed the OS caches:

    echo 3 > /proc/sys/vm/drop_caches

and rpm -V was correct. => RAM issue.

A memtest86+ run very quickly showed the defective RAM.

So PostgreSQL didn't have any issue :-)

Ciao,
Gerhard

--
http://www.wiesinger.com/

On Wed, 9 May 2007, Gerhard Wiesinger wrote:

> Hello Tom!
>
> I don't think this is a hardware problem. Machine runs 24/7 for around 4
> years without any problems, daily backup with GBs of data to it, uptimes to
> the next kernel security patch, etc.
>
> The only problem I could believe is:
> I'm running the FC7 test packages of postgresql in FC6 and maybe there is a
> slight glibc library conflict or any other incompatibility.
>
> Ciao,
> Gerhard
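The cache-flush check described in this message can be sketched in a few lines. This is only an illustration of the idea, not what was actually run (that was rpm -V plus the echo command). It hashes a file, drops the Linux page cache, and hashes the file again. The function names are invented for this example, dropping the cache needs root on Linux, and `os.sync()` needs Python 3.3 or later.

```python
import hashlib
import os
from typing import Callable


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_page_cache() -> None:
    """Linux only, requires root: flush dirty pages, then drop clean caches."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def stable_across_cache_drop(path: str,
                             drop: Callable[[], None] = drop_page_cache) -> bool:
    """Return True if the file hashes the same before and after dropping caches."""
    before = sha256_of(path)   # may be served from the page cache (RAM)
    drop()
    after = sha256_of(path)    # has to be read from disk again
    return before == after
```

If an unmodified file hashes differently before and after the drop, the cached copy and the on-disk copy disagree. That is only a hint, not a diagnosis: it fits bad RAM, as in this case, but a memory tester such as memtest86+ is still needed to confirm it.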