BUG #3484: Missing pg_clog file / corrupt index - Mailing list pgsql-bugs
From | Marc Schablewski |
---|---|
Subject | BUG #3484: Missing pg_clog file / corrupt index |
Date | |
Msg-id | 200707251402.l6PE2UcM081666@wwwmaster.postgresql.org Whole thread Raw |
Responses |
Re: BUG #3484: Missing pg_clog file / corrupt index
|
List | pgsql-bugs |
The following bug has been logged online: Bug reference: 3484 Logged by: Marc Schablewski Email address: ms@clickware.de PostgreSQL version: 8.1.8 Operating system: SuSE Linux 10.0 / Kernel 2.6.13-15.8-smp (x86-64) Description: Missing pg_clog file / corrupt index Details: Our application receives and processes payment information that comes in plain text files and stores the processed data into different tables in our database. There are about 5 tables involved, three of them with 35 million records so far. We get approximately 150000 payments a day. Each payment is handled in a single transaction, because in case of an error, we want to store as many payments as possible. We have about 500000 INSERT and a few UPDATE statements each day. The whole application runs on two servers (see specs below) which are nearly identical. One is the production system, the other is for testing. A few months ago we had some trouble with the test system. Postgres complained about a missing pg_clog file during nightly routine VACUUM/ANALYZE. Some days later, the same error occurred on the production system, even on the same table! The corrupted table is one of those bigger ones involved into the file processing. After searching the web we found a hint that this problem could be related to a bug in 8.1.3, so we upgraded to 8.1.8 and restored the databases on both servers. This was around May, but now we discovered the same problem on our production system again. Actually, the error occurred four weeks ago, but it was not discovered until now â if you do error logging, you should look at it from time to time ;) When trying to backup or vacuum the database, we receive one of the following error messages: **BACKUP** pg_dump: ERROR: could not access status of transaction 2134240 DETAIL: could not open file "pg_clog/0002": Datei oder Verzeichnis nicht gefunden pg_dump: SQL command to dump the contents of table "dateiblock" failed: PQendcopy() failed. pg_dump: Error message from server: ERROR: could not access status of transaction 2134240 DETAIL: could not open file "pg_clog/0002": Datei oder Verzeichnis nicht gefunden pg_dump: The command was: COPY public.dateiblock (id, eda_id, dbt_id, lfdnr, binaer_offset, laenge) TO stdout; **VACUUM** INFO: vacuuming "public.dateiblock" ERROR: could not access status of transaction 2134240 DETAIL: could not open file "pg_clog/0002": Datei oder Verzeichnis nicht gefunden (For those not familiar to the German language: âDatei oder Verzeichnis nicht gefundenâ means âfile or directory not foundâ). Current pg_clogs range from 005A to 008F. The oldest one is dated to April 30th. We narrowed it down to a few records in that table. Some records contain unreasonable values, others produce the same message about the missing pg_clog file when selected and some are simply missing. But they must have existed, because there are still records in a second table referencing them. One strange thing about this is, that the referencing records are about two and a half months old and shouldnât been touched since then. We donât think this is a hardware issue, because we had it on two different servers and within a short period of time. Luckily, the loss of data is minimal. There are only about 30 records affected. Otherwise this would have been fatal, because as said before, our backup was not working either. In addition, we had another problem today. One of the indexes on a second table became corrupted, causing the database backend to rollback all transactions and restart the processes. A REINDEX fixed it, but it leaves me with a bad feeling about what will break next. 2007-07-25 08:07:00 CEST PANIC: right sibling's left-link doesn't match 2007-07-25 08:07:00 CEST STATEMENT: insert into transaktion (kaz_id, dbl_id, sta_id, kss_id, summe, zeitpunkt, tracenr, terminalid, status, kartennr , wae_id, kassenschnitt, freigabe, flo_id, importdatum) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15) 2007-07-25 08:07:00 CEST LOG: server process (PID 5699) was terminated by signal 6 2007-07-25 08:07:00 CEST LOG: terminating any other active server processes 2007-07-25 08:07:00 CEST WARNING: terminating connection because of crash of another server process 2007-07-25 08:07:00 CEST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. Kind regards, Marc Schablewski System: OS: SUSE LINUX 10.0 (x86-64), 2.6.13-15.8-smp x86_64 System: 2x Intel(R) Xeon(TM) CPU 2.80GHz Dual Core, 4GB RAM (HP proliant server) 3Ware 9500S-4LP, 2xRAID1 (one for OS & one for database) Postgres 8.1.8 Postgres was compiled manually but with default parameters. The configuration has been tuned to improve performance. shared_buffers = 30000 maintenance_work_mem = 131072 max_fsm_pages = 500000 default_statistics_target = 200 redirect_stderr = on log_directory = '/var/log/pg_log' log_min_duration_statement = 10 log_duration = on log_line_prefix = '%t ' stats_start_collector = on stats_command_string = on stats_block_level = on stats_row_level = on stats_reset_on_server_start = off VACUUM ANALYZE runs once a day, auto_vacuum is off.
pgsql-bugs by date: