Losing data from Postgres - Mailing list pgsql-admin
From | Paul Breen |
---|---|
Subject | Losing data from Postgres |
Date | |
Msg-id | Pine.LNX.3.96.1001115152350.6198F-100000@cpark37.computerpark.co.uk Whole thread Raw |
Responses |
Re: Losing data from Postgres
Re: Losing data from Postgres |
List | pgsql-admin |
Hello everyone, Can anyone help us? We are using Postgres in a hotspare configuration, that is, we have 2 separate boxes both running identical versions of Postgres and everytime we insert|update|delete from the database we write to both boxes (at the application level). All communications to the databases are in transaction blocks and if we cannot commit to both databases then we rollback. Recently we have discovered that some records are missing from the main database table on the master box but are present in the table on the spare. We know its not a transaction problem (rolling back on master but not spare) because reports printed at the time of the records being written show the data in the master. It seems that some time afterwards the records just disappear from the master! Both boxes are raided (Raid 5), don't know if this is significant. Originally we were vacuuming twice a day but because some of the reports we produce regularly were taking too long as the database grew, we added multiple indexes onto the key tables and began vacuuming every hour. It's only after doing this that we noticed the data loss - don't know if this is coincidental or not. Yesterday we went back to vacuuming only twice a day. Looking in Postgres' server log we found some worrying error messages. Again, don't know if these are significant: 1000822.00:39:44.676 [5182] NOTICE: LockRelease: locktable lookup failed, no lock 515 instances on the master, 7 on the spare. 1000929.15:33:40.454 [20007] NOTICE: LockReplace: xid table corrupted 80 instances on the master, 0 on the spare - is this the problem? 1001027.11:55:04.071 [4847] NOTICE: Index dttmprod_ndx: NUMBER OF INDEX' TUPLES (33586) IS NOT THE SAME AS HEAP' (33578) 1065 instances on the master, 20 on the spare. The reason there is so many of these messages is that we didn't spot this error until after we had done a lot of vacuums. Since rebuilding the indexes (dropping & re-creating) we haven't seen this one again. We are using Postgres version 6.5.3. I know its old but this is the customer standard and we are loathe to upgrade unless its a problem inherent with this version. From a cost-of-ownership viewpoint, its easier to support a single version than multiple versions. The OS is SuSE Linux 6.3 (kernel 2.2.13), the control system programs talk to Postgres via libpq, the web interface is PHP3 and the Tcl/Tk interface is pgtksh. The Raid controller is a DPT SmartCache 4 (28 MB RAM). Both the master and the spare boxes are Intel Pentium III 550Mhz with 256MB RAM. If we've left anything out, write and we'll supply more information. Any help, especially around Postgres' server log messages would be greatly appreciated. Paul M. Breen, Software Engineer - Computer Park Ltd. Tel: (01536) 417155 Email: pbreen@computerpark.co.uk
pgsql-admin by date: