Thread: could not access status of transaction 825832753

From
Stephen Tyler
Date:
I just got this error and don't know what caused it:

7/12/09 2:57:24 PM    org.postgresql.postgres[89]    ERROR:  could not access status of transaction 825832753
7/12/09 2:57:24 PM    org.postgresql.postgres[89]    DETAIL:  Could not open file "pg_clog/0313": No such file or directory.
7/12/09 2:57:24 PM    org.postgresql.postgres[89]    STATEMENT:  select u.link, u.url from link_relurl as u left join link_meta as m on (m.link = u.link) where u.url like 'http://www.somedomain.com/%' and released not in  (2,4) and url not like 'http://www.somedomain.com/blah%' order by length(url) limit 200;

Retrying the SQL resulted in the same error.

I immediately ran pg_dump on the entire database.  No errors reported.

I checked the disk volume.  No problems found.  No console message about disk errors.  SMART status is OK.

I quit psql, and then restarted psql and re-entered the SQL.  The statement succeeded.

I ran "vacuum analyze <tablename>" on both of the tables in the SQL statement.  No errors.  Both tables are quite large (around 20 GB).

I then ran "select count(*) from link_relurl" and my Mac crashed hard (the multilingual grey screen asking me to hold the power button down).

After reboot, "select count(*) from link_relurl" (and the other table) succeeded.

pg_clog/ contains 146 files from 03F1 to 0482, so pg_clog/0313 is long gone.
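
For reference, here is a rough sketch of how the transaction ID in the error maps to the missing segment file name. It assumes a default build (8192-byte blocks, two status bits per transaction, 32 pages per segment file); the constant and function names are only illustrative and are not taken from the Postgres source.

```python
# Sketch: map a transaction ID to its pg_clog segment file name.
# Assumptions (default build): 8192-byte blocks, 2 status bits per
# transaction (4 transactions per byte), 32 pages per segment file.
# All names below are illustrative, not Postgres identifiers.
BLOCK_SIZE = 8192
XACTS_PER_BYTE = 4
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLOCK_SIZE * XACTS_PER_BYTE * PAGES_PER_SEGMENT  # 1,048,576


def clog_segment(xid: int) -> str:
    """Return the four-hex-digit segment file name that covers xid."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


# Transaction ID from the error message.
print(clog_segment(825832753))  # 0313

# Number of segments in the range 03F1 through 0482 inclusive.
print(int("0482", 16) - int("03F1", 16) + 1)  # 146
```

Under those assumptions the transaction ID falls in a segment well before 03F1, the oldest one still on disk, which is consistent with the file no longer existing.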

I've searched past messages and found references to disk corruption and advice to rebuild the entire database.  Is that still the advice?  Is there any way to check that the database is not corrupted?  Is running "vacuum analyze" on a table enough to prove it is not corrupted?

My details:

Mac Pro 2009 Quad 2.93 with 16G of ECC RAM
Snow Leopard 10.6.2 in 64-bit mode, fully patched
Database on RAID 0 array of SSDs
Postgres 8.4.1, 64-bit, compiled from source

I just installed Windows 7 in Boot Camp (on a different disk) and rearranged the SATA cabling.  But since the problem "disappeared" on reboot, I'm thinking the corruption, if any, was in RAM and not on disk.

Stephen