Jigar Shah <jshah@pandora.com> writes:
> Postgres version = 9.1.2
Um, you do realize this is over a year out of date, right?
(Fortunately, you will have an excellent opportunity to update tomorrow.)
> A few days ago we had a situation where our primary started to throw the error messages below, indicating corruption
> in the database. It crashed sometimes and showed a panic message in the logs:
> [d: u:radio p:31917 242] ERROR: could not open file "base/16384/114846.39" (target block 360448000): No such file or directory
> [d: u:radio p:31917 243]
> 2013-03-27 11:07:51.348 PDT FATAL: corrupted item pointer: offset = 0, size = 0
> 2013-03-27 11:07:51.348 PDT CONTEXT: xlog redo split_l: rel 1663/16384/115085 left 4256959, right 5861610, next 5044459, level 0, firstright 192
Look up relfilenodes 114846 and 115085 in pg_class of whichever database
has OID 16384. I'm guessing the latter is an index of the former. If
that's true, then both of these messages suggest corruption in the index
--- the latter pretty obviously, and the former because it looks like
it's an attempt to fetch from a silly block number, which could have
come out of a corrupted index entry. So if you're really lucky and
nothing but that index is corrupted, a REINDEX will fix it. Personally
I'd be wondering about what's the underlying cause and whether there is
corruption elsewhere, though. Try looking for evidence of flaky RAM or
flaky disk drives on your primary. See if you can pg_dump (not just
for forensic reasons, but so you've got some kind of backup if things
go downhill from here).
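
To make the lookup concrete, here is a minimal Python sketch that pulls the
identifiers out of the two log messages and prints catalog queries one could
paste into psql. It assumes the default build settings (8 kB blocks, 1 GB
segment files); the helper names are made up for illustration, and the second
query only gives meaningful answers when run while connected to the database
whose OID is 16384.

```python
import re

# Assumptions: default build settings. A custom build may differ.
BLOCK_SIZE = 8192                 # assumed default block size, in bytes
SEGMENT_BYTES = 1 << 30           # assumed default segment file size (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE

def parse_data_file(path):
    """Split 'base/<database OID>/<relfilenode>[.<segment>]' into integers."""
    m = re.fullmatch(r"base/(\d+)/(\d+)(?:\.(\d+))?", path)
    if m is None:
        raise ValueError("unrecognized data file path: " + path)
    db_oid, relfilenode, segment = m.groups()
    return int(db_oid), int(relfilenode), int(segment or 0)

def segment_for_block(block):
    """Segment file number that would hold the given block."""
    return block // BLOCKS_PER_SEGMENT

def catalog_queries(db_oid, relfilenodes):
    """Build the lookups; run the pg_class one inside the database found first."""
    nodes = ", ".join(str(n) for n in sorted(relfilenodes))
    return [
        "SELECT datname FROM pg_database WHERE oid = %d;" % db_oid,
        "SELECT oid::regclass AS relation, relkind, relfilenode "
        "FROM pg_class WHERE relfilenode IN (%s);" % nodes,
    ]

# Values copied from the two log messages quoted above.
db_oid, relfilenode, missing_segment = parse_data_file("base/16384/114846.39")
target_segment = segment_for_block(360448000)
queries = catalog_queries(db_oid, [relfilenode, 115085])

for q in queries:
    print(q)
print("missing segment:", missing_segment)
print("segment the target block would need:", target_segment)
```

Under those assumptions the requested block would sit thousands of segments
past the file the server failed to open, which is why the block number looks
silly rather than like a legitimate read. In the pg_class output, relkind 'r'
marks an ordinary table and 'i' an index.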
regards, tom lane