On Tue, Dec 27, 2011 at 4:07 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> Googling around, it sounds like this is often due to table
>>> corruption, which would be unfortunate, but usually seems to be
>>> repeatable. I can re-run that query without issue, and in fact can
>>> select * from the entire table without issue. I do see the row was
>>> updated a few minutes after this error, so is it wishful thinking
>>> that vacuum came around and successfully removed the old, corrupted
>>> row version?
>>
>> It also happens that 18446744073709551613 is -3 in 64-bit 2's
>> complement if it was unsigned. Is it possible that -3 was some error
>> return code that got cast and then passed directly to malloc()?
>
> That's not likely. The corruption is usually the cause, when it hits
> varlena header - that's where the length info is stored. In that case
> PostgreSQL suddenly thinks the varlena field has a negative value (and
> malloc accepts unsigned integers).
If the problem truly went away, one likely possibility is that the bad
tuple was simply deleted -- occasionally the corruption is limited to
a tuple or two but doesn't spill over into the page itself -- in such
situations some judicious deletion of rows can get you to a point
where you can pull off a dump.
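
For reference, the arithmetic from the quoted question can be checked with a
short Python sketch. It only illustrates the two's-complement
reinterpretation, assuming a 64-bit size_t; the helper names are made up for
this example and are not PostgreSQL functions.

```python
BITS = 64
MASK_64 = (1 << BITS) - 1  # all 64 bits set


def as_unsigned_64(value: int) -> int:
    """Reinterpret a signed integer as an unsigned 64-bit value."""
    return value & MASK_64


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed integer."""
    return value - (1 << BITS) if value >= (1 << (BITS - 1)) else value


if __name__ == "__main__":
    # A length of -3 handed to an allocator that takes an unsigned size
    # (assumption: size_t is 64 bits wide) shows up as a huge request.
    print(as_unsigned_64(-3))
    print(as_signed_64(18446744073709551613))
```

So a request for 18446744073709551613 bytes is consistent with a length of -3
reaching the allocator, whichever way that -3 came about.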
merlin