Re: could not access status of transaction - Mailing list pgsql-hackers

From Robert Haas
Subject Re: could not access status of transaction
Date
Msg-id CA+TgmoYc0cmQKd+ogi=BwRqwnQ21ooSEj0O84wKenxAiPzZT+Q@mail.gmail.com
In response to Re: could not access status of transaction (chenhj <chjischj@163.com>)
List pgsql-hackers
On Sun, Jan 5, 2020 at 11:00 PM chenhj <chjischj@163.com> wrote:
> According to the above information, the flags of the heap page (163363)
> containing the problem tuple (163363, 9) are 0x0001 (HAS_FREE_LINES);
> that is, ALL_VISIBLE is not set.
>
> However, according to the hexdump of the corresponding vm file, that
> block (located at 9F88 + 6 bits) has the VISIBILITYMAP_ALL_FROZEN and
> VISIBILITYMAP_ALL_VISIBLE flags set. That is, the heap file and the vm
> file are inconsistent.

That's not supposed to happen, and represents data corruption. Your
previous report of a too-old xmin surviving in the heap is also
corruption.  There is no guarantee that both problems have the same
cause, but suppose they do. One possibility is that a write to the
heap page may have gotten lost or undone. Suppose that, while this
page was in shared_buffers, VACUUM came through and froze it, setting
the bits in the VM and later truncating CLOG. Then, suppose that when
that page was evicted from shared_buffers, it didn't really get
written back to disk, or alternatively it did, but then later somehow
the old version reappeared. I think that would produce these symptoms.
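
To make the cross-check from the quoted report concrete, here is a
minimal Python sketch of how a heap block number maps to a position in
the VM fork, and of the invariant that is being violated. It is only a
sketch under stated assumptions: a default build (8 kB blocks, 24-byte
page header), the two-bits-per-heap-block VM layout used since
PostgreSQL 9.6, and flag values as I understand them from the server
headers. The function names are made up for illustration and are not
part of any PostgreSQL tool.

```python
# Sketch only. Constants assume a default build and the 9.6+ VM layout.
BLCKSZ = 8192                 # assumed block size
PAGE_HEADER_SIZE = 24         # assumed MAXALIGN'd page header size
BITS_PER_HEAPBLOCK = 2        # all-visible bit + all-frozen bit
HEAPBLOCKS_PER_BYTE = 8 // BITS_PER_HEAPBLOCK
MAPSIZE = BLCKSZ - PAGE_HEADER_SIZE          # bitmap bytes per VM page
HEAPBLOCKS_PER_PAGE = MAPSIZE * HEAPBLOCKS_PER_BYTE

VISIBILITYMAP_ALL_VISIBLE = 0x01
VISIBILITYMAP_ALL_FROZEN = 0x02
PD_ALL_VISIBLE = 0x0004       # heap page header flag (pd_flags)


def vm_location(heap_blk):
    """Return (VM page, byte within that page's bitmap, bit shift)."""
    map_page = heap_blk // HEAPBLOCKS_PER_PAGE
    map_byte = (heap_blk % HEAPBLOCKS_PER_PAGE) // HEAPBLOCKS_PER_BYTE
    bit_shift = (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK
    return map_page, map_byte, bit_shift


def vm_offsets(heap_blk):
    """Return (offset counting bitmap bytes only, raw offset in the file)."""
    map_page, map_byte, _ = vm_location(heap_blk)
    payload_offset = map_page * MAPSIZE + map_byte
    file_offset = map_page * BLCKSZ + PAGE_HEADER_SIZE + map_byte
    return payload_offset, file_offset


def vm_bits(vm_file_bytes, heap_blk):
    """Extract the two VM flag bits for heap_blk from raw VM file contents."""
    _, file_offset = vm_offsets(heap_blk)
    _, _, bit_shift = vm_location(heap_blk)
    return (vm_file_bytes[file_offset] >> bit_shift) & 0x03


def heap_and_vm_agree(pd_flags, vm_flags):
    """A VM all-visible bit requires PD_ALL_VISIBLE on the heap page.

    The reverse is allowed: the page flag may be set while the VM bit is clear.
    """
    if vm_flags & VISIBILITYMAP_ALL_VISIBLE:
        return bool(pd_flags & PD_ALL_VISIBLE)
    return True


if __name__ == "__main__":
    print(vm_location(163363))
    print([hex(off) for off in vm_offsets(163363)])
    print(heap_and_vm_agree(0x0001, 0x03))
```

Under those assumptions, block 163363 lands in VM page 5, byte 0 of
that page's bitmap, at bit shift 6. Counting bitmap bytes only, that is
byte 0x9F88, which matches the figures in the quoted report; counting
page headers as well, the raw file offset comes out as 0xA018. With
pd_flags = 0x0001 on the heap page and both VM bits set (0x03),
heap_and_vm_agree() returns False, which is the inconsistency being
described.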

I think that bad hardware could cause this, or running two copies of
the server on the same data files at the same time, or maybe some kind
of filesystem-related flakiness, especially if, for example, you are
using a network filesystem like NFS, or maybe a broken iSCSI stack.
There is also no reason it couldn't be a bug in PostgreSQL itself,
although if we lost page writes routinely somebody would surely have
noticed by now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


