Re: could not access status of transaction - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: could not access status of transaction
Date
Msg-id cca621ea-0e93-4515-1266-f21bcba5ae65@2ndquadrant.com
Whole thread Raw
In response to could not access status of transaction  (chenhj <chjischj@163.com>)
Responses Re:Re: could not access status of transaction  (chenhj <chjischj@163.com>)
List pgsql-hackers
On 1/20/19 5:07 PM, chenhj wrote:
> In our PG 10.2(CentOS 7.3) database, the following error is reported when querying a table. We have already restored
theproduction data through backup, but i want to confirm what may be the reason and how to avoid it in the future.
 
> 
>     lma=# select count(*) from bi_dm.tdm_ttk_site_on_way_rt;
>     ERROR:  could not access status of transaction 3250922107
>     DETAIL:  Could not open file "pg_xact/0C1C": No such file or directory.
> 
> Here are some related information
> 
> The CLOG files in pg_xact diractory is as follow:
> 
>     0C4A(Last update date: 2018/12/29)
>         ...
>         0D09(Last update date: 2019/01/13)
> 

Yes, that very much looks like a data corruption, probably due to
truncating the clog too early or something like that.

> ...
> 
> A similar problem has been reported in 9.0, but there is no reason to mention it.
> 
> https://www.postgresql.org/message-id/flat/1300970362.2349.27.camel%40stevie
> 

The symptoms are the same, but that's far from sufficient to conclude
it's the same root cause.

> Currently I suspect that it may be the same problem as the bug below. is it possible?
> 
> The bug will cause some sessions to cache the wrong relfrozenxid of the table. The session that may call
vac_truncate_clog()will clean up the clog after the actual relfrozenxid due to reading the wrong relfrozenxid.
 
> 
> https://www.postgresql.org/message-id/flat/20180809161148.GB22623%40momjian.us#a7cc4d41464064b7752a5574eb74a06d
> 

Maybe. But it'll be hard to confirm it's what happened. It also shows
why it's important to keep up with minor updates (you're running 10.3,
which is almost 1 year old).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: chenhj
Date:
Subject: could not access status of transaction
Next
From: Dave Cramer
Date:
Subject: Re: Reviving the "Stopping logical replication protocol" patch fromVladimir Gordichuk