On 1/20/19 5:07 PM, chenhj wrote:
> In our PG 10.2 (CentOS 7.3) database, the following error is reported when querying a table. We have already restored
> the production data from backup, but I want to confirm what the reason may be and how to avoid it in the future.
>
> lma=# select count(*) from bi_dm.tdm_ttk_site_on_way_rt;
> ERROR: could not access status of transaction 3250922107
> DETAIL: Could not open file "pg_xact/0C1C": No such file or directory.
>
> Here is some related information.
>
> The CLOG files in the pg_xact directory are as follows:
>
> 0C4A(Last update date: 2018/12/29)
> ...
> 0D09(Last update date: 2019/01/13)
>
Yes, that very much looks like data corruption, probably due to
truncating the clog too early or something like that.
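
For what it's worth, the missing file name is consistent with that. Below is
a rough sketch of how an xid maps to a clog segment file, assuming the default
8kB block size, 2 status bits per transaction and 32 pages per SLRU segment.
The function name is made up for illustration, and the plain comparison at the
end ignores xid wraparound:

```python
# Sketch only: mirrors the clog layout constants under default build settings.
BLCKSZ = 8192                    # default block size in bytes (assumed)
CLOG_XACTS_PER_BYTE = 4          # 2 status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32      # pages per segment file

XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE              # 32768
XACTS_PER_SEGMENT = XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT  # 1048576


def clog_segment_name(xid: int) -> str:
    """Return the pg_xact file name (hypothetical helper) holding this xid."""
    segment = xid // XACTS_PER_SEGMENT
    return f"{segment:04X}"


missing = clog_segment_name(3250922107)   # xid from the error message
oldest_present = "0C4A"                   # oldest file listed in the report

print(missing)
# Naive ordering check; does not account for xid wraparound.
print(int(missing, 16) < int(oldest_present, 16))
```

With the xid from the error message this gives 0C1C, which sorts before 0C4A,
the oldest segment in your listing. So the status of that transaction lived in
a segment that has already been removed.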
> ...
>
> A similar problem was reported on 9.0, but no cause was mentioned there.
>
> https://www.postgresql.org/message-id/flat/1300970362.2349.27.camel%40stevie
>
The symptoms are the same, but that's far from sufficient to conclude
it's the same root cause.
> Currently I suspect that it may be the same problem as the bug below. Is it possible?
>
> The bug will cause some sessions to cache the wrong relfrozenxid of the table. A session that calls
> vac_truncate_clog() may then clean up the clog past the actual relfrozenxid because it read the wrong relfrozenxid.
>
> https://www.postgresql.org/message-id/flat/20180809161148.GB22623%40momjian.us#a7cc4d41464064b7752a5574eb74a06d
>
Maybe. But it'll be hard to confirm that's what happened. It also shows
why it's important to keep up with minor updates (you're running 10.2,
which is almost 1 year old).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services