Re: ERROR: found multixact from before relminmxid - Mailing list pgsql-general

From Adrien NAYRAT
Subject Re: ERROR: found multixact from before relminmxid
Date
Msg-id e9dbb964-2da7-1cc6-3ec5-b3684de84066@anayrat.info
In response to Re: ERROR: found multixact from before relminmxid  (Jeremy Finzel <finzelj@gmail.com>)
Responses Re: ERROR: found multixact from before relminmxid
List pgsql-general
On 6/8/18 8:30 PM, Jeremy Finzel wrote:
> 
>   No I was referring to this from the documentation:
> 
>     Avoid spuriously marking pages as all-visible (Dan Wood, Pavan
>     Deolasee, Álvaro Herrera)
> 
>     This could happen if some tuples were locked (but not deleted).
>     While queries would still function correctly, vacuum would normally
>     ignore such pages, with the long-term effect that the tuples were
>     never frozen. In recent releases this would eventually result in
>     errors such as "found multixact nnnnn from before relminmxid nnnnn".
> 
> 

Hello hackers,

We got the same issue on 9.6.10: autovacuum reports the same error when it tries to freeze a table:
ERROR:  found multixact 37009963 from before relminmxid 48606990

Autovacuum was not able to freeze this table, so Postgres eventually came within 1 million transactions of wraparound and refused any new transactions.
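
How close each database is to xid wraparound, and how old its multixacts are, can be watched with a query along these lines (a minimal sketch; mxid_age() is available since 9.5):

select datname,
       age(datfrozenxid)    as xid_age,
       mxid_age(datminmxid) as mxid_age
  from pg_database
 order by 2 desc;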

We have an OLTP workload and I noticed queries like SELECT ... FOR SHARE. I checked the durability settings (fsync, full_page_writes) and everything seems fine. I also did not notice any errors on the storage (local SSD behind a RAID controller).
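
Checking those settings comes down to something like this (a trivial sketch):

select name, setting
  from pg_settings
 where name in ('fsync', 'full_page_writes');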

We "solved" with a dump/restore on another server, also we kept previous 
cluster to investigate. I want to be sure we encountered the bug solved 
in 9.6.9 and it is not a new one.

If we confirm it, maybe we should advise users to perform an integrity check? I was surprised that 9.6.9 prevents new corruption from appearing but offers nothing to verify whether a database is already corrupted.
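
One crude way to check an instance (a sketch, not something I have validated) is to force a freeze of every table with psql's \gexec (9.6+), so that any "found multixact ... from before relminmxid" error surfaces per table:

select format('VACUUM (FREEZE) %I.%I', nspname, relname)
  from pg_class c
  join pg_namespace n on c.relnamespace = n.oid
 where c.relkind in ('r', 'm')
   and n.nspname not in ('pg_catalog', 'information_schema')
\gexec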

FYI, we tried to do a pg_dump on the secondary:
pg_dump: Error message from server: ERROR:  could not access status of 
transaction 1323321209
DETAIL:  Could not open file "pg_clog/04EE": No such file or directory.
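
That segment is indeed the one that should hold xid 1323321209: with the default build settings (8192-byte blocks, 2 clog bits per transaction, 32 SLRU pages per segment) each pg_clog segment covers 1,048,576 transactions, which a quick sanity check confirms:

select lpad(upper(to_hex(1323321209 / (8192 * 4 * 32))), 4, '0') as clog_segment;
-- 04EE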

I am surprised this clog file is missing on the secondary (but present on the primary):
[...]
04EB
04EC
04ED
0CEE
0CEF
0CF0
0CF1
[...]

I ran pg_visibility's checks; they do not reveal any corruption:
select pg_check_visible(i)
  from (select nspname || '.' || relname as i
          from pg_class
          join pg_namespace on pg_class.relnamespace = pg_namespace.oid
         where relkind in ('r', 't', 'm')) as i;
(0 rows)

select pg_check_frozen(i)
  from (select nspname || '.' || relname as i
          from pg_class
          join pg_namespace on pg_class.relnamespace = pg_namespace.oid
         where relkind in ('r', 't', 'm')) as i;
(0 rows)

Here is the pg_controldata output and information on the table where autovacuum reports the error:
/usr/pgsql-9.6/bin/pg_controldata
pg_control version number:            960
Catalog version number:               201608131
Database system identifier:           6451990604639161176
Database cluster state:               in production
pg_control last modified:             Mon 05 Nov 2018 12:44:30 PM CET
Latest checkpoint location:           B9AF/70A4FD98
Prior checkpoint location:            B9AF/70A4B7D0
Latest checkpoint's REDO location:    B9AF/70A4FD60
Latest checkpoint's REDO WAL file:    000000010000B9AF00000070
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          1:1323325854
Latest checkpoint's NextOID:          1665327960
Latest checkpoint's NextMultiXactId:  60149587
Latest checkpoint's NextMultiOffset:  174269996
Latest checkpoint's oldestXID:        3471809503
Latest checkpoint's oldestXID's DB:   16393
Latest checkpoint's oldestActiveXID:  1323325854
Latest checkpoint's oldestMultiXid:   48606990
Latest checkpoint's oldestMulti's DB: 16393
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Mon 05 Nov 2018 12:44:29 PM CET
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                off
max_connections setting:              750
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

SELECT * FROM pg_class WHERE oid = 'anonymised'::regclass;
-[ RECORD 1 ]-------+---------------------------------------------------------------------------
relname             | anonymised
relnamespace        | 2200
reltype             | 16719
reloftype           | 0
relowner            | 16386
relam               | 0
relfilenode         | 438244741
reltablespace       | 0
relpages            | 1823173
reltuples           | 6.82778e+07
relallvisible       | 1822975
reltoastrelid       | 438244744
relhasindex         | t
relisshared         | f
relpersistence      | p
relkind             | r
relnatts            | 25
relchecks           | 0
relhasoids          | f
relhaspkey          | t
relhasrules         | f
relhastriggers      | t
relhassubclass      | f
relrowsecurity      | f
relforcerowsecurity | f
relispopulated      | t
relreplident        | d
relfrozenxid        | 3471809503
relminmxid          | 48606990
relacl              | {anonymised}
reloptions          | {autovacuum_vacuum_scale_factor=0.1,autovacuum_analyze_scale_factor=0.05}
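
For reference, relfrozenxid/relminmxid ages can be ranked across all tables with a query of this shape (a sketch):

select relname,
       age(relfrozenxid)    as xid_age,
       mxid_age(relminmxid) as mxid_age
  from pg_class
 where relkind in ('r', 't', 'm')
 order by 3 desc
 limit 20;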


Do you have any ideas on how to investigate? Or steps to identify pre-existing corruption on another instance?

Thanks!

