Re: ERROR: found multixact from before relminmxid - Mailing list pgsql-general
From: Adrien NAYRAT
Subject: Re: ERROR: found multixact from before relminmxid
Msg-id: e9dbb964-2da7-1cc6-3ec5-b3684de84066@anayrat.info
In response to: Re: ERROR: found multixact from before relminmxid (Jeremy Finzel <finzelj@gmail.com>)
Responses: Re: ERROR: found multixact from before relminmxid
List: pgsql-general
On 6/8/18 8:30 PM, Jeremy Finzel wrote:
>
> No I was referring to this from the documentation:
>
>     Avoid spuriously marking pages as all-visible (Dan Wood, Pavan
>     Deolasee, Álvaro Herrera)
>
>     This could happen if some tuples were locked (but not deleted).
>     While queries would still function correctly, vacuum would normally
>     ignore such pages, with the long-term effect that the tuples were
>     never frozen. In recent releases this would eventually result in
>     errors such as "found multixact nnnnn from before relminmxid nnnnn".

Hello hackers,

We got the same issue on a 9.6.10: autovacuum reports the same error when it tries to freeze a table:

ERROR: found multixact 37009963 from before relminmxid 48606990

Autovacuum was never able to freeze this table, so Postgres eventually came within 1 million transactions of wraparound and refused any new transactions.

We have an OLTP workload, and I noticed queries like SELECT .. FOR SHARE. I checked the durability settings (fsync, full_page_writes); everything seems fine. I also did not notice any errors on the storage (local SSD behind a RAID controller).

We "solved" it with a dump/restore to another server, and we kept the previous cluster to investigate.

I want to be sure we hit the bug fixed in 9.6.9 and not a new one. If we confirm it, maybe we should advise users to perform an integrity check? I was surprised that 9.6.9 prevents new corruption from appearing but provides nothing to verify whether a database is already corrupted.

FYI, we tried to run pg_dump on the secondary:

pg_dump: Error message from server: ERROR: could not access status of transaction 1323321209
DETAIL: Could not open file "pg_clog/04EE": No such file or directory.

I am surprised this clog file is missing on the secondary (but present on the primary):

[...] 04EB 04EC 04ED 0CEE 0CEF 0CF0 0CF1 [...]
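If I did the clog math right (2 status bits per transaction, so an 8192-byte page holds 32768 transactions and a 32-page segment holds 1048576), that transaction indeed maps to the missing segment:

-- 1323321209 / (32 pages * 8192 bytes * 4 xacts per byte) = 1262 = 0x4EE,
-- hence the missing file "pg_clog/04EE"
SELECT to_hex(1323321209 / (32 * 8192 * 4)) AS clog_segment;
-- returns 4ee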
I ran pg_visibility's checks; they do not reveal any corruption:

select pg_check_visible(i)
from (select nspname || '.' || relname as i
      from pg_class
      join pg_namespace on pg_class.relnamespace = pg_namespace.oid
      WHERE relkind IN ('r','t','m')) as i;
(0 rows)

select pg_check_frozen(i)
from (select nspname || '.' || relname as i
      from pg_class
      join pg_namespace on pg_class.relnamespace = pg_namespace.oid
      WHERE relkind IN ('r','t','m')) as i;
(0 rows)

Here is the pg_controldata output, plus information on the table where autovacuum reports the error:

/usr/pgsql-9.6/bin/pg_controldata

pg_control version number:            960
Catalog version number:               201608131
Database system identifier:           6451990604639161176
Database cluster state:               in production
pg_control last modified:             Mon 05 Nov 2018 12:44:30 PM CET
Latest checkpoint location:           B9AF/70A4FD98
Prior checkpoint location:            B9AF/70A4B7D0
Latest checkpoint's REDO location:    B9AF/70A4FD60
Latest checkpoint's REDO WAL file:    000000010000B9AF00000070
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          1:1323325854
Latest checkpoint's NextOID:          1665327960
Latest checkpoint's NextMultiXactId:  60149587
Latest checkpoint's NextMultiOffset:  174269996
Latest checkpoint's oldestXID:        3471809503
Latest checkpoint's oldestXID's DB:   16393
Latest checkpoint's oldestActiveXID:  1323325854
Latest checkpoint's oldestMultiXid:   48606990
Latest checkpoint's oldestMulti's DB: 16393
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Mon 05 Nov 2018 12:44:29 PM CET
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                off
max_connections setting:              750
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

SELECT * FROM pg_class WHERE oid = 'anonymised'::regclass;
-[ RECORD 1 ]-------+---------------------------------------------------------------------------
relname             | anonymised
relnamespace        | 2200
reltype             | 16719
reloftype           | 0
relowner            | 16386
relam               | 0
relfilenode         | 438244741
reltablespace       | 0
relpages            | 1823173
reltuples           | 6.82778e+07
relallvisible       | 1822975
reltoastrelid       | 438244744
relhasindex         | t
relisshared         | f
relpersistence      | p
relkind             | r
relnatts            | 25
relchecks           | 0
relhasoids          | f
relhaspkey          | t
relhasrules         | f
relhastriggers      | t
relhassubclass      | f
relrowsecurity      | f
relforcerowsecurity | f
relispopulated      | t
relreplident        | d
relfrozenxid        | 3471809503
relminmxid          | 48606990
relacl              | {anonymised}
reloptions          | {autovacuum_vacuum_scale_factor=0.1,autovacuum_analyze_scale_factor=0.05}
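Since we kept the old cluster, I suppose I could also hunt for the offending multixacts directly with pageinspect. A rough sketch (assuming the extension is installed; 0x1000 is the HEAP_XMAX_IS_MULTI infomask bit, the text cast works around xid having no ordering operator, and a plain numeric compare ignores multixact wraparound):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Scan the first 1000 heap pages (an arbitrary slice; the table has
-- 1823173 pages) for tuples whose xmax is a multixact older than
-- relminmxid = 48606990, like the one autovacuum complains about.
SELECT p.blkno, h.lp, h.t_ctid, h.t_xmax
FROM generate_series(0, 999) AS p(blkno),
     LATERAL heap_page_items(get_raw_page('anonymised', p.blkno)) AS h
WHERE (h.t_infomask & x'1000'::int) <> 0
  AND h.t_xmax::text::bigint < 48606990;

Scanning all pages this way would be slow, but it should at least confirm whether any page still carries a multixact from before relminmxid.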
Do you have any ideas on how to investigate, or steps to identify pre-existing corruption on another instance?

Thanks!