Re: VACUUM unable to accomplish because of a non-existent MultiXactId - Mailing list pgsql-admin

From Kouber Saparev
Subject Re: VACUUM unable to accomplish because of a non-existent MultiXactId
Date
Msg-id 99572764-62FE-4C23-8667-53B648C8DFD4@gmail.com
In response to Re: VACUUM unable to accomplish because of a non-existent MultiXactId  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: VACUUM unable to accomplish because of a non-existent MultiXactId  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-admin

> The state of your data is probably caused by some weird corner case of
> the upgrade.  Can you see in the log files that the toast table has been
> failing vacuum since the upgrade, or is it more recent than that?  (In
> other words, is there any working vacuum after the upgrade?)

We upgraded to 9.4.5 on 19 October, and there was a successful automatic vacuum of pg_toast_376621 just 3 days later, on 22 October:

Oct 22 08:16:49 db-master postgres[10589]: [3-1] []: LOG:  automatic vacuum of table "db.pg_toast.pg_toast_376621": index scans: 1
Oct 22 08:16:49 db-master postgres[10589]: [3-2]      pages: 0 removed, 784361 remain
Oct 22 08:16:49 db-master postgres[10589]: [3-3]      tuples: 110 removed, 3768496 remain, 0 are dead but not yet removable
Oct 22 08:16:49 db-master postgres[10589]: [3-4]      buffer usage: 37193 hits, 44891 misses, 32311 dirtied
Oct 22 08:16:49 db-master postgres[10589]: [3-5]      avg read rate: 0.954 MB/s, avg write rate: 0.686 MB/s
Oct 22 08:16:49 db-master postgres[10589]: [3-6]      system usage: CPU 1.10s/1.67u sec elapsed 367.73 sec
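
For completeness, the last successful autovacuum could probably also be read from the statistics view rather than from syslog; a minimal query, assuming the statistics collector is enabled:

-- last autovacuum timestamp and count for the toast table
SELECT schemaname, relname, last_autovacuum, autovacuum_count
FROM pg_stat_all_tables
WHERE schemaname = 'pg_toast'
  AND relname = 'pg_toast_376621';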

The next automatic vacuum came 8 days later, on 30 October; it failed, and it has been failing ever since:

Oct 30 14:22:01 db-master postgres[16160]: [3-1] []: ERROR:  MultiXactId 2915905228 does no longer exist -- apparent wraparound
Oct 30 14:22:01 db-master postgres[16160]: [3-2] []: CONTEXT:  automatic vacuum of table "db.pg_toast.pg_toast_376621"

So I guess something happened between 22 and 30 October, and it is not related to the pg_upgrade we did on 19 October.
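
If it would help, I could also compare the failing MultiXactId (2915905228) with the table's and the database's minimum MultiXactId; a rough sketch of what I have in mind (relminmxid/datminmxid exist since 9.3):

-- minimum MultiXactId still considered live for the toast table
SELECT relminmxid
FROM pg_class
WHERE oid = 'pg_toast.pg_toast_376621'::regclass;

-- and for the database as a whole
SELECT datname, datminmxid
FROM pg_database
WHERE datname = current_database();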

> To debug this, it would be useful if you attached gdb to a backend, set
> a breakpoint on errfinish, and then ran vacuum on that table.  Then you
> can extract the page number from the backtrace.  With the page number we
> can try pageinspect and heap_page_items until we find the culprit and
> perhaps identify how it got in that state.

I will try to obtain the page number and will then send you the results. Thank you.
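
Just to make sure I follow the procedure correctly, I am assuming something along these lines (the PID and block number are placeholders, and pageinspect needs to be installed by a superuser):

-- session 1: note the backend PID we will attach to
SELECT pg_backend_pid();

# shell: attach gdb to that backend and break on errfinish
gdb -p <pid>
(gdb) break errfinish
(gdb) continue

-- session 1: trigger the error
VACUUM pg_toast.pg_toast_376621;

# gdb: when the breakpoint fires, take the backtrace; the block number
# should show up in one of the vacuum frames (e.g. lazy_scan_heap's blkno)
(gdb) bt
(gdb) detach

-- any superuser session: inspect that page
CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT lp, t_xmin, t_xmax, t_infomask, t_infomask2, t_ctid
FROM heap_page_items(get_raw_page('pg_toast.pg_toast_376621', <blkno>));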

Could we somehow do this on one of our replicas (after detaching it), i.e. is the corrupted record propagated through the replication channel? That way we could fix the table on the master in the meantime.

Thanks!

Kouber Saparev
