Re: ERROR: multixact X from before cutoff Y found to be still running - Mailing list pgsql-bugs

On 9/4/19 21:01, Thomas Munro wrote:
> I suppose this requires you to run out of member space (for example
> many backends key sharing the same FK) or maybe just set
> autovacuum_multixact_freeze_max_age quite low, and then prolong the
> life of a multixact for longer.
On this particular production system, autovacuum_multixact_freeze_max_age is at the default value of 400 million and is not overridden for any tables.  Looks to me like this was just workload-driven: there are a number of FKs, and those seem like the likely candidate to me.
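For context, this is the mechanism by which FK-heavy workloads create multixacts: the RI triggers take FOR KEY SHARE row locks on the referenced rows, and as soon as a second transaction holds such a lock on the same parent row, PostgreSQL must record both lockers in a multixact. A minimal sketch with hypothetical table names:

```sql
-- Hypothetical schema: many concurrent child inserts key-share the same parent row.
CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child (parent_id int REFERENCES parent (id));
INSERT INTO parent VALUES (1);

-- Session 1:
BEGIN;
INSERT INTO child VALUES (1);  -- RI trigger takes FOR KEY SHARE on parent row 1

-- Session 2, concurrently:
BEGIN;
INSERT INTO child VALUES (1);  -- second locker: the parent row's xmax becomes a multixact
```

With enough concurrent sessions doing this, multixact members accumulate quickly even without any explicit row locking in the application.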

> Does the problem fix itself once you
> close the transaction that's in the oldest multixact, ie holding back
> GetOldestMultiXact() from advancing?
The really interesting thing about this case is that the only long-running connection was the autovacuum that had been running since Sunday.  While we were investigating yesterday, the autovacuum process died without advancing relfrozenxid (this system was configured with poor logging, so it's not known whether autovacuum terminated from an error or was killed by a user who logged on to the system).  As soon as the autovacuum process died, we stopped getting the "multixact X from before cutoff Y" errors.
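For anyone investigating a similar situation, multixact freeze pressure can be checked per database; a sketch (mxid_age is available in 9.5+):

```sql
-- How far each database's oldest unfrozen multixact lags behind the current one.
-- Compare against autovacuum_multixact_freeze_max_age (default 400 million).
SELECT datname, mxid_age(datminmxid) AS multixact_age
FROM pg_database
ORDER BY 2 DESC;
```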

It really appears that it was the autovacuum process itself that was holding back the oldest running multixact, which caused errors on yesterday's attempts to vacuum other tables - even though I thought vacuum processes were ignored by that code.  I'll have to take another look at some point.

Vacuum cost parameters had been adjusted after Sunday, so the original autovacuum would have been using default settings.  Naturally, a new autovacuum process started up right away.  This new process - definitely using the adjusted cost parameters - completed the vacuum of the large table in a couple of hours with 5 index-vacuum passes (index_vacuum_count).  maintenance_work_mem was already at its maximum; there were still many hundreds of millions of dead tuples remaining to be cleaned up.
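On 9.6 and later, those index-vacuum passes can be watched live instead of inferred after the fact; a sketch using pg_stat_progress_vacuum:

```sql
-- Each pass over the indexes happens when the dead-tuple array fills up
-- (its size is bounded by maintenance_work_mem); index_vacuum_count counts
-- the completed passes so far.
SELECT pid,
       relid::regclass AS table_name,
       phase,
       index_vacuum_count,
       num_dead_tuples,
       max_dead_tuples
FROM pg_stat_progress_vacuum;
```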

The size of the large table (heap only) was about 75% of the memory on the server, and the table had three indexes each about half the size of the table.  The storage was provisioned at just over 10k IOPS; at this rate you could read all three indexes from the storage one block at a time in about an hour.  (And Linux should be reading more than a block at a time.)
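That estimate checks out on the back of an envelope, assuming the default 8 kB block size:

```sql
-- 10,000 IOPS * 3600 seconds * 8 kB per block: bytes readable per hour
-- even one block at a time, with no readahead.
SELECT pg_size_pretty(10000::bigint * 3600 * 8192);  -- roughly 275 GB/hour
```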

It is not known whether the original autovacuum failed to finish vacuuming the large table in 3 days because of cost settings alone, or because another latent bug somewhere in the autovacuum code put it into some kind of loop (though if autovacuum had hit the error above, the PID would have terminated).  We didn't manage to get a pstack.

> Since VACUUM errors out, we
> don't corrupt data, right?  Everyone else is still going to see the
> multixact as running and do the right thing because vacuum never
> manages to (bogusly) freeze the tuple.
That's my take as well.  I don't think there's any data corruption risk here.

If anyone else ever hits this in the future, I think it's safe to just kill the oldest open session. The error should go away and there shouldn't be any risk of damage to the database.
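A sketch of how to find that oldest session and terminate it; pg_terminate_backend sends SIGTERM to the backend (the pid below is hypothetical):

```sql
-- Find the longest-running transactions; the oldest is the likely culprit.
SELECT pid, usename, state, xact_start, backend_xmin, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;

-- Then, after confirming it is safe to interrupt that session:
SELECT pg_terminate_backend(12345);
```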

> Both patches prevent mxactLimit from being newer than the oldest
> running multixact.  The v1 patch uses the most aggressive setting
> possible: the oldest running multi; the v2 uses the least aggressive
> of the 'safe' and oldest running multi.  At first glance it seems like
> the second one is better: it only does something different if we're in
> the dangerous scenario you identified, but otherwise it sticks to the
> safe limit, which generates less IO.
Thanks for taking a look!

-Jeremy


-- 
Jeremy Schneider
Database Engineer
Amazon Web Services
