Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Msg-id 20131209182701.GD9519@awork2.anarazel.de
In response to Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby  (Serge Negodyuck <petr@petrovich.kiev.ua>)
Responses Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
Hi,

On 2013-12-09 17:49:34 +0200, Serge Negodyuck wrote:
> On master there are files from 0000 to 14078
>
> On slave there were absent files from A1xx to FFFF
> They were  the oldest ones. (October, November)

Some analysis later, I am pretty sure that the origin is a longstanding
problem and not connected to 9.3.[01] vs 9.3.2.

The 14078 file referenced above is exactly the last segment before a
members wraparound:
(gdb) p/x (1L<<32)/(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT)
$10 = 0x14078
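
For anyone without a debugger at hand, the same arithmetic can be
sketched as below. The constants are assumptions on my side: the 9.3
members page layout (groups of four members, each group carrying four
flag bytes plus four 4-byte xids), the default 8 kB block size, and 32
pages per SLRU segment. The helper names are made up for illustration
and are not PostgreSQL functions.

```python
# Assumed values: default BLCKSZ and the 9.3 multixact members layout.
BLCKSZ = 8192
MEMBERS_PER_GROUP = 4
GROUP_BYTES = 4 + MEMBERS_PER_GROUP * 4   # 4 flag bytes + 4 xids of 4 bytes
MULTIXACT_MEMBERS_PER_PAGE = (BLCKSZ // GROUP_BYTES) * MEMBERS_PER_GROUP
SLRU_PAGES_PER_SEGMENT = 32               # assumed SLRU default


def last_member_segment() -> int:
    """Segment number that holds the highest 32-bit member offset."""
    members_per_segment = MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT
    return (1 << 32) // members_per_segment


def segment_file_name(segno: int) -> str:
    """Hex file name, zero-padded to at least four digits (hypothetical helper)."""
    return f"{segno:04X}"


print(MULTIXACT_MEMBERS_PER_PAGE)                 # members per page
print(segment_file_name(last_member_segment()))   # name of the last segment
```

Under those assumptions the result matches the gdb output above, and
the highest file name reported on the master.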

So, what happened is that enough multixacts were created that the
members SLRU wrapped around. It's not unreasonable for the members SLRU
to wrap around faster than the offsets one - after all, we create at
least two entries in members for every offset entry. Also, in 9.3+
more entries fit on an offsets page than on a members page.
When truncating, we first read the offset, to know where we currently
are in members, and then truncate both from their respective
point. Since we've wrapped around in members we very well might remove
content we actually need.
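
To put rough numbers on why members wraps first, here is a small
sketch. It assumes both the multixact id and the member offset are
32-bit counters, and that every multixact has the same number of
members, which is a simplification. The function names are invented
for the example; this is not the truncation code.

```python
MEMBER_SPACE = 1 << 32      # assumed: member offsets are 32-bit counters
MULTIXACT_SPACE = 1 << 32   # assumed: multixact ids are 32-bit counters


def multixacts_until_members_wrap(members_per_multixact: int) -> int:
    """How many multixacts it takes to consume the whole member space."""
    return MEMBER_SPACE // members_per_multixact


def next_member_offset(multixacts_created: int, members_per_multixact: int) -> int:
    """Member offset the next multixact would start at, modulo the wrap."""
    return (multixacts_created * members_per_multixact) % MEMBER_SPACE


for size in (2, 3, 5):
    n = multixacts_until_members_wrap(size)
    share = n / MULTIXACT_SPACE
    print(f"{size} members each: members wraps after {n} multixacts "
          f"({share:.0%} of the multixact id space)")

# After 2^31 two-member multixacts the next member offset is 0 again,
# so new entries land in the lowest-numbered segments while the offsets
# counter is only halfway through its own range.
print(next_member_offset(1 << 31, 2))
```

So even in the best case of two members per multixact, members has
wrapped once offsets is halfway, and larger multixacts get there
sooner.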

I've recently remarked that I find it dangerous that we only do
anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
here we have the proof that that's bad.

This is an issue in <9.3 as well. It might, in some sense, even be worse
there, because we never vacuum old multis away. But on the other hand,
the growth of multis is slower there and we look into old multis less
frequently.

The only reason that you saw the issue on the standby first is that the
truncation code is called more frequently there. Afaics it will happen,
sometime in the future, on the master as well.

I think problems should be preventable if you issue a systemwide VACUUM
FREEZE, but please let others chime in before you execute it.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
