Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
Date
Msg-id 20140613175151.GN18688@eldon.alvh.no-ip.org
In response to pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-bugs
Andres Freund wrote:
> Hi,
>
> When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
> multixacts around because they won't be read after the upgrade (and
> aren't compatible). It just resets the new cluster's nextMulti to the
> old + 1.
> Unfortunately that means that there'll be a offsets/0000 file created by
> initdb around. Sounds harmless enough, but that'll actually cause
> problems if the old cluster had a nextMulti that's bigger than that
> page.
>
> When vac_truncate_clog() calls TruncateMultiXact() that'll scan
> pg_multixact/offsets to find the earliest existing segment. That'll be
> 0000. If the to-be-truncated data is older than the last existing
> segment it returns. Then it'll try to determine the last required data
> in members/ by accessing the oldest data in offsets/.

I'm trying to understand the mechanism of this bug, and I'm not
succeeding.  If offsets/0000 was created by initdb, how come we try
to delete a file that's not also members/0000?  I mean, surely the file
as created by initdb is empty (zeroed).  In your sample error message
downthread,

ERROR: could not access status of transaction 2072053907
DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

what prompted the status of that multixid to be sought?  I see one
possible path to this error message, which is SlruPhysicalReadPage().
(There are other paths that lead to similar errors, but they use
"transaction 0" instead, so we can rule those out; and we can rule out
anything that uses MultiXactMemberCtl because of the path given in
DETAIL.)
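
As a sanity check that the path in DETAIL really belongs to the offsets SLRU for that multixid, here is a minimal sketch of the multixid-to-segment arithmetic. It assumes a stock build (8192-byte pages, 4-byte offsets, 32 pages per SLRU segment); none of those values are stated in this thread, and the function names are made up for illustration rather than taken from multixact.c.

```python
# Assumed defaults for a stock PostgreSQL build (not stated in this thread).
BLCKSZ = 8192                 # bytes per SLRU page
OFFSET_SIZE = 4               # size of one multixact offset entry (uint32)
SLRU_PAGES_PER_SEGMENT = 32   # pages per SLRU segment file

OFFSETS_PER_PAGE = BLCKSZ // OFFSET_SIZE
OFFSETS_PER_SEGMENT = OFFSETS_PER_PAGE * SLRU_PAGES_PER_SEGMENT


def offsets_segment_name(multi: int) -> str:
    """Hypothetical helper: segment file name that would hold this multixid."""
    page = multi // OFFSETS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    # Segment files are named with four uppercase hex digits.
    return f"{segment:04X}"


def offsets_path(multi: int) -> str:
    """Hypothetical helper: relative path as it appears in the DETAIL line."""
    return "pg_multixact/offsets/" + offsets_segment_name(multi)


if __name__ == "__main__":
    # The multixid from the error message quoted above.
    print(offsets_path(2072053907))   # pg_multixact/offsets/7B81
    # A multixid in a freshly initdb'd cluster lands in the first segment.
    print(offsets_path(1))            # pg_multixact/offsets/0000
```

Under those assumptions the multixid in the ERROR line maps to segment 7B81, matching DETAIL, while the file initdb creates (0000) only covers multixids below 65536.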

There are four call sites that lead to SlruPhysicalReadPage() on the offsets SLRU:

RecordNewMultiXact
GetMultiXactIdMembers (2x)
TrimMultiXact

Of those, only GetMultiXactIdMembers is likely to be called from vacuum
(actually RecordNewMultiXact can too, in a few cases, if it happens to
freeze a multi by creating another multi; should be pretty rare).
But you were talking about vacuum truncating pg_multixact -- and I don't
see how that's related to these functions.

Is it possible that you pasted the wrong error message?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
