Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4
Date
Msg-id 20140604151045.GD10482@alap3.anarazel.de
Whole thread Raw
In response to Could not open file pg_multixact/offsets/ ERROR on 9.3.4  (Alexey Klyukin <alexk@hintbits.com>)
Responses Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4  (Alexey Klyukin <alexk@hintbits.com>)
List pgsql-hackers
Hi,

On 2014-06-04 16:59:10 +0200, Alexey Klyukin wrote:
> I've recently discovered a peculiar problem on one of our big databases
> (more than 1TB). The database has been upgraded from 9.2 to 9.3.4 (using
> hardlinks to speedup the process)  on April 7th around 22:00 local time.
> When doing  vacuum on any table, the system fails with the following error:
> 
> ERROR:  could not access status of transaction 2035987419
> DETAIL:  Could not open file "pg_multixact/offsets/795A": No such file or
> directory.
> 
> The erorr doesn't depend on the table being vacuumed, or even database, i.e:
> 
> postgres=# create database x;
> CREATE DATABASE
> postgres=# \c x
> You are now connected to database "x" as user "postgres".
> x=# create table test();
> CREATE TABLE
> x=# vacuum test;
> ERROR:  could not access status of transaction 2035987419
> DETAIL:  Could not open file "pg_multixact/offsets/795A": No such file or
> directory.
> 
> The content of pg_multixact/offsets is:
>
> pg_multixact$ ls -lR
> ./members:
> -rw------- 1 postgres postgres 8192 Apr 16 18:20 0000
> ./offsets:
> -rw------- 1 postgres postgres   8192 Apr  7 22:51 0000
> -rw------- 1 postgres postgres 262144 Apr 16 18:20 79A6
> 
> the select mutlixact from pg_database gives me:

> and the 2035987419 = 0x795AB3DB belongs to 795A segment.
> The 0000 file just contains a single page of all zeroes. Neither the 9.3.4
> replica of this database, nor the original 9.2 cluster data directory
> contain this file.

Looks like you're hitting the issue described in
http://archives.postgresql.org/message-id/20140530121631.GE25431%40alap3.anarazel.de

> I'm tempted to just remove the 0000 file from master and restart the
> database, since it's effectively impossible to run vacuum now, but I'd like
> to understand what's happening first. Below is the result of pg_filedump
> for the master:

Yes, that's fine in this specific case.

Note that the 0000 segment isn't yused by anything between the oldest
and newest multixact:

> Latest checkpoint's NextMultiXactId:  2040987419
> Latest checkpoint's NextMultiOffset:  3
> Latest checkpoint's oldestXID:        1038291920
> Latest checkpoint's oldestXID's DB:   16415
> Latest checkpoint's oldestActiveXID:  1655189767
> Latest checkpoint's oldestMultiXid:   2040987417
> Latest checkpoint's oldestMulti's DB: 0

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_control is missing a field for LOBLKSIZE
Next
From: Magnus Hagander
Date:
Subject: Re: pg_basebackup failed to back up large file