Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby - Mailing list pgsql-bugs

From Serge Negodyuck
Subject Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Msg-id CABKyZDEENX2X5HMqNtMB34zBAr2UZxrcs4SbXx169xDRKeZ4DA@mail.gmail.com
In response to Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-bugs
2013/12/10 Alvaro Herrera <alvherre@2ndquadrant.com>:
> Andres Freund wrote:
>
>> > > I think problems should be preventable if you issue a systemwide VACUUM
>> > > FREEZE, but please let others chime in before you execute it.
>> >
>> > I wouldn't freeze anything just yet, at least until the patch to fix
>> > multixact freezing is in.
>>
>> Well, it seems better than getting errors because of multixact members
>> that are gone.
>> Maybe PGOPTIONS='-c vacuum_freeze_table_age=0 -c vacuum_freeze_min_age=1000000' vacuumdb -a
>>  - that ought not to cause problems with current data and should freeze
>> enough to get rid of problematic multis?
>
> TBH I don't feel comfortable predicting what it will freeze with
> the broken code.
>

You guys were right. After a week this issue occurred again on almost
all slave servers.

slave:
2013-12-17 14:21:20 MSK CONTEXT: xlog redo delete: index
1663/16516/5320124; iblk 8764, heap 1663/16516/18816;
2013-12-17 14:21:20 MSK LOG: file "pg_clog/0370" doesn't exist,
reading as zeroes
2013-12-17 14:21:20 MSK FATAL: MultiXactId 1819308905 has not been
created yet -- apparent wraparound
2013-12-17 14:21:20 MSK CONTEXT: xlog redo delete: index
1663/16516/5320124; iblk 8764, heap 1663/16516/18816;
2013-12-17 14:21:20 MSK LOG: startup process (PID 13622) exited with exit code 1

I had to fix something on the master, since all slaves were affected. The
only idea I had was to perform VACUUM FREEZE on the master.

I believe that was not a good idea. I suppose the VACUUM FREEZE led to
the following errors on the master:
2013-12-17 13:15:34 EET 172.18.10.44 ruprom ERROR: could not access
status of transaction 8407326
2013-12-17 13:15:34 EET 172.18.10.44 ruprom DETAIL: Could not open
file "pg_multixact/members/A458": No such file or directory.
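
For reference, a read-only way to confirm which member segments are actually
missing is to list the directory and compare it against the name in the
error. This is only a minimal sketch: the helper names are hypothetical, and
it assumes segment files are named as uppercase hexadecimal numbers of at
least four digits, as in the error message above. It does not modify
anything in the data directory.

```python
import os
import re

# Assumption: segment files look like "0000" or "A458" (uppercase hex,
# at least four digits). Anything else in the directory is ignored.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{4,}$")


def list_member_segments(members_dir):
    """Return the sorted segment numbers found in members_dir."""
    return sorted(
        int(name, 16)
        for name in os.listdir(members_dir)
        if SEGMENT_NAME.match(name)
    )


def segment_present(members_dir, segment_name):
    """Report whether the segment named in an error message exists on disk."""
    return int(segment_name, 16) in list_member_segments(members_dir)
```

Called as segment_present("/path/to/data/pg_multixact/members", "A458"), with
the path adjusted to the local data directory, it returns False when the
file from the error message is absent.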

The only way out was to perform a full backup/restore, which did not
succeed and failed with the same error (could not access status of
transaction xxxxxxx).
A very ugly hack was to copy pg_multixact/members/0000 ->
pg_multixact/members/[ABCDF]xxx. It helped to complete the full backup, but
I am not sure about the consistency of the data.


My question is: is there any quick-and-dirty way to disable
pg_multixact deletion? I understand it may waste space.
