Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby - Mailing list pgsql-bugs

From Serge Negodyuck
Subject Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Msg-id CABKyZDE9casPHmbTtbGiNAzmd68Muh3i=KTgLd9sOVpU+_v+PA@mail.gmail.com
In response to Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
2014-06-09 22:49 GMT+03:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>
> Pushed a fix for this.  Thanks for the report.
Thank you!

>
> > 2014-06-02 08:22:30 EEST FATAL: could not access status of transaction
> > 2080547
> > 2014-06-02 08:22:30 EEST DETAIL: Could not read from file
> > "pg_multixact/members/14078" at offset 24576: Success.
> > 2014-06-02 08:22:30 EEST CONTEXT: xlog redo create mxid 2080547 offset
> > 4294961608 nmembers 8684: 6193231 (keysh) 6193233 (fornokeyupd) 6193234
> > (keysh) 6193235 (fornokeyupd) 6193236 (keysh) 6193237 (fornokeyupd) 6193238
> > (keysh) 6193239 (fornokeyupd) 6193240 (keysh) 6193241 (fornokeyupd) 6193242
> > (keysh) 6193243 (fornokeyupd) 6193244 (keysh) 6193245 (fornokeyupd) 6193246
> > (keysh) 6193247 (fornokeyupd) 6193248 (keysh) 6193249 (fornokeyupd) 6193250
> > (keysh) 6193251 (fornokeyupd) 6193252 (keysh) 6193253 (fornokeyupd) 6193254
> > (keysh) 6193255 (fornokeyupd) 6193256 (keysh) 6193257 .......
>
> I find this bit rather odd.  Normally the system shouldn't create
> multixacts this large.  I think we might be missing a trick here somewhere.
> I imagine inserting the last few items is slow, isn't it?

Yes, insert durations grew to as much as 2.2 seconds before the crash:
2014-06-02 08:20:11 EEST 172.18.10.4 db LOG: duration: 2213.361 ms
statement: INSERT INTO product (...) VALUES (...) RETURNING product.id

Normally inserts finish within 100 ms (log_min_duration_statement).
The same "xlog redo create mxid 2080547...." log entry was present on the
master and on both replica servers, which seems logical.
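
For reference, the file name and offsets in that redo record can be checked with a little arithmetic. The sketch below is not PostgreSQL code. It assumes a default 8 kB block build in which a members page holds 1636 entries and a segment file holds 32 pages; those constants and the helper names locate_member and wraps_around are assumptions for illustration only.

```python
# Assumed constants for a default 8 kB-block PostgreSQL 9.3 build.
# Verify them against multixact.c and slru.h for your version.
BLCKSZ = 8192
MEMBERS_PER_PAGE = 1636   # assumed: 409 groups of 4 members per page
PAGES_PER_SEGMENT = 32    # assumed: SLRU pages per segment file
OFFSET_SPACE = 2 ** 32    # member offsets are 32-bit values


def locate_member(offset):
    """Return (segment file name, byte offset of the page) for a member offset."""
    page = offset // MEMBERS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    page_in_segment = page % PAGES_PER_SEGMENT
    # Segment files are named with at least four uppercase hex digits.
    return format(segment, "04X"), page_in_segment * BLCKSZ


def wraps_around(offset, nmembers):
    """True if the members of one multixact run past the 32-bit offset limit."""
    return offset + nmembers > OFFSET_SPACE


# Values taken from the "xlog redo create mxid" line quoted above.
first = locate_member(4294961608)
last = locate_member((4294961608 + 8684 - 1) % OFFSET_SPACE)
print(first, last, wraps_around(4294961608, 8684))
```

Under these assumptions the first member falls in segment 14078 at byte 16384, the failed read at 24576 is the next page of that file, and offset + nmembers runs past 2^32, so the last member would land back in segment 0000.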


>
> > An ugly hack "cp pg_multixact/members/14077 pg_multixact/members/14078"
> > helped me to start master server in replica.
> >
> >
> > Then, did pg_basebackup to slave database. It does not help
> > 2014-06-02 09:58:49 EEST 172.18.10.17 db2 DETAIL: Could not open file
> > "pg_multixact/members/1112D": No such file or directory.
> > 2014-06-02 09:58:49 EEST 172.18.10.18 db2 DETAIL: Could not open file
> > "pg_multixact/members/11130": No such file or directory.
> > 2014-06-02 09:58:51 EEST 172.18.10.34 db2 DETAIL: Could not open file
> > "pg_multixact/members/11145": No such file or directory.
> > 2014-06-02 09:58:51 EEST 172.18.10.38 db2 DETAIL: Could not open file
> > "pg_multixact/members/13F76": No such file or directory
>
> This is strange also; if the files are present in master, how come they
> weren't copied to the replica?  I think we need more info about this
> problem.
I've looked thoroughly through the logs once again and have not
found anything interesting.
All I know is that there were very few pg_multixact/members files, starting
from 0000. This was the case on both slave servers, so I've observed this
issue twice.

To fix it I had to do pg_dumpall | pg_restore on master.
So, I'm sorry, I have no additional info about this problem.
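
In case it helps anyone comparing the missing file names with what is actually in pg_multixact/members, here is a rough sketch that maps a segment file name back to the range of member offsets it would cover. It is not PostgreSQL code, and it assumes 1636 members per page and 32 pages per segment (default 8 kB blocks); the helper name segment_offset_range is hypothetical.

```python
# Assumed layout for a default 8 kB-block build; verify for your version.
MEMBERS_PER_PAGE = 1636
PAGES_PER_SEGMENT = 32
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * PAGES_PER_SEGMENT
OFFSET_SPACE = 2 ** 32    # member offsets are 32-bit values


def segment_offset_range(name):
    """Return (first, last) member offset covered by a segment file name."""
    segment = int(name, 16)                      # file names are hexadecimal
    first = segment * MEMBERS_PER_SEGMENT
    # The final segment is cut short by the 32-bit offset limit.
    last = min(first + MEMBERS_PER_SEGMENT, OFFSET_SPACE) - 1
    return first, last


# File names taken from the error messages quoted above.
for name in ("0000", "1112D", "11130", "11145", "13F76", "14078"):
    print(name, segment_offset_range(name))
```

With these assumptions the missing files (1112D, 11130, 11145, 13F76) all map to offsets far above the range covered by files near 0000.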
