Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Msg-id 20140604174659.GP5146@eldon.alvh.no-ip.org
In response to Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby  (Serge Negodyuck <petr@petrovich.kiev.ua>)
Responses Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-bugs
Serge Negodyuck wrote:
> 2014-06-02 17:10 GMT+03:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
>
> > Serge Negodyuck wrote:
> > > Hello,
> > >
> > > I've upgraded postgresql to version 9.3.4 and did fresh initdb and
> > > restored database from sql backup.
> > > According to 9.3.4 changelog issue with multixact wraparound was fixed.
> >
> > Ouch.  This is rather strange.  First I see the failing multixact has
> > 8684 members, which is totally unusual.  My guess is that you have code
> > that creates lots of subtransactions, and perhaps does something to one
> > tuple in a different subtransaction; doing something like that would be,
> > I think, the only way to get subxacts that large.  Does that sound
> > right?
> >
> It sounds like you are right. I've found a lot of inserts in logs. Each
> insert causes a trigger to be performed. This trigger updates a counter in
> another table.
> It is very possible this trigger tries to update the same counter for
> different inserts.

I wasn't able to reproduce it that way, but I eventually figured out
that if I altered the plpython function to grab a FOR NO KEY
UPDATE lock first, insertion would grow the multixact beyond reasonable
limits; see the attachment.  If you then INSERT many tuples in "product"
in a single transaction, the resulting xmax is a Multixact that has as
many members as there are inserts, plus one.

(One variation that causes even more bizarre results is dispensing with
the plpy.subtransaction() in the function and instead setting a
savepoint before each insert.  In fact, given the multixact members
shown in your log snippet I think that's more similar to what your code
does.)
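
For readers who want to try the savepoint variation, here is a minimal
sketch of a script generator. It is not the attached test case: the
function name, the savepoint names, and the placeholder INSERT statement
are all hypothetical, and only the table name "product" comes from the
message above. It assumes the trigger on "product" that takes the FOR NO
KEY UPDATE lock is already installed; the script itself only issues one
savepoint per insert inside a single transaction.

```python
def savepoint_insert_script(n_inserts,
                            insert_sql="INSERT INTO product DEFAULT VALUES;"):
    """Return an SQL script that performs n_inserts inserts in a single
    transaction, setting a savepoint before each insert.

    insert_sql is a placeholder (an assumption); replace it with an INSERT
    that matches the real "product" table and fires the counter trigger.
    """
    if n_inserts < 0:
        raise ValueError("n_inserts must be non-negative")

    lines = ["BEGIN;"]
    for i in range(n_inserts):
        # One savepoint per insert, so each insert runs in its own
        # subtransaction of the enclosing transaction.
        lines.append(f"SAVEPOINT s{i};")
        lines.append(insert_sql)
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a small script; redirect the output to a file and feed it to psql.
    print(savepoint_insert_script(3), end="")
```

The generated text can be saved to a file and run with psql against a
scratch database; the number of inserts is the knob that controls how
large the multixact is expected to grow.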

> > > Then, did pg_basebackup to slave database. It does not help
> > > 2014-06-02 09:58:49 EEST 172.18.10.17 db2 DETAIL: Could not open file
> > > "pg_multixact/members/1112D": No such file or directory.
> > > 2014-06-02 09:58:49 EEST 172.18.10.18 db2 DETAIL: Could not open file
> > > "pg_multixact/members/11130": No such file or directory.
> > > 2014-06-02 09:58:51 EEST 172.18.10.34 db2 DETAIL: Could not open file
> > > "pg_multixact/members/11145": No such file or directory.
> > > 2014-06-02 09:58:51 EEST 172.18.10.38 db2 DETAIL: Could not open file
> > > "pg_multixact/members/13F76": No such file or directory
> >
> > Are these the only files missing?  Are intermediate files there?
>
> Only 0000 - 001E files were present on slave server.

I don't understand how files can be missing in the replica.
pg_basebackup simply copies all files it can find in the master to the
replica, so if the 111xx files are present in the master they should
certainly be present in the replica as well.  I gave the pg_basebackup
code a look just to be sure there is no 4-char pattern matching or
something like that, and it doesn't look like it attempts to do that at
all.  I also asked Magnus just to be sure and he confirms this.
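
As background on why the 4-char question comes up: the missing files have
five-character names, while the files present on the slave (0000 - 001E)
have four. The sketch below shows how a multixact member offset would map
to a segment file name. The constants are assumptions taken from my
reading of the 9.3 sources for a default build (8 kB blocks, 32 pages per
SLRU segment, names printed with a minimum width of four hex digits), and
the function name is hypothetical; please check them against the actual
source before relying on the numbers.

```python
# Assumed layout of pg_multixact/members in a default 9.3 build.
BLCKSZ = 8192                                # default block size (assumption)
MEMBERS_PER_GROUP = 4                        # members stored per group
GROUP_SIZE = MEMBERS_PER_GROUP * 4 + 4       # four 4-byte xids + 4 flag bytes
GROUPS_PER_PAGE = BLCKSZ // GROUP_SIZE       # 409 groups fit in one page
MEMBERS_PER_PAGE = GROUPS_PER_PAGE * MEMBERS_PER_GROUP   # 1636 members
PAGES_PER_SEGMENT = 32                       # SLRU pages per segment file
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * PAGES_PER_SEGMENT


def member_segment_name(offset):
    """Return the segment file name that would hold the given member offset.

    The name is the segment number in uppercase hex, padded to at least
    four digits; larger segment numbers simply produce longer names.
    """
    if not 0 <= offset < 2**32:
        raise ValueError("member offset must fit in 32 bits")
    segno = offset // MEMBERS_PER_SEGMENT
    return "%04X" % segno


if __name__ == "__main__":
    for off in (0, MEMBERS_PER_SEGMENT, 0x1112D * MEMBERS_PER_SEGMENT):
        print(off, member_segment_name(off))
```

Under these assumptions a 32-bit member offset can reach segment numbers
above FFFF, so five-character file names such as 1112D are legitimate and
any tool that only matched four-character names would skip them.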

I'm playing a bit more with this test case; I'll let you know where it
leads.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment
