9.3: more problems with "Could not open file "pg_multixact/members/xxxx" - Mailing list pgsql-hackers

From Jeff Janes
Subject 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"
Msg-id CAMkU=1wX9eUumStJODnigW6kB==aNJv5jCUwybzRMNi=Qajs1w@mail.gmail.com
Responses Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: 9.3: more problems with "Could not open file "pg_multixact/members/xxxx"  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Fri, Jun 27, 2014 at 11:51 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Jeff Janes wrote:
>
> > This problem was initially fairly easy to reproduce, but since I
> > started adding instrumentation specifically to catch it, it has become
> > devilishly hard to reproduce.
> >
> > I think my next step will be to also log each of the values which go
> > into the complex if (...) expression that decides on the deletion.
>
> Could you please try to reproduce it after updating to the latest code?  I pushed
> fixes that should close these issues.  Maybe you want to remove the
> instrumentation you added, to make failures more likely.

There are still some problems in 9.4, but I haven't been able to diagnose them and wanted to do more research on them.  The announcement of upcoming back-branch releases for 9.3 spurred me to try it there, and I have problems with 9.3 (12c5bbdcbaa292b2a4b09d298786) as well.  The move of truncation to the checkpoint seems to have made the problem easier to reproduce.  On an 8-core machine, this test fell over after about 20 minutes, which is much faster than it usually takes to reproduce.
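Deciding which members segments are old enough to delete at truncation time depends on a circular comparison, because the 32-bit member offsets wrap around. The following is a minimal sketch modeled loosely on MultiXactOffsetPrecedes() and MultiXactMemberPagePrecedes() in 9.3's multixact.c, assuming the default 8 kB BLCKSZ (1636 members per page). The function names are hypothetical stand-ins, not the server code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 8 kB BLCKSZ and 20-byte member groups (4 flag bytes + 4 xids),
 * i.e. 409 groups * 4 members = 1636 members per page. */
#define MEMBERS_PER_PAGE 1636u

typedef uint32_t MultiXactOffset;

/* Circular comparison: offset1 precedes offset2 if the signed 32-bit
 * difference is negative. Only meaningful while the two offsets are
 * less than 2^31 apart. */
static bool offset_precedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
    int32_t diff = (int32_t) (offset1 - offset2);
    return diff < 0;
}

/* Page-level variant: map each page to its first member offset, then
 * compare those offsets circularly. */
static bool member_page_precedes(uint32_t page1, uint32_t page2)
{
    MultiXactOffset o1 = (MultiXactOffset) page1 * MEMBERS_PER_PAGE;
    MultiXactOffset o2 = (MultiXactOffset) page2 * MEMBERS_PER_PAGE;
    return offset_precedes(o1, o2);
}
```

With a test like this, a page near the top of the offset space is treated as older than a low-numbered page once the counter has wrapped, which is why the cutoff passed to truncation must be computed carefully.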

This is the error I get:

2084 UPDATE 2014-07-15 15:26:20.608 PDT:ERROR:  could not access status of transaction 85837221
2084 UPDATE 2014-07-15 15:26:20.608 PDT:DETAIL:  Could not open file "pg_multixact/members/14031": No such file or directory.
2084 UPDATE 2014-07-15 15:26:20.608 PDT:CONTEXT:  SQL statement "SELECT 1 FROM ONLY "public"."foo_parent" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
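To relate the missing file in the DETAIL line to member offsets, here is a minimal C sketch. It assumes the 9.3 defaults (8 kB BLCKSZ, 1636 members per page, 32 pages per SLRU segment, segment files named with the segment number in hex). The helper names are hypothetical, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed 9.3 defaults: BLCKSZ = 8192 gives 409 groups of 4 members,
 * i.e. 1636 members per page; SLRU segments hold 32 pages. */
#define MEMBERS_PER_PAGE  1636u
#define PAGES_PER_SEGMENT 32u

/* Segment number that holds a given member offset. */
static uint32_t member_offset_to_segment(uint32_t offset)
{
    return (offset / MEMBERS_PER_PAGE) / PAGES_PER_SEGMENT;
}

/* First member offset stored in a segment; computed in 64 bits so the
 * multiplication cannot overflow. */
static uint64_t segment_first_offset(uint32_t segment)
{
    return (uint64_t) segment * PAGES_PER_SEGMENT * MEMBERS_PER_PAGE;
}

/* Parse the segment number from a path such as
 * "pg_multixact/members/14031": the file name is the segment in hex. */
static uint32_t segment_from_path(const char *path)
{
    const char *name = strrchr(path, '/');
    assert(name != NULL);  /* expect a directory component */
    return (uint32_t) strtoul(name + 1, NULL, 16);
}
```

If those assumptions hold, segment 14031 starts at member offset 4291241088, close to the top of the 32-bit offset space; the last possible segment under the same assumptions is 14078.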

The testing harness is attached as 3 patches that must be applied to the test server, and 2 scripts. The script do.sh sets up the database (using fixed paths, so be careful) and then invokes count.pl in a loop to do the actual work.


Cheers,

Jeff
