Re: Explanation for intermittent buildfarm pg_upgradecheck failures - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Explanation for intermittent buildfarm pg_upgradecheck failures
Msg-id 3633.1438533919@sss.pgh.pa.us
In response to Explanation for intermittent buildfarm pg_upgradecheck failures  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Explanation for intermittent buildfarm pg_upgradecheck failures
List pgsql-hackers
I wrote:
> unlink("/tmp/.s.PGSQL.5432")            = 0
> unlink("postmaster.pid")                = 0
> unlink("/tmp/.s.PGSQL.5432.lock")       = 0
> exit_group(0)                           = ?
> +++ exited with 0 +++

> I haven't looked to find out why the unlinks happen in this order, but on
> a heavily loaded machine, it's certainly possible that the process would
> lose the CPU after unlink("postmaster.pid"), and then a new postmaster
> could get far enough to see the socket lock file still there.  So that
> would account for low-probability failures in the pg_upgradecheck test,
> which is exactly what we've been seeing.

Further experimentation shows that 9.0 through 9.2 do the unlinks in the
expected order, so somebody broke it during 9.3.
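
To make the ordering problem concrete, here is a minimal sketch of the race
window. It is not the postmaster's shutdown code. It assumes the "expected
order" means postmaster.pid is removed last. The startup check is
hypothetical, and all three files are placed in one temporary directory for
simplicity (in the strace output the socket files live in /tmp).

```python
import os
import tempfile

SOCKET = ".s.PGSQL.5432"
SOCKET_LOCK = ".s.PGSQL.5432.lock"
PID_FILE = "postmaster.pid"


def startup_would_fail(directory):
    """Hypothetical check: a new postmaster gets past the data-directory
    lock (no postmaster.pid) but then finds a leftover socket lock file."""
    pid_gone = not os.path.exists(os.path.join(directory, PID_FILE))
    lock_left = os.path.exists(os.path.join(directory, SOCKET_LOCK))
    return pid_gone and lock_left


def has_race_window(unlink_order):
    """Unlink the files in the given order and report whether any
    intermediate state would trip a concurrently starting postmaster."""
    with tempfile.TemporaryDirectory() as directory:
        # Create empty stand-ins for the three files.
        for name in (SOCKET, SOCKET_LOCK, PID_FILE):
            open(os.path.join(directory, name), "w").close()
        window = False
        for name in unlink_order:
            os.unlink(os.path.join(directory, name))
            # Inspect the state a new postmaster could see right now.
            window = window or startup_would_fail(directory)
        return window


observed = [SOCKET, PID_FILE, SOCKET_LOCK]  # order seen in the strace output
pid_last = [SOCKET, SOCKET_LOCK, PID_FILE]  # assumed "expected" order

print(has_race_window(observed))  # True
print(has_race_window(pid_last))  # False
```

With the observed order there is a state in which postmaster.pid is gone but
the socket lock file still exists. Removing postmaster.pid last leaves no
such state.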

The lack of a close() on the postmaster socket goes all the way back,
though.
        regards, tom lane


