Buildfarm owners: check if your HEAD build is stuck - Mailing list pgsql-hackers

From Tom Lane
Subject Buildfarm owners: check if your HEAD build is stuck
Msg-id 27932.1155396586@sss.pgh.pa.us
List pgsql-hackers
A number of the buildfarm machines have been failing HEAD builds
at the "make check" stage since last night, with complaints like
this one from emu: 

================== pgsql.21911/src/test/regress/log/postmaster.log ===================
FATAL:  lock file "/tmp/.s.PGSQL.55678.lock" already exists
HINT:  Is another postmaster (PID 23692) using socket file "/tmp/.s.PGSQL.55678"?

What's happened is that the GUC patch that was in the tree for a few
hours broke postmaster startup on some machines (for as-yet-unidentified
reasons).  The postmaster does actually start and establish its
lockfiles, but it never gets to the stage of being able to accept
connections.

After the buildfarm script rm -rf's the build tree, the postmaster
process is still there but "disembodied" (its executable file is
probably gone, for example, or at least in the state of zero remaining
directory links).  But it's still got that socket file and lockfile
in /tmp, and this prevents another postmaster from starting with the
same port number.

If you've got this situation, you'll need to do a manual "kill" on the
PID mentioned in the lock file before things will start working again.
(pg_ctl won't work because it looks for the data directory
postmaster.pid file, which is long gone.)  More generally you might want
to look through a ps listing for unexpected postgres-owned processes.
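
The manual cleanup described above could be scripted roughly as follows. This is only a sketch, not part of any buildfarm tooling: the helper names (read_lock_pid, process_exists, kill_stale_postmaster) are made up for illustration, and it assumes the first line of the socket lock file holds the owning postmaster's PID. Double-check the PID against a ps listing before signalling anything.

```python
import os
import signal


def read_lock_pid(lock_path):
    """Return the PID recorded on the first line of a lock file.

    Assumption: the first line of the file is the owner's PID.
    """
    with open(lock_path) as f:
        return int(f.readline().strip())


def process_exists(pid):
    """Probe for a process with signal 0, which delivers nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def kill_stale_postmaster(lock_path, sig=signal.SIGTERM):
    """Signal the process named in lock_path; return its PID, or None if it is gone."""
    pid = read_lock_pid(lock_path)
    if not process_exists(pid):
        return None
    os.kill(pid, sig)
    return pid
```

For the failure shown above this would be called as kill_stale_postmaster("/tmp/.s.PGSQL.55678.lock"), run as the user that owns the buildfarm processes.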

I'm not sure whether there's anything much we can do to prevent such
problems in future.  Maybe it'd be reasonable for pg_regress to do a
kill -9 on its postmaster child process if it gives up waiting for the
postmaster to accept connections.
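
To make that idea concrete, here is a rough sketch of the control flow, written in Python for brevity even though pg_regress itself is C. The function name start_or_kill and the is_ready callback are hypothetical; the callback stands in for whatever connection test the harness already uses. The point is only that the child gets killed when the wait times out, rather than being left behind.

```python
import subprocess
import time


def start_or_kill(cmd, is_ready, timeout=60.0, poll_interval=1.0):
    """Start cmd, wait for is_ready() to return True, and kill the child on timeout.

    Returns (proc, ready). When ready is False the child has already exited
    or has been killed and reaped.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            # The child died on its own; nothing is left to clean up.
            return proc, False
        if is_ready():
            return proc, True
        time.sleep(poll_interval)
    # Gave up waiting: do not leave the child holding its lock files.
    proc.kill()  # sends SIGKILL on POSIX systems
    proc.wait()
    return proc, False
```

Note that a SIGKILLed process gets no chance to remove its own socket and lock files, so this relies on the next postmaster recognizing the leftover lock file as stale once the PID in it no longer exists.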
        regards, tom lane

