Thread: Re: [COMMITTERS] pgsql/doc/TODO.detail (alpha default distinct flock fsync function limit null pg_shadow primary)

Tom Lane writes:

> but if one or both postmasters is started without -i then there's got
> to be some interlock on the Unix socket file.
> 
> I don't much like depending on flock for that, since it isn't available
> everywhere.  The only portable answer is to build a pid-containing
> interlock file for each socket file, as discussed in the TODO item.

But the flock code isn't used because the configure test for it is broken,
and has been broken ever since it was introduced AFAICT. It seems that we
have been relying on the mere existence of the socket file.


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> but if one or both postmasters is started without -i then there's got
>> to be some interlock on the Unix socket file.
>> 
>> I don't much like depending on flock for that, since it isn't available
>> everywhere.  The only portable answer is to build a pid-containing
>> interlock file for each socket file, as discussed in the TODO item.

> But the flock code isn't used because the configure test for it is broken,
> and has been broken ever since it was introduced AFAICT. It seems that we
> have been relying on the mere existence of the socket file.

Oooh, no kidding?  That explains why we're still hearing complaints
about the postmaster failing to start up when there's a socket file
left over from a previous run: the code that's supposed to delete an
old socket file is part of the #ifdef HAVE_FCNTL_SETLK path.

(Tries it out ... sure enough, it's broken ...)

The flock is not really needed to protect the port number; it's there
to prevent a second postmaster from deleting the socket file that
belongs to a still-active old postmaster.  But if you have no delete
logic at all, you can't cope with a leftover socket file.
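
[Editor's sketch, not from the thread.] The "check the lock before deleting" idea above can be illustrated roughly as follows. This assumes, from the HAVE_FCNTL_SETLK symbol, that the lock in question is a POSIX fcntl(F_SETLK) record lock. The function name remove_if_unlocked is hypothetical, and a regular file stands in for the socket file, since whether a socket path can be open()ed at all varies by platform. It is not the actual postmaster code.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): remove `path` only if no other
 * process currently holds an fcntl() lock on it.
 *
 * Returns 1 if the file was removed, 0 if another process holds a lock
 * (so the file is presumed to belong to a live owner), -1 on error.
 */
int
remove_if_unlocked(const char *path)
{
    struct flock lck;
    int         fd;
    int         result;

    /* A write lock requires a descriptor opened for writing. */
    fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&lck, 0, sizeof(lck));
    lck.l_type = F_WRLCK;
    lck.l_whence = SEEK_SET;
    lck.l_start = 0;
    lck.l_len = 0;              /* zero length means "the whole file" */

    if (fcntl(fd, F_SETLK, &lck) == 0)
    {
        /* Nobody else holds a lock, so treat the file as a leftover. */
        result = (unlink(path) == 0) ? 1 : -1;
    }
    else if (errno == EACCES || errno == EAGAIN)
        result = 0;             /* locked by a live process: leave it alone */
    else
        result = -1;

    close(fd);                  /* also releases our own lock, if any */
    return result;
}
```

In this sketch a live owner would be expected to keep its own lock on the file for as long as it runs. fcntl locks belong to a process, so a second call from the same process would not see a conflict.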

The flock code *did* work at one time; I recall testing it.  Evidently
someone broke the configure test for it later on.

I think the shortest path to a solution is to fix the configure test,
unless you have the ambition to tackle setting up a set of lock files
for port numbers --- which'd require resolving such thorny questions
as where to keep the lock files.  (/tmp is right out, IMHO.)

                        regards, tom lane
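
[Editor's sketch, not from the thread.] For comparison, the pid-containing interlock file mentioned earlier might look roughly like this. The function name acquire_pid_lock and the one-pid-per-file format are assumptions made for illustration. Where such a file should live is exactly the open question raised above, so the caller supplies the path.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): create `lockpath` containing
 * our pid.  If the file already exists, check whether the recorded pid is
 * still alive; if it is not, remove the stale file and retry once.
 *
 * Returns 0 if we now own the lock file, -1 otherwise.
 */
int
acquire_pid_lock(const char *lockpath)
{
    int         attempt;

    for (attempt = 0; attempt < 2; attempt++)
    {
        char        buf[32];
        ssize_t     len;
        long        other;
        int         fd;

        /* O_CREAT | O_EXCL is atomic: only one process can create it. */
        fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
        {
            int         n = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
            int         ok = (write(fd, buf, (size_t) n) == (ssize_t) n);

            if (close(fd) != 0)
                ok = 0;
            if (!ok)
            {
                unlink(lockpath);
                return -1;
            }
            return 0;
        }
        if (errno != EEXIST)
            return -1;

        /* The file exists: read the pid its creator recorded. */
        fd = open(lockpath, O_RDONLY);
        if (fd < 0)
            return -1;
        len = read(fd, buf, sizeof(buf) - 1);
        close(fd);
        if (len <= 0)
            return -1;
        buf[len] = '\0';
        other = strtol(buf, NULL, 10);
        if (other <= 0)
            return -1;

        /* Signal 0 delivers nothing; it only tests whether the pid exists. */
        if (kill((pid_t) other, 0) == 0 || errno != ESRCH)
            return -1;          /* owner is alive, or we cannot tell */

        /* Owner is gone: remove the stale file and try again. */
        if (unlink(lockpath) != 0 && errno != ENOENT)
            return -1;
    }
    return -1;
}
```

The sketch ignores two known weaknesses of this approach: another process can recreate the file between the liveness check and the unlink, and a recycled pid can make a stale file look live. A real implementation would need to address both.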