Unlinking pg_internal.init, take 2 - Mailing list pgsql-hackers

From Tom Lane
Subject Unlinking pg_internal.init, take 2
Date
Msg-id 11789.1012852577@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
Last week I said:
> Hmm ... what that says is that unlinking pg_internal.init in
> setRelfilenode is the wrong place.  The right place is *after*
> committing your transaction and *before* sending shared cache inval
> messages.  You can't unlink before you commit, or someone may rebuild
> using the old information.  (A backend that's already logged into the
> PROC array when you send SI inval will find out about the changes via SI
> inval.  One that is not yet logged in must be prevented from reading the
> now-obsolete pg_internal.init file.  The startup sequence logs into PROC
> before trying to read pg_internal.init, so that part is done in the
> right order.)  So we need a flag that will cause the unlink to happen at
> the right time in post-commit cleanup.

This is correct as far as it goes, but it only prevents one possible
failure scenario, namely a new backend reading an obsolete
pg_internal.init file and then not seeing any SI update messages to
force it to update the obsolete relcache entries.

It occurs to me that there's a second failure scenario, which is that
pg_internal.init is already gone and there is someone busy constructing
a new init file (using now-obsolete information) at the time you commit
whatever you've been doing to the catalogs.  Your attempt to unlink
pg_internal.init will not help, since it doesn't exist anyway (the other
guy is writing a temp file name, not pg_internal.init).  After the other
guy finishes creating pg_internal.init, he'll see your SI updates and
update his own cache entries --- but now pg_internal.init is out there
with bad info, and subsequently-started backends will take it as gospel.

To make this all work reliably, I think we need the following mechanism:

1. A backend preparing a new pg_internal.init must follow this
procedure:a. Write the file using a temp file name, same as now.b. Grab an LWLock (a new lock that will be used only to
protect  pg_internal.init creation).c. Check to see if any SI messages have come in and not been   processed; if so,
giveup and delete the temp file.  The   next backend to start will have to try again.d. Rename the temp file into
place.e.Release the LWLock.
 

2. The committer of catalog changes that affect system relcache entries
must attempt to unlink pg_internal.init *both* before and after sending
his post-commit SI update messages.  He must grab the LWLock mentioned
above while doing the second of these unlinks.

The lock is needed to prevent the case where a committer sends SI
messages and then (fails to) unlink pg_internal.init between steps c
and d of someone else's pg_internal.init preparation process.

Comments?  Can anyone see any remaining race conditions?

BTW, rather than launching this in the present ad-hoc places, I'm
inclined to set things up so that inval.c sets the flag for post-commit 
pg_internal.init unlink anytime it decides to queue a SI relcache inval
message for a system relation.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Re: v7.2 rolled last night ...
Next
From: Oleg Bartunov
Date:
Subject: contrib/tsearch in postgresql 7.2 (fwd)