Re: RFC: Add 'taint' field to pg_control. - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: RFC: Add 'taint' field to pg_control.
Date
Msg-id CAMsr+YEt76cziK5=58ss_6+eKapg_YXFm0oC3rMY4N6TEBR4dA@mail.gmail.com
In response to Re: RFC: Add 'taint' field to pg_control.  (Andres Freund <andres@anarazel.de>)
Responses Re: RFC: Add 'taint' field to pg_control.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 8 March 2018 at 10:18, Andres Freund <andres@anarazel.de> wrote:


On March 7, 2018 5:51:29 PM PST, Craig Ringer <craig@2ndquadrant.com> wrote:
>My favourite remains an organisation that kept "fixing" an issue by
>kill -9'ing the postmaster and removing postmaster.pid to make it start
>up again. Without killing all the leftover backends. Of course, the
>system kept getting more unstable and broken, so they did it more and
>more often. They were working on scripting it when they gave up and
>asked for help.

Maybe I'm missing something, but that ought to not work. The shmem segment that we keep around would be a conflict, no?


As I understand it, because we allow multiple Pg instances on a system, we identify the small sysv shmem segment we use by the postmaster's pid. If you remove the DirLockFile (postmaster.pid) you remove the interlock against starting a new postmaster. It'll think it's a new independent instance on the same host, make a new shmem segment and go merrily on its way mangling data horribly.

See CreateLockFile(). Also commit 7e2a18a9161. In particular, src/backend/utils/init/miscinit.c +938:

if (isDDLock)
{
    ....
    if (PGSharedMemoryIsInUse(id1, id2))
        ereport(FATAL,
                (errcode(ERRCODE_LOCK_FILE_EXISTS),
                 errmsg("pre-existing shared memory block "
                        "(key %lu, ID %lu) is still in use",
                        id1, id2),
                 errhint("If you're sure there are no old "
                         "server processes still running, remove "
                         "the shared memory block "
                         "or just delete the file \"%s\".",
                         filename)));
    ....
}

I still think that hint is a bit optimistic, and should really say "make very sure there are no 'postgres' processes associated with this data directory, then ...".
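To make the failure mode concrete, here is a minimal sketch of a pid-file interlock built on exclusive file creation. It is NOT PostgreSQL's CreateLockFile(); toy_create_lockfile() and its arguments are made-up names for illustration, and it deliberately omits the pid-liveness and shared-memory checks the real code performs.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: create a pid file exclusively.
 * Returns 0 on success, -1 on failure with errno set (EEXIST if the file
 * is already there).
 */
int
toy_create_lockfile(const char *path, long owner_pid)
{
    char        buf[32];
    int         fd;
    int         len;

    assert(path != NULL);

    /* O_EXCL makes creation fail with EEXIST if the file already exists */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* Record the owner's pid, as a pid file conventionally does */
    len = snprintf(buf, sizeof(buf), "%ld\n", owner_pid);
    if (write(fd, buf, (size_t) len) != (ssize_t) len)
    {
        int         save_errno = errno;

        close(fd);
        unlink(path);
        errno = save_errno;
        return -1;
    }
    return close(fd);
}
```

Called twice on the same path, the second call fails with EEXIST. If something unlinks the file in between, the second call succeeds, even though nothing has checked whether the first owner or its children are still running. That is the whole interlock gone, which is why the hint about deleting the file deserves a stronger warning.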


It'd be nice if the OS offered us some support here. Something like opening a lockfile in exclusive lock mode, with every child inheriting both the FD and the lock, so the exclusive lock wouldn't get released until all FDs associated with it are closed. But AFAIK nothing like that is present, let alone portable.
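For what it's worth, flock(2) on Linux and the BSDs appears to come close: the lock belongs to the open file description, so a forked child that inherits the descriptor keeps the lock alive after the parent closes its copy. The sketch below only demonstrates that behaviour. It assumes those flock() semantics on a local filesystem, and says nothing about portability (flock() is not in POSIX) or network filesystems. toy_lock_datadir() and toy_child_keeps_lock() are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: open path and take a non-blocking exclusive flock().
 * Returns the fd that holds the lock, or -1 if it is locked elsewhere.
 */
int
toy_lock_datadir(const char *path)
{
    int         fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Returns 1 if a child's inherited descriptor kept the lock held after the
 * parent closed its own copy, 0 if it did not, -1 on setup failure.
 */
int
toy_child_keeps_lock(const char *path)
{
    int         gate[2];
    int         fd;
    int         probe;
    int         held_while_child_alive;
    pid_t       child;

    fd = toy_lock_datadir(path);
    if (fd < 0)
        return -1;
    if (pipe(gate) != 0)
    {
        close(fd);
        return -1;
    }

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
    {
        char        c;

        /* "backend": inherits fd, lives until the parent closes the pipe */
        close(gate[1]);
        if (read(gate[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(gate[0]);
    close(fd);                  /* "postmaster" drops its descriptor */

    probe = toy_lock_datadir(path);     /* a would-be new postmaster */
    held_while_child_alive = (probe < 0);
    if (probe >= 0)
        close(probe);

    close(gate[1]);             /* let the child exit */
    if (waitpid(child, NULL, 0) != child)
        return -1;

    probe = toy_lock_datadir(path);     /* nothing should hold the lock now */
    if (probe < 0)
        return 0;
    close(probe);
    return held_while_child_alive;
}
```

A call such as toy_child_keeps_lock("/tmp/toy_datadir.lock") should return 1 where these semantics hold. Note that any process sharing the open file description can also drop the lock explicitly with LOCK_UN, so this is weaker than a lock that strictly requires every holder to exit.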

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
