Re: Proposal to add a QNX 6.5 port to PostgreSQL - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Proposal to add a QNX 6.5 port to PostgreSQL
Msg-id 20140818135946.GB461982@tornado.leadboat.com
In response to Re: Proposal to add a QNX 6.5 port to PostgreSQL  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Aug 18, 2014 at 09:01:20AM -0400, Robert Haas wrote:
> On Sat, Aug 16, 2014 at 3:28 AM, Noah Misch <noah@leadboat.com> wrote:
> >> I'd be afraid that a secondary mechanism that mostly-but-not-really
> >> works could do more harm by allowing us to miss bugs in the primary,
> >> pipe-based locking mechanism than the good it would accomplish.
> >
> > Users do corrupt their NFS- and GFS2-hosted databases today.  I would rather
> > have each process hold only an fcntl() lock than hold only the FIFO file
> > descriptor.  There's no such dichotomy, so let's have both.
> 
> Meh.  We can do that, but I think that will provide us with only the
> it-works-until-it-doesn't level of protection.  Granted, that's more
> than zero, but does anyone advocate wearing seatbelts for the first 60
> minutes you're in the car and then taking them off after that?  I
> think that with a sufficiently long-running server the chances of the
> lock somehow getting released approach certainty.  But I'm not going
> to fight this one tooth and nail.

In case it wasn't clear, I advocate both using the FIFO defense and holding
fcntl locks throughout the life of every PostgreSQL process having a shared
memory attachment.  I grant that this raises the chance of a shortcoming in
one mechanism remaining undiscovered.  However, we already know that each by
itself has limitations.  I don't like the prospect of accepting a known hole
to help discover unknown holes.
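
To make the fcntl half concrete, here is a minimal sketch, not a patch.  It
assumes each process takes a shared lock on some file in the data directory
and that a would-be postmaster probes with a non-blocking exclusive lock; the
shared/exclusive split, the function names and the lock file path are all
hypothetical.  fcntl locks are not inherited across fork(), so every child
would have to call hold_shared_lock() itself.

```python
import errno
import fcntl
import os


def hold_shared_lock(path):
    """Take a shared fcntl lock on `path` and return the open descriptor.

    The caller must keep the descriptor open for the life of the process.
    Closing ANY descriptor this process has on the file drops the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    # LOCK_NB: raise immediately instead of waiting on a conflicting lock.
    fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    return fd


def data_dir_in_use(path):
    """Return True if some other process holds a lock on `path`.

    Meant for a process that holds no lock of its own on the file, since
    the close() below would release such a lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        # A conflicting lock is reported as EACCES or EAGAIN.
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return True
        raise
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

The close-drops-the-lock behavior noted in the comments is one way a lock
can get released in a long-running server, which is why I want the FIFO
check alongside it rather than instead of it.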

We could have the would-be new postmaster, when it hits an fcntl lock conflict,
proceed with the FIFO check anyway.  If the FIFO check says "go" after the
fcntl check said "stop", emit a message about the apparent bug.  (That's
oversimplified; it needs looping to account for the case of the old postmaster
exiting concurrently.)
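
A rough sketch of that startup logic, with the looping included.  The two
checks are passed in as callables returning True when the data directory
looks in use; the function name, the retry count and the delay are made up
for illustration.  Refusing to start after reporting the apparent bug, and
refusing whenever the FIFO check says "stop", are my assumptions here rather
than settled design.

```python
import time


def interlock_verdict(fcntl_in_use, fifo_in_use, attempts=5, delay=0.1):
    """Return (may_start, message) for a would-be new postmaster.

    fcntl_in_use / fifo_in_use: callables returning True if that check
    says the data directory is still in use ("stop").
    """
    for _ in range(attempts):
        fcntl_busy = fcntl_in_use()
        # Run the FIFO check even if the fcntl check already said "stop".
        fifo_busy = fifo_in_use()

        if not fcntl_busy and not fifo_busy:
            return True, None
        if fifo_busy:
            # Assumption: a FIFO "stop" always wins, whatever fcntl said.
            return False, "data directory is in use"

        # fcntl said "stop" but the FIFO said "go".  The old postmaster
        # may simply be exiting concurrently, so wait and check again.
        time.sleep(delay)

    # The disagreement persisted across every attempt.
    return False, "apparent interlock bug: fcntl lock held but FIFO check passed"
```

With a check that flips from "stop" to "go" between iterations, as an
exiting postmaster would cause, the loop ends in (True, None); a persistent
disagreement ends in the "apparent interlock bug" message.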

> A bigger question in my view is what to do with the existing
> mechanism.  The main advantage of making a change like this is that we
> could finally dispense with System V shared memory completely.  But we
> risk encountering systems where the battle-tested System V mechanism
> works and this new one either fails to work at all (server won't
> start) or fails to work as desired (interlock broken).  So it's
> tempting to think we should have a GUC or control-file setting to
> control which mechanism gets used.  Of course for QNX, the actual
> subject of this thread, System V won't be an option, but other people
> might like a big red button they can push if the new code turns out to
> be less than we're hoping.

A GUC sounds fine to me, as would using the sysv interlock unconditionally for
a couple more releases before removing it.

Thanks,
nm
