Re: rmgr hooks and contrib/rmgr_hook - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: rmgr hooks and contrib/rmgr_hook
Date
Msg-id 1221555881.3913.1761.camel@ebony.2ndQuadrant
Whole thread Raw
In response to Re: rmgr hooks and contrib/rmgr_hook  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Mon, 2008-09-15 at 16:00 +0100, Simon Riggs wrote:
> On Mon, 2008-09-15 at 10:04 -0400, Tom Lane wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> 
> > > The version mismatch idea presumes that a code author would
> structure
> > > their code in two pieces: one to generate the WAL and one to read
> it.
> > 
> > No, the version mismatch problem is that you might try to read the
> WAL
> > with a different version of the plugin than you wrote it with.  Or
> maybe
> > with a completely unrelated plugin that was unfortunate enough to
> choose
> > the same rmgr ID.  We can't afford to insert complete versioning
> > information into each WAL record, so it's going to be pretty
> difficult
> > to avoid this risk.

There are a few cases we might be concerned about:

* rmgr plugin using duplicate id
* rmgr plugin using wrong id
* missing rmgr plugin
* version mismatch on WAL format of Rmgr record
* rmgr plugin wrong version to server version

We can handle those issues like this

* rmgr plugin using duplicate id

Registration scheme avoids this somewhat. Plugin setup disallows
possibility of multiple plugins with same id. Ranges of values have been
assigned so we advise people which values to use. This is same issue as
IANA for well known port numbers.

* rmgr plugin using wrong id

We can't actually force code to make calls using the right RmgrId. So
the best we can do is tell the plugin what number it has been assigned
to and have it throw an error if it doesn't expect that number and can't
dynamically change its RmgrId. The latter is just too complex too expect
people to do, so hardcoding the RmgrId is the expected way. That,
combined with the registration scheme should be sufficient. We aren't
expecting more than a 10-20 of these anyway. Again, similar to TCP/IP -
not all software knows how to operate on different port numbers.

*  missing rmgr plugin

It seems sufficient to make the test for !RmgrIdIsValid return FATAL.
This then allows people to correct mistakes and retry. By the time this
test happens the WAL record has been CRC checked and has a valid header
in all other ways. We can document how to put in a dummy handler if the
rmgr plugin has bugs that need to be worked around to continue recovery.

* version mismatch on WAL format of Rmgr record

Immediately following the "shutdown" checkpoint at startup we should
write out a version string for all rmgr plugins as a WAL record. On
replay xlog_redo() will read the WAL record and test it against the
version information presented by the current plugin. If there is a
mismatch, we throw a FATAL error. This stops replay exactly on a
checkpoint record, so if you change rmgr to next version and restart
then no problems ensue. We allow new rmgr plugins that were not there
previously. These changes allow you to rollforward past a change in rmgr
WAL format. This then allows a standby server to be upgraded immediately
following a master server when using log shipping replication even when
the rmgr plugin changes its WAL format.

We don't go to that trouble for datatype versions, perhaps we should.
Overall, version management of rmgr plugins is same difficulty and can
result in errors of same severity as existing PostgresSQL extensions. So
there seems to be no additional risk from allowing them, in general.

We never expect to handle WAL changes across major versions and we will
increase the version numbers of all standard rmgrs at major releases, by
keying them to version numbers/strings already in use. WAL format
changes will be strongly discouraged for any single release.

* rmgr plugin wrong version to server version

Plugins are expected to test whether they are operating at the correct
version for the server. This information is already available if the
plugin wishes to act correctly.

So to do all of the above we need 3 changes:
* !RmgrIdIsValid throws FATAL not ERROR in ReadRecord()
* Alter rmgr API call to send RmgrId to plugin so it can throw error or
handle it somehow.In most cases this will lead to an plugin error "must
be defined on RmgrId 125" or similar.
* New rmgr API call to getRmgrVersion() so we can test it at startup
shutdown checkpoints.

It will take some time to make and test these changes, so this patch
should be marked returned with feedback from this commitfest.

If there are some more error modes, please say. I can't see any others.

This patch is about flexibility for the future. I have no pressing need
for this to go into 8.4, I just think its a good thing if it does. If
people think it poses a genuine risk or that there is insufficient time
to properly assess risks, then we can withdraw it.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: "Ibrar Ahmed"
Date:
Subject: [Review] fix dblink security hole
Next
From: Heikki Linnakangas
Date:
Subject: Re: NDirectFileRead and Write