Re: Hot Standby, release candidate? - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot Standby, release candidate?
Date
Msg-id 1260742926.1955.195.camel@ebony
In response to Re: Hot Standby, release candidate?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Hot Standby, release candidate?  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Sun, 2009-12-13 at 15:45 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > * NonTransactionalInvalidation logging has been removed following
> > review, but AFAICS that means VACUUM FULL doesn't work correctly on
> > catalog tables, which regrettably will be the only ones still standing
> > even after we apply VFI patch. Did I misunderstand the original intent?
> > Was it just buggy somehow? Or is this hoping VF goes completely, which
> > seems unlikely in this release.
> 
> For my money, the only reason VF is still around is there hasn't been
> an urgent reason to get rid of it.  If it doesn't play with HS, I think
> we'd be better served to put work into getting rid of it than to put
> work into fixing it.

I see the logic, though it has many implications. I'll step up, if I can
get some help from you and Itagaki on the VF side.

You have a rough design here:
http://archives.postgresql.org/message-id/19750.1252094460@sss.pgh.pa.us

Some thoughts and some further work on a detailed design:

* Which exact tables are we talking about: just pg_class and the shared
catalogs? Everything else is in pg_class, so if we can find it we're OK?
formrdesc() tells me the list of nailed relations is: pg_database,
pg_class, pg_attribute, pg_proc, and pg_type. Are the nailed relations
the ones we care about, or are they just a subset? 

* Restrict set of operations to *only* VACUUM FULL. Is there a need for
anything else to do this, at least in this release?

* Each backend needs to access two map files: shared and local.

* Get relcache to read map files at startup in formrdesc(). Rather than
use RelationInitPhysicalAddr(), set relation->rd_node.relNode directly.

* Get VF to write a new type of invalidation message that means "re-read
the two map files and overwrite relation->rd_node.relNode in the
nailed relations".

* Map files would have a very structured format, so each table listed
has its exact place. Sounds like the best place for shared catalogs is
pg_control. We only need a few additional bytes for that, and everything
else needed to manipulate it already exists.

* Map files for specific databases would be called pg_database_control,
with roughly the same concepts as pg_control. It's then an obvious place
to add any further db-specific things in future, if we need them.

* Protect all map file reading/writing using ControlFileLock. Sequence
of update is: acquire lock, send invalidation, rewrite file, release
lock, all inside a critical section. Readers would take the lock shared,
writers exclusive.

* Work would be in two tranches: add new way of working then later
remove code we don't need; I would actually rather do the second part at
start of next dev cycle.
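
To make the "very structured format" idea concrete, here is a rough sketch of what a per-database map and its re-read step might look like. It is only an illustration under assumptions, not proposed code: RelMapFile, NailedRel, the slot order, the magic number, and the toy checksum are all hypothetical, and locking (ControlFileLock) and actual file I/O are left out. It assumes the local map covers pg_class, pg_attribute, pg_proc and pg_type, with pg_database handled by the shared map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed slot order: each mapped catalog has its exact place. */
enum MapSlot
{
    SLOT_PG_CLASS,
    SLOT_PG_ATTRIBUTE,
    SLOT_PG_PROC,
    SLOT_PG_TYPE,
    MAP_NSLOTS
};

#define MAP_MAGIC 0x52454C4DU   /* hypothetical value, for illustration only */

/* Hypothetical on-disk layout of a map file: a few bytes per catalog. */
typedef struct RelMapFile
{
    uint32_t magic;
    uint32_t relnode[MAP_NSLOTS];   /* current relfilenode per slot */
    uint32_t checksum;              /* detects torn or garbage contents */
} RelMapFile;

/* Stand-in for the relNode field of a nailed relcache entry. */
typedef struct NailedRel
{
    uint32_t relNode;
} NailedRel;

/* Toy checksum; a real file would more likely use a CRC as pg_control does. */
static uint32_t
map_checksum(const RelMapFile *m)
{
    uint32_t sum = m->magic;

    for (int i = 0; i < MAP_NSLOTS; i++)
        sum = sum * 31u + m->relnode[i];
    return sum;
}

/* Build an initial map from a list of relfilenodes, one per slot. */
void
map_init(RelMapFile *m, const uint32_t initial[MAP_NSLOTS])
{
    m->magic = MAP_MAGIC;
    for (int i = 0; i < MAP_NSLOTS; i++)
        m->relnode[i] = initial[i];
    m->checksum = map_checksum(m);
}

/* Writer side (VACUUM FULL): point one slot at the new relfilenode. */
void
map_set(RelMapFile *m, enum MapSlot slot, uint32_t new_relnode)
{
    assert((int) slot >= 0 && (int) slot < MAP_NSLOTS);
    m->relnode[slot] = new_relnode;
    m->checksum = map_checksum(m);
}

/*
 * Reader side (startup, or on receipt of the new invalidation message):
 * overwrite relNode in every nailed relation from the map.
 * Returns 0 on success, -1 if the map fails validation, in which case
 * the nailed relations are left untouched.
 */
int
map_apply(const RelMapFile *m, NailedRel rels[MAP_NSLOTS])
{
    if (m->magic != MAP_MAGIC || m->checksum != map_checksum(m))
        return -1;
    for (int i = 0; i < MAP_NSLOTS; i++)
        rels[i].relNode = m->relnode[i];
    return 0;
}
```

In this sketch VF would call map_set() while holding the lock exclusively, and each backend would call map_apply() under a shared lock, so a reader never sees a half-updated map.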

-- Simon Riggs           www.2ndQuadrant.com


