Re: Catalog/Metadata consistency during changeset extraction from wal - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Catalog/Metadata consistency during changeset extraction from wal
Date
Msg-id 201206251543.40142.andres@2ndquadrant.com
In response to Re: Catalog/Metadata consistency during changeset extraction from wal  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Catalog/Metadata consistency during changeset extraction from wal
List pgsql-hackers
On Monday, June 25, 2012 03:08:51 AM Robert Haas wrote:
> On Sun, Jun 24, 2012 at 5:11 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > There are some interesting problems related to locking and snapshots
> > here. Not sure if they are resolvable:
> > 
> > We need to restrict SnapshotNow to represent the view it had back when
> > the wal record we're currently decoding was created. Otherwise we would
> > possibly get wrong column types and similar. As we're working in the
> > past, locking doesn't protect us against much here. I have that (mostly
> > and inefficiently).
> > 
> > One interesting problem is table rewrites (truncate, cluster, some ALTER
> > TABLEs) and dropping tables. Because we nudge SnapshotNow to the past
> > view it had back when the wal record was created, we get the old
> > relfilenode. Which might have been dropped as part of the transaction
> > cleanup...
> > With most types that's not a problem. Even things like records and arrays
> > aren't problematic. More interesting cases include VACUUM FULL $systable
> > (e.g. pg_enum) and VACUUM FULLing a table which is used in the *_out
> > function of a type (like a user-level pg_enum implementation).
> > 
> > The only theoretical way I see around that problem would be to postpone
> > all relation unlinks until everything that could possibly read them has
> > finished. Doesn't seem too alluring, although it would be needed if we
> > ever move more things off SnapshotNow.
> > 
> > Input/Ideas/Opinions?
> 
> Yeah, this is slightly nasty.  I'm not sure whether or not there's a
> way to make it work.
Postponing all non-rollback unlinks to the next "logical checkpoint" is the 
only thing I can think of...
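To make the "postpone unlinks" idea concrete, here is a toy sketch (plain Python, not backend code; all names are made up for illustration): dropped relfilenodes go on a pending queue tagged with the LSN of the dropping commit, and a relfilenode is only physically unlinked once the slowest logical decoder has consumed WAL past that LSN.

```python
# Toy model of deferring physical unlinks until every logical decoder
# has read past the WAL position of the commit that dropped the file.
# "relfilenode" and "LSN" are used loosely; this is not PostgreSQL code.

class UnlinkQueue:
    """Holds (commit_lsn, relfilenode) pairs; a relfilenode may only be
    removed once the slowest decoder has advanced past commit_lsn."""

    def __init__(self):
        self.pending = []  # list of (commit_lsn, relfilenode)

    def defer_unlink(self, commit_lsn, relfilenode):
        # Called where the backend would otherwise unlink immediately.
        self.pending.append((commit_lsn, relfilenode))

    def flush(self, oldest_decoder_lsn):
        """Return relfilenodes now safe to unlink; keep the rest queued."""
        safe = [rfn for lsn, rfn in self.pending if lsn < oldest_decoder_lsn]
        self.pending = [(lsn, rfn) for lsn, rfn in self.pending
                        if lsn >= oldest_decoder_lsn]
        return safe

q = UnlinkQueue()
q.defer_unlink(100, 16384)  # DROP committed at LSN 100
q.defer_unlink(250, 16385)  # rewrite committed at LSN 250
print(q.flush(200))         # slowest decoder at LSN 200 -> [16384]
```

The "logical checkpoint" above would then just be the point where flush() runs with the minimum confirmed LSN across all decoders.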

> I had another idea.  Suppose decoding happens directly on the primary,
> because I'm still hoping there's a way to swing that.  Suppose further
> that we handle DDL by insisting that (1) any backend which wants to
> add columns or change the types of existing columns must first wait
> for logical replication to catch up and (2) if a backend which has
> added columns or changed the types of existing columns then writes to
> the modified table, decoding of those writes will be postponed until
> transaction commit.  I think that's enough to guarantee that the
> decoding process can just use the catalogs as they stand, with plain
> old SnapshotNow.
I don't think it's that easy. If you e.g. have multiple ALTERs in the same 
transaction interspersed with inserted rows, they will all have different 
TupleDescs.
I don't see how that's resolvable without either replicating DDL to the target 
system or changing what SnapshotNow does...
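A toy illustration of why one catalog snapshot per transaction isn't enough (plain Python, invented names, not backend code): each INSERT between ALTERs has to be decoded with the TupleDesc in effect at that command, so the decoder effectively needs a command-id-to-TupleDesc mapping within the transaction.

```python
# Toy model: three TupleDesc "versions" created inside one transaction,
# keyed by the command id (cmin) at which each became valid.
tupledesc_versions = [
    (0, ["a int4"]),                # before any ALTER
    (2, ["a int4", "b int4"]),      # after ALTER TABLE ... ADD COLUMN b
    (4, ["a int4", "b int8"]),      # after ALTER ... ALTER COLUMN b TYPE
]

def tupledesc_at(cmin):
    """Pick the newest TupleDesc created at or before command id cmin."""
    current = None
    for valid_from, desc in tupledesc_versions:
        if valid_from <= cmin:
            current = desc
    return current

# Row changes as (cmin, raw tuple), interleaved with the ALTERs above.
changes = [(1, (1,)), (3, (2, 20)), (5, (3, 30))]
for cmin, raw in changes:
    print(cmin, dict(zip(tupledesc_at(cmin), raw)))
```

With a single snapshot taken at commit, the first insert would wrongly be decoded against the final two-column, int8 descriptor.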

> The downside of this approach is that it makes certain kinds of DDL
> suck worse if logical replication is in use and behind.  But I don't
> necessarily see that as prohibitive because (1) logical replication
> being behind is likely to suck for a lot of other reasons too and (2)
> adding or retyping columns isn't a terribly frequent operation and
> people already expect a hit when they do it.  Also, I suspect that we
> could find ways to loosen those restrictions at least in common cases
> in some future version; meanwhile, less work now.
Agreed.

Andres
-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

