Re: Re: xReader, double-effort (was: Temporary tables under hot standby) - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
Date
Msg-id 1335736814.3919.92.camel@hvost
In response to Re: Re: xReader, double-effort (was: Temporary tables under hot standby)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
List pgsql-hackers
On Sun, 2012-04-29 at 16:33 -0400, Robert Haas wrote:
> On Sat, Apr 28, 2012 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> >> Translating WAL is a very hard task.
> >
> > No kidding.  I would think it's impossible on its face.  Just for
> > starters, where will you get table and column names from?  (Looking at
> > the system catalogs is cheating, and will not work reliably anyway.)
> >
> > IMO, if we want non-physical replication, we're going to need to build
> > it in at a higher level than after-the-fact processing of WAL.
> > I foresee wasting quite a lot of effort on the currently proposed
> > approaches before we admit that they're unworkable.
> 
> I think the question we should be asking ourselves is not whether WAL
> as it currently exists is adequate for logical replication, but rather
> whether or not it could be made adequate.

Agreed. 

> For example, suppose that we were
> to arrange things so that, after each checkpoint, the first insert,
> update, or delete record for a given relfilenode after each checkpoint
> emits a special WAL record that contains the relation name, schema
> OID, attribute names, and attribute type OIDs.  

Not just the first after a checkpoint, but also the first after a schema
change. Even though this will duplicate the WAL records for the changes
to the system catalogs, it is likely much cheaper overall to always have
a fresh structure description in the WAL stream.
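
To make the rule concrete, here is a minimal Python sketch of "emit a
schema record before the first change after a checkpoint or a schema
change". It is a toy model only: ToyWalWriter, the record tuples and the
descriptor dict are hypothetical names made up for illustration, not
PostgreSQL structures or WAL record formats.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWalWriter:
    """Toy log writer. All names and record shapes here are hypothetical."""

    records: list = field(default_factory=list)
    # relfilenodes whose descriptor is already in the stream and still fresh
    described: set = field(default_factory=set)

    def checkpoint(self):
        self.records.append(("CHECKPOINT",))
        # After a checkpoint every relation must be described again.
        self.described.clear()

    def schema_change(self, relfilenode):
        # DDL invalidates the descriptor of the affected relation only.
        self.described.discard(relfilenode)

    def write_change(self, relfilenode, descriptor, op, tuple_data):
        # Emit the descriptor ahead of the first change that needs it.
        if relfilenode not in self.described:
            self.records.append(("SCHEMA", relfilenode, descriptor))
            self.described.add(relfilenode)
        self.records.append((op, relfilenode, tuple_data))


if __name__ == "__main__":
    # Illustrative values: relation name, schema OID, attribute names, type OIDs.
    desc = {"relname": "accounts", "nspoid": 2200,
            "attnames": ["id", "owner"], "atttypids": [23, 25]}
    wal = ToyWalWriter()
    wal.write_change(16384, desc, "INSERT", (1, "a"))
    wal.write_change(16384, desc, "INSERT", (2, "b"))  # no second SCHEMA record
    wal.schema_change(16384)
    wal.write_change(16384, desc, "UPDATE", (2, "c"))  # SCHEMA emitted again
    print([r[0] for r in wal.records])
```

The extra cost is one descriptor record per relation per checkpoint or
DDL change, which is the trade-off argued for above.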

And if we really want to do the WAL-->logical-->SQL_text conversion on a
host separate from the master, we also need to insert there the
definitions of user-defined types, together with at least the types'
output functions in some form.

So you basically need a large part of postgres for reliably making sense
of WAL.
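
A small Python sketch of why the output functions matter. It assumes a
toy layout where each field arrives as a separate byte string (this is
not the heap tuple format), and decode_tuple and OUTPUT_FUNCS are
hypothetical names. The three OIDs are the built-in ones for bool, int4
and text; the byte order is an assumption of the toy.

```python
# Built-in PostgreSQL type OIDs for bool, int4 and text.
BOOLOID, INT4OID, TEXTOID = 16, 23, 25

# Hypothetical registry: type OID -> output function (raw bytes -> text).
# Little-endian integers are an assumption of this toy layout.
OUTPUT_FUNCS = {
    BOOLOID: lambda raw: "t" if raw != b"\x00" else "f",
    INT4OID: lambda raw: str(int.from_bytes(raw, "little", signed=True)),
    TEXTOID: lambda raw: raw.decode("utf-8"),
}


def decode_tuple(descriptor, raw_fields, output_funcs=OUTPUT_FUNCS):
    """Render raw field bytes as {column name: text} using a descriptor."""
    row = {}
    for name, typid, raw in zip(descriptor["attnames"],
                                descriptor["atttypids"], raw_fields):
        if typid not in output_funcs:
            # A user-defined type the decoding host knows nothing about.
            raise LookupError(
                f"no output function for type OID {typid} (column {name!r})")
        row[name] = output_funcs[typid](raw)
    return row


if __name__ == "__main__":
    desc = {"attnames": ["id", "owner", "active"],
            "atttypids": [INT4OID, TEXTOID, BOOLOID]}
    raw = [(42).to_bytes(4, "little", signed=True), b"hannu", b"\x01"]
    print(decode_tuple(desc, raw))
```

A column of a user-defined type fails with LookupError unless its output
function has been shipped to the decoding host in some form, which is
exactly the dependency described above.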

> Well, now we are much
> closer to being able to do some meaningful decoding of the tuple data,
> and it really doesn't cost us that much.  Handling DDL (and manual
> system catalog modifications) seems pretty tricky, but I'd be very
> reluctant to give up on it without banging my head against the wall
> pretty hard. 

The most straightforward way is to have a more or less full copy of
pg_catalog also on the "WAL-filtering / WAL-conversion" node, and to use
it in 1:1 replicas of transactions recreated from the WAL.
This way we can avoid recreating any alternate views of the master's
schema.

Then again, we could do it all on the master, inside the WAL-writing
transaction, and thus avoid a large chunk of the problems.

If the receiving side is also PostgreSQL with the same catalog structure
(i.e. the same major version), then we don't actually need to "handle
DDL" in any complicated way; it would be enough to just carry over the
changes to the system tables.

The main reason we don't do it currently for trigger-based logical
replication is the restriction of not being able to have triggers on
system tables. 

I hope it is much easier to have the triggerless record generation also
work on system tables.

> The trouble with giving up on WAL completely and moving
> to a separate replication log is that it means a whole lot of
> additional I/O, which is bound to have a negative effect on
> performance.

Why would you give up WAL?

Or do you mean that the new "logical WAL" needs to have the same
commit-time behaviour as WAL to be reliable?

I'd envision a scenario where the logical WAL is sent to a slave or
distribution hub directly and not written on the local host at all.
An optional sync mode, similar to the current sync WAL replication,
could be configured. I hope this would run mostly in parallel with local
WAL generation, so not much extra wall-clock time would be wasted.
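
Roughly, the commit path I have in mind could look like the Python
sketch below. ToyCommitter and its two callbacks are hypothetical
stand-ins for "flush local WAL" and "send the logical record to the
hub"; nothing here corresponds to existing PostgreSQL code.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyCommitter:
    """Toy commit path: ship the logical record while local WAL is flushed."""

    def __init__(self, flush_local_wal, send_logical, synchronous=False):
        self.flush_local_wal = flush_local_wal  # hypothetical callback
        self.send_logical = send_logical        # hypothetical callback
        self.synchronous = synchronous
        # A single worker keeps logical records in commit order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def commit(self, wal_record, logical_record):
        # Start shipping first so it overlaps with the local flush.
        shipping = self._pool.submit(self.send_logical, logical_record)
        self.flush_local_wal(wal_record)
        if self.synchronous:
            # Sync mode: do not return until the hub has the record.
            shipping.result()
        return shipping

    def close(self):
        # Wait for any records still in flight.
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    local, remote = [], []
    committer = ToyCommitter(local.append, remote.append, synchronous=True)
    committer.commit("wal-1", "logical-1")
    committer.close()
    print(local, remote)
```

In the asynchronous case the commit returns after the local flush only,
so the added latency is limited to handing the record to the sender.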

> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 

-- 
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


