Re: Re: xReader, double-effort (was: Temporary tables under hot standby) - Mailing list pgsql-hackers
From | Hannu Krosing
---|---
Subject | Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
Date |
Msg-id | 1335736814.3919.92.camel@hvost
In response to | Re: Re: xReader, double-effort (was: Temporary tables under hot standby) (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
List | pgsql-hackers
On Sun, 2012-04-29 at 16:33 -0400, Robert Haas wrote:
> On Sat, Apr 28, 2012 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> >> Translating WAL is a very hard task.
> >
> > No kidding.  I would think it's impossible on its face.  Just for
> > starters, where will you get table and column names from?  (Looking at
> > the system catalogs is cheating, and will not work reliably anyway.)
> >
> > IMO, if we want non-physical replication, we're going to need to build
> > it in at a higher level than after-the-fact processing of WAL.
> > I foresee wasting quite a lot of effort on the currently proposed
> > approaches before we admit that they're unworkable.
>
> I think the question we should be asking ourselves is not whether WAL
> as it currently exists is adequate for logical replication, but rather
> whether or not it could be made adequate.

Agreed.

> For example, suppose that we were
> to arrange things so that the first insert, update, or delete record
> for a given relfilenode after each checkpoint emits a special WAL
> record that contains the relation name, schema OID, attribute names,
> and attribute type OIDs.

Not just the first after a checkpoint, but also the first after a schema
change. Even though this duplicates the WAL records carrying the system
catalog changes, it is likely much cheaper overall to always have fresh
structure information in the WAL stream.

And if we really want to do the WAL --> logical --> SQL-text conversion
on a host separate from the master, we also need to insert there the
type definitions of user-defined types, together with at least the
types' output functions in some form. So you basically need a large part
of PostgreSQL to make sense of WAL reliably.

> Well, now we are much
> closer to being able to do some meaningful decoding of the tuple data,
> and it really doesn't cost us that much.
> Handling DDL (and manual
> system catalog modifications) seems pretty tricky, but I'd be very
> reluctant to give up on it without banging my head against the wall
> pretty hard.

The most straightforward way is to have a more or less full copy of
pg_catalog also on the "WAL-filtering / WAL-conversion" node, and to
keep it current by applying 1:1 replicas of the catalog-modifying
transactions recreated from the WAL. This way we avoid recreating any
alternate view of the master's schema.

Then again, we could do it all on the master, inside the WAL-writing
transaction, and thus avoid a large chunk of the problems.

If the receiving side is also PostgreSQL with the same catalog structure
(i.e. the same major version), then we don't actually need to "handle
DDL" in any complicated way; it would be enough to just carry over the
changes to the system tables. The main reason we don't do this currently
for trigger-based logical replication is the restriction that we cannot
have triggers on system tables. I hope it is much easier to have the
triggerless record generation also work on system tables.

> The trouble with giving up on WAL completely and moving
> to a separate replication log is that it means a whole lot of
> additional I/O, which is bound to have a negative effect on
> performance.

Why would you give up WAL? Or do you mean that the new "logical WAL"
needs to have the same commit-time behaviour as WAL to be reliable?

I'd envision a scenario where the logical WAL is sent to a slave or
distribution hub directly and not written at the local host at all. An
optional sync mode similar to current synchronous WAL replication could
be configured. I hope this would run mostly in parallel with local WAL
generation, so not much extra wall-clock time would be wasted.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
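By analogy with the existing synchronous_commit / synchronous_standby_names machinery, the optional sync mode for a streamed logical WAL might look something like the following. None of these GUCs exist; this is purely a sketch of the configuration surface being imagined:

```
# Hypothetical postgresql.conf fragment -- illustrative names only.
# Stream the logical WAL straight to a hub, without writing it locally:
logical_wal_target = 'hub.example.com:5433'
# Optionally block commit until the hub has acknowledged the logical
# records, mirroring synchronous physical WAL replication:
logical_wal_sync_mode = on
```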