Re: Re: xReader, double-effort (was: Temporary tables under hot standby) - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Re: xReader, double-effort (was: Temporary tables under hot standby) |
Msg-id | CA+Tgmoa1HriwRfbuYxcHCARtTchqk_fW_9a4KfF_4UdmUv6_qw@mail.gmail.com |
In response to | Re: Re: xReader, double-effort (was: Temporary tables under hot standby) (Hannu Krosing <hannu@2ndQuadrant.com>) |
List | pgsql-hackers |
On Sun, Apr 29, 2012 at 6:00 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> I think the question we should be asking ourselves is not whether WAL
>> as it currently exists is adequate for logical replication, but rather
>> or not it could be made adequate.
>
> Agreed.

And of course I meant "but rather whether or not it could be made
adequate", but I dropped a word.

>> For example, suppose that we were
>> to arrange things so that, after each checkpoint, the first insert,
>> update, or delete record for a given relfilenode after each checkpoint
>> emits a special WAL record that contains the relation name, schema
>> OID, attribute names, and attribute type OIDs.
>
> Not just the first after checkpoint, but also the first after a schema
> change, even though will duplicate the wals with changes to system
> catalog, it is likely much cheaper overall to always have a fresh
> structure in wal stream.

Yes.

> And if we really want to do WAL-->logical-->SQL_text conversion on a
> host separate from the master, we also need to insert there the type
> definitions of user-defined types together with at least types output
> functions in some form.

Yes.

> So you basically need a large part of postgres for reliably making sense
> of WAL.

Agreed, but I think that's a problem we need to fix and not a tolerable
situation at all.  If a user can create a type-output function that goes
and looks at the state of the database to determine what to output, then
we are completely screwed, because that basically means you would need
to have a whole Hot Standby instance up and running just to make it
possible to run type output functions.  Now you might be able to build a
mechanism around that that is useful to some people in some situations,
but wow does that sound painful.
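To make the proposal above concrete, here is a toy sketch of the kind of special WAL record being discussed: emitted before the first insert/update/delete for a relfilenode after a checkpoint (or, per Hannu, after a schema change), so a reader needs no catalog access to interpret later tuple data. This is purely illustrative; none of these type or field names exist in PostgreSQL, and a real record would be variable-length rather than using fixed bounds.

```c
/* Toy sketch of a per-relfilenode metadata WAL record; all names here
 * are invented for illustration, not actual PostgreSQL definitions. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;

#define NAMEDATALEN 64
#define MAX_ATTRS   8           /* small fixed bound, sketch only */

typedef struct RelMetaRecord
{
    Oid  relfilenode;                   /* relation the record describes */
    Oid  schema_oid;                    /* namespace OID */
    char relname[NAMEDATALEN];          /* relation name */
    int  natts;                         /* number of attributes */
    char attnames[MAX_ATTRS][NAMEDATALEN];
    Oid  atttypids[MAX_ATTRS];          /* attribute type OIDs */
} RelMetaRecord;

/* Fill in an example record, as the master might emit for a two-column
 * table; a real implementation would read all of this from the
 * relcache at WAL-insert time. */
static void
fill_example(RelMetaRecord *rec)
{
    memset(rec, 0, sizeof(*rec));
    rec->relfilenode = 16384;
    rec->schema_oid = 2200;             /* e.g. the "public" namespace */
    strcpy(rec->relname, "accounts");
    rec->natts = 2;
    strcpy(rec->attnames[0], "id");
    rec->atttypids[0] = 23;             /* int4 */
    strcpy(rec->attnames[1], "balance");
    rec->atttypids[1] = 1700;           /* numeric */
}
```

The point of carrying all of this in the WAL stream itself is that a downstream reader can reconstruct attribute names and types without ever touching the master's catalogs.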
What I want is for the master to be able to cheaply rattle off the
tuples that got inserted, updated, or deleted as those things happen;
needing a whole second copy of the database just to do that does not
meet my definition of "cheap".  Furthermore, it's not really clear that
it's sufficient anyway, since there are problems with what happens
before the HS instance reaches consistency, what happens when it crashes
and restarts, and how to handle the case where the system catalog we
need to examine to generate the logical replication records is
access-exclusive-locked.  Seems like a house of cards.

Some of this might be possible to mitigate contractually, by putting
limits on what type input/output functions are allowed to do.  Or we
could invent a new analog of type input/output functions that is
explicitly limited in this way, and support only types that provide it.
But I think the real key is that we can't rely on catalog access: the
WAL stream has to have enough information to allow the reader to
construct some set of in-memory hash tables with sufficient detail to
reliably decode WAL.  Or at least that's what I'm thinking.

> Most straightforward way is to have a more or less full copy of
> pg_catalog also on the "WAL-filtering / WAL-conversion" node, and to use
> it in 1:1 replicas of transactions recreated from the WAL.
> This way we can avoid recreating any alternate views of the masters
> schema.

See above; I have serious doubts that this can ever be made to work
robustly.

> Then again, we could do it all on master and inside the wal-writing
> transaction and thus avoid large chunk of the problems.
>
> If the receiving side is also PostgreSQL with same catalog structure
> (i.e same major version) then we don't actually need to "handle DDL" in
> any complicated way, it would be enough to just carry over the changes
> to system tables.
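To sketch the "in-memory hash tables" idea mentioned above (toy code only; nothing here exists in PostgreSQL, and every name is invented): a catalog-free WAL reader could keep, per relfilenode, the descriptor versions announced by metadata records so far, each stamped with the LSN at which it appeared. A tuple record is then decoded with the newest version at or before its own LSN, which also handles the tuple descriptor changing under you mid-stream.

```c
/* Toy sketch of per-relfilenode descriptor tracking for a WAL reader
 * that never consults the catalogs; fixed tiny bounds for brevity. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint64_t XLogRecPtr;

#define MAX_RELS     8
#define MAX_VERSIONS 4

typedef struct DescVersion
{
    XLogRecPtr start_lsn;       /* LSN of the metadata record */
    int        natts;           /* stand-in for a full tuple descriptor */
} DescVersion;

typedef struct RelDescs
{
    Oid         relfilenode;    /* key; 0 = empty slot */
    int         nversions;
    DescVersion versions[MAX_VERSIONS];
} RelDescs;

static RelDescs rels[MAX_RELS];

/* Called when a metadata record is read; records arrive in LSN order. */
static void
remember_desc(Oid relfilenode, XLogRecPtr lsn, int natts)
{
    int i;

    for (i = 0; i < MAX_RELS; i++)
        if (rels[i].relfilenode == 0 || rels[i].relfilenode == relfilenode)
            break;
    rels[i].relfilenode = relfilenode;
    rels[i].versions[rels[i].nversions].start_lsn = lsn;
    rels[i].versions[rels[i].nversions].natts = natts;
    rels[i].nversions++;
}

/* Descriptor in effect for a tuple record at the given LSN, or NULL if
 * no metadata record for this relfilenode has been seen by then. */
static DescVersion *
desc_for(Oid relfilenode, XLogRecPtr lsn)
{
    for (int i = 0; i < MAX_RELS; i++)
        if (rels[i].relfilenode == relfilenode)
        {
            DescVersion *best = NULL;

            for (int v = 0; v < rels[i].nversions; v++)
                if (rels[i].versions[v].start_lsn <= lsn)
                    best = &rels[i].versions[v];
            return best;
        }
    return NULL;
}
```

The NULL case is exactly the hard part: if the reader starts mid-stream, it cannot decode anything for a relation until the next metadata record for it shows up, which is why emitting one after every checkpoint matters.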
I agree it'd be preferable to handle DDL in terms of system catalog
updates, rather than saying, well, this is an ALTER TABLE .. RENAME.
But you need to be able to decode tuples using the right tuple
descriptor, even while that's changing under you.

> Why would you give up WAL ?

For lack of ability to make it work.  Don't underestimate how hard it's
going to be to nail this down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company