Thread: Re: xReader, double-effort (was: Temporary tables under hot standby)
All, the wiki page is now up at http://wiki.postgresql.org/wiki/XReader.
On Sat, Apr 28, 2012 at 1:19 AM, Aakash Goel <aakash.bits@gmail.com> wrote:
Sure Kevin, will get the wiki page ready asap, and reply back. Thanks.

On Thu, Apr 26, 2012 at 8:10 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

[resending because of postgresql.org bounces on first try]
Simon Riggs <simon@2ndquadrant.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> The GSoC xReader project is intended to be a major step toward
>> that, by providing a way to translate the WAL stream to a series
>> of notifications of logical events to clients which register with
>> xReader.
>
> This is already nearly finished in prototype and will be published
> in May. Andres Freund is working on it, copied here.
URL?
> It looks like there is significant overlap there.
Hard for me to know without more information. It sounds like there
is at least some overlap. I hope that can involve cooperation, with
the efforts of Andres forming the basis of Aakash's GSoC effort.
That might leave him more time to polish up the user filters.
Aakash: It seems like we need that Wiki page rather sooner than
later. Can you get to that quickly? I would think that just
copying the text from your approved GSoC proposal would be a very
good start. If you need help figuring out how to embed the images
from your proposal, let me know.
-Kevin
[replaced bad email address for Josh (which was my fault)]

Aakash Goel <aakash.bits@gmail.com> wrote:
> All, the wiki page is now up at
> http://wiki.postgresql.org/wiki/XReader.

Note that the approach Aakash is taking doesn't involve changes to the backend code; it is strictly a standalone executable which functions as a proxy to a hot standby and to which clients like replication systems connect. There is a possible additional configuration which wouldn't require a hot standby, if time permits.

I am not clear on whether 2nd Quadrant's code takes this approach or builds it into the server. I think we need to know that much before we can get very far in discussion.

-Kevin
On Friday, April 27, 2012 11:04:04 PM Kevin Grittner wrote:
> [replaced bad email address for Josh (which was my fault)]
>
> Aakash Goel <aakash.bits@gmail.com> wrote:
> > All, the wiki page is now up at
> > http://wiki.postgresql.org/wiki/XReader.
>
> Note that the approach Aakash is taking doesn't involve changes to
> the backend code; it is strictly a standalone executable which
> functions as a proxy to a hot standby and to which clients like
> replication systems connect. There is a possible additional
> configuration which wouldn't require a hot standby, if time permits.
>
> I am not clear on whether 2nd Quadrant's code takes this approach
> or builds it into the server. I think we need to know that much
> before we can get very far in discussion.

In the current prototype state there is one component that's integrated into the server (because it needs information that's only available there). That component is layered on top of a totally generic xlog reading/parsing library that doesn't care at all where it's running. It's also used in another cluster to read the received (filtered) stream.

I plan to submit the XLogReader (that's what it's called at the moment) before everything else, so everybody can take a look as soon as possible.

I took a *very* short glance over the current wiki description of xReader, and from that it seems to me it would benefit from trying to make it architecturally more similar to the rest of pg. I also would suggest reviewing how the current walreceiver/sender, and their protocol, work.

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: xReader, double-effort (was: Temporary tables under hot standby)
From: "Kevin Grittner"
Andres Freund <andres@2ndquadrant.com> wrote:
> In the current prototype state there is one component that's
> integrated into the server (because it needs information that's
> only available there).

The xReader design was based on the idea that it would be nice not to cause load on the master machine, and that by proxying the WAL stream to the HS, using synchronous replication style to write from xReader to the HS, you could use the HS as a source for that data, with it being at exactly the right point in time to query it.

I'm not convinced that I would rather see the logic fixed inside the master as opposed to being deployable on the master's machine, the slave machine, or even on its own machine in between.

> That component is layered on top of a totally generic xlog
> reading/parsing library that doesn't care at all where it's
> running.

That's cool.

> It's also used in another cluster to read the received (filtered)
> stream.

I don't quite follow what you're saying there.

> I plan to submit the XLogReader (that's what it's called at the
> moment) before everything else, so everybody can take a look as
> soon as possible.

Great! That will allow more discussion and planning.

> I took a *very* short glance over the current wiki description of
> xReader, and from that it seems to me it would benefit from trying
> to make it architecturally more similar to the rest of pg.

We're planning on using existing protocol to talk between pieces. Other than breaking it out so that it can run somewhere other than inside the server, and allowing clients to connect to xReader to listen to WAL events of interest, are you referring to anything else?

> I also would suggest reviewing how the current walreceiver/sender,
> and their protocol, work.

Of course! The first "inch-stone" in the GSoC project plan basically consists of creating an executable that functions as a walreceiver and a walsender to just pass things through from the master to the slave. We build from there by allowing clients to connect (again, over existing protocol) and register for events of interest, and then recognizing different WAL records to generate events. The project was just going to create a simple client to dump the information to disk, but with the time saved by adopting what you've already done, that might leave more time for generating a useful client.

Aakash, when you get a chance, could you fill in the "inch-stones" from the GSoC proposal page onto the Wiki page? I think the descriptions of those interim steps would help people understand your proposal better. Obviously, some of the particulars of tasks and dates may need adjustment based on the new work which is expected to appear before you start, but what's there now would be a helpful reference.

-Kevin
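The "inch-stones" Kevin describes can be illustrated with a toy model: a process that relays every WAL record to the standby unchanged, while letting clients register for logical events of interest. All names and record shapes here are invented for illustration; real WAL is page-oriented binary data flowing over the walsender/walreceiver protocol, not Python dicts.

```python
# Toy sketch of the xReader pass-through-plus-listeners idea.
# Names and record shapes are hypothetical, not PostgreSQL code.

class XReaderProxy:
    def __init__(self, standby_sink):
        self.standby_sink = standby_sink  # stands in for the walsender side toward the HS
        self.listeners = []               # list of (interest set, callback)

    def register(self, interests, callback):
        """A client registers for the event kinds it cares about."""
        self.listeners.append((set(interests), callback))

    def feed(self, record):
        # Inch-stone 1: pass every record through to the standby verbatim.
        self.standby_sink.append(record)
        # Later inch-stones: recognize record kinds and notify listeners.
        for interests, callback in self.listeners:
            if record["kind"] in interests:
                callback(record)


standby = []   # stands in for the hot standby's walreceiver
seen = []      # events delivered to one registered client
proxy = XReaderProxy(standby)
proxy.register({"INSERT", "DELETE"}, seen.append)
for rec in [{"kind": "INSERT", "rel": "t1"},
            {"kind": "COMMIT"},
            {"kind": "DELETE", "rel": "t1"}]:
    proxy.feed(rec)
```

The key property of the design is visible even in the toy: the standby receives the full stream regardless of listeners, so replication is never filtered by client interest.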
Hi Kevin, Hi Aakash,

On Saturday, April 28, 2012 12:18:38 AM Kevin Grittner wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
> > In the current prototype state there is one component that's
> > integrated into the server (because it needs information that's
> > only available there).
> The xReader design was based on the idea that it would be nice not
> to cause load on the master machine, and that by proxying the WAL
> stream to the HS, using synchronous replication style to write from
> xReader to the HS, you could use the HS as a source for that data,
> with it being at exactly the right point in time to query it.

Yes, that does make sense for some workloads. I don't think it's viable for everything though; that's why we're not aiming for that ourselves at the moment.

> I'm not convinced that I would rather see the logic fixed inside the
> master as opposed to being deployable on the master's machine, the
> slave machine, or even on its own machine in between.

I don't think that you can do everything apart from the master. We currently need shared memory for coordination between the moving parts; that's why we have it inside the master. It also has the advantage of being easier to set up.

> > That component is layered on top of a totally generic xlog
> > reading/parsing library that doesn't care at all where it's
> > running.
> That's cool.
>
> > It's also used in another cluster to read the received (filtered)
> > stream.
> I don't quite follow what you're saying there.

To interpret the xlog back into something that can be used for replication you need to read it again. After filtering we again write valid WAL, so we can use the same library on the sending|filtering side and on the receiving side. But that's actually off topic for this thread ;)

> > I took a *very* short glance over the current wiki description of
> > xReader, and from that it seems to me it would benefit from trying
> > to make it architecturally more similar to the rest of pg.
> We're planning on using existing protocol to talk between pieces.
> Other than breaking it out so that it can run somewhere other than
> inside the server, and allowing clients to connect to xReader to
> listen to WAL events of interest, are you referring to anything
> else?

It sounds like the xReader is designed to be one multiplexing process. While this definitely has some advantages resource-usage-wise, it doesn't seem to fit the rest of the design that well. The advantages might outweigh everything else, but I am not sure about that. Something like registering/deregistering also doesn't fit that well with the way walsender works, as far as I understand it.

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote:
> Something like registering/deregistering also doesn't fit that
> well with the way walsender works as far as I understand it.

If you look at the diagrams on the xReader Wiki page, the lines labeled "XLOG stream" are the ones using walsender/walreceiver. The green arrows represent normal connections to the database, to run queries to retrieve metadata needed to interpret the WAL records, and the lines labeled "Listener n" are expected to use the pg protocol to connect, but won't be talking page-oriented WAL -- they will be dealing with logical interpretation of the WAL: the sort of data which could be fed to a database which doesn't have the same page images, like Slony et al do.

Perhaps, given other points you made, the library for interpreting the WAL records could be shared, and hopefully a protocol for the clients, although that seems a lot more muddy to me at this point. If we can share enough code, there may be room for both approaches with minimal code duplication.

-Kevin
On Fri, Apr 27, 2012 at 11:18 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Andres Freund <andres@2ndquadrant.com> wrote:
>
> I'm not convinced that I would rather see the logic fixed inside the
> master as opposed to being deployable on the master's machine, the
> slave machine, or even on its own machine in between.

There are use cases where the translation from WAL to logical takes place on the master, the standby, or other locations. It's becoming clear that filtering records on the source is important in high-bandwidth systems, so the initial work focuses on putting that on the "master", i.e. the source. Which was not my first thought either. If you use cascading, this would still allow you to have master -> standby -> logical.

Translating WAL is a very hard task. Some time ago, I did also think an external tool would help (my initial design was called xfilter), but I no longer think that is likely to work very well apart from very simple cases.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Apr 28, 2012 at 3:48 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Aakash, when you get a chance, could you fill in the "inch-stones"
> from the GSoC proposal page onto the Wiki page?

Sure, http://wiki.postgresql.org/wiki/XReader updated.
Hello Andres,

On Sat, Apr 28, 2012 at 4:02 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > The xReader design was based on the idea that it would be nice not
> > to cause load on the master machine, and that by proxying the WAL
> > stream to the HS, using synchronous replication style to write from
> > xReader to the HS, you could use the HS as a source for that data,
> > with it being at exactly the right point in time to query it.
> Yes, that does make sense for some workloads. I don't think it's
> viable for everything though; that's why we're not aiming for that
> ourselves at the moment.

Regarding the above, what would be a case where querying the HS will not suffice?
Simon Riggs <simon@2ndQuadrant.com> writes:
> Translating WAL is a very hard task.

No kidding. I would think it's impossible on its face. Just for starters, where will you get table and column names from? (Looking at the system catalogs is cheating, and will not work reliably anyway.)

IMO, if we want non-physical replication, we're going to need to build it in at a higher level than after-the-fact processing of WAL. I foresee wasting quite a lot of effort on the currently proposed approaches before we admit that they're unworkable.

regards, tom lane
On Sat, Apr 28, 2012 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Translating WAL is a very hard task.
>
> No kidding. I would think it's impossible on its face. Just for
> starters, where will you get table and column names from? (Looking at
> the system catalogs is cheating, and will not work reliably anyway.)
>
> IMO, if we want non-physical replication, we're going to need to build
> it in at a higher level than after-the-fact processing of WAL.
> I foresee wasting quite a lot of effort on the currently proposed
> approaches before we admit that they're unworkable.

I think the question we should be asking ourselves is not whether WAL as it currently exists is adequate for logical replication, but rather or not it could be made adequate. For example, suppose that we were to arrange things so that, after each checkpoint, the first insert, update, or delete record for a given relfilenode emits a special WAL record that contains the relation name, schema OID, attribute names, and attribute type OIDs. Well, now we are much closer to being able to do some meaningful decoding of the tuple data, and it really doesn't cost us that much.

Handling DDL (and manual system catalog modifications) seems pretty tricky, but I'd be very reluctant to give up on it without banging my head against the wall pretty hard. The trouble with giving up on WAL completely and moving to a separate replication log is that it means a whole lot of additional I/O, which is bound to have a negative effect on performance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
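The bookkeeping Robert proposes (emit relation metadata once per relfilenode after each checkpoint, so later tuple data is decodable) can be sketched as a toy model. This is not PostgreSQL code; the record shapes, the `WalStream` class, and the example catalog entries are all invented for illustration.

```python
# Toy model: the first insert/update/delete for a given relfilenode
# after each checkpoint emits an extra "RELATION_META" record carrying
# relation name, schema OID, attribute names, and attribute type OIDs.
# All names are hypothetical; real WAL emission happens in C inside
# the backend.

class WalStream:
    def __init__(self, catalog):
        self.catalog = catalog    # relfilenode -> metadata dict
        self.described = set()    # relfilenodes described since last checkpoint
        self.records = []         # the emitted "WAL"

    def checkpoint(self):
        # After a checkpoint, metadata must be re-emitted on first change,
        # so a reader starting here can still decode everything that follows.
        self.described.clear()
        self.records.append(("CHECKPOINT",))

    def change(self, relfilenode, kind, tuple_data):
        if relfilenode not in self.described:
            self.records.append(("RELATION_META", relfilenode,
                                 self.catalog[relfilenode]))
            self.described.add(relfilenode)
        self.records.append((kind, relfilenode, tuple_data))


catalog = {16384: {"relname": "accounts",
                   "schema_oid": 2200,
                   "attnames": ["id", "balance"],
                   "atttypes": [23, 1700]}}
wal = WalStream(catalog)
wal.checkpoint()
wal.change(16384, "INSERT", (1, "10.00"))
wal.change(16384, "INSERT", (2, "20.00"))  # no metadata the second time
wal.checkpoint()
wal.change(16384, "UPDATE", (1, "15.00"))  # metadata re-emitted
kinds = [r[0] for r in wal.records]
```

The cost profile Robert points at falls out directly: metadata is paid once per relation per checkpoint cycle, not once per row change.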
On Sun, 2012-04-29 at 16:33 -0400, Robert Haas wrote:
> On Sat, Apr 28, 2012 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Simon Riggs <simon@2ndQuadrant.com> writes:
> >> Translating WAL is a very hard task.
> >
> > No kidding. I would think it's impossible on its face. Just for
> > starters, where will you get table and column names from? (Looking at
> > the system catalogs is cheating, and will not work reliably anyway.)
> >
> > IMO, if we want non-physical replication, we're going to need to build
> > it in at a higher level than after-the-fact processing of WAL.
> > I foresee wasting quite a lot of effort on the currently proposed
> > approaches before we admit that they're unworkable.
>
> I think the question we should be asking ourselves is not whether WAL
> as it currently exists is adequate for logical replication, but rather
> or not it could be made adequate.

Agreed.

> For example, suppose that we were to arrange things so that, after
> each checkpoint, the first insert, update, or delete record for a
> given relfilenode emits a special WAL record that contains the
> relation name, schema OID, attribute names, and attribute type OIDs.

Not just the first after a checkpoint, but also the first after a schema change; even though it will duplicate the WAL entries for changes to the system catalog, it is likely much cheaper overall to always have a fresh structure in the WAL stream.

And if we really want to do WAL-->logical-->SQL_text conversion on a host separate from the master, we also need to insert there the type definitions of user-defined types, together with at least the types' output functions in some form. So you basically need a large part of postgres for reliably making sense of WAL.

> Well, now we are much closer to being able to do some meaningful
> decoding of the tuple data, and it really doesn't cost us that much.
> Handling DDL (and manual system catalog modifications) seems pretty
> tricky, but I'd be very reluctant to give up on it without banging my
> head against the wall pretty hard.

The most straightforward way is to have a more or less full copy of pg_catalog also on the "WAL-filtering / WAL-conversion" node, and to use it in 1:1 replicas of transactions recreated from the WAL. This way we can avoid recreating any alternate views of the master's schema.

Then again, we could do it all on the master, inside the WAL-writing transaction, and thus avoid a large chunk of the problems.

If the receiving side is also PostgreSQL with the same catalog structure (i.e. same major version) then we don't actually need to "handle DDL" in any complicated way; it would be enough to just carry over the changes to the system tables. The main reason we don't do it currently for trigger-based logical replication is the restriction of not being able to have triggers on system tables. I hope it is much easier to have the triggerless record generation also work on system tables.

> The trouble with giving up on WAL completely and moving
> to a separate replication log is that it means a whole lot of
> additional I/O, which is bound to have a negative effect on
> performance.

Why would you give up WAL? Or do you mean that the new "logical-wal" needs to have the same commit-time behaviour as WAL to be reliable? I'd envision a scenario where the logi-wal is sent to the slave or distribution hub directly and not written at the local host at all. An optional sync mode similar to current sync WAL replication could be configured. I hope this would run mostly in parallel with local WAL generation, so not much extra wall-clock time would be wasted.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/
On Sun, Apr 29, 2012 at 6:00 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> I think the question we should be asking ourselves is not whether WAL
>> as it currently exists is adequate for logical replication, but rather
>> or not it could be made adequate.
>
> Agreed.

And of course I meant "but rather whether or not it could be made adequate", but I dropped a word.

>> For example, suppose that we were to arrange things so that, after
>> each checkpoint, the first insert, update, or delete record for a
>> given relfilenode emits a special WAL record that contains the
>> relation name, schema OID, attribute names, and attribute type OIDs.
>
> Not just the first after a checkpoint, but also the first after a
> schema change; even though it will duplicate the WAL entries for
> changes to the system catalog, it is likely much cheaper overall to
> always have a fresh structure in the WAL stream.

Yes.

> And if we really want to do WAL-->logical-->SQL_text conversion on a
> host separate from the master, we also need to insert there the type
> definitions of user-defined types, together with at least the types'
> output functions in some form.

Yes.

> So you basically need a large part of postgres for reliably making sense
> of WAL.

Agreed, but I think that's a problem we need to fix and not a tolerable situation at all. If a user can create a type-output function that goes and looks at the state of the database to determine what to output, then we are completely screwed, because that basically means you would need to have a whole Hot Standby instance up and running just to make it possible to run type output functions. Now you might be able to build a mechanism around that that is useful to some people in some situations, but wow does that sound painful.

What I want is for the master to be able to cheaply rattle off the tuples that got inserted, updated, or deleted as those things happen; needing a whole second copy of the database just to do that does not meet my definition of "cheap". Furthermore, it's not really clear that it's sufficient anyway, since there are problems with what happens before the HS instance reaches consistency, what happens when it crashes and restarts, and how we handle the case when the system catalog we need to examine to generate the logical replication records is access-exclusive-locked. Seems like a house of cards.

Some of this might be possible to mitigate contractually, by putting limits on what type input/output functions are allowed to do. Or we could invent a new analog of type input/output functions that is explicitly limited in this way, and support only types that provide it. But I think the real key is that we can't rely on catalog access: the WAL stream has to have enough information to allow the reader to construct some set of in-memory hash tables with sufficient detail to reliably decode WAL. Or at least that's what I'm thinking.

> The most straightforward way is to have a more or less full copy of
> pg_catalog also on the "WAL-filtering / WAL-conversion" node, and to
> use it in 1:1 replicas of transactions recreated from the WAL.
> This way we can avoid recreating any alternate views of the master's
> schema.

See above; I have serious doubts that this can ever be made to work robustly.

> Then again, we could do it all on the master, inside the WAL-writing
> transaction, and thus avoid a large chunk of the problems.
>
> If the receiving side is also PostgreSQL with the same catalog structure
> (i.e. same major version) then we don't actually need to "handle DDL" in
> any complicated way; it would be enough to just carry over the changes
> to the system tables.

I agree it'd be preferable to handle DDL in terms of system catalog updates, rather than saying, well, this is an ALTER TABLE .. RENAME. But you need to be able to decode tuples using the right tuple descriptor, even while that's changing under you.

> Why would you give up WAL?

For lack of ability to make it work. Don't underestimate how hard it's going to be to nail this down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
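Robert's "no catalog access" constraint can be sketched as a toy reader that builds an in-memory hash of tuple descriptors entirely from metadata records seen earlier in the stream, then decodes subsequent change records using only that. The record layout and names here are invented for illustration; real WAL decoding is far more involved (and the hard part, per the discussion, is keeping the descriptor current across DDL).

```python
# Toy reader for the scheme under discussion: no catalog lookups, only
# an in-memory map of descriptors reconstructed from the stream itself.
# Record shapes are hypothetical, not PostgreSQL's actual WAL format.

def decode_stream(records):
    descriptors = {}  # relfilenode -> attribute name list (the "hash table")
    changes = []
    for rec in records:
        if rec[0] == "RELATION_META":
            _, relfilenode, attnames = rec
            # A later metadata record (e.g. after DDL) replaces the old
            # descriptor, so subsequent tuples decode with fresh structure.
            descriptors[relfilenode] = attnames
        elif rec[0] in ("INSERT", "UPDATE", "DELETE"):
            kind, relfilenode, values = rec
            attnames = descriptors[relfilenode]  # must have appeared earlier
            changes.append((kind, dict(zip(attnames, values))))
    return changes


stream = [
    ("RELATION_META", 16384, ["id", "balance"]),
    ("INSERT", 16384, (1, "10.00")),
    ("UPDATE", 16384, (1, "15.00")),
]
decoded = decode_stream(stream)
```

Note the ordering dependency the toy makes explicit: a change record is only decodable if its relation's metadata appeared earlier in the stream, which is exactly why the first-change-after-checkpoint (and after-schema-change) emission rule matters.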
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Apr 29, 2012 at 6:00 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> So you basically need a large part of postgres for reliably making sense
>> of WAL.

> Agreed, but I think that's a problem we need to fix and not a
> tolerable situation at all. If a user can create a type-output
> function that goes and looks at the state of the database to determine
> what to output, then we are completely screwed, because that basically
> means you would need to have a whole Hot Standby instance up and
> running just to make it possible to run type output functions.

You mean like enum_out? Or for that matter array_out, record_out, range_out?

regards, tom lane
On Sun, Apr 29, 2012 at 11:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Apr 29, 2012 at 6:00 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>>> So you basically need a large part of postgres for reliably making sense
>>> of WAL.
>
>> Agreed, but I think that's a problem we need to fix and not a
>> tolerable situation at all. If a user can create a type-output
>> function that goes and looks at the state of the database to determine
>> what to output, then we are completely screwed, because that basically
>> means you would need to have a whole Hot Standby instance up and
>> running just to make it possible to run type output functions.
>
> You mean like enum_out? Or for that matter array_out, record_out,
> range_out?

Yeah, exactly. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company