Re: Replication identifiers, take 3 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Replication identifiers, take 3 |
Date | |
Msg-id | CA+TgmoZOhCT9_ENZ165=3pRKH+p=Ads8g5E3voM8sSufW_cYYQ@mail.gmail.com |
In response to | Re: Replication identifiers, take 3 (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: Replication identifiers, take 3 |
List | pgsql-hackers |
On Fri, Sep 26, 2014 at 5:05 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Let me try to summarize the information requirements for each of these
>> things. For #1, you need to know, after crash recovery, for each
>> standby, the last commit LSN which the client has confirmed via a
>> feedback message.
>
> I'm not sure I understand what you mean here? This is all happening on
> the *standby*. The standby needs to know, after crash recovery, the
> latest commit LSN from the primary that it has successfully replayed.

Ah, sorry, you're right: so, you need to know, after crash recovery,
for each machine you are replicating *from*, the last transaction (in
terms of LSN) from that server that you successfully replayed.

>> > Similarly, to solve the problem of identifying the origin of changes
>> > during decoding, the problem can be solved nicely by adding the origin
>> > node of every change to changes/transactions. At first it might seem
>> > to be sufficient to do so on transaction level, but for cascading
>> > scenarios it's very useful to be able to have changes from different
>> > source transactions combined into a larger one.
>>
>> I think this is a lot more problematic. I agree that having the data
>> in the commit record isn't sufficient here, because for filtering
>> purposes (for example) you really want to identify the problematic
>> transactions at the beginning, so you can chuck their WAL, rather than
>> reassembling the transaction first and then throwing it out. But
>> putting the origin ID in every insert/update/delete is pretty
>> unappealing from my point of view - not just because it adds bytes to
>> WAL, though that's a non-trivial concern, but also because it adds
>> complexity - IMHO, a non-trivial amount of complexity. I'd be a lot
>> happier with a solution where, say, we have a separate WAL record that
>> says "XID 1234 will be decoding for origin 567 until further notice".
>
> I think it actually ends up much simpler than what you propose. In the
> apply process, you simply execute
> SELECT pg_replication_identifier_setup_replaying_from('bdr: this-is-my-identifier');
> or its C equivalent. That sets a global variable which XLogInsert()
> includes in the record.
> Note that this doesn't actually require any additional space in the WAL
> - padding bytes in struct XLogRecord are used to store the
> identifier. These have been unused at least since 8.0.

Sure, that's simpler for logical decoding. That doesn't make it the
right decision for the system overall.

> I don't think a solution which logs the change of origin will be
> simpler. When the origin is in every record, you can filter without
> keeping track of any state. That's different if you can switch the
> origin per tx. At the very least you need an in-memory entry for the
> origin.

But again, that complexity pertains only to logical decoding.
Somebody who wants to tweak the WAL format for an UPDATE in the
future doesn't need to understand how this works, or care. You know
me: I've been a huge advocate of logical decoding. But just like
row-level security or BRIN indexes or any other feature, I think it
needs to be designed in a way that minimizes the impact it has on the
rest of the system.

I simply don't believe your contention that this isn't adding any
complexity to the code path for regular DML operations. It's entirely
possible we could need bit space in those records in the future for
something that actually pertains to those operations; if you've
burned it for logical decoding, it'll be difficult to claw it back.
And what if Tom gets around, some day, to doing that pluggable heap
AM work? Then every heap AM has got to allow for those bits, and
maybe that doesn't happen to be free for them. Admittedly, these are
hypothetical scenarios, but I don't think they're particularly
far-fetched.

And as a fringe benefit, if you do it the way that I'm proposing, you
can use an OID instead of a 16-bit thing that we picked to be 16 bits
because that happens to be 100% of the available bit-space. Yeah,
there's some complexity on decoding, but it's minimal: one more piece
of fixed-size state to track per XID. That's trivial compared to what
you've already got.

>> What's the point of the short-to-long mappings in the first place? Is
>> that only required because of the possibility that there might be
>> multiple replication solutions in play on the same node?
>
> In my original proposal, 2 years+ back, I only used short numeric
> ids. And people didn't like it because it requires coordination between
> the replication solutions and possibly between servers. Using a string
> identifier like in the above allows to easily build unique names; and
> allows every solution to add the information it needs into replication
> identifiers.

I get that, but what I'm asking is why those mappings can't be
managed on a per-replication-solution basis. I think that's just
because there's a limited namespace and so coordination is needed
between multiple replication solutions that might possibly be running
on the same system. But I want to confirm if that's actually what
you're thinking.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
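
The alternative proposed in the message, a separate WAL record announcing that an XID is decoding for a given origin "until further notice", implies one small piece of per-XID state on the decoding side. A minimal sketch of that bookkeeping follows, in Python for brevity. It is a toy model and not PostgreSQL code: the `Record` type, its `kind` values, and `filter_stream` are hypothetical names invented for illustration, and real WAL records are not structured this way.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Set


class Record(NamedTuple):
    """Toy stand-in for a WAL record (hypothetical, not PostgreSQL's format)."""
    kind: str                     # "origin", "change", or "commit"
    xid: int
    origin: Optional[int] = None  # only meaningful on "origin" records
    payload: str = ""


def filter_stream(records: Iterable[Record],
                  skip_origins: Set[int]) -> Iterator[Record]:
    """Drop records of transactions currently attributed to a skipped origin.

    Filtering happens as records arrive, without first reassembling
    the whole transaction.
    """
    # One fixed-size entry per in-progress XID: its current origin.
    current_origin: Dict[int, Optional[int]] = {}
    for rec in records:
        if rec.kind == "origin":
            # "XID n will be decoding for origin m until further notice"
            current_origin[rec.xid] = rec.origin
            continue
        # No entry means the transaction originated locally.
        origin = current_origin.get(rec.xid)
        if rec.kind == "commit":
            # The transaction is finished; forget its state.
            current_origin.pop(rec.xid, None)
        if origin in skip_origins:
            continue
        yield rec
```

Because the origin can change in the middle of a transaction, a cascading setup could combine changes from several source transactions into one local transaction and still filter them individually. The cost is the dictionary lookup per record, which is the "in-memory entry for the origin" that the quoted objection refers to.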