Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node - Mailing list pgsql-hackers
From: Christopher Browne
Subject: Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Date:
Msg-id: CAFNqd5U3XPP2uurg+sn2M2RtsxDwj36y_FDm+N_u31zQfHNLpA@mail.gmail.com
In response to: Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node (Andres Freund <andres@2ndquadrant.com>)
Responses:
  Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
  Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
List: pgsql-hackers
On Wed, Jun 20, 2012 at 11:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Wednesday, June 20, 2012 05:34:42 PM Kevin Grittner wrote:
>> Simon Riggs <simon@2ndQuadrant.com> wrote:
>> > This is not transaction metadata, it is WAL record metadata
>> > required for multi-master replication, see later point.
>> > We need to add information to every WAL record that is used as the
>> > source for generating LCRs.
>> If the origin ID of a transaction doesn't count as transaction
>> metadata (i.e., data about the transaction), what does? It may be a
>> metadata element about which you have special concerns, but it is
>> transaction metadata. You don't plan on supporting individual WAL
>> records within a transaction containing different values for origin
>> ID, do you? If not, why is it something to store in every WAL
>> record rather than once per transaction? That's not intended to be
>> a rhetorical question.
> Its definitely possible to store it per transaction (see the discussion around
> http://archives.postgresql.org/message-id/201206201605.43634.andres@2ndquadrant.com)
> it just makes the filtering via the originating node a considerably more
> complex thing. With our proposal you can do it without any complexity
> involved, on a low level. Storing it per transaction means you can only
> stream out the data to other nodes *after* fully reassembling the
> transaction. Thats a pitty, especially if we go for a design where the
> decoding happens in a proxy instance.

I guess I'm not seeing the purpose of having the origin node id in the
WAL stream either.

We have it in the Slony sl_log_* stream; however, there is a crucial
difference, in that sl_log_* is expressly a shared structure.  In
contrast, WAL isn't directly sharable; you don't mix together multiple
WAL streams.

It seems as though the point in time at which you need to know the
origin ID is the moment at which you're deciding to read data from the
WAL files, and knowing which stream you are reading from is an
assertion that might be satisfied by looking at configuration that
doesn't need to be in the WAL stream itself.  It might be *nice* for
the WAL stream to be self-identifying, but that doesn't seem to be
forcibly necessary.

The case where it *would* be needful is if you are in the process of
assembling together updates coming in from multiple masters, and need
to know:
 - This INSERT was replicated from node #1, so should be ignored downstream
 - That INSERT was replicated from node #2, so should be ignored downstream
 - This UPDATE came from the local node, so needs to be passed to
   downstream users

Or perhaps something else is behind the node id being deeply embedded
into the stream that I'm not seeing altogether.
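To make that multi-master case concrete, here is roughly the sort of
per-record filtering I have in mind if each record carries an origin id.
This is a standalone sketch only; the LCRecord struct, should_forward(),
and the node numbers are invented for illustration and are not taken from
the patch:

```c
/*
 * Sketch (hypothetical names, not the patch's API): if every change
 * record carries the node id it originated on, a consumer can decide
 * record-by-record whether to forward it, without reassembling whole
 * transactions first.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint16_t NodeId;

typedef struct LCRecord
{
    NodeId      origin_node;    /* node the change originated on */
    uint32_t    xid;            /* transaction id on the origin */
    const char *change;         /* the change itself, elided here */
} LCRecord;

/*
 * Forward a record to "target" only if it did not originate there;
 * this is the "ignore what the peer already has" rule from the
 * bullets above.
 */
static bool
should_forward(const LCRecord *rec, NodeId target)
{
    return rec->origin_node != target;
}

int
main(void)
{
    /* Mirrors the three bullets: node #1, node #2, local node (#3). */
    LCRecord    stream[] = {
        {1, 1001, "INSERT ..."},
        {2, 2001, "INSERT ..."},
        {3, 3001, "UPDATE ..."},
    };
    NodeId      target = 1;     /* we are streaming to node #1 */

    for (size_t i = 0; i < sizeof(stream) / sizeof(stream[0]); i++)
    {
        if (should_forward(&stream[i], target))
            printf("forward xid %u from node %u\n",
                   (unsigned) stream[i].xid,
                   (unsigned) stream[i].origin_node);
        else
            printf("skip xid %u: it originated on the target node\n",
                   (unsigned) stream[i].xid);
    }
    return 0;
}
```

The point being that the decision is made per record and needs nothing
beyond the origin id itself.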
> Other metadata will not be needed on such a low level.
>
> I also have to admit that I am very hesitant to start developing some generic
> "transaction metadata" framework atm. That seems to be a good way to spend a
> good part of time in discussion and disagreeing. Imo thats something for
> later.

Well, I see a use for there being at least 3 sorts of LCR records:

a) Capturing literal SQL that is to be replayed downstream.

   This parallels two use cases existing in existing replication systems:

   i) In pre-2.2 versions of Slony, statements are replayed literally,
      so there's a stream of INSERT/UPDATE/DELETE statements.

   ii) DDL capture and replay.  In existing replication systems, DDL
       isn't captured implicitly, the way Dimitri's Event Triggers are
       to do, but rather is captured explicitly.  There should be a
       function to allow injecting such SQL explicitly; that is sure to
       be a useful sort of thing to be able to do.

b) Capturing tuple updates in a binary form that can be turned readily
   into heap updates on a replica.

   Unfortunately, this form is likely not to play well when replicating
   across platforms or Postgres versions, so I suspect that this
   performance optimization should be implemented as a *last* resort,
   rather than first.  Michael Jackson had some "rules of optimization"
   that said "don't do it", and, for the expert, "don't do it YET..."

c) Capturing tuple data in some reasonably portable and readily
   re-writable form.

   Slony 2.2 changes from "SQL fragments" (of a) i) above) to storing
   updates as an array of text values indicating:
    - relation name
    - attribute names
    - attribute values, serialized into strings

   I don't know that this provably represents the *BEST* representation,
   but it definitely will be portable where b) would not be, and it
   lends itself to reusing query plans, where a) requires extraordinary
   amounts of parsing work today.  So I'm pretty sure it's better than
   a) and b) for a sizable set of cases.
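Just to illustrate the shape of c), here's a tiny standalone sketch;
PortableLCR, build_insert(), and the sample relation and columns are
made-up names for illustration, not Slony's or the patch's actual
structures.  The statement text depends only on the relation and its
column list, so one prepared plan can be reused for every row, with the
values passed as parameters:

```c
/*
 * Sketch of the "portable" tuple representation from (c): relation
 * name, attribute names, and attribute values all kept as text, with
 * the values supplied as $n parameters rather than spliced into SQL.
 */
#include <stdio.h>

typedef struct PortableLCR
{
    const char  *relname;       /* target relation */
    int          natts;         /* number of attributes captured */
    const char **attnames;      /* attribute names */
    const char **attvalues;     /* attribute values, already as text */
} PortableLCR;

/* Build "INSERT INTO rel (a, b) VALUES ($1, $2)" into buf. */
static void
build_insert(const PortableLCR *lcr, char *buf, size_t buflen)
{
    size_t      off = 0;

    off += snprintf(buf + off, buflen - off, "INSERT INTO %s (",
                    lcr->relname);
    for (int i = 0; i < lcr->natts; i++)
        off += snprintf(buf + off, buflen - off, "%s%s",
                        i ? ", " : "", lcr->attnames[i]);
    off += snprintf(buf + off, buflen - off, ") VALUES (");
    for (int i = 0; i < lcr->natts; i++)
        off += snprintf(buf + off, buflen - off, "%s$%d",
                        i ? ", " : "", i + 1);
    snprintf(buf + off, buflen - off, ")");
}

int
main(void)
{
    const char *names[]  = {"id", "name"};
    const char *values[] = {"42", "fred"};
    PortableLCR lcr = {"public.accounts", 2, names, values};
    char        sql[256];

    build_insert(&lcr, sql, sizeof(sql));
    printf("%s\n", sql);        /* the $n parameters get lcr.attvalues */
    for (int i = 0; i < lcr.natts; i++)
        printf("  $%d = %s\n", i + 1, lcr.attvalues[i]);
    return 0;
}
```

Obviously the real thing needs to handle NULLs, quoting, and
UPDATE/DELETE; this just shows why the representation is friendly to
plan reuse in a way that raw SQL fragments are not.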
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"