
From Christopher Browne
Subject Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Date
Msg-id CAFNqd5U3XPP2uurg+sn2M2RtsxDwj36y_FDm+N_u31zQfHNLpA@mail.gmail.com
In response to Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Wed, Jun 20, 2012 at 11:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Wednesday, June 20, 2012 05:34:42 PM Kevin Grittner wrote:
>> Simon Riggs <simon@2ndQuadrant.com> wrote:
>> > This is not transaction metadata, it is WAL record metadata
>> > required for multi-master replication, see later point.
>
>> > We need to add information to every WAL record that is used as the
>> > source for generating LCRs.
>> If the origin ID of a transaction doesn't count as transaction
>> metadata (i.e., data about the transaction), what does?  It may be a
>> metadata element about which you have special concerns, but it is
>> transaction metadata.  You don't plan on supporting individual WAL
>> records within a transaction containing different values for origin
>> ID, do you?  If not, why is it something to store in every WAL
>> record rather than once per transaction?  That's not intended to be
>> a rhetorical question.
> It's definitely possible to store it per transaction (see the discussion around
> http://archives.postgresql.org/message-id/201206201605.43634.andres@2ndquadrant.com);
> it just makes the filtering via the originating node a considerably more complex
> thing. With our proposal you can do it without any complexity involved, on a low
> level. Storing it per transaction means you can only stream out the data to other
> nodes *after* fully reassembling the transaction. That's a pity, especially if we
> go for a design where the decoding happens in a proxy instance.
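
To restate the trade-off Andres describes in code form (a sketch only; the
struct and field names below are invented for illustration, not taken from
the patch):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical shape of a decoded WAL record. */
    typedef struct SketchRecord
    {
        uint32_t    xid;         /* owning transaction */
        uint16_t    origin_id;   /* on every record, or only on the commit */
        bool        is_commit;
    } SketchRecord;

    typedef enum
    {
        KEEP,            /* stream the change out */
        DROP,            /* filter it away */
        BUFFER_BY_XID    /* cannot decide yet; hold until the commit arrives */
    } SketchAction;

    /* Origin on every record: the filter runs as each record is read. */
    SketchAction
    filter_per_record(const SketchRecord *rec, uint16_t wanted_origin)
    {
        return (rec->origin_id == wanted_origin) ? KEEP : DROP;
    }

    /*
     * Origin stored once per transaction (say, on the commit record):
     * every earlier record must be buffered, keyed by xid, and the whole
     * transaction streamed or discarded only once the commit is seen --
     * i.e. the transaction has to be reassembled before it can be
     * filtered.
     */
    SketchAction
    filter_per_xact(const SketchRecord *rec, uint16_t wanted_origin)
    {
        if (!rec->is_commit)
            return BUFFER_BY_XID;
        return (rec->origin_id == wanted_origin) ? KEEP : DROP;
    }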

I guess I'm not seeing the purpose of having the origin node id in the
WAL stream either.

We have it in the Slony sl_log_* stream; however, there is a crucial
difference, in that sl_log_* is expressly a shared structure.  In
contrast, WAL isn't directly sharable; you don't mix together multiple
WAL streams.

It seems as though the point at which you need to know the origin ID
is the moment at which you decide to read data from the WAL files, and
knowing which stream you are reading from is something that could be
satisfied by looking at configuration that doesn't need to be in the
WAL stream itself.  It might be *nice* for the WAL stream to be
self-identifying, but that doesn't seem to be strictly necessary.

The case where it *would* be needful is if you are in the process of
assembling together updates coming in from multiple masters, and need
to know:
  - This INSERT was replicated from node #1, so should be ignored downstream
  - That INSERT was replicated from node #2, so should be ignored downstream
  - This UPDATE came from the local node, so needs to be passed to
    downstream users
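
A minimal sketch of that decision (again, names invented here; node ids
shown as plain integers):

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Should a change be passed on to downstream consumers?  Changes
     * that were themselves replicated in from node #1 or node #2 carry
     * a foreign origin and are suppressed; changes that originated on
     * the local node are forwarded.  Purely illustrative.
     */
    bool
    forward_to_downstream(uint16_t change_origin, uint16_t local_node)
    {
        return change_origin == local_node;
    }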

Or perhaps something else is behind the node id being deeply embedded
into the stream that I'm not seeing altogether.

> Other metadata will not be needed on such a low level.
>
> I also have to admit that I am very hesitant to start developing some generic
> "transaction metadata" framework atm. That seems to be a good way to spend a
> good part of the time in discussion and disagreement. Imo that's something for
> later.

Well, I see there being a use in having at least 3 sorts of LCR records:

a) Capturing literal SQL that is to be replayed downstream.  This
parallels two use cases in existing replication systems:
  i) In pre-2.2 versions of Slony, statements are replayed literally,
     so there's a stream of INSERT/UPDATE/DELETE statements.
  ii) DDL capture and replay.  In existing replication systems, DDL
      isn't captured implicitly, the way Dimitri's Event Triggers are
      intended to do, but rather is captured explicitly.  There should
      be a function to allow injecting such SQL explicitly; that is
      sure to be a useful sort of thing to be able to do.

b) Capturing tuple updates in a binary form that can be turned readily
into heap updates on a replica.  Unfortunately, this form is likely
not to play well when replicating across platforms or Postgres
versions, so I suspect that this performance optimization should be
implemented as a *last* resort, rather than first.  Michael Jackson
had some "rules of optimization" that said "don't do it", and, for the
expert, "don't do it YET..."

c) Capturing tuple data in some reasonably portable and readily
re-writable form.  Slony 2.2 changes from "SQL fragments" (of a) i)
above) to storing updates as an array of text values indicating:
  - relation name
  - attribute names
  - attribute values, serialized into strings

I don't know that this provably represents the *BEST* representation,
but it definitely will be portable where b) would not be, and lends
itself to being able to reuse query plans, where a) requires
extraordinary amounts of parsing work, today.  So I'm pretty sure it's
better than a) and b) for a sizable set of cases.
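
For what it's worth, here is a rough sketch of how a) and c) might sit in
one LCR container, and why c) lends itself to plan reuse.  All of the
names here are hypothetical, and b), the binary heap-tuple image, is left
out as the least portable option:

    #include <stdint.h>

    /* Illustrative only; none of these names come from the patch or Slony. */
    typedef enum SketchLCRKind
    {
        SKETCH_LCR_SQL_STATEMENT,  /* a): literal SQL / explicitly injected DDL */
        SKETCH_LCR_TEXT_TUPLE      /* c): relation + attribute names + values */
    } SketchLCRKind;

    typedef struct SketchLCR
    {
        SketchLCRKind kind;
        uint16_t    origin_node;

        /* a): statement replayed verbatim on the replica */
        const char *sql;

        /* c): the Slony-2.2-style "array of text" representation */
        const char *relation;      /* qualified relation name */
        int         natts;
        const char **att_names;    /* attribute names */
        const char **att_values;   /* values serialized as strings; NULL allowed */
    } SketchLCR;

    /*
     * The text-tuple form maps directly onto a parameterized statement,
     * e.g.  INSERT INTO <relation> (<att_names...>) VALUES ($1, $2, ...),
     * so the apply side can prepare one statement per relation and reuse
     * the plan, binding att_values as parameters.  The literal-SQL form
     * a) has to be parsed and planned from scratch every time.
     */
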
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

