Re: [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached) - Mailing list pgsql-hackers

From Christopher Browne
Subject Re: [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)
Date
Msg-id CAFNqd5VjKzNcceHnL=1NJJB818kSb7YO7vmi9hCBU99_btn0Mw@mail.gmail.com
In response to Re: [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Mon, Oct 15, 2012 at 4:51 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Monday, October 15, 2012 10:08:28 PM Christopher Browne wrote:
>> On Mon, Oct 15, 2012 at 3:18 PM, Peter Geoghegan <peter@2ndquadrant.com>
> wrote:
>> > On 15 October 2012 19:19, Bruce Momjian <bruce@momjian.us> wrote:
>> >> I think Robert is right that if Slony can't use the API, it is unlikely
>> >> any other replication system could use it.
>> >
>> > I don't accept that. Clearly there is a circular dependency, and
>> > someone has to go first - why should the Slony guys invest in adopting
>> > this technology if it is going to necessitate using a forked Postgres
>> > with an uncertain future? That would be (with respect to the Slony
>> > guys) a commercial risk that is fairly heavily concentrated with
>> > Afilias.
>>
>> Yep, there's something a bit too circular there.
>>
>> I'd also not be keen on reimplementing the "Slony integration" over
>> and over if it turns out that the API churns for a while before
>> stabilizing.  That shouldn't be misread as "I expect horrible amounts
>> of churn", just that *any* churn comes at a cost.  And if anything
>> unfortunate happens, that can easily multiply into a multiplicity of
>> painfulness(es?).
>
> Well, as a crosscheck, could you list your requirements?
>
> Do you need anything more than outputting data in a format compatible to whats
> stored in sl_log_*? You wouldn't have sl_actionseq, everything else should be
> there (Well, you would need to do lookups to get the tableid, but thats not
> really much of a problem). The results would be ordered in complete
> transactions, in commit order.

Hmm.  We need to have log data that's in a compatible ordering.

We use sl_actionseq, and can mix data from multiple transactions
together; if what you're providing is, instead, in order based on
transaction commit order followed by some sequencing within each
transaction, then that should be acceptable.
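To make the ordering point concrete, here is a minimal sketch (not any real decoding API; the `(xid, changes)` input shape is an assumption) of flattening a commit-ordered stream of transactions into a single Slony-style action sequence:

```python
# Hedged sketch: a commit-ordered stream, with each transaction's changes
# kept in their internal order, can be numbered into one total order that
# plays the role sl_log_*.log_actionseq plays today.
from itertools import count

def number_changes(committed_txns):
    """committed_txns: iterable of (xid, changes) pairs in commit order,
    where 'changes' preserves each transaction's internal ordering.
    Yields (actionseq, xid, change) tuples in a total order consistent
    with commit order plus intra-transaction order."""
    seq = count(1)
    for xid, changes in committed_txns:
        for ch in changes:
            yield next(seq), xid, ch
```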

The stylized query on sl_log_* looks like...

    select log_origin, log_txid, log_tableid,
           log_actionseq, log_tablenspname,
           log_tablerelname, log_cmdtype,
           log_cmdupdncols, log_cmdargs
      from %s.sl_log_%d
     where log_origin = %d
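The query uses printf-style placeholders; as a small illustration (the cluster name, log table number, and origin node id below are made-up example values, not from Slony itself), filling them in looks like:

```python
# Hypothetical sketch of building the stylized sl_log_* query; the
# placeholder values are illustrative assumptions.
QUERY = (
    "select log_origin, log_txid, log_tableid, "
    "log_actionseq, log_tablenspname, "
    "log_tablerelname, log_cmdtype, "
    "log_cmdupdncols, log_cmdargs "
    "from %s.sl_log_%d "
    "where log_origin = %d"
)

def build_query(cluster_schema, log_table_no, origin_node):
    """cluster_schema: e.g. '_main'; log_table_no: 1 or 2;
    origin_node: node id whose changes we are pulling."""
    return QUERY % (cluster_schema, log_table_no, origin_node)
```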

How about I "quibble" about each of these:

a) log_origin - this indicates the node from which the data
originates.  Presumably, this is implicit in a "chunk" of data that is
coming in.

b) log_txid - indicating the transaction ID.  I presume you've got
this available.  It's less important with the WAL-based scheme in that
we'd probably not be using it as a basis for querying as is the case
today with Slony.

c) log_tableid - indicating the ID of the table.  Are you capturing an
OID equivalent to this?  Or what?

d) log_actionseq - indicating relative sequences of updates.  You
don't have this, but if you're capturing commit ordering, we don't
need it.

e) log_tablenspname, log_tablerelname - some small amount of magic
needful to get this.  Or perhaps you are already capturing it?

f) log_cmdtype - I/U/D/T - indicating the action
(insert/update/delete/truncate).  Hopefully you have something like
this?

g) log_cmdupdncols - for UPDATE action, the number of updated columns.
Probably not mandatory; this was a new 2.1 thing...

h) log_cmdargs - the actual data needed to do the I/U/D.  The form of
this matters a fair bit.  Before Slony 2.1, this was a portion of a
SQL statement, omitting the operation (provided in log_cmdtype) and
the table name (in log_tablerelname et al).  In Slony 2.1, this
changes to be a text[] array that essentially consists of pairs of
[column name, column value] values.
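For illustration, the 2.1-style text[] shape described above amounts to flattening column/value pairs into one array (the function and input layout here are assumptions for the sketch, not Slony code):

```python
# Hedged sketch of the Slony 2.1 log_cmdargs representation: a flat text
# array of alternating [column name, column value] entries.
def to_cmdargs(columns):
    """columns: dict of column name -> value, with values already
    rendered as text.  Returns the flat pair array."""
    args = []
    for name, value in columns.items():
        args.append(name)
        args.append(value)
    return args

# to_cmdargs({"id": "1", "name": "widget"})
#   -> ["id", "1", "name", "widget"]
```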

I see one place, very notable in Slony 2.2, that would also be more
complicated, which is the handling of DDL.

In 2.1 and earlier, we handled DDL as "events", essentially out of
band.  This isn't actually correct; it could mix very badly if you had
replication activity mixing with DDL requests.  (More detail than you
want is in a bug on this...
<http://www.slony.info/bugzilla/show_bug.cgi?id=137>).

In Slony 2.2, we added a third "log table" where DDL gets captured.
sl_log_script has much the same schema as sl_log_{1,2}; it needs to
get "mixed in" in compatible order.  What I imagine would pointedly
complicate life is if a single transaction contained both DDL and
"regular replicable activity."  Slony 2.2 mixes this in using XID +
log_actionseq; how this would play out with your log capture mechanism
isn't completely clear to me.  That's the place where I'd expect the
very messiest interaction.
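The XID + log_actionseq mixing described above is, in the abstract, a merge of two pre-sorted streams on that compound key; a minimal sketch (the `(xid, actionseq, payload)` tuple layout is an assumption for illustration):

```python
# Hedged sketch: interleaving sl_log_script-style DDL rows back into the
# regular replication stream by (xid, actionseq), assuming each input is
# already sorted on that key.
from heapq import merge

def interleave(log_rows, script_rows):
    """Each input: iterable of (xid, actionseq, payload) tuples sorted by
    (xid, actionseq).  Returns one combined list in that same order."""
    key = lambda r: (r[0], r[1])
    return list(merge(log_rows, script_rows, key=key))
```

The open question in the thread is whether a WAL-based capture mechanism exposes anything playing the role of that shared key when DDL and ordinary changes occur in the same transaction.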

> I guess the other tables would stay as they are as they contain the "added
> value" of slony?

A fair bit of Slony is about the "event infrastructure," and some of
that ceases to be as needful.  The configuration bits probably
continue to remain interesting.

The parts that seem notably mysterious to me at the moment are:

a) How do we pull result sets (e.g. - sl_log_* data)?

b) How is the command data represented?

c) If we have a need to mix together your 'raw logs' and other
material (e.g. - our sl_log_script that captures DDL-like changes to
be mixed back in), how easy|impossible is this?
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


