Thread: PATCH: 9.5 replication origins fix for logical decoding

PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
Hi all

There's an oversight in replication origins support in logical
decoding, where the origin node ID isn't passed correctly to callbacks
except for the origin filter callback. All other callbacks see it as
InvalidRepOriginId.

It's a one-line fix, but I've added support in test_decoding to
validate the fix and expanded the test script to cover it.

Should be applied to 9.5 and 9.6.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: PATCH: 9.5 replication origins fix for logical decoding

From
Andres Freund
Date:
On 2015-10-15 16:02:23 +0800, Craig Ringer wrote:
> There's an oversight in replication origins support in logical
> decoding, where the origin node ID isn't passed correctly to callbacks
> except for the origin filter callback. All other callbacks see it as
> InvalidRepOriginId.

Only for the transaction, right? I.e. the stuff on changes should be
correct? Your description sounds like it's more than that?

I don't think your fix is entirely correct, the
XLogRecGetOrigin(buf->record) shouldn't be in the XACT_XINFO_HAS_ORIGIN
block.

Your test prints the origins from the transaction instead the changes -
why?

Greetings,

Andres Freund



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
On 15 October 2015 at 16:48, Andres Freund <andres@anarazel.de> wrote:
> On 2015-10-15 16:02:23 +0800, Craig Ringer wrote:
>> There's an oversight in replication origins support in logical
>> decoding, where the origin node ID isn't passed correctly to callbacks
>> except for the origin filter callback. All other callbacks see it as
>> InvalidRepOriginId.
>
> Only for the transaction, right? I.e. the stuff on changes should be
> correct? Your description sounds like it's more than that?
>
> I don't think your fix is entirely correct, the
> XLogRecGetOrigin(buf->record) shouldn't be in the XACT_XINFO_HAS_ORIGIN
> block.

I guess since it'll be InvalidRepOriginId otherwise, that makes sense.

> Your test prints the origins from the transaction instead the changes -
> why?

I don't understand this part.

Do you mean:

+ BEGIN (from "test_decoding: regression_slot" at lsn 0/AABBCCFF)

?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Andres Freund
Date:
On 2015-10-15 17:05:33 +0800, Craig Ringer wrote:
> On 15 October 2015 at 16:48, Andres Freund <andres@anarazel.de> wrote:
> > On 2015-10-15 16:02:23 +0800, Craig Ringer wrote:
> >> There's an oversight in replication origins support in logical
> >> decoding, where the origin node ID isn't passed correctly to callbacks
> >> except for the origin filter callback. All other callbacks see it as
> >> InvalidRepOriginId.
> >
> > Only for the transaction, right? I.e. the stuff on changes should be
> > correct? Your description sounds like it's more than that?
> >
> > I don't think your fix is entirely correct, the
> > XLogRecGetOrigin(buf->record) shouldn't be in the XACT_XINFO_HAS_ORIGIN
> > block.
> 
> I guess since it'll be InvalidRepOriginId otherwise, that makes sense.

That's not the point. XACT_XINFO_HAS_ORIGIN is about
origin_lsn/timestamp, it doesn't have anything to do with the record
origin (which is included in many more types of record than just
commits).

> > Your test prints the origins from the transaction instead the changes -
> > why?
> 
> I don't understand this part.

Your test prints origin in commits - but changes can have individual
origins.

Greetings,

Andres Freund



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
On 15 October 2015 at 17:43, Andres Freund <andres@anarazel.de> wrote:

>> I guess since it'll be InvalidRepOriginId otherwise, that makes sense.
>
> That's not the point. XACT_XINFO_HAS_ORIGIN is about
> origin_lsn/timestamp, it doesn't have anything to do with the record
> origin (which is included in many more types of record than just
> commits).

Ok, I think I see. That's also why it wasn't incorporated into
xl_xact_parsed_commit.

I'll check which records can contain it and assign it in the
appropriate decoding calls. I'll follow up in a while with an updated
patch.

>> > Your test prints the origins from the transaction instead the changes -
>> > why?
>>
>> I don't understand this part.
>
> Your test prints origin in commits - but changes can have individual
> origins.

I was completely unaware of that, so thankyou. I'll change the tests
to exercise that. Any preferences on the output format?

Maybe:

table public.origin_tbl: INSERT: id[integer]:6 data[text]:'from second
origin' -- origin:'some_origin' origin_lsn:'0/1234'

?

it's cluttered, but really I'm not sure there's a pretty way to pack
that in, and it's only test output.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Andres Freund
Date:
On October 15, 2015 1:02:04 PM GMT+02:00, Craig Ringer <craig@2ndquadrant.com> wrote:
>On 15 October 2015 at 17:43, Andres Freund <andres@anarazel.de> wrote:
>
>>> I guess since it'll be InvalidRepOriginId otherwise, that makes
>sense.
>>
>> That's not the point. XACT_XINFO_HAS_ORIGIN is about
>> origin_lsn/timestamp, it doesn't have anything to do with the record
>> origin (which is included in many more types of record than just
>> commits).
>
>Ok, I think I see. That's also why it wasn't incorporated into
>xl_xact_parsed_commit.
>
>I'll check which records can contain it and assign it in the
>appropriate decoding calls. I'll follow up in a while with an updated
>patch.

As far as I can see all the other places have it assigned.

>>> > Your test prints the origins from the transaction instead the
>changes -
>>> > why?
>>>
>>> I don't understand this part.
>>
>> Your test prints origin in commits - but changes can have individual
>> origins.
>
>I was completely unaware of that, so thankyou. I'll change the tests
>to exercise that. Any preferences on the output format?
>
>Maybe:
>
>table public.origin_tbl: INSERT: id[integer]:6 data[text]:'from second
>origin' -- origin:'some_origin' origin_lsn:'0/1234'
>
>?
>
>it's cluttered, but really I'm not sure there's a pretty way to pack
>that in, and it's only test output.

I'm inclined not to commit this part - seems to add too much complications for the amount of coverage. But please use
itfor testing.
 

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
On 15 October 2015 at 19:04, Andres Freund <andres@anarazel.de> wrote:

> As far as I can see all the other places have it assigned.

Ok, thanks. Not much need for a followup patch then, if you're not
using the test changes part.

>>table public.origin_tbl: INSERT: id[integer]:6 data[text]:'from second
>>origin' -- origin:'some_origin' origin_lsn:'0/1234'
>>
>>?
>>
>>it's cluttered, but really I'm not sure there's a pretty way to pack
>>that in, and it's only test output.
>
> I'm inclined not to commit this part - seems to add too much complications for the amount of coverage. But please use
itfor testing.
 

It doesn't seem like this will be particularly vulnerable to
regressions or have new record types added that need a check for them.
I'd be inclined to add the info, but I have a higher tolerance for
verbosity than you ;)

I think it's worth adding a test for the change of origin mid-tx. I
had no idea that was even possible.

Testing forwarding of empty tx's is simple and should probably be there too.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
On 15 October 2015 at 20:11, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 15 October 2015 at 19:04, Andres Freund <andres@anarazel.de> wrote:
>
>> As far as I can see all the other places have it assigned.
>
> Ok, thanks. Not much need for a followup patch then, if you're not
> using the test changes part.

Here's what I used for my tests, anyway, including an updated fix.

You'll note that the tests fail. When the replication origin is reset
and set again with pg_replication_origin_xact_setup mid-xact, the
origin identity exposed to the decoding plugin callbacks for all
records (including those created before the origin change) is the
latter origin, the one active at COMMIT time.

Is that the intended behaviour? That the session identifier is
determined by what was active at commit time, and only the lsn and
timestamp vary throughout the xact? It looks like it from the code.

Should pg_replication_origin_xact_reset() and
pg_replication_origin_xact_setup() be permitted within a transaction?
Or is this just a "well, don't do that"?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: PATCH: 9.5 replication origins fix for logical decoding

From
Andres Freund
Date:
On 2015-10-15 20:52:41 +0800, Craig Ringer wrote:
> You'll note that the tests fail. When the replication origin is reset
> and set again with pg_replication_origin_xact_setup mid-xact, the
> origin identity exposed to the decoding plugin callbacks for all
> records (including those created before the origin change) is the
> latter origin, the one active at COMMIT time.
> 
> Is that the intended behaviour? That the session identifier is
> determined by what was active at commit time, and only the lsn and
> timestamp vary throughout the xact? It looks like it from the code.

Uh. Isn't that just because you looked at txn->origin_id instead of the
change's origin_id?


Andres



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
On 15 October 2015 at 20:55, Andres Freund <andres@anarazel.de> wrote:
> On 2015-10-15 20:52:41 +0800, Craig Ringer wrote:
>> You'll note that the tests fail. When the replication origin is reset
>> and set again with pg_replication_origin_xact_setup mid-xact, the
>> origin identity exposed to the decoding plugin callbacks for all
>> records (including those created before the origin change) is the
>> latter origin, the one active at COMMIT time.
>>
>> Is that the intended behaviour? That the session identifier is
>> determined by what was active at commit time, and only the lsn and
>> timestamp vary throughout the xact? It looks like it from the code.
>
> Uh. Isn't that just because you looked at txn->origin_id instead of the
> change's origin_id?

Yes, it is. I didn't realise that the individual changes had their own
origins, rather than changing the origin in the txn, though I can see
that now that I know to look.

Either I'm confused (likely) or the concept behind allowing this is
critically flawed.

Say some client code does

set session origin=1
begin
set xact lsn=0/123, ts=13:00
do some inserts
set session origin=2
set xact lsn=0/199, ts=14:00
do some more inserts
commit

it seems to be decoded as:

begin origin=2 lsn=0/199
inserts origin=1 lsn=0/199
more inserts origin=2 lsn=0/199
commit origin=2 lsn=0/199

i.e. the begin and commit have the final session origin. Individual
changes have the session origin in effect at the time the change was
created. The last-set origin commit timestamp and origin lsn override
all prior ones; they aren't recorded per-change, only on the commit.

This means you have change records with a change->origin_id that's
from a completely different node, which makes no sense at all with the
txn->origin_lsn . It matches the txn->origin_id, which is the same
throughout, but then why even have the change->origin_id?

I find the idea of each change having its own origin node - but not
its own origin LSN - very confusing. For one thing the origin filter
callback can't know about that, and can only filter based on the txn's
origin. I guess that's the output plugin's problem - if it wants to
cope with arbitrary mixed-origin tx's it can't use the origin filter
and has to check each message.

I really don't see how it makes any sense to allow the origin_id to
change mid-tx. I can see how sending the origin_id for each change
could make sense to allow future support for transaction streaming
where decoding starts before we receive the commit record, but
changing the origin_id within the tx doesn't make any sense.

IMO changing the origin should be disallowed within a tx. Otherwise
there needs to be some way to record the origin lsn and commit
timestamp changing within the tx too.

I was going to just send a patch to disallow changing the origin
mid-tx, but I'm not sure I see a good way to do that since the origin
is a session-level global, not part of the xact info.

Document it as a "don't do that, if you do it you get to keep the pieces"?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
On 16 October 2015 at 11:51, Craig Ringer <craig@2ndquadrant.com> wrote:
> Document it as a "don't do that, if you do it you get to keep the pieces"?

Thinking about this some more, having per-change origins makes sense
when you're not using origin LSNs, such as when you're not replaying
from another PostgreSQL instance. So I _can_ see why it exists.

I guess this is mostly a matter of adding some comments and/or some
notes in the functions' docs to explain how it all fits together -
that origins can be per-change, that the txn origin is the origin that
was in effect at commit time, and that the lsn and commit timestamp
are always those that were set at commit time, so you cannot use a
per-change origin with the txn's lsn and expect it to make sense.

Reasonable?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
Hi all

Patch revision 3 attached. It's a one-liner, with just the fix, and an
explanatory note in the patch header.

I'll leave the test changes for now, per Andres's feedback.

I'll write up a proposed addition to the replication origin docs that
explains the existence of separate origins for each change and for the
tx as a whole, and explains how replication origins interacts with
transaction commit timestamps. That'll come as a followup; I think
it's worth getting the bug fix in now.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
On 19 October 2015 at 21:43, Craig Ringer <craig@2ndquadrant.com> wrote:
> Hi all
>
> Patch revision 3 attached. It's a one-liner, with just the fix, and an
> explanatory note in the patch header.

I'm bumping this because I think it's important not to miss it for
9.5, so it can't wait for the commitfest.

It's just the one-liner with the fix its self.


-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Andres Freund
Date:
Hi,

On 2015-10-19 21:43:32 +0800, Craig Ringer wrote:
> Patch revision 3 attached. It's a one-liner, with just the fix, and an
> explanatory note in the patch header.

Pushed to 9.5 and master.

Thanks for noticing the issue,

Andres Freund



Re: PATCH: 9.5 replication origins fix for logical decoding

From
Craig Ringer
Date:
On 9 November 2015 at 07:04, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2015-10-19 21:43:32 +0800, Craig Ringer wrote:
>> Patch revision 3 attached. It's a one-liner, with just the fix, and an
>> explanatory note in the patch header.
>
> Pushed to 9.5 and master.
>
> Thanks for noticing the issue,

Thanks for taking the time to sanity-check and apply the fix.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services