Thread: PATCH: 9.5 replication origins fix for logical decoding
Hi all There's an oversight in replication origins support in logical decoding, where the origin node ID isn't passed correctly to callbacks except for the origin filter callback. All other callbacks see it as InvalidRepOriginId. It's a one-line fix, but I've added support in test_decoding to validate the fix and expanded the test script to cover it. Should be applied to 9.5 and 9.6. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2015-10-15 16:02:23 +0800, Craig Ringer wrote: > There's an oversight in replication origins support in logical > decoding, where the origin node ID isn't passed correctly to callbacks > except for the origin filter callback. All other callbacks see it as > InvalidRepOriginId. Only for the transaction, right? I.e. the stuff on changes should be correct? Your description sounds like it's more than that? I don't think your fix is entirely correct, the XLogRecGetOrigin(buf->record) shouldn't be in the XACT_XINFO_HAS_ORIGIN block. Your test prints the origins from the transaction instead the changes - why? Greetings, Andres Freund
On 15 October 2015 at 16:48, Andres Freund <andres@anarazel.de> wrote: > On 2015-10-15 16:02:23 +0800, Craig Ringer wrote: >> There's an oversight in replication origins support in logical >> decoding, where the origin node ID isn't passed correctly to callbacks >> except for the origin filter callback. All other callbacks see it as >> InvalidRepOriginId. > > Only for the transaction, right? I.e. the stuff on changes should be > correct? Your description sounds like it's more than that? > > I don't think your fix is entirely correct, the > XLogRecGetOrigin(buf->record) shouldn't be in the XACT_XINFO_HAS_ORIGIN > block. I guess since it'll be InvalidRepOriginId otherwise, that makes sense. > Your test prints the origins from the transaction instead the changes - > why? I don't understand this part. Do you mean: + BEGIN (from "test_decoding: regression_slot" at lsn 0/AABBCCFF) ? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2015-10-15 17:05:33 +0800, Craig Ringer wrote: > On 15 October 2015 at 16:48, Andres Freund <andres@anarazel.de> wrote: > > On 2015-10-15 16:02:23 +0800, Craig Ringer wrote: > >> There's an oversight in replication origins support in logical > >> decoding, where the origin node ID isn't passed correctly to callbacks > >> except for the origin filter callback. All other callbacks see it as > >> InvalidRepOriginId. > > > > Only for the transaction, right? I.e. the stuff on changes should be > > correct? Your description sounds like it's more than that? > > > > I don't think your fix is entirely correct, the > > XLogRecGetOrigin(buf->record) shouldn't be in the XACT_XINFO_HAS_ORIGIN > > block. > > I guess since it'll be InvalidRepOriginId otherwise, that makes sense. That's not the point. XACT_XINFO_HAS_ORIGIN is about origin_lsn/timestamp, it doesn't have anything to do with the record origin (which is included in many more types of record than just commits). > > Your test prints the origins from the transaction instead the changes - > > why? > > I don't understand this part. Your test prints origin in commits - but changes can have individual origins. Greetings, Andres Freund
On 15 October 2015 at 17:43, Andres Freund <andres@anarazel.de> wrote: >> I guess since it'll be InvalidRepOriginId otherwise, that makes sense. > > That's not the point. XACT_XINFO_HAS_ORIGIN is about > origin_lsn/timestamp, it doesn't have anything to do with the record > origin (which is included in many more types of record than just > commits). Ok, I think I see. That's also why it wasn't incorporated into xl_xact_parsed_commit. I'll check which records can contain it and assign it in the appropriate decoding calls. I'll follow up in a while with an updated patch. >> > Your test prints the origins from the transaction instead the changes - >> > why? >> >> I don't understand this part. > > Your test prints origin in commits - but changes can have individual > origins. I was completely unaware of that, so thankyou. I'll change the tests to exercise that. Any preferences on the output format? Maybe: table public.origin_tbl: INSERT: id[integer]:6 data[text]:'from second origin' -- origin:'some_origin' origin_lsn:'0/1234' ? it's cluttered, but really I'm not sure there's a pretty way to pack that in, and it's only test output. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On October 15, 2015 1:02:04 PM GMT+02:00, Craig Ringer <craig@2ndquadrant.com> wrote: >On 15 October 2015 at 17:43, Andres Freund <andres@anarazel.de> wrote: > >>> I guess since it'll be InvalidRepOriginId otherwise, that makes >sense. >> >> That's not the point. XACT_XINFO_HAS_ORIGIN is about >> origin_lsn/timestamp, it doesn't have anything to do with the record >> origin (which is included in many more types of record than just >> commits). > >Ok, I think I see. That's also why it wasn't incorporated into >xl_xact_parsed_commit. > >I'll check which records can contain it and assign it in the >appropriate decoding calls. I'll follow up in a while with an updated >patch. As far as I can see all the other places have it assigned. >>> > Your test prints the origins from the transaction instead the >changes - >>> > why? >>> >>> I don't understand this part. >> >> Your test prints origin in commits - but changes can have individual >> origins. > >I was completely unaware of that, so thankyou. I'll change the tests >to exercise that. Any preferences on the output format? > >Maybe: > >table public.origin_tbl: INSERT: id[integer]:6 data[text]:'from second >origin' -- origin:'some_origin' origin_lsn:'0/1234' > >? > >it's cluttered, but really I'm not sure there's a pretty way to pack >that in, and it's only test output. I'm inclined not to commit this part - seems to add too much complications for the amount of coverage. But please use itfor testing. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone.
On 15 October 2015 at 19:04, Andres Freund <andres@anarazel.de> wrote: > As far as I can see all the other places have it assigned. Ok, thanks. Not much need for a followup patch then, if you're not using the test changes part. >>table public.origin_tbl: INSERT: id[integer]:6 data[text]:'from second >>origin' -- origin:'some_origin' origin_lsn:'0/1234' >> >>? >> >>it's cluttered, but really I'm not sure there's a pretty way to pack >>that in, and it's only test output. > > I'm inclined not to commit this part - seems to add too much complications for the amount of coverage. But please use itfor testing. It doesn't seem like this will be particularly vulnerable to regressions or have new record types added that need a check for them. I'd be inclined to add the info, but I have a higher tolerance for verbosity than you ;) I think it's worth adding a test for the change of origin mid-tx. I had no idea that was even possible. Testing forwarding of empty tx's is simple and should probably be there too. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 15 October 2015 at 20:11, Craig Ringer <craig@2ndquadrant.com> wrote: > On 15 October 2015 at 19:04, Andres Freund <andres@anarazel.de> wrote: > >> As far as I can see all the other places have it assigned. > > Ok, thanks. Not much need for a followup patch then, if you're not > using the test changes part. Here's what I used for my tests, anyway, including an updated fix. You'll note that the tests fail. When the replication origin is reset and set again with pg_replication_origin_xact_setup mid-xact, the origin identity exposed to the decoding plugin callbacks for all records (including those created before the origin change) is the latter origin, the one active at COMMIT time. Is that the intended behaviour? That the session identifier is determined by what was active at commit time, and only the lsn and timestamp vary throughout the xact? It looks like it from the code. Should pg_replication_origin_xact_reset() and pg_replication_origin_xact_setup() be permitted within a transaction? Or is this just a "well, don't do that"? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2015-10-15 20:52:41 +0800, Craig Ringer wrote: > You'll note that the tests fail. When the replication origin is reset > and set again with pg_replication_origin_xact_setup mid-xact, the > origin identity exposed to the decoding plugin callbacks for all > records (including those created before the origin change) is the > latter origin, the one active at COMMIT time. > > Is that the intended behaviour? That the session identifier is > determined by what was active at commit time, and only the lsn and > timestamp vary throughout the xact? It looks like it from the code. Uh. Isn't that just because you looked at txn->origin_id instead of the change's origin_id? Andres
On 15 October 2015 at 20:55, Andres Freund <andres@anarazel.de> wrote: > On 2015-10-15 20:52:41 +0800, Craig Ringer wrote: >> You'll note that the tests fail. When the replication origin is reset >> and set again with pg_replication_origin_xact_setup mid-xact, the >> origin identity exposed to the decoding plugin callbacks for all >> records (including those created before the origin change) is the >> latter origin, the one active at COMMIT time. >> >> Is that the intended behaviour? That the session identifier is >> determined by what was active at commit time, and only the lsn and >> timestamp vary throughout the xact? It looks like it from the code. > > Uh. Isn't that just because you looked at txn->origin_id instead of the > change's origin_id? Yes, it is. I didn't realise that the individual changes had their own origins, rather than changing the origin in the txn, though I can see that now that I know to look. Either I'm confused (likely) or the concept behind allowing this is critically flawed. Say some client code does set session origin=1 begin set xact lsn=0/123, ts=13:00 do some inserts set session origin=2 set xact lsn=0/199, ts=14:00 do some more inserts commit it seems to be decoded as: begin origin=2 lsn=0/199 inserts origin=1 lsn=0/199 more inserts origin=2 lsn=0/199 commit origin=2 lsn=0/199 i.e. the begin and commit have the final session origin. Individual changes have the session origin in effect at the time the change was created. The last-set origin commit timestamp and origin lsn override all prior ones; they aren't recorded per-change, only on the commit. This means you have change records with a change->origin_id that's from a completely different node, which makes no sense at all with the txn->origin_lsn . It matches the txn->origin_id, which is the same throughout, but then why even have the change->origin_id? I find the idea of each change having its own origin node - but not its own origin LSN - very confusing. For one thing the origin filter callback can't know about that, and can only filter based on the txn's origin. I guess that's the output plugin's problem - if it wants to cope with arbitrary mixed-origin tx's it can't use the origin filter and has to check each message. I really don't see how it makes any sense to allow the origin_id to change mid-tx. I can see how sending the origin_id for each change could make sense to allow future support for transaction streaming where decoding starts before we receive the commit record, but changing the origin_id within the tx doesn't make any sense. IMO changing the origin should be disallowed within a tx. Otherwise there needs to be some way to record the origin lsn and commit timestamp changing within the tx too. I was going to just send a patch to disallow changing the origin mid-tx, but I'm not sure I see a good way to do that since the origin is a session-level global, not part of the xact info. Document it as a "don't do that, if you do it you get to keep the pieces"? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 16 October 2015 at 11:51, Craig Ringer <craig@2ndquadrant.com> wrote: > Document it as a "don't do that, if you do it you get to keep the pieces"? Thinking about this some more, having per-change origins makes sense when you're not using origin LSNs, such as when you're not replaying from another PostgreSQL instance. So I _can_ see why it exists. I guess this is mostly a matter of adding some comments and/or some notes in the functions' docs to explain how it all fits together - that origins can be per-change, that the txn origin is the origin that was in effect at commit time, and that the lsn and commit timestamp are always those that were set at commit time, so you cannot use a per-change origin with the txn's lsn and expect it to make sense. Reasonable? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi all Patch revision 3 attached. It's a one-liner, with just the fix, and an explanatory note in the patch header. I'll leave the test changes for now, per Andres's feedback. I'll write up a proposed addition to the replication origin docs that explains the existence of separate origins for each change and for the tx as a whole, and explains how replication origins interacts with transaction commit timestamps. That'll come as a followup; I think it's worth getting the bug fix in now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 19 October 2015 at 21:43, Craig Ringer <craig@2ndquadrant.com> wrote: > Hi all > > Patch revision 3 attached. It's a one-liner, with just the fix, and an > explanatory note in the patch header. I'm bumping this because I think it's important not to miss it for 9.5, so it can't wait for the commitfest. It's just the one-liner with the fix its self. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi, On 2015-10-19 21:43:32 +0800, Craig Ringer wrote: > Patch revision 3 attached. It's a one-liner, with just the fix, and an > explanatory note in the patch header. Pushed to 9.5 and master. Thanks for noticing the issue, Andres Freund
On 9 November 2015 at 07:04, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2015-10-19 21:43:32 +0800, Craig Ringer wrote: >> Patch revision 3 attached. It's a one-liner, with just the fix, and an >> explanatory note in the patch header. > > Pushed to 9.5 and master. > > Thanks for noticing the issue, Thanks for taking the time to sanity-check and apply the fix. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services