Thread: BUG #14228: replication slot catalog_xmin not cleared on slot reuse
BUG #14228: replication slot catalog_xmin not cleared on slot reuse
From
andrew@tao11.riddles.org.uk
Date:
VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDIyOApMb2dnZWQgYnk6ICAg ICAgICAgIEFuZHJldyBHaWVydGgKRW1haWwgYWRkcmVzczogICAgICBhbmRy ZXdAdGFvMTEucmlkZGxlcy5vcmcudWsKUG9zdGdyZVNRTCB2ZXJzaW9uOiA5 LjZiZXRhMgpPcGVyYXRpbmcgc3lzdGVtOiAgIGFueQpEZXNjcmlwdGlvbjog ICAgICAgIAoKW015IGFuYWx5c2lzIG9mIGEgYnVnIHJlcG9ydGVkIG9uIElS QzsgcGxlYXNlIENDIG9yaWdpbmFsIHJlcG9ydGVyOiBtZCBhdApjaGV3eSBk b3QgY29tXQ0KDQpSZXBvcnRlZCBvbiBwZyA5LjQuNiBidXQgYWxzbyByZXBy b2R1Y2VkIG9uIDkuNmJldGEyDQoNCldoZW4gY3JlYXRpbmcgYSBwaHlzaWNh bCByZXBsaWNhdGlvbiBzbG90LCB0aGUgY2F0YWxvZ194bWluIGZpZWxkIG9m IHRoZSBuZXcKc2xvdCBpcyBub3QgaW5pdGlhbGl6ZWQuIElmIHRoZSBzbG90 IHN0b3JhZ2UgaGFkIHByZXZpb3VzbHkgYmVlbiB1c2VkIGZvciBhCmxvZ2lj YWwgc2xvdCwgdGhlIG9sZCBjYXRhbG9nX3htaW4gd2lsbCByZW1haW4gaW4g cGxhY2UgYW5kIGludGVyZmVyZSB3aXRoCnZhY3V1bS4NCg0KVHJpdmlhbGx5 IHJlcHJvZHVjaWJsZToNCg0Kc2VsZWN0IHBnX2NyZWF0ZV9sb2dpY2FsX3Jl cGxpY2F0aW9uX3Nsb3QoJ3Rlc3QnLCd0ZXN0X2RlY29kaW5nJyk7DQpzZWxl Y3QgcGdfZHJvcF9yZXBsaWNhdGlvbl9zbG90KCd0ZXN0Jyk7DQpzZWxlY3Qg cGdfY3JlYXRlX3BoeXNpY2FsX3JlcGxpY2F0aW9uX3Nsb3QoJ3Rlc3QnKTsN CnNlbGVjdCAqIGZyb20gcGdfcmVwbGljYXRpb25fc2xvdHM7DQogc2xvdF9u YW1lIHwgcGx1Z2luIHwgc2xvdF90eXBlIHwgZGF0b2lkIHwgZGF0YWJhc2Ug fCBhY3RpdmUgfCBhY3RpdmVfcGlkIHwKeG1pbiB8IGNhdGFsb2dfeG1pbiB8 IHJlc3RhcnRfbHNuIHwgY29uZmlybWVkX2ZsdXNoX2xzbiANCi0tLS0tLS0t LS0tKy0tLS0tLS0tKy0tLS0tLS0tLS0tKy0tLS0tLS0tKy0tLS0tLS0tLS0r LS0tLS0tLS0rLS0tLS0tLS0tLS0tKy0tLS0tLSstLS0tLS0tLS0tLS0tLSst LS0tLS0tLS0tLS0tKy0tLS0tLS0tLS0tLS0tLS0tLS0tLQ0KIHRlc3QgICAg ICB8ICAgICAgICB8IHBoeXNpY2FsICB8ICAgICAgICB8ICAgICAgICAgIHwg ZiAgICAgIHwgICAgICAgICAgICB8IAogICAgfCAgICAgICAgICA1NDYgfCAg ICAgICAgICAgICB8IDAvMTUyNTQzOA0KKDEgcm93KQ0KCgo=
On Wed, Jul 6, 2016 at 2:35 AM, <andrew@tao11.riddles.org.uk> wrote: > When creating a physical replication slot, the catalog_xmin field of the new > slot is not initialized. If the slot storage had previously been used for a > logical slot, the old catalog_xmin will remain in place and interfere with > vacuum. Good catch! The same applies to confirmed_flush_lsn, which is used only by logical decoding and should remain as NULL for physical slots. So I propose the patch attached to address both problems. -- Michael
Attachment
>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes: >> When creating a physical replication slot, the catalog_xmin field of >> the new slot is not initialized. If the slot storage had previously >> been used for a logical slot, the old catalog_xmin will remain in >> place and interfere with vacuum. Michael> Good catch! The same applies to confirmed_flush_lsn, which is Michael> used only by logical decoding and should remain as NULL for Michael> physical slots. So I propose the patch attached to address Michael> both problems. What about slot->effective_catalog_xmin ? -- Andrew (irc:RhodiumToad)
_ On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes: > > >> When creating a physical replication slot, the catalog_xmin field of > >> the new slot is not initialized. If the slot storage had previously > >> been used for a logical slot, the old catalog_xmin will remain in > >> place and interfere with vacuum. > > Michael> Good catch! The same applies to confirmed_flush_lsn, which is > Michael> used only by logical decoding and should remain as NULL for > Michael> physical slots. So I propose the patch attached to address > Michael> both problems. > > What about slot->effective_catalog_xmin ? Yes. I guess so, as well as the other candidate_* fields in the slot to begin from a clean state. -- Michael
Attachment
* Michael Paquier (michael.paquier@gmail.com) wrote: > On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth > <andrew@tao11.riddles.org.uk> wrote: > >>>>>> "Michael" =3D=3D Michael Paquier <michael.paquier@gmail.com> write= s: > > > > >> When creating a physical replication slot, the catalog_xmin field of > > >> the new slot is not initialized. If the slot storage had previously > > >> been used for a logical slot, the old catalog_xmin will remain in > > >> place and interfere with vacuum. > > > > Michael> Good catch! The same applies to confirmed_flush_lsn, which is > > Michael> used only by logical decoding and should remain as NULL for > > Michael> physical slots. So I propose the patch attached to address > > Michael> both problems. > > > > What about slot->effective_catalog_xmin ? >=20 > Yes. I guess so, as well as the other candidate_* fields in the slot > to begin from a clean state. Seems like we should try to get this in before the next round of point releases...? Thanks! Stephen
On Tue, Jul 26, 2016 at 12:25 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Michael Paquier (michael.paquier@gmail.com) wrote: >> On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth >> <andrew@tao11.riddles.org.uk> wrote: >> >>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes: >> > >> > >> When creating a physical replication slot, the catalog_xmin field of >> > >> the new slot is not initialized. If the slot storage had previously >> > >> been used for a logical slot, the old catalog_xmin will remain in >> > >> place and interfere with vacuum. >> > >> > Michael> Good catch! The same applies to confirmed_flush_lsn, which is >> > Michael> used only by logical decoding and should remain as NULL for >> > Michael> physical slots. So I propose the patch attached to address >> > Michael> both problems. >> > >> > What about slot->effective_catalog_xmin ? >> >> Yes. I guess so, as well as the other candidate_* fields in the slot >> to begin from a clean state. > > Seems like we should try to get this in before the next round of point > releases...? That would be nice, I would guess that Andres or Robert (added in CC) are the best fits to commit this patch, even if this is just a variable initialization issue. -- Michael
On 2016-07-06 13:07:36 +0900, Michael Paquier wrote: > _ > > On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth > <andrew@tao11.riddles.org.uk> wrote: > >>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes: > > > > >> When creating a physical replication slot, the catalog_xmin field of > > >> the new slot is not initialized. If the slot storage had previously > > >> been used for a logical slot, the old catalog_xmin will remain in > > >> place and interfere with vacuum. > > > > Michael> Good catch! The same applies to confirmed_flush_lsn, which is > > Michael> used only by logical decoding and should remain as NULL for > > Michael> physical slots. So I propose the patch attached to address > > Michael> both problems. > > > > What about slot->effective_catalog_xmin ? > > Yes. I guess so, as well as the other candidate_* fields in the slot > to begin from a clean state. I think it'd be better if we explicitly zeroed .data - that way the likelihood of future bugs of the same ilk is smaller. Greetings, Andres Freund
On Thu, Jul 28, 2016 at 9:24 AM, Andres Freund <andres@anarazel.de> wrote: > I think it'd be better if we explicitly zeroed .data - that way the > likelihood of future bugs of the same ilk is smaller. Okay, I have spent some time looking at all the fields here, and their significance before reaching this code path in ReplicationSlotCreate, but did not find any hole if slot->data is zeroed. So here is an updated patch. You could get rid of all the field initializations I have done for slot->data, but I think that's cheap to keep them. -- Michael
Attachment
>>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes: Stephen> Seems like we should try to get this in before the next round Stephen> of point releases...? I notice that this didn't happen.... -- Andrew (irc:RhodiumToad)
On Tue, Aug 9, 2016 at 10:30 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes: > > Stephen> Seems like we should try to get this in before the next round > Stephen> of point releases...? > > I notice that this didn't happen.... At this point I have added it to the next CF as a bug fix: https://commitfest.postgresql.org/10/705/ -- Michael
On 2016-08-09 14:30:58 +0100, Andrew Gierth wrote: > >>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes: > > Stephen> Seems like we should try to get this in before the next round > Stephen> of point releases...? > > I notice that this didn't happen.... I'll piuck it up when I'm back from holidays (Tuesday/Wednesday). Sorry for the delay.
On Mon, Aug 15, 2016 at 10:03 AM, Andres Freund <andres@anarazel.de> wrote: > On 2016-08-09 14:30:58 +0100, Andrew Gierth wrote: >> >>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes: >> >> Stephen> Seems like we should try to get this in before the next round >> Stephen> of point releases...? >> >> I notice that this didn't happen.... > > I'll piuck it up when I'm back from holidays (Tuesday/Wednesday). Sorry > for the delay. Thanks! -- Michael
On 2016-07-28 14:22:29 +0900, Michael Paquier wrote: > On Thu, Jul 28, 2016 at 9:24 AM, Andres Freund <andres@anarazel.de> wrote: > > I think it'd be better if we explicitly zeroed .data - that way the > > likelihood of future bugs of the same ilk is smaller. > > Okay, I have spent some time looking at all the fields here, and their > significance before reaching this code path in ReplicationSlotCreate, > but did not find any hole if slot->data is zeroed. So here is an > updated patch. You could get rid of all the field initializations I > have done for slot->data, but I think that's cheap to keep them. Pushed without the additional initializations - they're imo more confusing than helpful - and with some more reordering to match the struct order. Regards, Andres