Thread: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
andrew@tao11.riddles.org.uk
Date:
VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz
aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDIyOApMb2dnZWQgYnk6ICAg
ICAgICAgIEFuZHJldyBHaWVydGgKRW1haWwgYWRkcmVzczogICAgICBhbmRy
ZXdAdGFvMTEucmlkZGxlcy5vcmcudWsKUG9zdGdyZVNRTCB2ZXJzaW9uOiA5
LjZiZXRhMgpPcGVyYXRpbmcgc3lzdGVtOiAgIGFueQpEZXNjcmlwdGlvbjog
ICAgICAgIAoKW015IGFuYWx5c2lzIG9mIGEgYnVnIHJlcG9ydGVkIG9uIElS
QzsgcGxlYXNlIENDIG9yaWdpbmFsIHJlcG9ydGVyOiBtZCBhdApjaGV3eSBk
b3QgY29tXQ0KDQpSZXBvcnRlZCBvbiBwZyA5LjQuNiBidXQgYWxzbyByZXBy
b2R1Y2VkIG9uIDkuNmJldGEyDQoNCldoZW4gY3JlYXRpbmcgYSBwaHlzaWNh
bCByZXBsaWNhdGlvbiBzbG90LCB0aGUgY2F0YWxvZ194bWluIGZpZWxkIG9m
IHRoZSBuZXcKc2xvdCBpcyBub3QgaW5pdGlhbGl6ZWQuIElmIHRoZSBzbG90
IHN0b3JhZ2UgaGFkIHByZXZpb3VzbHkgYmVlbiB1c2VkIGZvciBhCmxvZ2lj
YWwgc2xvdCwgdGhlIG9sZCBjYXRhbG9nX3htaW4gd2lsbCByZW1haW4gaW4g
cGxhY2UgYW5kIGludGVyZmVyZSB3aXRoCnZhY3V1bS4NCg0KVHJpdmlhbGx5
IHJlcHJvZHVjaWJsZToNCg0Kc2VsZWN0IHBnX2NyZWF0ZV9sb2dpY2FsX3Jl
cGxpY2F0aW9uX3Nsb3QoJ3Rlc3QnLCd0ZXN0X2RlY29kaW5nJyk7DQpzZWxl
Y3QgcGdfZHJvcF9yZXBsaWNhdGlvbl9zbG90KCd0ZXN0Jyk7DQpzZWxlY3Qg
cGdfY3JlYXRlX3BoeXNpY2FsX3JlcGxpY2F0aW9uX3Nsb3QoJ3Rlc3QnKTsN
CnNlbGVjdCAqIGZyb20gcGdfcmVwbGljYXRpb25fc2xvdHM7DQogc2xvdF9u
YW1lIHwgcGx1Z2luIHwgc2xvdF90eXBlIHwgZGF0b2lkIHwgZGF0YWJhc2Ug
fCBhY3RpdmUgfCBhY3RpdmVfcGlkIHwKeG1pbiB8IGNhdGFsb2dfeG1pbiB8
IHJlc3RhcnRfbHNuIHwgY29uZmlybWVkX2ZsdXNoX2xzbiANCi0tLS0tLS0t
LS0tKy0tLS0tLS0tKy0tLS0tLS0tLS0tKy0tLS0tLS0tKy0tLS0tLS0tLS0r
LS0tLS0tLS0rLS0tLS0tLS0tLS0tKy0tLS0tLSstLS0tLS0tLS0tLS0tLSst
LS0tLS0tLS0tLS0tKy0tLS0tLS0tLS0tLS0tLS0tLS0tLQ0KIHRlc3QgICAg
ICB8ICAgICAgICB8IHBoeXNpY2FsICB8ICAgICAgICB8ICAgICAgICAgIHwg
ZiAgICAgIHwgICAgICAgICAgICB8IAogICAgfCAgICAgICAgICA1NDYgfCAg
ICAgICAgICAgICB8IDAvMTUyNTQzOA0KKDEgcm93KQ0KCgo=

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Michael Paquier
Date:
On Wed, Jul 6, 2016 at 2:35 AM,  <andrew@tao11.riddles.org.uk> wrote:
> When creating a physical replication slot, the catalog_xmin field of the new
> slot is not initialized. If the slot storage had previously been used for a
> logical slot, the old catalog_xmin will remain in place and interfere with
> vacuum.

Good catch! The same applies to confirmed_flush_lsn, which is used
only by logical decoding and should remain as NULL for physical slots.
So I propose the patch attached to address both problems.
--
Michael

Attachment

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Andrew Gierth
Date:
>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes:

 >> When creating a physical replication slot, the catalog_xmin field of
 >> the new slot is not initialized. If the slot storage had previously
 >> been used for a logical slot, the old catalog_xmin will remain in
 >> place and interfere with vacuum.

 Michael> Good catch! The same applies to confirmed_flush_lsn, which is
 Michael> used only by logical decoding and should remain as NULL for
 Michael> physical slots.  So I propose the patch attached to address
 Michael> both problems.

What about slot->effective_catalog_xmin ?

--
Andrew (irc:RhodiumToad)

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Michael Paquier
Date:
_

On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes:
>
>  >> When creating a physical replication slot, the catalog_xmin field of
>  >> the new slot is not initialized. If the slot storage had previously
>  >> been used for a logical slot, the old catalog_xmin will remain in
>  >> place and interfere with vacuum.
>
>  Michael> Good catch! The same applies to confirmed_flush_lsn, which is
>  Michael> used only by logical decoding and should remain as NULL for
>  Michael> physical slots.  So I propose the patch attached to address
>  Michael> both problems.
>
> What about slot->effective_catalog_xmin ?

Yes. I guess so, as well as the other candidate_* fields in the slot
to begin from a clean state.
--
Michael

Attachment

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Stephen Frost
Date:
* Michael Paquier (michael.paquier@gmail.com) wrote:
> On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth
> <andrew@tao11.riddles.org.uk> wrote:
> >>>>>> "Michael" =3D=3D Michael Paquier <michael.paquier@gmail.com> write=
s:
> >
> >  >> When creating a physical replication slot, the catalog_xmin field of
> >  >> the new slot is not initialized. If the slot storage had previously
> >  >> been used for a logical slot, the old catalog_xmin will remain in
> >  >> place and interfere with vacuum.
> >
> >  Michael> Good catch! The same applies to confirmed_flush_lsn, which is
> >  Michael> used only by logical decoding and should remain as NULL for
> >  Michael> physical slots.  So I propose the patch attached to address
> >  Michael> both problems.
> >
> > What about slot->effective_catalog_xmin ?
>=20
> Yes. I guess so, as well as the other candidate_* fields in the slot
> to begin from a clean state.

Seems like we should try to get this in before the next round of point
releases...?

Thanks!

Stephen

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Michael Paquier
Date:
On Tue, Jul 26, 2016 at 12:25 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Michael Paquier (michael.paquier@gmail.com) wrote:
>> On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth
>> <andrew@tao11.riddles.org.uk> wrote:
>> >>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes:
>> >
>> >  >> When creating a physical replication slot, the catalog_xmin field of
>> >  >> the new slot is not initialized. If the slot storage had previously
>> >  >> been used for a logical slot, the old catalog_xmin will remain in
>> >  >> place and interfere with vacuum.
>> >
>> >  Michael> Good catch! The same applies to confirmed_flush_lsn, which is
>> >  Michael> used only by logical decoding and should remain as NULL for
>> >  Michael> physical slots.  So I propose the patch attached to address
>> >  Michael> both problems.
>> >
>> > What about slot->effective_catalog_xmin ?
>>
>> Yes. I guess so, as well as the other candidate_* fields in the slot
>> to begin from a clean state.
>
> Seems like we should try to get this in before the next round of point
> releases...?

That would be nice, I would guess that Andres or Robert (added in CC)
are the best fits to commit this patch, even if this is just a
variable initialization issue.
--
Michael

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Andres Freund
Date:
On 2016-07-06 13:07:36 +0900, Michael Paquier wrote:
> _
>
> On Wed, Jul 6, 2016 at 12:56 PM, Andrew Gierth
> <andrew@tao11.riddles.org.uk> wrote:
> >>>>>> "Michael" == Michael Paquier <michael.paquier@gmail.com> writes:
> >
> >  >> When creating a physical replication slot, the catalog_xmin field of
> >  >> the new slot is not initialized. If the slot storage had previously
> >  >> been used for a logical slot, the old catalog_xmin will remain in
> >  >> place and interfere with vacuum.
> >
> >  Michael> Good catch! The same applies to confirmed_flush_lsn, which is
> >  Michael> used only by logical decoding and should remain as NULL for
> >  Michael> physical slots.  So I propose the patch attached to address
> >  Michael> both problems.
> >
> > What about slot->effective_catalog_xmin ?
>
> Yes. I guess so, as well as the other candidate_* fields in the slot
> to begin from a clean state.

I think it'd be better if we explicitly zeroed .data - that way the
likelihood of future bugs of the same ilk is smaller.

Greetings,

Andres Freund

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Michael Paquier
Date:
On Thu, Jul 28, 2016 at 9:24 AM, Andres Freund <andres@anarazel.de> wrote:
> I think it'd be better if we explicitly zeroed .data - that way the
> likelihood of future bugs of the same ilk is smaller.

Okay, I have spent some time looking at all the fields here, and their
significance before reaching this code path in ReplicationSlotCreate,
but did not find any hole if slot->data is zeroed. So here is an
updated patch. You could get rid of all the field initializations I
have done for slot->data, but I think that's cheap to keep them.
--
Michael

Attachment

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Andrew Gierth
Date:
>>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes:

 Stephen> Seems like we should try to get this in before the next round
 Stephen> of point releases...?

I notice that this didn't happen....

--
Andrew (irc:RhodiumToad)

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Michael Paquier
Date:
On Tue, Aug 9, 2016 at 10:30 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes:
>
>  Stephen> Seems like we should try to get this in before the next round
>  Stephen> of point releases...?
>
> I notice that this didn't happen....

At this point I have added it to the next CF as a bug fix:
https://commitfest.postgresql.org/10/705/
--
Michael

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Andres Freund
Date:
On 2016-08-09 14:30:58 +0100, Andrew Gierth wrote:
> >>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes:
>
>  Stephen> Seems like we should try to get this in before the next round
>  Stephen> of point releases...?
>
> I notice that this didn't happen....

I'll piuck it up when I'm back from holidays (Tuesday/Wednesday). Sorry
for the delay.

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Michael Paquier
Date:
On Mon, Aug 15, 2016 at 10:03 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-09 14:30:58 +0100, Andrew Gierth wrote:
>> >>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes:
>>
>>  Stephen> Seems like we should try to get this in before the next round
>>  Stephen> of point releases...?
>>
>> I notice that this didn't happen....
>
> I'll piuck it up when I'm back from holidays (Tuesday/Wednesday). Sorry
> for the delay.

Thanks!
--
Michael

Re: BUG #14228: replication slot catalog_xmin not cleared on slot reuse

From
Andres Freund
Date:
On 2016-07-28 14:22:29 +0900, Michael Paquier wrote:
> On Thu, Jul 28, 2016 at 9:24 AM, Andres Freund <andres@anarazel.de> wrote:
> > I think it'd be better if we explicitly zeroed .data - that way the
> > likelihood of future bugs of the same ilk is smaller.
>
> Okay, I have spent some time looking at all the fields here, and their
> significance before reaching this code path in ReplicationSlotCreate,
> but did not find any hole if slot->data is zeroed. So here is an
> updated patch. You could get rid of all the field initializations I
> have done for slot->data, but I think that's cheap to keep them.

Pushed without the additional initializations - they're imo more
confusing than helpful - and with some more reordering to match the
struct order.

Regards,

Andres