Thread: Long text values destroys logical replication slots

Long text values destroys logical replication slots

From
Adam Dratwiński
Date:
I recently encountered a bug on my production postgres which results in killing replication slot. It happens when there is an UPDATE statement called on a record having text column with a very long value (about 1mb). The problem can be easily reproduced by this query:


Here I am using decoderbufs decoder https://github.com/krzychk/decoderbufs but it doesn't seems to be related with this plugin as it also happen on the postgres test decoder.

Re: Long text values destroys logical replication slots

From
Andres Freund
Date:
On October 28, 2015 7:51:18 PM GMT+01:00, "Adam Dratwiński" <adam.dratwinski@gmail.com> wrote:
>I recently encountered a bug on my production postgres which results in
>killing replication slot. It happens when there is an UPDATE statement
>called on a record having text column with a very long value (about
>1mb).
>The problem can be easily reproduced by this query:
>
>https://gist.github.com/adwinsky/f7a6dc4382776fb9ea60
>
>Here I am using decoderbufs decoder
>https://github.com/krzychk/decoderbufs
>but it doesn't seems to be related with this plugin as it also happen
>on
>the postgres test decoder.

Did you try the last minor release?

Andres
Hi
---
Please excuse brevity and formatting - I am writing this on my mobile phone.

Re: Long text values destroys logical replication slots

From
Andres Freund
Date:
Can you reproduce it with test_decoding as the output plugin?
---
Please excuse brevity and formatting - I am writing this on my mobile phone.

Re: Long text values destroys logical replication slots

From
Adam Dratwiński
Date:
<div dir="ltr">I managed to reproduce this problem on postgres 9.4.1 and 9.4.5 on osx and gentoo.</div><div
class="gmail_extra"><br/><div class="gmail_quote">On Wed, Oct 28, 2015 at 9:54 PM, Andres Freund <span dir="ltr"><<a
href="mailto:andres@anarazel.de"target="_blank">andres@anarazel.de</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div
class="h5">OnOctober 28, 2015 7:51:18 PM GMT+01:00, "Adam Dratwiński" <<a
href="mailto:adam.dratwinski@gmail.com">adam.dratwinski@gmail.com</a>>wrote:<br /> >I recently encountered a bug
onmy production postgres which results in<br /> >killing replication slot. It happens when there is an UPDATE
statement<br/> >called on a record having text column with a very long value (about<br /> >1mb).<br /> >The
problemcan be easily reproduced by this query:<br /> ><br /> ><a
href="https://gist.github.com/adwinsky/f7a6dc4382776fb9ea60"rel="noreferrer"
target="_blank">https://gist.github.com/adwinsky/f7a6dc4382776fb9ea60</a><br/> ><br /> >Here I am using
decoderbufsdecoder<br /> ><a href="https://github.com/krzychk/decoderbufs" rel="noreferrer"
target="_blank">https://github.com/krzychk/decoderbufs</a><br/> >but it doesn't seems to be related with this plugin
asit also happen<br /> >on<br /> >the postgres test decoder.<br /><br /></div></div>Did you try the last minor
release?<br/><br /> Andres<br /> Hi<br /> ---<br /> Please excuse brevity and formatting - I am writing this on my
mobilephone.<br /></blockquote></div><br /></div> 

Re: Long text values destroys logical replication slots

From
Michael Paquier
Date:
On Wed, Oct 28, 2015 at 10:17 PM, Andres Freund <andres@anarazel.de> wrote:
> Can you reproduce it with test_decoding as the output plugin?

You can just use that for example to get an assertion failure:
CREATE TABLE a (b text);
ALTER TABLE ONLY a REPLICA IDENTITY FULL;
SELECT * FROM pg_create_logical_replication_slot('new', 'test_decoding');
INSERT INTO a (b) VALUES (repeat('k', 2000000));
UPDATE a SET b = 'c';
select * from pg_logical_slot_peek_changes('new', NULL, NULL); -- boom

frame #3: 0x0000000100458ca9
postgres`DecodeXLogTuple(data=0x00007fb7e2126046, len=22910,
tuple=0x000000010a32e038) + 137 at decode.c:856
   853         int            datalen = len - SizeOfHeapHeader;
   854
   855         Assert(datalen >= 0);
-> 856         Assert(datalen <= MaxHeapTupleSize);
(lldb) p datalen
(int) $0 = 22905
--
Michael

Re: Long text values destroys logical replication slots

From
Andres Freund
Date:
On 2015-10-28 23:00:53 +0100, Michael Paquier wrote:
> On Wed, Oct 28, 2015 at 10:17 PM, Andres Freund <andres@anarazel.de> wrote:
> > Can you reproduce it with test_decoding as the output plugin?
>
> You can just use that for example to get an assertion failure:
> CREATE TABLE a (b text);
> ALTER TABLE ONLY a REPLICA IDENTITY FULL;
> SELECT * FROM pg_create_logical_replication_slot('new', 'test_decoding');
> INSERT INTO a (b) VALUES (repeat('k', 2000000));
> UPDATE a SET b = 'c';
> select * from pg_logical_slot_peek_changes('new', NULL, NULL); -- boom
>
> frame #3: 0x0000000100458ca9
> postgres`DecodeXLogTuple(data=0x00007fb7e2126046, len=22910,
> tuple=0x000000010a32e038) + 137 at decode.c:856
>    853         int            datalen = len - SizeOfHeapHeader;
>    854
>    855         Assert(datalen >= 0);
> -> 856         Assert(datalen <= MaxHeapTupleSize);
> (lldb) p datalen
> (int) $0 = 22905

Ugh, that's some monumental stupidity on my side in the handling of
oldtuple values. Newtuple datums never can be bigger than 8k (all
relevant columns point into toasted datums)- but that's absolutely not
the case for the "old" values.  That can't happen for primary keys (due
to the btree limitations), which is why this wasn't noticed so far.
Gotta think about this one.

Greetings,

Andres Freund

Re: Long text values destroys logical replication slots

From
Michael Paquier
Date:
On Thu, Oct 29, 2015 at 7:23 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-10-28 23:00:53 +0100, Michael Paquier wrote:
>> On Wed, Oct 28, 2015 at 10:17 PM, Andres Freund <andres@anarazel.de> wrote:
>> > Can you reproduce it with test_decoding as the output plugin?
>>
>> You can just use that for example to get an assertion failure:
>> CREATE TABLE a (b text);
>> ALTER TABLE ONLY a REPLICA IDENTITY FULL;
>> SELECT * FROM pg_create_logical_replication_slot('new', 'test_decoding');
>> INSERT INTO a (b) VALUES (repeat('k', 2000000));
>> UPDATE a SET b = 'c';
>> select * from pg_logical_slot_peek_changes('new', NULL, NULL); -- boom
>>
>> frame #3: 0x0000000100458ca9
>> postgres`DecodeXLogTuple(data=0x00007fb7e2126046, len=22910,
>> tuple=0x000000010a32e038) + 137 at decode.c:856
>>    853         int            datalen = len - SizeOfHeapHeader;
>>    854
>>    855         Assert(datalen >= 0);
>> -> 856         Assert(datalen <= MaxHeapTupleSize);
>> (lldb) p datalen
>> (int) $0 = 22905
>
> Ugh, that's some monumental stupidity on my side in the handling of
> oldtuple values. Newtuple datums never can be bigger than 8k (all
> relevant columns point into toasted datums)- but that's absolutely not
> the case for the "old" values.  That can't happen for primary keys (due
> to the btree limitations), which is why this wasn't noticed so far.
> Gotta think about this one.

FWIW, I am going to play with that..
--
Michael

Re: Long text values destroys logical replication slots

From
Andres Freund
Date:
On January 29, 2016 8:07:29 AM GMT+01:00, Michael Paquier <michael.paquier@gmail.com> wrote:
>On Thu, Oct 29, 2015 at 7:23 AM, Andres Freund <andres@anarazel.de>
>wrote:
>> On 2015-10-28 23:00:53 +0100, Michael Paquier wrote:
>>> On Wed, Oct 28, 2015 at 10:17 PM, Andres Freund <andres@anarazel.de>
>wrote:
>>> > Can you reproduce it with test_decoding as the output plugin?
>>>
>>> You can just use that for example to get an assertion failure:
>>> CREATE TABLE a (b text);
>>> ALTER TABLE ONLY a REPLICA IDENTITY FULL;
>>> SELECT * FROM pg_create_logical_replication_slot('new',
>'test_decoding');
>>> INSERT INTO a (b) VALUES (repeat('k', 2000000));
>>> UPDATE a SET b = 'c';
>>> select * from pg_logical_slot_peek_changes('new', NULL, NULL); --
>boom
>>>
>>> frame #3: 0x0000000100458ca9
>>> postgres`DecodeXLogTuple(data=0x00007fb7e2126046, len=22910,
>>> tuple=0x000000010a32e038) + 137 at decode.c:856
>>>    853         int            datalen = len - SizeOfHeapHeader;
>>>    854
>>>    855         Assert(datalen >= 0);
>>> -> 856         Assert(datalen <= MaxHeapTupleSize);
>>> (lldb) p datalen
>>> (int) $0 = 22905
>>
>> Ugh, that's some monumental stupidity on my side in the handling of
>> oldtuple values. Newtuple datums never can be bigger than 8k (all
>> relevant columns point into toasted datums)- but that's absolutely
>not
>> the case for the "old" values.  That can't happen for primary keys
>(due
>> to the btree limitations), which is why this wasn't noticed so far.
>> Gotta think about this one.
>
>FWIW, I am going to play with that..

I've a patch for this, in just not yet happy with the API changes...

---
Please excuse brevity and formatting - I am writing this on my mobile phone.

Re: Long text values destroys logical replication slots

From
Michael Paquier
Date:
On Fri, Jan 29, 2016 at 4:37 PM, Andres Freund <andres@anarazel.de> wrote:
> On January 29, 2016 8:07:29 AM GMT+01:00, Michael Paquier <michael.paquier@gmail.com> wrote:
>>On Thu, Oct 29, 2015 at 7:23 AM, Andres Freund <andres@anarazel.de>
>>wrote:
>>> On 2015-10-28 23:00:53 +0100, Michael Paquier wrote:
>>>> On Wed, Oct 28, 2015 at 10:17 PM, Andres Freund <andres@anarazel.de>
>>wrote:
>>>> > Can you reproduce it with test_decoding as the output plugin?
>>>>
>>>> You can just use that for example to get an assertion failure:
>>>> CREATE TABLE a (b text);
>>>> ALTER TABLE ONLY a REPLICA IDENTITY FULL;
>>>> SELECT * FROM pg_create_logical_replication_slot('new',
>>'test_decoding');
>>>> INSERT INTO a (b) VALUES (repeat('k', 2000000));
>>>> UPDATE a SET b = 'c';
>>>> select * from pg_logical_slot_peek_changes('new', NULL, NULL); --
>>boom
>>>>
>>>> frame #3: 0x0000000100458ca9
>>>> postgres`DecodeXLogTuple(data=0x00007fb7e2126046, len=22910,
>>>> tuple=0x000000010a32e038) + 137 at decode.c:856
>>>>    853         int            datalen = len - SizeOfHeapHeader;
>>>>    854
>>>>    855         Assert(datalen >= 0);
>>>> -> 856         Assert(datalen <= MaxHeapTupleSize);
>>>> (lldb) p datalen
>>>> (int) $0 = 22905
>>>
>>> Ugh, that's some monumental stupidity on my side in the handling of
>>> oldtuple values. Newtuple datums never can be bigger than 8k (all
>>> relevant columns point into toasted datums)- but that's absolutely
>>not
>>> the case for the "old" values.  That can't happen for primary keys
>>(due
>>> to the btree limitations), which is why this wasn't noticed so far.
>>> Gotta think about this one.
>>
>>FWIW, I am going to play with that..
>
> I've a patch for this, in just not yet happy with the API changes...

Oh, OK, good to know. If you want to have extra eyes on it, just ping
me here then.
--
Michael

Re: Long text values destroys logical replication slots

From
Andres Freund
Date:
Hi Adam,

On 2015-10-28 19:51:18 +0100, Adam Dratwiński wrote:
> I recently encountered a bug on my production postgres which results in
> killing replication slot. It happens when there is an UPDATE statement
> called on a record having text column with a very long value (about 1mb).
> The problem can be easily reproduced by this query:
>
> https://gist.github.com/adwinsky/f7a6dc4382776fb9ea60
>
> Here I am using decoderbufs decoder https://github.com/krzychk/decoderbufs
> but it doesn't seems to be related with this plugin as it also happen on
> the postgres test decoder.

I pushed a fix for this; it'll be included in the next maintenance
releases which are coming up soon-ish. Sorry that it took so long.

Thanks for the report!

Regards,

Andres