Thread: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

[BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

From
"anderson"
Date:
hi team:

I found a postgresql 9.4.10 Logical decoding problem
heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than the value field of uint16.

a uint32 variable is assigned to uint16

The structure that holds the oldkey length is:
typedef struct xl_heap_header_len
{
uint16 t_len;
xl_heap_header header;
} xl_heap_header_len;

The problem will lead to logic decoding failure when oldtuplene > 65535 and relreplident = FULL.
The table that triggers the problem is usually relreplident = FULL.
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Menlo; color: #000000; background-color: #ffffff} span.s1 {font-variant-ligatures: no-common-ligatures}

Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

From
Michael Paquier
Date:
On Tue, Dec 27, 2016 at 12:30 PM, anderson <anderson2013@qq.com> wrote:
> I found a postgresql 9.4.10 Logical decoding problem
> heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than
> the value field of uint16.
>
> a uint32 variable is assigned to uint16
>
> The structure that holds the oldkey length is:
> typedef struct xl_heap_header_len
> {
> uint16 t_len;
> xl_heap_header header;
> } xl_heap_header_len;
>
> The problem will lead to logic decoding failure when oldtuplene > 65535 and
> relreplident = FULL.
> The table that triggers the problem is usually relreplident = FULL.

Do you have a test case able to reproduce the problem?
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

[BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

From
"anderson"
Date:
yes i can.

reproduce the step:

1. create table testcase(a int,b int,c text,d text);

2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;

3. /u01/pgsql/bin/psql  -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv'  DELIMITER E'\t'  csv QUOTE '''' ";

4. select pg_create_logical_replication_slot('logical_slot','test_decoding');

--update one row
5. update testcase set b = 1; 

6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);

7. postgres=# select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR:  compressed data is corrupt


------------------ 原始邮件 ------------------
发件人: "Michael Paquier";<michael.paquier@gmail.com>;
发送时间: 2016年12月27日(星期二) 中午1:29
收件人: "anderson"<anderson2013@qq.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On Tue, Dec 27, 2016 at 12:30 PM, anderson <anderson2013@qq.com> wrote:
> I found a postgresql 9.4.10 Logical decoding problem
> heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than
> the value field of uint16.
>
> a uint32 variable is assigned to uint16
>
> The structure that holds the oldkey length is:
> typedef struct xl_heap_header_len
> {
> uint16 t_len;
> xl_heap_header header;
> } xl_heap_header_len;
>
> The problem will lead to logic decoding failure when oldtuplene > 65535 and
> relreplident = FULL.
> The table that triggers the problem is usually relreplident = FULL.

Do you have a test case able to reproduce the problem?
--
Michael


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

From
"anderson"
Date:
yes i can.

reproduce the step:

1. create table testcase(a int,b int,c text,d text);

2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;

3. psql  -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv'  DELIMITER E'\t'  csv QUOTE '''' ";

4. select pg_create_logical_replication_slot('logical_slot','test_decoding');

--update one row
5. update testcase set b = 1;

6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR:  compressed data is corrupt
--The data length is incorrect and decompression fails

------------------ 原始邮件 ------------------
发件人: "anderson";<anderson2013@qq.com>;
发送时间: 2016年12月28日(星期三) 晚上7:24
收件人: "Michael Paquier"<michael.paquier@gmail.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

yes i can.

reproduce the step:

1. create table testcase(a int,b int,c text,d text);

2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;

3. /u01/pgsql/bin/psql  -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv'  DELIMITER E'\t'  csv QUOTE '''' ";

4. select pg_create_logical_replication_slot('logical_slot','test_decoding');

--update one row
5. update testcase set b = 1;

6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);

7. postgres=# select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR:  compressed data is corrupt


------------------ 原始邮件 ------------------
发件人: "Michael Paquier";<michael.paquier@gmail.com>;
发送时间: 2016年12月27日(星期二) 中午1:29
收件人: "anderson"<anderson2013@qq.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

On Tue, Dec 27, 2016 at 12:30 PM, anderson <anderson2013@qq.com> wrote:
> I found a postgresql 9.4.10 Logical decoding problem
> heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than
> the value field of uint16.
>
> a uint32 variable is assigned to uint16
>
> The structure that holds the oldkey length is:
> typedef struct xl_heap_header_len
> {
> uint16 t_len;
> xl_heap_header header;
> } xl_heap_header_len;
>
> The problem will lead to logic decoding failure when oldtuplene > 65535 and
> relreplident = FULL.
> The table that triggers the problem is usually relreplident = FULL.

Do you have a test case able to reproduce the problem?
--
Michael


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Attachment

Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

From
Michael Paquier
Date:
On Wed, Dec 28, 2016 at 8:32 PM, anderson <anderson2013@qq.com> wrote:
> reproduce the step:
>
> 1. create table testcase(a int,b int,c text,d text);
>
> 2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;
>
> 3. psql  -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv'
> DELIMITER E'\t'  csv QUOTE '''' ";
>
> 4. select
> pg_create_logical_replication_slot('logical_slot','test_decoding');
>
> --update one row
> 5. update testcase set b = 1;
>
> 6. select count(*) from
> pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
> ERROR:  compressed data is corrupt
> --The data length is incorrect and decompression fails

OK, I can see the problem. And I can see as well that things have been
tackled in this area with 9.5 thanks to commit 2c03216d...
--
Michael


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen

From
Michael Paquier
Date:
On Sun, Jan 1, 2017 at 9:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> OK, I can see the problem. And I can see as well that things have been
> tackled in this area with 9.5 thanks to commit 2c03216d...

(Adding Andres in CC for some input on the matter)
Looking at that... As HeapTupleData->t_len is uint32, it is really an
oversight in the 9.4 WAL logging machinery that xl_heap_header_len
uses uint16. If xl_heap_update->flags still had a bit free, we could
have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the
existing short version of xl_heap_header_len and add a new bit for a
"long" version where a version of xl_heap_header_len including uint32
as t_len is read. This would preserve replay from doing stupid things
for existing 9.4 deployments and allow new records to work fully. But
there are no bits free here. As well as there are no bits for
XLOG_HEAP[1|2]. Andres, perhaps you have an idea?

Looking at this code, It is worth noting that
XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always
used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have
a single flag to do the whole thing, the parent's replica identity
value being here to track if a full version of the old tuple is logged
or not.
-- 
Michael


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correctoldtuplelen

From
Andres Freund
Date:
On 2017-01-05 13:09:28 +0900, Michael Paquier wrote:
> On Sun, Jan 1, 2017 at 9:06 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > OK, I can see the problem. And I can see as well that things have been
> > tackled in this area with 9.5 thanks to commit 2c03216d...
>
> (Adding Andres in CC for some input on the matter)

Thanks.

> Looking at that... As HeapTupleData->t_len is uint32, it is really an
> oversight in the 9.4 WAL logging machinery that xl_heap_header_len
> uses uint16.

Not generally, no. For new tuples and such it's perfectly fine - they
can't be bigger than MaxHeapTupleSize. It's just for the old-tuple where
(for FULL), we inline toasted columns.


> If xl_heap_update->flags still had a bit free, we could
> have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the
> existing short version of xl_heap_header_len and add a new bit for a
> "long" version where a version of xl_heap_header_len including uint32
> as t_len is read. This would preserve replay from doing stupid things
> for existing 9.4 deployments and allow new records to work fully. But
> there are no bits free here. As well as there are no bits for
> XLOG_HEAP[1|2]. Andres, perhaps you have an idea?

> Looking at this code, It is worth noting that
> XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always
> used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have
> a single flag to do the whole thing, the parent's replica identity
> value being here to track if a full version of the old tuple is logged
> or not.

That sounds like a bad idea to me.  But I don't have one a lot better,
and as you say the knowledge is currently not required.  If both bits
are set it's a 32bit len, otherwise 16. Yay.

Andres


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correctoldtuplelen

From
Andres Freund
Date:
On 2017-01-05 06:48:20 -0800, Andres Freund wrote:
> On 2017-01-05 13:09:28 +0900, Michael Paquier wrote:
> > If xl_heap_update->flags still had a bit free, we could
> > have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the
> > existing short version of xl_heap_header_len and add a new bit for a
> > "long" version where a version of xl_heap_header_len including uint32
> > as t_len is read. This would preserve replay from doing stupid things
> > for existing 9.4 deployments and allow new records to work fully. But
> > there are no bits free here. As well as there are no bits for
> > XLOG_HEAP[1|2]. Andres, perhaps you have an idea?
> 
> > Looking at this code, It is worth noting that
> > XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always
> > used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have
> > a single flag to do the whole thing, the parent's replica identity
> > value being here to track if a full version of the old tuple is logged
> > or not.
> 
> That sounds like a bad idea to me.  But I don't have one a lot better,
> and as you say the knowledge is currently not required.  If both bits
> are set it's a 32bit len, otherwise 16. Yay.

There luckily is a better solution: We can just disregard the old
tuple's t_len, and compute it via r->xl_len
-#size-of-all-new-tuple-stuff.  There's really no need for the new
tuple's t_len via xl_heap_header_len.

Pushed a fix. Could have removed copying of xl_heap_header_len in
DecodeUpdate(), but decided that to go as small as possible.

- Andres


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs