Thread: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen
hi team:
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Menlo; color: #000000; background-color: #ffffff}
span.s1 {font-variant-ligatures: no-common-ligatures}
I found a postgresql 9.4.10 Logical decoding problem
heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than the value field of uint16.
a uint32 variable is assigned to uint16
The structure that holds the oldkey length is:
typedef struct xl_heap_header_len
{
uint16 t_len;
xl_heap_header header;
} xl_heap_header_len;
The problem will lead to logic decoding failure when oldtuplene > 65535 and relreplident = FULL.
The table that triggers the problem is usually relreplident = FULL.
Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen
From
Michael Paquier
Date:
On Tue, Dec 27, 2016 at 12:30 PM, anderson <anderson2013@qq.com> wrote: > I found a postgresql 9.4.10 Logical decoding problem > heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than > the value field of uint16. > > a uint32 variable is assigned to uint16 > > The structure that holds the oldkey length is: > typedef struct xl_heap_header_len > { > uint16 t_len; > xl_heap_header header; > } xl_heap_header_len; > > The problem will lead to logic decoding failure when oldtuplene > 65535 and > relreplident = FULL. > The table that triggers the problem is usually relreplident = FULL. Do you have a test case able to reproduce the problem? -- Michael -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
yes i can.
reproduce the step:
1. create table testcase(a int,b int,c text,d text);
2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;
3. /u01/pgsql/bin/psql -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv' DELIMITER E'\t' csv QUOTE '''' ";
4. select pg_create_logical_replication_slot('logical_slot','test_decoding');
--update one row
5. update testcase set b = 1;
6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
7. postgres=# select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR: compressed data is corrupt
------------------ 原始邮件 ------------------
发件人: "Michael Paquier";<michael.paquier@gmail.com>;
发送时间: 2016年12月27日(星期二) 中午1:29
收件人: "anderson"<anderson2013@qq.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen
> I found a postgresql 9.4.10 Logical decoding problem
> heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than
> the value field of uint16.
>
> a uint32 variable is assigned to uint16
>
> The structure that holds the oldkey length is:
> typedef struct xl_heap_header_len
> {
> uint16 t_len;
> xl_heap_header header;
> } xl_heap_header_len;
>
> The problem will lead to logic decoding failure when oldtuplene > 65535 and
> relreplident = FULL.
> The table that triggers the problem is usually relreplident = FULL.
Do you have a test case able to reproduce the problem?
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen
From
"anderson"
Date:
yes i can.
reproduce the step:
1. create table testcase(a int,b int,c text,d text);
2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;
3. psql -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv' DELIMITER E'\t' csv QUOTE '''' ";
4. select pg_create_logical_replication_slot('logical_slot','test_decoding');
--update one row
5. update testcase set b = 1;
6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR: compressed data is corrupt
--The data length is incorrect and decompression fails
------------------ 原始邮件 ------------------
发件人: "anderson";<anderson2013@qq.com>;
发送时间: 2016年12月28日(星期三) 晚上7:24
收件人: "Michael Paquier"<michael.paquier@gmail.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen
yes i can.
reproduce the step:
1. create table testcase(a int,b int,c text,d text);
2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL;
3. /u01/pgsql/bin/psql -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv' DELIMITER E'\t' csv QUOTE '''' ";
4. select pg_create_logical_replication_slot('logical_slot','test_decoding');
--update one row
5. update testcase set b = 1;
6. select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
7. postgres=# select count(*) from pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL);
ERROR: compressed data is corrupt
------------------ 原始邮件 ------------------
发件人: "Michael Paquier";<michael.paquier@gmail.com>;
发送时间: 2016年12月27日(星期二) 中午1:29
收件人: "anderson"<anderson2013@qq.com>;
抄送: "pgsql-bugs"<pgsql-bugs@postgresql.org>;
主题: Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen
> I found a postgresql 9.4.10 Logical decoding problem
> heapam.c:6976 xlog store incorrect oldtuplen when tuplelen is greater than
> the value field of uint16.
>
> a uint32 variable is assigned to uint16
>
> The structure that holds the oldkey length is:
> typedef struct xl_heap_header_len
> {
> uint16 t_len;
> xl_heap_header header;
> } xl_heap_header_len;
>
> The problem will lead to logic decoding failure when oldtuplene > 65535 and
> relreplident = FULL.
> The table that triggers the problem is usually relreplident = FULL.
Do you have a test case able to reproduce the problem?
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Attachment
Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen
From
Michael Paquier
Date:
On Wed, Dec 28, 2016 at 8:32 PM, anderson <anderson2013@qq.com> wrote: > reproduce the step: > > 1. create table testcase(a int,b int,c text,d text); > > 2. ALTER TABLE ONLY testcase REPLICA IDENTITY FULL; > > 3. psql -d postgres -p5559 -c "copy testcase from ‘/tmp/testcase.csv' > DELIMITER E'\t' csv QUOTE '''' "; > > 4. select > pg_create_logical_replication_slot('logical_slot','test_decoding'); > > --update one row > 5. update testcase set b = 1; > > 6. select count(*) from > pg_logical_slot_peek_binary_changes('logical_slot',NULL,NULL); > ERROR: compressed data is corrupt > --The data length is incorrect and decompression fails OK, I can see the problem. And I can see as well that things have been tackled in this area with 9.5 thanks to commit 2c03216d... -- Michael -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correct oldtuplelen
From
Michael Paquier
Date:
On Sun, Jan 1, 2017 at 9:06 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > OK, I can see the problem. And I can see as well that things have been > tackled in this area with 9.5 thanks to commit 2c03216d... (Adding Andres in CC for some input on the matter) Looking at that... As HeapTupleData->t_len is uint32, it is really an oversight in the 9.4 WAL logging machinery that xl_heap_header_len uses uint16. If xl_heap_update->flags still had a bit free, we could have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the existing short version of xl_heap_header_len and add a new bit for a "long" version where a version of xl_heap_header_len including uint32 as t_len is read. This would preserve replay from doing stupid things for existing 9.4 deployments and allow new records to work fully. But there are no bits free here. As well as there are no bits for XLOG_HEAP[1|2]. Andres, perhaps you have an idea? Looking at this code, It is worth noting that XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have a single flag to do the whole thing, the parent's replica identity value being here to track if a full version of the old tuple is logged or not. -- Michael -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correctoldtuplelen
From
Andres Freund
Date:
On 2017-01-05 13:09:28 +0900, Michael Paquier wrote: > On Sun, Jan 1, 2017 at 9:06 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: > > OK, I can see the problem. And I can see as well that things have been > > tackled in this area with 9.5 thanks to commit 2c03216d... > > (Adding Andres in CC for some input on the matter) Thanks. > Looking at that... As HeapTupleData->t_len is uint32, it is really an > oversight in the 9.4 WAL logging machinery that xl_heap_header_len > uses uint16. Not generally, no. For new tuples and such it's perfectly fine - they can't be bigger than MaxHeapTupleSize. It's just for the old-tuple where (for FULL), we inline toasted columns. > If xl_heap_update->flags still had a bit free, we could > have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the > existing short version of xl_heap_header_len and add a new bit for a > "long" version where a version of xl_heap_header_len including uint32 > as t_len is read. This would preserve replay from doing stupid things > for existing 9.4 deployments and allow new records to work fully. But > there are no bits free here. As well as there are no bits for > XLOG_HEAP[1|2]. Andres, perhaps you have an idea? > Looking at this code, It is worth noting that > XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always > used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have > a single flag to do the whole thing, the parent's replica identity > value being here to track if a full version of the old tuple is logged > or not. That sounds like a bad idea to me. But I don't have one a lot better, and as you say the knowledge is currently not required. If both bits are set it's a 32bit len, otherwise 16. Yay. Andres -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] [BUG] pg9.4.10 Logical decoding did not get the correctoldtuplelen
From
Andres Freund
Date:
On 2017-01-05 06:48:20 -0800, Andres Freund wrote: > On 2017-01-05 13:09:28 +0900, Michael Paquier wrote: > > If xl_heap_update->flags still had a bit free, we could > > have for example use XLOG_HEAP_CONTAINS_OLD_TUPLE to track the > > existing short version of xl_heap_header_len and add a new bit for a > > "long" version where a version of xl_heap_header_len including uint32 > > as t_len is read. This would preserve replay from doing stupid things > > for existing 9.4 deployments and allow new records to work fully. But > > there are no bits free here. As well as there are no bits for > > XLOG_HEAP[1|2]. Andres, perhaps you have an idea? > > > Looking at this code, It is worth noting that > > XLOG_HEAP_CONTAINS_OLD_TUPLE and XLOG_HEAP_CONTAINS_OLD_KEY are always > > used just for XLOG_HEAP_CONTAINS_OLD, so we could merge them and have > > a single flag to do the whole thing, the parent's replica identity > > value being here to track if a full version of the old tuple is logged > > or not. > > That sounds like a bad idea to me. But I don't have one a lot better, > and as you say the knowledge is currently not required. If both bits > are set it's a 32bit len, otherwise 16. Yay. There luckily is a better solution: We can just disregard the old tuple's t_len, and compute it via r->xl_len -#size-of-all-new-tuple-stuff. There's really no need for the new tuple's t_len via xl_heap_header_len. Pushed a fix. Could have removed copying of xl_heap_header_len in DecodeUpdate(), but decided that to go as small as possible. - Andres -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs