RE: logrep stuck with 'ERROR: int2vector has too many elements' - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: logrep stuck with 'ERROR: int2vector has too many elements'
Msg-id OS0PR01MB5716EA3B7E6DE060773B1A5594C09@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to logrep stuck with 'ERROR: int2vector has too many elements'  (Erik Rijkers <er@xs4all.nl>)
List pgsql-hackers
On Sunday, January 15, 2023 5:35 PM Erik Rijkers <er@xs4all.nl> wrote:
> 
> I can't find the exact circumstances that cause it but it has something to do with
> many columns (or adding many columns) in combination with perhaps
> generated columns.
> 
> This replication test, in a slightly different form, used to work. This is also
> suggested by the fact that the attached runs without errors in REL_15_STABLE but
> gets stuck in HEAD.
> 
> What it does: it initdbs and runs two instances, primary and replica. In the
> primary 'pgbench -is1' done, and many columns, including 1 generated column,
> are added to all 4 pgbench tables. This is then pg_dump/pg_restored to the
> replica, and a short pgbench is run. The result tables on primary and replica are
> compared for the final result.
> (To run it will need some tweaks to directory and connection parms)
> 
> I ran it on both v15 and v16 for 25 runs: with the parameters as given
> 15 has no problem while 16 always got stuck with the int2vector error.
> (15 can actually be pushed up to the max of 1600 columns per table without
> errors)
> 
> Both REL_15_STABLE and 16devel built from recent master on Debian 10, gcc
> 12.2.0.
> 
> I hope someone understands what's going wrong.

Thanks for reporting.

I think the basic problem is that we try to fetch the column list as an
int2vector when doing table sync, and if the number of columns is larger than
100, we get an ERROR like the one in $subject.
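
As a rough illustration of that limit, here is a minimal Python model of an
int2vector-style text parser. It is only a sketch and not the server code: the
names parse_int2vector, MAX_ELEMENTS and TooManyElements are hypothetical, and
the cap of 100 is simply taken from the description above.

```python
from __future__ import annotations

# Assumption: the cap of 100 elements comes from the description above;
# this is a toy model, not the actual PostgreSQL input function.
MAX_ELEMENTS = 100


class TooManyElements(ValueError):
    """Raised when the text holds more elements than the cap allows."""


def parse_int2vector(text: str) -> list[int]:
    """Parse whitespace-separated int16 values, enforcing the element cap."""
    values = [int(token) for token in text.split()]
    if len(values) > MAX_ELEMENTS:
        raise TooManyElements("int2vector has too many elements")
    for value in values:
        # int2 is a signed 16-bit integer.
        if not -32768 <= value <= 32767:
            raise ValueError(f"value {value} is out of range for int2")
    return values


if __name__ == "__main__":
    # 100 attribute numbers fit; 101 trigger the error.
    print(len(parse_int2vector(" ".join(str(i) for i in range(1, 101)))))
    try:
        parse_int2vector(" ".join(str(i) for i in range(1, 102)))
    except TooManyElements as exc:
        print("ERROR:", exc)
```

In this model, a column list of exactly 100 attribute numbers still parses, and
the first list with 101 entries fails.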

We can also hit this ERROR by manually specifying a long (>100) column
list in the publication, like:

create publication pub for table test(a1,a2,a3... a200);
create subscription xxx.
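
Typing out 200 column names by hand is tedious, so the statements can be
generated. The following is a minimal sketch: build_repro_sql is a hypothetical
helper, the int column type is an assumption, and the subscription statement is
left out because it depends on the local connection settings.

```python
from __future__ import annotations


def build_repro_sql(n_columns: int = 200,
                    table: str = "test",
                    publication: str = "pub") -> list[str]:
    """Return CREATE TABLE and CREATE PUBLICATION statements for a wide table.

    Columns are named a1..aN as in the example above; the int type is an
    assumption made only for illustration.
    """
    columns = [f"a{i}" for i in range(1, n_columns + 1)]
    column_defs = ", ".join(f"{name} int" for name in columns)
    column_list = ", ".join(columns)
    return [
        f"CREATE TABLE {table} ({column_defs});",
        f"CREATE PUBLICATION {publication} FOR TABLE {table} ({column_list});",
    ]


if __name__ == "__main__":
    # Print the statements so they can be piped into psql on the publisher.
    for statement in build_repro_sql():
        print(statement)
```

The table also has to exist on the subscriber before the subscription is
created, since table sync is where the column list gets fetched.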

The script didn't reproduce this in PG15 because we didn't filter out
generated columns when fetching the column list, so the code assumed all
columns were replicated and returned NULL for the column list (int2vector)
value. But in PG16 (b7ae039), we started to filter out generated columns
(because generated columns are not replicated in logical replication), so we
get a valid int2vector and hit the ERROR.

I will think and work on a fix for this.

Best regards,
Hou zj
