Thread: [BUGS] BUG #14769: Logical replication error "cache lookup failed for type 0"

[BUGS] BUG #14769: Logical replication error "cache lookup failed for type 0"

From
scott@deltaex.com
Date:
The following bug has been logged on the website:

Bug reference:      14769
Logged by:          Scott Milliken
Email address:      scott@deltaex.com
PostgreSQL version: 10beta2
Operating system:   Linux 4.10.0-27-generic #30~16.04.2-Ubuntu
Description:

I'm testing postgresql 10beta2, and found a bug with logical replication.
The bug halts logical replication, and produces the error "cache lookup
failed for type 0" in the secondary's log.

The error can be reproduced by dropping a column in a replicated table. When
a column is dropped, it zeroes out attypeid in pg_attribute:

```
test=# select * from pg_attribute where attrelid =
'public.testtbl'::regclass and atttypid = 0 order by attnum;
-[ RECORD 1 ]-+-----------------------------
attrelid      | 16386
attname       | ........pg.dropped.2........
atttypid      | 0
attstattarget | 0
attlen        | -1
attnum        | 2
attndims      | 0
attcacheoff   | -1
atttypmod     | -1
attbyval      | f
attstorage    | x
attalign      | i
attnotnull    | f
atthasdef     | f
attidentity   | 
attisdropped  | t
attislocal    | t
attinhcount   | 0
attcollation  | 100
attacl        | ¤
attoptions    | ¤
attfdwoptions | ¤
```

Perhaps there's a missing check against pg_attribute.attisdropped
somewhere?


Here's full reproduction steps:

Primary:
```
mkdir -p /tmp/test0
PGPORT=5301 initdb -D /tmp/test0
echo "wal_level = logical" >> /tmp/test0/postgresql.conf
PGPORT=5301 pg_ctl -D /tmp/test0 start
for (( ; ; )); do if pg_isready -d postgres -p 5301; then break; fi; sleep
1; done
psql -p 5301 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER
CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
createdb -p 5301 -E utf8 test

psql -p 5301 -U test test -c "CREATE TABLE testtbl (id int, name text);"
psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
testtbl_pkey PRIMARY KEY (id);"
psql -p 5301 -U test test -c "CREATE PUBLICATION testpub FOR TABLE
testtbl;"
psql -p 5301 -U test test -c "INSERT INTO testtbl (id, name) VALUES (1,
'a');"
# continue with "Secondary" instructions
psql -p 5301 -U test test -c "UPDATE testtbl SET name = 'b';"
# continue with "Secondary" instructions
psql -p 5301 -U test test -c "ALTER TABLE testtbl DROP COLUMN name;"
psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD COLUMN name2 text;"
psql -p 5301 -U test test -c "UPDATE testtbl SET name2 = 'b';"
# continue with "Secondary" instructions
```

Secondary:
```
mkdir -p /tmp/test1
PGPORT=5302 initdb -D /tmp/test1
PGPORT=5302 pg_ctl -D /tmp/test1 start
for (( ; ; )); do if pg_isready -d postgres -p 5302; then break; fi; sleep
1; done
psql -p 5302 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER
CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
createdb -p 5302 -E utf8 test

psql -p 5302 -U test test -c "CREATE TABLE testtbl (id int, name text);"
psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
testtbl_pkey PRIMARY KEY (id);"
psql -p 5302 -U test test -c "CREATE SUBSCRIPTION test0_testpub CONNECTION
'port=5301 user=test dbname=test' PUBLICATION testpub;"
psql -p 5302 -U test test -c "SELECT * FROM testtbl;"  # OK
# continue with "Primary" instructions
psql -p 5302 -U test test -c "SELECT * FROM testtbl;"  # OK
psql -p 5302 -U test test -c "ALTER TABLE testtbl DROP COLUMN name;"
psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD COLUMN name2 text;"
# continue with "Primary" instructions
psql -p 5302 -U test test -c "SELECT * FROM testtbl;"  # not OK, doesn't
receive update, errors in log
```


Here's the log:
```
2017-07-23 22:42:28.719 PDT [7706] LOG:  logical replication apply worker
for subscription "test0_testpub" has started
2017-07-23 22:42:28.725 PDT [7706] ERROR:  cache lookup failed for type 0
2017-07-23 22:42:28.725 PDT [7706] CONTEXT:  processing remote data for
replication target relation "public.testtbl" column "id", remote type
integer, local type integer
2017-07-23 22:42:28.725 PDT [3359] LOG:  worker process: logical replication
worker for subscription 16395 (PID 7706) exited with exit code 1
```


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

On 03/08/17 01:59, scott@deltaex.com wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      14769
> Logged by:          Scott Milliken
> Email address:      scott@deltaex.com
> PostgreSQL version: 10beta2
> Operating system:   Linux 4.10.0-27-generic #30~16.04.2-Ubuntu
> Description:        
> 
> I'm testing postgresql 10beta2, and found a bug with logical replication.
> The bug halts logical replication, and produces the error "cache lookup
> failed for type 0" in the secondary's log.
> 
> The error can be reproduced by dropping a column in a replicated table. When
> a column is dropped, it zeroes out attypeid in pg_attribute:
> 
> ```
> test=# select * from pg_attribute where attrelid =
> 'public.testtbl'::regclass and atttypid = 0 order by attnum;
> -[ RECORD 1 ]-+-----------------------------
> attrelid      | 16386
> attname       | ........pg.dropped.2........
> atttypid      | 0
> attstattarget | 0
> attlen        | -1
> attnum        | 2
> attndims      | 0
> attcacheoff   | -1
> atttypmod     | -1
> attbyval      | f
> attstorage    | x
> attalign      | i
> attnotnull    | f
> atthasdef     | f
> attidentity   | 
> attisdropped  | t
> attislocal    | t
> attinhcount   | 0
> attcollation  | 100
> attacl        | ¤
> attoptions    | ¤
> attfdwoptions | ¤
> ```
> 
> Perhaps there's a missing check against pg_attribute.attisdropped
> somewhere?
> 

Good catch! Looks like we didn't initialize the attribute map for
locally dropped columns. Once I did that I found there is also typo in
slot_modify_cstrings() which uses wrong value from the remote side, that
is also made visible by your example.

Both are fixed in attached patch.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachment

Re: [BUGS] BUG #14769: Logical replication error "cache lookup failedfor type 0"

From
Peter Eisentraut
Date:
On 8/4/17 05:41, Petr Jelinek wrote:
>> Perhaps there's a missing check against pg_attribute.attisdropped
>> somewhere?
>>
> 
> Good catch! Looks like we didn't initialize the attribute map for
> locally dropped columns.

fixed

> Once I did that I found there is also typo in
> slot_modify_cstrings() which uses wrong value from the remote side, that
> is also made visible by your example.

and fixed and added a test case

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14769: Logical replication error "cache lookup failedfor type 0"

From
Scott Milliken
Date:
Thanks Petr and Peter! I've tested commit and confirmed that commit 0e58455d fixes this bug.

On Mon, Aug 7, 2017 at 7:54 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 8/4/17 05:41, Petr Jelinek wrote:
>> Perhaps there's a missing check against pg_attribute.attisdropped
>> somewhere?
>>
>
> Good catch! Looks like we didn't initialize the attribute map for
> locally dropped columns.

fixed

> Once I did that I found there is also typo in
> slot_modify_cstrings() which uses wrong value from the remote side, that
> is also made visible by your example.

and fixed and added a test case

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services