repack: fix a bug to reject deferrable primary key fallback for concurrent mode - Mailing list pgsql-hackers

From Chao Li
Subject repack: fix a bug to reject deferrable primary key fallback for concurrent mode
Date
Msg-id 10DD5E13-B45D-44F1-BE08-C63E00ABCAC0@gmail.com
Whole thread
Responses RE: repack: fix a bug to reject deferrable primary key fallback for concurrent mode
List pgsql-hackers
Hi,

I am continuing to test REPACK, and I found another issue.

In check_concurrent_repack_requirements(), if a table has no replica identity index, the code falls back to using the
primarykey if one exists. The problem is that a deferrable primary key cannot be used for this purpose. WAL generation
doesnot consider a deferrable primary key to be a replica identity, so concurrent mode may not receive enough old tuple
informationto replay concurrent changes. 

I tested this with the following procedure.

1 - Create a table
```
create table t (id int,  v text, primary key (id) deferrable initially deferred);
insert into t values (1, 'a');
```

2 -  Attach a debugger to session 1's backend process. I used vscode. Add a breakpoint at the first
process_concurrent_changes()call. This blocks the REPACK process and gives session 2 time to issue a DELETE. 

3 - In session 1, issue a repack, it will stop at the breakpoint
```
repack (concurrently) t;
```

4 - In session 2
```
delete from t where id=1;
```

5 - Detach session 1 from the debugger, so that repack continues and tries to re-apply the delete from session 2.

6 - repack fails with:
```
evantest=# repack (concurrently) t;
ERROR:  incomplete delete info
CONTEXT:  slot "repack_96468", output plugin "pgrepack", in the change callback, associated LSN 0/2A5717F0
REPACK decoding worker
```

The error comes from this code in pgrepack.c:
```
        case REORDER_BUFFER_CHANGE_DELETE:
            {
                HeapTuple    oldtuple;

                oldtuple = change->data.tp.oldtuple;

                if (oldtuple == NULL)
                    elog(ERROR, "incomplete delete info");

                repack_store_change(ctx, relation, CHANGE_DELETE, oldtuple);
            }
            break;
```

The root cause is that repack.c assumes rel->rd_pkindex is usable as an identity index, but logical decoding does not
treata deferrable primary key as replica identity. As a result, the decoding worker may not get the old tuple needed to
re-applythe delete. 

To fix the problem, I think we should just not fall back to deferrable primary key in the first place. See the attached
patch.

With this patch, repack will quickly for the test:
```
evantest=# repack (concurrently) t;
ERROR:  cannot process relation "t"
HINT:  Relation "t" has a deferrable primary key.
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/





Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Reject invalid databases in pg_get_database_ddl()
Next
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Fix stats reporting delays in logical parallel apply worker