Hi hackers,
CREATE PUBLICATION silently succeeds even when target tables lack a
usable replica identity, while the publication publishes UPDATE and/or
DELETE. The error only surfaces later at replication time:
ERROR: cannot delete from table "foo" because it does not have a
replica identity and publishes deletes
This gap has caused real production incidents — in one case, a CDC
pipeline using FOR TABLES IN SCHEMA included a table without a primary
key, and replication stalled for hours before the cause was found.
I'd like to propose emitting a WARNING at publication creation/alter
time when this mismatch exists. The check would cover all paths:
- CREATE PUBLICATION ... FOR TABLE / FOR TABLES IN SCHEMA / FOR ALL TABLES
- ALTER PUBLICATION ... ADD/SET TABLE / ADD/SET TABLES IN SCHEMA
- ALTER PUBLICATION ... SET (publish = 'update, delete')
The approach I'm considering is a publication-level check that runs
after the final publication state is known, scanning the effective set
of published tables via GetIncludedPublicationRelations() /
GetAllSchemaPublicationRelations() / GetAllPublicationRelations() and
checking each table's replica identity.
I have a working prototype for the FOR TABLE / ADD TABLE paths. A few
open questions before I post a full patch:
1. For FOR ALL TABLES, the check would scan pg_class. Acceptable for
a DDL operation, or too expensive?
2. Should we cap the number of warnings when many tables are affected?
3. Should this be controllable via a GUC, or is a simple WARNING
sufficient?
Thoughts welcome.
--
Best regards,