Re: Improve logical replication usability when tables lack primary keys - Mailing list pgsql-hackers
| From | Chao Li |
|---|---|
| Subject | Re: Improve logical replication usability when tables lack primary keys |
| Date | |
| Msg-id | 875BBCC0-CF08-4136-8E9E-F03DF75C3A11@gmail.com Whole thread Raw |
| In response to | Re: Improve logical replication usability when tables lack primary keys ("Euler Taveira" <euler@eulerto.com>) |
| List | pgsql-hackers |
> On Dec 18, 2025, at 22:49, Euler Taveira <euler@eulerto.com> wrote: > > On Wed, Dec 17, 2025, at 6:43 PM, GRANT ZHOU wrote: >> On Wed, Dec 17, 2025 at 12:50 PM Euler Taveira <euler@eulerto.com> wrote: >>> Each table needs to say what's its row identifier. The user created a table >>> without primary key. Well, create a primary key. There are dozens of thousands >>> of objects. Use a script. >> However, I’d like to share a user perspective regarding the "use a >> script" approach. The main value of `FOR TABLES IN SCHEMA` is >> *in-database automation*. If users still need to maintain external >> scripts to monitor and `ALTER` new tables to prevent replication >> errors, it significantly diminishes the value of that automation. >> > > As I tried to explain in the previous email, the problem with FOR ALL TABLES > and FOR TABLES IN SCHEMA syntax is that the is no catalog information about the > relations; the list of relations is collected at runtime. > > When I suggested "use a script" I was referring to fix the logical replication > setup regarding the lack of primary key. There is no need to have an automation > outside the database, use an event trigger. If your lazy user doesn't create > the primary key, assign REPLICA IDENTITY FULL. Something like > > -- This example is far from being a complete solution for fixing the lack of > -- primary key in a logical replication scenario. > -- ALTER TABLE should be supported too > CREATE OR REPLACE FUNCTION event_trigger_for_replica_identity() > RETURNS event_trigger LANGUAGE plpgsql AS $$ > DECLARE > obj record; > rec record; > ricnt integer := 0; > BEGIN > FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands() > LOOP > IF obj.command_tag = 'CREATE TABLE' THEN > SELECT COUNT(*) INTO ricnt FROM pg_index WHERE indrelid = obj.objid AND indisprimary; > RAISE NOTICE 'ricnt: %', ricnt; > IF ricnt = 0 THEN > EXECUTE 'ALTER TABLE ' || obj.object_identity || ' REPLICA IDENTITY FULL'; > END IF; > END IF; > END LOOP; > END; > $$; > > CREATE EVENT TRIGGER event_trigger_for_replica_identity > ON ddl_command_end > EXECUTE FUNCTION event_trigger_for_replica_identity(); > > CREATE TABLE event_trigger_test_1 (a int); > \d+ event_trigger_test_1 > CREATE TABLE event_trigger_test_2 (a int primary key); > \d+ event_trigger_test_2 > CREATE TABLE event_trigger_test_3 (a int, b text not null, primary key(b)); > \d+ event_trigger_test_3 > --ALTER TABLE event_trigger_test_3 DROP CONSTRAINT event_trigger_test_3_pkey; > --\d+ event_trigger_test_3 > > DROP EVENT TRIGGER event_trigger_for_replica_identity; > DROP FUNCTION event_trigger_for_replica_identity; > DROP TABLE event_trigger_test_1, event_trigger_test_2, event_trigger_test_3; > > 8<----------------------------------------------------------------------------8< > >> Additionally, tables without Primary Keys are valid SQL and extremely >> common in enterprise environments (e.g., audit logs, data warehousing). >> In large-scale deployments, enforcing PKs on every single table isn't >> always practical. >> > > I'm not saying users shouldn't create tables without a primary key. I'm arguing > that this decision should take into account what adjustments need to be made to > use these tables in logical replication. > >> >> I think the goal of this proposal is not to change the underlying table >> property design, but rather to seek a mechanism (like a Publication >> option) to ensure this automation functions safely without external >> intervention. It is simply about allowing the database to handle these >> valid, common scenarios gracefully when automation is enabled. >> > > You didn't get it. You already have one property to handle it and you are > proposing to add a second property to handle it. > > I think you are pursuing the wrong solution. IMO we need a solution to enforce > that the logical replication contract is valid. If you create or modify a table > that is part of a publication, there is no validation that that table complies > with the publication properties (update and delete properties should require an > appropriate replica identity). We should close the gaps in both publication and > table. > If I summarize Euler’s position in short words: discipline over convenience. I actually strongly agree with that. In PG wegenerally prefer explicit over implicit behavior, and predictability over magic. Based on the discussion so far, I think we share the following design goals: 1) Keep replica identity as a table property. 2) Avoid silent runtime failures when FOR TABLES IN SCHEMA pulls in tables without primary keys. 3) Avoid global or implicit behavior changes. 4) Preserve explicit opt-in for higher WAL cost. 5) Keep the logical replication contract explicit and enforceable. I’ve been thinking about whether adding a new replica identity could meet these goals. Today we have four replica identities: DEFAULT (PK, fallback to NONE), INDEX, FULL, and NONE. What if we introduce a new replica identity, tentatively called “FORCE”: PK with fallback to FULL. (Let’s keep our focuson the design, not argue the name for now.) With this approach: 1) Replica identity remains a table property. 2) Publication membership is still evaluated at runtime, so FOR TABLES IN SCHEMA is not special-cased. 3) No new GUCs are required. 4) The user must explicitly opt in by setting the replica identity. Once FORCE is chosen, adding or dropping a primary keylater does not silently break UPDATE/DELETE replication. 5) The logical replication contract remains explicit; the table declares that it is safe for UPDATE/DELETE replication evenwithout a PK, at the cost of higher WAL volume. This feels like a small, explicit extension of the existing RI semantics. Notably, REPLICA IDENTITY DEFAULT already has conditionalbehavior (PK fallback to NONE), so conditional RI behavior is not new, this would just make a different fallback explicit and user-chosen. After that, we could consider a database-level default_replica_identity setting, applied at table creation time, for environmentsthat want this behavior consistently. But that would only make sense if we first agree on the table-level mechanism. I’m interested in whether this direction aligns better with the goals above. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
pgsql-hackers by date: