Re: Improve logical replication usability when tables lack primary keys - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Improve logical replication usability when tables lack primary keys
Date
Msg-id CAA4eK1KMOTYGSQdoPBZsJ6T-QEw5eYHG0padQXTViu7LeeFWMw@mail.gmail.com
Whole thread Raw
In response to Re: Improve logical replication usability when tables lack primary keys  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
On Fri, Dec 19, 2025 at 1:39 PM Chao Li <li.evan.chao@gmail.com> wrote:
>
> > On Dec 18, 2025, at 22:49, Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Wed, Dec 17, 2025, at 6:43 PM, GRANT ZHOU wrote:
> >> On Wed, Dec 17, 2025 at 12:50 PM Euler Taveira <euler@eulerto.com> wrote:
> >>> Each table needs to say what's its row identifier. The user created a table
> >>> without primary key. Well, create a primary key. There are dozens of thousands
> >>> of objects. Use a script.
> >> However, I’d like to share a user perspective regarding the "use a
> >> script" approach. The main value of `FOR TABLES IN SCHEMA` is
> >> *in-database automation*. If users still need to maintain external
> >> scripts to monitor and `ALTER` new tables to prevent replication
> >> errors, it significantly diminishes the value of that automation.
> >>
> >
> > As I tried to explain in the previous email, the problem with FOR ALL TABLES
> > and FOR TABLES IN SCHEMA syntax is that the is no catalog information about the
> > relations; the list of relations is collected at runtime.
> >
> > When I suggested "use a script" I was referring to fix the logical replication
> > setup regarding the lack of primary key. There is no need to have an automation
> > outside the database, use an event trigger. If your lazy user doesn't create
> > the primary key, assign REPLICA IDENTITY FULL. Something like
> >
> > -- This example is far from being a complete solution for fixing the lack of
> > -- primary key in a logical replication scenario.
> > -- ALTER TABLE should be supported too
> > CREATE OR REPLACE FUNCTION event_trigger_for_replica_identity()
> >        RETURNS event_trigger LANGUAGE plpgsql AS $$
> > DECLARE
> >    obj record;
> >    rec record;
> >    ricnt integer := 0;
> > BEGIN
> >    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
> >    LOOP
> >        IF obj.command_tag = 'CREATE TABLE' THEN
> >            SELECT COUNT(*) INTO ricnt FROM pg_index WHERE indrelid = obj.objid AND indisprimary;
> >            RAISE NOTICE 'ricnt: %', ricnt;
> >            IF ricnt = 0 THEN
> >                EXECUTE 'ALTER TABLE ' || obj.object_identity || ' REPLICA IDENTITY FULL';
> >            END IF;
> >        END IF;
> >    END LOOP;
> > END;
> > $$;
> >
> > CREATE EVENT TRIGGER event_trigger_for_replica_identity
> >    ON ddl_command_end
> >    EXECUTE FUNCTION event_trigger_for_replica_identity();
> >
> > CREATE TABLE event_trigger_test_1 (a int);
> > \d+ event_trigger_test_1
> > CREATE TABLE event_trigger_test_2 (a int primary key);
> > \d+ event_trigger_test_2
> > CREATE TABLE event_trigger_test_3 (a int, b text not null, primary key(b));
> > \d+ event_trigger_test_3
> > --ALTER TABLE event_trigger_test_3 DROP CONSTRAINT event_trigger_test_3_pkey;
> > --\d+ event_trigger_test_3
> >
> > DROP EVENT TRIGGER event_trigger_for_replica_identity;
> > DROP FUNCTION event_trigger_for_replica_identity;
> > DROP TABLE event_trigger_test_1, event_trigger_test_2, event_trigger_test_3;
> >
> > 8<----------------------------------------------------------------------------8<
> >
> >> Additionally, tables without Primary Keys are valid SQL and extremely
> >> common in enterprise environments (e.g., audit logs, data warehousing).
> >> In large-scale deployments, enforcing PKs on every single table isn't
> >> always practical.
> >>
> >
> > I'm not saying users shouldn't create tables without a primary key. I'm arguing
> > that this decision should take into account what adjustments need to be made to
> > use these tables in logical replication.
> >
> >>
> >> I think the goal of this proposal is not to change the underlying table
> >> property design, but rather to seek a mechanism (like a Publication
> >> option) to ensure this automation functions safely without external
> >> intervention. It is simply about allowing the database to handle these
> >> valid, common scenarios gracefully when automation is enabled.
> >>
> >
> > You didn't get it. You already have one property to handle it and you are
> > proposing to add a second property to handle it.
> >
> > I think you are pursuing the wrong solution. IMO we need a solution to enforce
> > that the logical replication contract is valid. If you create or modify a table
> > that is part of a publication, there is no validation that that table complies
> > with the publication properties (update and delete properties should require an
> > appropriate replica identity). We should close the gaps in both publication and
> > table.
> >

If we want, we can ensure that any table added to that specific
publication (that has an option replica_identy='full') would
automatically override the default to FULL, if PK is not available.
This information can be cached to avoid overhead.

>
> If I summarize Euler’s position in short words: discipline over convenience. I actually strongly agree with that. In
PGwe generally prefer explicit over implicit behavior, and predictability over magic. 
>

You haven't told why we can't consider a custom event trigger as
suggested by Euler for customers who are not willing to change the RI
default explicitly for each table. I think it is worth considering
providing a custom solution outside core-postgres for your customers
for this specific case.

> Based on the discussion so far, I think we share the following design goals:
>
> 1) Keep replica identity as a table property.
> 2) Avoid silent runtime failures when FOR TABLES IN SCHEMA pulls in tables without primary keys.
> 3) Avoid global or implicit behavior changes.
> 4) Preserve explicit opt-in for higher WAL cost.
> 5) Keep the logical replication contract explicit and enforceable.
>
> I’ve been thinking about whether adding a new replica identity could meet these goals.
>
> Today we have four replica identities: DEFAULT (PK, fallback to NONE), INDEX, FULL, and NONE.
>
> What if we introduce a new replica identity, tentatively called “FORCE”: PK with fallback to FULL. (Let’s keep our
focuson the design, not argue the name for now.) 
>
> With this approach:
>
> 1) Replica identity remains a table property.
> 2) Publication membership is still evaluated at runtime, so FOR TABLES IN SCHEMA is not special-cased.
> 3) No new GUCs are required.
> 4) The user must explicitly opt in by setting the replica identity. Once FORCE is chosen, adding or dropping a
primarykey later does not silently break UPDATE/DELETE replication. 
>
> 5) The logical replication contract remains explicit; the table declares that it is safe for UPDATE/DELETE
replicationeven without a PK, at the cost of higher WAL volume. 
>
> This feels like a small, explicit extension of the existing RI semantics. Notably, REPLICA IDENTITY DEFAULT already
hasconditional behavior (PK fallback 
> to NONE), so conditional RI behavior is not new, this would just make a different fallback explicit and user-chosen.
>
> After that, we could consider a database-level default_replica_identity setting, applied at table creation time, for
environmentsthat want this behavior consistently. But that would only make sense if we first agree on the table-level
mechanism.
>

I don't much like the database-level option as it expects a new
default to be introduced. I think the internal working will almost be
same as the option at publication-level.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Nazir Bilal Yavuz
Date:
Subject: Generate images for docs by using meson build system
Next
From: Peter Eisentraut
Date:
Subject: Re: Don't cast away const where possible