Re: pg_publication_tables: return NULL attnames when no column list is specified - Mailing list pgsql-hackers

From	Roberto Mello
Subject	Re: pg_publication_tables: return NULL attnames when no column list is specified
Date	April 1 02:23:11
Msg-id	CAKz==b+MwMXSqfXyq90PW4qmNDTqrVAoTLAG3BsK-tpq54WkOg@mail.gmail.com
In response to	Re: pg_publication_tables: return NULL attnames when no column list is specified ("David G. Johnston" <david.g.johnston@gmail.com>)
List	pgsql-hackers

On Tue, Mar 31, 2026 at 4:55 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

IIUC the wording for v18 and earlier should read more like:

“Subscriptions having several publications in which the same table has different sets of columns published are not supported.”

The claim that this de facto behavior is a bug needing to be fixed is now before us (there is no disagreement that the physical column lists are different - null vs non-null). My cursory take at this leads me to believe we should accept what actually got implemented and not call this a bug to be fixed (aside from the docs).

That the catalog is the only official source of truth regarding the physical column list distinction, and the function represents the logical “set of columns actually seen”, makes sense seen in that light.

The internal code was designed around the NULL/non-NULL distinction. The SRF pg_get_publication_tables() is the one place that erased it, and the CASE WHEN relnatts heuristic in tablesync was an attempt to reverse that erasure, but it's demonstrably broken for tables with dropped columns. That seems like a bug to me regardless of how we feel about the behavioral question, but I have no objections to not calling it a bug. I'm confident the best thing was intended when the code was committed and hindsight is always 20/20.
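
To illustrate the dropped-column failure, here is a toy Python model. It is not the tablesync code: it assumes the heuristic compares the length of the column array returned by the SRF against relnatts, and that relnatts still counts the slots of dropped columns. The names Attribute, srf_attnames and recover_null are made up for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attribute:
    name: str
    dropped: bool = False  # stands in for pg_attribute.attisdropped


def relnatts(attrs: List[Attribute]) -> int:
    # Assumption: like pg_class.relnatts, this counts every attribute slot,
    # including slots left behind by dropped columns.
    return len(attrs)


def srf_attnames(attrs: List[Attribute],
                 column_list: Optional[List[str]]) -> List[str]:
    # Model of the erasure: a missing column list (None) is expanded
    # into the list of live columns, so None never comes out.
    live = [a.name for a in attrs if not a.dropped]
    if column_list is None:
        return live
    return [n for n in live if n in column_list]


def recover_null(attnames: List[str], natts: int) -> Optional[List[str]]:
    # Model of the heuristic: if the array is as long as relnatts,
    # assume "no column list" and turn it back into NULL (None).
    return None if len(attnames) == natts else attnames


clean = [Attribute("id"), Attribute("name")]
with_dropped = [Attribute("id"), Attribute("old", dropped=True), Attribute("name")]

# No dropped columns: the heuristic recovers NULL as intended.
clean_result = recover_null(srf_attnames(clean, None), relnatts(clean))

# One dropped column: 2 live names against a relnatts of 3, so the
# heuristic reports an explicit list even though none was specified.
dropped_result = recover_null(srf_attnames(with_dropped, None),
                              relnatts(with_dropped))
```

In this model the publication without a column list comes back as NULL for the clean table but as an explicit list for the table with a dropped column, which is the inconsistency in question.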

I haven’t dived deep enough to understand whether there is a C code issue that needs to be resolved. Or whether we can make dealing with this more user-friendly given this constraint.

Removing the limitation would seem more appealing if we are going to make a change. The obvious answer of “union all sets of columns published for a table and replicate those” would be the simplest to document though I suspect the current implementation basically chooses one of the publications to pull from which makes that difficult in the general case. I do kinda wonder why we need to enforce any kind of error so long as one of the publications for a given table includes all columns though. Or even is a proper superset to be a tiny bit more flexible. A technically uninformed wondering but still.

The superset idea would be a significant change to how the WAL output plugin works. pgoutput.c doesn't have a concept of "this publication contributes columns X and that publication contributes columns Y, send the union."

This would be an interesting improvement but it's a larger project... it would touch pgoutput, tablesync, and the subscriber's relation mapping. My patch is trying to fix the immediate inconsistency (the view lying about the catalog state, and the broken relnatts heuristic) without changing the replication protocol or column merging behavior.

If the view shows {id, name} for both publications, a DBA planning a schema migration has no way to know that ALTER TABLE ADD COLUMN email will be replicated for one publication but not the other. The catalog stores the information needed to make this determination, but the view actively hides it.

NULL in the view would tell the DBA "this publication replicates everything, including future columns", which is actionable information.
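
A small Python sketch of the migration scenario, under the assumption that a publication without a column list (modelled as None) follows the table's current columns, while an explicit list stays fixed. The helper published_columns is hypothetical and only models the behavior described here.

```python
from typing import List, Optional


def published_columns(table_columns: List[str],
                      column_list: Optional[List[str]]) -> List[str]:
    # None models "no column list": every current column is published.
    if column_list is None:
        return list(table_columns)
    # An explicit list publishes only the named columns.
    return [c for c in table_columns if c in column_list]


table = ["id", "name"]
pub_all = None                # publication created without a column list
pub_listed = ["id", "name"]   # publication created with (id, name)

# Before the migration both publications expose the same columns, so a
# view that prints {id, name} for both cannot tell them apart.
before_all = published_columns(table, pub_all)
before_listed = published_columns(table, pub_listed)

# Models ALTER TABLE ... ADD COLUMN email.
table.append("email")

after_all = published_columns(table, pub_all)
after_listed = published_columns(table, pub_listed)
```

After the column is added, only the publication modelled with None picks up email, which is the difference a NULL in the view would make visible ahead of time.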

Roberto Mello

Snowflake
