Re: making tid and HOTness of UPDATE available to logical decoding plugins - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: making tid and HOTness of UPDATE available to logical decoding plugins
Date
Msg-id CAEze2WjLyX3wY6XDjmcGpGpUJEFAtU-bs1SNxZNPDW=Oym-czw@mail.gmail.com
Whole thread Raw
In response to Re: making tid and HOTness of UPDATE available to logical decoding plugins  (Hannu Krosing <hannuk@google.com>)
Responses Re: making tid and HOTness of UPDATE available to logical decoding plugins
List pgsql-hackers
On Fri, 5 Dec 2025 at 16:50, Hannu Krosing <hannuk@google.com> wrote:
>
> On Fri, Dec 5, 2025 at 3:58 PM Euler Taveira <euler@eulerto.com> wrote:
> >
> > On Thu, Dec 4, 2025, at 5:58 PM, Hannu Krosing wrote:
> > > Please find attached a patch that makes tuple ids and info about
> > > weather it was plain or HOT update available to logical decoding
> > > callbacks.
> > >
> >
> > My first impression was: why do you want to expose an internal information that
> > is mostly useless for a broader audience? The logical decoding infrastructure
> > is a general purpose solution for streaming modifications made to Postgres.
> > Could you elaborate how other consumers (DBMS, data store, ...) would use it?
>
> One "other consumer" that came up was possibility to use logical
> decoding for collecting changes for CREATE INDEX CONCURRENTLY so there
> would be no need for 2nd pass of CIC to scan the whole table again.
>
> I understand that there already is an ongoing work to do this with a
> specialized collector, but that involved some other ugliness like
> having to use a specialized logging index acces methods.

I don't see logical decoding as a viable alternative to any
indexing-related workload. Creating and maintaining indexes need
strict limits on their resource usage, and logical decoding is unable
to give those guarantees: It needs to read WAL, which can be produced
in approximately arbitrarily large amounts in any period. This is
further worsened by the index build itself, which will have to write
out WAL if the relation it's building on also needs to log WAL, which
then also needs to be parsed and decoded by LR. And lastly, we want
CIC/RIC to work on all indexes, not just those on logged relations. "A
table with WAL-logging enabled" cannot be a requirement for CIC.

> > It seems a Postgres-centric solution that you didn't provide strong arguments
> > in favor of it. How would logical replication take advantage of such change? If
> > that's the case, share the pgoutput and logical replication changes.
>
> Having though about the issue for quite some time I suddenly
> discovered, that while ctid can not be used as a permanent enough
> unique id for foreign keys or anything external, it is unique at any
> moment in time making it very much sufficient for logical replication.
>
> The high-level idea is to store the source (publisher) ctid value in
> an extra column for sorce_ctid in the target (subscriber) table, that
> column will also have a unique index and is of course NOT NULL (as
> there can be by definition no row without a ctid) so it will form kind
> of "replication primary key".

So you're using it as a poor man's id column.

I understand how you got to this point, but the right solution here
still is to get the user to specify their own identity column that is
stable across operations, and not to use the volatile and
guaranteed-unstable ctid.  As I also said in my other mail, adding
ctid to the logical replication system will expose too much internal
information and will turn current logical no-ops into logical
operations; possibly even bloating the subscriber by a good deal more
than what the publisher cleaned up.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: index prefetching
Next
From: Pavel Stehule
Date:
Subject: Re: proposal: schema variables