Re: making tid and HOTness of UPDATE available to logical decoding plugins - Mailing list pgsql-hackers
| From | Hannu Krosing |
|---|---|
| Subject | Re: making tid and HOTness of UPDATE available to logical decoding plugins |
| Date | |
| Msg-id | CAMT0RQRAXqfnw_ktGguRyN52-SYtAom_2617bwCMrT-KqGXztg@mail.gmail.com Whole thread Raw |
| In response to | Re: making tid and HOTness of UPDATE available to logical decoding plugins ("Euler Taveira" <euler@eulerto.com>) |
| List | pgsql-hackers |
On Fri, Dec 5, 2025 at 3:58 PM Euler Taveira <euler@eulerto.com> wrote: > > On Thu, Dec 4, 2025, at 5:58 PM, Hannu Krosing wrote: > > Please find attached a patch that makes tuple ids and info about > > weather it was plain or HOT update available to logical decoding > > callbacks. > > > > My first impression was: why do you want to expose an internal information that > is mostly useless for a broader audience? The logical decoding infrastructure > is a general purpose solution for streaming modifications made to Postgres. > Could you elaborate how other consumers (DBMS, data store, ...) would use it? One "other consumer" that came up was possibility to use logical decoding for collecting changes for CREATE INDEX CONCURRENTLY so there would be no need for 2nd pass of CIC to scan the whole table again. I understand that there already is an ongoing work to do this with a specialized collector, but that involved some other ugliness like having to use a specialized logging index acces methods. And tracking changes for other CONCURRENTLY operations, like table repack, could also benefit from having ctid and hotness info. > > My planned use case is for reliable logical replication of tables > > without primary key or other declared IDENTITY (as long as there are > > no updates on target, or at leas no non-hot updates) > > > > Wait, we already have a mechanism to handle it: replica identity. What is the > advantage of this proposal in comparison with replica identity? Replica identity full can become a quite heavyweight operation if you just want to set up logical replication but your table has no primary key but still has occasional updates If all you want to do is to be able to replicate UPDATEs and DELETEs then having to save full tuple data in WAL seems excessive. > It seems a Postgres-centric solution that you didn't provide strong arguments > in favor of it. How would logical replication take advantage of such change? If > that's the case, share the pgoutput and logical replication changes. Having though about the issue for quite some time I suddenly discovered, that while ctid can not be used as a permanent enough unique id for foreign keys or anything external, it is unique at any moment in time making it very much sufficient for logical replication. The high-level idea is to store the source (publisher) ctid value in an extra column for sorce_ctid in the target (subscriber) table, that column will also have a unique index and is of course NOT NULL (as there can be by definition no row without a ctid) so it will form kind of "replication primary key". During CDC replay phase each change is sent with ctid (or two in case of UPDATE) and the replay works as it currently does with the addition of sorce ctid being stored in sorce_ctid column on the target. And because UPDATEalso updates the source_ctid colum on target the "replication primary key" stays nicely in sync. Of course a manual update in the target database could break replication , but this is no different than IDENTITY FULL. or for that matter any other IDENTITY. So the PoC I am working on will - add a "materialised sorce ctid" column to target table, defined as "source_ctid tid NOT NULL UNIQUE" - initial copy will copy over `SELECT *, ctid as source_ctid FROM ...` - replication decoding plugin will include actual ctid(s) in change records For the above PoC the replay part needs no changes beyond knowing that source_ctid is the identity column PoC phase 2 will be more complex and will introduce the "index-only source_ctid column" to avoid bloating the table by storing source ctids there if the sole purpose of the replication is migrating the database, But more on this once I have the basic PoC working :) -- Cheers Hannu P.S: I am also mulling over an idea of adding semi-virtual GENERATED ALWAYS AS ROW IDENTITY where the ROW identity starts as bigint cast of actual ctid and gets materialized only on (non-HOT) update. This does not need this logical decoding patch, but as it is closely related I mention it here as well. ROW IDENTITY has two big advantages over other identity types for mostly static tables - a) identity column takes up no extra space and b) it allows super fast direct lookups without needing an index at all for fully write-only tables or a quick index lookup in a tiny index to check that the ROWID is not there and then direct lookup by ctid. -- Hannu
pgsql-hackers by date: