Re: making tid and HOTness of UPDATE available to logical decoding plugins - Mailing list pgsql-hackers
| From | Hannu Krosing |
|---|---|
| Subject | Re: making tid and HOTness of UPDATE available to logical decoding plugins |
| Date | |
| Msg-id | CAMT0RQTPgS-31_sP4+xmi_iuzXOw41qHG3QFdNpmOfj6WD_iJg@mail.gmail.com |
| In response to | Re: making tid and HOTness of UPDATE available to logical decoding plugins (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
| List | pgsql-hackers |

On Mon, Dec 8, 2025 at 2:46 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> On Fri, 5 Dec 2025 at 16:50, Hannu Krosing <hannuk@google.com> wrote:
> >
> > On Fri, Dec 5, 2025 at 3:58 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Thu, Dec 4, 2025, at 5:58 PM, Hannu Krosing wrote:
> > > > Please find attached a patch that makes tuple ids and info about
> > > > whether it was a plain or HOT update available to logical decoding
> > > > callbacks.
> > >
> > > My first impression was: why do you want to expose an internal information that
> > > is mostly useless for a broader audience? The logical decoding infrastructure
> > > is a general purpose solution for streaming modifications made to Postgres.
> > > Could you elaborate how other consumers (DBMS, data store, ...) would use it?
> >
> > One "other consumer" that came up was the possibility to use logical
> > decoding for collecting changes for CREATE INDEX CONCURRENTLY so there
> > would be no need for the 2nd pass of CIC to scan the whole table again.
> >
> > I understand that there is already ongoing work to do this with a
> > specialized collector, but that involved some other ugliness like
> > having to use specialized logging index access methods.
>
> I don't see logical decoding as a viable alternative to any
> indexing-related workload. Creating and maintaining indexes need
> strict limits on their resource usage, and logical decoding is unable
> to give those guarantees:

Are these 'strict limits on their resource usage' defined somewhere so I
could take a look?

> It needs to read WAL, which can be produced
> in approximately arbitrarily large amounts in any period.

If you read the WAL as it is generated, then you are essentially just
filtering an in-memory stream, most of the time just jumping to the next
WAL record.

The upsides of logical-decoding-based collection are:

- you do not need any extra settings and conditions in your index access
  methods to do in-index-method collection
- you avoid the hassle of synchronizing collection starts and stops
  between all active backends
- you are collecting in a single process, so there is no overhead from
  synchronizing between all the backends that capture/log index insertions
- you can choose between collecting immediately in a background worker
  and collecting later by re-reading WAL

The one upside (?) of in-index capture is that it will naturally throttle
your production workload if capture cannot keep up for some reason.

> This is further worsened by the index build itself, which will have to write
> out WAL if the relation it's building on also needs to log WAL, which
> then also needs to be parsed and decoded by LR.

Not "parsed and decoded" - just read the (database, tablespace, relation)
triplet, decide "not for me", and jump to the next record.
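
Roughly like this - a minimal sketch (mine, not part of the patch) of that
"not for me" fast path, assuming PostgreSQL 16+ internals (XLogReaderState,
XLogRecGetBlockTagExtended() and the RelFileLocatorEquals() macro) and a
hypothetical "target" locator for the table whose index is being built:

```c
#include "postgres.h"

#include "access/xlogreader.h"
#include "storage/relfilelocator.h"

/*
 * Return true only if this WAL record touches the table we are collecting
 * changes for; everything else can be skipped without decoding it further.
 */
static bool
record_is_for_target(XLogReaderState *record, RelFileLocator target)
{
	RelFileLocator rlocator;

	/* Heap DML records carry their target page as block reference 0. */
	if (!XLogRecGetBlockTagExtended(record, 0, &rlocator, NULL, NULL, NULL))
		return false;			/* no block reference at all: not for me */

	/* Compare the (tablespace, database, relfilenumber) triplet. */
	return RelFileLocatorEquals(rlocator, target);
}
```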

I am working on this specifically because of huge databases with heavy
production workloads, which by definition generate a huge amount of WAL.
One of the top goals is to avoid REPLICA IDENTITY FULL, which can hugely
bloat the amount of WAL generated. The fact that you can turn on REPLICA
IDENTITY ROWID with no extra cost on the write side is a bonus (you can
even replace the small overhead of writing the primary key record by
turning on rowid).

> And lastly, we want
> CIC/RIC to work on all indexes, not just those on logged relations. "A
> table with WAL-logging enabled" cannot be a requirement for CIC.

We can use the easy, straightforward collection method - logical decoding -
when available, and fall back to the complicated method (collecting inside
the index access method) or the resource-intensive method (re-scanning the
whole table) when logical decoding is unavailable.

> > > It seems a Postgres-centric solution that you didn't provide strong arguments
> > > in favor of it. How would logical replication take advantage of such change? If
> > > that's the case, share the pgoutput and logical replication changes.
> >
> > Having thought about the issue for quite some time I suddenly
> > discovered that while ctid cannot be used as a permanent enough
> > unique id for foreign keys or anything external, it is unique at any
> > moment in time, making it very much sufficient for logical replication.
> >
> > The high-level idea is to store the source (publisher) ctid value in
> > an extra source_ctid column in the target (subscriber) table. That
> > column will also have a unique index and is of course NOT NULL (as
> > there can by definition be no row without a ctid), so it will form a
> > kind of "replication primary key".

> So you're using it as a poor man's id column.

Or a smart man's id column for replication :)

> I understand how you got to this point, but the right solution here
> still is to get the user to specify their own identity column that is
> stable across operations, and not to use the volatile and
> guaranteed-unstable ctid.

The suggestion to "get the user to specify their own identity column"
sounds good in a purely theoretical sense, but can have unacceptable
overheads in practice, especially if a large table started out - often
for a good reason - without a PK or other suitable identity column.

As I said before, the row id does not have to be stable across row
versions for logical replication; it just has to track the identity of
"the row that is changed or deleted", which ctid already does very well.

You were very worried about extra WAL usage above, but seem oblivious to
the huge resource usage of REPLICA IDENTITY FULL (when used as a poor
man's row id and not because the old row data is needed for some other
reason).

When you at some point discover the need for logical replication of a
large table inside a 24/7 production database where you do have
occasional updates - or even frequent updates, just not based on a
unique id - you currently have a few options:

1. Add REPLICA IDENTITY FULL - this will double the WAL traffic for
   UPDATEs and usually more than double it for DELETEs (could be 1x or
   100x). It can also be REALLY REALLY SLOW to replicate, the worst case
   requiring a sequential scan of the whole table for each UPDATE or
   DELETE.

2. Add a primary key column - quite hard to do CONCURRENTLY, with severe
   disk space and CPU demands, and once it has been added (which could
   have taken up to a few weeks) it will slow down all inserts.

3. Implement updates and deletes in a way similar to overlay file
   systems, where updates and deletes go to a different table and any
   reading of the main table needs to join with "the overlay table" for
   the current state.

To reiterate - "stable across operations" is not a requirement for
logical replication; the tuple id is "stable enough" for streaming
replication changes. Think of it as somebody changing the primary key
column at each update - it seems weird, but the updated PK still uniquely
identifies the tuple for the next operation.
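
To make that concrete, here is a rough sketch (mine, nothing that exists
in any patch) of what an apply step keyed on the stored publisher ctid
could look like on the subscriber, using the server's SPI API. The table
name public.t, its payload column, and the source_ctid column are made up
for illustration:

```c
#include "postgres.h"

#include "catalog/pg_type.h"
#include "executor/spi.h"
#include "utils/builtins.h"

/*
 * Apply one replicated UPDATE on a subscriber table that carries a
 * NOT NULL, uniquely indexed source_ctid column holding the publisher's
 * ctid.  The publisher's old ctid is the "replication primary key";
 * the new ctid simply replaces it, like a PK that changes on every update.
 */
static void
apply_update_by_source_ctid(const char *old_ctid, const char *new_ctid,
							const char *new_payload)
{
	Oid			argtypes[3] = {TEXTOID, TEXTOID, TEXTOID};
	Datum		values[3];

	values[0] = CStringGetTextDatum(new_ctid);
	values[1] = CStringGetTextDatum(new_payload);
	values[2] = CStringGetTextDatum(old_ctid);

	SPI_connect();
	SPI_execute_with_args("UPDATE public.t"
						  "   SET source_ctid = $1::tid, payload = $2"
						  " WHERE source_ctid = $3::tid",
						  3, argtypes, values, NULL, false, 0);
	SPI_finish();
}
```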

> As I also said in my other mail, adding
> ctid to the logical replication system will expose too much internal
> information and will turn current logical no-ops into logical
> operations;

Can you provide an example of this?

> possibly even bloating the subscriber by a good deal more
> than what the publisher cleaned up.

The absolute biggest bloater is REPLICA IDENTITY FULL.

The beauty of using REPLICA IDENTITY ROWID is that absolutely *nothing
extra needs to be added to WAL*. The tids are already there even for
physical replication (and even for wal_level=minimal), as they are
required even for crash recovery.

All my patch to core does is expose them to the logical decoding
mechanism. You do not have to use them; nothing changes for decoding
plugins that do not use them.
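
For illustration, this is roughly what a decoding plugin that opts in
could do with the extra information. The callback shape,
OutputPluginPrepareWrite()/OutputPluginWrite() and the ItemPointer
accessors are existing PostgreSQL API; the newtid and is_hot_update
fields on ReorderBufferChange are placeholders for whatever names the
patch actually ends up using:

```c
#include "postgres.h"

#include "lib/stringinfo.h"
#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "replication/reorderbuffer.h"
#include "storage/itemptr.h"

/* Change callback of a hypothetical "rowid" output plugin. */
static void
rowid_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
					Relation relation, ReorderBufferChange *change)
{
	if (change->action != REORDER_BUFFER_CHANGE_UPDATE)
		return;

	OutputPluginPrepareWrite(ctx, true);
	/* Emit the new tuple's tid as (block,offset), plus whether it was HOT. */
	appendStringInfo(ctx->out, "UPDATE tid=(%u,%u) hot=%d",
					 ItemPointerGetBlockNumber(&change->data.tp.newtid),	/* placeholder */
					 ItemPointerGetOffsetNumber(&change->data.tp.newtid),	/* placeholder */
					 (int) change->data.tp.is_hot_update);					/* placeholder */
	OutputPluginWrite(ctx, true);
}
```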

---
Best Regards
Hannu