Re: logical changeset generation v6.2 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: logical changeset generation v6.2 |
Date | |
Msg-id | 20131018182616.GC16188@awork2.anarazel.de Whole thread Raw |
In response to | Re: logical changeset generation v6.2 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: logical changeset generation v6.2
Re: logical changeset generation v6.2 Re: logical changeset generation v6.2 |
List | pgsql-hackers |
On 2013-10-14 09:36:03 -0400, Robert Haas wrote: > > I thought and implemented that in the beginning. Unfortunately it's not > > enough :(. That's probably the issue that took me longest to understand > > in this patchseries... > > > > Combocids can only fix the case where a transaction actually has create > > a combocid: > > > > 1) TX1: INSERT id = 1 at 0/1: (xmin = 1, xmax=Invalid, cmin = 55, cmax = Invalid) > > 2) TX2: DELETE id = 1 at 0/1: (xmin = 1, xmax=2, cmin = Invalid, cmax = 1) > > > > So, if we're decoding data that needs to lookup those rows in TX1 or TX2 > > we both times need access to cmin and cmax, but neither transaction will > > have created a multixact. That can only be an issue in transaction with > > catalog modifications. > > Oh, yuck. So that means you have to write an extra WAL record for > EVERY heap insert, update, or delete to a catalog table? OUCH. So. As it turns out that solution isn't sufficient in the face of VACUUM FULL and mixed DML/DDL transaction that have not yet been decoded. To reiterate, as published it works like: For every modification of catalog tuple (insert, multi_insert, update, delete) that has influence over visibility issue a record that contains: * filenode * ctid * (cmin, cmax) When doing a visibility check on a catalog row during decoding of mixed DML/DDL transaction lookup (cmin, cmax) for that row since we don't store both for the tuple. That mostly works great. The problematic scenario is decoding a transaction that has done mixed DML/DDL *after* a VACUUM FULL/CLUSTER has been performed. The VACUUM FULL obviously changes the filenode and the ctid of a tuple, so we cannot successfully do a lookup based on what we logged before. I know of the following solutions: 1) Don't allow VACUUM FULL on catalog tables if wal_level = logical. 2) Make VACUUM FULL prevent DDL and then wait till all changestreams have decoded up to the current point. 3) don't delete the old relfilenode for VACUUM/CLUSTERs of system tables if there are life decoding slots around, insteaddelegate that responsibility to the slot management. 4) Store both (cmin, cmax) for catalog tuples. I bascially think only 1) and 4) are realistic. And 1) sucks. I've developed a prototype for 4) and except currently being incredibly ugly, it seems to be the most promising approach by far. My trick to store both cmin and cmax is to store cmax in t_hoff managed space when wal_level = logical. That even works when changing wal_level from < logical to logical because only ever need to store both cmin and cmax for transactions that have decodeable content - which they cannot yet have before wal_level = logical. This requires some not so nice things: * A way to declare we're storing both. I've currently chosen HEAP_MOVED_OFF | HEAP_MOVED_IN. That sucks. * A way for heap_form_tuple to know it should add the necessary space to t_hoff. I've added TupleDesc->tdhaswidecid for it. * Fiddling with existing checks for HEAP_MOVED{,OFF,IN} to check for both set at the same time. * Changing the WAL logging to (optionally?) transport the current CommandId instead of always resetting it InvalidCommandId. The benefits are: * Working VACUUM FULL * Much simpler tqual.c logic, everything is stored in the row itself. No hash or something like that built. * No more need to log (relfilenode, cmin, cmax) separately from heap changes itself anymore. In the end, the costs are that individual catalog rows are 4 bytes bigger iff wal_level = logical. That seems acceptable. Some questions remain: * Better idea for a flag than HEAP_MOVED_OFF | HEAP_MOVED_IN * Should we just unconditionally log the current CommandId or make it conditional. We have plenty of flag space to signalwhether it's present, but it's just 4 bytes. Comments? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: