Re: Too rigorous assert in reorderbuffer.c - Mailing list pgsql-hackers

From Arseny Sher
Subject Re: Too rigorous assert in reorderbuffer.c
Date
Msg-id 87r2c8xocq.fsf@ars-thinkpad
Whole thread Raw
In response to Re: Too rigorous assert in reorderbuffer.c  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Too rigorous assert in reorderbuffer.c
List pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> Thanks for checking!  I also run it on all branches, everything passes.
> Pushed now.

I'm sorry to bother you with this again, but due to new test our
internal buildfarm revealed that ajacent assert on cmin is also lie. You
see, we can't assume cmin is stable because the same key (relnode, tid)
might refer to completely different tuples if tuple was inserted by
aborted subxact, immeditaly reclaimed and then space occupied by another
one. Fix is attached.

Technically this might mean a user-facing bug, because we only pick the
first cmin which means we might get visibility wrong, allowing to see
some version too early (i.e real cmin of tuple is y, but decoding thinks
it is x, and x < y). However, I couldn't quickly make up an example
where this would actually lead to bad consequences. I tried to create
such extra visible row in pg_attribute, but that's ok because loop in
RelationBuildTupleDesc spins exactly natts times and ignores what is
left unscanned. It is also ok with pg_class, because apparently
ScanPgRelation also fishes out the (right) first tuple and doesn't check
for duplicates appearing later in the scan. Maybe I just haven't tried
hard enough though.

Attached 'aborted_subxact_test.patch' is an illustration of such wrong
cmin visibility on pg_attribute. It triggers assertion failure, but
otherwise ok (no user-facing issues), as I said earlier, so I am
disinclined to include it in the fix.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2b486b5e9f..505c7ff134 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1318,29 +1318,36 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
                         (void *) &key,
                         HASH_ENTER | HASH_FIND,
                         &found);
-        if (!found)
-        {
-            ent->cmin = change->data.tuplecid.cmin;
-            ent->cmax = change->data.tuplecid.cmax;
-            ent->combocid = change->data.tuplecid.combocid;
-        }
-        else
+        if (found)
         {
             /*
-             * Maybe we already saw this tuple before in this transaction,
-             * but if so it must have the same cmin.
+             * Tuple with such key (relnode, tid) might be inserted in aborted
+             * subxact, space reclaimed and then filled with another tuple
+             * from our xact or another; then we might update it again. It
+             * means cmin can become valid/invalid at any moment, but valid
+             * values are always monotonic.
              */
-            Assert(ent->cmin == change->data.tuplecid.cmin);
+            Assert((ent->cmin == InvalidCommandId) ||
+                   (change->data.tuplecid.cmin == InvalidCommandId) ||
+                   (change->data.tuplecid.cmin >= ent->cmin));
 
             /*
-             * cmax may be initially invalid, but once set it can only grow,
-             * and never become invalid again.
+             * Practically the same for cmax; to see invalid cmax after
+             * valid, we need to insert new tuple in place of one reclaimed
+             * from our aborted subxact.
              */
             Assert((ent->cmax == InvalidCommandId) ||
-                   ((change->data.tuplecid.cmax != InvalidCommandId) &&
-                    (change->data.tuplecid.cmax > ent->cmax)));
-            ent->cmax = change->data.tuplecid.cmax;
+                   (change->data.tuplecid.cmax == InvalidCommandId) ||
+                   (change->data.tuplecid.cmax > ent->cmax));
         }
+
+        /*
+         * The 'right' (potentially belonging to committed xact) cmin and cmax
+         * are always the latest.
+         */
+        ent->cmin = change->data.tuplecid.cmin;
+        ent->cmax = change->data.tuplecid.cmax;
+        ent->combocid = change->data.tuplecid.combocid;
     }
 }

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 4afb1d963e..32c2650d1a 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -3,11 +3,11 @@
 MODULES = test_decoding
 PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
 
-REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
+# REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
     decoding_into_rel binary prepared replorigin time messages \
     spill slot truncate
 ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-    oldest_xmin snapshot_transfer
+    oldest_xmin snapshot_transfer aborted_subxact
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
diff --git a/contrib/test_decoding/specs/aborted_subxact.spec b/contrib/test_decoding/specs/aborted_subxact.spec
new file mode 100644
index 0000000000..cb066fd5d2
--- /dev/null
+++ b/contrib/test_decoding/specs/aborted_subxact.spec
@@ -0,0 +1,35 @@
+# Test DDL in aborted subxact
+
+setup
+{
+    SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); -- must be first write
inxact
 
+    CREATE TABLE harvest(apples int);
+}
+
+teardown
+{
+    DROP TABLE IF EXISTS harvest;
+    SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot');
+}
+
+session "s0"
+setup { SET synchronous_commit=on; }
+step "s0_begin" { BEGIN; select 'harvest'::regclass::int; }
+step "s0_alter_0" { SAVEPOINT a; ALTER TABLE harvest ADD COLUMN pears int; ROLLBACK TO SAVEPOINT a; }
+step "s0_alter_1" { ALTER TABLE harvest ADD COLUMN peaches text; }
+step "s0_insert" { INSERT INTO harvest values (42, 'red'); }
+step "s0_alter_2" { ALTER TABLE harvest ADD COLUMN pears int; }
+step "s0_commit" { COMMIT; }
+step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids',
'0','skip-empty-xacts', '1'); }
 
+
+session "s1"
+setup { SET synchronous_commit=on; }
+step "s1_vacuum" { VACUUM pg_attribute; }
+step "s1_vacuum_full" { VACUUM FULL; }
+
+# alter_0 adds row in pg_attribute for which we remember cmin = 0; it is
+# reclaimed by vacuum
+# alter_1 ensures we decode insert with current cid = 1
+# alter_2 inserts into another row in place where row of alter_0 was, its cmin
+# is 3, however we already remembered it as 0
+permutation "s1_vacuum" "s0_begin" "s0_alter_0" "s0_alter_1" "s0_insert" "s1_vacuum" "s0_alter_2" "s0_commit"
"s0_get_changes"

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: 2019-03 CF Summary / Review - Tranche #2
Next
From: Pavel Stehule
Date:
Subject: Re: 2019-03 CF Summary / Review - Tranche #2