Re: Too rigorous assert in reorderbuffer.c - Mailing list pgsql-hackers
From | Arseny Sher |
---|---|
Subject | Re: Too rigorous assert in reorderbuffer.c |
Date | |
Msg-id | 87r2c8xocq.fsf@ars-thinkpad Whole thread Raw |
In response to | Re: Too rigorous assert in reorderbuffer.c (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: Too rigorous assert in reorderbuffer.c
|
List | pgsql-hackers |
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Thanks for checking! I also run it on all branches, everything passes. > Pushed now. I'm sorry to bother you with this again, but due to new test our internal buildfarm revealed that ajacent assert on cmin is also lie. You see, we can't assume cmin is stable because the same key (relnode, tid) might refer to completely different tuples if tuple was inserted by aborted subxact, immeditaly reclaimed and then space occupied by another one. Fix is attached. Technically this might mean a user-facing bug, because we only pick the first cmin which means we might get visibility wrong, allowing to see some version too early (i.e real cmin of tuple is y, but decoding thinks it is x, and x < y). However, I couldn't quickly make up an example where this would actually lead to bad consequences. I tried to create such extra visible row in pg_attribute, but that's ok because loop in RelationBuildTupleDesc spins exactly natts times and ignores what is left unscanned. It is also ok with pg_class, because apparently ScanPgRelation also fishes out the (right) first tuple and doesn't check for duplicates appearing later in the scan. Maybe I just haven't tried hard enough though. Attached 'aborted_subxact_test.patch' is an illustration of such wrong cmin visibility on pg_attribute. It triggers assertion failure, but otherwise ok (no user-facing issues), as I said earlier, so I am disinclined to include it in the fix. -- Arseny Sher Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 2b486b5e9f..505c7ff134 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -1318,29 +1318,36 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn) (void *) &key, HASH_ENTER | HASH_FIND, &found); - if (!found) - { - ent->cmin = change->data.tuplecid.cmin; - ent->cmax = change->data.tuplecid.cmax; - ent->combocid = change->data.tuplecid.combocid; - } - else + if (found) { /* - * Maybe we already saw this tuple before in this transaction, - * but if so it must have the same cmin. + * Tuple with such key (relnode, tid) might be inserted in aborted + * subxact, space reclaimed and then filled with another tuple + * from our xact or another; then we might update it again. It + * means cmin can become valid/invalid at any moment, but valid + * values are always monotonic. */ - Assert(ent->cmin == change->data.tuplecid.cmin); + Assert((ent->cmin == InvalidCommandId) || + (change->data.tuplecid.cmin == InvalidCommandId) || + (change->data.tuplecid.cmin >= ent->cmin)); /* - * cmax may be initially invalid, but once set it can only grow, - * and never become invalid again. + * Practically the same for cmax; to see invalid cmax after + * valid, we need to insert new tuple in place of one reclaimed + * from our aborted subxact. */ Assert((ent->cmax == InvalidCommandId) || - ((change->data.tuplecid.cmax != InvalidCommandId) && - (change->data.tuplecid.cmax > ent->cmax))); - ent->cmax = change->data.tuplecid.cmax; + (change->data.tuplecid.cmax == InvalidCommandId) || + (change->data.tuplecid.cmax > ent->cmax)); } + + /* + * The 'right' (potentially belonging to committed xact) cmin and cmax + * are always the latest. + */ + ent->cmin = change->data.tuplecid.cmin; + ent->cmax = change->data.tuplecid.cmax; + ent->combocid = change->data.tuplecid.combocid; } } diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile index 4afb1d963e..32c2650d1a 100644 --- a/contrib/test_decoding/Makefile +++ b/contrib/test_decoding/Makefile @@ -3,11 +3,11 @@ MODULES = test_decoding PGFILEDESC = "test_decoding - example of a logical decoding output plugin" -REGRESS = ddl xact rewrite toast permissions decoding_in_xact \ +# REGRESS = ddl xact rewrite toast permissions decoding_in_xact \ decoding_into_rel binary prepared replorigin time messages \ spill slot truncate ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \ - oldest_xmin snapshot_transfer + oldest_xmin snapshot_transfer aborted_subxact REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf diff --git a/contrib/test_decoding/specs/aborted_subxact.spec b/contrib/test_decoding/specs/aborted_subxact.spec new file mode 100644 index 0000000000..cb066fd5d2 --- /dev/null +++ b/contrib/test_decoding/specs/aborted_subxact.spec @@ -0,0 +1,35 @@ +# Test DDL in aborted subxact + +setup +{ + SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); -- must be first write inxact + CREATE TABLE harvest(apples int); +} + +teardown +{ + DROP TABLE IF EXISTS harvest; + SELECT 'stop' FROM pg_drop_replication_slot('isolation_slot'); +} + +session "s0" +setup { SET synchronous_commit=on; } +step "s0_begin" { BEGIN; select 'harvest'::regclass::int; } +step "s0_alter_0" { SAVEPOINT a; ALTER TABLE harvest ADD COLUMN pears int; ROLLBACK TO SAVEPOINT a; } +step "s0_alter_1" { ALTER TABLE harvest ADD COLUMN peaches text; } +step "s0_insert" { INSERT INTO harvest values (42, 'red'); } +step "s0_alter_2" { ALTER TABLE harvest ADD COLUMN pears int; } +step "s0_commit" { COMMIT; } +step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0','skip-empty-xacts', '1'); } + +session "s1" +setup { SET synchronous_commit=on; } +step "s1_vacuum" { VACUUM pg_attribute; } +step "s1_vacuum_full" { VACUUM FULL; } + +# alter_0 adds row in pg_attribute for which we remember cmin = 0; it is +# reclaimed by vacuum +# alter_1 ensures we decode insert with current cid = 1 +# alter_2 inserts into another row in place where row of alter_0 was, its cmin +# is 3, however we already remembered it as 0 +permutation "s1_vacuum" "s0_begin" "s0_alter_0" "s0_alter_1" "s0_insert" "s1_vacuum" "s0_alter_2" "s0_commit" "s0_get_changes"
pgsql-hackers by date: