Thread: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Alvaro Herrera
So here's my attempt at an explanation for what is going on. At one point, we have this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
       to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));

 lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
  1 |        1 |      2 |      0 | (0,1)  | 902      | 3
  2 |        0 |        |        |        |          |
  3 |        1 |      2 |  19928 | (0,4)  | 3142     | c003
  4 |        1 |  14662 |  19929 | (0,5)  | 3142     | c003
  5 |        1 |  14663 |  19931 | (0,6)  | 3142     | c003
  6 |        1 |  14664 |  19933 | (0,7)  | 3142     | c003
  7 |        1 |  14665 |      0 | (0,7)  | 2902     | 8003
(7 rows)

which shows a HOT-update chain, where the t_xmax values are multixacts. Then a vacuum freeze comes along, and because the multixacts are below the freeze horizon for multixacts, we get this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
       to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));

 lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
  1 |        1 |      2 |      0 | (0,1)  | 902      | 3
  2 |        0 |        |        |        |          |
  3 |        1 |      2 |  14662 | (0,4)  | 2502     | c003
  4 |        1 |      2 |  14663 | (0,5)  | 2502     | c003
  5 |        1 |      2 |  14664 | (0,6)  | 2502     | c003
  6 |        1 |      2 |  14665 | (0,7)  | 2502     | c003
  7 |        1 |      2 |      0 | (0,7)  | 2902     | 8003
(7 rows)

where the xmin values have all been frozen, and the xmax values are now regular Xids. I think the HOT code that walks the chain fails to detect these as chains, because the xmin values no longer match the prior xmax values.

I modified the multixact freeze code, so that whenever the update Xid is below the cutoff Xid, it's set to FrozenTransactionId, since keeping the other value is invalid anyway (even though we have set the HEAP_XMAX_COMMITTED flag). But that still doesn't fix the problem; as far as I can see, vacuum removes the root of the chain (not yet sure why), and then things are just as corrupted as before.
-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> which shows a HOT-update chain, where the t_xmax are multixacts. Then a
> vacuum freeze comes, and because the multixacts are below the freeze
> horizon for multixacts, we get this:
>
> select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
>        to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
>
>  lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
> ----+----------+--------+--------+--------+----------+-----------
>   1 |        1 |      2 |      0 | (0,1)  | 902      | 3
>   2 |        0 |        |        |        |          |
>   3 |        1 |      2 |  14662 | (0,4)  | 2502     | c003
>   4 |        1 |      2 |  14663 | (0,5)  | 2502     | c003
>   5 |        1 |      2 |  14664 | (0,6)  | 2502     | c003
>   6 |        1 |      2 |  14665 | (0,7)  | 2502     | c003
>   7 |        1 |      2 |      0 | (0,7)  | 2902     | 8003
> (7 rows)
>
> where the xmin values have all been frozen, and the xmax values are now
> regular Xids.

I thought that we no longer store FrozenTransactionId (xid 2) as our "raw" xmin while freezing, and yet that's what we see here. It looks like pageinspect is looking at raw xmin (it's calling HeapTupleHeaderGetRawXmin(), not HeapTupleHeaderGetXmin()). What's the deal with that? Shouldn't the original xmin be preserved for forensic analysis, following commit 37484ad2?

I guess what you mean is that this is what you see having modified the code to actually store FrozenTransactionId as xmin once more, in an effort to fix this?

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> But that still doesn't fix the problem;
> as far as I can see, vacuum removes the root of the chain, not yet sure
> why, and then things are just as corrupted as before.

Are you sure it's not opportunistic pruning? Another thing that I've noticed with this problem is that the relevant IndexTuple will pretty quickly vanish, presumably due to LP_DEAD setting (but maybe not actually due to LP_DEAD setting).

(Studies the problem some more...)

I now think that it actually is a VACUUM problem, specifically a problem with VACUUM pruning. You see the HOT xmin-to-xmax check pattern that you mentioned within heap_prune_chain(), which looks like where the incorrect tuple prune (or possibly, at times, redirect?) takes place. (I refer to the prune/kill that you mentioned today, that frustrated your first attempt at a fix -- "I modified the multixact freeze code...".)

The attached patch "fixes" the problem -- I cannot get amcheck to complain about corruption with this applied. And, "make check-world" passes. Hopefully it goes without saying that this isn't actually my proposed fix. It tells us something that this at least *masks* the problem, though; it's a start.

FYI, the repro case page contents look like this with the patch applied:

postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
           to_hex(t_infomask) as infomask,
           to_hex(t_infomask2) as infomask2
           from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
  1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
  2 |        2 |         |        |        |          |
  3 |        0 |         |        |        |          |
  4 |        0 |         |        |        |          |
  5 |        0 |         |        |        |          |
  6 |        0 |         |        |        |          |
  7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
(7 rows)

--
Peter Geoghegan
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Michael Paquier
On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> I now think that it actually is a VACUUM problem, specifically a
> problem with VACUUM pruning. You see the HOT xmin-to-xmax check
> pattern that you mentioned within heap_prune_chain(), which looks like
> where the incorrect tuple prune (or possibly, at times, redirect?)
> takes place. (I refer to the prune/kill that you mentioned today, that
> frustrated your first attempt at a fix -- "I modified the multixact
> freeze code...".)

My lookup of the problem converges to the same conclusion. Something is wrong with the vacuum's pruning. I have spent some time trying to design a patch; all the solutions I tried proved to make the problem harder to show up, but it still showed up, sometimes after dozens of attempts.

> The attached patch "fixes" the problem -- I cannot get amcheck to
> complain about corruption with this applied. And, "make check-world"
> passes. Hopefully it goes without saying that this isn't actually my
> proposed fix. It tells us something that this at least *masks* the
> problem, though; it's a start.

Yep.

> FYI, the repro case page contents look like this with the patch applied:
>
> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
>            to_hex(t_infomask) as infomask,
>            to_hex(t_infomask2) as infomask2
>            from heap_page_items(get_raw_page('t', 0));
>  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
> ----+----------+---------+--------+--------+----------+-----------
>   1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
>   2 |        2 |         |        |        |          |
>   3 |        0 |         |        |        |          |
>   4 |        0 |         |        |        |          |
>   5 |        0 |         |        |        |          |
>   6 |        0 |         |        |        |          |
>   7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
> (7 rows)

Is lp_off for tid (0,2) pointing to (0,7)? A HOT chain preserved is what would look correct to me.

    - * Check the tuple XMIN against prior XMAX, if any
    - */
    - if (TransactionIdIsValid(priorXmax) &&
    -     !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
    -     break;

If you remove this check, you could also remove priorXmax completely.

Actually, I may be missing something, but why is priorXmax updated even for dead tuples? For example, just doing this also takes care of the problem:

@@ -558,7 +558,8 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
             Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
                    BufferGetBlockNumber(buffer));
             offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
-            priorXmax = HeapTupleHeaderGetUpdateXid(htup);
+            if (!tupdead)
+                priorXmax = HeapTupleHeaderGetUpdateXid(htup);
         }

--
Michael
I've just started looking at this again after a few weeks break.

There is a tangled web of issues here. With the community fix we get a corrupted page (invalid redirect ptr from indexed item). The cause of that is in pruneheap.c:

    /*
     * Check the tuple XMIN against prior XMAX, if any
     */
    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
        break;

    chainitems[nchain++] = offnum;

The priorXmax is a multixact key share lock, thus not equal to xmin. This terminates the scan. Unlike the scan termination with "if (!recent_dead) break;" below the switch (...SatisfiesVacuum...), this "break" does not put the offnum into the chain even though it is in the chain. If the first not-deleted item isn't put in the chain then we'll not call heap_prune_record_redirect().

I do not know what the above "if" test is protecting. Normally the xmin is equal to the priorXmax. Other than protecting against corruption, a key share lock can cause this. So I tried a fix which does the "if" check after adding the item to chainitems. This will break whatever real situation this IF was protecting. Tom Lane put this in.

OK: with that hack of a fix the redirect now works correctly. However, we still get the reindex problem with not finding the parent. That problem is caused by this, in pruneheap.c:heap_get_root_tuples():

    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
        break;

In this case, instead of these not being equal because of a multixact key share lock, it is because XMIN is frozen and FrozenTransactionId doesn't equal the priorXmax. Thus, we don't build the entire chain from root to most current row version, and this causes the reindex failure.

If we disable this "if" as a hack then we no longer get a problem on the reindex. However, YiWen reported that at the end of an installcheck, index checking reported corruption in the system catalogues. So we are still looking.
We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against from the two exceptions I've found?

FYI, someone should look at the same "if" test in heapam.c:heap_lock_updated_tuple_rec(). Also, I hope there are no strange issues with concurrent index builds.

Finally, the idea behind the original fix was simply NOT to do an unsupported freeze on a dead tuple. It had two drawbacks:

1) CLOG truncation. This could have been handled by keeping track of the oldest unpruned item found and using that to update the table's/DB's freeze xid.

2) Not making freeze progress. The only reason the prune would fail should be because of an open TXN. Unless that TXN was so old that its XID was as old as the ?min freeze threshold? then we would make progress. If we were doing TXNs that old then we'd be having problems anyway.
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 5:15 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> FYI, the repro case page contents look like this with the patch applied:
>>
>> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
>>            to_hex(t_infomask) as infomask,
>>            to_hex(t_infomask2) as infomask2
>>            from heap_page_items(get_raw_page('t', 0));
>>  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
>> ----+----------+---------+--------+--------+----------+-----------
>>   1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
>>   2 |        2 |         |        |        |          |
>>   3 |        0 |         |        |        |          |
>>   4 |        0 |         |        |        |          |
>>   5 |        0 |         |        |        |          |
>>   6 |        0 |         |        |        |          |
>>   7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
>> (7 rows)
>
> Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
> what would look correct to me.

Yes, it is:

postgres=# select * from bt_page_items('foo', 1);
 itemoffset | ctid  | itemlen | nulls | vars |          data
------------+-------+---------+-------+------+-------------------------
          1 | (0,1) |      16 | f     | f    | 01 00 00 00 00 00 00 00
          2 | (0,2) |      16 | f     | f    | 03 00 00 00 00 00 00 00
(2 rows)

I can tell from looking at my hex editor that the 4 bytes of ItemId that we see for position '(0,2)' in the ItemId array are "07 00 01 00", meaning that '(0,2)' is an LP_REDIRECT item, repointing us to '(0,7)'. Everything here looks sane to me, at least at first blush.

> - * Check the tuple XMIN against prior XMAX, if any
> - */
> - if (TransactionIdIsValid(priorXmax) &&
> -     !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
> -     break;
> If you remove this check, you could also remove completely priorXmax.
>
> Actually, I may be missing something, but why is priorXmax updated
> even for dead tuples? For example just doing that is also taking care
> of the problem:

I'll study what you suggest here some more tomorrow.

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan <hexpert@amazon.com> wrote:
> I've just started looking at this again after a few weeks break.
> if (TransactionIdIsValid(priorXmax) &&
>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
>     break;
> We need to understand why these TXID equal checks exist. Can we
> differentiate the cases they are protecting against from the two
> exceptions I've found?

I haven't read your remarks here in full, since I'm about to stop working for the day, but I will point out that src/backend/access/heap/README.HOT says a fair amount about this, under "Abort Cases".

--
Peter Geoghegan
One minor side note... Is it weird for xmin/xmax to go backwards in a HOT row chain?

 lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
----+--------+--------+------------+-------------+--------+--------
  1 | (0,1)  |   8152 |       2818 |           3 |  36957 |      0
  2 |        |      5 |            |             |        |
  3 |        |      0 |            |             |        |
  4 |        |      0 |            |             |        |
  5 | (0,6)  |   8112 |       9986 |       49155 |  36962 |  36963
  6 | (0,7)  |   8072 |       9986 |       49155 |  36963 |  36961
  7 | (0,7)  |   8032 |      11010 |       32771 |  36961 |      0
(7 rows)
My interpretation of README.HOT is that the check is just to ensure the chain is continuous; in which case the condition should be:

> if (TransactionIdIsValid(priorXmax) &&
>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>     break;

So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin from when the transaction happened.

The interesting consequence of changing that is that the prune seems to get the entire chain altogether with Dan's repro... I've run it a couple of times and have consistently gotten the following page:

 lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
----+--------+--------+----------+------------+-------------
  1 | (0,1)  |   8152 |        1 |       2818 |           3
  2 |        |      7 |        2 |            |
  3 |        |      0 |        0 |            |
  4 |        |      0 |        0 |            |
  5 |        |      0 |        0 |            |
  6 |        |      0 |        0 |            |
  7 | (0,7)  |   8112 |        1 |      11010 |       32771
(7 rows)

I've made this change to the conditions in both heap_prune_chain and heap_get_root_tuples, and this seems to cause things to pass my smoke tests. I'll look into this deeper tomorrow.

Yi Wen
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Alvaro Herrera
Peter Geoghegan wrote:
> I thought that we no longer store FrozenTransactionId (xid 2) as our
> "raw" xmin while freezing, and yet that's what we see here.

I'm doing this in 9.3. I can't tell if the new tuple freezing stuff broke things more thoroughly, but it is already broken in earlier releases.

--
Álvaro Herrera
Wood, Dan wrote:
> There is a tangled web of issues here. With the community fix we get a
> corrupted page (invalid redirect ptr from indexed item). The cause of
> that is in pruneheap.c:
>
> /*
>  * Check the tuple XMIN against prior XMAX, if any
>  */
> if (TransactionIdIsValid(priorXmax) &&
>     !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
>     break;
>
> chainitems[nchain++] = offnum;
>
> The priorXmax is a multixact key share lock,

Uhh, what? That certainly shouldn't happen -- the priorXmax comes from

    priorXmax = HeapTupleHeaderGetUpdateXid(htup);

so only the XID of the update itself should be reported, not a multixact and certainly not just a tuple lock XID.

--
Álvaro Herrera
Wood, Dan wrote:
> One minor side note... Is it weird for xmin/xmax to go backwards in a
> HOT row chain?
>
>  lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
> ----+--------+--------+------------+-------------+--------+--------
>   1 | (0,1)  |   8152 |       2818 |           3 |  36957 |      0
>   2 |        |      5 |            |             |        |
>   3 |        |      0 |            |             |        |
>   4 |        |      0 |            |             |        |
>   5 | (0,6)  |   8112 |       9986 |       49155 |  36962 |  36963
>   6 | (0,7)  |   8072 |       9986 |       49155 |  36963 |  36961
>   7 | (0,7)  |   8032 |      11010 |       32771 |  36961 |      0
> (7 rows)

No, it just means transaction A got its XID before transaction B, but B executed its update first and A updated the tuple second.

--
Álvaro Herrera
Alvaro Herrera wrote:
> Peter Geoghegan wrote:
>
>> I thought that we no longer store FrozenTransactionId (xid 2) as our
>> "raw" xmin while freezing, and yet that's what we see here.
>
> I'm doing this in 9.3. I can't tell if the new tuple freezing stuff
> broke things more thoroughly, but it is already broken in earlier
> releases.

In fact, I think in 9.3 we should include this patch, to set the Xmin to FrozenXid. 9.4 and onwards have commit 37484ad2a "Change the way we mark tuples as frozen", which uses a combination of infomask bits, but in 9.3 I think leaving the unfrozen value in the xmax field is a bad idea even if we set the HEAP_XMAX_COMMITTED bit.

--
Álvaro Herrera
Attachment
Wong, Yi Wen wrote:
> My interpretation of README.HOT is that the check is just to ensure the
> chain is continuous; in which case the condition should be:
>
>> if (TransactionIdIsValid(priorXmax) &&
>>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>>     break;
>
> So the difference is GetRawXmin vs GetXmin, because otherwise we get
> the FreezeId instead of the Xmin from when the transaction happened

I independently arrived at the same conclusion. Since I was trying with 9.3, the patch differs -- in the old version we must explicitly test for the FrozenTransactionId value, instead of using GetRawXmin. Attached is the patch I'm using, and my own oneliner test (pretty much the same I posted earlier) seems to survive dozens of iterations without showing any problem in REINDEX.

This patch is incomplete, since I think there are other places that need to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?). Of course, for 9.4 and onwards we need to patch like you described.

This bit in EvalPlanQualFetch caught my attention ... why is it saying xmin never changes? It does change with freezing.

    /*
     * If xmin isn't what we're expecting, the slot must have been
     * recycled and reused for an unrelated tuple.  This implies that
     * the latest version of the row was deleted, so we need do
     * nothing.  (Should be safe to examine xmin without getting
     * buffer's content lock, since xmin never changes in an existing
     * tuple.)
     */
    if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
                             priorXmax))

--
Álvaro Herrera
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 8:43 PM, Wong, Yi Wen <yiwong@amazon.com> wrote:
> My interpretation of README.HOT is that the check is just to ensure the
> chain is continuous; in which case the condition should be:
>
>> if (TransactionIdIsValid(priorXmax) &&
>>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>>     break;
>
> So the difference is GetRawXmin vs GetXmin, because otherwise we get
> the FreezeId instead of the Xmin from when the transaction happened

I was thinking along similar lines.

> The interesting consequence of changing that is that the prune seems to
> get the entire chain altogether with Dan's repro... I've run it a
> couple of times and have consistently gotten the following page:
>
>  lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
> ----+--------+--------+----------+------------+-------------
>   1 | (0,1)  |   8152 |        1 |       2818 |           3
>   2 |        |      7 |        2 |            |
>   3 |        |      0 |        0 |            |
>   4 |        |      0 |        0 |            |
>   5 |        |      0 |        0 |            |
>   6 |        |      0 |        0 |            |
>   7 | (0,7)  |   8112 |        1 |      11010 |       32771
> (7 rows)

That's also what I see. This is a good thing, I think; that's how we ought to prune.

--
Peter Geoghegan
Sorry, I should have looked at the macro. I sent this too soon. The early "break;" here is likely for the xmin-frozen reason, as I found in the other loop.
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Wed, Oct 4, 2017 at 11:00 AM, Wood, Dan <hexpert@amazon.com> wrote:
> The early "break;" here is likely for the xmin-frozen reason, as I
> found in the other loop.

It looks that way.

Since we're already very defensive when it comes to this xmin/xmax matching business, and we're defensive while following an update chain more generally, I wonder if we should be checking HeapTupleHeaderIsSpeculative() on versions >= 9.5 (versions with ON CONFLICT DO UPDATE, where the t_ctid block number might actually be a speculative insertion token). Or, at least acknowledging that case in comments. I remember expressing concern that something like this was possible at the time that that went in.

We certainly don't want to have heap_abort_speculative() "super delete" the wrong tuple in the event of item pointer recycling. There are at least defensive sanity checks that would turn that into an error within heap_abort_speculative(), so it wouldn't be a disaster if it was attempted. I don't think that it's possible in practice, and maybe it's sufficient that routines like heap_get_latest_tid() check for a sane item offset, which will discriminate against SpecTokenOffsetNumber, which must be well out of range for ItemIds on the page. Could be worth a comment.

(I guess that heap_prune_chain() wouldn't need to be changed if we decide to add such comments, because the speculative tuple ItemId is going to be skipped over due to being ItemIdIsUsed() before we even get there.)

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is that the check is just to ensure the
>> chain is continuous; in which case the condition should be:
>>
>>> if (TransactionIdIsValid(priorXmax) &&
>>>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>>>     break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get
>> the FreezeId instead of the Xmin from when the transaction happened

As you know, on version 9.4+, as of commit 37484ad2a, we decided that we are "largely ignoring the value to which it [xmin] is set". The expectation became that raw xmin is available after freezing, but mostly for forensic purposes. I think Alvaro should now memorialize the idea that its value is actually critical in some place (htup_details.h?).

> I independently arrived at the same conclusion. Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.

Obviously you're going to have to be prepared for a raw xmin of FrozenTransactionId, even on 9.4+, due to pg_upgrade.

I can see why it would be safe (or at least no more dangerous) to rely on HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on installations that initdb'd on a version after commit 37484ad2a (version 9.4+). However, I'm not sure why what you propose here would be safe when even the raw xmin happens to be FrozenTransactionId. Are you sure that that's truly race-free?

If it's really true that we only need to check for FrozenTransactionId on 9.3, why not just do that on all versions, and never bother with HeapTupleHeaderGetRawXmin()? ("Sheer paranoia" is a valid answer; I just want us to be clear on the reasoning.) Obviously any race would have a ridiculously tiny window, but it's not obvious why this protocol would be completely race-free (in the event of a FrozenTransactionId raw xmin).

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> >     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> >     break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
>
> I independently arrived at the same conclusion. Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own one-liner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes? It does change with freezing.
>
> /*
>  * If xmin isn't what we're expecting, the slot must have been
>  * recycled and reused for an unrelated tuple. This implies that
>  * the latest version of the row was deleted, so we need do
>  * nothing. (Should be safe to examine xmin without getting
>  * buffer's content lock, since xmin never changes in an existing
>  * tuple.)
>  */
> if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
>                          priorXmax))

Agreed. That's not good.

-- Michael
Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen's fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Once nearly all 250 clients have done their updates and everybody is waiting to vacuum (which one by one will take a while), I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.

On 10/4/17, 6:31 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:

    On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    > Wong, Yi Wen wrote:
    >> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
    >>
    >> > if (TransactionIdIsValid(priorXmax) &&
    >> >     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
    >> >     break;
    >>
    >> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
    >
    > I independently arrived at the same conclusion. Since I was trying with
    > 9.3, the patch differs -- in the old version we must explicitly test
    > for the FrozenTransactionId value, instead of using GetRawXmin.
    > Attached is the patch I'm using, and my own one-liner test (pretty much
    > the same I posted earlier) seems to survive dozens of iterations without
    > showing any problem in REINDEX.

    Confirmed, the problem goes away with this patch on 9.3.

    > This patch is incomplete, since I think there are other places that need
    > to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
    > Of course, for 9.4 and onwards we need to patch like you described.

    I have just done a lookup of the source code, and here is an exhaustive list of things in need of surgery:
    - heap_hot_search_buffer
    - heap_get_latest_tid
    - heap_lock_updated_tuple_rec
    - heap_prune_chain
    - heap_get_root_tuples
    - rewrite_heap_tuple
    - EvalPlanQualFetch (twice)

    > This bit in EvalPlanQualFetch caught my attention... why is it saying
    > xmin never changes? It does change with freezing.
    >
    > /*
    >  * If xmin isn't what we're expecting, the slot must have been
    >  * recycled and reused for an unrelated tuple. This implies that
    >  * the latest version of the row was deleted, so we need do
    >  * nothing. (Should be safe to examine xmin without getting
    >  * buffer's content lock, since xmin never changes in an existing
    >  * tuple.)
    >  */
    > if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
    >                          priorXmax))

    Agreed. That's not good.

    -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan <hexpert@amazon.com> wrote:
> Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen's fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Interesting. Which version did you test? Only 9.6?

> Once nearly all 250 clients have done their updates and everybody is waiting to vacuum (which one by one will take a while), I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.

Yeah, that's more or less what I have been doing. My tests involve using your initial script with way more sessions triggering lock.sql, minus the kill-9 portion (good idea actually). I can of course see the sessions queuing for VACUUM, still I cannot see duplicated rows, even if I headshot Postgres in the middle of the VACUUM waiting queue. Note that I have just tested Alvaro's patch on 9.3.

-- Michael
Wood, Dan wrote:
> Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen's fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Good idea. You can achieve a similar effect by adding a filler column, and reducing fillfactor.

> Once nearly all 250 clients have done their updates and everybody is
> waiting to vacuum which one by one will take a while I usually just
> “pkill -9 psql”. After that I have many duplicate “id=3” rows.

Odd ...

> On top of that I think we might have a lock leak. After the pkill I
> tried to rerun setup.sql to drop/create the table and it hangs. I see
> an autovacuum process starting and exiting every couple of seconds.
> Only by killing and restarting PG can I drop the table.

Please do try to figure this one out. It'd be a separate problem, worthy of its own thread.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Geoghegan wrote:
> As you know, on version 9.4+, as of commit 37484ad2a, we decided that
> we are "largely ignoring the value to which it [xmin] is set". The
> expectation became that raw xmin is available after freezing, but
> mostly for forensic purposes. I think Alvaro should now memorialize
> the idea that its value is actually critical in some place
> (htup_details.h?).

I'd love to be certain that we preserve the original value in all cases.

> On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > I independently arrived at the same conclusion. Since I was trying with
> > 9.3, the patch differs -- in the old version we must explicitly test
> > for the FrozenTransactionId value, instead of using GetRawXmin.
>
> Obviously you're going to have to be prepared for a raw xmin of
> FrozenTransactionId, even on 9.4+, due to pg_upgrade.

Hmm. It would be good to be able to get rid of that case, actually, because I now think it's less safe (see below). At any rate, I was thinking of a new routine to encapsulate the logic:

/*
 * Check the tuple XMIN against prior XMAX, if any
 */
if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, HeapTupleHeaderGetXmin(htup)))
    break;

where ..XmaxMatchesXmin would know about checking for possibly frozen tuples.

> I can see why it
> would be safe (or at least no more dangerous) to rely on
> HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
> installations that initdb'd on a version after commit 37484ad2a
> (version 9.4+). However, I'm not sure why what you propose here would
> be safe when even raw xmin happens to be FrozenTransactionId. Are you
> sure that that's truly race-free? If it's really true that we only
> need to check for FrozenTransactionId on 9.3, why not just do that on
> all versions, and never bother with HeapTupleHeaderGetRawXmin()?
> ("Sheer paranoia" is a valid answer; I just want us to be clear on the
> reasoning.)

I think the RawXmin is the better mechanism.
I'm not absolutely certain that the window is completely closed in 9.3; as I understand things, it is possible for transaction A to prune an aborted heap-only tuple, then transaction B to insert a frozen tuple in the same location, then transaction C follows a link to the HOT tuple that was pruned. I think we'd end up considering that the new frozen tuple is part of the chain, which is wrong ... In 9.4 we can compare the real xmin. I hope I am proved wrong about this, because if not, I think we're looking at an essentially unsolvable problem in 9.3.

> Obviously any race would have a ridiculously tiny window, but it's not
> obvious why this protocol would be completely race-free (in the event
> of a FrozenTransactionId raw xmin).

As far as I understand, this problem only emerges if one part of a HOT chain reaches the min freeze age while another part of the same chain is still visible to some running transaction. It is particularly noticeable in our current test case because we use a min freeze age of 0, with many concurrent sessions modifying the same page. What this says to me is that VACUUM FREEZE is mildly dangerous when there's lots of highly concurrent HOT UPDATE activity.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I think this is the patch for 9.3. I ran the test a few hundred times (with some additional changes such as randomly having an update inside a savepoint that's randomly aborted, randomly aborting the transaction, randomly skipping the for key share lock, randomly sleeping at a few points; and also adding filler columns, reducing fillfactor and using 5, 50, 180, 250, 500 sessions after verifying that it causes the tuples to stay in the same page or migrate to later pages). The final REINDEX has never complained again about failing to find the root tuple. I hope it's good now.

The attached patch needs a few small tweaks, such as improving commentary in the new function, maybe turn it into a macro (otherwise I think it could be bad for performance; I'd like a static func but not sure those are readily available in 9.3), change the XID comparison to use the appropriate macro rather than ==, and such.

Regarding changes of xmin/xmax comparison, I also checked manually the spots I thought should be modified and later double-checked against the list that Michael posted. It's a match, except for rewriteheap.c which I cannot make heads or tails about. (I think it's rather unfortunate that it sticks a tuple's Xmax into a field that's called Xmin, but let's put that aside). Maybe there's a problem here, maybe there isn't.

I'm now going to forward-port this to 9.4.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Thu, Oct 5, 2017 at 4:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> At any rate, I was thinking of a new routine to encapsulate the logic,
>
> /*
>  * Check the tuple XMIN against prior XMAX, if any
>  */
> if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, HeapTupleHeaderGetXmin(htup)))
>     break;
>
> where ..XmaxMatchesXmin would know about checking for possibly frozen
> tuples.

Makes sense.

>> I can see why it
>> would be safe (or at least no more dangerous) to rely on
>> HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
>> installations that initdb'd on a version after commit 37484ad2a
>> (version 9.4+). However, I'm not sure why what you propose here would
>> be safe when even raw xmin happens to be FrozenTransactionId. Are you
>> sure that that's truly race-free? If it's really true that we only
>> need to check for FrozenTransactionId on 9.3, why not just do that on
>> all versions, and never bother with HeapTupleHeaderGetRawXmin()?
>> ("Sheer paranoia" is a valid answer; I just want us to be clear on the
>> reasoning.)
>
> I think the RawXmin is the better mechanism. I'm not absolutely certain
> that the window is completely closed in 9.3; as I understand things, it
> is possible for transaction A to prune an aborted heap-only tuple, then
> transaction B to insert a frozen tuple in the same location, then
> transaction C follows a link to the HOT tuple that was pruned. I think we'd
> end up considering that the new frozen tuple is part of the chain, which
> is wrong ... In 9.4 we can compare the real xmin.

Good point: the race doesn't exist on 9.4+ with pg_upgrade from 9.3, when raw xmin happens to be FrozenTransactionId, because there can be no new tuples that have that property. This possible race is strictly a 9.3 issue. (You have to deal with a raw xmin of FrozenTransactionId on 9.4+, but there are no race conditions, because affected tuples are all pre-pg_upgrade tuples -- they were frozen months ago, not microseconds ago.)
> I hope I am proved wrong about this, because if not, I think we're
> looking at an essentially unsolvable problem in 9.3.

Well, as noted within README.HOT, the xmin/xmax matching did not appear in earlier versions of the original HOT patch, because it was thought that holding a pin of the buffer was a sufficient interlock against concurrent line pointer recycling (pruning must have a buffer cleanup lock). Clearly it is more robust to match xmin to xmax, but is there actually any evidence that it was really necessary at all (for single-page HOT chain traversals) when HOT went in in 2007, or that it has since become necessary? heap_page_prune()'s caller has to have a buffer cleanup lock.

I'm certainly not advocating removing the xmin/xmax matching within heap_prune_chain() on 9.3. However, it may be acceptable to rely on holding a buffer cleanup lock within heap_prune_chain() on 9.3, just for the rare/theoretical cases where the FrozenTransactionId raw xmin ambiguity means that xmin/xmax matching alone might not be enough. As you say, it seems unsolvable through looking at state on the page directly on 9.3, so there may be no other way. And, that's still strictly better than what we have today.

> As far as I understand, this problem only emerges if one part of a HOT
> chain reaches the min freeze age while another part of the same chain is
> still visible to some running transaction. It is particularly
> noticeable in our current test case because we use a min freeze age of
> 0, with many concurrent sessions modifying the same page. What this says
> to me is that VACUUM FREEZE is mildly dangerous when there's lots of
> highly concurrent HOT UPDATE activity.

I'm not sure what you mean here. It is dangerous right now, because there is a bug, but we can squash the bug.

-- Peter Geoghegan
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.

I would prefer to focus on either latest 9X or 11dev. Does Alvaro’s patch presume any of the other patch to set COMMITTED in the freeze code?

On 10/4/17, 7:17 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:

    On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan <hexpert@amazon.com> wrote:
    > Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen's fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

    Interesting. Which version did you test? Only 9.6?

    > Once nearly all 250 clients have done their updates and everybody is waiting to vacuum (which one by one will take a while), I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.

    Yeah, that's more or less what I have been doing. My tests involve using your initial script with way more sessions triggering lock.sql, minus the kill-9 portion (good idea actually). I can of course see the sessions queuing for VACUUM, still I cannot see duplicated rows, even if I headshot Postgres in the middle of the VACUUM waiting queue. Note that I have just tested Alvaro's patch on 9.3.

    -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I think this is the patch for 9.3. I ran the test a few hundred times
> (with some additional changes such as randomly having an update inside a
> savepoint that's randomly aborted, randomly aborting the transaction,
> randomly skipping the for key share lock, randomly sleeping at a few
> points; and also adding filler columns, reducing fillfactor and using
> 5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
> to stay in the same page or migrate to later pages). The final REINDEX
> has never complained again about failing to find the root tuple. I hope
> it's good now.

I have looked and played with your patch as well for a couple of hours, and did not notice any failures again. The structure of the pages looked sane as well; I could see also with pageinspect a correct HOT chain using the first set of tests provided on this thread.

> The attached patch needs a few small tweaks, such as improving
> commentary in the new function, maybe turn it into a macro (otherwise I
> think it could be bad for performance; I'd like a static func but not
> sure those are readily available in 9.3), change the XID comparison to
> use the appropriate macro rather than ==, and such.

Yeah, that would be better.

> Regarding changes of xmin/xmax comparison, I also checked manually the
> spots I thought should be modified and later double-checked against the
> list that Michael posted.

Thanks. I can see that your patch is filling the holes.

> It's a match, except for rewriteheap.c which
> I cannot make heads or tails about. (I think it's rather unfortunate
> that it sticks a tuple's Xmax into a field that's called Xmin, but let's
> put that aside). Maybe there's a problem here, maybe there isn't.

rewrite_heap_tuple is used only by CLUSTER, and the oldest new tuples are frozen before being rewritten. So this looks safe to me in the end.

> I'm now going to forward-port this to 9.4.
+   /*
+    * If the xmax of the old tuple is identical to the xmin of the new one,
+    * it's a match.
+    */
+   if (xmax == xmin)
+       return true;

I would use TransactionIdEquals() here, to remember once you switch that to a macro.

+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
[...]
+   /*
+    * We actually don't know if there's a match, but if the previous tuple
+    * was frozen, we cannot really rely on a perfect match.
+    */

-- Michael
Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> +   /*
> +    * If the xmax of the old tuple is identical to the xmin of the new one,
> +    * it's a match.
> +    */
> +   if (xmax == xmin)
> +       return true;
> I would use TransactionIdEquals() here, to remember once you switch
> that to a macro.

I've had second thoughts about the macro thing -- for now I'm keeping it a function, actually.

> +/*
> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
> + * taking into account that the Xmin might have been frozen.
> + */
> [...]
> +   /*
> +    * We actually don't know if there's a match, but if the previous tuple
> +    * was frozen, we cannot really rely on a perfect match.
> +    */

I don't know what you had in mind here, but I tweaked the 9.3 version so that it now looks like this:

/*
 * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
 *
 * Given the new version of a tuple after some update, verify whether the
 * given Xmax (corresponding to the previous version) matches the tuple's
 * Xmin, taking into account that the Xmin might have been frozen after the
 * update.
 */
bool
HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(htup);

    /*
     * If the xmax of the old tuple is identical to the xmin of the new one,
     * it's a match.
     */
    if (TransactionIdEquals(xmax, xmin))
        return true;

    /*
     * When a tuple is frozen, the original Xmin is lost, but we know it's a
     * committed transaction.  So unless the Xmax is InvalidXid, we don't
     * know for certain that there is a match, but there may be one; and we
     * must return true so that a HOT chain that is half-frozen can be walked
     * correctly.
     */
    if (TransactionIdEquals(xmin, FrozenTransactionId) &&
        TransactionIdIsValid(xmax))
        return true;

    return false;
}

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Wood, Dan wrote:
> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
>
> I would prefer to focus on either latest 9X or 11dev.

I tested my patch on 9.4 and 9.5 today and it seems to close the problem (with the patch, I waited 10x as many iterations as it took for the problem to occur ~10 times without the patch), but I can reproduce a problem in 9.6 with my patch installed. There must be something new in 9.6 that is causing the problem to reappear.

> Does Alvaro’s patch presume any of the other patch to set COMMITTED in
> the freeze code?

I don't know what you mean. Here is the 9.6 version of my patch. Note that HEAP_XMIN_FROZEN (which uses the XMIN_COMMITTED bit as I recall) was introduced in 9.4.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Fri, Oct 6, 2017 at 7:57 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Michael Paquier wrote:
>> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> +/*
>> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
>> + * taking into account that the Xmin might have been frozen.
>> + */
>> [...]
>> +   /*
>> +    * We actually don't know if there's a match, but if the previous tuple
>> +    * was frozen, we cannot really rely on a perfect match.
>> +    */
>
> I don't know what you had in mind here,

Impossible to know if I don't actually send the contents :)

> but I tweaked the 9.3 version so that it now looks like this:

I wanted to mention that the comments could be reworked. And forgot to suggest some.

> /*
>  * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
>  *
>  * Given the new version of a tuple after some update, verify whether the
>  * given Xmax (corresponding to the previous version) matches the tuple's
>  * Xmin, taking into account that the Xmin might have been frozen after the
>  * update.
>  */
> bool
> HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
> {
>     TransactionId xmin = HeapTupleHeaderGetXmin(htup);
>
>     /*
>      * If the xmax of the old tuple is identical to the xmin of the new one,
>      * it's a match.
>      */
>     if (TransactionIdEquals(xmax, xmin))
>         return true;
>
>     /*
>      * When a tuple is frozen, the original Xmin is lost, but we know it's a
>      * committed transaction.  So unless the Xmax is InvalidXid, we don't
>      * know for certain that there is a match, but there may be one; and we
>      * must return true so that a HOT chain that is half-frozen can be walked
>      * correctly.
>      */
>     if (TransactionIdEquals(xmin, FrozenTransactionId) &&
>         TransactionIdIsValid(xmax))
>         return true;
>
>     return false;
> }

Those are clearly improvements.
-- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Wood, Dan wrote:
>> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
>>
>> I would prefer to focus on either latest 9X or 11dev.
>
> I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> (with the patch, I waited 10x as many iterations as it took for the
> problem to occur ~10 times without the patch), but I can reproduce a
> problem in 9.6 with my patch installed. There must be something new in
> 9.6 that is causing the problem to reappear.

The freeze visibility map has been introduced in 9.6... There could be interactions on this side.

-- Michael
Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > (with the patch, I waited 10x as many iterations as it took for the
> > problem to occur ~10 times without the patch), but I can reproduce a
> > problem in 9.6 with my patch installed. There must be something new in
> > 9.6 that is causing the problem to reappear.
>
> The freeze visibility map has been introduced in 9.6... There could be
> interactions on this side.

Ah, thanks for the tip. I hope the authors of that can do the gruntwork of researching this problem there, then. I'll go commit what I have now.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Fri, Oct 6, 2017 at 10:57 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Ah, thanks for the tip. I hope the authors of that can do the gruntwork
> of researching this problem there, then.

I have some stuff using 9.6 extensively, so like Dan I think I'll chime in anyway. Not before Tuesday though, long weekend in Japan ahead.

> I'll go commit what I have now.

As far as I saw this set definitely improves the situation.

-- Michael
Robert,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Michael Paquier wrote:
> > The freeze visibility map has been introduced in 9.6... There could be
> > interactions on this side.
>
> Ah, thanks for the tip. I hope the authors of that can do the gruntwork
> of researching this problem there, then. I'll go commit what I have
> now.

I don't doubt you're watching this thread too, but just to be 110% sure that we don't end up with the November releases still having this issue, I'm adding you to the CC on this thread as the one who did the freeze visibility map work. Depending on hope here is a bit too squishy for me when we're talking about corruption issues.

Thanks!

Stephen
By the way, I still wonder if there's any way for a new tuple to get inserted in the place where a HOT redirect would be pointing to, and have it be marked as Frozen, where the old redirect contains a non-invalid Xmax. I tried to think of a way for that to happen, but couldn't think of anything. What I imagine is a sequence like this:

1. insert a tuple
2. HOT-update a tuple
3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
4. start transaction
5. HOT-update the tuple again, creating HOT in lp 3
6. abort transaction (leaving aborted update in lp 3)
7. somehow remove tuple from lp 3, make slot available
8. new transaction comes along, inserts tuple in lp 3
9. somebody freezes tuple in lp 3 (???)

Then we follow the HOT chain, see that Xmin=2 in lp 3 and conclude that the tuple is part of the chain because of an xid "match". Basically from step 7 onwards I don't think this is possible, but maybe I'm just blind.

Maybe we can forestall the problem by checking whether the Xmax TransactionIdIsCurrentTransaction || TransactionIdDidCommit (or some variant thereof). This would be very slow but safer; and in 9.4 and up we'd only need to do it if the xmin value is actually FrozenXid which should be rare (only in pages upgraded from 9.3).

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Wood, Dan wrote: >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. >> >> I would prefer to focus on either latest 9X or 11dev. > > I tested my patch on 9.4 and 9.5 today and it seems to close the problem > (with the patch, I waited 10x as many iterations as it took for the > problem to occur ~10 times without the patch), but I can reproduce a > problem in 9.6 with my patch installed. There must be something new in > 9.6 that is causing the problem to reappear. What problem persists? The original one (or, at least, the original symptom of pruning HOT chains incorrectly)? If that's what you mean, I wouldn't be so quick to assume that it's the freeze map. -- Peter Geoghegan
Peter Geoghegan wrote: > On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Wood, Dan wrote: > >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. > >> > >> I would prefer to focus on either latest 9X or 11dev. > > > > I tested my patch on 9.4 and 9.5 today and it seems to close the problem > > (with the patch, I waited 10x as many iterations as it took for the > > problem to occur ~10 times without the patch), but I can reproduce a > > problem in 9.6 with my patch installed. There must be something new in > > 9.6 that is causing the problem to reappear. > > What problem persists? The original one (or, at least, the original > symptom of pruning HOT chains incorrectly)? If that's what you mean, I > wouldn't be so quick to assume that it's the freeze map. I can tell that, in 9.6, REINDEX still reports the error we saw in earlier releases, after some of the runs of my reproducer scripts. I'm unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan originally reported anywhere, either. I don't know if it's really the freeze map at fault or something else. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 7:59 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> By the way, I still wonder if there's any way for a new tuple to get
> inserted in the place where a HOT redirect would be pointing to, and
> have it be marked as Frozen, where the old redirect contains a
> non-invalid Xmax. I tried to think of a way for that to happen, but
> couldn't think of anything.
>
> What I imagine is a sequence like this:
>
> 1. insert a tuple
> 2. HOT-update a tuple
> 3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
> 4. start transaction
> 5. HOT-update the tuple again, creating HOT in lp 3
> 6. abort transaction (leaving aborted update in lp 3)
> 7. somehow remove tuple from lp 3, make slot available
> 8. new transaction comes along, inserts tuple in lp 3
> 9. somebody freezes tuple in lp3 (???)
>
> Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
> the tuple is part of the chain because of an xid "match".
>
> Basically from step 7 onwards I don't think this is possible, but maybe
> I'm just blind.

For the record, I also think that this is impossible, in part because pruning requires a cleanup buffer lock (and because HOT chains cannot span pages). I wouldn't say that I am 100% confident about this, though.

BTW, is this comment block that appears above heap_prepare_freeze_tuple() now obsolete, following 20b65522 (and maybe much earlier commits)?

 * NB: It is not enough to set hint bits to indicate something is
 * committed/invalid -- they might not be set on a standby, or after crash
 * recovery. We really need to remove old xids.
 */

We WAL-log setting hint bits during freezing now, iff tuple xmin is before the Xid cutoff and tuple is a heap-only tuple.

-- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 10:49 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I can tell that, in 9.6, REINDEX still reports the error we saw in > earlier releases, after some of the runs of my reproducer scripts. I'm > unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan > originally reported anywhere, either. You mean the enhanced stress-test that varied fillfactor, added filler columns, and so on [1]? Can you post that to the list, please? I think that several of us would like to have a reproducible test case. > I don't know if it's really the freeze map at fault or something else. Ideally, it would be possible to effectively disable the new freeze map stuff in a minimal way, for testing purposes. Perhaps the authors of that patch, CC'd, can suggest a way to do that. If I had to guess, I'd say that it's just as likely that the issue is only reproducible on 9.6 because of the enhancements added in that release that improved buffer pinning (the use of atomic ops to pin buffers, moving buffer content locks into buffer descriptors, etc). It was already a bit tricky to get the problem that remained after 20b6552 but before today's a5736bf to reproduce with Dan's script. It often took me 4 or 5 attempts. (I wonder what it looks like with your enhanced version of that script -- the one that I just asked about.) It seems possible that we've merely reduced the window for the race to the point that it's practically (though not theoretically) impossible to reproduce the problem on versions < 9.6, though not on 9.6+. Applying Occam's razor, the problem doesn't seem particularly likely to be in the freeze map stuff, which isn't actually all that closely related. [1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg@bowt.ie> wrote: >> I don't know if it's really the freeze map at fault or something else. > > Ideally, it would be possible to effectively disable the new freeze > map stuff in a minimal way, for testing purposes. Perhaps the authors > of that patch, CC'd, can suggest a way to do that. Actually, the simplest thing might be to just use pg_visibility's pg_check_frozen() to check that the visibility/freeze map accurately summarizes the all-frozen status of tuples in the heap. If that doesn't indicate that there is corruption, we can be fairly confident that the problem is elsewhere. The metadata in the visibility/freeze map should be accurate when a bit is set to indicate that an entire heap page is all-frozen (or, separately, all-visible). We can hardly expect it to have better information than the authoritative source of truth, the heap itself. The more I think about it, the more I tend to doubt that the remaining problems are with the freeze map. If the freeze map was wrong, and incorrectly said that a page was all-frozen, then surely the outward symptoms would take a long time to show up, as they always do when we accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM that that's the only meaningful way that the freeze map can be wrong -- it only promises to be accurate when it says that no further freezing is needed for a page/bit. -- Peter Geoghegan
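The one-directional promise described here -- a set all-frozen bit must never cover an unfrozen tuple, while a clear bit claims nothing -- can be modeled in a few lines. This is a toy model, not the actual pg_visibility code; the `page_model` struct is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one heap page plus its visibility/freeze map bit. */
struct page_model {
    bool vm_all_frozen;     /* the map's all-frozen bit for this page */
    bool tuple_frozen[4];   /* per-tuple frozen status on the heap page */
    size_t ntuples;
};

/* Returns true if the map is consistent with the heap: no tuple may
 * contradict a set all-frozen bit.  A clear bit promises nothing, so
 * it is always consistent -- this is the invariant pg_check_frozen()
 * verifies for real relations. */
static bool
vm_consistent(const struct page_model *p)
{
    if (!p->vm_all_frozen)
        return true;            /* clear bit: nothing to check */
    for (size_t i = 0; i < p->ntuples; i++)
        if (!p->tuple_frozen[i])
            return false;       /* set bit over an unfrozen tuple */
    return true;
}
```

This also illustrates why a wrongly-set bit surfaces only much later: the unfrozen tuple is simply skipped by vacuum until its xid crosses the relfrozenxid cutoff.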
>On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg@bowt.ie> wrote: >>> I don't know if it's really the freeze map at fault or something else. >> >> Ideally, it would be possible to effectively disable the new freeze >> map stuff in a minimal way, for testing purposes. Perhaps the authors >> of that patch, CC'd, can suggest a way to do that. >Actually, the simplest thing might be to just use pg_visibility's >pg_check_frozen() to check that the visibility/freeze map accurately >summarizes the all-frozen status of tuples in the heap. If that >doesn't indicate that there is corruption, we can be fairly confident >that the problem is elsewhere. The metadata in the visibility/freeze >map should be accurate when a bit is set to indicate that an entire >heap page is all-frozen (or, separately, all-visible). We can hardly >expect it to have better information than the authoritative source of >truth, the heap itself. >The more I think about it, the more I tend to doubt that the remaining >problems are with the freeze map. If the freeze map was wrong, and >incorrectly said that a page was all-frozen, then surely the outward >symptoms would take a long time to show up, as they always do when we >accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM >that that's the only meaningful way that the freeze map can be wrong >-- it only promises to be accurate when it says that no further >freezing is needed for a page/bit.

Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6. None of the all-frozen or all-visible bits are necessarily set in problematic pages.

ERROR: failed to find parent tuple for heap-only tuple at (0,182) in table "accounts"

 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | f
(1 row)

Even when the bits were set, I haven't found issues with the pg_check_xxx functions in the dozens of times I've run them.
postgres=# select * from pg_visibility('accounts'::regclass);
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | f
     1 | t           | t          | t
(2 rows)

postgres=# select pg_check_visible('accounts'::regclass);
 pg_check_visible
------------------
(0 rows)

postgres=# select pg_check_frozen('accounts'::regclass);
 pg_check_frozen
-----------------
(0 rows)

Yi Wen
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen <yiwong@amazon.com> wrote: > Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6. > None of the all-frozen or all-visible bits are necessarily set in problematic pages. Since this happened yesterday, I assume it was with an unfixed version? As you must have seen, Alvaro said he has a variant of Dan's original script that demonstrates that a problem remains, at least on 9.6+, even with today's fix. I think it's the stress-test that plays with fillfactor, many clients, etc [1]. I've tried to independently reproduce the problem on the master branch's current tip, with today's new fix, but cannot break things despite trying many variations. I cannot reproduce the problem that Alvaro still sees. I'll have to wait until Alvaro posts his repro to the list before commenting further, which I assume he'll post as soon as he can. There doesn't seem to be much point in not waiting for that. [1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql -- Peter Geoghegan
Peter Geoghegan wrote: > On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen <yiwong@amazon.com> wrote: > > Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6. > > None of the all-frozen or all-visible bits are necessarily set in problematic pages. > > Since this happened yesterday, I assume it was with an unfixed version? > > As you must have seen, Alvaro said he has a variant of Dan's original > script that demonstrates that a problem remains, at least on 9.6+, > even with today's fix. I think it's the stress-test that plays with > fillfactor, many clients, etc [1]. I just execute setup.sql once and then run this shell command,

while :; do
  psql -e -P pager=off -f ./repro.sql
  for i in `seq 1 5`; do
    psql -P pager=off -e --no-psqlrc -f ./lock.sql &
  done
  wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
  psql -P pager=off -e --no-psqlrc -f ./report.sql
  echo "done"
done

Note that you need to use pg10's psql because of the \if lines in lock.sql. For some runs I change the values to compare random() to, and originally the commented-out section in lock.sql was not commented out, but I'm fairly sure the failures I saw were with this version. Also, I sometimes change the 5 in the `seq` command to higher values (180, 250). I didn't find the filler column to have any effect, so I took that out.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> As you must have seen, Alvaro said he has a variant of Dan's original >> script that demonstrates that a problem remains, at least on 9.6+, >> even with today's fix. I think it's the stress-test that plays with >> fillfactor, many clients, etc [1]. > > I just execute setup.sql once and then run this shell command,
>
> while :; do
>   psql -e -P pager=off -f ./repro.sql
>   for i in `seq 1 5`; do
>     psql -P pager=off -e --no-psqlrc -f ./lock.sql &
>   done
>   wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
>   psql -P pager=off -e --no-psqlrc -f ./report.sql
>   echo "done"
> done

I cannot reproduce the problem on my personal machine using this script/stress-test. I tried to do so on the master branch git tip. This reinforces the theory that there is some timing sensitivity, because the remaining race condition is very narrow.

-- Peter Geoghegan
Peter Geoghegan wrote: > On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> As you must have seen, Alvaro said he has a variant of Dan's original > >> script that demonstrates that a problem remains, at least on 9.6+, > >> even with today's fix. I think it's the stress-test that plays with > >> fillfactor, many clients, etc [1]. > > > > I just execute setup.sql once and then run this shell command,
> >
> > while :; do
> >   psql -e -P pager=off -f ./repro.sql
> >   for i in `seq 1 5`; do
> >     psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> >   done
> >   wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> >   psql -P pager=off -e --no-psqlrc -f ./report.sql
> >   echo "done"
> > done
>
> I cannot reproduce the problem on my personal machine using this > script/stress-test. I tried to do so on the master branch git tip. > This reinforces the theory that there is some timing sensitivity, > because the remaining race condition is very narrow.

Hmm, I think I added a random sleep (max. 100ms) right after the HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that makes the race easier to hit.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Hmm, I think I added a random sleep (max. 100ms) right after the > HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that > makes the race easier to hit. I still cannot reproduce. Perhaps you can be more specific? -- Peter Geoghegan
I’m unclear on what is being repro’d in 9.6. Are you getting the duplicate rows problem or just the reindex problem? Are you testing with asserts enabled (I’m not)?

If you are getting the dup rows consider the code in the block in heapam.c that starts with the comment “replace multi by update xid”.

When I repro this I find that MultiXactIdGetUpdateXid() returns 0. There is an updater in the multixact array however the status is MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I assume this is a preliminary status before the following row in the hot chain has its multixact set to NoKeyUpdate.

Since a 0 is returned this does precede cutoff_xid and TransactionIdDidCommit(0) will return false. This ends up aborting the multixact on the row even though the real xid is committed. This sets XMAX to 0 and that row becomes visible as one of the dups. Interestingly the real xid of the updater is 122944 and the cutoff_xid is 122945.

I’m still debugging but I start late so I’m passing this incomplete info along now.

On 10/7/17, 4:25 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:

    Peter Geoghegan wrote: > On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> As you must have seen, Alvaro said he has a variant of Dan's original > >> script that demonstrates that a problem remains, at least on 9.6+, > >> even with today's fix. I think it's the stress-test that plays with > >> fillfactor, many clients, etc [1]. > > > > I just execute setup.sql once and then run this shell command, > > > > while :; do > > psql -e -P pager=off -f ./repro.sql > > for i in `seq 1 5`; do > > psql -P pager=off -e --no-psqlrc -f ./lock.sql & > > done > > wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql > > psql -P pager=off -e --no-psqlrc -f ./report.sql > > echo "done" > > done > > I cannot reproduce the problem on my personal machine using this > script/stress-test. I tried to do so on the master branch git tip. > This reinforces the theory that there is some timing sensitivity, > because the remaining race condition is very narrow.

    Hmm, I think I added a random sleep (max. 100ms) right after the HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that makes the race easier to hit.

    -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
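Dan's observation condenses into a toy model of the freeze decision. This is not the real FreezeMultiXactId() code: `demo_did_commit`, the simplified xid comparison, and the xid values (taken from the numbers reported above) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Simplified circular-xid comparison, modeled loosely after the real
 * TransactionIdPrecedes() (minus the special cases for permanent xids). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Stand-in for TransactionIdDidCommit: in the reported run, the real
 * updater xid 122944 had committed. */
static bool
demo_did_commit(TransactionId xid)
{
    return xid == 122944;
}

/* Toy model of the decision Dan describes: which xmax survives
 * freezing?  If MultiXactIdGetUpdateXid() returned InvalidTransactionId
 * (0) -- because the only updater-ish member carried the locker-style
 * ForNoKeyUpdate status -- then 0 "precedes" cutoff_xid, DidCommit(0)
 * is false, and xmax is cleared, resurrecting the dead row as a dup. */
static TransactionId
frozen_xmax_unguarded(TransactionId update_xid, TransactionId cutoff_xid)
{
    if (xid_precedes(update_xid, cutoff_xid) &&
        !demo_did_commit(update_xid))
        return InvalidTransactionId;   /* treat update as aborted */
    return update_xid;
}
```

One plausible guard the model suggests is rejecting `update_xid == InvalidTransactionId` before ever consulting the commit status, since 0 means "no true updater", not "aborted updater"; where the actual repair belongs in the multixact freeze path is a separate question.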
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Mon, Oct 9, 2017 at 2:29 AM, Peter Geoghegan <pg@bowt.ie> wrote: > On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> Hmm, I think I added a random sleep (max. 100ms) right after the >> HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that >> makes the race easier to hit. > > I still cannot reproduce. Perhaps you can be more specific? I have been trying to reproduce things for a total of 4 hours, testing various variations of the proposed test cases (killed Postgres, changed fillfactor, manual sleep calls), but I am proving unable to see a regression as well. I would not think that the OS matters here; all my attempts were on macOS with assertions and debugging enabled. At least the code is now more stable, which is definitely a good thing. -- Michael
Wood, Dan wrote: > I’m unclear on what is being repro’d in 9.6. Are you getting the > duplicate rows problem or just the reindex problem? Are you testing > with asserts enabled (I’m not)? I was seeing just the reindex problem. I don't see any more dups. But I've tried to reproduce it afresh now, and let it run for a long time and nothing happened. Maybe I made a mistake last week and ran an unfixed version. I don't see any more problems now. > If you are getting the dup rows consider the code in the block in > heapam.c that starts with the comment “replace multi by update xid”. > > When I repro this I find that MultiXactIdGetUpdateXid() returns 0. > There is an updater in the multixact array however the status is > MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I > assume this is a preliminary status before the following row in the > hot chain has its multixact set to NoKeyUpdate. Yes, the "For" version is the locker version rather than the actual update. That lock is acquired by EvalPlanQual locking the row just before doing the update. I think GetUpdateXid has no reason to return such an Xid, since it's not an update. > Since a 0 is returned this does precede cutoff_xid and > TransactionIdDidCommit(0) will return false. This ends up aborting > the multixact on the row even though the real xid is committed. This > sets XMAX to 0 and that row becomes visible as one of the dups. > Interestingly the real xid of the updater is 122944 and the cutoff_xid > is 122945. I haven't seen this effect. Please keep us updated if you're able to verify corruption this way. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
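For reference, the locker-vs-updater distinction drawn here comes from the ordering of the multixact member statuses. The sketch below mirrors the enum and the update test as they appear in the 9.x sources (reproduced from memory, so treat it as a sketch rather than a verbatim copy of multixact.h):

```c
#include <assert.h>
#include <stdbool.h>

/* MultiXactStatus values in their source ordering: the four "For"
 * values are lock-only statuses; only the last two represent actual
 * updates. */
typedef enum
{
    MultiXactStatusForKeyShare = 0x00,
    MultiXactStatusForShare = 0x01,
    MultiXactStatusForNoKeyUpdate = 0x02,
    MultiXactStatusForUpdate = 0x03,
    MultiXactStatusNoKeyUpdate = 0x04,
    MultiXactStatusUpdate = 0x05
} MultiXactStatus;

/* Mirrors ISUPDATE_from_mxstatus(): anything past ForUpdate is a real
 * update.  This is why a member whose status is the locker-style
 * MultiXactStatusForNoKeyUpdate is not treated as the updater when
 * scanning the member array for an update xid. */
#define ISUPDATE_FROM_MXSTATUS(status) ((status) > MultiXactStatusForUpdate)
```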
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I was seeing just the reindex problem. I don't see any more dups. > > But I've tried to reproduce it afresh now, and let it run for a long > time and nothing happened. Maybe I made a mistake last week and > ran an unfixed version. I don't see any more problems now. Okay, so that's one more person joining this trend, making three with Peter and me. >> If you are getting the dup rows consider the code in the block in >> heapam.c that starts with the comment “replace multi by update xid”. >> >> When I repro this I find that MultiXactIdGetUpdateXid() returns 0. >> There is an updater in the multixact array however the status is >> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I >> assume this is a preliminary status before the following row in the >> hot chain has its multixact set to NoKeyUpdate. > > Yes, the "For" version is the locker version rather than the actual > update. That lock is acquired by EvalPlanQual locking the row just > before doing the update. I think GetUpdateXid has no reason to return > such an Xid, since it's not an update. > >> Since a 0 is returned this does precede cutoff_xid and >> TransactionIdDidCommit(0) will return false. This ends up aborting >> the multixact on the row even though the real xid is committed. This >> sets XMAX to 0 and that row becomes visible as one of the dups. >> Interestingly the real xid of the updater is 122944 and the cutoff_xid >> is 122945. > > I haven't seen this effect. Please keep us updated if you're able to > verify corruption this way. Me neither. It would be nice to not live long with such a sword of Damocles. -- Michael
I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid. No dups. No index corruption.

Thanks so much.

On 10/10/17, 7:25 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:

    On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I was seeing just the reindex problem. I don't see any more dups. > > But I've tried to reproduce it afresh now, and let it run for a long > time and nothing happened. Maybe I made a mistake last week and > ran an unfixed version. I don't see any more problems now. Okay, so that's one more person joining this trend, making three with Peter and me. >> If you are getting the dup rows consider the code in the block in >> heapam.c that starts with the comment “replace multi by update xid”. >> >> When I repro this I find that MultiXactIdGetUpdateXid() returns 0. >> There is an updater in the multixact array however the status is >> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I >> assume this is a preliminary status before the following row in the >> hot chain has its multixact set to NoKeyUpdate. > > Yes, the "For" version is the locker version rather than the actual > update. That lock is acquired by EvalPlanQual locking the row just > before doing the update. I think GetUpdateXid has no reason to return > such an Xid, since it's not an update. > >> Since a 0 is returned this does precede cutoff_xid and >> TransactionIdDidCommit(0) will return false. This ends up aborting >> the multixact on the row even though the real xid is committed. This >> sets XMAX to 0 and that row becomes visible as one of the dups. >> Interestingly the real xid of the updater is 122944 and the cutoff_xid >> is 122945. > > I haven't seen this effect. Please keep us updated if you're able to > verify corruption this way. Me neither. It would be nice to not live long with such a sword of Damocles.

    -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Wed, Oct 11, 2017 at 11:31 AM, Wood, Dan <hexpert@amazon.com> wrote: > I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid. > No dups. No index corruption. > > Thanks so much. Nice to hear that! You guys seem to be doing extensive testing and actually reporting back about it, which is really nice to see. -- Michael