Thread: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Alvaro Herrera
So here's my attempt at an explanation for what is going on. At one point, we have this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
       to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));

 lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
  1 |        1 |      2 |      0 | (0,1)  | 902      | 3
  2 |        0 |        |        |        |          |
  3 |        1 |      2 |  19928 | (0,4)  | 3142     | c003
  4 |        1 |  14662 |  19929 | (0,5)  | 3142     | c003
  5 |        1 |  14663 |  19931 | (0,6)  | 3142     | c003
  6 |        1 |  14664 |  19933 | (0,7)  | 3142     | c003
  7 |        1 |  14665 |      0 | (0,7)  | 2902     | 8003
(7 rows)

which shows a HOT-update chain, where the t_xmax values are multixacts. Then a vacuum freeze comes along, and because the multixacts are below the freeze horizon for multixacts, we get this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
       to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));

 lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
  1 |        1 |      2 |      0 | (0,1)  | 902      | 3
  2 |        0 |        |        |        |          |
  3 |        1 |      2 |  14662 | (0,4)  | 2502     | c003
  4 |        1 |      2 |  14663 | (0,5)  | 2502     | c003
  5 |        1 |      2 |  14664 | (0,6)  | 2502     | c003
  6 |        1 |      2 |  14665 | (0,7)  | 2502     | c003
  7 |        1 |      2 |      0 | (0,7)  | 2902     | 8003
(7 rows)

where the xmin values have all been frozen, and the xmax values are now regular Xids. I think the HOT code that walks the chain fails to detect these as chains, because the xmin values no longer match the prior xmax values.

I modified the multixact freeze code, so that whenever the update Xid is below the cutoff Xid, it's set to FrozenTransactionId, since keeping the other value is invalid anyway (even though we have set the HEAP_XMAX_COMMITTED flag). But that still doesn't fix the problem; as far as I can see, vacuum removes the root of the chain (not yet sure why), and then things are just as corrupted as before.
-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> which shows a HOT-update chain, where the t_xmax are multixacts. Then a
> vacuum freeze comes, and because the multixacts are below the freeze
> horizon for multixacts, we get this:
>
> select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
>        to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
>
>  lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
> ----+----------+--------+--------+--------+----------+-----------
>   1 |        1 |      2 |      0 | (0,1)  | 902      | 3
>   2 |        0 |        |        |        |          |
>   3 |        1 |      2 |  14662 | (0,4)  | 2502     | c003
>   4 |        1 |      2 |  14663 | (0,5)  | 2502     | c003
>   5 |        1 |      2 |  14664 | (0,6)  | 2502     | c003
>   6 |        1 |      2 |  14665 | (0,7)  | 2502     | c003
>   7 |        1 |      2 |      0 | (0,7)  | 2902     | 8003
> (7 rows)
>
> where the xmin values have all been frozen, and the xmax values are now
> regular Xids.

I thought that we no longer store FrozenTransactionId (xid 2) as our "raw" xmin while freezing, and yet that's what we see here. It looks like pageinspect is looking at raw xmin (it's calling HeapTupleHeaderGetRawXmin(), not HeapTupleHeaderGetXmin()). What's the deal with that? Shouldn't the original xmin be preserved for forensic analysis, following commit 37484ad2?

I guess what you mean is that this is what you see having modified the code to actually store FrozenTransactionId as xmin once more, in an effort to fix this?

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> But that still doesn't fix the problem;
> as far as I can see, vacuum removes the root of the chain, not yet sure
> why, and then things are just as corrupted as before.

Are you sure it's not opportunistic pruning? Another thing that I've noticed with this problem is that the relevant IndexTuple will pretty quickly vanish, presumably due to LP_DEAD setting (but maybe not actually due to LP_DEAD setting).

(Studies the problem some more...)

I now think that it actually is a VACUUM problem, specifically a problem with VACUUM pruning. You see the HOT xmin-to-xmax check pattern that you mentioned within heap_prune_chain(), which looks like where the incorrect tuple prune (or possibly, at times, redirect?) takes place. (I refer to the prune/kill that you mentioned today, that frustrated your first attempt at a fix -- "I modified the multixact freeze code...".)

The attached patch "fixes" the problem -- I cannot get amcheck to complain about corruption with this applied. And, "make check-world" passes. Hopefully it goes without saying that this isn't actually my proposed fix. It tells us something that this at least *masks* the problem, though; it's a start.

FYI, the repro case page contents look like this with the patch applied:

postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
           to_hex(t_infomask) as infomask,
           to_hex(t_infomask2) as infomask2
           from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
  1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
  2 |        2 |         |        |        |          |
  3 |        0 |         |        |        |          |
  4 |        0 |         |        |        |          |
  5 |        0 |         |        |        |          |
  6 |        0 |         |        |        |          |
  7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
(7 rows)

--
Peter Geoghegan
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Michael Paquier
On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> I now think that it actually is a VACUUM problem, specifically a
> problem with VACUUM pruning. You see the HOT xmin-to-xmax check
> pattern that you mentioned within heap_prune_chain(), which looks like
> where the incorrect tuple prune (or possibly, at times, redirect?)
> takes place. (I refer to the prune/kill that you mentioned today, that
> frustrated your first attempt at a fix -- "I modified the multixact
> freeze code...".)

My lookup of the problem converges to the same conclusion. Something is wrong with the vacuum's pruning. I have spent some time trying to design a patch; all the solutions I tried proved to make the problem harder to show up, but it still showed up, sometimes after dozens of attempts.

> The attached patch "fixes" the problem -- I cannot get amcheck to
> complain about corruption with this applied. And, "make check-world"
> passes. Hopefully it goes without saying that this isn't actually my
> proposed fix. It tells us something that this at least *masks* the
> problem, though; it's a start.

Yep.

> FYI, the repro case page contents look like this with the patch applied:
>
> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
>            to_hex(t_infomask) as infomask,
>            to_hex(t_infomask2) as infomask2
>            from heap_page_items(get_raw_page('t', 0));
>  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
> ----+----------+---------+--------+--------+----------+-----------
>   1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
>   2 |        2 |         |        |        |          |
>   3 |        0 |         |        |        |          |
>   4 |        0 |         |        |        |          |
>   5 |        0 |         |        |        |          |
>   6 |        0 |         |        |        |          |
>   7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
> (7 rows)

Is lp_off for tid (0,2) pointing to (0,7)? A HOT chain preserved is what would look correct to me.

    - * Check the tuple XMIN against prior XMAX, if any
    - */
    - if (TransactionIdIsValid(priorXmax) &&
    -     !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
    -     break;

If you remove this check, you could also remove priorXmax completely.

Actually, I may be missing something, but why is priorXmax updated even for dead tuples? For example, just doing this also takes care of the problem:

@@ -558,7 +558,8 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
             Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
                    BufferGetBlockNumber(buffer));
             offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
-            priorXmax = HeapTupleHeaderGetUpdateXid(htup);
+            if (!tupdead)
+                priorXmax = HeapTupleHeaderGetUpdateXid(htup);
         }

--
Michael
I've just started looking at this again after a few weeks break.

There is a tangled web of issues here. With the community fix we get a corrupted page (invalid redirect ptr from indexed item). The cause of that is in pruneheap.c:

    /*
     * Check the tuple XMIN against prior XMAX, if any
     */
    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
        break;

    chainitems[nchain++] = offnum;

The priorXmax is a multixact key share lock, thus not equal to xmin. This terminates the scan. Unlike the scan termination with "if (!recent_dead) break;" below the switch (...SatisfiesVacuum...), this "break" does not put the offnum into the chain even though it is in the chain. If the first not-deleted item isn't put in the chain then we'll not call heap_prune_record_redirect().

I do not know what the above "if" test is protecting. Normally the xmin is equal to the priorXmax. Other than protecting against corruption, a key share lock can cause this. So I tried a fix which does the "if" check after adding the item to chainitems. This will break whatever real situation this IF was protecting. Tom Lane put this in.

OK: with that hack of a fix the redirect now works correctly. However, we still get the reindex problem with not finding the parent. That problem is caused by this, in pruneheap.c:heap_get_root_tuples():

    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
        break;

In this case, instead of these not being equal because of a multixact key share lock, it is because XMIN is frozen and FrozenTransactionId doesn't equal the priorXmax. Thus, we don't build the entire chain from root to most current row version, and this causes the reindex failure.

If we disable this "if" as a hack then we no longer get a problem on the reindex. However, YiWen reported that at the end of an installcheck, index checking reported corruption in the system catalogues. So we are still looking.
We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against from the two exceptions I've found?

FYI, someone should look at the same "if" test in heapam.c:heap_lock_updated_tuple_rec(). Also, I hope there are no strange issues with concurrent index builds.

Finally, the idea behind the original fix was simply NOT to do an unsupported freeze on a dead tuple. It had two drawbacks:

1) CLOG truncation. This could have been handled by keeping track of the oldest unpruned item found and using that to update the table's/DB's freeze xid.

2) Not making freeze progress. The only reason the prune would fail should be because of an open TXN. Unless that TXN was so old that its XID was as old as the ?min freeze threshold? then we would make progress. If we were doing TXNs that old then we'd be having problems anyway.
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 5:15 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> FYI, the repro case page contents look like this with the patch applied:
>>
>> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
>>            to_hex(t_infomask) as infomask,
>>            to_hex(t_infomask2) as infomask2
>>            from heap_page_items(get_raw_page('t', 0));
>>  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
>> ----+----------+---------+--------+--------+----------+-----------
>>   1 |        1 | 1845995 |      0 | (0,1)  | b02      | 3
>>   2 |        2 |         |        |        |          |
>>   3 |        0 |         |        |        |          |
>>   4 |        0 |         |        |        |          |
>>   5 |        0 |         |        |        |          |
>>   6 |        0 |         |        |        |          |
>>   7 |        1 | 1846001 |      0 | (0,7)  | 2b02     | 8003
>> (7 rows)
>
> Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
> what would look correct to me.

Yes, it is:

postgres=# select * from bt_page_items('foo', 1);
 itemoffset | ctid  | itemlen | nulls | vars |          data
------------+-------+---------+-------+------+-------------------------
          1 | (0,1) |      16 | f     | f    | 01 00 00 00 00 00 00 00
          2 | (0,2) |      16 | f     | f    | 03 00 00 00 00 00 00 00
(2 rows)

I can tell from looking at my hex editor that the 4 bytes of ItemId that we see for position '(0,2)' in the ItemId array are "07 00 01 00", meaning that '(0,2)' is an LP_REDIRECT item, repointing us to '(0,7)'. Everything here looks sane to me, at least at first blush.

> - * Check the tuple XMIN against prior XMAX, if any
> - */
> - if (TransactionIdIsValid(priorXmax) &&
> -     !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
> -     break;
> If you remove this check, you could also remove completely priorXmax.
>
> Actually, I may be missing something, but why is priorXmax updated
> even for dead tuples? For example just doing that is also taking care
> of the problem:

I'll study what you suggest here some more tomorrow.

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan <hexpert@amazon.com> wrote:
> I've just started looking at this again after a few weeks break.
> if (TransactionIdIsValid(priorXmax) &&
>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
>     break;
> We need to understand why these TXID equal checks exist. Can we
> differentiate the cases they are protecting against from the two
> exceptions I've found?

I haven't read your remarks here in full, since I'm about to stop working for the day, but I will point out that src/backend/access/heap/README.HOT says a fair amount about this, under "Abort Cases".

--
Peter Geoghegan
One minor side note... Is it weird for xmin/xmax to go backwards in a HOT row chain?

 lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
----+--------+--------+------------+-------------+--------+--------
  1 | (0,1)  |   8152 |       2818 |           3 |  36957 |      0
  2 |        |      5 |            |             |        |
  3 |        |      0 |            |             |        |
  4 |        |      0 |            |             |        |
  5 | (0,6)  |   8112 |       9986 |       49155 |  36962 |  36963
  6 | (0,7)  |   8072 |       9986 |       49155 |  36963 |  36961
  7 | (0,7)  |   8032 |      11010 |       32771 |  36961 |      0
(7 rows)
My interpretation of README.HOT is that the check is just to ensure the chain is continuous; in which case the condition should be:

> if (TransactionIdIsValid(priorXmax) &&
>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>     break;

So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin from when the transaction happened.

The interesting consequence of changing that is that the prune seems to get the entire chain altogether with Dan's repro... I've run it a couple of times and have consistently gotten the following page:

 lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
----+--------+--------+----------+------------+-------------
  1 | (0,1)  |   8152 |        1 |       2818 |           3
  2 |        |      7 |        2 |            |
  3 |        |      0 |        0 |            |
  4 |        |      0 |        0 |            |
  5 |        |      0 |        0 |            |
  6 |        |      0 |        0 |            |
  7 | (0,7)  |   8112 |        1 |      11010 |       32771
(7 rows)

I've made this change to the conditions in both heap_prune_chain and heap_get_root_tuples, and this seems to cause things to pass my smoke tests. I'll look into this deeper tomorrow.

Yi Wen
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Alvaro Herrera
Peter Geoghegan wrote:
> I thought that we no longer store FrozenTransactionId (xid 2) as our
> "raw" xmin while freezing, and yet that's what we see here.

I'm doing this in 9.3. I can't tell if the new tuple freezing stuff broke things more thoroughly, but it is already broken in earlier releases.

--
Álvaro Herrera
Wood, Dan wrote:
> There is a tangled web of issues here. With the community fix we get a
> corrupted page (invalid redirect ptr from indexed item). The cause of
> that is in pruneheap.c:
>
> /*
>  * Check the tuple XMIN against prior XMAX, if any
>  */
> if (TransactionIdIsValid(priorXmax) &&
>     !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
>     break;
>
> chainitems[nchain++] = offnum;
>
> The priorXmax is a multixact key share lock,

Uhh, what? That certainly shouldn't happen -- the priorXmax comes from

    priorXmax = HeapTupleHeaderGetUpdateXid(htup);

so only the XID of the update itself should be reported, not a multixact and certainly not just a tuple lock XID.

--
Álvaro Herrera
Wood, Dan wrote:
> One minor side note... Is it weird for xmin/xmax to go backwards in a
> HOT row chain?
>
>  lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
> ----+--------+--------+------------+-------------+--------+--------
>   1 | (0,1)  |   8152 |       2818 |           3 |  36957 |      0
>   2 |        |      5 |            |             |        |
>   3 |        |      0 |            |             |        |
>   4 |        |      0 |            |             |        |
>   5 | (0,6)  |   8112 |       9986 |       49155 |  36962 |  36963
>   6 | (0,7)  |   8072 |       9986 |       49155 |  36963 |  36961
>   7 | (0,7)  |   8032 |      11010 |       32771 |  36961 |      0
> (7 rows)

No, it just means transaction A got its XID before transaction B, but B executed its update first and A updated the tuple second.

--
Álvaro Herrera
Alvaro Herrera wrote:
> Peter Geoghegan wrote:
>
>> I thought that we no longer store FrozenTransactionId (xid 2) as our
>> "raw" xmin while freezing, and yet that's what we see here.
>
> I'm doing this in 9.3. I can't tell if the new tuple freezing stuff
> broke things more thoroughly, but it is already broken in earlier
> releases.

In fact, I think in 9.3 we should include this patch, to set the Xmin to FrozenXid. 9.4 and onwards have commit 37484ad2a "Change the way we mark tuples as frozen", which uses a combination of infomask bits, but in 9.3 I think leaving the unfrozen value in the xmax field is a bad idea even if we set the HEAP_XMAX_COMMITTED bit.

--
Álvaro Herrera
Attachment
Wong, Yi Wen wrote:
> My interpretation of README.HOT is that the check is just to ensure the
> chain is continuous; in which case the condition should be:
>
>> if (TransactionIdIsValid(priorXmax) &&
>>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>>     break;
>
> So the difference is GetRawXmin vs GetXmin, because otherwise we get
> the FreezeId instead of the Xmin from when the transaction happened

I independently arrived at the same conclusion. Since I was trying with 9.3, the patch differs -- in the old version we must explicitly test for the FrozenTransactionId value, instead of using GetRawXmin. Attached is the patch I'm using, and my own oneliner test (pretty much the same I posted earlier) seems to survive dozens of iterations without showing any problem in REINDEX.

This patch is incomplete, since I think there are other places that need to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?). Of course, for 9.4 and onwards we need to patch like you described.

This bit in EvalPlanQualFetch caught my attention ... why is it saying xmin never changes? It does change with freezing.

    /*
     * If xmin isn't what we're expecting, the slot must have been
     * recycled and reused for an unrelated tuple.  This implies that
     * the latest version of the row was deleted, so we need do
     * nothing.  (Should be safe to examine xmin without getting
     * buffer's content lock, since xmin never changes in an existing
     * tuple.)
     */
    if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
                             priorXmax))

--
Álvaro Herrera
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Tue, Oct 3, 2017 at 8:43 PM, Wong, Yi Wen <yiwong@amazon.com> wrote:
> My interpretation of README.HOT is that the check is just to ensure the
> chain is continuous; in which case the condition should be:
>
>> if (TransactionIdIsValid(priorXmax) &&
>>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>>     break;
>
> So the difference is GetRawXmin vs GetXmin, because otherwise we get
> the FreezeId instead of the Xmin from when the transaction happened

I was thinking along similar lines.

> The interesting consequence of changing that is that the prune seems to
> get the entire chain altogether with Dan's repro... I've run it a
> couple of times and have consistently gotten the following page:
>
>  lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
> ----+--------+--------+----------+------------+-------------
>   1 | (0,1)  |   8152 |        1 |       2818 |           3
>   2 |        |      7 |        2 |            |
>   3 |        |      0 |        0 |            |
>   4 |        |      0 |        0 |            |
>   5 |        |      0 |        0 |            |
>   6 |        |      0 |        0 |            |
>   7 | (0,7)  |   8112 |        1 |      11010 |       32771
> (7 rows)

That's also what I see. This is a good thing, I think; that's how we ought to prune.

--
Peter Geoghegan
Sorry, I should have looked at the macro. I sent this too soon. The early "break;" here is likely for the xmin-frozen reason, as I found in the other loop.
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Wed, Oct 4, 2017 at 11:00 AM, Wood, Dan <hexpert@amazon.com> wrote:
> The early "break;" here is likely for the xmin-frozen reason, as I
> found in the other loop.

It looks that way.

Since we're already very defensive when it comes to this xmin/xmax matching business, and we're defensive while following an update chain more generally, I wonder if we should be checking HeapTupleHeaderIsSpeculative() on versions >= 9.5 (versions with ON CONFLICT DO UPDATE, where the t_ctid block number might actually be a speculative insertion token). Or, at least acknowledging that case in comments. I remember expressing concern that something like this was possible at the time that that went in.

We certainly don't want to have heap_abort_speculative() "super delete" the wrong tuple in the event of item pointer recycling. There are at least defensive sanity checks that would turn that into an error within heap_abort_speculative(), so it wouldn't be a disaster if it was attempted. I don't think that it's possible in practice, and maybe it's sufficient that routines like heap_get_latest_tid() check for a sane item offset, which will discriminate against SpecTokenOffsetNumber, which must be well out of range for ItemIds on the page. Could be worth a comment.

(I guess that heap_prune_chain() wouldn't need to be changed if we decide to add such comments, because the speculative tuple ItemId is going to be skipped over due to being ItemIdIsUsed() before we even get there.)

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From: Peter Geoghegan
On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is that the check is just to ensure the
>> chain is continuous; in which case the condition should be:
>>
>>> if (TransactionIdIsValid(priorXmax) &&
>>>     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>>>     break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get
>> the FreezeId instead of the Xmin from when the transaction happened

As you know, on version 9.4+, as of commit 37484ad2a, we decided that we are "largely ignoring the value to which it [xmin] is set". The expectation became that raw xmin is available after freezing, but mostly for forensic purposes. I think Alvaro should now memorialize the idea that its value is actually critical in some place (htup_details.h?).

> I independently arrived at the same conclusion. Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.

Obviously you're going to have to be prepared for a raw xmin of FrozenTransactionId, even on 9.4+, due to pg_upgrade.

I can see why it would be safe (or at least no more dangerous) to rely on HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on installations that initdb'd on a version after commit 37484ad2a (version 9.4+). However, I'm not sure why what you propose here would be safe when even the raw xmin happens to be FrozenTransactionId. Are you sure that that's truly race-free?

If it's really true that we only need to check for FrozenTransactionId on 9.3, why not just do that on all versions, and never bother with HeapTupleHeaderGetRawXmin()? ("Sheer paranoia" is a valid answer; I just want us to be clear on the reasoning.) Obviously any race would have a ridiculously tiny window, but it's not obvious why this protocol would be completely race-free (in the event of a FrozenTransactionId raw xmin).

--
Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> >     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> >     break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
>
> I independently arrived at the same conclusion. Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own one-liner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes? It does change with freezing.
>
> /*
>  * If xmin isn't what we're expecting, the slot must have been
>  * recycled and reused for an unrelated tuple. This implies that
>  * the latest version of the row was deleted, so we need do
>  * nothing. (Should be safe to examine xmin without getting
>  * buffer's content lock, since xmin never changes in an existing
>  * tuple.)
>  */
> if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
>                          priorXmax))

Agreed. That's not good.

-- Michael
Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen's fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Once nearly all 250 clients have done their updates and everybody is waiting to vacuum (which one by one will take a while), I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.

On 10/4/17, 6:31 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:

    On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
    > Wong, Yi Wen wrote:
    >> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
    >>
    >> > if (TransactionIdIsValid(priorXmax) &&
    >> >     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
    >> >     break;
    >>
    >> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
    >
    > I independently arrived at the same conclusion. Since I was trying with
    > 9.3, the patch differs -- in the old version we must explicitly test
    > for the FrozenTransactionId value, instead of using GetRawXmin.
    > Attached is the patch I'm using, and my own one-liner test (pretty much
    > the same I posted earlier) seems to survive dozens of iterations without
    > showing any problem in REINDEX.

    Confirmed, the problem goes away with this patch on 9.3.

    > This patch is incomplete, since I think there are other places that need
    > to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
    > Of course, for 9.4 and onwards we need to patch like you described.

    I have just done a lookup of the source code, and here is an exhaustive list of things in need of surgery:
    - heap_hot_search_buffer
    - heap_get_latest_tid
    - heap_lock_updated_tuple_rec
    - heap_prune_chain
    - heap_get_root_tuples
    - rewrite_heap_tuple
    - EvalPlanQualFetch (twice)

    > This bit in EvalPlanQualFetch caught my attention... why is it saying
    > xmin never changes? It does change with freezing.
    >
    > /*
    >  * If xmin isn't what we're expecting, the slot must have been
    >  * recycled and reused for an unrelated tuple. This implies that
    >  * the latest version of the row was deleted, so we need do
    >  * nothing. (Should be safe to examine xmin without getting
    >  * buffer's content lock, since xmin never changes in an existing
    >  * tuple.)
    >  */
    > if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
    >                          priorXmax))

    Agreed. That's not good.

    -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan <hexpert@amazon.com> wrote:
> Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen's fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Interesting. Which version did you test? Only 9.6?

> Once nearly all 250 clients have done their updates and everybody is waiting to vacuum (which one by one will take a while), I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.

Yeah, that's more or less what I have been doing. My tests involve using your initial script with way more sessions triggering lock.sql, minus the kill-9 portion (good idea actually). I can of course see the sessions queuing for VACUUM, still I cannot see duplicated rows, even if I headshot Postgres in the middle of the VACUUM waiting queue. Note that I have just tested Alvaro's patch on 9.3.

-- Michael
Wood, Dan wrote:
> Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen's fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Good idea. You can achieve a similar effect by adding a filler column, and reducing fillfactor.

> Once nearly all 250 clients have done their updates and everybody is
> waiting to vacuum which one by one will take a while I usually just
> “pkill -9 psql”. After that I have many duplicate “id=3” rows.

Odd ...

> On top of that I think we might have a lock leak. After the pkill I
> tried to rerun setup.sql to drop/create the table and it hangs. I see
> an autovacuum process starting and exiting every couple of seconds.
> Only by killing and restarting PG can I drop the table.

Please do try to figure this one out. It'd be a separate problem, worthy of its own thread.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Geoghegan wrote:
> As you know, on version 9.4+, as of commit 37484ad2a, we decided that
> we are "largely ignoring the value to which it [xmin] is set". The
> expectation became that raw xmin is available after freezing, but
> mostly for forensic purposes. I think Alvaro should now memorialize
> the idea that its value is actually critical in some place
> (htup_details.h?).

I'd love to be certain that we preserve the original value in all cases.

> On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > I independently arrived at the same conclusion. Since I was trying with
> > 9.3, the patch differs -- in the old version we must explicitly test
> > for the FrozenTransactionId value, instead of using GetRawXmin.
>
> Obviously you're going to have to be prepared for a raw xmin of
> FrozenTransactionId, even on 9.4+, due to pg_upgrade.

Hmm. It would be good to be able to get rid of that case, actually, because I now think it's less safe (see below). At any rate, I was thinking of a new routine to encapsulate the logic:

/*
 * Check the tuple XMIN against prior XMAX, if any
 */
if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, HeapTupleHeaderGetXmin(htup)))
    break;

where ..XmaxMatchesXmin would know about checking for possibly frozen tuples.

> I can see why it
> would be safe (or at least no more dangerous) to rely on
> HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
> installations that initdb'd on a version after commit 37484ad2a
> (version 9.4+). However, I'm not sure why what you propose here would
> be safe when even raw xmin happens to be FrozenTransactionId. Are you
> sure that that's truly race-free? If it's really true that we only
> need to check for FrozenTransactionId on 9.3, why not just do that on
> all versions, and never bother with HeapTupleHeaderGetRawXmin()?
> ("Sheer paranoia" is a valid answer; I just want us to be clear on the
> reasoning.)

I think the RawXmin is the better mechanism.
I'm not absolutely certain that the window is completely closed in 9.3; as I understand things, it is possible for transaction A to prune an aborted heap-only tuple, then transaction B to insert a frozen tuple in the same location, then transaction C follows a link to the HOT tuple that was pruned. I think we'd end up considering that the new frozen tuple is part of the chain, which is wrong ... In 9.4 we can compare the real xmin. I hope I am proved wrong about this, because if not, I think we're looking at an essentially unsolvable problem in 9.3.

> Obviously any race would have a ridiculously tiny window, but it's not
> obvious why this protocol would be completely race-free (in the event
> of a FrozenTransactionId raw xmin).

As far as I understand, this problem only emerges if one part of a HOT chain reaches the min freeze age while another part of the same chain is still visible to some running transaction. It is particularly noticeable in our current test case because we use a min freeze age of 0, with many concurrent sessions modifying the same page. What this says to me is that VACUUM FREEZE is mildly dangerous when there's lots of highly concurrent HOT UPDATE activity.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I think this is the patch for 9.3. I ran the test a few hundred times (with some additional changes such as randomly having an update inside a savepoint that's randomly aborted, randomly aborting the transaction, randomly skipping the for key share lock, randomly sleeping at a few points; and also adding filler columns, reducing fillfactor and using 5, 50, 180, 250, 500 sessions after verifying that it causes the tuples to stay in the same page or migrate to later pages). The final REINDEX has never complained again about failing to find the root tuple. I hope it's good now.

The attached patch needs a few small tweaks, such as improving commentary in the new function, maybe turn it into a macro (otherwise I think it could be bad for performance; I'd like a static func but not sure those are readily available in 9.3), change the XID comparison to use the appropriate macro rather than ==, and such.

Regarding changes of xmin/xmax comparison, I also checked manually the spots I thought should be modified and later double-checked against the list that Michael posted. It's a match, except for rewriteheap.c which I cannot make heads or tails about. (I think it's rather unfortunate that it sticks a tuple's Xmax into a field that's called Xmin, but let's put that aside). Maybe there's a problem here, maybe there isn't.

I'm now going to forward-port this to 9.4.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Thu, Oct 5, 2017 at 4:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> At any rate, I was thinking of a new routine to encapsulate the logic,
>
> /*
>  * Check the tuple XMIN against prior XMAX, if any
>  */
> if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, HeapTupleHeaderGetXmin(htup)))
>     break;
>
> where ..XmaxMatchesXmin would know about checking for possibly frozen
> tuples.

Makes sense.

>> I can see why it
>> would be safe (or at least no more dangerous) to rely on
>> HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
>> installations that initdb'd on a version after commit 37484ad2a
>> (version 9.4+). However, I'm not sure why what you propose here would
>> be safe when even raw xmin happens to be FrozenTransactionId. Are you
>> sure that that's truly race-free? If it's really true that we only
>> need to check for FrozenTransactionId on 9.3, why not just do that on
>> all versions, and never bother with HeapTupleHeaderGetRawXmin()?
>> ("Sheer paranoia" is a valid answer; I just want us to be clear on the
>> reasoning.)
>
> I think the RawXmin is the better mechanism. I'm not absolutely certain
> that the window is completely closed in 9.3; as I understand things, it
> is possible for transaction A to prune an aborted heap-only tuple, then
> transaction B to insert a frozen tuple in the same location, then
> transaction C follows a link to the HOT tuple that was pruned. I think we'd
> end up considering that the new frozen tuple is part of the chain, which
> is wrong ... In 9.4 we can compare the real xmin.

Good point: the race doesn't exist on 9.4+ with pg_upgrade from 9.3, when raw xmin happens to be FrozenTransactionId, because there can be no new tuples that have that property. This possible race is strictly a 9.3 issue. (You have to deal with a raw xmin of FrozenTransactionId on 9.4+, but there are no race conditions, because affected tuples are all pre-pg_upgrade tuples -- they were frozen months ago, not microseconds ago.)
> I hope I am proved wrong about this, because if not, I think we're
> looking at an essentially unsolvable problem in 9.3.

Well, as noted within README.HOT, the xmin/xmax matching did not appear in earlier versions of the original HOT patch, because it was thought that holding a pin of the buffer was a sufficient interlock against concurrent line pointer recycling (pruning must have a buffer cleanup lock). Clearly it is more robust to match xmin to xmax, but is there actually any evidence that it was really necessary at all (for single-page HOT chain traversals) when HOT went in in 2007, or that it has since become necessary? heap_page_prune()'s caller has to have a buffer cleanup lock.

I'm certainly not advocating removing the xmin/xmax matching within heap_prune_chain() on 9.3. However, it may be acceptable to rely on holding a buffer cleanup lock within heap_prune_chain() on 9.3, just for the rare/theoretical cases where the FrozenTransactionId raw xmin ambiguity means that xmin/xmax matching alone might not be enough. As you say, it seems unsolvable through looking at state on the page directly on 9.3, so there may be no other way. And, that's still strictly better than what we have today.

> As far as I understand, this problem only emerges if one part of a HOT
> chain reaches the min freeze age while another part of the same chain is
> still visible to some running transaction. It is particularly
> noticeable in our current test case because we use a min freeze age of
> 0, with many concurrent sessions modifying the same page. What this says
> to me is that VACUUM FREEZE is mildly dangerous when there's lots of
> highly concurrent HOT UPDATE activity.

I'm not sure what you mean here. It is dangerous right now, because there is a bug, but we can squash the bug.

-- Peter Geoghegan
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.

I would prefer to focus on either latest 9X or 11dev. Does Alvaro’s patch presume any of the other patch to set COMMITTED in the freeze code?

On 10/4/17, 7:17 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:

    On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan <hexpert@amazon.com> wrote:
    > Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen's fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

    Interesting. Which version did you test? Only 9.6?

    > Once nearly all 250 clients have done their updates and everybody is waiting to vacuum (which one by one will take a while), I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.

    Yeah, that's more or less what I have been doing. My tests involve using your initial script with way more sessions triggering lock.sql, minus the kill-9 portion (good idea actually). I can of course see the sessions queuing for VACUUM, still I cannot see duplicated rows, even if I headshot Postgres in the middle of the VACUUM waiting queue. Note that I have just tested Alvaro's patch on 9.3.

    -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I think this is the patch for 9.3. I ran the test a few hundred times
> (with some additional changes such as randomly having an update inside a
> savepoint that's randomly aborted, randomly aborting the transaction,
> randomly skipping the for key share lock, randomly sleeping at a few
> points; and also adding filler columns, reducing fillfactor and using
> 5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
> to stay in the same page or migrate to later pages). The final REINDEX
> has never complained again about failing to find the root tuple. I hope
> it's good now.

I have looked and played with your patch as well for a couple of hours, and did not notice any failures again. The structure of the pages looked sane as well; I could see also with pageinspect a correct HOT chain using the first set of tests provided on this thread.

> The attached patch needs a few small tweaks, such as improving
> commentary in the new function, maybe turn it into a macro (otherwise I
> think it could be bad for performance; I'd like a static func but not
> sure those are readily available in 9.3), change the XID comparison to
> use the appropriate macro rather than ==, and such.

Yeah, that would be better.

> Regarding changes of xmin/xmax comparison, I also checked manually the
> spots I thought should be modified and later double-checked against the
> list that Michael posted.

Thanks. I can see that your patch is filling the holes.

> It's a match, except for rewriteheap.c which
> I cannot make heads or tails about. (I think it's rather unfortunate
> that it sticks a tuple's Xmax into a field that's called Xmin, but let's
> put that aside). Maybe there's a problem here, maybe there isn't.

rewrite_heap_tuple is used only by CLUSTER, and the oldest new tuples are frozen before being rewritten. So this looks safe to me in the end.

> I'm now going to forward-port this to 9.4.
+   /*
+    * If the xmax of the old tuple is identical to the xmin of the new one,
+    * it's a match.
+    */
+   if (xmax == xmin)
+       return true;

I would use TransactionIdEquals() here, to remember once you switch that to a macro.

+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
[...]
+   /*
+    * We actually don't know if there's a match, but if the previous tuple
+    * was frozen, we cannot really rely on a perfect match.
+    */

-- Michael
Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> +   /*
> +    * If the xmax of the old tuple is identical to the xmin of the new one,
> +    * it's a match.
> +    */
> +   if (xmax == xmin)
> +       return true;
> I would use TransactionIdEquals() here, to remember once you switch
> that to a macro.

I've had second thoughts about the macro thing -- for now I'm keeping it a function, actually.

> +/*
> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
> + * taking into account that the Xmin might have been frozen.
> + */
> [...]
> +   /*
> +    * We actually don't know if there's a match, but if the previous tuple
> +    * was frozen, we cannot really rely on a perfect match.
> +    */

I don't know what you had in mind here, but I tweaked the 9.3 version so that it now looks like this:

/*
 * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
 *
 * Given the new version of a tuple after some update, verify whether the
 * given Xmax (corresponding to the previous version) matches the tuple's
 * Xmin, taking into account that the Xmin might have been frozen after the
 * update.
 */
bool
HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(htup);

    /*
     * If the xmax of the old tuple is identical to the xmin of the new one,
     * it's a match.
     */
    if (TransactionIdEquals(xmax, xmin))
        return true;

    /*
     * When a tuple is frozen, the original Xmin is lost, but we know it's a
     * committed transaction.  So unless the Xmax is InvalidXid, we don't
     * know for certain that there is a match, but there may be one; and we
     * must return true so that a HOT chain that is half-frozen can be walked
     * correctly.
     */
    if (TransactionIdEquals(xmin, FrozenTransactionId) &&
        TransactionIdIsValid(xmax))
        return true;

    return false;
}

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Wood, Dan wrote:
> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
>
> I would prefer to focus on either latest 9X or 11dev.

I tested my patch on 9.4 and 9.5 today and it seems to close the problem (with the patch, I waited 10x as many iterations as it took for the problem to occur ~10 times without the patch), but I can reproduce a problem in 9.6 with my patch installed. There must be something new in 9.6 that is causing the problem to reappear.

> Does Alvaro’s patch presume any of the other patch to set COMMITTED in
> the freeze code?

I don't know what you mean. Here is the 9.6 version of my patch. Note that HEAP_XMIN_FROZEN (which uses the XMIN_COMMITTED bit as I recall) was introduced in 9.4.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Fri, Oct 6, 2017 at 7:57 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Michael Paquier wrote:
>> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> +/*
>> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
>> + * taking into account that the Xmin might have been frozen.
>> + */
>> [...]
>> +   /*
>> +    * We actually don't know if there's a match, but if the previous tuple
>> +    * was frozen, we cannot really rely on a perfect match.
>> +    */
>
> I don't know what you had in mind here,

Impossible to know if I don't actually send the contents :)

> but I tweaked the 9.3 version so that it now looks like this:

I wanted to mention that the comments could be reworked. And forgot to suggest some.

> /*
>  * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
>  *
>  * Given the new version of a tuple after some update, verify whether the
>  * given Xmax (corresponding to the previous version) matches the tuple's
>  * Xmin, taking into account that the Xmin might have been frozen after the
>  * update.
>  */
> bool
> HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
> {
>     TransactionId xmin = HeapTupleHeaderGetXmin(htup);
>
>     /*
>      * If the xmax of the old tuple is identical to the xmin of the new one,
>      * it's a match.
>      */
>     if (TransactionIdEquals(xmax, xmin))
>         return true;
>
>     /*
>      * When a tuple is frozen, the original Xmin is lost, but we know it's a
>      * committed transaction.  So unless the Xmax is InvalidXid, we don't
>      * know for certain that there is a match, but there may be one; and we
>      * must return true so that a HOT chain that is half-frozen can be walked
>      * correctly.
>      */
>     if (TransactionIdEquals(xmin, FrozenTransactionId) &&
>         TransactionIdIsValid(xmax))
>         return true;
>
>     return false;
> }

Those are clearly improvements.
-- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Wood, Dan wrote:
>> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
>>
>> I would prefer to focus on either latest 9X or 11dev.
>
> I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> (with the patch, I waited 10x as many iterations as it took for the
> problem to occur ~10 times without the patch), but I can reproduce a
> problem in 9.6 with my patch installed. There must be something new in
> 9.6 that is causing the problem to reappear.

The freeze visibility map has been introduced in 9.6... There could be interactions on this side.

-- Michael
Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > (with the patch, I waited 10x as many iterations as it took for the
> > problem to occur ~10 times without the patch), but I can reproduce a
> > problem in 9.6 with my patch installed. There must be something new in
> > 9.6 that is causing the problem to reappear.
>
> The freeze visibility map has been introduced in 9.6... There could be
> interactions on this side.

Ah, thanks for the tip. I hope the authors of that can do the gruntwork of researching this problem there, then. I'll go commit what I have now.

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Fri, Oct 6, 2017 at 10:57 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Ah, thanks for the tip. I hope the authors of that can do the gruntwork
> of researching this problem there, then.

I have some stuff using 9.6 extensively, so like Dan I think I'll chime in anyway. Not before Tuesday though, long weekend in Japan ahead.

> I'll go commit what I have now.

As far as I saw this set definitely improves the situation.

-- Michael
Robert,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Michael Paquier wrote:
> > The freeze visibility map has been introduced in 9.6... There could be
> > interactions on this side.
>
> Ah, thanks for the tip. I hope the authors of that can do the gruntwork
> of researching this problem there, then. I'll go commit what I have
> now.

I don't doubt you're watching this thread too, but just to be 110% sure that we don't end up with the November releases still having this issue, I'm adding you to the CC on this thread as the one who did the freeze visibility map work. Depending on hope here is a bit too squishy for me when we're talking about corruption issues.

Thanks!

Stephen
By the way, I still wonder if there's any way for a new tuple to get inserted in the place where a HOT redirect would be pointing to, and have it be marked as Frozen, where the old redirect contains a non-invalid Xmax. I tried to think of a way for that to happen, but couldn't think of anything. What I imagine is a sequence like this:

1. insert a tuple
2. HOT-update a tuple
3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
4. start transaction
5. HOT-update the tuple again, creating HOT in lp 3
6. abort transaction (leaving aborted update in lp 3)
7. somehow remove tuple from lp 3, make slot available
8. new transaction comes along, inserts tuple in lp 3
9. somebody freezes tuple in lp 3 (???)

Then we follow the HOT chain, see that Xmin=2 in lp 3 and conclude that the tuple is part of the chain because of an xid "match". Basically from step 7 onwards I don't think this is possible, but maybe I'm just blind.

Maybe we can forestall the problem by checking whether the Xmax TransactionIdIsCurrentTransaction || TransactionIdDidCommit (or some variant thereof). This would be very slow but safer; and in 9.4 and up we'd only need to do it if the xmin value is actually FrozenXid which should be rare (only in pages upgraded from 9.3).

-- Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Wood, Dan wrote: >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. >> >> I would prefer to focus on either latest 9X or 11dev. > > I tested my patch on 9.4 and 9.5 today and it seems to close the problem > (with the patch, I waited 10x as many iterations as it took for the > problem to occur ~10 times without the patch), but I can reproduce a > problem in 9.6 with my patch installed. There must be something new in > 9.6 that is causing the problem to reappear. What problem persists? The original one (or, at least, the original symptom of pruning HOT chains incorrectly)? If that's what you mean, I wouldn't be so quick to assume that it's the freeze map. -- Peter Geoghegan
Peter Geoghegan wrote: > On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Wood, Dan wrote: > >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. > >> > >> I would prefer to focus on either latest 9X or 11dev. > > > > I tested my patch on 9.4 and 9.5 today and it seems to close the problem > > (with the patch, I waited 10x as many iterations as it took for the > > problem to occur ~10 times without the patch), but I can reproduce a > > problem in 9.6 with my patch installed. There must be something new in > > 9.6 that is causing the problem to reappear. > > What problem persists? The original one (or, at least, the original > symptom of pruning HOT chains incorrectly)? If that's what you mean, I > wouldn't be so quick to assume that it's the freeze map. I can tell that, in 9.6, REINDEX still reports the error we saw in earlier releases, after some of the runs of my reproducer scripts. I'm unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan originally reported anywhere, either. I don't know if it's really the freeze map at fault or something else. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 7:59 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> By the way, I still wonder if there's any way for a new tuple to get
> inserted in the place where a HOT redirect would be pointing to, and
> have it be marked as Frozen, where the old redirect contains a
> non-invalid Xmax. I tried to think of a way for that to happen, but
> couldn't think of anything.
>
> What I imagine is a sequence like this:
>
> 1. insert a tuple
> 2. HOT-update a tuple
> 3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
> 4. start transaction
> 5. HOT-update the tuple again, creating HOT in lp 3
> 6. abort transaction (leaving aborted update in lp 3)
> 7. somehow remove tuple from lp 3, make slot available
> 8. new transaction comes along, inserts tuple in lp 3
> 9. somebody freezes tuple in lp3 (???)
>
> Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
> the tuple is part of the chain because of an xid "match".
>
> Basically from step 7 onwards I don't think this is possible, but maybe
> I'm just blind.

For the record, I also think that this is impossible, in part because pruning requires a cleanup buffer lock (and because HOT chains cannot span pages). I wouldn't say that I am 100% confident about this, though.

BTW, is this comment block that appears above heap_prepare_freeze_tuple() now obsolete, following 20b65522 (and maybe much earlier commits)?

 * NB: It is not enough to set hint bits to indicate something is
 * committed/invalid -- they might not be set on a standby, or after crash
 * recovery. We really need to remove old xids.
 */

We WAL-log setting hint bits during freezing now, iff tuple xmin is before the Xid cutoff and tuple is a heap-only tuple.

-- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 10:49 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I can tell that, in 9.6, REINDEX still reports the error we saw in > earlier releases, after some of the runs of my reproducer scripts. I'm > unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan > originally reported anywhere, either. You mean the enhanced stress-test that varied fillfactor, added filler columns, and so on [1]? Can you post that to the list, please? I think that several of us would like to have a reproducible test case. > I don't know if it's really the freeze map at fault or something else. Ideally, it would be possible to effectively disable the new freeze map stuff in a minimal way, for testing purposes. Perhaps the authors of that patch, CC'd, can suggest a way to do that. If I had to guess, I'd say that it's just as likely that the issue is only reproducible on 9.6 because of the enhancements added in that release that improved buffer pinning (the use of atomic ops to pin buffers, moving buffer content locks into buffer descriptors, etc). It was already a bit tricky to get the problem that remained after 20b6552 but before today's a5736bf to reproduce with Dan's script. It often took me 4 or 5 attempts. (I wonder what it looks like with your enhanced version of that script -- the one that I just asked about.) It seems possible that we've merely reduced the window for the race to the point that it's practically (though not theoretically) impossible to reproduce the problem on versions < 9.6, though not on 9.6+. Applying Occam's razor, the problem doesn't seem particularly likely to be in the freeze map stuff, which isn't actually all that closely related. [1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql -- Peter Geoghegan
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg@bowt.ie> wrote: >> I don't know if it's really the freeze map at fault or something else. > > Ideally, it would be possible to effectively disable the new freeze > map stuff in a minimal way, for testing purposes. Perhaps the authors > of that patch, CC'd, can suggest a way to do that. Actually, the simplest thing might be to just use pg_visibility's pg_check_frozen() to check that the visibility/freeze map accurately summarizes the all-frozen status of tuples in the heap. If that doesn't indicate that there is corruption, we can be fairly confident that the problem is elsewhere. The metadata in the visibility/freeze map should be accurate when a bit is set to indicate that an entire heap page is all-frozen (or, separately, all-visible). We can hardly expect it to have better information than the authoritative source of truth, the heap itself. The more I think about it, the more I tend to doubt that the remaining problems are with the freeze map. If the freeze map was wrong, and incorrectly said that a page was all-frozen, then surely the outward symptoms would take a long time to show up, as they always do when we accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM that that's the only meaningful way that the freeze map can be wrong -- it only promises to be accurate when it says that no further freezing is needed for a page/bit. -- Peter Geoghegan
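The one-directional promise described here -- a set all-frozen bit must never cover an unfrozen tuple, while a clear bit claims nothing -- can be modeled in a few lines. This is a toy model, not the actual pg_visibility code; the `page_model` struct is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one heap page plus its visibility/freeze map bit. */
struct page_model {
    bool vm_all_frozen;     /* the map's all-frozen bit for this page */
    bool tuple_frozen[4];   /* per-tuple frozen status on the heap page */
    size_t ntuples;
};

/* Returns true if the map is consistent with the heap: no tuple may
 * contradict a set all-frozen bit.  A clear bit promises nothing, so
 * it is always consistent -- this is the invariant pg_check_frozen()
 * verifies for real relations. */
static bool
vm_consistent(const struct page_model *p)
{
    if (!p->vm_all_frozen)
        return true;            /* clear bit: nothing to check */
    for (size_t i = 0; i < p->ntuples; i++)
        if (!p->tuple_frozen[i])
            return false;       /* set bit over an unfrozen tuple */
    return true;
}
```

This also illustrates why a wrongly-set bit surfaces only much later: the unfrozen tuple is simply skipped by vacuum until its xid crosses the relfrozenxid cutoff.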
>On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg@bowt.ie> wrote: >>> I don't know if it's really the freeze map at fault or something else. >> >> Ideally, it would be possible to effectively disable the new freeze >> map stuff in a minimal way, for testing purposes. Perhaps the authors >> of that patch, CC'd, can suggest a way to do that. >Actually, the simplest thing might be to just use pg_visibility's >pg_check_frozen() to check that the visibility/freeze map accurately >summarizes the all-frozen status of tuples in the heap. If that >doesn't indicate that there is corruption, we can be fairly confident >that the problem is elsewhere. The metadata in the visibility/freeze >map should be accurate when a bit is set to indicate that an entire >heap page is all-frozen (or, separately, all-visible). We can hardly >expect it to have better information than the authoritative source of >truth, the heap itself. >The more I think about it, the more I tend to doubt that the remaining >problems are with the freeze map. If the freeze map was wrong, and >incorrectly said that a page was all-frozen, then surely the outward >symptoms would take a long time to show up, as they always do when we >accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM >that that's the only meaningful way that the freeze map can be wrong >-- it only promises to be accurate when it says that no further >freezing is needed for a page/bit.

Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6. None of the all-frozen or all-visible bits are necessarily set in problematic pages.

ERROR: failed to find parent tuple for heap-only tuple at (0,182) in table "accounts"

 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | f
(1 row)

Even when the bits were set, I haven't found issues with the pg_check_xxx functions in the dozens of times I've run them.
postgres=# select * from pg_visibility('accounts'::regclass);
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | f
     1 | t           | t          | t
(2 rows)

postgres=# select pg_check_visible('accounts'::regclass);
 pg_check_visible
------------------
(0 rows)

postgres=# select pg_check_frozen('accounts'::regclass);
 pg_check_frozen
-----------------
(0 rows)

Yi Wen
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen <yiwong@amazon.com> wrote: > Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6. > None of the all-frozen or all-visible bits are necessarily set in problematic pages. Since this happened yesterday, I assume it was with an unfixed version? As you must have seen, Alvaro said he has a variant of Dan's original script that demonstrates that a problem remains, at least on 9.6+, even with today's fix. I think it's the stress-test that plays with fillfactor, many clients, etc [1]. I've tried to independently reproduce the problem on the master branch's current tip, with today's new fix, but cannot break things despite trying many variations. I cannot reproduce the problem that Alvaro still sees. I'll have to wait until Alvaro posts his repro to the list before commenting further, which I assume he'll post as soon as he can. There doesn't seem to be much point in not waiting for that. [1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql -- Peter Geoghegan
Peter Geoghegan wrote: > On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen <yiwong@amazon.com> wrote: > > Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6. > > None of the all-frozen or all-visible bits are necessarily set in problematic pages. > > Since this happened yesterday, I assume it was with an unfixed version? > > As you must have seen, Alvaro said he has a variant of Dan's original > script that demonstrates that a problem remains, at least on 9.6+, > even with today's fix. I think it's the stress-test that plays with > fillfactor, many clients, etc [1]. I just execute setup.sql once and then run this shell command,

while :; do
  psql -e -P pager=off -f ./repro.sql
  for i in `seq 1 5`; do
    psql -P pager=off -e --no-psqlrc -f ./lock.sql &
  done
  wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
  psql -P pager=off -e --no-psqlrc -f ./report.sql
  echo "done"
done

Note that you need to use pg10's psql because of the \if lines in lock.sql. For some runs I change the values to compare random() to, and originally the commented-out section in lock.sql was not commented out, but I'm fairly sure the failures I saw were with this version. Also, I sometimes change the 5 in the `seq` command to higher values (180, 250). I didn't find the filler column to have any effect, so I took that out.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> As you must have seen, Alvaro said he has a variant of Dan's original >> script that demonstrates that a problem remains, at least on 9.6+, >> even with today's fix. I think it's the stress-test that plays with >> fillfactor, many clients, etc [1]. > > I just execute setup.sql once and then run this shell command,
>
> while :; do
>   psql -e -P pager=off -f ./repro.sql
>   for i in `seq 1 5`; do
>     psql -P pager=off -e --no-psqlrc -f ./lock.sql &
>   done
>   wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
>   psql -P pager=off -e --no-psqlrc -f ./report.sql
>   echo "done"
> done

I cannot reproduce the problem on my personal machine using this script/stress-test. I tried to do so on the master branch git tip. This reinforces the theory that there is some timing sensitivity, because the remaining race condition is very narrow.

-- Peter Geoghegan
Peter Geoghegan wrote: > On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> As you must have seen, Alvaro said he has a variant of Dan's original > >> script that demonstrates that a problem remains, at least on 9.6+, > >> even with today's fix. I think it's the stress-test that plays with > >> fillfactor, many clients, etc [1]. > > > > I just execute setup.sql once and then run this shell command,
> >
> > while :; do
> >   psql -e -P pager=off -f ./repro.sql
> >   for i in `seq 1 5`; do
> >     psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> >   done
> >   wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> >   psql -P pager=off -e --no-psqlrc -f ./report.sql
> >   echo "done"
> > done
>
> I cannot reproduce the problem on my personal machine using this > script/stress-test. I tried to do so on the master branch git tip. > This reinforces the theory that there is some timing sensitivity, > because the remaining race condition is very narrow.

Hmm, I think I added a random sleep (max. 100ms) right after the HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that makes the race easier to hit.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Peter Geoghegan
Date:
On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Hmm, I think I added a random sleep (max. 100ms) right after the > HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that > makes the race easier to hit. I still cannot reproduce. Perhaps you can be more specific? -- Peter Geoghegan
I’m unclear on what is being repro’d in 9.6. Are you getting the duplicate rows problem or just the reindex problem? Are you testing with asserts enabled (I’m not)?

If you are getting the dup rows consider the code in the block in heapam.c that starts with the comment “replace multi by update xid”.

When I repro this I find that MultiXactIdGetUpdateXid() returns 0. There is an updater in the multixact array however the status is MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I assume this is a preliminary status before the following row in the hot chain has its multixact set to NoKeyUpdate.

Since a 0 is returned this does precede cutoff_xid and TransactionIdDidCommit(0) will return false. This ends up aborting the multixact on the row even though the real xid is committed. This sets XMAX to 0 and that row becomes visible as one of the dups. Interestingly the real xid of the updater is 122944 and the cutoff_xid is 122945.

I’m still debugging but I start late so I’m passing this incomplete info along now.

On 10/7/17, 4:25 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:

    Peter Geoghegan wrote: > On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> As you must have seen, Alvaro said he has a variant of Dan's original > >> script that demonstrates that a problem remains, at least on 9.6+, > >> even with today's fix. I think it's the stress-test that plays with > >> fillfactor, many clients, etc [1]. > > > > I just execute setup.sql once and then run this shell command, > > > > while :; do > > psql -e -P pager=off -f ./repro.sql > > for i in `seq 1 5`; do > > psql -P pager=off -e --no-psqlrc -f ./lock.sql & > > done > > wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql > > psql -P pager=off -e --no-psqlrc -f ./report.sql > > echo "done" > > done > > I cannot reproduce the problem on my personal machine using this > script/stress-test. I tried to do so on the master branch git tip. > This reinforces the theory that there is some timing sensitivity, > because the remaining race condition is very narrow.

    Hmm, I think I added a random sleep (max. 100ms) right after the HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that makes the race easier to hit.

    -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
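Dan's observation condenses into a toy model of the freeze decision. This is not the real FreezeMultiXactId() code: `demo_did_commit`, the simplified xid comparison, and the xid values (taken from the numbers reported above) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Simplified circular-xid comparison, modeled loosely after the real
 * TransactionIdPrecedes() (minus the special cases for permanent xids). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Stand-in for TransactionIdDidCommit: in the reported run, the real
 * updater xid 122944 had committed. */
static bool
demo_did_commit(TransactionId xid)
{
    return xid == 122944;
}

/* Toy model of the decision Dan describes: which xmax survives
 * freezing?  If MultiXactIdGetUpdateXid() returned InvalidTransactionId
 * (0) -- because the only updater-ish member carried the locker-style
 * ForNoKeyUpdate status -- then 0 "precedes" cutoff_xid, DidCommit(0)
 * is false, and xmax is cleared, resurrecting the dead row as a dup. */
static TransactionId
frozen_xmax_unguarded(TransactionId update_xid, TransactionId cutoff_xid)
{
    if (xid_precedes(update_xid, cutoff_xid) &&
        !demo_did_commit(update_xid))
        return InvalidTransactionId;   /* treat update as aborted */
    return update_xid;
}
```

One plausible guard the model suggests is rejecting `update_xid == InvalidTransactionId` before ever consulting the commit status, since 0 means "no true updater", not "aborted updater"; where the actual repair belongs in the multixact freeze path is a separate question.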
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Mon, Oct 9, 2017 at 2:29 AM, Peter Geoghegan <pg@bowt.ie> wrote: > On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> Hmm, I think I added a random sleep (max. 100ms) right after the >> HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that >> makes the race easier to hit. > > I still cannot reproduce. Perhaps you can be more specific? I have been trying to reproduce things for a total of 4 hours, testing various variations of the proposed test cases (killed Postgres, changed fillfactor, manual sleep calls), but I am proving unable to see a regression as well. I would not think that the OS matters here; all my attempts were on macOS with assertions and debugging enabled. At least the code is now more stable, which is definitely a good thing. -- Michael
Wood, Dan wrote: > I’m unclear on what is being repro’d in 9.6. Are you getting the > duplicate rows problem or just the reindex problem? Are you testing > with asserts enabled (I’m not)? I was seeing just the reindex problem. I don't see any more dups. But I've tried to reproduce it afresh now, and let it run for a long time and nothing happened. Maybe I made a mistake last week and ran an unfixed version. I don't see any more problems now. > If you are getting the dup rows consider the code in the block in > heapam.c that starts with the comment “replace multi by update xid”. > > When I repro this I find that MultiXactIdGetUpdateXid() returns 0. > There is an updater in the multixact array however the status is > MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I > assume this is a preliminary status before the following row in the > hot chain has its multixact set to NoKeyUpdate. Yes, the "For" version is the locker version rather than the actual update. That lock is acquired by EvalPlanQual locking the row just before doing the update. I think GetUpdateXid has no reason to return such an Xid, since it's not an update. > Since a 0 is returned this does precede cutoff_xid and > TransactionIdDidCommit(0) will return false. This ends up aborting > the multixact on the row even though the real xid is committed. This > sets XMAX to 0 and that row becomes visible as one of the dups. > Interestingly the real xid of the updater is 122944 and the cutoff_xid > is 122945. I haven't seen this effect. Please keep us updated if you're able to verify corruption this way. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
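For reference, the locker-vs-updater distinction drawn here comes from the ordering of the multixact member statuses. The sketch below mirrors the enum and the update test as they appear in the 9.x sources (reproduced from memory, so treat it as a sketch rather than a verbatim copy of multixact.h):

```c
#include <assert.h>
#include <stdbool.h>

/* MultiXactStatus values in their source ordering: the four "For"
 * values are lock-only statuses; only the last two represent actual
 * updates. */
typedef enum
{
    MultiXactStatusForKeyShare = 0x00,
    MultiXactStatusForShare = 0x01,
    MultiXactStatusForNoKeyUpdate = 0x02,
    MultiXactStatusForUpdate = 0x03,
    MultiXactStatusNoKeyUpdate = 0x04,
    MultiXactStatusUpdate = 0x05
} MultiXactStatus;

/* Mirrors ISUPDATE_from_mxstatus(): anything past ForUpdate is a real
 * update.  This is why a member whose status is the locker-style
 * MultiXactStatusForNoKeyUpdate is not treated as the updater when
 * scanning the member array for an update xid. */
#define ISUPDATE_FROM_MXSTATUS(status) ((status) > MultiXactStatusForUpdate)
```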
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I was seeing just the reindex problem. I don't see any more dups. > > But I've tried to reproduce it afresh now, and let it run for a long > time and nothing happened. Maybe I made a mistake last week and > ran an unfixed version. I don't see any more problems now. Okay, so that's one more person joining this trend, making three with Peter and me. >> If you are getting the dup rows consider the code in the block in >> heapam.c that starts with the comment “replace multi by update xid”. >> >> When I repro this I find that MultiXactIdGetUpdateXid() returns 0. >> There is an updater in the multixact array however the status is >> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I >> assume this is a preliminary status before the following row in the >> hot chain has its multixact set to NoKeyUpdate. > > Yes, the "For" version is the locker version rather than the actual > update. That lock is acquired by EvalPlanQual locking the row just > before doing the update. I think GetUpdateXid has no reason to return > such an Xid, since it's not an update. > >> Since a 0 is returned this does precede cutoff_xid and >> TransactionIdDidCommit(0) will return false. This ends up aborting >> the multixact on the row even though the real xid is committed. This >> sets XMAX to 0 and that row becomes visible as one of the dups. >> Interestingly the real xid of the updater is 122944 and the cutoff_xid >> is 122945. > > I haven't seen this effect. Please keep us updated if you're able to > verify corruption this way. Me neither. It would be nice to not live long with such a sword of Damocles. -- Michael
I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid. No dups. No index corruption.

Thanks so much.

On 10/10/17, 7:25 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:

    On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I was seeing just the reindex problem. I don't see any more dups. > > But I've tried to reproduce it afresh now, and let it run for a long > time and nothing happened. Maybe I made a mistake last week and > ran an unfixed version. I don't see any more problems now. Okay, so that's one more person joining this trend, making three with Peter and me. >> If you are getting the dup rows consider the code in the block in >> heapam.c that starts with the comment “replace multi by update xid”. >> >> When I repro this I find that MultiXactIdGetUpdateXid() returns 0. >> There is an updater in the multixact array however the status is >> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I >> assume this is a preliminary status before the following row in the >> hot chain has its multixact set to NoKeyUpdate. > > Yes, the "For" version is the locker version rather than the actual > update. That lock is acquired by EvalPlanQual locking the row just > before doing the update. I think GetUpdateXid has no reason to return > such an Xid, since it's not an update. > >> Since a 0 is returned this does precede cutoff_xid and >> TransactionIdDidCommit(0) will return false. This ends up aborting >> the multixact on the row even though the real xid is committed. This >> sets XMAX to 0 and that row becomes visible as one of the dups. >> Interestingly the real xid of the updater is 122944 and the cutoff_xid >> is 122945. > > I haven't seen this effect. Please keep us updated if you're able to > verify corruption this way. Me neither. It would be nice to not live long with such a sword of Damocles.

    -- Michael
Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
From
Michael Paquier
Date:
On Wed, Oct 11, 2017 at 11:31 AM, Wood, Dan <hexpert@amazon.com> wrote: > I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid. > No dups. No index corruption. > > Thanks so much. Nice to hear that! You guys seem to be doing extensive testing and actually reporting back about it, which is really nice to see. -- Michael