Re: HOT chain validation in verify_heapam() - Mailing list pgsql-hackers

From Himanshu Upadhyaya
Subject Re: HOT chain validation in verify_heapam()
Date
Msg-id CAPF61jC05EgtWmjA4F-dKsvU9+a0zv5SWumgBpkOWf+G_3FVPQ@mail.gmail.com
Whole thread Raw
In response to Re: HOT chain validation in verify_heapam()  (Andres Freund <andres@anarazel.de>)
Responses Re: HOT chain validation in verify_heapam()
List pgsql-hackers


On Thu, Nov 10, 2022 at 3:38 AM Andres Freund <andres@anarazel.de> wrote:

> +             }
> +
> +             /*
> +              * Loop over offset and populate predecessor array from all entries
> +              * that are present in successor array.
> +              */
> +             ctx.attnum = -1;
> +             for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
> +                      ctx.offnum = OffsetNumberNext(ctx.offnum))
> +             {
> +                     ItemId          curr_lp;
> +                     ItemId          next_lp;
> +                     HeapTupleHeader curr_htup;
> +                     HeapTupleHeader next_htup;
> +                     TransactionId curr_xmax;
> +                     TransactionId next_xmin;
> +
> +                     OffsetNumber nextoffnum = successor[ctx.offnum];
> +
> +                     curr_lp = PageGetItemId(ctx.page, ctx.offnum);

Why do we get the item when nextoffnum is 0?

Yes, right, I will move this call to PageGetItemId, just after the next "if" condition in the patch.

> +                     if (nextoffnum == 0 || !lp_valid[ctx.offnum] || !lp_valid[nextoffnum])
> +                     {
> +                             /*
> +                              * This is either the last updated tuple in the chain or a
> +                              * corruption raised for this tuple.
> +                              */

"or a corruption raised" isn't quite right grammatically.

will change to "This is either the last updated tuple in the chain or corruption has been raised for this tuple" 

> +                             continue;
> +                     }
> +                     if (ItemIdIsRedirected(curr_lp))
> +                     {
> +                             next_lp = PageGetItemId(ctx.page, nextoffnum);
> +                             if (ItemIdIsRedirected(next_lp))
> +                             {
> +                                     report_corruption(&ctx,
> +                                                                       psprintf("redirected line pointer pointing to another redirected line pointer at offset %u",
> +                                                                                        (unsigned) nextoffnum));
> +                                     continue;
> +                             }
> +                             next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
> +                             if (!HeapTupleHeaderIsHeapOnly(next_htup))
> +                             {
> +                                     report_corruption(&ctx,
> +                                                                       psprintf("redirected tuple at line pointer offset %u is not heap only tuple",
> +                                                                                        (unsigned) nextoffnum));
> +                             }
> +                             if ((next_htup->t_infomask & HEAP_UPDATED) == 0)
> +                             {
> +                                     report_corruption(&ctx,
> +                                                                       psprintf("redirected tuple at line pointer offset %u is not heap updated tuple",
> +                                                                                        (unsigned) nextoffnum));
> +                             }
> +                             continue;
> +                     }
> +
> +                     /*
> +                      * Add a line pointer offset to the predecessor array if xmax is
> +                      * matching with xmin of next tuple (reaching via its t_ctid).
> +                      * Prior to PostgreSQL 9.4, we actually changed the xmin to
> +                      * FrozenTransactionId

I'm doubtful it's a good idea to try to validate the 9.4 case. The likelihood
of getting that right seems low and I don't see us gaining much by even trying.


> so we must add offset to predecessor
> +                      * array(irrespective of xmax-xmin matching) if updated tuple xmin
> +                      * is frozen, so that we can later do validation related to frozen
> +                      * xmin. Raise corruption if we have two tuples having the same
> +                      * predecessor.
> +                      * We add the offset to the predecessor array irrespective of the
> +                      * transaction (t_xmin) status. We will do validation related to
> +                      * the transaction status (and also all other validations) when we
> +                      * loop over the predecessor array.
> +                      */
> +                     curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp);
> +                     curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup);
> +                     next_lp = PageGetItemId(ctx.page, nextoffnum);
> +                     next_htup = (HeapTupleHeader) PageGetItem(ctx.page, next_lp);
> +                     next_xmin = HeapTupleHeaderGetXmin(next_htup);
> +                     if (TransactionIdIsValid(curr_xmax) &&
> +                             (TransactionIdEquals(curr_xmax, next_xmin) ||
> +                              next_xmin == FrozenTransactionId))
> +                     {
> +                             if (predecessor[nextoffnum] != 0)
> +                             {
> +                                     report_corruption(&ctx,
> +                                                                       psprintf("updated version at offset %u is also the updated version of tuple at offset %u",
> +                                                                                        (unsigned) nextoffnum, (unsigned) predecessor[nextoffnum]));
> +                                     continue;

I doubt it is correct to enter this path with next_xmin ==
FrozenTransactionId. This is following a ctid chain that we normally wouldn't
follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin condition.

I don't immediately see what prevents the frozen tuple being from an entirely
different HOT chain than the two tuples pointing to it.



Prior to 9.4 we can have xmin updated with FrozenTransactionId but with 9.4 (or later) we set XMIN_FROZEN bit in t_infomask. if updated tuple is via prior of 9.4 then "TransactionIdEquals(curr_xmax, next_xmin)" will be false for Frozen Tuple.
The Intention of adding  "next_xmin == FrozenTransactionId" to the path is because we wanted to do validation around Frozen Tuple when we loop over predecessor array.

 I need to look at the isolation test in details to understand how this can provide false alarm and but if there is a valid case then we can remove logic of raising corruption related with Frozen Tuple?


> +             }
> +
> +             /* Loop over offsets and validate the data in the predecessor array. */
> +             for (OffsetNumber currentoffnum = FirstOffsetNumber; currentoffnum <= maxoff;
> +                      currentoffnum = OffsetNumberNext(currentoffnum))
> +             {
> +                     HeapTupleHeader pred_htup;
> +                     HeapTupleHeader curr_htup;
> +                     TransactionId pred_xmin;
> +                     TransactionId curr_xmin;
> +                     ItemId          pred_lp;
> +                     ItemId          curr_lp;
> +
> +                     ctx.offnum = predecessor[currentoffnum];
> +                     ctx.attnum = -1;
> +
> +                     if (ctx.offnum == 0)
> +                     {
> +                             /*
> +                              * Either the root of the chain or an xmin-aborted tuple from
> +                              * an abandoned portion of the HOT chain.
> +                              */

Hm - couldn't we check that the tuple could conceivably be at the root of a
chain? I.e. isn't HEAP_HOT_UPDATED? Or alternatively has an aborted xmin?

 
 I don't see a way to check if tuple is at the root of HOT chain because predecessor array will always be having either xmin from non-abandoned transaction or it will be zero. We can't differentiate root or tuple inserted via abandoned transaction.


> +                             continue;
> +                     }
> +
> +                     curr_lp = PageGetItemId(ctx.page, currentoffnum);
> +                     curr_htup = (HeapTupleHeader) PageGetItem(ctx.page, curr_lp);
> +                     curr_xmin = HeapTupleHeaderGetXmin(curr_htup);
> +
> +                     ctx.itemid = pred_lp = PageGetItemId(ctx.page, ctx.offnum);
> +                     pred_htup = (HeapTupleHeader) PageGetItem(ctx.page, pred_lp);
> +                     pred_xmin = HeapTupleHeaderGetXmin(pred_htup);
> +
> +                     /*
> +                      * If the predecessor's xmin is aborted or in progress, the
> +                      * current tuples xmin should be aborted or in progress
> +                      * respectively. Also both xmin's must be equal.
> +                      */
> +                     if (!TransactionIdEquals(pred_xmin, curr_xmin) &&
> +                             !TransactionIdDidCommit(pred_xmin))
> +                     {
> +                             report_corruption(&ctx,
> +                                                               psprintf("tuple with uncommitted xmin %u was updated to produce a tuple at offset %u with differing xmin %u",
> +                                                                                (unsigned) pred_xmin, (unsigned) currentoffnum, (unsigned) curr_xmin));

Is this necessarily true? What about a tuple that was inserted in a
subtransaction and then updated in another subtransaction of the same toplevel
transaction?


not sure if I am getting? I have tried with below test and don't see any issue,

‘postgres[14723]=#’drop table test2;
DROP TABLE
‘postgres[14723]=#’create table test2 (a int, b int primary key);
CREATE TABLE
‘postgres[14723]=#’insert into test2 values (1,1);
INSERT 0 1
‘postgres[14723]=#’BEGIN;
BEGIN
‘postgres[14723]=#*’update test2 set a =2 where a =1;
UPDATE 1
‘postgres[14723]=#*’savepoint s1;
SAVEPOINT
‘postgres[14723]=#*’update test2 set a =6;
UPDATE 1
‘postgres[14723]=#*’rollback to savepoint s1;
ROLLBACK
‘postgres[14723]=#*’update test2 set a =6;
UPDATE 1
‘postgres[14723]=#*’savepoint s2;
SAVEPOINT
‘postgres[14723]=#*’update test2 set a =7;
UPDATE 1
‘postgres[14723]=#*’end;
COMMIT
‘postgres[14723]=#’SELECT lp as tuple, t_xmin, t_xmax, t_field3 as t_cid, t_ctid,tuple_data_split('test2'::regclass, t_data, t_infomask, t_infomask2, t_bits), heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM heap_page_items(get_raw_page('test2', 0));
 tuple | t_xmin | t_xmax | t_cid | t_ctid |       tuple_data_split        |                         heap_tuple_infomask_flags                        
-------+--------+--------+-------+--------+-------------------------------+---------------------------------------------------------------------------
     1 |   1254 |   1255 |     0 | (0,2)  | {"\\x01000000","\\x01000000"} | ("{HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED}",{})
     2 |   1255 |   1257 |     1 | (0,4)  | {"\\x02000000","\\x01000000"} | ("{HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
     3 |   1256 |      0 |     1 | (0,3)  | {"\\x06000000","\\x01000000"} | ("{HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})
     4 |   1257 |   1258 |     2 | (0,5)  | {"\\x06000000","\\x01000000"} | ("{HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
     5 |   1258 |      0 |     3 | (0,5)  | {"\\x07000000","\\x01000000"} | ("{HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})
(5 rows)
 

> +                     }
> +
> +                     /*
> +                      * If the predecessor's xmin is not frozen, then current tuple's
> +                      * shouldn't be either.
> +                      */
> +                     if (pred_xmin != FrozenTransactionId && curr_xmin == FrozenTransactionId)
> +                     {
> +                             report_corruption(&ctx,
> +                                                               psprintf("unfrozen tuple was updated to produce a tuple at offset %u which is frozen",
> +                                                                                (unsigned) currentoffnum));
> +                     }

Can't we have a an update chain that is e.g.
xmin 10, xmax 5 -> xmin 5, xmax invalid

and a vacuum cutoff of 7? That'd preent the first tuple from being removed,
but would allow 5 to be frozen.

I think there were recent patches proposing we don't freeze in that case, but
we'll having done that in the past....


Not very sure about this, was trying with such case but found hard to reproduce this.


--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PoC] Let libpq reject unexpected authentication requests
Next
From: Michael Paquier
Date:
Subject: Re: Avoid overhead open-close indexes (catalog updates)