cannot freeze committed xmax - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject cannot freeze committed xmax
Date
Msg-id f4aa20ba-7793-31b9-28ac-e34205362535@postgrespro.ru
Responses Re: cannot freeze committed xmax
List pgsql-hackers
Hi hackers,

The following error was encountered by our customers:
They have a very large catalog (the pg_class relation is more than
30 GB), bloated by temporary relations.
When they try to vacuum it, the following error is reported:

vacuum full analyze pg_catalog.pg_class;
ERROR:  cannot freeze committed xmax 596099954

The following records are present in pg_class:

<Data> ------
 Item   1 -- Length:  229  Offset: 7936 (0x1f00)  Flags: NORMAL
  XMIN: 596098791  XMAX: 596099954  CID|XVAC: 1  OID: 930322390
  Block Id: 108700  linp Index: 17  Attributes: 33   Size: 32
  infomask: 0x290b (HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
  t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x7f
          [4]: 0x00

 Item   2 -- Length:  184  Offset: 7752 (0x1e48)  Flags: NORMAL
  XMIN: 596098791  XMAX: 596099954  CID|XVAC: 2  OID: 930322393
  Block Id: 108700  linp Index: 18  Attributes: 33   Size: 32
  infomask: 0x2909 (HASNULL|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
  t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x3f
          [4]: 0x00

 Item   3 -- Length:  184  Offset: 7568 (0x1d90)  Flags: NORMAL
  XMIN: 596098791  XMAX: 596099954  CID|XVAC: 3  OID: 930322395
  Block Id: 108700  linp Index: 19  Attributes: 33   Size: 32
  infomask: 0x2909 (HASNULL|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
  t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x3f
          [4]: 0x00
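
To double-check the infomask values in the dump, here is a minimal sketch that decodes a raw t_infomask into flag names. The bit values are the HEAP_* constants from PostgreSQL 11's access/htup_details.h; the helper name decode_infomask and the table layout are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit values as defined in PostgreSQL 11's access/htup_details.h. */
struct infomask_flag
{
    uint16_t    bit;
    const char *name;
};

static const struct infomask_flag infomask_flags[] = {
    {0x0001, "HASNULL"},
    {0x0002, "HASVARWIDTH"},
    {0x0004, "HASEXTERNAL"},
    {0x0008, "HASOID"},
    {0x0010, "XMAX_KEYSHR_LOCK"},
    {0x0020, "COMBOCID"},
    {0x0040, "XMAX_EXCL_LOCK"},
    {0x0080, "XMAX_LOCK_ONLY"},
    {0x0100, "XMIN_COMMITTED"},
    {0x0200, "XMIN_INVALID"},
    {0x0400, "XMAX_COMMITTED"},
    {0x0800, "XMAX_INVALID"},
    {0x1000, "XMAX_IS_MULTI"},
    {0x2000, "UPDATED"},
    {0x4000, "MOVED_OFF"},
    {0x8000, "MOVED_IN"},
};

/*
 * Hypothetical helper (not part of PostgreSQL): write the '|'-separated
 * names of all bits set in infomask into buf.  buflen must be at least 1.
 */
void
decode_infomask(uint16_t infomask, char *buf, size_t buflen)
{
    size_t      n = sizeof(infomask_flags) / sizeof(infomask_flags[0]);
    size_t      i;

    buf[0] = '\0';
    for (i = 0; i < n; i++)
    {
        if (infomask & infomask_flags[i].bit)
        {
            if (buf[0] != '\0')
                strncat(buf, "|", buflen - strlen(buf) - 1);
            strncat(buf, infomask_flags[i].name, buflen - strlen(buf) - 1);
        }
    }
}
```

Decoding 0x290b and 0x2909 this way reproduces the flag lists printed in the dump: XMAX_INVALID (0x0800) is set and XMAX_COMMITTED (0x0400) is not.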

This error is reported in heap_prepare_freeze_tuple:

     /*
      * Process xmax.  To thoroughly examine the current Xmax value we need to
      * resolve a MultiXactId to its member Xids, in case some of them are
      * below the given cutoff for Xids.  In that case, those values might need
      * freezing, too.  Also, if a multi needs freezing, we cannot simply take
      * it out --- if there's a live updater Xid, it needs to be kept.
      *
      * Make sure to keep heap_tuple_needs_freeze in sync with this.
      */
     xid = HeapTupleGetRawXmax(htup);

     if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
     {
           ...

     }
     else if (TransactionIdIsNormal(xid))
     {

         ...

         if (TransactionIdPrecedes(xid, cutoff_xid))
         {
             /*
              * If we freeze xmax, make absolutely sure that it's not an XID
              * that is important.  (Note, a lock-only xmax can be removed
              * independent of committedness, since a committed lock holder has
              * released the lock).
              */
             if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
                 TransactionIdDidCommit(xid))
                 ereport(ERROR,
                         (errcode(ERRCODE_DATA_CORRUPTED),
                          errmsg_internal("cannot freeze committed xmax " XID_FMT,
                                          xid)));
             freeze_xmax = true;
         }
         else
             freeze_xmax = false;
        ...
     }
     else if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
              !TransactionIdIsValid(HeapTupleGetRawXmax(htup)))
     {
         freeze_xmax = false;
         xmax_already_frozen = true;
     }


So, as you can see, HEAP_XMAX_INVALID is set in all of these records, but
xmax is a normal transaction ID.
This is why the error is raised before HEAP_XMAX_INVALID is checked in the
subsequent else-if branch.
I do not know the value of cutoff_xid, because I do not have access to a
debugger at the customer site.
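
The branch order can be modeled with a small standalone sketch. It is a simplification of the excerpt above, not the real function: classify_xmax and the XmaxOutcome enum are hypothetical names, the multixact branch is not modeled, HEAP_XMAX_IS_LOCKED_ONLY is reduced to a test of the HEAP_XMAX_LOCK_ONLY bit, and the results of TransactionIdPrecedes(xid, cutoff_xid) and TransactionIdDidCommit(xid) are passed in as plain booleans because their actual values at the customer site are unknown.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define HEAP_XMAX_LOCK_ONLY      0x0080
#define HEAP_XMAX_INVALID        0x0800
#define HEAP_XMAX_IS_MULTI       0x1000
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical outcome codes, used only in this model. */
typedef enum
{
    XMAX_MULTI,             /* multixact branch, not modeled here */
    XMAX_ERROR_COMMITTED,   /* "cannot freeze committed xmax" */
    XMAX_FREEZE,            /* freeze_xmax = true */
    XMAX_KEEP,              /* freeze_xmax = false */
    XMAX_ALREADY_FROZEN,    /* xmax_already_frozen = true */
    XMAX_OTHER              /* anything the excerpt does not show */
} XmaxOutcome;

/*
 * Simplified model of the xmax branches, in the same order as the excerpt.
 * precedes_cutoff stands in for TransactionIdPrecedes(xid, cutoff_xid),
 * did_commit for TransactionIdDidCommit(xid).
 */
XmaxOutcome
classify_xmax(uint16_t infomask, TransactionId xmax,
              bool precedes_cutoff, bool did_commit)
{
    if (infomask & HEAP_XMAX_IS_MULTI)
        return XMAX_MULTI;
    else if (xmax >= FirstNormalTransactionId)
    {
        /* A normal xid lands here whether or not HEAP_XMAX_INVALID is set. */
        if (precedes_cutoff)
        {
            if (!(infomask & HEAP_XMAX_LOCK_ONLY) && did_commit)
                return XMAX_ERROR_COMMITTED;
            return XMAX_FREEZE;
        }
        return XMAX_KEEP;
    }
    else if ((infomask & HEAP_XMAX_INVALID) || xmax == 0)
        return XMAX_ALREADY_FROZEN;

    return XMAX_OTHER;
}
```

In this model, classify_xmax(0x290b, 596099954, true, true) returns XMAX_ERROR_COMMITTED: the HEAP_XMAX_INVALID test is reached only when xmax is not a normal transaction ID.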

I would be grateful for any help in localizing the source of the problem.

It looks like there is no assumption that xmax must be set to
InvalidTransactionId when the HEAP_XMAX_INVALID bit is set.
And I did not find any check preventing cutoff_xid from being greater than
the XID of some transaction that was aborted a long time ago.
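
For reference, a minimal sketch of the XID comparison as stock PostgreSQL implements it for 32-bit transaction IDs (modulo-2^32 arithmetic, following TransactionIdPrecedes in transam.c). The name xid_precedes is hypothetical, and this is an assumption about the stock code only: the build discussed here prints XIDs with XID_FMT and may compare them differently.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Sketch of the stock 32-bit comparison: special (non-normal) XIDs compare
 * as plain unsigned values, normal XIDs compare modulo 2^32.
 */
bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t     diff;

    if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
        return id1 < id2;

    diff = (int32_t) (id1 - id2);
    return diff < 0;
}
```

The comparison looks only at the two numeric values. Whether the older transaction committed or aborted plays no part in it, so an xmax left behind by an aborted transaction can precede cutoff_xid just like any other old XID.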

So is it a logic error that xmax is compared with cutoff_xid
before the HEAP_XMAX_INVALID bit is checked?
Otherwise, where is this constraint most likely being violated?

This is version 11.7 of Postgres.
Thanks in advance,

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



