Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?
Date
Msg-id CAA4eK1+097Bu6jHrUrebmdcokeO_z36-wP34-w8fpER1j_gbhQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Challenges preventing us moving to 64 bit transactionid (XID)?  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
On Fri, Nov 24, 2017 at 4:03 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>> > 0002-heap-page-special-1.patch
>> > Putting xid and multixact bases into PageHeaderData would take extra 16
>> > bytes on index pages too.  That would be waste of space for indexes.
>> > This
>> > is why I decided to put bases into special area of heap pages.
>> > This patch adds special area for heap pages contaning prune xid and
>> > magic
>> > number.  Magic number is different for regular heap page and sequence
>> > page.
>> >
>>
>> uint16 pd_pagesize_version;
>> - TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
>>   ItemIdData pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
>>   } PageHeaderData;
>>
>> Why have you moved pd_prune_xid from page header?
>
>
> pg_prune_xid makes sense only for heap pages.  Once we introduce special
> area for heap pages, we can move pg_prune_xid there and save some bytes in
> index pages.  However, this is an optimization not directly related to
> 64-bit xids.  Idea is that if we anyway change page format, why don't apply
> this optimization as well?
>

Sure, but I think this patch could have been proposed on top of your
main patch not other way.  Another similar thing I have noticed is
below:

*************** typedef struct CheckPoint
*** 39,45 **** TimeLineID PrevTimeLineID; /* previous TLI, if this record begins a new  * timeline (equals
ThisTimeLineIDotherwise) */ bool fullPageWrites; /* current full_page_writes */
 
- uint32 nextXidEpoch; /* higher-order bits of nextXid */ TransactionId nextXid; /* next free XID */ Oid nextOid; /*
nextfree OID */ MultiXactId nextMulti; /* next free MultiXactId */
 


I think if we have 64-bit Transaction Ids, then we might not need
epoch at all, but I think this can also be a separate change from the
main patch.  The point is that already main patch is big enough that
we should not try to squeeze other related changes in it.

>  But if we have any doubts in this, it can be
> removed with no problem.
>
>>
>> > 0003-64bit-xid-1.patch
..
>> >
>>
>> It seems there is no README or some detailed explanation of how all
>> this works like how the value of pd_xid_base is maintained.  I don't
>> think there are enough comments in the patch to explain the things.  I
>> think it will be easier to understand and review the patch if you
>> provide some more details either in email or in the patch.
>
>
> OK.  I'm going to write README and include it into the patch.
>

Thanks.  Few assorted comments on the main patch:

1.
+ /*
+  * Ensure that given xid fits base of given page.
+  */
+ bool
+ heap_page_prepare_for_xid(Relation relation, Buffer buffer,
+   TransactionId xid, bool multi)

I couldn't find any use of the return value of this function,
basically, if the specified xid can update the patch, then it returns
false otherwise it updates the tuples on a page and base xid on a page
such that new xid can update the tuple.  So either, you should make
this function return void or split it into two parts such that first
function check if the new xid can update the tuple, if so proceed with
updating the tuple, otherwise, update the base xid in the page and
xmin/xmax in tuples.

2.
heap_page_prepare_for_xid()
{
..
+ /* Find minimum and maximum xids in the page */
+ found = heap_page_xid_min_max(page, multi, &min, &max);
+
+ /* No items on the page? */
+ if (!found)
+ {
+ int64 delta;
+
+ if (!multi)
+ delta = (xid - FirstNormalTransactionId) - pageSpecial->pd_xid_base;
+ else
+ delta = (xid - FirstNormalTransactionId) - pageSpecial->pd_multi_base;
+
+ heap_page_shift_base(RelationNeedsWAL(relation) ? buffer : InvalidBuffer,
+ page, multi, delta);
+ MarkBufferDirty(buffer);
+ return false;
+ }
..
}

When there are no items on the page what is need to try to traverse
all items again (via heap_page_shift_base), you can ideally shift the
base directly as (xid - FirstNormalTransactionId) in this case.

3.
heap_page_prepare_for_xid()
{
..
+ /* Can we just shift base on the page */
+ if (xid < base + FirstNormalTransactionId)
+ {
+ int64 freeDelta = MaxShortTransactionId - max,
+ requiredDelta = (base + FirstNormalTransactionId) - xid;
+
+ if (requiredDelta <= freeDelta)
+ {
+ heap_page_shift_base(RelationNeedsWAL(relation) ? buffer : InvalidBuffer,
+ page, multi, - (freeDelta + requiredDelta) / 2);
+ MarkBufferDirty(buffer);
+ return true;
+ }
+ }
+ else
+ {
+ int64 freeDelta = min - FirstNormalTransactionId,
+ requiredDelta = xid - (base + MaxShortTransactionId);
+
+ if (requiredDelta <= freeDelta)
+ {
+ heap_page_shift_base(RelationNeedsWAL(relation) ? buffer : InvalidBuffer,
+ page, multi, (freeDelta + requiredDelta) / 2);
+ MarkBufferDirty(buffer);
+ return true;
+ }
+ }
..
}

I think the above code doesn't follow guidelines for writing a WAL.
Currently, (a) it modifies page (b) write WAL (c) mark buffer dirty.
You need to perform step (c) before step (b).  Also, all these steps
need to be performed in the critical section.

Also, some comments explaining the computation of delta in above
context can make things easier to understand.

4. I think rewrite_page_prepare_for_xid() has enough common
functionality with heap_page_prepare_for_xid that you can extract
common parts into a separate function.

5.
+ /* FIXME */
+ tuple->t_data->t_choice.t_heap.t_xmin =
NormalTransactionIdToShort(HeapPageGetSpecial(pageHeader)->pd_xid_base,
xid);
+

There are a lot of FIXME's like above which doesn't even indicate what
exactly you want to fix.  I think it is okay to have FIXME's which you
want to fix later in a patch of this size, but at the very least they
should be documented, otherwise, I am afraid that you yourself might
forget what to fix.

6.
+ /*
+  * Shift xid base in the page.  WAL-logged if buffer is specified.
+  */
+ static void
+ heap_page_shift_base(Buffer buffer, Page page, bool multi, int64 delta)
+ {
+ HeapPageSpecial pageSpecial = HeapPageGetSpecial(page);
+ OffsetNumber offnum,
+ maxoff;
+
+ /* Iterate over page items */
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid;
+ HeapTupleHeader htup;
+
+ itemid = PageGetItemId(page, offnum);
+
+ if (!ItemIdIsNormal(itemid))
+ continue;
+
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+ /* Apply xid shift to heap tuple */
+ if (!multi)
+ {
+ if (!HeapTupleHeaderXminFrozen(htup) &&
+ TransactionIdIsNormal(htup->t_choice.t_heap.t_xmin))
+ {
+ Assert(htup->t_choice.t_heap.t_xmin - delta >= FirstNormalTransactionId);
+ Assert(htup->t_choice.t_heap.t_xmin - delta <= MaxShortTransactionId);
+ htup->t_choice.t_heap.t_xmin -= delta;
+ }

How is it ensured that the xids (xmin/xmax on tuple) you are modifying
are not in progress or there is some theory that it is okay to modify
in-progress xids?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: explain analyze output with parallel workers - question aboutmeaning of information for explain.depesz.com
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: default range partition and constraint exclusion