Re: Reviewing freeze map code - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: Reviewing freeze map code
Msg-id: CAA4eK1LbEeyJkvZR7aad5vTpSpKNCG+hcckFUV1a6-_Y0w755A@mail.gmail.com
In response to: Re: Reviewing freeze map code (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Thu, Jul 7, 2016 at 12:34 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you for reviewing!
>
> On Thu, Jul 7, 2016 at 7:06 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-07-05 23:37:59 +0900, Masahiko Sawada wrote:
>>> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
>>> index 57da57a..fd66527 100644
>>> --- a/src/backend/access/heap/heapam.c
>>> +++ b/src/backend/access/heap/heapam.c
>>> @@ -3923,6 +3923,17 @@ l2:
>>>
>>>       if (need_toast || newtupsize > pagefree)
>>>       {
>>> +             /*
>>> +              * To prevent data corruption due to updating old tuple by
>>> +              * other backends after released buffer
>>
>> That's not really the reason, is it? The prime problem is crash safety /
>> replication. The row-lock we're faking (by setting xmax to our xid)
>> prevents concurrent updates until this backend dies.
>
> Fixed.
>
>>>                 , we need to emit that
>>> +              * xmax of old tuple is set and clear visibility map bits if
>>> +              * needed before releasing buffer. We can reuse xl_heap_lock
>>> +              * for this purpose. It should be fine even if we crash midway
>>> +              * from this section and the actual updating one later, since
>>> +              * the xmax will appear to come from an aborted xid.
>>> +              */
>>> +             START_CRIT_SECTION();
>>> +
>>>               /* Clear obsolete visibility flags ... */
>>>               oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
>>>               oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
>>> @@ -3936,6 +3947,46 @@ l2:
>>>               /* temporarily make it look not-updated */
>>>               oldtup.t_data->t_ctid = oldtup.t_self;
>>>               already_marked = true;
>>> +
>>> +             /* Clear PD_ALL_VISIBLE flags */
>>> +             if (PageIsAllVisible(BufferGetPage(buffer)))
>>> +             {
>>> +                     all_visible_cleared = true;
>>> +                     PageClearAllVisible(BufferGetPage(buffer));
>>> +                     visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
>>> +                                                             vmbuffer);
>>> +             }
>>> +
>>> +             MarkBufferDirty(buffer);
>>> +
>>> +             if (RelationNeedsWAL(relation))
>>> +             {
>>> +                     xl_heap_lock xlrec;
>>> +                     XLogRecPtr recptr;
>>> +
>>> +                     /*
>>> +                      * For logical decoding we need combocids to properly decode the
>>> +                      * catalog.
>>> +                      */
>>> +                     if (RelationIsAccessibleInLogicalDecoding(relation))
>>> +                             log_heap_new_cid(relation, &oldtup);
>>
>> Hm, I don't see that being necessary here. Row locks aren't logically
>> decoded, so there's no need to emit this here.
>
> Fixed.
>
>>
>>> +     /* Clear PD_ALL_VISIBLE flags */
>>> +     if (PageIsAllVisible(page))
>>> +     {
>>> +             Buffer  vmbuffer = InvalidBuffer;
>>> +             BlockNumber     block = BufferGetBlockNumber(*buffer);
>>> +
>>> +             all_visible_cleared = true;
>>> +             PageClearAllVisible(page);
>>> +             visibilitymap_pin(relation, block, &vmbuffer);
>>> +             visibilitymap_clear(relation, block, vmbuffer);
>>> +     }
>>> +
>>
>> That clears all-visible unnecessarily, we only need to clear all-frozen.
>>
>
> Fixed.
>
>>
>>> @@ -8694,6 +8761,23 @@ heap_xlog_lock(XLogReaderState *record)
>>>               }
>>>               HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
>>>               HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
>>> +
>>> +             /* The visibility map need to be cleared */
>>> +             if ((xlrec->infobits_set & XLHL_ALL_VISIBLE_CLEARED) != 0)
>>> +             {
>>> +                     RelFileNode     rnode;
>>> +                     Buffer          vmbuffer = InvalidBuffer;
>>> +                     BlockNumber     blkno;
>>> +                     Relation        reln;
>>> +
>>> +                     XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
>>> +                     reln = CreateFakeRelcacheEntry(rnode);
>>> +
>>> +                     visibilitymap_pin(reln, blkno, &vmbuffer);
>>> +                     visibilitymap_clear(reln, blkno, vmbuffer);
>>> +                     PageClearAllVisible(page);
>>> +             }
>>> +
>>
>>
>>>               PageSetLSN(page, lsn);
>>>               MarkBufferDirty(buffer);
>>>       }
>>> diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
>>> index a822d0b..41b3c54 100644
>>> --- a/src/include/access/heapam_xlog.h
>>> +++ b/src/include/access/heapam_xlog.h
>>> @@ -242,6 +242,7 @@ typedef struct xl_heap_cleanup_info
>>>  #define XLHL_XMAX_EXCL_LOCK          0x04
>>>  #define XLHL_XMAX_KEYSHR_LOCK        0x08
>>>  #define XLHL_KEYS_UPDATED            0x10
>>> +#define XLHL_ALL_VISIBLE_CLEARED 0x20
>>
>> Hm. We can't easily do that in the back-patched version, because a
>> standby won't know to check for the flag. That's kinda ok, since we
>> don't yet need to clear all-visible at that point of heap_update.
>> But that then means we'd better not do so on the master either.
>>
>
> Attached is the latest version of the patch.

+ /* Clear only the all-frozen bit on visibility map if needed */
+ if (PageIsAllVisible(BufferGetPage(buffer)) &&
+     VM_ALL_FROZEN(relation, block, &vmbuffer))
+ {
+     visibilitymap_clear_extended(relation, block, vmbuffer,
+                                  VISIBILITYMAP_ALL_FROZEN);
+ }
+

+ if (RelationNeedsWAL(relation))
+ {
..
+     XLogBeginInsert();
+     XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+
+     xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
+     xlrec.locking_xid = xmax_old_tuple;
+     xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
+                                           oldtup.t_data->t_infomask2);
+     XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
+     recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
..

One thing that looks awkward in this code is that it doesn't record
whether the frozen bit was actually cleared during the original
operation; then, during replay, it always clears the frozen bit,
irrespective of whether the original operation cleared it or not.
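
To illustrate, here is a rough sketch (my illustration, not the posted
patch) of how that state could be carried in the WAL record, using a
hypothetical XLHL_ALL_FROZEN_CLEARED infobit that the writer sets only
when it really cleared the bit and that redo then tests:

/* Sketch only; XLHL_ALL_FROZEN_CLEARED is a hypothetical flag. */
#define XLHL_ALL_FROZEN_CLEARED     0x40

/* Writer side: remember whether we actually cleared the bit. */
if (PageIsAllVisible(BufferGetPage(buffer)) &&
    VM_ALL_FROZEN(relation, block, &vmbuffer))
{
    visibilitymap_clear_extended(relation, block, vmbuffer,
                                 VISIBILITYMAP_ALL_FROZEN);
    cleared_all_frozen = true;
}
...
xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
                                      oldtup.t_data->t_infomask2);
if (cleared_all_frozen)
    xlrec.infobits_set |= XLHL_ALL_FROZEN_CLEARED;

/* Redo side (heap_xlog_lock): clear the bit only if the writer did. */
if ((xlrec->infobits_set & XLHL_ALL_FROZEN_CLEARED) != 0)
{
    /* pin the vm page and clear VISIBILITYMAP_ALL_FROZEN, as above */
}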

+ /* Clear only the all-frozen bit on visibility map if needed */
+ if (PageIsAllVisible(page) &&
+     VM_ALL_FROZEN(relation, BufferGetBlockNumber(*buffer), &vmbuffer))
+ {
+     BlockNumber block = BufferGetBlockNumber(*buffer);
+
+     visibilitymap_pin(relation, block, &vmbuffer);

I think it is not right to call visibilitymap_pin after taking the
buffer lock, because visibilitymap_pin can perform I/O.  Refer to
heap_update for how to pin the visibility map.
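
For reference, heap_update does it roughly like this (a simplified
sketch; the retry/goto details are omitted): the map page is pinned
before the buffer lock is taken, and if the page turns out to be
all-visible only after the lock is acquired, the lock is dropped and
retaken around the pin so that any I/O happens unlocked.

/* Pin the visibility map page before locking the heap buffer,
 * since visibilitymap_pin() may have to read the map page in. */
if (PageIsAllVisible(page))
    visibilitymap_pin(relation, block, &vmbuffer);

LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

/* If the page became all-visible while we were acquiring the
 * lock, release the lock, pin the map page, and retake the lock,
 * so that the possible I/O is done without the lock held. */
if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
{
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    visibilitymap_pin(relation, block, &vmbuffer);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}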

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


