Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
Date
Msg-id 20140215175014.GB3651@momjian.us
Whole thread Raw
In response to Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Sat, Feb 15, 2014 at 04:16:40PM +0100, Andres Freund wrote:
> Hi,
>
> > *************** end_heap_rewrite(RewriteState state)
> > *** 281,286 ****
> > --- 284,290 ----
> >                           true);
> >           RelationOpenSmgr(state->rs_new_rel);
> >
> > +         update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
> >           PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> >
> >           smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
> > *************** raw_heap_insert(RewriteState state, Heap
> > *** 633,638 ****
> > --- 637,643 ----
> >                */
> >               RelationOpenSmgr(state->rs_new_rel);
> >
> > +             update_page_vm(state->rs_new_rel, page, state->rs_blockno);
> >               PageSetChecksumInplace(page, state->rs_blockno);
> >
> >               smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
>
> I think those two cases should be combined.

Uh, what I did was to pair the new update_page_vm with the existing
PageSetChecksumInplace calls, figuring if we needed a checksum before we
wrote the page we would need a update_page_vm too.  Is that incorrect?
It is that _last_ page write in the second instance.

> > + static void
> > + update_page_vm(Relation relation, Page page, BlockNumber blkno)
> > + {
> > +     Buffer        vmbuffer = InvalidBuffer;
> > +     TransactionId visibility_cutoff_xid;
> > +
> > +     visibilitymap_pin(relation, blkno, &vmbuffer);
> > +     Assert(BufferIsValid(vmbuffer));
> > +
> > +     if (!visibilitymap_test(relation, blkno, &vmbuffer) &&
> > +         heap_page_is_all_visible(relation, InvalidBuffer, page,
> > +                                  &visibility_cutoff_xid))
> > +     {
> > +         PageSetAllVisible(page);
> > +         visibilitymap_set(relation, blkno, InvalidBuffer,
> > +                           InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid);
> > +     }
> > +     ReleaseBuffer(vmbuffer);
> > + }
>
> How could visibilitymap_test() be true here?

Oh, you are right that I can only hit that once per page;  test removed.

> > diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> > new file mode 100644
> > index 899ffac..3e7546b
> > *** a/src/backend/access/heap/visibilitymap.c
> > --- b/src/backend/access/heap/visibilitymap.c
> > *************** visibilitymap_set(Relation rel, BlockNum
> > *** 257,263 ****
> >   #endif
> >
> >       Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
> > -     Assert(InRecovery || BufferIsValid(heapBuf));
> >
> >       /* Check that we have the right heap page pinned, if present */
> >       if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
> > --- 257,262 ----
> > *************** visibilitymap_set(Relation rel, BlockNum
> > *** 278,284 ****
> >           map[mapByte] |= (1 << mapBit);
> >           MarkBufferDirty(vmBuf);
> >
> > !         if (RelationNeedsWAL(rel))
> >           {
> >               if (XLogRecPtrIsInvalid(recptr))
> >               {
> > --- 277,283 ----
> >           map[mapByte] |= (1 << mapBit);
> >           MarkBufferDirty(vmBuf);
> >
> > !         if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
> >           {
> >               if (XLogRecPtrIsInvalid(recptr))
> >               {
>
> At the very least this needs to revise visibilitymap_set's comment about
> the requirement of passing a buffer unless !InRecovery.

Oh, good point, comment block updated.

> Also, how is this going to work with replication if you're explicitly
> not WAL logging this?

Uh, I assumed that using the existing functions would handle this.  If
not, I don't know the answer.

> > *************** vac_cmp_itemptr(const void *left, const
> > *** 1730,1743 ****
> >    * transactions. Also return the visibility_cutoff_xid which is the highest
> >    * xmin amongst the visible tuples.
> >    */
> > ! static bool
> > ! heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cutoff_xid)
> >   {
> > -     Page        page = BufferGetPage(buf);
> >       OffsetNumber offnum,
> >                   maxoff;
> >       bool        all_visible = true;
> >
> >       *visibility_cutoff_xid = InvalidTransactionId;
> >
> >       /*
> > --- 1728,1744 ----
> >    * transactions. Also return the visibility_cutoff_xid which is the highest
> >    * xmin amongst the visible tuples.
> >    */
> > ! bool
> > ! heap_page_is_all_visible(Relation rel, Buffer buf, Page page,
> > !                          TransactionId *visibility_cutoff_xid)
> >   {
> >       OffsetNumber offnum,
> >                   maxoff;
> >       bool        all_visible = true;
> >
> > +     if (BufferIsValid(buf))
> > +         page = BufferGetPage(buf);
> > +
> >       *visibility_cutoff_xid = InvalidTransactionId;
> >
> >       /*
> > *************** heap_page_is_all_visible(Relation rel, B
> > *** 1758,1764 ****
> >           if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
> >               continue;
> >
> > !         ItemPointerSet(&(tuple.t_self), BufferGetBlockNumber(buf), offnum);
> >
> >           /*
> >            * Dead line pointers can have index pointers pointing to them. So
> > --- 1759,1767 ----
> >           if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
> >               continue;
> >
> > !         /* XXX use 0 or real offset? */
> > !         ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> > !                        BufferGetBlockNumber(buf) : 0, offnum);
>
> How about passing in the page and block number from the outside and not
> dealing with a buffer in here at all?

I would love to do that but HeapTupleSatisfiesVacuum() requires a
buffer, though you can (with my patch) optionally not supply one,
meaning if I passed in just the block number I would still need to
generate a buffer pointer.

> >           /*
> >            * Dead line pointers can have index pointers pointing to them. So
> > diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> > new file mode 100644
> > index f626755..b37ecc4
> > *** a/src/backend/utils/time/tqual.c
> > --- b/src/backend/utils/time/tqual.c
> > *************** static inline void
> > *** 108,113 ****
> > --- 108,117 ----
> >   SetHintBits(HeapTupleHeader tuple, Buffer buffer,
> >               uint16 infomask, TransactionId xid)
> >   {
> > +     /* we might not have a buffer if we are doing raw_heap_insert() */
> > +     if (!BufferIsValid(buffer))
> > +         return;
> > +
> >       if (TransactionIdIsValid(xid))
> >       {
> >           /* NB: xid must be known committed here! */
>
> This looks seriously wrong to me.
>
> I think the whole idea of doing this in private memory is bad. The
> visibility routines aren't written for that, and the above is just one
> instance of that, and I am far from convinced it's the only one. So you
> either need to work out how to rely on the visibility checking done in
> cluster.c, or you need to modify rewriteheap.c to write via
> shared_buffers.

I don't think we can change rewriteheap.c to use shared buffers --- that
code was written to do things in private memory for performance reasons
and I don't think setting hit bits justifies changing that.

Can you be more specific about the cluster.c idea?  I see
copy_heap_data() in cluster.c calling HeapTupleSatisfiesVacuum() with a
'buf' just like the code I am working in.

Based on Robert's feedback a few months ago I suspected I would need
help completing this patch --- now I am sure.

Updated patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: narwhal and PGDLLIMPORT
Next
From: Andres Freund
Date:
Subject: Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL