Re: display offset along with block number in vacuum errors - Mailing list pgsql-hackers

From Mahendra Singh Thalor
Subject Re: display offset along with block number in vacuum errors
Date
Msg-id CAKYtNAruySg6ZTf6VudLjTx=WbAHMRgN=Nn8WNA4NLtbKUyfxg@mail.gmail.com
Whole thread Raw
In response to Re: display offset along with block number in vacuum errors  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: display offset along with block number in vacuum errors
List pgsql-hackers
Thanks Sawada and Justin.

On Sun, 2 Aug 2020 at 09:33, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > Thanks Justin.
> >
> > On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > > > Bcc:
> > > > Subject: Re: display offset along with block number in vacuum errors
> > > > Reply-To:
> > > > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>
> > >
> > > whoops
> > >
> > > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > > > Here:
> > > > > >
> > > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > > > >                 BlockNumber tblk;
> > > > > >                 OffsetNumber toff;
> > > > > >                 ItemId          itemid;
> > > > > > +               LVSavedErrInfo loc_saved_err_info;
> > > > > >
> > > > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > >                 if (tblk != blkno)
> > > > > >                         break;                          /* past end of tuples for this block */
> > > > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > > +
> > > > > > +               /* Update error traceback information */
> > > > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > > > +                                                                blkno, toff);
> > > > > >                 itemid = PageGetItemId(page, toff);
> > > > > >                 ItemIdSetUnused(itemid);
> > > > > >                 unused[uncnt++] = toff;
> > > > > > +
> > > > > > +               /* Revert to the previous phase information for error traceback */
> > > > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > > > >         }
> > > > > >
> > > > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > > > I don't think you should use it in a loop, nor introduce another
> > > > > > LVSavedErrInfo.
> > > > > >
> > > > > > Since phase and blkno are already set, I think you should just set
> > > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > > > Similar to whats done in lazy_vacuum_heap():
> > > > > >
> > > > > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > > > >                 vacrelstats->blkno = tblk;
> > > > >
> > > > > Fixed.
> > > >
> > > > I rearead this thread and I think the earlier suggestion from Masahiko was
> > > > right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> > > > the page, which has already been read from disk.  On my suggestion, your v2
> > > > patch sets offnum directly, but now I think it's not useful to set at all,
> > > > since the whole page is manipulated by PageRepairFragmentation() and
> > > > log_heap_clean().  An error there would misleadingly say "..at offset number
> > > > MM", but would always show the page's last offset, and not the offset where an
> > > > error occured.
> > >
> > > This makes me question whether offset numbers are ever useful during
> > > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> > > internal functions that don't update vacrelstats->offno.  Note that my initial
> > > problem report that led to the errcontext implementation was an ERROR in heap
> > > *scan* (not vacuum).  So an offset number at that point would've been
> > > sufficient.
> > > https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com
> > >
> > > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> > > so an error in heap_page_prune() (for example) doesn't get the wrong offset
> > > associated with it.
> > >
> > > So see the attached changes on top of your v2 patch.
> >
> > Actually I was waiting for review comments from committer and other
> > people also and was planning to send a patch after that. I already
> > fixed your comments in my offline patch and was waiting for more
> > comments. Anyway, thanks for delta patch.
> >
> > Here, attaching v3 patch for review.
>
> Thank you for updating the patch!
>
> Here are my comments on v3 patch:
>
> @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>     if (PageIsNew(page) || PageIsEmpty(page))
>         return false;
>
> +   /* Update error traceback information */
> +   update_vacuum_error_info(vacrelstats, &saved_err_info,
> +           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
> +           InvalidOffsetNumber);
> +
>     maxoff = PageGetMaxOffsetNumber(page);
>     for (offnum = FirstOffsetNumber;
>          offnum <= maxoff;
>
> You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
> but I think we're already in that phase. I'm okay with explicitly
> setting it but on the other hand, we don't set the phase in
> heap_page_is_all_visible(). Is there any reason for that?
>
> Also, since we don't reset vacrelstats->offnum at the end of
> heap_page_is_all_visible(), if an error occurs by the end of
> lazy_vacuum_page(), the caller of  heap_page_is_all_visible(), we
> report the error context with the last offset number in the page,
> making the users confused.

Your point is valid. Added update and restore functions in
heap_page_is_all_visible in the latest patch.

>
> ---
> @@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>
>         if (heap_tuple_needs_freeze(tupleheader, FreezeLimit,
>                                     MultiXactCutoff, buf))
> -           return true;
> +           break;
>     }                           /* scan along page */
>
> -   return false;
> +   /* Revert to the previous phase information for error traceback */
> +   restore_vacuum_error_info(vacrelstats, &saved_err_info);
> +
> +   return offnum <= maxoff ? true : false;
>  }
>
> I think we can write just "return (offnum <= maxoff)".

Fixed this.

>
> ---
> -   /* Revert back to the old phase information for error traceback */
> +   /* Revert to the old phase information for error traceback */
>
> If we want to modify this comment how about the following phrase for
> consistency with other places?

Fixed this.

>
> /* Revert to the previous phase information for error traceback */
>
> Regards,
>
> --
> Masahiko Sawada            http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Apart from these, I fixed Justin's comment of extra brackets(That was
due to "patch -p 1 < file", as 002_fix was not applying directly). I
haven't updated the document for this(Sawada's comment). I will try in
the next patch.
Attaching v4 patch for review.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Ashwin Agrawal
Date:
Subject: For standby pg_ctl doesn't wait for PM_STATUS_READY in presence of promote_trigger_file
Next
From: Robert Haas
Date:
Subject: Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)