Re: display offset along with block number in vacuum errors - Mailing list pgsql-hackers
From | Mahendra Singh Thalor |
---|---|
Subject | Re: display offset along with block number in vacuum errors |
Date | |
Msg-id | CAKYtNAruySg6ZTf6VudLjTx=WbAHMRgN=Nn8WNA4NLtbKUyfxg@mail.gmail.com Whole thread Raw |
In response to | Re: display offset along with block number in vacuum errors (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses |
Re: display offset along with block number in vacuum errors
|
List | pgsql-hackers |
Thanks Sawada and Justin. On Sun, 2 Aug 2020 at 09:33, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > Thanks Justin. > > > > On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote: > > > > Bcc: > > > > Subject: Re: display offset along with block number in vacuum errors > > > > Reply-To: > > > > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com> > > > > > > whoops > > > > > > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > > > > > > Here: > > > > > > > > > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, > > > > > > BlockNumber tblk; > > > > > > OffsetNumber toff; > > > > > > ItemId itemid; > > > > > > + LVSavedErrInfo loc_saved_err_info; > > > > > > > > > > > > tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); > > > > > > if (tblk != blkno) > > > > > > break; /* past end of tuples for this block */ > > > > > > toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); > > > > > > + > > > > > > + /* Update error traceback information */ > > > > > > + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, > > > > > > + blkno, toff); > > > > > > itemid = PageGetItemId(page, toff); > > > > > > ItemIdSetUnused(itemid); > > > > > > unused[uncnt++] = toff; > > > > > > + > > > > > > + /* Revert to the previous phase information for error traceback */ > > > > > > + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); > > > > > > } > > > > > > > > > > > > I'm not sure why you use restore_vacuum_error_info() at all. It's already > > > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to > > > > > > clean up after their own state changes, rather than requiring callers to do it. > > > > > > I don't think you should use it in a loop, nor introduce another > > > > > > LVSavedErrInfo. > > > > > > > > > > > > Since phase and blkno are already set, I think you should just set > > > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info(). > > > > > > Similar to whats done in lazy_vacuum_heap(): > > > > > > > > > > > > tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]); > > > > > > vacrelstats->blkno = tblk; > > > > > > > > > > Fixed. > > > > > > > > I rearead this thread and I think the earlier suggestion from Masahiko was > > > > right. The loop around dead_tuples only does ItemIdSetUnused() which updates > > > > the page, which has already been read from disk. On my suggestion, your v2 > > > > patch sets offnum directly, but now I think it's not useful to set at all, > > > > since the whole page is manipulated by PageRepairFragmentation() and > > > > log_heap_clean(). An error there would misleadingly say "..at offset number > > > > MM", but would always show the page's last offset, and not the offset where an > > > > error occured. > > > > > > This makes me question whether offset numbers are ever useful during > > > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by > > > internal functions that don't update vacrelstats->offno. Note that my initial > > > problem report that led to the errcontext implementation was an ERROR in heap > > > *scan* (not vacuum). So an offset number at that point would've been > > > sufficient. > > > https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com > > > > > > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo, > > > so an error in heap_page_prune() (for example) doesn't get the wrong offset > > > associated with it. > > > > > > So see the attached changes on top of your v2 patch. > > > > Actually I was waiting for review comments from committer and other > > people also and was planning to send a patch after that. I already > > fixed your comments in my offline patch and was waiting for more > > comments. Anyway, thanks for delta patch. > > > > Here, attaching v3 patch for review. > > Thank you for updating the patch! > > Here are my comments on v3 patch: > > @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) > if (PageIsNew(page) || PageIsEmpty(page)) > return false; > > + /* Update error traceback information */ > + update_vacuum_error_info(vacrelstats, &saved_err_info, > + VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno, > + InvalidOffsetNumber); > + > maxoff = PageGetMaxOffsetNumber(page); > for (offnum = FirstOffsetNumber; > offnum <= maxoff; > > You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP > but I think we're already in that phase. I'm okay with explicitly > setting it but on the other hand, we don't set the phase in > heap_page_is_all_visible(). Is there any reason for that? > > Also, since we don't reset vacrelstats->offnum at the end of > heap_page_is_all_visible(), if an error occurs by the end of > lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we > report the error context with the last offset number in the page, > making the users confused. Your point is valid. Added update and restore functions in heap_page_is_all_visible in the latest patch. > > --- > @@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) > > if (heap_tuple_needs_freeze(tupleheader, FreezeLimit, > MultiXactCutoff, buf)) > - return true; > + break; > } /* scan along page */ > > - return false; > + /* Revert to the previous phase information for error traceback */ > + restore_vacuum_error_info(vacrelstats, &saved_err_info); > + > + return offnum <= maxoff ? true : false; > } > > I think we can write just "return (offnum <= maxoff)". Fixed this. > > --- > - /* Revert back to the old phase information for error traceback */ > + /* Revert to the old phase information for error traceback */ > > If we want to modify this comment how about the following phrase for > consistency with other places? Fixed this. > > /* Revert to the previous phase information for error traceback */ > > Regards, > > -- > Masahiko Sawada http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services Apart from these, I fixed Justin's comment of extra brackets(That was due to "patch -p 1 < file", as 002_fix was not applying directly). I haven't updated the document for this(Sawada's comment). I will try in the next patch. Attaching v4 patch for review. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: