Thread: display offset along with block number in vacuum errors

display offset along with block number in vacuum errors

From
Mahendra Singh Thalor
Date:
Hi hackers,
We discussed in another email thread[1], that it will be helpful if we can display offset along with block number in vacuum error. Here, proposing a patch to add offset along with block number in vacuum errors.

In commit b61d161(Introduce vacuum errcontext to display additional information), we added vacuum errcontext to display additional information(block number) so that in case of vacuum error, we can identify which block we are getting error.  Addition to block number, if we can display offset, then it will be more helpful for users. So to display offset, here proposing two different methods(Thanks Robert for suggesting these 2 methods):

Method 1: We can report the TID as well as the block number in  errcontext.
- errcontext("while scanning block %u of relation \"%s.%s\"",
-   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+   errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);

Above fix requires more calls to update_vacuum_error_info(). Attaching v01_0001 patch for this method.

Method 2: We can improve the error messages by passing the relevant TID to heap_prepare_freeze_tuple and having it report the TID as part of the error message or in the error detail.
  ereport(ERROR,
  (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg_internal("found xmin %u from before relfrozenxid %u",
+ errmsg_internal("for block %u and offnum %u, found xmin %u from before relfrozenxid %u",
+ ItemPointerGetBlockNumber(tid),
+ ItemPointerGetOffsetNumber(tid),
  xid, relfrozenxid)));

Attaching v01_0002 patch for this method.

Please let me know your thoughts.


Attachment

Re: display offset along with block number in vacuum errors

From
Michael Paquier
Date:
On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:
> In commit b61d161(Introduce vacuum errcontext to display additional
> information), we added vacuum errcontext to display additional
> information(block number) so that in case of vacuum error, we can identify
> which block we are getting error.  Addition to block number, if we can
> display offset, then it will be more helpful for users. So to display
> offset, here proposing two different methods(Thanks Robert for suggesting
> these 2 methods):

    new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
    vacrelstats->blkno = new_rel_pages;
+   vacrelstats->offnum = InvalidOffsetNumber;

Adding more context would be interesting for some cases, but not all
contrary to what your patch does in some code paths like
lazy_truncate_heap() as you would show up an offset of 0 for an
invalid value.  This could confuse some users.  Note that we are
careful enough to not print a context message if the block number is
invalid.
--
Michael

Attachment

Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Sat, 25 Jul 2020 at 02:49, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> Hi hackers,
> We discussed in another email thread[1], that it will be helpful if we can display offset along with block number in
vacuumerror. Here, proposing a patch to add offset along with block number in vacuum errors. 
>
> In commit b61d161(Introduce vacuum errcontext to display additional information), we added vacuum errcontext to
displayadditional information(block number) so that in case of vacuum error, we can identify which block we are getting
error. Addition to block number, if we can display offset, then it will be more helpful for users. So to display
offset,here proposing two different methods(Thanks Robert for suggesting these 2 methods): 
>
> Method 1: We can report the TID as well as the block number in  errcontext.
> - errcontext("while scanning block %u of relation \"%s.%s\"",
> -   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
> + errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
> +   errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
>
> Above fix requires more calls to update_vacuum_error_info(). Attaching v01_0001 patch for this method.
>
> Method 2: We can improve the error messages by passing the relevant TID to heap_prepare_freeze_tuple and having it
reportthe TID as part of the error message or in the error detail. 
>   ereport(ERROR,
>   (errcode(ERRCODE_DATA_CORRUPTED),
> - errmsg_internal("found xmin %u from before relfrozenxid %u",
> + errmsg_internal("for block %u and offnum %u, found xmin %u from before relfrozenxid %u",
> + ItemPointerGetBlockNumber(tid),
> + ItemPointerGetOffsetNumber(tid),
>   xid, relfrozenxid)));
>
> Attaching v01_0002 patch for this method.
>
> Please let me know your thoughts.
>

+1 for adding offset in error messages.

I had a look at 0001 patch. You've set the vacuum error info but I
think an error won't happen during setting itemids unused:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber
blkno, Buffer buffer,
                BlockNumber tblk;
                OffsetNumber toff;
                ItemId          itemid;
+               LVSavedErrInfo loc_saved_err_info;

                tblk =
ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
                if (tblk != blkno)
                        break;                          /* past end of
tuples for this block */
                toff =
ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+               /* Update error traceback information */
+               update_vacuum_error_info(vacrelstats,
&loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                                                blkno, toff);
                itemid = PageGetItemId(page, toff);
                ItemIdSetUnused(itemid);
                unused[uncnt++] = toff;
+
+               /* Revert to the previous phase information for error
traceback */
+               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
        }

        PageRepairFragmentation(page);

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: display offset along with block number in vacuum errors

From
Mahendra Singh Thalor
Date:
Thanks Michael for looking into this.

On Sat, 25 Jul 2020 at 15:02, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:
> > In commit b61d161(Introduce vacuum errcontext to display additional
> > information), we added vacuum errcontext to display additional
> > information(block number) so that in case of vacuum error, we can identify
> > which block we are getting error.  Addition to block number, if we can
> > display offset, then it will be more helpful for users. So to display
> > offset, here proposing two different methods(Thanks Robert for suggesting
> > these 2 methods):
>
>     new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
>     vacrelstats->blkno = new_rel_pages;
> +   vacrelstats->offnum = InvalidOffsetNumber;
>
> Adding more context would be interesting for some cases, but not all
> contrary to what your patch does in some code paths like
> lazy_truncate_heap() as you would show up an offset of 0 for an
> invalid value.  This could confuse some users.  Note that we are
> careful enough to not print a context message if the block number is
> invalid.

Okay. I agree with you. In case of inavlid offset, we can skip offset
printing. I will do this change in the next patch.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: display offset along with block number in vacuum errors

From
Justin Pryzby
Date:
On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:
> Hi hackers,
> We discussed in another email thread[1], that it will be helpful if we can
> display offset along with block number in vacuum error. Here, proposing a
> patch to add offset along with block number in vacuum errors.

Thanks.  I happened to see both threads, only by chance.

I'd already started writing the same as your 0001, which is essentially the
same as yours.

Here:

@@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
         BlockNumber tblk;
         OffsetNumber toff;
         ItemId        itemid;
+        LVSavedErrInfo loc_saved_err_info;
 
         tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
         if (tblk != blkno)
             break;                /* past end of tuples for this block */
         toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
+
+        /* Update error traceback information */
+        update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
+                                 blkno, toff);
         itemid = PageGetItemId(page, toff);
         ItemIdSetUnused(itemid);
         unused[uncnt++] = toff;
+
+        /* Revert to the previous phase information for error traceback */
+        restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
     }

I'm not sure why you use restore_vacuum_error_info() at all.  It's already
called at the end of lazy_vacuum_page() (and others) to allow functions to
clean up after their own state changes, rather than requiring callers to do it.
I don't think you should use it in a loop, nor introduce another
LVSavedErrInfo.

Since phase and blkno are already set, I think you should just set
vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
Similar to whats done in lazy_vacuum_heap():

                tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
                vacrelstats->blkno = tblk;

I think you should also do:

@@ -2976,6 +2984,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
                ItemId          itemid;
                HeapTupleData tuple;
 
+               vacrelstats->offset = offnum;

I'm not sure, but maybe you'd also want to do the same in more places:

@@ -2024,6 +2030,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
@@ -2790,6 +2797,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)

-- 
Justin



Re: display offset along with block number in vacuum errors

From
Mahendra Singh Thalor
Date:
Thanks Justin, Sawada and Michael for reviewing.

On Mon, 27 Jul 2020 at 16:43, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote:
> > Hi hackers,
> > We discussed in another email thread[1], that it will be helpful if we can
> > display offset along with block number in vacuum error. Here, proposing a
> > patch to add offset along with block number in vacuum errors.
>
> Thanks.  I happened to see both threads, only by chance.
>
> I'd already started writing the same as your 0001, which is essentially the
> same as yours.
>
> Here:
>
> @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
>                 BlockNumber tblk;
>                 OffsetNumber toff;
>                 ItemId          itemid;
> +               LVSavedErrInfo loc_saved_err_info;
>
>                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
>                 if (tblk != blkno)
>                         break;                          /* past end of tuples for this block */
>                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> +
> +               /* Update error traceback information */
> +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> +                                                                blkno, toff);
>                 itemid = PageGetItemId(page, toff);
>                 ItemIdSetUnused(itemid);
>                 unused[uncnt++] = toff;
> +
> +               /* Revert to the previous phase information for error traceback */
> +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
>         }
>
> I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> called at the end of lazy_vacuum_page() (and others) to allow functions to
> clean up after their own state changes, rather than requiring callers to do it.
> I don't think you should use it in a loop, nor introduce another
> LVSavedErrInfo.
>
> Since phase and blkno are already set, I think you should just set
> vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> Similar to whats done in lazy_vacuum_heap():
>
>                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
>                 vacrelstats->blkno = tblk;

Fixed.

>
> I think you should also do:
>
> @@ -2976,6 +2984,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
>                 ItemId          itemid;
>                 HeapTupleData tuple;
>
> +               vacrelstats->offset = offnum;

Agreed and fixed.

>
> I'm not sure, but maybe you'd also want to do the same in more places:
>
> @@ -2024,6 +2030,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)

Fixed.

> @@ -2790,6 +2797,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)

I checked the code and I think there is no need to do similar changes
in count_nondeletable_pages. I will look again and will verify again.

Apart from these, I fixed comments given by Sawada and Michael in the
latest patch. Attaching v2 patch for review.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: display offset along with block number in vacuum errors

From
Justin Pryzby
Date:
On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> Apart from these, I fixed comments given by Sawada and Michael in the
> latest patch. Attaching v2 patch for review.

Thanks.

lazy_check_needs_freeze iterates over blocks and this patch changes it to
update vacrelstats.  I think it should do what
lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
its beginning (even though only the offset is changed), and then
restore_vacuum_error_info() at its end (to "revert back" to the item number it
started with).

The same is true of heap_page_is_all_visible(), except it's only called by
lazy_vacuum_page(), which already calls restore_vacuum_error_info() a few lines
later.

As for the message:

+                               if (OffsetNumberIsValid(errinfo->offnum))
+                                       errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+                                                          errinfo->blkno, errinfo->offnum, errinfo->relnamespace,
errinfo->relname);
+                               else
+                                       errcontext("while scanning block %u of relation \"%s.%s\"",
+                                                          errinfo->blkno, errinfo->relnamespace, errinfo->relname);

I think that may be confusing.  A DBA should know what a "block" is, but
"offset" sounds like a byte offset, rather than an item number.  Here's what
I'd written.  Maybe it should say "offset number".

+                               errcontext("while vacuuming block %u of relation \"%s.%s\", item offset %u",


-- 
Justin



Re: display offset along with block number in vacuum errors

From
Justin Pryzby
Date:
Bcc: 
Subject: Re: display offset along with block number in vacuum errors
Reply-To: 
In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>

On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > Here:
> >
> > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> >                 BlockNumber tblk;
> >                 OffsetNumber toff;
> >                 ItemId          itemid;
> > +               LVSavedErrInfo loc_saved_err_info;
> >
> >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> >                 if (tblk != blkno)
> >                         break;                          /* past end of tuples for this block */
> >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > +
> > +               /* Update error traceback information */
> > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > +                                                                blkno, toff);
> >                 itemid = PageGetItemId(page, toff);
> >                 ItemIdSetUnused(itemid);
> >                 unused[uncnt++] = toff;
> > +
> > +               /* Revert to the previous phase information for error traceback */
> > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> >         }
> >
> > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > clean up after their own state changes, rather than requiring callers to do it.
> > I don't think you should use it in a loop, nor introduce another
> > LVSavedErrInfo.
> >
> > Since phase and blkno are already set, I think you should just set
> > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > Similar to whats done in lazy_vacuum_heap():
> >
> >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> >                 vacrelstats->blkno = tblk;
> 
> Fixed.

I rearead this thread and I think the earlier suggestion from Masahiko was
right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
the page, which has already been read from disk.  On my suggestion, your v2
patch sets offnum directly, but now I think it's not useful to set at all,
since the whole page is manipulated by PageRepairFragmentation() and
log_heap_clean().  An error there would misleadingly say "..at offset number
MM", but would always show the page's last offset, and not the offset where an
error occured.

I added this at:
https://commitfest.postgresql.org/29/2662/

If anyone is considering this patch for v13, I guess it should be completed by
next week.

-- 
Justin



Re: display offset along with block number in vacuum errors

From
Justin Pryzby
Date:
On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> Bcc: 
> Subject: Re: display offset along with block number in vacuum errors
> Reply-To: 
> In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>

whoops

> On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > Here:
> > >
> > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > >                 BlockNumber tblk;
> > >                 OffsetNumber toff;
> > >                 ItemId          itemid;
> > > +               LVSavedErrInfo loc_saved_err_info;
> > >
> > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > >                 if (tblk != blkno)
> > >                         break;                          /* past end of tuples for this block */
> > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > +
> > > +               /* Update error traceback information */
> > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > +                                                                blkno, toff);
> > >                 itemid = PageGetItemId(page, toff);
> > >                 ItemIdSetUnused(itemid);
> > >                 unused[uncnt++] = toff;
> > > +
> > > +               /* Revert to the previous phase information for error traceback */
> > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > >         }
> > >
> > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > clean up after their own state changes, rather than requiring callers to do it.
> > > I don't think you should use it in a loop, nor introduce another
> > > LVSavedErrInfo.
> > >
> > > Since phase and blkno are already set, I think you should just set
> > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > Similar to whats done in lazy_vacuum_heap():
> > >
> > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > >                 vacrelstats->blkno = tblk;
> > 
> > Fixed.
> 
> I rearead this thread and I think the earlier suggestion from Masahiko was
> right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> the page, which has already been read from disk.  On my suggestion, your v2
> patch sets offnum directly, but now I think it's not useful to set at all,
> since the whole page is manipulated by PageRepairFragmentation() and
> log_heap_clean().  An error there would misleadingly say "..at offset number
> MM", but would always show the page's last offset, and not the offset where an
> error occured.

This makes me question whether offset numbers are ever useful during
VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
internal functions that don't update vacrelstats->offno.  Note that my initial
problem report that led to the errcontext implementation was an ERROR in heap
*scan* (not vacuum).  So an offset number at that point would've been
sufficient.
https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com

I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
so an error in heap_page_prune() (for example) doesn't get the wrong offset
associated with it.

So see the attached changes on top of your v2 patch.

postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT INTO tt SELECT generate_series(1,99999);
VACUUMtt; UPDATE tt SET a=1 WHERE ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int +
(1<<30))::int::text::xidWHERE oid='tt'::regclass; VACUUM tt;
 
ERROR:  found xmin 1961 from before relfrozenxid 1073743785
CONTEXT:  while scanning block 345 of relation "public.tt", item offset 205

Hmm.. is it confusing that the block number is 0-indexed but the offset is
1-indexed ?

-- 
Justin

Attachment

Re: display offset along with block number in vacuum errors

From
Mahendra Singh Thalor
Date:
Thanks Justin.

On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > Bcc:
> > Subject: Re: display offset along with block number in vacuum errors
> > Reply-To:
> > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>
>
> whoops
>
> > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > Here:
> > > >
> > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > >                 BlockNumber tblk;
> > > >                 OffsetNumber toff;
> > > >                 ItemId          itemid;
> > > > +               LVSavedErrInfo loc_saved_err_info;
> > > >
> > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > >                 if (tblk != blkno)
> > > >                         break;                          /* past end of tuples for this block */
> > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > +
> > > > +               /* Update error traceback information */
> > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > +                                                                blkno, toff);
> > > >                 itemid = PageGetItemId(page, toff);
> > > >                 ItemIdSetUnused(itemid);
> > > >                 unused[uncnt++] = toff;
> > > > +
> > > > +               /* Revert to the previous phase information for error traceback */
> > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > >         }
> > > >
> > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > I don't think you should use it in a loop, nor introduce another
> > > > LVSavedErrInfo.
> > > >
> > > > Since phase and blkno are already set, I think you should just set
> > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > Similar to whats done in lazy_vacuum_heap():
> > > >
> > > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > >                 vacrelstats->blkno = tblk;
> > >
> > > Fixed.
> >
> > I rearead this thread and I think the earlier suggestion from Masahiko was
> > right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> > the page, which has already been read from disk.  On my suggestion, your v2
> > patch sets offnum directly, but now I think it's not useful to set at all,
> > since the whole page is manipulated by PageRepairFragmentation() and
> > log_heap_clean().  An error there would misleadingly say "..at offset number
> > MM", but would always show the page's last offset, and not the offset where an
> > error occured.
>
> This makes me question whether offset numbers are ever useful during
> VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> internal functions that don't update vacrelstats->offno.  Note that my initial
> problem report that led to the errcontext implementation was an ERROR in heap
> *scan* (not vacuum).  So an offset number at that point would've been
> sufficient.
> https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com
>
> I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> so an error in heap_page_prune() (for example) doesn't get the wrong offset
> associated with it.
>
> So see the attached changes on top of your v2 patch.

Actually I was waiting for review comments from committer and other
people also and was planning to send a patch after that. I already
fixed your comments in my offline patch and was waiting for more
comments. Anyway, thanks for delta patch.

Here, attaching v3 patch for review.

>
> postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT INTO tt SELECT
generate_series(1,99999);VACUUM tt; UPDATE tt SET a=1 WHERE ctid='(345,10)'; UPDATE pg_class SET
relfrozenxid=(relfrozenxid::text::int+ (1<<30))::int::text::xid WHERE oid='tt'::regclass; VACUUM tt; 
> ERROR:  found xmin 1961 from before relfrozenxid 1073743785
> CONTEXT:  while scanning block 345 of relation "public.tt", item offset 205
>
> Hmm.. is it confusing that the block number is 0-indexed but the offset is
> 1-indexed ?
>
> --
> Justin


--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.c

Attachment

Re: display offset along with block number in vacuum errors

From
Justin Pryzby
Date:
On Sat, Aug 01, 2020 at 12:31:53PM +0530, Mahendra Singh Thalor wrote:
> Actually I was waiting for review comments from committer and other
> people also and was planning to send a patch after that. I already
> fixed your comments in my offline patch and was waiting for more
> comments. Anyway, thanks for delta patch.
> 
> Here, attaching v3 patch for review.

I wasn't being impatient but I spent enough time thinking about this that it
made sense to put it in patch form.  Your patch has a couple extaneous changes:

                case VACUUM_ERRCB_PHASE_VACUUM_HEAP:
                        if (BlockNumberIsValid(errinfo->blkno))
+                       {
                                errcontext("while vacuuming block %u of relation \"%s.%s\"",
                                                   errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+                       }
                        break;
 
                case VACUUM_ERRCB_PHASE_VACUUM_INDEX:
@@ -3589,6 +3618,7 @@ vacuum_error_callback(void *arg)
                                           errinfo->indname, errinfo->relnamespace, errinfo->relname);
                        break;
 
+
                case VACUUM_ERRCB_PHASE_INDEX_CLEANUP:
                        errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"",

I would get rid of these by doing like: git reset -p HEAD~1 (say "n" to most
hunks and "y" to reset just the two, above), then git commit --amend (without
-a and without pathnames), then git diff will show local changes (including
those no-longer-committed hunks), which you can git checkout -p (or similar).
I'd be interested to hear if there's a better way.

-- 
Justin



Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> Thanks Justin.
>
> On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > > Bcc:
> > > Subject: Re: display offset along with block number in vacuum errors
> > > Reply-To:
> > > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>
> >
> > whoops
> >
> > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > > Here:
> > > > >
> > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > > >                 BlockNumber tblk;
> > > > >                 OffsetNumber toff;
> > > > >                 ItemId          itemid;
> > > > > +               LVSavedErrInfo loc_saved_err_info;
> > > > >
> > > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > > >                 if (tblk != blkno)
> > > > >                         break;                          /* past end of tuples for this block */
> > > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > +
> > > > > +               /* Update error traceback information */
> > > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > > +                                                                blkno, toff);
> > > > >                 itemid = PageGetItemId(page, toff);
> > > > >                 ItemIdSetUnused(itemid);
> > > > >                 unused[uncnt++] = toff;
> > > > > +
> > > > > +               /* Revert to the previous phase information for error traceback */
> > > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > > >         }
> > > > >
> > > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > > I don't think you should use it in a loop, nor introduce another
> > > > > LVSavedErrInfo.
> > > > >
> > > > > Since phase and blkno are already set, I think you should just set
> > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > > Similar to whats done in lazy_vacuum_heap():
> > > > >
> > > > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > > >                 vacrelstats->blkno = tblk;
> > > >
> > > > Fixed.
> > >
> > > I rearead this thread and I think the earlier suggestion from Masahiko was
> > > right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> > > the page, which has already been read from disk.  On my suggestion, your v2
> > > patch sets offnum directly, but now I think it's not useful to set at all,
> > > since the whole page is manipulated by PageRepairFragmentation() and
> > > log_heap_clean().  An error there would misleadingly say "..at offset number
> > > MM", but would always show the page's last offset, and not the offset where an
> > > error occured.
> >
> > This makes me question whether offset numbers are ever useful during
> > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> > internal functions that don't update vacrelstats->offno.  Note that my initial
> > problem report that led to the errcontext implementation was an ERROR in heap
> > *scan* (not vacuum).  So an offset number at that point would've been
> > sufficient.
> > https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com
> >
> > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> > so an error in heap_page_prune() (for example) doesn't get the wrong offset
> > associated with it.
> >
> > So see the attached changes on top of your v2 patch.
>
> Actually I was waiting for review comments from committer and other
> people also and was planning to send a patch after that. I already
> fixed your comments in my offline patch and was waiting for more
> comments. Anyway, thanks for delta patch.
>
> Here, attaching v3 patch for review.

Thank you for updating the patch!

Here are my comments on v3 patch:

@@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
    if (PageIsNew(page) || PageIsEmpty(page))
        return false;

+   /* Update error traceback information */
+   update_vacuum_error_info(vacrelstats, &saved_err_info,
+           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
+           InvalidOffsetNumber);
+
    maxoff = PageGetMaxOffsetNumber(page);
    for (offnum = FirstOffsetNumber;
         offnum <= maxoff;

You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
but I think we're already in that phase. I'm okay with explicitly
setting it but on the other hand, we don't set the phase in
heap_page_is_all_visible(). Is there any reason for that?

Also, since we don't reset vacrelstats->offnum at the end of
heap_page_is_all_visible(), if an error occurs by the end of
lazy_vacuum_page(), the caller of  heap_page_is_all_visible(), we
report the error context with the last offset number in the page,
making the users confused.

---
@@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)

        if (heap_tuple_needs_freeze(tupleheader, FreezeLimit,
                                    MultiXactCutoff, buf))
-           return true;
+           break;
    }                           /* scan along page */

-   return false;
+   /* Revert to the previous phase information for error traceback */
+   restore_vacuum_error_info(vacrelstats, &saved_err_info);
+
+   return offnum <= maxoff ? true : false;
 }

I think we can write just "return (offnum <= maxoff)".

---
-   /* Revert back to the old phase information for error traceback */
+   /* Revert to the old phase information for error traceback */

If we want to modify this comment how about the following phrase for
consistency with other places?

/* Revert to the previous phase information for error traceback */

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Sun, 2 Aug 2020 at 13:24, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Sun, Aug 02, 2020 at 01:02:42PM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch!
> >
> > Here are my comments on v3 patch:
> >
> > @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
> >     if (PageIsNew(page) || PageIsEmpty(page))
> >         return false;
> >
> > +   /* Update error traceback information */
> > +   update_vacuum_error_info(vacrelstats, &saved_err_info,
> > +           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
> > +           InvalidOffsetNumber);
> > +
> >     maxoff = PageGetMaxOffsetNumber(page);
> >     for (offnum = FirstOffsetNumber;
> >          offnum <= maxoff;
> >
> > You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
> > but I think we're already in that phase. I'm okay with explicitly
> > setting it but on the other hand, we don't set the phase in
> > heap_page_is_all_visible(). Is there any reason for that?
>
> That part was my suggestion, so I can answer that.  I added
> update_vacuum_error_info() to lazy_check_needs_freeze() to allow it to later
> call restore_vacuum_error_info().
>
> > Also, since we don't reset vacrelstats->offnum at the end of
> > heap_page_is_all_visible(), if an error occurs by the end of
> > lazy_vacuum_page(), the caller of  heap_page_is_all_visible(), we
> > report the error context with the last offset number in the page,
> > making the users confused.
>
> So it looks like heap_page_is_all_visible() should also call the update and
> restore functions.
>
> Do you agree with my suggestion that the VACUUM phase should never try to
> report an offset ?

Yeah, given the current heap vacuum implementation, I agree that
setting the offset number during VACUUM_HEAP phase doesn't help
anything. But setting the offset number during checking tuples'
visibility in heap_page_is_all_visible() might be useful, although it
might be unlikely to find a problem in heap_page_is_all_visible() as
the tuple visibility checking is already done in lazy_scan_heap(). I
wonder if we can set SCAN_HEAP phase and update the offset number in
heap_page_is_all_visible().

> How do you think we can phrase the message to avoid confusion due to 0-based
> block number and 1-based offset ?

I think that since the user who uses this errcontext information is
likely to know more or less the internal of PostgreSQL I think 0-based
block number and 1-based offset will not be a problem. However, I
expected these are documented but couldn't find. If not yet, I think
it’s a good time to document that.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: display offset along with block number in vacuum errors

From
Mahendra Singh Thalor
Date:
Thanks Sawada and Justin.

On Sun, 2 Aug 2020 at 09:33, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
> >
> > Thanks Justin.
> >
> > On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote:
> > > > Bcc:
> > > > Subject: Re: display offset along with block number in vacuum errors
> > > > Reply-To:
> > > > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com>
> > >
> > > whoops
> > >
> > > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > > > > Here:
> > > > > >
> > > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> > > > > >                 BlockNumber tblk;
> > > > > >                 OffsetNumber toff;
> > > > > >                 ItemId          itemid;
> > > > > > +               LVSavedErrInfo loc_saved_err_info;
> > > > > >
> > > > > >                 tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > >                 if (tblk != blkno)
> > > > > >                         break;                          /* past end of tuples for this block */
> > > > > >                 toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]);
> > > > > > +
> > > > > > +               /* Update error traceback information */
> > > > > > +               update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP,
> > > > > > +                                                                blkno, toff);
> > > > > >                 itemid = PageGetItemId(page, toff);
> > > > > >                 ItemIdSetUnused(itemid);
> > > > > >                 unused[uncnt++] = toff;
> > > > > > +
> > > > > > +               /* Revert to the previous phase information for error traceback */
> > > > > > +               restore_vacuum_error_info(vacrelstats, &loc_saved_err_info);
> > > > > >         }
> > > > > >
> > > > > > I'm not sure why you use restore_vacuum_error_info() at all.  It's already
> > > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to
> > > > > > clean up after their own state changes, rather than requiring callers to do it.
> > > > > > I don't think you should use it in a loop, nor introduce another
> > > > > > LVSavedErrInfo.
> > > > > >
> > > > > > Since phase and blkno are already set, I think you should just set
> > > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info().
> > > > > > Similar to whats done in lazy_vacuum_heap():
> > > > > >
> > > > > >                 tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]);
> > > > > >                 vacrelstats->blkno = tblk;
> > > > >
> > > > > Fixed.
> > > >
> > > > I rearead this thread and I think the earlier suggestion from Masahiko was
> > > > right.  The loop around dead_tuples only does ItemIdSetUnused() which updates
> > > > the page, which has already been read from disk.  On my suggestion, your v2
> > > > patch sets offnum directly, but now I think it's not useful to set at all,
> > > > since the whole page is manipulated by PageRepairFragmentation() and
> > > > log_heap_clean().  An error there would misleadingly say "..at offset number
> > > > MM", but would always show the page's last offset, and not the offset where an
> > > > error occured.
> > >
> > > This makes me question whether offset numbers are ever useful during
> > > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by
> > > internal functions that don't update vacrelstats->offno.  Note that my initial
> > > problem report that led to the errcontext implementation was an ERROR in heap
> > > *scan* (not vacuum).  So an offset number at that point would've been
> > > sufficient.
> > > https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com
> > >
> > > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo,
> > > so an error in heap_page_prune() (for example) doesn't get the wrong offset
> > > associated with it.
> > >
> > > So see the attached changes on top of your v2 patch.
> >
> > Actually I was waiting for review comments from committer and other
> > people also and was planning to send a patch after that. I already
> > fixed your comments in my offline patch and was waiting for more
> > comments. Anyway, thanks for delta patch.
> >
> > Here, attaching v3 patch for review.
>
> Thank you for updating the patch!
>
> Here are my comments on v3 patch:
>
> @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>     if (PageIsNew(page) || PageIsEmpty(page))
>         return false;
>
> +   /* Update error traceback information */
> +   update_vacuum_error_info(vacrelstats, &saved_err_info,
> +           VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno,
> +           InvalidOffsetNumber);
> +
>     maxoff = PageGetMaxOffsetNumber(page);
>     for (offnum = FirstOffsetNumber;
>          offnum <= maxoff;
>
> You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP
> but I think we're already in that phase. I'm okay with explicitly
> setting it but on the other hand, we don't set the phase in
> heap_page_is_all_visible(). Is there any reason for that?
>
> Also, since we don't reset vacrelstats->offnum at the end of
> heap_page_is_all_visible(), if an error occurs by the end of
> lazy_vacuum_page(), the caller of  heap_page_is_all_visible(), we
> report the error context with the last offset number in the page,
> making the users confused.

Your point is valid. Added update and restore functions in
heap_page_is_all_visible in the latest patch.

>
> ---
> @@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
>
>         if (heap_tuple_needs_freeze(tupleheader, FreezeLimit,
>                                     MultiXactCutoff, buf))
> -           return true;
> +           break;
>     }                           /* scan along page */
>
> -   return false;
> +   /* Revert to the previous phase information for error traceback */
> +   restore_vacuum_error_info(vacrelstats, &saved_err_info);
> +
> +   return offnum <= maxoff ? true : false;
>  }
>
> I think we can write just "return (offnum <= maxoff)".

Fixed this.

>
> ---
> -   /* Revert back to the old phase information for error traceback */
> +   /* Revert to the old phase information for error traceback */
>
> If we want to modify this comment how about the following phrase for
> consistency with other places?

Fixed this.

>
> /* Revert to the previous phase information for error traceback */
>
> Regards,
>
> --
> Masahiko Sawada            http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Apart from these, I fixed Justin's comment of extra brackets(That was
due to "patch -p 1 < file", as 002_fix was not applying directly). I
haven't updated the document for this(Sawada's comment). I will try in
the next patch.
Attaching v4 patch for review.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: display offset along with block number in vacuum errors

From
Robert Haas
Date:
On Sun, Aug 2, 2020 at 10:43 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> I think that since the user who uses this errcontext information is
> likely to know more or less the internal of PostgreSQL I think 0-based
> block number and 1-based offset will not be a problem. However, I
> expected these are documented but couldn't find. If not yet, I think
> it’s a good time to document that.

I agree. That's just how TIDs are.


--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > Apart from these, I fixed comments given by Sawada and Michael in the
> > latest patch. Attaching v2 patch for review.
>
> Thanks.
>
> lazy_check_needs_freeze iterates over blocks and this patch changes it to
> update vacrelstats.  I think it should do what
> lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> its beginning (even though only the offset is changed), and then
> restore_vacuum_error_info() at its end (to "revert back" to the item number it
> started with).
>

I see that Mahendra has changed patch as per this suggestion but I am
not convinced that it is a good idea to sprinkle
update_vacuum_error_info()/restore_vacuum_error_info() at places more
than required. I see that it might look a bit clean from the
perspective that if tomorrow we use the function
lazy_check_needs_freeze() for a different purpose then we don't need
to worry about the wrong phase information. If we are worried about
that then we should have an assert in that function to ensure that the
current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Wed, Aug 5, 2020 at 12:47 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> Apart from these, I fixed Justin's comment of extra brackets(That was
> due to "patch -p 1 < file", as 002_fix was not applying directly). I
> haven't updated the document for this(Sawada's comment). I will try in
> the next patch.
> Attaching v4 patch for review.
>

Few comments on the latest patch:
1.
@@ -2640,6 +2659,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats
*vacrelstats)
  */
  new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
  vacrelstats->blkno = new_rel_pages;
+ vacrelstats->offnum = InvalidOffsetNumber;

Do we really need to update the 'vacrelstats->offnum' here when we
have already set it to InvalidOffsetNumber in the caller?

2.
@@ -3574,8 +3605,14 @@ vacuum_error_callback(void *arg)
  {
  case VACUUM_ERRCB_PHASE_SCAN_HEAP:
  if (BlockNumberIsValid(errinfo->blkno))
- errcontext("while scanning block %u of relation \"%s.%s\"",
-    errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ {
+ if (OffsetNumberIsValid(errinfo->offnum))
+ errcontext("while scanning block %u of relation \"%s.%s\", item offset %u",
+

I am not completely sure if this error message is an improvement over
what you have in the initial version of patch "while scanning block %u
and offset %u of relation \"%s.%s\"",...". I see that Justin has
raised a concern above that whether users will be aware of 'offset'
but I also see that we have used it in a few other messages in the
code.  For example:

PageIndexTupleDeleteNoCompact()
{
..
nline = PageGetMaxOffsetNumber(page);
if ((int) offnum <= 0 || (int) offnum > nline)
elog(ERROR, "invalid index offnum: %u", offnum);
..
}

hash_desc
{
..
case XLOG_HASH_INSERT:
{
xl_hash_insert *xlrec = (xl_hash_insert *) rec;

appendStringInfo(buf, "off %u", xlrec->offnum);
break;
}

Similarly in other desc functions, we have used off or offnum.

I find the message in your initial patch better.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Justin Pryzby
Date:
On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote:
> > > Apart from these, I fixed comments given by Sawada and Michael in the
> > > latest patch. Attaching v2 patch for review.
> >
> > Thanks.
> >
> > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > update vacrelstats.  I think it should do what
> > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > its beginning (even though only the offset is changed), and then
> > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > started with).
> >
> 
> I see that Mahendra has changed patch as per this suggestion but I am
> not convinced that it is a good idea to sprinkle
> update_vacuum_error_info()/restore_vacuum_error_info() at places more
> than required. I see that it might look a bit clean from the
> perspective that if tomorrow we use the function
> lazy_check_needs_freeze() for a different purpose then we don't need
> to worry about the wrong phase information. If we are worried about
> that then we should have an assert in that function to ensure that the
> current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.

The motivation was to restore the offnum, which is set to Invalid at the start
of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
should be restored or re-set to Invalid when returns to lazy_scan_heap().  If
you think it's important, we could just set vacrelstats->offnum = Invalid
before returning, but that's what the restore function was built for.  We do
direct assignment in 2 places to avoid a function call within a loop.

lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
                           Relation *Irel, int nindexes, bool aggressive)

...
        for (blkno = 0; blkno < nblocks; blkno++)
        {
...
                update_vacuum_error_info(vacrelstats, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP,
                                                                 blkno, InvalidOffsetNumber);
                if (!ConditionalLockBufferForCleanup(buf))
                {
...
                        if (!lazy_check_needs_freeze(buf, &hastup, vacrelstats))
                        {
...
                for (offnum = FirstOffsetNumber;
                         offnum <= maxoff;
                         offnum = OffsetNumberNext(offnum))


-- 
Justin



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > >
> > > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > > update vacrelstats.  I think it should do what
> > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > > its beginning (even though only the offset is changed), and then
> > > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > > started with).
> > >
> >
> > I see that Mahendra has changed patch as per this suggestion but I am
> > not convinced that it is a good idea to sprinkle
> > update_vacuum_error_info()/restore_vacuum_error_info() at places more
> > than required. I see that it might look a bit clean from the
> > perspective that if tomorrow we use the function
> > lazy_check_needs_freeze() for a different purpose then we don't need
> > to worry about the wrong phase information. If we are worried about
> > that then we should have an assert in that function to ensure that the
> > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.
>
> The motivation was to restore the offnum, which is set to Invalid at the start
> of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
> should be restored or re-set to Invalid when returns to lazy_scan_heap().  If
> you think it's important, we could just set vacrelstats->offnum = Invalid
> before returning,
>

Yeah, I would prefer that and probably a comment to indicate why we
are doing that.

> but that's what the restore function was built for.
>

I think it would be better to call restore wherever we call update. I
see your point that there is some value doing it via update/restore
but I think we should try to avoid that at many places unless it is
required and we already update blockno information without
update/restore at few places.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Fri, 7 Aug 2020 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > >
> > > >
> > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > > > update vacrelstats.  I think it should do what
> > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > > > its beginning (even though only the offset is changed), and then
> > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > > > started with).
> > > >
> > >
> > > I see that Mahendra has changed patch as per this suggestion but I am
> > > not convinced that it is a good idea to sprinkle
> > > update_vacuum_error_info()/restore_vacuum_error_info() at places more
> > > than required. I see that it might look a bit clean from the
> > > perspective that if tomorrow we use the function
> > > lazy_check_needs_freeze() for a different purpose then we don't need
> > > to worry about the wrong phase information. If we are worried about
> > > that then we should have an assert in that function to ensure that the
> > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.
> >
> > The motivation was to restore the offnum, which is set to Invalid at the start
> > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
> > should be restored or re-set to Invalid when returns to lazy_scan_heap().  If
> > you think it's important, we could just set vacrelstats->offnum = Invalid
> > before returning,
> >
>
> Yeah, I would prefer that and probably a comment to indicate why we
> are doing that.

+1

I'm concerned that we call the update and restore in
heap_page_is_all_visible(). Unlike lazy_check_needs_freeze(), this
function is called for every vacuumed page. I commented that if we
want to update the offset number during iterating tuples in the
function we should change the phase to SCAN_HEAP at the beginning of
the function because it's actually not vacuuming. But if the error is
unlikely to happen within the function I think we can avoid updating
the offset number and phase to avoid performance overhead.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Fri, Aug 7, 2020 at 8:10 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Fri, 7 Aug 2020 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> > > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > >
> > > > >
> > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > > > > update vacrelstats.  I think it should do what
> > > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > > > > its beginning (even though only the offset is changed), and then
> > > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > > > > started with).
> > > > >
> > > >
> > > > I see that Mahendra has changed patch as per this suggestion but I am
> > > > not convinced that it is a good idea to sprinkle
> > > > update_vacuum_error_info()/restore_vacuum_error_info() at places more
> > > > than required. I see that it might look a bit clean from the
> > > > perspective that if tomorrow we use the function
> > > > lazy_check_needs_freeze() for a different purpose then we don't need
> > > > to worry about the wrong phase information. If we are worried about
> > > > that then we should have an assert in that function to ensure that the
> > > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.
> > >
> > > The motivation was to restore the offnum, which is set to Invalid at the start
> > > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
> > > should be restored or re-set to Invalid when returns to lazy_scan_heap().  If
> > > you think it's important, we could just set vacrelstats->offnum = Invalid
> > > before returning,
> > >
> >
> > Yeah, I would prefer that and probably a comment to indicate why we
> > are doing that.
>
> +1
>
> I'm concerned that we call the update and restore in
> heap_page_is_all_visible(). Unlike lazy_check_needs_freeze(), this
> function is called for every vacuumed page. I commented that if we
> want to update the offset number during iterating tuples in the
> function we should change the phase to SCAN_HEAP at the beginning of
> the function because it's actually not vacuuming.
>

AFAICS, heap_page_is_all_visible() is called from only one place and
that is lazy_vacuum_page(), so I think if there is any error in
heap_page_is_all_visible(), it should be considered as VACUUM_HEAP
phase error.

 But if the error is
> unlikely to happen within the function I think we can avoid updating
> the offset number and phase to avoid performance overhead.
>

I am not sure we can guarantee that and even if it is true today one
can add an error in that path in future. But I feel we can keep the
phase as VACUUM_HEAP.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Fri, 7 Aug 2020 at 13:04, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 7, 2020 at 8:10 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Fri, 7 Aug 2020 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > >
> > > > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> > > > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > > >
> > > > > >
> > > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > > > > > update vacrelstats.  I think it should do what
> > > > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > > > > > its beginning (even though only the offset is changed), and then
> > > > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > > > > > started with).
> > > > > >
> > > > >
> > > > > I see that Mahendra has changed patch as per this suggestion but I am
> > > > > not convinced that it is a good idea to sprinkle
> > > > > update_vacuum_error_info()/restore_vacuum_error_info() at places more
> > > > > than required. I see that it might look a bit clean from the
> > > > > perspective that if tomorrow we use the function
> > > > > lazy_check_needs_freeze() for a different purpose then we don't need
> > > > > to worry about the wrong phase information. If we are worried about
> > > > > that then we should have an assert in that function to ensure that the
> > > > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.
> > > >
> > > > The motivation was to restore the offnum, which is set to Invalid at the start
> > > > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
> > > > should be restored or re-set to Invalid when returns to lazy_scan_heap().  If
> > > > you think it's important, we could just set vacrelstats->offnum = Invalid
> > > > before returning,
> > > >
> > >
> > > Yeah, I would prefer that and probably a comment to indicate why we
> > > are doing that.
> >
> > +1
> >
> > I'm concerned that we call the update and restore in
> > heap_page_is_all_visible(). Unlike lazy_check_needs_freeze(), this
> > function is called for every vacuumed page. I commented that if we
> > want to update the offset number during iterating tuples in the
> > function we should change the phase to SCAN_HEAP at the beginning of
> > the function because it's actually not vacuuming.
> >
>
> AFAICS, heap_page_is_all_visible() is called from only one place and
> that is lazy_vacuum_page(), so I think if there is any error in
> heap_page_is_all_visible(), it should be considered as VACUUM_HEAP
> phase error.

It's true that heap_page_is_all_visible() is called from only
lazy_vacuum_page() but I'm concerned it would lead misleading since
it's not actually removing tuples but just checking after vacuum. I
guess that the errcontext should show what the process is actually
doing and therefore help the investigation, so I thought VACUUM_HEAP
might not be appropriate for this case. But I see also your point.
Other vacuum error context phases match with vacuum progress
information phrases. So in heap_page_is_all_visible (), I agree it's
better to update the offset number and keep the phase VACUUM_HEAP
rather than do nothing.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Mon, Aug 10, 2020 at 10:24 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> It's true that heap_page_is_all_visible() is called from only
> lazy_vacuum_page() but I'm concerned it would lead misleading since
> it's not actually removing tuples but just checking after vacuum. I
> guess that the errcontext should show what the process is actually
> doing and therefore help the investigation, so I thought VACUUM_HEAP
> might not be appropriate for this case. But I see also your point.
> Other vacuum error context phases match with vacuum progress
> information phrases. So in heap_page_is_all_visible (), I agree it's
> better to update the offset number and keep the phase VACUUM_HEAP
> rather than do nothing.
>

Okay, I have changed accordingly and this means that the offset will
be displayed for the vacuum phase as well. Apart from this, I have
fixed all the comments raised by me in the attached patch. One thing
we need to think is do we want to set offset during heap_page_prune
when called from lazy_scan_heap? I think the code path for
heap_prune_chain is quite deep, so an error can occur in that path.
What do you think?

-- 
With Regards,
Amit Kapila.

Attachment

Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Fri, Aug 7, 2020 at 7:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote:
> > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > >
> > > >
> > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to
> > > > update vacrelstats.  I think it should do what
> > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at
> > > > its beginning (even though only the offset is changed), and then
> > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it
> > > > started with).
> > > >
> > >
> > > I see that Mahendra has changed patch as per this suggestion but I am
> > > not convinced that it is a good idea to sprinkle
> > > update_vacuum_error_info()/restore_vacuum_error_info() at places more
> > > than required. I see that it might look a bit clean from the
> > > perspective that if tomorrow we use the function
> > > lazy_check_needs_freeze() for a different purpose then we don't need
> > > to worry about the wrong phase information. If we are worried about
> > > that then we should have an assert in that function to ensure that the
> > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP.
> >
> > The motivation was to restore the offnum, which is set to Invalid at the start
> > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but
> > should be restored or re-set to Invalid when returns to lazy_scan_heap().  If
> > you think it's important, we could just set vacrelstats->offnum = Invalid
> > before returning,
> >
>
> Yeah, I would prefer that and probably a comment to indicate why we
> are doing that.
>

Changed accordingly in the updated patch.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Thu, Aug 6, 2020 at 7:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 5, 2020 at 12:47 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > Apart from these, I fixed Justin's comment of extra brackets(That was
> > due to "patch -p 1 < file", as 002_fix was not applying directly). I
> > haven't updated the document for this(Sawada's comment). I will try in
> > the next patch.
> > Attaching v4 patch for review.
> >
>
> Few comments on the latest patch:
> 1.
> @@ -2640,6 +2659,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats
> *vacrelstats)
>   */
>   new_rel_pages = count_nondeletable_pages(onerel, vacrelstats);
>   vacrelstats->blkno = new_rel_pages;
> + vacrelstats->offnum = InvalidOffsetNumber;
>
> Do we really need to update the 'vacrelstats->offnum' here when we
> have already set it to InvalidOffsetNumber in the caller?
>

I have removed this change.

> 2.
> @@ -3574,8 +3605,14 @@ vacuum_error_callback(void *arg)
>   {
>   case VACUUM_ERRCB_PHASE_SCAN_HEAP:
>   if (BlockNumberIsValid(errinfo->blkno))
> - errcontext("while scanning block %u of relation \"%s.%s\"",
> -    errinfo->blkno, errinfo->relnamespace, errinfo->relname);
> + {
> + if (OffsetNumberIsValid(errinfo->offnum))
> + errcontext("while scanning block %u of relation \"%s.%s\", item offset %u",
> +
>
> I am not completely sure if this error message is an improvement over
> what you have in the initial version of patch "while scanning block %u
> and offset %u of relation \"%s.%s\"",...". I see that Justin has
> raised a concern above that whether users will be aware of 'offset'
> but I also see that we have used it in a few other messages in the
> code.

I have changed the message to what you have in the original patch.

Apart from above, I have also reset the offset number back to
InvalidOffsetNumber in lazy_scan_heap after processing all the tuples,
otherwise, it will erroneously display wring offset if any error
occurred afterward.

Let me know what you think of the changes done in the latest patch.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Fri, Aug 14, 2020 at 4:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Aug 10, 2020 at 10:24 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > It's true that heap_page_is_all_visible() is called from only
> > lazy_vacuum_page() but I'm concerned it would lead misleading since
> > it's not actually removing tuples but just checking after vacuum. I
> > guess that the errcontext should show what the process is actually
> > doing and therefore help the investigation, so I thought VACUUM_HEAP
> > might not be appropriate for this case. But I see also your point.
> > Other vacuum error context phases match with vacuum progress
> > information phrases. So in heap_page_is_all_visible (), I agree it's
> > better to update the offset number and keep the phase VACUUM_HEAP
> > rather than do nothing.
> >
>
> Okay, I have changed accordingly and this means that the offset will
> be displayed for the vacuum phase as well. Apart from this, I have
> fixed all the comments raised by me in the attached patch. One thing
> we need to think is do we want to set offset during heap_page_prune
> when called from lazy_scan_heap? I think the code path for
> heap_prune_chain is quite deep, so an error can occur in that path.
> What do you think?
>

The reason why I have not included heap_page_prune related change in
the patch is that I don't want to sprinkle this in every possible
function (code path) called via vacuum especially if the probability
of an error in that code path is low. But, I am fine if you and or
others think that it is a good idea to update offset in
heap_page_prune as well.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Sat, 15 Aug 2020 at 12:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Aug 14, 2020 at 4:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Aug 10, 2020 at 10:24 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > It's true that heap_page_is_all_visible() is called from only
> > > lazy_vacuum_page() but I'm concerned it would lead misleading since
> > > it's not actually removing tuples but just checking after vacuum. I
> > > guess that the errcontext should show what the process is actually
> > > doing and therefore help the investigation, so I thought VACUUM_HEAP
> > > might not be appropriate for this case. But I see also your point.
> > > Other vacuum error context phases match with vacuum progress
> > > information phrases. So in heap_page_is_all_visible (), I agree it's
> > > better to update the offset number and keep the phase VACUUM_HEAP
> > > rather than do nothing.
> > >
> >
> > Okay, I have changed accordingly and this means that the offset will
> > be displayed for the vacuum phase as well. Apart from this, I have
> > fixed all the comments raised by me in the attached patch. One thing
> > we need to think is do we want to set offset during heap_page_prune
> > when called from lazy_scan_heap? I think the code path for
> > heap_prune_chain is quite deep, so an error can occur in that path.
> > What do you think?
> >
>
> The reason why I have not included heap_page_prune related change in
> the patch is that I don't want to sprinkle this in every possible
> function (code path) called via vacuum especially if the probability
> of an error in that code path is low. But, I am fine if you and or
> others think that it is a good idea to update offset in
> heap_page_prune as well.

I agree to not try sprinkling it many places than necessity.

Regarding heap_page_prune(), I'm concerned a bit that
heap_page_prune() is typically the first function to check the tuple
visibility within the vacuum code. I've sometimes observed an error
with the message like "DETAIL: could not open file “pg_xact/00AB”: No
such file or directory" due to a tuple header corruption. I suspect
this message was emitted while checking tuple visibility in
heap_page_prune(). So I guess the likelihood of an error in that code
is not so low.

On the other hand, if we want to update the offset number in
heap_page_prune() we will need to expose some vacuum structs defined
as a static including LVRelStats. But I don't think it's a good idea.
The second idea I came up with is that we set another errcontext for
heap_page_prune(). Since heap_page_prune() could be called also by a
regular page scanning it would work fine for both cases, although
there will be extra overheads for both. What do you think?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Mon, Aug 17, 2020 at 11:38 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 15 Aug 2020 at 12:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > The reason why I have not included heap_page_prune related change in
> > the patch is that I don't want to sprinkle this in every possible
> > function (code path) called via vacuum especially if the probability
> > of an error in that code path is low. But, I am fine if you and or
> > others think that it is a good idea to update offset in
> > heap_page_prune as well.
>
> I agree to not try sprinkling it many places than necessity.
>
> Regarding heap_page_prune(), I'm concerned a bit that
> heap_page_prune() is typically the first function to check the tuple
> visibility within the vacuum code. I've sometimes observed an error
> with the message like "DETAIL: could not open file “pg_xact/00AB”: No
> such file or directory" due to a tuple header corruption. I suspect
> this message was emitted while checking tuple visibility in
> heap_page_prune(). So I guess the likelihood of an error in that code
> is not so low.
>

Fair point.

> On the other hand, if we want to update the offset number in
> heap_page_prune() we will need to expose some vacuum structs defined
> as a static including LVRelStats.
>

I don't think we need to expose LVRelStats. We can just pass the
address of vacrelstats->offset_num to achieve what we want. I have
tried that and it works, see the
v6-0002-additinal-error-context-information-in-heap_page_ patch
attached. Do you see any problem with it?

--
With Regards,
Amit Kapila.

Attachment

Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Aug 17, 2020 at 11:38 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Sat, 15 Aug 2020 at 12:19, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > The reason why I have not included heap_page_prune related change in
> > > the patch is that I don't want to sprinkle this in every possible
> > > function (code path) called via vacuum especially if the probability
> > > of an error in that code path is low. But, I am fine if you and or
> > > others think that it is a good idea to update offset in
> > > heap_page_prune as well.
> >
> > I agree to not try sprinkling it many places than necessity.
> >
> > Regarding heap_page_prune(), I'm concerned a bit that
> > heap_page_prune() is typically the first function to check the tuple
> > visibility within the vacuum code. I've sometimes observed an error
> > with the message like "DETAIL: could not open file “pg_xact/00AB”: No
> > such file or directory" due to a tuple header corruption. I suspect
> > this message was emitted while checking tuple visibility in
> > heap_page_prune(). So I guess the likelihood of an error in that code
> > is not so low.
> >
>
> Fair point.
>
> > On the other hand, if we want to update the offset number in
> > heap_page_prune() we will need to expose some vacuum structs defined
> > as a static including LVRelStats.
> >
>
> I don't think we need to expose LVRelStats. We can just pass the
> address of vacrelstats->offset_num to achieve what we want. I have
> tried that and it works, see the
> v6-0002-additinal-error-context-information-in-heap_page_ patch
> attached. Do you see any problem with it?

Yes, you're right. I'm concerned a bit the number of arguments passing
to heap_page_prune() might get higher when we need other values to
update for errcontext, but I'm okay with the current patch.

Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it
be VACUUM_HEAP instead?

Also, I've tested the patch with log_min_messages = 'info' and get the
following sever logs:

2020-08-19 14:28:09.917 JST [72912] INFO:  launched 1 parallel vacuum
worker for index vacuuming (planned: 1)
2020-08-19 14:28:09.917 JST [72912] CONTEXT:  while scanning block 973
of relation "public.tbl"
2020-08-19 14:28:09.959 JST [72912] INFO:  scanned index "i1" to
remove 109872 row versions
2020-08-19 14:28:09.959 JST [72912] DETAIL:  CPU: user: 0.04 s,
system: 0.00 s, elapsed: 0.04 s
2020-08-19 14:28:09.959 JST [72912] CONTEXT:  while vacuuming index
"i1" of relation "public.tbl"
2020-08-19 14:28:09.967 JST [72936] INFO:  scanned index "i2" to
remove 109872 row versions by parallel vacuum worker
2020-08-19 14:28:09.967 JST [72936] DETAIL:  CPU: user: 0.03 s,
system: 0.00 s, elapsed: 0.04 s
2020-08-19 14:28:09.967 JST [72936] CONTEXT:  while vacuuming index
"i2" of relation "public.tbl"
2020-08-19 14:28:09.967 JST [72912] INFO:  scanned index "i2" to
remove 109872 row versions by parallel vacuum worker
2020-08-19 14:28:09.967 JST [72912] DETAIL:  CPU: user: 0.03 s,
system: 0.00 s, elapsed: 0.04 s
2020-08-19 14:28:09.967 JST [72912] CONTEXT:  while vacuuming index
"i2" of relation "public.tbl"
        parallel worker
        while scanning block 973 of relation "public.tbl"
2020-08-19 14:28:09.968 JST [72912] INFO:  "tbl": removed 109872 row
versions in 487 pages
2020-08-19 14:28:09.968 JST [72912] DETAIL:  CPU: user: 0.00 s,
system: 0.00 s, elapsed: 0.00 s
2020-08-19 14:28:09.968 JST [72912] CONTEXT:  while vacuuming block
973 of relation "public.tbl"
2020-08-19 14:28:09.968 JST [72912] INFO:  index "i1" now contains
110000 row versions in 578 pages
2020-08-19 14:28:09.968 JST [72912] DETAIL:  109872 index row versions
were removed.
        0 index pages have been deleted, 0 are currently reusable.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
2020-08-19 14:28:09.968 JST [72912] CONTEXT:  while scanning block 973
of relation "public.tbl"
2020-08-19 14:28:09.968 JST [72912] INFO:  index "i2" now contains
110000 row versions in 578 pages
2020-08-19 14:28:09.968 JST [72912] DETAIL:  109872 index row versions
were removed.
        0 index pages have been deleted, 0 are currently reusable.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
2020-08-19 14:28:09.968 JST [72912] CONTEXT:  while scanning block 973
of relation "public.tbl"
2020-08-19 14:28:09.969 JST [72912] INFO:  "tbl": found 110000
removable, 110000 nonremovable row versions in 974 out of 974 pages
2020-08-19 14:28:09.969 JST [72912] DETAIL:  0 dead row versions
cannot be removed yet, oldest xmin: 519
        There were 372 unused item identifiers.
        Skipped 0 pages due to buffer pins, 0 frozen pages.
        0 pages are entirely empty.
        CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.06 s.
2020-08-19 14:28:09.969 JST [72912] CONTEXT:  while scanning block 973
of relation "public.tbl"

This is not directly related to the patch but it looks like we can
improve the current errcontext settings. For instance, the message
from lazy_vacuum_index(): there are two messages reporting the phases.
I've attached the patch that improves the current errcontext setting,
which can be applied before the patch adding offset number.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > I don't think we need to expose LVRelStats. We can just pass the
> > address of vacrelstats->offset_num to achieve what we want. I have
> > tried that and it works, see the
> > v6-0002-additinal-error-context-information-in-heap_page_ patch
> > attached. Do you see any problem with it?
>
> Yes, you're right. I'm concerned a bit the number of arguments passing
> to heap_page_prune() might get higher when we need other values to
> update for errcontext, but I'm okay with the current patch.
>

Yeah, we might need to think if we want to increase the number of
parameters but not sure we need to worry at this stage. If required, I
think we can either expose LVRelStats or extract a few parameters from
it and form a separate structure that could be passed to
heap_page_prune.

> Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it
> be VACUUM_HEAP instead?
>

I think it is currently similar to what we do in progress reporting.
We set to VACUUM_HEAP phase where the progress reporting is also set
to *HEAP_BLKS_VACUUMED. Currently, heap_page_prune() is covered under
*HEAP_BLKS_SCANNED, so I don't see a pressing need to change the error
context phase for heap_page_prune(). And also, we need to add some
more smarts in heap_page_prune() for this which I want to avoid.

> Also, I've tested the patch with log_min_messages = 'info' and get the
> following sever logs:
>
..
>
> This is not directly related to the patch but it looks like we can
> improve the current errcontext settings. For instance, the message
> from lazy_vacuum_index(): there are two messages reporting the phases.
> I've attached the patch that improves the current errcontext setting,
> which can be applied before the patch adding offset number.
>

After your patch, I see output like below with log_min_messages=info,

2020-08-20 10:11:46.769 IST [2640] INFO:  scanned index "idx_test_c1"
to remove 10000 row versions
2020-08-20 10:11:46.769 IST [2640] DETAIL:  CPU: user: 0.06 s, system:
0.01 s, elapsed: 0.06 s
2020-08-20 10:11:46.769 IST [2640] CONTEXT:  while vacuuming index
"idx_test_c1" of relation "public.test_vac"

2020-08-20 10:11:46.901 IST [2640] INFO:  scanned index "idx_test_c2"
to remove 10000 row versions
2020-08-20 10:11:46.901 IST [2640] DETAIL:  CPU: user: 0.10 s, system:
0.01 s, elapsed: 0.13 s
2020-08-20 10:11:46.901 IST [2640] CONTEXT:  while vacuuming index
"idx_test_c2" of relation "public.test_vac"

2020-08-20 10:11:46.917 IST [2640] INFO:  "test_vac": removed 10000
row versions in 667 pages
2020-08-20 10:11:46.917 IST [2640] DETAIL:  CPU: user: 0.01 s, system:
0.00 s, elapsed: 0.01 s

2020-08-20 10:11:46.928 IST [2640] INFO:  index "idx_test_c1" now
contains 50000 row versions in 276 pages
2020-08-20 10:11:46.928 IST [2640] DETAIL:  10000 index row versions
were removed.
        136 index pages have been deleted, 109 are currently reusable.
        CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
2020-08-20 10:11:46.928 IST [2640] CONTEXT:  while cleaning up index
"idx_test_c1" of relation "public.test_vac"

Here, we can notice that for the index, we are getting context
information but not for the heap. The reason is that in
vacuum_error_callback, we are not printing additional information for
phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
when block number is invalid. If we want to cover the 'info' messages
then won't it be better if we print a message in those phases even
block number is invalid (something like 'while scanning relation
\"%s.%s\"")

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > I don't think we need to expose LVRelStats. We can just pass the
> > > address of vacrelstats->offset_num to achieve what we want. I have
> > > tried that and it works, see the
> > > v6-0002-additinal-error-context-information-in-heap_page_ patch
> > > attached. Do you see any problem with it?
> >
> > Yes, you're right. I'm concerned a bit the number of arguments passing
> > to heap_page_prune() might get higher when we need other values to
> > update for errcontext, but I'm okay with the current patch.
> >
>
> Yeah, we might need to think if we want to increase the number of
> parameters but not sure we need to worry at this stage. If required, I
> think we can either expose LVRelStats or extract a few parameters from
> it and form a separate structure that could be passed to
> heap_page_prune.

Agreed.

>
> > Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it
> > be VACUUM_HEAP instead?
> >
>
> I think it is currently similar to what we do in progress reporting.
> We set to VACUUM_HEAP phase where the progress reporting is also set
> to *HEAP_BLKS_VACUUMED. Currently, heap_page_prune() is covered under
> *HEAP_BLKS_SCANNED, so I don't see a pressing need to change the error
> context phase for heap_page_prune(). And also, we need to add some
> more smarts in heap_page_prune() for this which I want to avoid.

Agreed.

>
> > Also, I've tested the patch with log_min_messages = 'info' and get the
> > following sever logs:
> >
> ..
> >
> > This is not directly related to the patch but it looks like we can
> > improve the current errcontext settings. For instance, the message
> > from lazy_vacuum_index(): there are two messages reporting the phases.
> > I've attached the patch that improves the current errcontext setting,
> > which can be applied before the patch adding offset number.
> >
>
> After your patch, I see output like below with log_min_messages=info,
>
> 2020-08-20 10:11:46.769 IST [2640] INFO:  scanned index "idx_test_c1"
> to remove 10000 row versions
> 2020-08-20 10:11:46.769 IST [2640] DETAIL:  CPU: user: 0.06 s, system:
> 0.01 s, elapsed: 0.06 s
> 2020-08-20 10:11:46.769 IST [2640] CONTEXT:  while vacuuming index
> "idx_test_c1" of relation "public.test_vac"
>
> 2020-08-20 10:11:46.901 IST [2640] INFO:  scanned index "idx_test_c2"
> to remove 10000 row versions
> 2020-08-20 10:11:46.901 IST [2640] DETAIL:  CPU: user: 0.10 s, system:
> 0.01 s, elapsed: 0.13 s
> 2020-08-20 10:11:46.901 IST [2640] CONTEXT:  while vacuuming index
> "idx_test_c2" of relation "public.test_vac"
>
> 2020-08-20 10:11:46.917 IST [2640] INFO:  "test_vac": removed 10000
> row versions in 667 pages
> 2020-08-20 10:11:46.917 IST [2640] DETAIL:  CPU: user: 0.01 s, system:
> 0.00 s, elapsed: 0.01 s
>
> 2020-08-20 10:11:46.928 IST [2640] INFO:  index "idx_test_c1" now
> contains 50000 row versions in 276 pages
> 2020-08-20 10:11:46.928 IST [2640] DETAIL:  10000 index row versions
> were removed.
>         136 index pages have been deleted, 109 are currently reusable.
>         CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> 2020-08-20 10:11:46.928 IST [2640] CONTEXT:  while cleaning up index
> "idx_test_c1" of relation "public.test_vac"
>
> Here, we can notice that for the index, we are getting context
> information but not for the heap. The reason is that in
> vacuum_error_callback, we are not printing additional information for
> phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
> when block number is invalid. If we want to cover the 'info' messages
> then won't it be better if we print a message in those phases even
> block number is invalid (something like 'while scanning relation
> \"%s.%s\"")

Yeah, there is an inconsistency. I agree to print the message even
when the block number is invalid. We're not actually doing any vacuum
jobs when printing the message but it would be less confusing than
printing the wrong phase and more consistent.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Here, we can notice that for the index, we are getting context
> > information but not for the heap. The reason is that in
> > vacuum_error_callback, we are not printing additional information for
> > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
> > when block number is invalid. If we want to cover the 'info' messages
> > then won't it be better if we print a message in those phases even
> > block number is invalid (something like 'while scanning relation
> > \"%s.%s\"")
>
> Yeah, there is an inconsistency. I agree to print the message even
> when the block number is invalid.
>

Okay, I will update this and send this patch and rebased patch to
display offsets later today or tomorrow.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Thu, Aug 20, 2020 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Here, we can notice that for the index, we are getting context
> > > information but not for the heap. The reason is that in
> > > vacuum_error_callback, we are not printing additional information for
> > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
> > > when block number is invalid. If we want to cover the 'info' messages
> > > then won't it be better if we print a message in those phases even
> > > block number is invalid (something like 'while scanning relation
> > > \"%s.%s\"")
> >
> > Yeah, there is an inconsistency. I agree to print the message even
> > when the block number is invalid.
> >
>
> Okay, I will update this and send this patch and rebased patch to
> display offsets later today or tomorrow.
>

Attached are both the patches. The first one is to improve existing
error context information, so I think we should back-patch to 13. The
second one is to add additional vacuum error context information, so
that is for only HEAD. Does that make sense? Also, let me know if you
have any more comments.

-- 
With Regards,
Amit Kapila.

Attachment

Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Thu, 20 Aug 2020 at 21:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 20, 2020 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Here, we can notice that for the index, we are getting context
> > > > information but not for the heap. The reason is that in
> > > > vacuum_error_callback, we are not printing additional information for
> > > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
> > > > when block number is invalid. If we want to cover the 'info' messages
> > > > then won't it be better if we print a message in those phases even
> > > > block number is invalid (something like 'while scanning relation
> > > > \"%s.%s\"")
> > >
> > > Yeah, there is an inconsistency. I agree to print the message even
> > > when the block number is invalid.
> > >
> >
> > Okay, I will update this and send this patch and rebased patch to
> > display offsets later today or tomorrow.
> >
>
> Attached are both the patches. The first one is to improve existing
> error context information, so I think we should back-patch to 13. The
> second one is to add additional vacuum error context information, so
> that is for only HEAD. Does that make sense? Also, let me know if you
> have any more comments.

Yes, makes sense to me.

I don't have comments on both patches. They look good to me.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Fri, Aug 21, 2020 at 12:31 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Thu, 20 Aug 2020 at 21:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Attached are both the patches. The first one is to improve existing
> > error context information, so I think we should back-patch to 13. The
> > second one is to add additional vacuum error context information, so
> > that is for only HEAD. Does that make sense? Also, let me know if you
> > have any more comments.
>
> Yes, makes sense to me.
>
> I don't have comments on both patches. They look good to me.
>

Thanks, I have pushed the first patch. I'll will push the second one
in a day or two unless someone has comments on the same.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Mahendra Singh Thalor
Date:
On Thu, 20 Aug 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 20, 2020 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Here, we can notice that for the index, we are getting context
> > > > information but not for the heap. The reason is that in
> > > > vacuum_error_callback, we are not printing additional information for
> > > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP
> > > > when block number is invalid. If we want to cover the 'info' messages
> > > > then won't it be better if we print a message in those phases even
> > > > block number is invalid (something like 'while scanning relation
> > > > \"%s.%s\"")
> > >
> > > Yeah, there is an inconsistency. I agree to print the message even
> > > when the block number is invalid.
> > >
> >
> > Okay, I will update this and send this patch and rebased patch to
> > display offsets later today or tomorrow.
> >
>
> Attached are both the patches. The first one is to improve existing
> error context information, so I think we should back-patch to 13. The
> second one is to add additional vacuum error context information, so
> that is for only HEAD. Does that make sense? Also, let me know if you
> have any more comments.

Thanks Amit for updating the patch. All changes in v7-02 look fine to me.

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
>
> --
> With Regards,
> Amit Kapila.



-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: display offset along with block number in vacuum errors

From
Amit Kapila
Date:
On Wed, Aug 26, 2020 at 8:54 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> On Thu, 20 Aug 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Attached are both the patches. The first one is to improve existing
> > error context information, so I think we should back-patch to 13. The
> > second one is to add additional vacuum error context information, so
> > that is for only HEAD. Does that make sense? Also, let me know if you
> > have any more comments.
>
> Thanks Amit for updating the patch. All changes in v7-02 look fine to me.
>

Okay, pushed v7-02 as well. I have marked the entry for this in CF as committed.

-- 
With Regards,
Amit Kapila.



Re: display offset along with block number in vacuum errors

From
Masahiko Sawada
Date:
On Wed, 26 Aug 2020 at 15:07, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 26, 2020 at 8:54 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > On Thu, 20 Aug 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Attached are both the patches. The first one is to improve existing
> > > error context information, so I think we should back-patch to 13. The
> > > second one is to add additional vacuum error context information, so
> > > that is for only HEAD. Does that make sense? Also, let me know if you
> > > have any more comments.
> >
> > Thanks Amit for updating the patch. All changes in v7-02 look fine to me.
> >
>
> Okay, pushed v7-02 as well. I have marked the entry for this in CF as committed.

Thank you!


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services