Thread: display offset along with block number in vacuum errors
Hi hackers,
We discussed in another email thread[1], that it will be helpful if we can display offset along with block number in vacuum error. Here, proposing a patch to add offset along with block number in vacuum errors.
In commit b61d161(Introduce vacuum errcontext to display additional information), we added vacuum errcontext to display additional information(block number) so that in case of vacuum error, we can identify which block we are getting error. Addition to block number, if we can display offset, then it will be more helpful for users. So to display offset, here proposing two different methods(Thanks Robert for suggesting these 2 methods):
Method 1: We can report the TID as well as the block number in errcontext.
- errcontext("while scanning block %u of relation \"%s.%s\"",
- errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+ errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
- errinfo->blkno, errinfo->relnamespace, errinfo->relname);
+ errcontext("while scanning block %u and offset %u of relation \"%s.%s\"",
+ errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname);
Above fix requires more calls to update_vacuum_error_info(). Attaching v01_0001 patch for this method.
Method 2: We can improve the error messages by passing the relevant TID to heap_prepare_freeze_tuple and having it report the TID as part of the error message or in the error detail.
ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
- errmsg_internal("found xmin %u from before relfrozenxid %u",
+ errmsg_internal("for block %u and offnum %u, found xmin %u from before relfrozenxid %u",
+ ItemPointerGetBlockNumber(tid),
+ ItemPointerGetOffsetNumber(tid),
xid, relfrozenxid)));
(errcode(ERRCODE_DATA_CORRUPTED),
- errmsg_internal("found xmin %u from before relfrozenxid %u",
+ errmsg_internal("for block %u and offnum %u, found xmin %u from before relfrozenxid %u",
+ ItemPointerGetBlockNumber(tid),
+ ItemPointerGetOffsetNumber(tid),
xid, relfrozenxid)));
Attaching v01_0002 patch for this method.
Please let me know your thoughts.
Attachment
On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote: > In commit b61d161(Introduce vacuum errcontext to display additional > information), we added vacuum errcontext to display additional > information(block number) so that in case of vacuum error, we can identify > which block we are getting error. Addition to block number, if we can > display offset, then it will be more helpful for users. So to display > offset, here proposing two different methods(Thanks Robert for suggesting > these 2 methods): new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); vacrelstats->blkno = new_rel_pages; + vacrelstats->offnum = InvalidOffsetNumber; Adding more context would be interesting for some cases, but not all contrary to what your patch does in some code paths like lazy_truncate_heap() as you would show up an offset of 0 for an invalid value. This could confuse some users. Note that we are careful enough to not print a context message if the block number is invalid. -- Michael
Attachment
On Sat, 25 Jul 2020 at 02:49, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > Hi hackers, > We discussed in another email thread[1], that it will be helpful if we can display offset along with block number in vacuumerror. Here, proposing a patch to add offset along with block number in vacuum errors. > > In commit b61d161(Introduce vacuum errcontext to display additional information), we added vacuum errcontext to displayadditional information(block number) so that in case of vacuum error, we can identify which block we are getting error. Addition to block number, if we can display offset, then it will be more helpful for users. So to display offset,here proposing two different methods(Thanks Robert for suggesting these 2 methods): > > Method 1: We can report the TID as well as the block number in errcontext. > - errcontext("while scanning block %u of relation \"%s.%s\"", > - errinfo->blkno, errinfo->relnamespace, errinfo->relname); > + errcontext("while scanning block %u and offset %u of relation \"%s.%s\"", > + errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname); > > Above fix requires more calls to update_vacuum_error_info(). Attaching v01_0001 patch for this method. > > Method 2: We can improve the error messages by passing the relevant TID to heap_prepare_freeze_tuple and having it reportthe TID as part of the error message or in the error detail. > ereport(ERROR, > (errcode(ERRCODE_DATA_CORRUPTED), > - errmsg_internal("found xmin %u from before relfrozenxid %u", > + errmsg_internal("for block %u and offnum %u, found xmin %u from before relfrozenxid %u", > + ItemPointerGetBlockNumber(tid), > + ItemPointerGetOffsetNumber(tid), > xid, relfrozenxid))); > > Attaching v01_0002 patch for this method. > > Please let me know your thoughts. > +1 for adding offset in error messages. I had a look at 0001 patch. You've set the vacuum error info but I think an error won't happen during setting itemids unused: @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, BlockNumber tblk; OffsetNumber toff; ItemId itemid; + LVSavedErrInfo loc_saved_err_info; tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); if (tblk != blkno) break; /* past end of tuples for this block */ toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); + + /* Update error traceback information */ + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, + blkno, toff); itemid = PageGetItemId(page, toff); ItemIdSetUnused(itemid); unused[uncnt++] = toff; + + /* Revert to the previous phase information for error traceback */ + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); } PageRepairFragmentation(page); Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks Michael for looking into this. On Sat, 25 Jul 2020 at 15:02, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote: > > In commit b61d161(Introduce vacuum errcontext to display additional > > information), we added vacuum errcontext to display additional > > information(block number) so that in case of vacuum error, we can identify > > which block we are getting error. Addition to block number, if we can > > display offset, then it will be more helpful for users. So to display > > offset, here proposing two different methods(Thanks Robert for suggesting > > these 2 methods): > > new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); > vacrelstats->blkno = new_rel_pages; > + vacrelstats->offnum = InvalidOffsetNumber; > > Adding more context would be interesting for some cases, but not all > contrary to what your patch does in some code paths like > lazy_truncate_heap() as you would show up an offset of 0 for an > invalid value. This could confuse some users. Note that we are > careful enough to not print a context message if the block number is > invalid. Okay. I agree with you. In case of inavlid offset, we can skip offset printing. I will do this change in the next patch. Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote: > Hi hackers, > We discussed in another email thread[1], that it will be helpful if we can > display offset along with block number in vacuum error. Here, proposing a > patch to add offset along with block number in vacuum errors. Thanks. I happened to see both threads, only by chance. I'd already started writing the same as your 0001, which is essentially the same as yours. Here: @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, BlockNumber tblk; OffsetNumber toff; ItemId itemid; + LVSavedErrInfo loc_saved_err_info; tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); if (tblk != blkno) break; /* past end of tuples for this block */ toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); + + /* Update error traceback information */ + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, + blkno, toff); itemid = PageGetItemId(page, toff); ItemIdSetUnused(itemid); unused[uncnt++] = toff; + + /* Revert to the previous phase information for error traceback */ + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); } I'm not sure why you use restore_vacuum_error_info() at all. It's already called at the end of lazy_vacuum_page() (and others) to allow functions to clean up after their own state changes, rather than requiring callers to do it. I don't think you should use it in a loop, nor introduce another LVSavedErrInfo. Since phase and blkno are already set, I think you should just set vacrelstats->offnum = toff, rather than calling update_vacuum_error_info(). Similar to whats done in lazy_vacuum_heap(): tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]); vacrelstats->blkno = tblk; I think you should also do: @@ -2976,6 +2984,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf, ItemId itemid; HeapTupleData tuple; + vacrelstats->offset = offnum; I'm not sure, but maybe you'd also want to do the same in more places: @@ -2024,6 +2030,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) @@ -2790,6 +2797,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats) -- Justin
Thanks Justin, Sawada and Michael for reviewing. On Mon, 27 Jul 2020 at 16:43, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote: > > Hi hackers, > > We discussed in another email thread[1], that it will be helpful if we can > > display offset along with block number in vacuum error. Here, proposing a > > patch to add offset along with block number in vacuum errors. > > Thanks. I happened to see both threads, only by chance. > > I'd already started writing the same as your 0001, which is essentially the > same as yours. > > Here: > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, > BlockNumber tblk; > OffsetNumber toff; > ItemId itemid; > + LVSavedErrInfo loc_saved_err_info; > > tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); > if (tblk != blkno) > break; /* past end of tuples for this block */ > toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); > + > + /* Update error traceback information */ > + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, > + blkno, toff); > itemid = PageGetItemId(page, toff); > ItemIdSetUnused(itemid); > unused[uncnt++] = toff; > + > + /* Revert to the previous phase information for error traceback */ > + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); > } > > I'm not sure why you use restore_vacuum_error_info() at all. It's already > called at the end of lazy_vacuum_page() (and others) to allow functions to > clean up after their own state changes, rather than requiring callers to do it. > I don't think you should use it in a loop, nor introduce another > LVSavedErrInfo. > > Since phase and blkno are already set, I think you should just set > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info(). > Similar to whats done in lazy_vacuum_heap(): > > tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]); > vacrelstats->blkno = tblk; Fixed. > > I think you should also do: > > @@ -2976,6 +2984,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf, > ItemId itemid; > HeapTupleData tuple; > > + vacrelstats->offset = offnum; Agreed and fixed. > > I'm not sure, but maybe you'd also want to do the same in more places: > > @@ -2024,6 +2030,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) Fixed. > @@ -2790,6 +2797,7 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats) I checked the code and I think there is no need to do similar changes in count_nondeletable_pages. I will look again and will verify again. Apart from these, I fixed comments given by Sawada and Michael in the latest patch. Attaching v2 patch for review. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > Apart from these, I fixed comments given by Sawada and Michael in the > latest patch. Attaching v2 patch for review. Thanks. lazy_check_needs_freeze iterates over blocks and this patch changes it to update vacrelstats. I think it should do what lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at its beginning (even though only the offset is changed), and then restore_vacuum_error_info() at its end (to "revert back" to the item number it started with). The same is true of heap_page_is_all_visible(), except it's only called by lazy_vacuum_page(), which already calls restore_vacuum_error_info() a few lines later. As for the message: + if (OffsetNumberIsValid(errinfo->offnum)) + errcontext("while scanning block %u and offset %u of relation \"%s.%s\"", + errinfo->blkno, errinfo->offnum, errinfo->relnamespace, errinfo->relname); + else + errcontext("while scanning block %u of relation \"%s.%s\"", + errinfo->blkno, errinfo->relnamespace, errinfo->relname); I think that may be confusing. A DBA should know what a "block" is, but "offset" sounds like a byte offset, rather than an item number. Here's what I'd written. Maybe it should say "offset number". + errcontext("while vacuuming block %u of relation \"%s.%s\", item offset %u", -- Justin
Bcc: Subject: Re: display offset along with block number in vacuum errors Reply-To: In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com> On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > > Here: > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, > > BlockNumber tblk; > > OffsetNumber toff; > > ItemId itemid; > > + LVSavedErrInfo loc_saved_err_info; > > > > tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); > > if (tblk != blkno) > > break; /* past end of tuples for this block */ > > toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); > > + > > + /* Update error traceback information */ > > + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, > > + blkno, toff); > > itemid = PageGetItemId(page, toff); > > ItemIdSetUnused(itemid); > > unused[uncnt++] = toff; > > + > > + /* Revert to the previous phase information for error traceback */ > > + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); > > } > > > > I'm not sure why you use restore_vacuum_error_info() at all. It's already > > called at the end of lazy_vacuum_page() (and others) to allow functions to > > clean up after their own state changes, rather than requiring callers to do it. > > I don't think you should use it in a loop, nor introduce another > > LVSavedErrInfo. > > > > Since phase and blkno are already set, I think you should just set > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info(). > > Similar to whats done in lazy_vacuum_heap(): > > > > tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]); > > vacrelstats->blkno = tblk; > > Fixed. I rearead this thread and I think the earlier suggestion from Masahiko was right. The loop around dead_tuples only does ItemIdSetUnused() which updates the page, which has already been read from disk. On my suggestion, your v2 patch sets offnum directly, but now I think it's not useful to set at all, since the whole page is manipulated by PageRepairFragmentation() and log_heap_clean(). An error there would misleadingly say "..at offset number MM", but would always show the page's last offset, and not the offset where an error occured. I added this at: https://commitfest.postgresql.org/29/2662/ If anyone is considering this patch for v13, I guess it should be completed by next week. -- Justin
On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote: > Bcc: > Subject: Re: display offset along with block number in vacuum errors > Reply-To: > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com> whoops > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > > > Here: > > > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, > > > BlockNumber tblk; > > > OffsetNumber toff; > > > ItemId itemid; > > > + LVSavedErrInfo loc_saved_err_info; > > > > > > tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); > > > if (tblk != blkno) > > > break; /* past end of tuples for this block */ > > > toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); > > > + > > > + /* Update error traceback information */ > > > + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, > > > + blkno, toff); > > > itemid = PageGetItemId(page, toff); > > > ItemIdSetUnused(itemid); > > > unused[uncnt++] = toff; > > > + > > > + /* Revert to the previous phase information for error traceback */ > > > + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); > > > } > > > > > > I'm not sure why you use restore_vacuum_error_info() at all. It's already > > > called at the end of lazy_vacuum_page() (and others) to allow functions to > > > clean up after their own state changes, rather than requiring callers to do it. > > > I don't think you should use it in a loop, nor introduce another > > > LVSavedErrInfo. > > > > > > Since phase and blkno are already set, I think you should just set > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info(). > > > Similar to whats done in lazy_vacuum_heap(): > > > > > > tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]); > > > vacrelstats->blkno = tblk; > > > > Fixed. > > I rearead this thread and I think the earlier suggestion from Masahiko was > right. The loop around dead_tuples only does ItemIdSetUnused() which updates > the page, which has already been read from disk. On my suggestion, your v2 > patch sets offnum directly, but now I think it's not useful to set at all, > since the whole page is manipulated by PageRepairFragmentation() and > log_heap_clean(). An error there would misleadingly say "..at offset number > MM", but would always show the page's last offset, and not the offset where an > error occured. This makes me question whether offset numbers are ever useful during VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by internal functions that don't update vacrelstats->offno. Note that my initial problem report that led to the errcontext implementation was an ERROR in heap *scan* (not vacuum). So an offset number at that point would've been sufficient. https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com I mentioned that lazy_check_needs_freeze() should save and restore the errinfo, so an error in heap_page_prune() (for example) doesn't get the wrong offset associated with it. So see the attached changes on top of your v2 patch. postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT INTO tt SELECT generate_series(1,99999); VACUUMtt; UPDATE tt SET a=1 WHERE ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int + (1<<30))::int::text::xidWHERE oid='tt'::regclass; VACUUM tt; ERROR: found xmin 1961 from before relfrozenxid 1073743785 CONTEXT: while scanning block 345 of relation "public.tt", item offset 205 Hmm.. is it confusing that the block number is 0-indexed but the offset is 1-indexed ? -- Justin
Attachment
Thanks Justin. On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote: > > Bcc: > > Subject: Re: display offset along with block number in vacuum errors > > Reply-To: > > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com> > > whoops > > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > > > > Here: > > > > > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, > > > > BlockNumber tblk; > > > > OffsetNumber toff; > > > > ItemId itemid; > > > > + LVSavedErrInfo loc_saved_err_info; > > > > > > > > tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); > > > > if (tblk != blkno) > > > > break; /* past end of tuples for this block */ > > > > toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); > > > > + > > > > + /* Update error traceback information */ > > > > + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, > > > > + blkno, toff); > > > > itemid = PageGetItemId(page, toff); > > > > ItemIdSetUnused(itemid); > > > > unused[uncnt++] = toff; > > > > + > > > > + /* Revert to the previous phase information for error traceback */ > > > > + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); > > > > } > > > > > > > > I'm not sure why you use restore_vacuum_error_info() at all. It's already > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to > > > > clean up after their own state changes, rather than requiring callers to do it. > > > > I don't think you should use it in a loop, nor introduce another > > > > LVSavedErrInfo. > > > > > > > > Since phase and blkno are already set, I think you should just set > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info(). > > > > Similar to whats done in lazy_vacuum_heap(): > > > > > > > > tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]); > > > > vacrelstats->blkno = tblk; > > > > > > Fixed. > > > > I rearead this thread and I think the earlier suggestion from Masahiko was > > right. The loop around dead_tuples only does ItemIdSetUnused() which updates > > the page, which has already been read from disk. On my suggestion, your v2 > > patch sets offnum directly, but now I think it's not useful to set at all, > > since the whole page is manipulated by PageRepairFragmentation() and > > log_heap_clean(). An error there would misleadingly say "..at offset number > > MM", but would always show the page's last offset, and not the offset where an > > error occured. > > This makes me question whether offset numbers are ever useful during > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by > internal functions that don't update vacrelstats->offno. Note that my initial > problem report that led to the errcontext implementation was an ERROR in heap > *scan* (not vacuum). So an offset number at that point would've been > sufficient. > https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com > > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo, > so an error in heap_page_prune() (for example) doesn't get the wrong offset > associated with it. > > So see the attached changes on top of your v2 patch. Actually I was waiting for review comments from committer and other people also and was planning to send a patch after that. I already fixed your comments in my offline patch and was waiting for more comments. Anyway, thanks for delta patch. Here, attaching v3 patch for review. > > postgres=# DROP TABLE tt; CREATE TABLE tt(a int) WITH (fillfactor=90); INSERT INTO tt SELECT generate_series(1,99999);VACUUM tt; UPDATE tt SET a=1 WHERE ctid='(345,10)'; UPDATE pg_class SET relfrozenxid=(relfrozenxid::text::int+ (1<<30))::int::text::xid WHERE oid='tt'::regclass; VACUUM tt; > ERROR: found xmin 1961 from before relfrozenxid 1073743785 > CONTEXT: while scanning block 345 of relation "public.tt", item offset 205 > > Hmm.. is it confusing that the block number is 0-indexed but the offset is > 1-indexed ? > > -- > Justin -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.c
Attachment
On Sat, Aug 01, 2020 at 12:31:53PM +0530, Mahendra Singh Thalor wrote: > Actually I was waiting for review comments from committer and other > people also and was planning to send a patch after that. I already > fixed your comments in my offline patch and was waiting for more > comments. Anyway, thanks for delta patch. > > Here, attaching v3 patch for review. I wasn't being impatient but I spent enough time thinking about this that it made sense to put it in patch form. Your patch has a couple extaneous changes: case VACUUM_ERRCB_PHASE_VACUUM_HEAP: if (BlockNumberIsValid(errinfo->blkno)) + { errcontext("while vacuuming block %u of relation \"%s.%s\"", errinfo->blkno, errinfo->relnamespace, errinfo->relname); + } break; case VACUUM_ERRCB_PHASE_VACUUM_INDEX: @@ -3589,6 +3618,7 @@ vacuum_error_callback(void *arg) errinfo->indname, errinfo->relnamespace, errinfo->relname); break; + case VACUUM_ERRCB_PHASE_INDEX_CLEANUP: errcontext("while cleaning up index \"%s\" of relation \"%s.%s\"", I would get rid of these by doing like: git reset -p HEAD~1 (say "n" to most hunks and "y" to reset just the two, above), then git commit --amend (without -a and without pathnames), then git diff will show local changes (including those no-longer-committed hunks), which you can git checkout -p (or similar). I'd be interested to hear if there's a better way. -- Justin
On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > Thanks Justin. > > On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote: > > > Bcc: > > > Subject: Re: display offset along with block number in vacuum errors > > > Reply-To: > > > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com> > > > > whoops > > > > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > > > > > Here: > > > > > > > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, > > > > > BlockNumber tblk; > > > > > OffsetNumber toff; > > > > > ItemId itemid; > > > > > + LVSavedErrInfo loc_saved_err_info; > > > > > > > > > > tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); > > > > > if (tblk != blkno) > > > > > break; /* past end of tuples for this block */ > > > > > toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); > > > > > + > > > > > + /* Update error traceback information */ > > > > > + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, > > > > > + blkno, toff); > > > > > itemid = PageGetItemId(page, toff); > > > > > ItemIdSetUnused(itemid); > > > > > unused[uncnt++] = toff; > > > > > + > > > > > + /* Revert to the previous phase information for error traceback */ > > > > > + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); > > > > > } > > > > > > > > > > I'm not sure why you use restore_vacuum_error_info() at all. It's already > > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to > > > > > clean up after their own state changes, rather than requiring callers to do it. > > > > > I don't think you should use it in a loop, nor introduce another > > > > > LVSavedErrInfo. > > > > > > > > > > Since phase and blkno are already set, I think you should just set > > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info(). > > > > > Similar to whats done in lazy_vacuum_heap(): > > > > > > > > > > tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]); > > > > > vacrelstats->blkno = tblk; > > > > > > > > Fixed. > > > > > > I rearead this thread and I think the earlier suggestion from Masahiko was > > > right. The loop around dead_tuples only does ItemIdSetUnused() which updates > > > the page, which has already been read from disk. On my suggestion, your v2 > > > patch sets offnum directly, but now I think it's not useful to set at all, > > > since the whole page is manipulated by PageRepairFragmentation() and > > > log_heap_clean(). An error there would misleadingly say "..at offset number > > > MM", but would always show the page's last offset, and not the offset where an > > > error occured. > > > > This makes me question whether offset numbers are ever useful during > > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by > > internal functions that don't update vacrelstats->offno. Note that my initial > > problem report that led to the errcontext implementation was an ERROR in heap > > *scan* (not vacuum). So an offset number at that point would've been > > sufficient. > > https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com > > > > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo, > > so an error in heap_page_prune() (for example) doesn't get the wrong offset > > associated with it. > > > > So see the attached changes on top of your v2 patch. > > Actually I was waiting for review comments from committer and other > people also and was planning to send a patch after that. I already > fixed your comments in my offline patch and was waiting for more > comments. Anyway, thanks for delta patch. > > Here, attaching v3 patch for review. Thank you for updating the patch! Here are my comments on v3 patch: @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) if (PageIsNew(page) || PageIsEmpty(page)) return false; + /* Update error traceback information */ + update_vacuum_error_info(vacrelstats, &saved_err_info, + VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno, + InvalidOffsetNumber); + maxoff = PageGetMaxOffsetNumber(page); for (offnum = FirstOffsetNumber; offnum <= maxoff; You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP but I think we're already in that phase. I'm okay with explicitly setting it but on the other hand, we don't set the phase in heap_page_is_all_visible(). Is there any reason for that? Also, since we don't reset vacrelstats->offnum at the end of heap_page_is_all_visible(), if an error occurs by the end of lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we report the error context with the last offset number in the page, making the users confused. --- @@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) if (heap_tuple_needs_freeze(tupleheader, FreezeLimit, MultiXactCutoff, buf)) - return true; + break; } /* scan along page */ - return false; + /* Revert to the previous phase information for error traceback */ + restore_vacuum_error_info(vacrelstats, &saved_err_info); + + return offnum <= maxoff ? true : false; } I think we can write just "return (offnum <= maxoff)". --- - /* Revert back to the old phase information for error traceback */ + /* Revert to the old phase information for error traceback */ If we want to modify this comment how about the following phrase for consistency with other places? /* Revert to the previous phase information for error traceback */ Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, 2 Aug 2020 at 13:24, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Sun, Aug 02, 2020 at 01:02:42PM +0900, Masahiko Sawada wrote: > > Thank you for updating the patch! > > > > Here are my comments on v3 patch: > > > > @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) > > if (PageIsNew(page) || PageIsEmpty(page)) > > return false; > > > > + /* Update error traceback information */ > > + update_vacuum_error_info(vacrelstats, &saved_err_info, > > + VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno, > > + InvalidOffsetNumber); > > + > > maxoff = PageGetMaxOffsetNumber(page); > > for (offnum = FirstOffsetNumber; > > offnum <= maxoff; > > > > You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP > > but I think we're already in that phase. I'm okay with explicitly > > setting it but on the other hand, we don't set the phase in > > heap_page_is_all_visible(). Is there any reason for that? > > That part was my suggestion, so I can answer that. I added > update_vacuum_error_info() to lazy_check_needs_freeze() to allow it to later > call restore_vacuum_error_info(). > > > Also, since we don't reset vacrelstats->offnum at the end of > > heap_page_is_all_visible(), if an error occurs by the end of > > lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we > > report the error context with the last offset number in the page, > > making the users confused. > > So it looks like heap_page_is_all_visible() should also call the update and > restore functions. > > Do you agree with my suggestion that the VACUUM phase should never try to > report an offset ? Yeah, given the current heap vacuum implementation, I agree that setting the offset number during VACUUM_HEAP phase doesn't help anything. But setting the offset number during checking tuples' visibility in heap_page_is_all_visible() might be useful, although it might be unlikely to find a problem in heap_page_is_all_visible() as the tuple visibility checking is already done in lazy_scan_heap(). I wonder if we can set SCAN_HEAP phase and update the offset number in heap_page_is_all_visible(). > How do you think we can phrase the message to avoid confusion due to 0-based > block number and 1-based offset ? I think that since the user who uses this errcontext information is likely to know more or less the internal of PostgreSQL I think 0-based block number and 1-based offset will not be a problem. However, I expected these are documented but couldn't find. If not yet, I think it’s a good time to document that. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks Sawada and Justin. On Sun, 2 Aug 2020 at 09:33, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sat, 1 Aug 2020 at 16:02, Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > > > Thanks Justin. > > > > On Sat, 1 Aug 2020 at 11:47, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > On Fri, Jul 31, 2020 at 04:55:14PM -0500, Justin Pryzby wrote: > > > > Bcc: > > > > Subject: Re: display offset along with block number in vacuum errors > > > > Reply-To: > > > > In-Reply-To: <CAKYtNApLJjAaRw0UEBBY6G1o0LRZKS7rA5n46BFh+NfwSOycdg@mail.gmail.com> > > > > > > whoops > > > > > > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > > > > > > Here: > > > > > > > > > > > > @@ -1924,14 +1932,22 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer, > > > > > > BlockNumber tblk; > > > > > > OffsetNumber toff; > > > > > > ItemId itemid; > > > > > > + LVSavedErrInfo loc_saved_err_info; > > > > > > > > > > > > tblk = ItemPointerGetBlockNumber(&dead_tuples->itemptrs[tupindex]); > > > > > > if (tblk != blkno) > > > > > > break; /* past end of tuples for this block */ > > > > > > toff = ItemPointerGetOffsetNumber(&dead_tuples->itemptrs[tupindex]); > > > > > > + > > > > > > + /* Update error traceback information */ > > > > > > + update_vacuum_error_info(vacrelstats, &loc_saved_err_info, VACUUM_ERRCB_PHASE_VACUUM_HEAP, > > > > > > + blkno, toff); > > > > > > itemid = PageGetItemId(page, toff); > > > > > > ItemIdSetUnused(itemid); > > > > > > unused[uncnt++] = toff; > > > > > > + > > > > > > + /* Revert to the previous phase information for error traceback */ > > > > > > + restore_vacuum_error_info(vacrelstats, &loc_saved_err_info); > > > > > > } > > > > > > > > > > > > I'm not sure why you use restore_vacuum_error_info() at all. It's already > > > > > > called at the end of lazy_vacuum_page() (and others) to allow functions to > > > > > > clean up after their own state changes, rather than requiring callers to do it. > > > > > > I don't think you should use it in a loop, nor introduce another > > > > > > LVSavedErrInfo. > > > > > > > > > > > > Since phase and blkno are already set, I think you should just set > > > > > > vacrelstats->offnum = toff, rather than calling update_vacuum_error_info(). > > > > > > Similar to whats done in lazy_vacuum_heap(): > > > > > > > > > > > > tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples->itemptrs[tupindex]); > > > > > > vacrelstats->blkno = tblk; > > > > > > > > > > Fixed. > > > > > > > > I rearead this thread and I think the earlier suggestion from Masahiko was > > > > right. The loop around dead_tuples only does ItemIdSetUnused() which updates > > > > the page, which has already been read from disk. On my suggestion, your v2 > > > > patch sets offnum directly, but now I think it's not useful to set at all, > > > > since the whole page is manipulated by PageRepairFragmentation() and > > > > log_heap_clean(). An error there would misleadingly say "..at offset number > > > > MM", but would always show the page's last offset, and not the offset where an > > > > error occured. > > > > > > This makes me question whether offset numbers are ever useful during > > > VACUUM_HEAP, since the real work is done a page at a time (not tuple) or by > > > internal functions that don't update vacrelstats->offno. Note that my initial > > > problem report that led to the errcontext implementation was an ERROR in heap > > > *scan* (not vacuum). So an offset number at that point would've been > > > sufficient. > > > https://www.postgresql.org/message-id/20190808012436.GG11185@telsasoft.com > > > > > > I mentioned that lazy_check_needs_freeze() should save and restore the errinfo, > > > so an error in heap_page_prune() (for example) doesn't get the wrong offset > > > associated with it. > > > > > > So see the attached changes on top of your v2 patch. > > > > Actually I was waiting for review comments from committer and other > > people also and was planning to send a patch after that. I already > > fixed your comments in my offline patch and was waiting for more > > comments. Anyway, thanks for delta patch. > > > > Here, attaching v3 patch for review. > > Thank you for updating the patch! > > Here are my comments on v3 patch: > > @@ -2024,6 +2033,11 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) > if (PageIsNew(page) || PageIsEmpty(page)) > return false; > > + /* Update error traceback information */ > + update_vacuum_error_info(vacrelstats, &saved_err_info, > + VACUUM_ERRCB_PHASE_SCAN_HEAP, vacrelstats->blkno, > + InvalidOffsetNumber); > + > maxoff = PageGetMaxOffsetNumber(page); > for (offnum = FirstOffsetNumber; > offnum <= maxoff; > > You update the error callback phase to VACUUM_ERRCB_PHASE_SCAN_HEAP > but I think we're already in that phase. I'm okay with explicitly > setting it but on the other hand, we don't set the phase in > heap_page_is_all_visible(). Is there any reason for that? > > Also, since we don't reset vacrelstats->offnum at the end of > heap_page_is_all_visible(), if an error occurs by the end of > lazy_vacuum_page(), the caller of heap_page_is_all_visible(), we > report the error context with the last offset number in the page, > making the users confused. Your point is valid. Added update and restore functions in heap_page_is_all_visible in the latest patch. > > --- > @@ -2045,10 +2060,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup) > > if (heap_tuple_needs_freeze(tupleheader, FreezeLimit, > MultiXactCutoff, buf)) > - return true; > + break; > } /* scan along page */ > > - return false; > + /* Revert to the previous phase information for error traceback */ > + restore_vacuum_error_info(vacrelstats, &saved_err_info); > + > + return offnum <= maxoff ? true : false; > } > > I think we can write just "return (offnum <= maxoff)". Fixed this. > > --- > - /* Revert back to the old phase information for error traceback */ > + /* Revert to the old phase information for error traceback */ > > If we want to modify this comment how about the following phrase for > consistency with other places? Fixed this. > > /* Revert to the previous phase information for error traceback */ > > Regards, > > -- > Masahiko Sawada http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services Apart from these, I fixed Justin's comment of extra brackets(That was due to "patch -p 1 < file", as 002_fix was not applying directly). I haven't updated the document for this(Sawada's comment). I will try in the next patch. Attaching v4 patch for review. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
Attachment
On Sun, Aug 2, 2020 at 10:43 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > I think that since the user who uses this errcontext information is > likely to know more or less the internal of PostgreSQL I think 0-based > block number and 1-based offset will not be a problem. However, I > expected these are documented but couldn't find. If not yet, I think > it’s a good time to document that. I agree. That's just how TIDs are. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > > Apart from these, I fixed comments given by Sawada and Michael in the > > latest patch. Attaching v2 patch for review. > > Thanks. > > lazy_check_needs_freeze iterates over blocks and this patch changes it to > update vacrelstats. I think it should do what > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at > its beginning (even though only the offset is changed), and then > restore_vacuum_error_info() at its end (to "revert back" to the item number it > started with). > I see that Mahendra has changed patch as per this suggestion but I am not convinced that it is a good idea to sprinkle update_vacuum_error_info()/restore_vacuum_error_info() at places more than required. I see that it might look a bit clean from the perspective that if tomorrow we use the function lazy_check_needs_freeze() for a different purpose then we don't need to worry about the wrong phase information. If we are worried about that then we should have an assert in that function to ensure that the current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP. -- With Regards, Amit Kapila.
On Wed, Aug 5, 2020 at 12:47 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > Apart from these, I fixed Justin's comment of extra brackets(That was > due to "patch -p 1 < file", as 002_fix was not applying directly). I > haven't updated the document for this(Sawada's comment). I will try in > the next patch. > Attaching v4 patch for review. > Few comments on the latest patch: 1. @@ -2640,6 +2659,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats) */ new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); vacrelstats->blkno = new_rel_pages; + vacrelstats->offnum = InvalidOffsetNumber; Do we really need to update the 'vacrelstats->offnum' here when we have already set it to InvalidOffsetNumber in the caller? 2. @@ -3574,8 +3605,14 @@ vacuum_error_callback(void *arg) { case VACUUM_ERRCB_PHASE_SCAN_HEAP: if (BlockNumberIsValid(errinfo->blkno)) - errcontext("while scanning block %u of relation \"%s.%s\"", - errinfo->blkno, errinfo->relnamespace, errinfo->relname); + { + if (OffsetNumberIsValid(errinfo->offnum)) + errcontext("while scanning block %u of relation \"%s.%s\", item offset %u", + I am not completely sure if this error message is an improvement over what you have in the initial version of patch "while scanning block %u and offset %u of relation \"%s.%s\"",...". I see that Justin has raised a concern above that whether users will be aware of 'offset' but I also see that we have used it in a few other messages in the code. For example: PageIndexTupleDeleteNoCompact() { .. nline = PageGetMaxOffsetNumber(page); if ((int) offnum <= 0 || (int) offnum > nline) elog(ERROR, "invalid index offnum: %u", offnum); .. } hash_desc { .. case XLOG_HASH_INSERT: { xl_hash_insert *xlrec = (xl_hash_insert *) rec; appendStringInfo(buf, "off %u", xlrec->offnum); break; } Similarly in other desc functions, we have used off or offnum. I find the message in your initial patch better. -- With Regards, Amit Kapila.
On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote: > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Wed, Jul 29, 2020 at 12:35:17AM +0530, Mahendra Singh Thalor wrote: > > > Apart from these, I fixed comments given by Sawada and Michael in the > > > latest patch. Attaching v2 patch for review. > > > > Thanks. > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to > > update vacrelstats. I think it should do what > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at > > its beginning (even though only the offset is changed), and then > > restore_vacuum_error_info() at its end (to "revert back" to the item number it > > started with). > > > > I see that Mahendra has changed patch as per this suggestion but I am > not convinced that it is a good idea to sprinkle > update_vacuum_error_info()/restore_vacuum_error_info() at places more > than required. I see that it might look a bit clean from the > perspective that if tomorrow we use the function > lazy_check_needs_freeze() for a different purpose then we don't need > to worry about the wrong phase information. If we are worried about > that then we should have an assert in that function to ensure that the > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP. The motivation was to restore the offnum, which is set to Invalid at the start of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but should be restored or re-set to Invalid when returns to lazy_scan_heap(). If you think it's important, we could just set vacrelstats->offnum = Invalid before returning, but that's what the restore function was built for. We do direct assignment in 2 places to avoid a function call within a loop. lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats, Relation *Irel, int nindexes, bool aggressive) ... for (blkno = 0; blkno < nblocks; blkno++) { ... update_vacuum_error_info(vacrelstats, NULL, VACUUM_ERRCB_PHASE_SCAN_HEAP, blkno, InvalidOffsetNumber); if (!ConditionalLockBufferForCleanup(buf)) { ... if (!lazy_check_needs_freeze(buf, &hastup, vacrelstats)) { ... for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum)) -- Justin
On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote: > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to > > > update vacrelstats. I think it should do what > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at > > > its beginning (even though only the offset is changed), and then > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it > > > started with). > > > > > > > I see that Mahendra has changed patch as per this suggestion but I am > > not convinced that it is a good idea to sprinkle > > update_vacuum_error_info()/restore_vacuum_error_info() at places more > > than required. I see that it might look a bit clean from the > > perspective that if tomorrow we use the function > > lazy_check_needs_freeze() for a different purpose then we don't need > > to worry about the wrong phase information. If we are worried about > > that then we should have an assert in that function to ensure that the > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP. > > The motivation was to restore the offnum, which is set to Invalid at the start > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but > should be restored or re-set to Invalid when returns to lazy_scan_heap(). If > you think it's important, we could just set vacrelstats->offnum = Invalid > before returning, > Yeah, I would prefer that and probably a comment to indicate why we are doing that. > but that's what the restore function was built for. > I think it would be better to call restore wherever we call update. I see your point that there is some value doing it via update/restore but I think we should try to avoid that at many places unless it is required and we already update blockno information without update/restore at few places. -- With Regards, Amit Kapila.
On Fri, 7 Aug 2020 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote: > > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to > > > > update vacrelstats. I think it should do what > > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at > > > > its beginning (even though only the offset is changed), and then > > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it > > > > started with). > > > > > > > > > > I see that Mahendra has changed patch as per this suggestion but I am > > > not convinced that it is a good idea to sprinkle > > > update_vacuum_error_info()/restore_vacuum_error_info() at places more > > > than required. I see that it might look a bit clean from the > > > perspective that if tomorrow we use the function > > > lazy_check_needs_freeze() for a different purpose then we don't need > > > to worry about the wrong phase information. If we are worried about > > > that then we should have an assert in that function to ensure that the > > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP. > > > > The motivation was to restore the offnum, which is set to Invalid at the start > > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but > > should be restored or re-set to Invalid when returns to lazy_scan_heap(). If > > you think it's important, we could just set vacrelstats->offnum = Invalid > > before returning, > > > > Yeah, I would prefer that and probably a comment to indicate why we > are doing that. +1 I'm concerned that we call the update and restore in heap_page_is_all_visible(). Unlike lazy_check_needs_freeze(), this function is called for every vacuumed page. I commented that if we want to update the offset number during iterating tuples in the function we should change the phase to SCAN_HEAP at the beginning of the function because it's actually not vacuuming. But if the error is unlikely to happen within the function I think we can avoid updating the offset number and phase to avoid performance overhead. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Aug 7, 2020 at 8:10 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Fri, 7 Aug 2020 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote: > > > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > > > > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to > > > > > update vacrelstats. I think it should do what > > > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at > > > > > its beginning (even though only the offset is changed), and then > > > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it > > > > > started with). > > > > > > > > > > > > > I see that Mahendra has changed patch as per this suggestion but I am > > > > not convinced that it is a good idea to sprinkle > > > > update_vacuum_error_info()/restore_vacuum_error_info() at places more > > > > than required. I see that it might look a bit clean from the > > > > perspective that if tomorrow we use the function > > > > lazy_check_needs_freeze() for a different purpose then we don't need > > > > to worry about the wrong phase information. If we are worried about > > > > that then we should have an assert in that function to ensure that the > > > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP. > > > > > > The motivation was to restore the offnum, which is set to Invalid at the start > > > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but > > > should be restored or re-set to Invalid when returns to lazy_scan_heap(). If > > > you think it's important, we could just set vacrelstats->offnum = Invalid > > > before returning, > > > > > > > Yeah, I would prefer that and probably a comment to indicate why we > > are doing that. > > +1 > > I'm concerned that we call the update and restore in > heap_page_is_all_visible(). Unlike lazy_check_needs_freeze(), this > function is called for every vacuumed page. I commented that if we > want to update the offset number during iterating tuples in the > function we should change the phase to SCAN_HEAP at the beginning of > the function because it's actually not vacuuming. > AFAICS, heap_page_is_all_visible() is called from only one place and that is lazy_vacuum_page(), so I think if there is any error in heap_page_is_all_visible(), it should be considered as VACUUM_HEAP phase error. But if the error is > unlikely to happen within the function I think we can avoid updating > the offset number and phase to avoid performance overhead. > I am not sure we can guarantee that and even if it is true today one can add an error in that path in future. But I feel we can keep the phase as VACUUM_HEAP. -- With Regards, Amit Kapila.
On Fri, 7 Aug 2020 at 13:04, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Aug 7, 2020 at 8:10 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Fri, 7 Aug 2020 at 10:49, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote: > > > > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > > > > > > > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to > > > > > > update vacrelstats. I think it should do what > > > > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at > > > > > > its beginning (even though only the offset is changed), and then > > > > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it > > > > > > started with). > > > > > > > > > > > > > > > > I see that Mahendra has changed patch as per this suggestion but I am > > > > > not convinced that it is a good idea to sprinkle > > > > > update_vacuum_error_info()/restore_vacuum_error_info() at places more > > > > > than required. I see that it might look a bit clean from the > > > > > perspective that if tomorrow we use the function > > > > > lazy_check_needs_freeze() for a different purpose then we don't need > > > > > to worry about the wrong phase information. If we are worried about > > > > > that then we should have an assert in that function to ensure that the > > > > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP. > > > > > > > > The motivation was to restore the offnum, which is set to Invalid at the start > > > > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but > > > > should be restored or re-set to Invalid when returns to lazy_scan_heap(). If > > > > you think it's important, we could just set vacrelstats->offnum = Invalid > > > > before returning, > > > > > > > > > > Yeah, I would prefer that and probably a comment to indicate why we > > > are doing that. > > > > +1 > > > > I'm concerned that we call the update and restore in > > heap_page_is_all_visible(). Unlike lazy_check_needs_freeze(), this > > function is called for every vacuumed page. I commented that if we > > want to update the offset number during iterating tuples in the > > function we should change the phase to SCAN_HEAP at the beginning of > > the function because it's actually not vacuuming. > > > > AFAICS, heap_page_is_all_visible() is called from only one place and > that is lazy_vacuum_page(), so I think if there is any error in > heap_page_is_all_visible(), it should be considered as VACUUM_HEAP > phase error. It's true that heap_page_is_all_visible() is called from only lazy_vacuum_page() but I'm concerned it would lead misleading since it's not actually removing tuples but just checking after vacuum. I guess that the errcontext should show what the process is actually doing and therefore help the investigation, so I thought VACUUM_HEAP might not be appropriate for this case. But I see also your point. Other vacuum error context phases match with vacuum progress information phrases. So in heap_page_is_all_visible (), I agree it's better to update the offset number and keep the phase VACUUM_HEAP rather than do nothing. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 10, 2020 at 10:24 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > It's true that heap_page_is_all_visible() is called from only > lazy_vacuum_page() but I'm concerned it would lead misleading since > it's not actually removing tuples but just checking after vacuum. I > guess that the errcontext should show what the process is actually > doing and therefore help the investigation, so I thought VACUUM_HEAP > might not be appropriate for this case. But I see also your point. > Other vacuum error context phases match with vacuum progress > information phrases. So in heap_page_is_all_visible (), I agree it's > better to update the offset number and keep the phase VACUUM_HEAP > rather than do nothing. > Okay, I have changed accordingly and this means that the offset will be displayed for the vacuum phase as well. Apart from this, I have fixed all the comments raised by me in the attached patch. One thing we need to think is do we want to set offset during heap_page_prune when called from lazy_scan_heap? I think the code path for heap_prune_chain is quite deep, so an error can occur in that path. What do you think? -- With Regards, Amit Kapila.
Attachment
On Fri, Aug 7, 2020 at 7:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 6, 2020 at 7:51 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Thu, Aug 06, 2020 at 07:39:21PM +0530, Amit Kapila wrote: > > > On Wed, Jul 29, 2020 at 1:09 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > > > > lazy_check_needs_freeze iterates over blocks and this patch changes it to > > > > update vacrelstats. I think it should do what > > > > lazy_{vacuum/cleanup}_heap/page/index do and call update_vacuum_error_info() at > > > > its beginning (even though only the offset is changed), and then > > > > restore_vacuum_error_info() at its end (to "revert back" to the item number it > > > > started with). > > > > > > > > > > I see that Mahendra has changed patch as per this suggestion but I am > > > not convinced that it is a good idea to sprinkle > > > update_vacuum_error_info()/restore_vacuum_error_info() at places more > > > than required. I see that it might look a bit clean from the > > > perspective that if tomorrow we use the function > > > lazy_check_needs_freeze() for a different purpose then we don't need > > > to worry about the wrong phase information. If we are worried about > > > that then we should have an assert in that function to ensure that the > > > current phase is VACUUM_ERRCB_PHASE_SCAN_HEAP. > > > > The motivation was to restore the offnum, which is set to Invalid at the start > > of lazy_scan_heap(), and then set valid within lazy_check_needs_freeze, but > > should be restored or re-set to Invalid when returns to lazy_scan_heap(). If > > you think it's important, we could just set vacrelstats->offnum = Invalid > > before returning, > > > > Yeah, I would prefer that and probably a comment to indicate why we > are doing that. > Changed accordingly in the updated patch. -- With Regards, Amit Kapila.
On Thu, Aug 6, 2020 at 7:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Aug 5, 2020 at 12:47 AM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > Apart from these, I fixed Justin's comment of extra brackets(That was > > due to "patch -p 1 < file", as 002_fix was not applying directly). I > > haven't updated the document for this(Sawada's comment). I will try in > > the next patch. > > Attaching v4 patch for review. > > > > Few comments on the latest patch: > 1. > @@ -2640,6 +2659,7 @@ lazy_truncate_heap(Relation onerel, LVRelStats > *vacrelstats) > */ > new_rel_pages = count_nondeletable_pages(onerel, vacrelstats); > vacrelstats->blkno = new_rel_pages; > + vacrelstats->offnum = InvalidOffsetNumber; > > Do we really need to update the 'vacrelstats->offnum' here when we > have already set it to InvalidOffsetNumber in the caller? > I have removed this change. > 2. > @@ -3574,8 +3605,14 @@ vacuum_error_callback(void *arg) > { > case VACUUM_ERRCB_PHASE_SCAN_HEAP: > if (BlockNumberIsValid(errinfo->blkno)) > - errcontext("while scanning block %u of relation \"%s.%s\"", > - errinfo->blkno, errinfo->relnamespace, errinfo->relname); > + { > + if (OffsetNumberIsValid(errinfo->offnum)) > + errcontext("while scanning block %u of relation \"%s.%s\", item offset %u", > + > > I am not completely sure if this error message is an improvement over > what you have in the initial version of patch "while scanning block %u > and offset %u of relation \"%s.%s\"",...". I see that Justin has > raised a concern above that whether users will be aware of 'offset' > but I also see that we have used it in a few other messages in the > code. I have changed the message to what you have in the original patch. Apart from above, I have also reset the offset number back to InvalidOffsetNumber in lazy_scan_heap after processing all the tuples, otherwise, it will erroneously display wring offset if any error occurred afterward. Let me know what you think of the changes done in the latest patch. -- With Regards, Amit Kapila.
On Fri, Aug 14, 2020 at 4:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Aug 10, 2020 at 10:24 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > It's true that heap_page_is_all_visible() is called from only > > lazy_vacuum_page() but I'm concerned it would lead misleading since > > it's not actually removing tuples but just checking after vacuum. I > > guess that the errcontext should show what the process is actually > > doing and therefore help the investigation, so I thought VACUUM_HEAP > > might not be appropriate for this case. But I see also your point. > > Other vacuum error context phases match with vacuum progress > > information phrases. So in heap_page_is_all_visible (), I agree it's > > better to update the offset number and keep the phase VACUUM_HEAP > > rather than do nothing. > > > > Okay, I have changed accordingly and this means that the offset will > be displayed for the vacuum phase as well. Apart from this, I have > fixed all the comments raised by me in the attached patch. One thing > we need to think is do we want to set offset during heap_page_prune > when called from lazy_scan_heap? I think the code path for > heap_prune_chain is quite deep, so an error can occur in that path. > What do you think? > The reason why I have not included heap_page_prune related change in the patch is that I don't want to sprinkle this in every possible function (code path) called via vacuum especially if the probability of an error in that code path is low. But, I am fine if you and or others think that it is a good idea to update offset in heap_page_prune as well. -- With Regards, Amit Kapila.
On Sat, 15 Aug 2020 at 12:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Aug 14, 2020 at 4:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Aug 10, 2020 at 10:24 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > It's true that heap_page_is_all_visible() is called from only > > > lazy_vacuum_page() but I'm concerned it would lead misleading since > > > it's not actually removing tuples but just checking after vacuum. I > > > guess that the errcontext should show what the process is actually > > > doing and therefore help the investigation, so I thought VACUUM_HEAP > > > might not be appropriate for this case. But I see also your point. > > > Other vacuum error context phases match with vacuum progress > > > information phrases. So in heap_page_is_all_visible (), I agree it's > > > better to update the offset number and keep the phase VACUUM_HEAP > > > rather than do nothing. > > > > > > > Okay, I have changed accordingly and this means that the offset will > > be displayed for the vacuum phase as well. Apart from this, I have > > fixed all the comments raised by me in the attached patch. One thing > > we need to think is do we want to set offset during heap_page_prune > > when called from lazy_scan_heap? I think the code path for > > heap_prune_chain is quite deep, so an error can occur in that path. > > What do you think? > > > > The reason why I have not included heap_page_prune related change in > the patch is that I don't want to sprinkle this in every possible > function (code path) called via vacuum especially if the probability > of an error in that code path is low. But, I am fine if you and or > others think that it is a good idea to update offset in > heap_page_prune as well. I agree to not try sprinkling it many places than necessity. Regarding heap_page_prune(), I'm concerned a bit that heap_page_prune() is typically the first function to check the tuple visibility within the vacuum code. I've sometimes observed an error with the message like "DETAIL: could not open file “pg_xact/00AB”: No such file or directory" due to a tuple header corruption. I suspect this message was emitted while checking tuple visibility in heap_page_prune(). So I guess the likelihood of an error in that code is not so low. On the other hand, if we want to update the offset number in heap_page_prune() we will need to expose some vacuum structs defined as a static including LVRelStats. But I don't think it's a good idea. The second idea I came up with is that we set another errcontext for heap_page_prune(). Since heap_page_prune() could be called also by a regular page scanning it would work fine for both cases, although there will be extra overheads for both. What do you think? Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 17, 2020 at 11:38 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sat, 15 Aug 2020 at 12:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > The reason why I have not included heap_page_prune related change in > > the patch is that I don't want to sprinkle this in every possible > > function (code path) called via vacuum especially if the probability > > of an error in that code path is low. But, I am fine if you and or > > others think that it is a good idea to update offset in > > heap_page_prune as well. > > I agree to not try sprinkling it many places than necessity. > > Regarding heap_page_prune(), I'm concerned a bit that > heap_page_prune() is typically the first function to check the tuple > visibility within the vacuum code. I've sometimes observed an error > with the message like "DETAIL: could not open file “pg_xact/00AB”: No > such file or directory" due to a tuple header corruption. I suspect > this message was emitted while checking tuple visibility in > heap_page_prune(). So I guess the likelihood of an error in that code > is not so low. > Fair point. > On the other hand, if we want to update the offset number in > heap_page_prune() we will need to expose some vacuum structs defined > as a static including LVRelStats. > I don't think we need to expose LVRelStats. We can just pass the address of vacrelstats->offset_num to achieve what we want. I have tried that and it works, see the v6-0002-additinal-error-context-information-in-heap_page_ patch attached. Do you see any problem with it? -- With Regards, Amit Kapila.
Attachment
On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Aug 17, 2020 at 11:38 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sat, 15 Aug 2020 at 12:19, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > The reason why I have not included heap_page_prune related change in > > > the patch is that I don't want to sprinkle this in every possible > > > function (code path) called via vacuum especially if the probability > > > of an error in that code path is low. But, I am fine if you and or > > > others think that it is a good idea to update offset in > > > heap_page_prune as well. > > > > I agree to not try sprinkling it many places than necessity. > > > > Regarding heap_page_prune(), I'm concerned a bit that > > heap_page_prune() is typically the first function to check the tuple > > visibility within the vacuum code. I've sometimes observed an error > > with the message like "DETAIL: could not open file “pg_xact/00AB”: No > > such file or directory" due to a tuple header corruption. I suspect > > this message was emitted while checking tuple visibility in > > heap_page_prune(). So I guess the likelihood of an error in that code > > is not so low. > > > > Fair point. > > > On the other hand, if we want to update the offset number in > > heap_page_prune() we will need to expose some vacuum structs defined > > as a static including LVRelStats. > > > > I don't think we need to expose LVRelStats. We can just pass the > address of vacrelstats->offset_num to achieve what we want. I have > tried that and it works, see the > v6-0002-additinal-error-context-information-in-heap_page_ patch > attached. Do you see any problem with it? Yes, you're right. I'm concerned a bit the number of arguments passing to heap_page_prune() might get higher when we need other values to update for errcontext, but I'm okay with the current patch. Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it be VACUUM_HEAP instead? Also, I've tested the patch with log_min_messages = 'info' and get the following sever logs: 2020-08-19 14:28:09.917 JST [72912] INFO: launched 1 parallel vacuum worker for index vacuuming (planned: 1) 2020-08-19 14:28:09.917 JST [72912] CONTEXT: while scanning block 973 of relation "public.tbl" 2020-08-19 14:28:09.959 JST [72912] INFO: scanned index "i1" to remove 109872 row versions 2020-08-19 14:28:09.959 JST [72912] DETAIL: CPU: user: 0.04 s, system: 0.00 s, elapsed: 0.04 s 2020-08-19 14:28:09.959 JST [72912] CONTEXT: while vacuuming index "i1" of relation "public.tbl" 2020-08-19 14:28:09.967 JST [72936] INFO: scanned index "i2" to remove 109872 row versions by parallel vacuum worker 2020-08-19 14:28:09.967 JST [72936] DETAIL: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.04 s 2020-08-19 14:28:09.967 JST [72936] CONTEXT: while vacuuming index "i2" of relation "public.tbl" 2020-08-19 14:28:09.967 JST [72912] INFO: scanned index "i2" to remove 109872 row versions by parallel vacuum worker 2020-08-19 14:28:09.967 JST [72912] DETAIL: CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.04 s 2020-08-19 14:28:09.967 JST [72912] CONTEXT: while vacuuming index "i2" of relation "public.tbl" parallel worker while scanning block 973 of relation "public.tbl" 2020-08-19 14:28:09.968 JST [72912] INFO: "tbl": removed 109872 row versions in 487 pages 2020-08-19 14:28:09.968 JST [72912] DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2020-08-19 14:28:09.968 JST [72912] CONTEXT: while vacuuming block 973 of relation "public.tbl" 2020-08-19 14:28:09.968 JST [72912] INFO: index "i1" now contains 110000 row versions in 578 pages 2020-08-19 14:28:09.968 JST [72912] DETAIL: 109872 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. 2020-08-19 14:28:09.968 JST [72912] CONTEXT: while scanning block 973 of relation "public.tbl" 2020-08-19 14:28:09.968 JST [72912] INFO: index "i2" now contains 110000 row versions in 578 pages 2020-08-19 14:28:09.968 JST [72912] DETAIL: 109872 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. 2020-08-19 14:28:09.968 JST [72912] CONTEXT: while scanning block 973 of relation "public.tbl" 2020-08-19 14:28:09.969 JST [72912] INFO: "tbl": found 110000 removable, 110000 nonremovable row versions in 974 out of 974 pages 2020-08-19 14:28:09.969 JST [72912] DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 519 There were 372 unused item identifiers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.06 s. 2020-08-19 14:28:09.969 JST [72912] CONTEXT: while scanning block 973 of relation "public.tbl" This is not directly related to the patch but it looks like we can improve the current errcontext settings. For instance, the message from lazy_vacuum_index(): there are two messages reporting the phases. I've attached the patch that improves the current errcontext setting, which can be applied before the patch adding offset number. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > I don't think we need to expose LVRelStats. We can just pass the > > address of vacrelstats->offset_num to achieve what we want. I have > > tried that and it works, see the > > v6-0002-additinal-error-context-information-in-heap_page_ patch > > attached. Do you see any problem with it? > > Yes, you're right. I'm concerned a bit the number of arguments passing > to heap_page_prune() might get higher when we need other values to > update for errcontext, but I'm okay with the current patch. > Yeah, we might need to think if we want to increase the number of parameters but not sure we need to worry at this stage. If required, I think we can either expose LVRelStats or extract a few parameters from it and form a separate structure that could be passed to heap_page_prune. > Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it > be VACUUM_HEAP instead? > I think it is currently similar to what we do in progress reporting. We set to VACUUM_HEAP phase where the progress reporting is also set to *HEAP_BLKS_VACUUMED. Currently, heap_page_prune() is covered under *HEAP_BLKS_SCANNED, so I don't see a pressing need to change the error context phase for heap_page_prune(). And also, we need to add some more smarts in heap_page_prune() for this which I want to avoid. > Also, I've tested the patch with log_min_messages = 'info' and get the > following sever logs: > .. > > This is not directly related to the patch but it looks like we can > improve the current errcontext settings. For instance, the message > from lazy_vacuum_index(): there are two messages reporting the phases. > I've attached the patch that improves the current errcontext setting, > which can be applied before the patch adding offset number. > After your patch, I see output like below with log_min_messages=info, 2020-08-20 10:11:46.769 IST [2640] INFO: scanned index "idx_test_c1" to remove 10000 row versions 2020-08-20 10:11:46.769 IST [2640] DETAIL: CPU: user: 0.06 s, system: 0.01 s, elapsed: 0.06 s 2020-08-20 10:11:46.769 IST [2640] CONTEXT: while vacuuming index "idx_test_c1" of relation "public.test_vac" 2020-08-20 10:11:46.901 IST [2640] INFO: scanned index "idx_test_c2" to remove 10000 row versions 2020-08-20 10:11:46.901 IST [2640] DETAIL: CPU: user: 0.10 s, system: 0.01 s, elapsed: 0.13 s 2020-08-20 10:11:46.901 IST [2640] CONTEXT: while vacuuming index "idx_test_c2" of relation "public.test_vac" 2020-08-20 10:11:46.917 IST [2640] INFO: "test_vac": removed 10000 row versions in 667 pages 2020-08-20 10:11:46.917 IST [2640] DETAIL: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s 2020-08-20 10:11:46.928 IST [2640] INFO: index "idx_test_c1" now contains 50000 row versions in 276 pages 2020-08-20 10:11:46.928 IST [2640] DETAIL: 10000 index row versions were removed. 136 index pages have been deleted, 109 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. 2020-08-20 10:11:46.928 IST [2640] CONTEXT: while cleaning up index "idx_test_c1" of relation "public.test_vac" Here, we can notice that for the index, we are getting context information but not for the heap. The reason is that in vacuum_error_callback, we are not printing additional information for phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP when block number is invalid. If we want to cover the 'info' messages then won't it be better if we print a message in those phases even block number is invalid (something like 'while scanning relation \"%s.%s\"") -- With Regards, Amit Kapila.
On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 18 Aug 2020 at 13:06, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > I don't think we need to expose LVRelStats. We can just pass the > > > address of vacrelstats->offset_num to achieve what we want. I have > > > tried that and it works, see the > > > v6-0002-additinal-error-context-information-in-heap_page_ patch > > > attached. Do you see any problem with it? > > > > Yes, you're right. I'm concerned a bit the number of arguments passing > > to heap_page_prune() might get higher when we need other values to > > update for errcontext, but I'm okay with the current patch. > > > > Yeah, we might need to think if we want to increase the number of > parameters but not sure we need to worry at this stage. If required, I > think we can either expose LVRelStats or extract a few parameters from > it and form a separate structure that could be passed to > heap_page_prune. Agreed. > > > Currently, we're in SCAN_HEAP phase in heap_page_prune() but should it > > be VACUUM_HEAP instead? > > > > I think it is currently similar to what we do in progress reporting. > We set to VACUUM_HEAP phase where the progress reporting is also set > to *HEAP_BLKS_VACUUMED. Currently, heap_page_prune() is covered under > *HEAP_BLKS_SCANNED, so I don't see a pressing need to change the error > context phase for heap_page_prune(). And also, we need to add some > more smarts in heap_page_prune() for this which I want to avoid. Agreed. > > > Also, I've tested the patch with log_min_messages = 'info' and get the > > following sever logs: > > > .. > > > > This is not directly related to the patch but it looks like we can > > improve the current errcontext settings. For instance, the message > > from lazy_vacuum_index(): there are two messages reporting the phases. > > I've attached the patch that improves the current errcontext setting, > > which can be applied before the patch adding offset number. > > > > After your patch, I see output like below with log_min_messages=info, > > 2020-08-20 10:11:46.769 IST [2640] INFO: scanned index "idx_test_c1" > to remove 10000 row versions > 2020-08-20 10:11:46.769 IST [2640] DETAIL: CPU: user: 0.06 s, system: > 0.01 s, elapsed: 0.06 s > 2020-08-20 10:11:46.769 IST [2640] CONTEXT: while vacuuming index > "idx_test_c1" of relation "public.test_vac" > > 2020-08-20 10:11:46.901 IST [2640] INFO: scanned index "idx_test_c2" > to remove 10000 row versions > 2020-08-20 10:11:46.901 IST [2640] DETAIL: CPU: user: 0.10 s, system: > 0.01 s, elapsed: 0.13 s > 2020-08-20 10:11:46.901 IST [2640] CONTEXT: while vacuuming index > "idx_test_c2" of relation "public.test_vac" > > 2020-08-20 10:11:46.917 IST [2640] INFO: "test_vac": removed 10000 > row versions in 667 pages > 2020-08-20 10:11:46.917 IST [2640] DETAIL: CPU: user: 0.01 s, system: > 0.00 s, elapsed: 0.01 s > > 2020-08-20 10:11:46.928 IST [2640] INFO: index "idx_test_c1" now > contains 50000 row versions in 276 pages > 2020-08-20 10:11:46.928 IST [2640] DETAIL: 10000 index row versions > were removed. > 136 index pages have been deleted, 109 are currently reusable. > CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > 2020-08-20 10:11:46.928 IST [2640] CONTEXT: while cleaning up index > "idx_test_c1" of relation "public.test_vac" > > Here, we can notice that for the index, we are getting context > information but not for the heap. The reason is that in > vacuum_error_callback, we are not printing additional information for > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP > when block number is invalid. If we want to cover the 'info' messages > then won't it be better if we print a message in those phases even > block number is invalid (something like 'while scanning relation > \"%s.%s\"") Yeah, there is an inconsistency. I agree to print the message even when the block number is invalid. We're not actually doing any vacuum jobs when printing the message but it would be less confusing than printing the wrong phase and more consistent. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Here, we can notice that for the index, we are getting context > > information but not for the heap. The reason is that in > > vacuum_error_callback, we are not printing additional information for > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP > > when block number is invalid. If we want to cover the 'info' messages > > then won't it be better if we print a message in those phases even > > block number is invalid (something like 'while scanning relation > > \"%s.%s\"") > > Yeah, there is an inconsistency. I agree to print the message even > when the block number is invalid. > Okay, I will update this and send this patch and rebased patch to display offsets later today or tomorrow. -- With Regards, Amit Kapila.
On Thu, Aug 20, 2020 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Here, we can notice that for the index, we are getting context > > > information but not for the heap. The reason is that in > > > vacuum_error_callback, we are not printing additional information for > > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP > > > when block number is invalid. If we want to cover the 'info' messages > > > then won't it be better if we print a message in those phases even > > > block number is invalid (something like 'while scanning relation > > > \"%s.%s\"") > > > > Yeah, there is an inconsistency. I agree to print the message even > > when the block number is invalid. > > > > Okay, I will update this and send this patch and rebased patch to > display offsets later today or tomorrow. > Attached are both the patches. The first one is to improve existing error context information, so I think we should back-patch to 13. The second one is to add additional vacuum error context information, so that is for only HEAD. Does that make sense? Also, let me know if you have any more comments. -- With Regards, Amit Kapila.
Attachment
On Thu, 20 Aug 2020 at 21:12, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 20, 2020 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > Here, we can notice that for the index, we are getting context > > > > information but not for the heap. The reason is that in > > > > vacuum_error_callback, we are not printing additional information for > > > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP > > > > when block number is invalid. If we want to cover the 'info' messages > > > > then won't it be better if we print a message in those phases even > > > > block number is invalid (something like 'while scanning relation > > > > \"%s.%s\"") > > > > > > Yeah, there is an inconsistency. I agree to print the message even > > > when the block number is invalid. > > > > > > > Okay, I will update this and send this patch and rebased patch to > > display offsets later today or tomorrow. > > > > Attached are both the patches. The first one is to improve existing > error context information, so I think we should back-patch to 13. The > second one is to add additional vacuum error context information, so > that is for only HEAD. Does that make sense? Also, let me know if you > have any more comments. Yes, makes sense to me. I don't have comments on both patches. They look good to me. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Aug 21, 2020 at 12:31 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Thu, 20 Aug 2020 at 21:12, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Attached are both the patches. The first one is to improve existing > > error context information, so I think we should back-patch to 13. The > > second one is to add additional vacuum error context information, so > > that is for only HEAD. Does that make sense? Also, let me know if you > > have any more comments. > > Yes, makes sense to me. > > I don't have comments on both patches. They look good to me. > Thanks, I have pushed the first patch. I'll will push the second one in a day or two unless someone has comments on the same. -- With Regards, Amit Kapila.
On Thu, 20 Aug 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 20, 2020 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Aug 20, 2020 at 12:18 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Thu, 20 Aug 2020 at 14:01, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Aug 19, 2020 at 12:54 PM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > Here, we can notice that for the index, we are getting context > > > > information but not for the heap. The reason is that in > > > > vacuum_error_callback, we are not printing additional information for > > > > phases VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP > > > > when block number is invalid. If we want to cover the 'info' messages > > > > then won't it be better if we print a message in those phases even > > > > block number is invalid (something like 'while scanning relation > > > > \"%s.%s\"") > > > > > > Yeah, there is an inconsistency. I agree to print the message even > > > when the block number is invalid. > > > > > > > Okay, I will update this and send this patch and rebased patch to > > display offsets later today or tomorrow. > > > > Attached are both the patches. The first one is to improve existing > error context information, so I think we should back-patch to 13. The > second one is to add additional vacuum error context information, so > that is for only HEAD. Does that make sense? Also, let me know if you > have any more comments. Thanks Amit for updating the patch. All changes in v7-02 look fine to me. Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com > > -- > With Regards, > Amit Kapila. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 26, 2020 at 8:54 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote: > > On Thu, 20 Aug 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Attached are both the patches. The first one is to improve existing > > error context information, so I think we should back-patch to 13. The > > second one is to add additional vacuum error context information, so > > that is for only HEAD. Does that make sense? Also, let me know if you > > have any more comments. > > Thanks Amit for updating the patch. All changes in v7-02 look fine to me. > Okay, pushed v7-02 as well. I have marked the entry for this in CF as committed. -- With Regards, Amit Kapila.
On Wed, 26 Aug 2020 at 15:07, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Aug 26, 2020 at 8:54 AM Mahendra Singh Thalor > <mahi6run@gmail.com> wrote: > > > > On Thu, 20 Aug 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Attached are both the patches. The first one is to improve existing > > > error context information, so I think we should back-patch to 13. The > > > second one is to add additional vacuum error context information, so > > > that is for only HEAD. Does that make sense? Also, let me know if you > > > have any more comments. > > > > Thanks Amit for updating the patch. All changes in v7-02 look fine to me. > > > > Okay, pushed v7-02 as well. I have marked the entry for this in CF as committed. Thank you! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services