Re: error context for vacuum to include block number - Mailing list pgsql-hackers

From Andres Freund
Subject Re: error context for vacuum to include block number
Date
Msg-id 20191211165425.4ewww2s5k5cafi4l@alap3.anarazel.de
Whole thread Raw
In response to Re: error context for vacuum to include block number  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: error context for vacuum to include block number  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
Hi,

Thanks for working on this!

On 2019-12-11 08:36:48 -0600, Justin Pryzby wrote:
> On Wed, Dec 11, 2019 at 09:15:07PM +0900, Michael Paquier wrote:
> > On Fri, Dec 06, 2019 at 10:23:25AM -0600, Justin Pryzby wrote:
> > > Find attached updated patch:
> > >  . Use structure to include relation name.
> > >  . Split into a separate patch rename of "StringInfoData buf".
> > > 
> > > 2019-11-27 20:04:53.640 CST [14244] ERROR:  canceling statement due to statement timeout
> > > 2019-11-27 20:04:53.640 CST [14244] CONTEXT:  block 2314 of relation t
> > > 2019-11-27 20:04:53.640 CST [14244] STATEMENT:  vacuum t;
> > > 
> > > I tried to use BufferGetTag() to avoid using a 2ndary structure, but fails if
> > > the buffer is not pinned.

The tag will not add all that informative details, because the
relfilenode isn't easily mappable to the table name or such.


> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 043ebb4..9376989 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -138,6 +138,12 @@ typedef struct LVRelStats
>      bool        lock_waiter_detected;
>  } LVRelStats;
>  
> +typedef struct
> +{
> +    char *relname;
> +    char *relnamespace;
> +    BlockNumber blkno;
> +} vacuum_error_callback_arg;

Hm, wonder if could be worthwhile to not use a separate struct here, but
instead extend one of the existing structs to contain the necessary
information. Or perhaps have one new struct with all the necessary
information. There's already quite a few places that do
get_namespace_name(), for example.



> @@ -524,6 +531,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>          PROGRESS_VACUUM_MAX_DEAD_TUPLES
>      };
>      int64        initprog_val[3];
> +    ErrorContextCallback errcallback;
> +    vacuum_error_callback_arg cbarg;

Not a fan of "cbarg", too generic.

>      pg_rusage_init(&ru0);
>  
> @@ -635,6 +644,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>      else
>          skipping_blocks = false;
>  
> +    /* Setup error traceback support for ereport() */
> +    errcallback.callback = vacuum_error_callback;
> +    cbarg.relname = relname;
> +    cbarg.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
> +    cbarg.blkno = 0; /* Not known yet */
> +    errcallback.arg = (void *) &cbarg;
> +    errcallback.previous = error_context_stack;
> +    error_context_stack = &errcallback;
> +
>      for (blkno = 0; blkno < nblocks; blkno++)
>      {
>          Buffer        buf;
> @@ -658,6 +676,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>  
>          pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
>  
> +        cbarg.blkno = blkno;
> +
>          if (blkno == next_unskippable_block)
>          {
>              /* Time to advance next_unskippable_block */
> @@ -817,7 +837,6 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>  
>          buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
>                                   RBM_NORMAL, vac_strategy);
> -
>          /* We need buffer cleanup lock so that we can prune HOT chains. */
>          if (!ConditionalLockBufferForCleanup(buf))
>          {
> @@ -1388,6 +1407,9 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
>              RecordPageWithFreeSpace(onerel, blkno, freespace);
>      }
>  
> +    /* Pop the error context stack */
> +    error_context_stack = errcallback.previous;
> +
>      /* report that everything is scanned and vacuumed */
>      pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
>  
> @@ -2354,3 +2376,15 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
>  
>      return all_visible;
>  }

I think this will misattribute errors that happen when in the:
        /*
         * If we are close to overrunning the available space for dead-tuple
         * TIDs, pause and do a cycle of vacuuming before we tackle this page.
         */
section of lazy_scan_heap(). That will

a) scan the index, during which we presumably don't want the same error
   context, as it'd be quite misleading: The block that was just scanned
   in the loop isn't actually likely to be the culprit for an index
   problem. And we'd not mention the fact that the problem is occurring
   in the index.
b) will report the wrong block, when in lazy_vacuum_heap().

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16059: Tab-completion of filenames in COPY commands removes required quotes
Next
From: Andres Freund
Date:
Subject: Re: global / super barriers (for checksums)