Thread: Why does [auto-]vacuum delay not report a wait event?

Why does [auto-]vacuum delay not report a wait event?

From
Andres Freund
Date:
Hi,

I was looking at [1], wanting to suggest a query to monitor what
autovacuum is mostly waiting on. Partially to figure out whether it's
mostly autovacuum cost limiting.

But uh, unfortunately the vacuum delay code just sleeps without setting
a wait event:

void
vacuum_delay_point(void)
{
...
    /* Nap if appropriate */
    if (msec > 0)
    {
        if (msec > VacuumCostDelay * 4)
            msec = VacuumCostDelay * 4;

        pg_usleep((long) (msec * 1000));


Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
class?

Given how frequently we run into trouble with [auto]vacuum throttling
being a problem, and there not being any way to monitor that currently,
that seems like it'd be a significant improvement, given the effort?


It'd probably also be helpful to report the total time [auto]vacuum
spent being delayed for vacuum verbose/autovacuum logging, but imo
that'd be a parallel feature to a wait event, not a replacement.


Greetings,

Andres Freund

[1] https://postgr.es/m/CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io%2BrtnAUFnaJJVS4g%40mail.gmail.com



Re: Why does [auto-]vacuum delay not report a wait event?

From
Amit Kapila
Date:
On Fri, Mar 20, 2020 at 4:15 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I was looking at [1], wanting to suggest a query to monitor what
> autovacuum is mostly waiting on. Partially to figure out whether it's
> mostly autovacuum cost limiting.
>
> But uh, unfortunately the vacuum delay code just sleeps without setting
> a wait event:
>
> void
> vacuum_delay_point(void)
> {
> ...
>         /* Nap if appropriate */
>         if (msec > 0)
>         {
>                 if (msec > VacuumCostDelay * 4)
>                         msec = VacuumCostDelay * 4;
>
>                 pg_usleep((long) (msec * 1000));
>
>
> Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> class?
>

+1.  I think it will be quite helpful.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Why does [auto-]vacuum delay not report a wait event?

From
Magnus Hagander
Date:
On Fri, Mar 20, 2020 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Mar 20, 2020 at 4:15 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > I was looking at [1], wanting to suggest a query to monitor what
> > autovacuum is mostly waiting on. Partially to figure out whether it's
> > mostly autovacuum cost limiting.
> >
> > But uh, unfortunately the vacuum delay code just sleeps without setting
> > a wait event:
> >
> > void
> > vacuum_delay_point(void)
> > {
> > ...
> >         /* Nap if appropriate */
> >         if (msec > 0)
> >         {
> >                 if (msec > VacuumCostDelay * 4)
> >                         msec = VacuumCostDelay * 4;
> >
> >                 pg_usleep((long) (msec * 1000));
> >
> >
> > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> > class?
> >
>
> +1.  I think it will be quite helpful.

Definite +1. There should be a wait event, and identifying this
particular case is certainly interesting enough that it should have
it's own.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Why does [auto-]vacuum delay not report a wait event?

From
Justin Pryzby
Date:
On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote:
> But uh, unfortunately the vacuum delay code just sleeps without setting
> a wait event:
...
> Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> class?
> 
> Given how frequently we run into trouble with [auto]vacuum throttling
> being a problem, and there not being any way to monitor that currently,
> that seems like it'd be a significant improvement, given the effort?

I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't, I guess
because the overhead is significant for a small number of usecs, and because it
doesn't work for pg_usleep to set a generic event if callers want to be able to
set a more specific wait event.

Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use pg_usleep
"which seems too short.".  Surely it should use pg_sleep, added at 782eefc58 a
few years later ?

Also, there was a suggestion recently that this should have a separate
vacuum_progress phase:
|vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL    50 /* ms */
|vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);

I was planning to look at that eventually ; should it have a wait event instead
or in addition ?

> It'd probably also be helpful to report the total time [auto]vacuum
> spent being delayed for vacuum verbose/autovacuum logging, but imo
> that'd be a parallel feature to a wait event, not a replacement.

This requires wider changes than I anticipated.

2020-03-20 22:35:51.308 CDT [16534] LOG:  automatic aggressive vacuum of table "template1.pg_catalog.pg_class": index
scans:1
 
        pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped frozen
        tuples: 6 removed, 405 remain, 0 are dead but not yet removable, oldest xmin: 1574
        buffer usage: 76 hits, 7 misses, 8 dirtied
        avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s
        Cost-based delay: 2 msec
        system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that field will
typically be zero, so I made it conditional, which is supposedly why it's
written like that, even though that's not otherwise being used since 17eaae98.

Added at https://commitfest.postgresql.org/28/2515/

-- 
Justin

Attachment

Re: Why does [auto-]vacuum delay not report a wait event?

From
Mahendra Singh Thalor
Date:
On Sat, 21 Mar 2020 at 09:38, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote:
> > But uh, unfortunately the vacuum delay code just sleeps without setting
> > a wait event:
> ...
> > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> > class?
> >
> > Given how frequently we run into trouble with [auto]vacuum throttling
> > being a problem, and there not being any way to monitor that currently,
> > that seems like it'd be a significant improvement, given the effort?
>
> I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't, I guess
> because the overhead is significant for a small number of usecs, and because it
> doesn't work for pg_usleep to set a generic event if callers want to be able to
> set a more specific wait event.
>
> Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use pg_usleep
> "which seems too short.".  Surely it should use pg_sleep, added at 782eefc58 a
> few years later ?
>
> Also, there was a suggestion recently that this should have a separate
> vacuum_progress phase:
> |vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL        50 /* ms */
> |vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
>
> I was planning to look at that eventually ; should it have a wait event instead
> or in addition ?
>
> > It'd probably also be helpful to report the total time [auto]vacuum
> > spent being delayed for vacuum verbose/autovacuum logging, but imo
> > that'd be a parallel feature to a wait event, not a replacement.
>
> This requires wider changes than I anticipated.
>
> 2020-03-20 22:35:51.308 CDT [16534] LOG:  automatic aggressive vacuum of table "template1.pg_catalog.pg_class": index scans: 1
>         pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped frozen
>         tuples: 6 removed, 405 remain, 0 are dead but not yet removable, oldest xmin: 1574
>         buffer usage: 76 hits, 7 misses, 8 dirtied
>         avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s
>         Cost-based delay: 2 msec
>         system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>
> VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that field will
> typically be zero, so I made it conditional, which is supposedly why it's
> written like that, even though that's not otherwise being used since 17eaae98.
>
> Added at https://commitfest.postgresql.org/28/2515/
>
> --
> Justin

Thanks Justin for quick patch.

I haven't reviewed your full patch but I can see that "make installcheck" is failing with segment fault.

Stack trace;
Core was generated by `postgres: autovacuum worker regression                                    '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980, full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at ginfast.c:895
895            stats->delay_msec += vacuum_delay_point();
(gdb) bt
#0  0x0000560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980, full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at ginfast.c:895
#1  0x0000560080cdd0c3 in ginvacuumcleanup (info=0x7ffe0b64b0c0, stats=0x0) at ginvacuum.c:706
#2  0x0000560080d791d4 in index_vacuum_cleanup (info=0x7ffe0b64b0c0, stats=0x0) at indexam.c:711
#3  0x0000560080fa790e in do_analyze_rel (onerel=0x56008259e6e0, params=0x560082206de4, va_cols=0x0, acquirefunc=0x560080fa8a75 <acquire_sample_rows>, relpages=25, inh=false,
    in_outer_xact=false, elevel=13) at analyze.c:683
#4  0x0000560080fa5f3e in analyze_rel (relid=37789, relation=0x5600822ba1a0, params=0x560082206de4, va_cols=0x0, in_outer_xact=false, bstrategy=0x5600822064e0) at analyze.c:263
#5  0x00005600810d9eb7 in vacuum (relations=0x56008227e5b8, params=0x560082206de4, bstrategy=0x5600822064e0, isTopLevel=true) at vacuum.c:468
#6  0x0000560081357608 in autovacuum_do_vac_analyze (tab=0x560082206de0, bstrategy=0x5600822064e0) at autovacuum.c:3115
#7  0x00005600813557dd in do_autovacuum () at autovacuum.c:2466
#8  0x000056008135373d in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1693
#9  0x0000560081352f75 in StartAutoVacWorker () at autovacuum.c:1487
#10 0x000056008137ed5f in StartAutovacuumWorker () at postmaster.c:5580
#11 0x000056008137e199 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5297
#12 <signal handler called>
#13 0x00007f18b778bff7 in __GI___select (nfds=9, readfds=0x7ffe0b64c050, writefds=0x0, exceptfds=0x0, timeout=0x7ffe0b64bfc0) at ../sysdeps/unix/sysv/linux/select.c:41
#14 0x000056008137499a in ServerLoop () at postmaster.c:1691
#15 0x0000560081373e63 in PostmasterMain (argc=3, argv=0x560082189020) at postmaster.c:1400
#16 0x00005600811d37ea in main (argc=3, argv=0x560082189020) at main.c:210

Here, stats is null so it is crashing.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Re: Why does [auto-]vacuum delay not report a wait event?

From
Andres Freund
Date:
Hi,

> On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote:
> > But uh, unfortunately the vacuum delay code just sleeps without setting
> > a wait event:
> ...
> > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> > class?
> > 
> > Given how frequently we run into trouble with [auto]vacuum throttling
> > being a problem, and there not being any way to monitor that currently,
> > that seems like it'd be a significant improvement, given the effort?
> 
> I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't, I guess
> because the overhead is significant for a small number of usecs, and because it
> doesn't work for pg_usleep to set a generic event if callers want to be able to
> set a more specific wait event.

I don't think the overhead is a meaningful issue - compared to yielding
to the kernel / context switching, setting the wait event isn't a
significant cost.

I think the issue is more the second part - it's used as part of other
things using their own wait events potentially.

I think we should just rip out pg_usleep and replace it with latch
waits. While the case at hand is user configurable (but the max wait
time is 100ms, so it's not too bad), it's generally not great to sleep
without ensuring that interrupts are handled.  Nor is it great that we
don't sleep again if the sleep is interrupted.  There may be a case or
two where we don't want to layer ontop of latches (perhaps the spinlock
delay loop?), but pretty much everywhere else a different routine would
make more sense.


> Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use pg_usleep
> "which seems too short.".  Surely it should use pg_sleep, added at 782eefc58 a
> few years later ?

I don't see problem with using sleep here?


> Also, there was a suggestion recently that this should have a separate
> vacuum_progress phase:
> |vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL    50 /* ms */
> |vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
> 
> I was planning to look at that eventually ; should it have a wait event instead
> or in addition ?

A separate phase? How would that look like? We don't want to replace the
knowledge that currently e.g. the heap scan is in progress?



> > It'd probably also be helpful to report the total time [auto]vacuum
> > spent being delayed for vacuum verbose/autovacuum logging, but imo
> > that'd be a parallel feature to a wait event, not a replacement.
> 
> This requires wider changes than I anticipated.
> 
> 2020-03-20 22:35:51.308 CDT [16534] LOG:  automatic aggressive vacuum of table "template1.pg_catalog.pg_class": index
scans:1
 
>         pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped frozen
>         tuples: 6 removed, 405 remain, 0 are dead but not yet removable, oldest xmin: 1574
>         buffer usage: 76 hits, 7 misses, 8 dirtied
>         avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s
>         Cost-based delay: 2 msec
>         system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 
> VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that field will
> typically be zero, so I made it conditional

I personally dislike conditional output like that, because it makes
parsing the output harder.


> , which is supposedly why it's written like that, even though that's
> not otherwise being used since 17eaae98.

Well, it's also just hard to otherwise manage this long translatable
strings. And we're essentially still conditional, due to the 'msgfmt'
branches - if the whole output were output in a single appendStringInfo
call, we'd have to duplicate all the following format strings too.

Personally I really wish we'd just merge the vacuum verbose and the
autovacuum lgoging code, even if it causes some temporary pain.


On 2020-03-20 23:07:51 -0500, Justin Pryzby wrote:
> From 68c5ad8c7a9feb0c68afad310e3f52c21c3cdbaf Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Fri, 20 Mar 2020 20:47:30 -0500
> Subject: [PATCH v1 1/2] Report wait event for cost-based vacuum delay
> 
> ---
>  doc/src/sgml/monitoring.sgml    | 2 ++
>  src/backend/commands/vacuum.c   | 2 ++
>  src/backend/postmaster/pgstat.c | 3 +++
>  src/include/pgstat.h            | 3 ++-
>  4 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 5bffdcce10..46c99a55b7 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -1507,6 +1507,8 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
>            (<filename>pg_wal</filename>, archive or stream) before trying
>            again to retrieve WAL data, at recovery.
>           </entry>
> +         <entry><literal>VacuumDelay</literal></entry>
> +         <entry>Waiting in a cost-based vacuum delay point.</entry>
>          </row>
>          <row>
>           <entry morerows="68"><literal>IO</literal></entry>
> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index d625d17bf4..59731d687f 100644
> --- a/src/backend/commands/vacuum.c
> +++ b/src/backend/commands/vacuum.c
> @@ -2019,7 +2019,9 @@ vacuum_delay_point(void)
>          if (msec > VacuumCostDelay * 4)
>              msec = VacuumCostDelay * 4;
>  
> +        pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
>          pg_usleep((long) (msec * 1000));
> +        pgstat_report_wait_end();
>  
>          VacuumCostBalance = 0;
>  
> diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
> index d29c211a76..742ec07b59 100644
> --- a/src/backend/postmaster/pgstat.c
> +++ b/src/backend/postmaster/pgstat.c
> @@ -3824,6 +3824,9 @@ pgstat_get_wait_timeout(WaitEventTimeout w)
>          case WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL:
>              event_name = "RecoveryRetrieveRetryInterval";
>              break;
> +        case WAIT_EVENT_VACUUM_DELAY:
> +            event_name = "VacuumDelay";
> +            break;
>              /* no default case, so that compiler will warn */
>      }
>  
> diff --git a/src/include/pgstat.h b/src/include/pgstat.h
> index 851d0a7246..4db40e23cc 100644
> --- a/src/include/pgstat.h
> +++ b/src/include/pgstat.h
> @@ -848,7 +848,8 @@ typedef enum
>      WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
>      WAIT_EVENT_PG_SLEEP,
>      WAIT_EVENT_RECOVERY_APPLY_DELAY,
> -    WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL
> +    WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL,
> +    WAIT_EVENT_VACUUM_DELAY,
>  } WaitEventTimeout;

Looks good to me - unless somebody protests I'm going to apply this
shortly.


> From 8153d909cabb94474890f0b55c7733f33923e3c5 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby <pryzbyj@telsasoft.com>
> Date: Fri, 20 Mar 2020 22:08:09 -0500
> Subject: [PATCH v1 2/2] vacuum to report time spent in cost-based delay
> 
> ---
>  src/backend/access/gin/ginfast.c      |  6 +++---
>  src/backend/access/gin/ginvacuum.c    |  6 +++---
>  src/backend/access/gist/gistvacuum.c  |  2 +-
>  src/backend/access/hash/hash.c        |  2 +-
>  src/backend/access/heap/vacuumlazy.c  | 17 +++++++++++------
>  src/backend/access/nbtree/nbtree.c    |  2 +-
>  src/backend/access/spgist/spgvacuum.c |  4 ++--
>  src/backend/commands/vacuum.c         |  8 ++++++--
>  src/include/access/genam.h            |  1 +
>  src/include/commands/vacuum.h         |  2 +-
>  10 files changed, 30 insertions(+), 20 deletions(-)
> 
> diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
> index 11d7ec067a..c99dc4a8be 100644
> --- a/src/backend/access/gin/ginfast.c
> +++ b/src/backend/access/gin/ginfast.c
> @@ -892,7 +892,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
>           */
>          processPendingPage(&accum, &datums, page, FirstOffsetNumber);
>  
> -        vacuum_delay_point();
> +        stats->delay_msec += vacuum_delay_point();
>  
>          /*
>           * Is it time to flush memory to disk?    Flush if we are at the end of
> @@ -929,7 +929,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
>              {
>                  ginEntryInsert(ginstate, attnum, key, category,
>                                 list, nlist, NULL);
> -                vacuum_delay_point();
> +                stats->delay_msec += vacuum_delay_point();
>              }
>  
>              /*
> @@ -1002,7 +1002,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
>          /*
>           * Read next page in pending list
>           */
> -        vacuum_delay_point();
> +        stats->delay_msec += vacuum_delay_point();
>          buffer = ReadBuffer(index, blkno);
>          LockBuffer(buffer, GIN_SHARE);
>          page = BufferGetPage(buffer);

On a green field I'd really like to pass a 'vacuum state' struct to
vacuum_delay_point(). But that likely would be too invasive to add,
because it seems like it'd would have to be wired to a number of
functions that can be used by extensions etc (like the bulk delete
callbacks).  Then we'd just have vacuum_delay_point() internally sum up
the waiting time.

Given the current style of vacuum_delay_point() calculating everything
via globals (which I hate), it might be less painful to do this by
adding another global to track the sleeps via a global alongside
VacuumCostBalance?


> diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
> index 4871b7ff4d..86a9c7fdaa 100644
> --- a/src/backend/access/hash/hash.c
> +++ b/src/backend/access/hash/hash.c
> @@ -709,7 +709,7 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
>          bool        retain_pin = false;
>          bool        clear_dead_marking = false;
>  
> -        vacuum_delay_point();
> +        // XXX stats->delay_msec += vacuum_delay_point();
>  
>          page = BufferGetPage(buf);
>          opaque = (HashPageOpaque) PageGetSpecialPointer(page);

I assume this is because there's no stats object reachable here? Should
still continue to call vacuum_delay_point ;)

Greetings,

Andres Freund



Re: Why does [auto-]vacuum delay not report a wait event?

From
Peter Geoghegan
Date:
On Sat, Mar 21, 2020 at 5:25 PM Andres Freund <andres@anarazel.de> wrote:
> > diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
> > index 11d7ec067a..c99dc4a8be 100644
> > --- a/src/backend/access/gin/ginfast.c
> > +++ b/src/backend/access/gin/ginfast.c
> > @@ -892,7 +892,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
> >                */
> >               processPendingPage(&accum, &datums, page, FirstOffsetNumber);
> >
> > -             vacuum_delay_point();
> > +             stats->delay_msec += vacuum_delay_point();
> >
> >               /*
> >                * Is it time to flush memory to disk?  Flush if we are at the end of
> > @@ -929,7 +929,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
> >                       {
> >                               ginEntryInsert(ginstate, attnum, key, category,
> >                                                          list, nlist, NULL);
> > -                             vacuum_delay_point();
> > +                             stats->delay_msec += vacuum_delay_point();
> >                       }
> >
> >                       /*
> > @@ -1002,7 +1002,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
> >               /*
> >                * Read next page in pending list
> >                */
> > -             vacuum_delay_point();
> > +             stats->delay_msec += vacuum_delay_point();
> >               buffer = ReadBuffer(index, blkno);
> >               LockBuffer(buffer, GIN_SHARE);
> >               page = BufferGetPage(buffer);
>
> On a green field I'd really like to pass a 'vacuum state' struct to
> vacuum_delay_point().

In a green field situation, there'd be no ginInsertCleanup() at all.
It is a Lovecraftian horror show. The entire thing should be scrapped
now, in fact.

--
Peter Geoghegan



Re: Why does [auto-]vacuum delay not report a wait event?

From
Andres Freund
Date:
Hi,

On March 21, 2020 5:51:19 PM PDT, Peter Geoghegan <pg@bowt.ie> wrote:
>On Sat, Mar 21, 2020 at 5:25 PM Andres Freund <andres@anarazel.de>
>wrote:
>> > diff --git a/src/backend/access/gin/ginfast.c
>b/src/backend/access/gin/ginfast.c
>> > index 11d7ec067a..c99dc4a8be 100644
>> > --- a/src/backend/access/gin/ginfast.c
>> > +++ b/src/backend/access/gin/ginfast.c
>> > @@ -892,7 +892,7 @@ ginInsertCleanup(GinState *ginstate, bool
>full_clean,
>> >                */
>> >               processPendingPage(&accum, &datums, page,
>FirstOffsetNumber);
>> >
>> > -             vacuum_delay_point();
>> > +             stats->delay_msec += vacuum_delay_point();
>> >
>> >               /*
>> >                * Is it time to flush memory to disk?  Flush if we
>are at the end of
>> > @@ -929,7 +929,7 @@ ginInsertCleanup(GinState *ginstate, bool
>full_clean,
>> >                       {
>> >                               ginEntryInsert(ginstate, attnum, key,
>category,
>> >                                                          list,
>nlist, NULL);
>> > -                             vacuum_delay_point();
>> > +                             stats->delay_msec +=
>vacuum_delay_point();
>> >                       }
>> >
>> >                       /*
>> > @@ -1002,7 +1002,7 @@ ginInsertCleanup(GinState *ginstate, bool
>full_clean,
>> >               /*
>> >                * Read next page in pending list
>> >                */
>> > -             vacuum_delay_point();
>> > +             stats->delay_msec += vacuum_delay_point();
>> >               buffer = ReadBuffer(index, blkno);
>> >               LockBuffer(buffer, GIN_SHARE);
>> >               page = BufferGetPage(buffer);
>>
>> On a green field I'd really like to pass a 'vacuum state' struct to
>> vacuum_delay_point().
>
>In a green field situation, there'd be no ginInsertCleanup() at all.
>It is a Lovecraftian horror show. The entire thing should be scrapped
>now, in fact.

My comment is entirely unrelated to GIN, but about the way the delay infrastructure manages state (in global vars).

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Why does [auto-]vacuum delay not report a wait event?

From
Peter Geoghegan
Date:
On Sat, Mar 21, 2020 at 5:53 PM Andres Freund <andres@anarazel.de> wrote:
> My comment is entirely unrelated to GIN, but about the way the delay infrastructure manages state (in global vars).

The fact that ginInsertCleanup() uses "stats != NULL" to indicate
whether it is being called from within VACUUM or not is surely
relevant, or at least relevant to the issue that Mahendra just
reported. shiftList() relies on this directly already.

-- 
Peter Geoghegan



Re: Why does [auto-]vacuum delay not report a wait event?

From
Justin Pryzby
Date:
On Sat, Mar 21, 2020 at 05:24:57PM -0700, Andres Freund wrote:
> > Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use pg_usleep
> > "which seems too short.".  Surely it should use pg_sleep, added at 782eefc58 a
> > few years later ?
> 
> I don't see problem with using sleep here?

There's no problem with pg_sleep (with no "u") - it just didn't exist when
SLEEP_ON_ASSERT was added (and I guess it's potentially unsafe to do much of
anything, like loop around pg_usleep(1e6)).  I'm suggesting it *should* use
pg_sleep, rather than explaining why pg_usleep (with a "u") doesn't work.

> > Also, there was a suggestion recently that this should have a separate
> > vacuum_progress phase:
> > |vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL    50 /* ms */
> > |vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
> > 
> > I was planning to look at that eventually ; should it have a wait event instead
> > or in addition ?
> 
> A separate phase? How would that look like? We don't want to replace the
> knowledge that currently e.g. the heap scan is in progress?

I don't think that's an issue, since the heap scan is done at that point ?
heap_vacuum_rel() (the publicly callable routine) calls lazy_scan_heap (which
does everything) and then (optionally) lazy_truncate_heap() and then
immediately afterwards does:

        pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
            PROGRESS_VACUUM_PHASE_FINAL_CLEANUP);
        ...
        pgstat_progress_end_command();

> > VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that field will
> > typically be zero, so I made it conditional
> 
> I personally dislike conditional output like that, because it makes
> parsing the output harder.

I dislike it too, mostly because there's a comment explaining why it's done
like that, without any desirable use of the functionality.  If it's not useful
for a case where the field is typically zero, it should go away until its
utility is instantiated.

-- 
Justin



Re: Why does [auto-]vacuum delay not report a wait event?

From
Andres Freund
Date:
Hi,

On 2020-03-21 17:24:57 -0700, Andres Freund wrote:
> > diff --git a/src/include/pgstat.h b/src/include/pgstat.h
> > index 851d0a7246..4db40e23cc 100644
> > --- a/src/include/pgstat.h
> > +++ b/src/include/pgstat.h
> > @@ -848,7 +848,8 @@ typedef enum
> >      WAIT_EVENT_BASE_BACKUP_THROTTLE = PG_WAIT_TIMEOUT,
> >      WAIT_EVENT_PG_SLEEP,
> >      WAIT_EVENT_RECOVERY_APPLY_DELAY,
> > -    WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL
> > +    WAIT_EVENT_RECOVERY_RETRIEVE_RETRY_INTERVAL,
> > +    WAIT_EVENT_VACUUM_DELAY,
> >  } WaitEventTimeout;
> 
> Looks good to me - unless somebody protests I'm going to apply this
> shortly.

And pushed. The only thing I changed was to remove the added trailing ,
:)

Thanks for the patch,

Andres