Re: Why does [auto-]vacuum delay not report a wait event? - Mailing list pgsql-hackers

From Mahendra Singh Thalor
Subject Re: Why does [auto-]vacuum delay not report a wait event?
Date
Msg-id CAKYtNAqwZSaJGdXCL+1x_CFWx7-f=qJ5CMQiaVhRbpc_O00oLQ@mail.gmail.com
Whole thread Raw
In response to Re: Why does [auto-]vacuum delay not report a wait event?  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Sat, 21 Mar 2020 at 09:38, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Mar 19, 2020 at 03:44:49PM -0700, Andres Freund wrote:
> > But uh, unfortunately the vacuum delay code just sleeps without setting
> > a wait event:
> ...
> > Seems like it should instead use a new wait event in the PG_WAIT_TIMEOUT
> > class?
> >
> > Given how frequently we run into trouble with [auto]vacuum throttling
> > being a problem, and there not being any way to monitor that currently,
> > that seems like it'd be a significant improvement, given the effort?
>
> I see that pg_sleep sets WAIT_EVENT_PG_SLEEP, but pg_usleep doesn't, I guess
> because the overhead is significant for a small number of usecs, and because it
> doesn't work for pg_usleep to set a generic event if callers want to be able to
> set a more specific wait event.
>
> Also, I noticed that SLEEP_ON_ASSERT comment (31338352b) wants to use pg_usleep
> "which seems too short.".  Surely it should use pg_sleep, added at 782eefc58 a
> few years later ?
>
> Also, there was a suggestion recently that this should have a separate
> vacuum_progress phase:
> |vacuumlazy.c:#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL        50 /* ms */
> |vacuumlazy.c:pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
>
> I was planning to look at that eventually ; should it have a wait event instead
> or in addition ?
>
> > It'd probably also be helpful to report the total time [auto]vacuum
> > spent being delayed for vacuum verbose/autovacuum logging, but imo
> > that'd be a parallel feature to a wait event, not a replacement.
>
> This requires wider changes than I anticipated.
>
> 2020-03-20 22:35:51.308 CDT [16534] LOG:  automatic aggressive vacuum of table "template1.pg_catalog.pg_class": index scans: 1
>         pages: 0 removed, 11 remain, 0 skipped due to pins, 0 skipped frozen
>         tuples: 6 removed, 405 remain, 0 are dead but not yet removable, oldest xmin: 1574
>         buffer usage: 76 hits, 7 misses, 8 dirtied
>         avg read rate: 16.378 MB/s, avg write rate: 18.718 MB/s
>         Cost-based delay: 2 msec
>         system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>
> VACUUM VERBOSE wouldn't normally be run with cost_delay > 0, so that field will
> typically be zero, so I made it conditional, which is supposedly why it's
> written like that, even though that's not otherwise being used since 17eaae98.
>
> Added at https://commitfest.postgresql.org/28/2515/
>
> --
> Justin

Thanks Justin for quick patch.

I haven't reviewed your full patch but I can see that "make installcheck" is failing with segment fault.

Stack trace;
Core was generated by `postgres: autovacuum worker regression                                    '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980, full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at ginfast.c:895
895            stats->delay_msec += vacuum_delay_point();
(gdb) bt
#0  0x0000560080cc827c in ginInsertCleanup (ginstate=0x7ffe0b648980, full_clean=false, fill_fsm=true, forceCleanup=true, stats=0x0) at ginfast.c:895
#1  0x0000560080cdd0c3 in ginvacuumcleanup (info=0x7ffe0b64b0c0, stats=0x0) at ginvacuum.c:706
#2  0x0000560080d791d4 in index_vacuum_cleanup (info=0x7ffe0b64b0c0, stats=0x0) at indexam.c:711
#3  0x0000560080fa790e in do_analyze_rel (onerel=0x56008259e6e0, params=0x560082206de4, va_cols=0x0, acquirefunc=0x560080fa8a75 <acquire_sample_rows>, relpages=25, inh=false,
    in_outer_xact=false, elevel=13) at analyze.c:683
#4  0x0000560080fa5f3e in analyze_rel (relid=37789, relation=0x5600822ba1a0, params=0x560082206de4, va_cols=0x0, in_outer_xact=false, bstrategy=0x5600822064e0) at analyze.c:263
#5  0x00005600810d9eb7 in vacuum (relations=0x56008227e5b8, params=0x560082206de4, bstrategy=0x5600822064e0, isTopLevel=true) at vacuum.c:468
#6  0x0000560081357608 in autovacuum_do_vac_analyze (tab=0x560082206de0, bstrategy=0x5600822064e0) at autovacuum.c:3115
#7  0x00005600813557dd in do_autovacuum () at autovacuum.c:2466
#8  0x000056008135373d in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1693
#9  0x0000560081352f75 in StartAutoVacWorker () at autovacuum.c:1487
#10 0x000056008137ed5f in StartAutovacuumWorker () at postmaster.c:5580
#11 0x000056008137e199 in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5297
#12 <signal handler called>
#13 0x00007f18b778bff7 in __GI___select (nfds=9, readfds=0x7ffe0b64c050, writefds=0x0, exceptfds=0x0, timeout=0x7ffe0b64bfc0) at ../sysdeps/unix/sysv/linux/select.c:41
#14 0x000056008137499a in ServerLoop () at postmaster.c:1691
#15 0x0000560081373e63 in PostmasterMain (argc=3, argv=0x560082189020) at postmaster.c:1400
#16 0x00005600811d37ea in main (argc=3, argv=0x560082189020) at main.c:210

Here, stats is null so it is crashing.

--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Phillip Black
Date:
Subject: Database recovery from tablespace only
Next
From: Andrew Dunstan
Date:
Subject: Re: ssl passphrase callback