Re: Track the amount of time waiting due to cost_delay - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: Track the amount of time waiting due to cost_delay
Msg-id: CAFiTN-tF4n1QSUGeoH2hgkGmnk9FaN1z1aWo5Grf_7br=BKyLg@mail.gmail.com
In response to: Re: Track the amount of time waiting due to cost_delay (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List: pgsql-hackers
On Mon, Dec 9, 2024 at 6:55 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Mon, Dec 09, 2024 at 05:18:30PM +0530, Dilip Kumar wrote:
> > On Mon, Dec 9, 2024 at 2:51 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
> > >
> > > On 2024-12-06 18:31, Bertrand Drouvot wrote:
> > > > Hi,
> > > >
> > > > On Thu, Dec 05, 2024 at 10:43:51AM +0000, Bertrand Drouvot wrote:
> > > >> Yeah, people would likely use this new field to monitor long running
> > > >> vacuum. Long enough that this error should be acceptable. Do you agree?
> > > >
> > > > OTOH, adding the 100% accuracy looks as simple as v9-0002 attached
> > > > (0001 is the same as for v8), so I think we should provide it.
>
> > This idea looks good to me.
>
> Thanks for looking at it!
>
> > 1.
> > + Total amount of time spent in milliseconds waiting due to
> > + <xref linkend="guc-vacuum-cost-delay"/>
> > + or <xref linkend="guc-autovacuum-vacuum-cost-delay"/>. In case of parallel
> > + vacuum the reported time is across all the workers and the leader. The
> > + workers update the column no more frequently than once per second, so it
> > + could show slightly old values.
> > + </para></entry>
> >
> > I think this waiting is influenced by the cost delay as well as the cost
> > limit GUCs, because here we are counting the total wait time, and that very
> > much depends upon how frequently we wait, which is completely driven by the
> > cost limit. Isn't it?
>
> Yeah. I think we could change the wording that way:
>
> s/waiting due to/waiting during/
>
> Does that make sense? I don't think we need to mention cost limit here.

Yeah, that should be fine.

> > 2.
> > + if (IsParallelWorker())
> > + {
> > +     instr_time time_since_last_report;
> > +
> > +     INSTR_TIME_SET_ZERO(time_since_last_report);
> > +     INSTR_TIME_ACCUM_DIFF(time_since_last_report, delay_end,
> > +                           last_report_time);
> > +     nap_time_since_last_report += INSTR_TIME_GET_MILLISEC(delayed_time);
> > +
> > +     if (INSTR_TIME_GET_MILLISEC(time_since_last_report) > WORKER_REPORT_DELAY_INTERVAL)
> > +     {
> > +         pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED,
> > +                                             nap_time_since_last_report);
> > +         nap_time_since_last_report = 0;
> > +         last_report_time = delay_end;
> > +     }
> > + }
> >
> > Does it make sense to track this "nap_time_since_last_report" in a shared
> > variable instead of locally in individual workers, and report whenever the
> > shared variable crosses WORKER_REPORT_DELAY_INTERVAL? This would avoid
> > individual reporting from different workers. We are already computing the
> > cost balance in a shared variable, i.e. VacuumSharedCostBalance. Or do you
> > think it will complicate the code?
>
> I'm not sure I follow. nap_time_since_last_report is the time a worker had to
> wait since its last report. There is no direct relationship with
> WORKER_REPORT_DELAY_INTERVAL (which is compared to time_since_last_report and
> not nap_time_since_last_report).

I mean, currently we are tracking "time_since_last_report" and accumulating the
delayed_time in "nap_time_since_last_report" for each worker. So my question
was: does it make sense to do this in a shared variable across the workers, so
that we can reduce the number of reports to the leader?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com