
From: Bertrand Drouvot
Subject: Re: Track the amount of time waiting due to cost_delay
Date:
Msg-id: Znp/zyfvO1wwtGu6@ip-10-97-1-34.eu-west-3.compute.internal
In response to: Re: Track the amount of time waiting due to cost_delay ("Imseih (AWS), Sami" <simseih@amazon.com>)
List: pgsql-hackers
Hi,

On Tue, Jun 25, 2024 at 01:12:16AM +0000, Imseih (AWS), Sami wrote:

Thanks for the feedback!

> >> 2. the leader being interrupted while waiting is also already happening on master
> >> due to the pgstat_progress_parallel_incr_param() calls in
> >> parallel_vacuum_process_one_index() (that have been added in
> >> 46ebdfe164). It has been the case "only" 36 times during my test case.
> 
> 46ebdfe164 will interrupt the leader's sleep every time a parallel worker reports
> progress, and we currently don't handle interrupts by restarting the sleep with
> the remaining time. nanosleep does provide the ability to restart with the remaining
> time [1], but I don't think it's worth the effort to ensure more accurate
> vacuum delays for the leader process.

+1. I don't think it's necessary to have a 100% accurate delay each time the
delay is involved. It's a heuristic parameter (along with the cost limit).
What matters in the end is by how much you've been able to pause the whole
vacuum, not the accuracy of each individual sleep.
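
For reference, the restart-with-the-remaining-time pattern mentioned above
would look roughly like the sketch below (plain POSIX C, just an
illustration; not how the vacuum delay code actually sleeps):

#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <time.h>

/*
 * Sleep for the requested number of milliseconds, restarting with the
 * remaining time whenever the sleep is interrupted by a signal: on EINTR,
 * nanosleep() writes the unslept time into its second argument.
 */
void
sleep_uninterrupted(long msec)
{
    struct timespec req;
    struct timespec rem;

    req.tv_sec = msec / 1000;
    req.tv_nsec = (msec % 1000) * 1000000L;

    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;              /* go back to sleep for what is left */
}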

> > 1. Having a time based only approach to throttle 
> 
> I do agree with a time based approach overall.
> 
> 
> > 1.1) the more parallel workers is used, the less the impact of the leader on
> > the vacuum index phase duration/workload is (because the repartition is done
> > on more processes).
> 
> Did you mean " because the vacuum is done on more processes"? 

Yes.

> When a leader is operating on large index(es) during the entirety
> of the vacuum operation, wouldn't more parallel workers end up
> interrupting the leader more often?

That's right, but my point was about the impact on the "whole" duration and
the "whole" workload (leader and workers included), not about the number of
times the leader is interrupted. If there are, say, 100 workers, then
interrupting the leader (1 process out of 101) is probably less of an issue,
as it means there is a lot of work to be done to keep those 100 workers busy.
I don't think the size of the index the leader is vacuuming has an impact:
having the leader vacuum one 100 GB index or 100 x 1 GB indexes is the same
(as long as all the other workers are active during all that time).

> > 3. A 1 second reporting "throttling" looks a reasonable threshold as:
> 
> > 3.1 the idea is to have a significant impact when the leader could have been
> > interrupted say hundreds/thousands of times per second.
> 
> > 3.2 it does not make that much sense for any tools to sample pg_stat_progress_vacuum
> > multiple times per second (so a one second reporting granularity seems ok).
> 
> I feel 1 second may still be too frequent. 

Maybe we'll need more measurements, but here is what my test case was made of:

vacuum_cost_delay = 1
vacuum_cost_limit = 10
8 parallel workers, 1 leader
21 indexes (about 1GB each, one 40MB), all in memory

led to:

With a 1-second reporting frequency, the leader was interrupted about 2500
times over 8m39s, leading to about the same duration as on master (8m43s).

> What about 10 seconds ( or 30 seconds )? 

I'm not sure (we may need more measurements), but it would probably complicate
the reporting a bit (with the current v3 we'd miss reporting the indexes that
take less time than the threshold to complete).
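
For illustration, here is a purely hypothetical sketch of such a time-based
throttle (maybe_report_delay(), report_delay() and friends are made-up names,
not the actual v3 code); a "force" flush when an index completes is one way
the "short indexes" complication could be handled:

/*
 * Accumulate the delay locally and only push it to the leader once per
 * REPORT_INTERVAL_MS, so the leader's sleep is interrupted at most about
 * once per interval instead of on every cost delay.  The "force" flag
 * flushes whatever is pending when an index completes, so indexes that
 * finish faster than the threshold are still reported.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define REPORT_INTERVAL_MS 1000

static int64_t last_report_ms = 0;
static int64_t pending_delay_ms = 0;

/* Current time in milliseconds (CLOCK_MONOTONIC to survive clock jumps). */
static int64_t
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Stand-in for the real progress-report call that interrupts the leader. */
static void
report_delay(int64_t ms)
{
    printf("reporting %lld ms of accumulated delay\n", (long long) ms);
}

void
maybe_report_delay(int64_t delay_ms, bool force)
{
    int64_t now = now_ms();

    pending_delay_ms += delay_ms;

    if (pending_delay_ms > 0 &&
        (force || now - last_report_ms >= REPORT_INTERVAL_MS))
    {
        report_delay(pending_delay_ms);
        pending_delay_ms = 0;
        last_report_ms = now;
    }
}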

> I think this metric in particular will be mainly useful for vacuum runs that are 
> running for minutes or more, making reporting every 10 or 30 seconds 
> still useful.

Agree. OTOH, one could be interested in diagnosing what happened during, say,
a 5-second peak in I/O resource consumption/latency. Sampling
pg_stat_progress_vacuum at a 1-second interval and seeing by how much the
vacuum has been paused during that time could help too (especially if the
vacuum is made of a lot of parallel workers that could generate a lot of I/O).
But we would miss that data if we report at a coarser interval.

> It just occurred to me also that pgstat_progress_parallel_incr_param 
> should have a code comment that it will interrupt a leader process and
> cause activity such as a sleep to end early.

Good point, I'll add a comment for it.
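Something along these lines, perhaps (just a suggested wording):

/*
 * Note that when this is called from a parallel worker, the progress
 * update is sent to the leader as a message, which interrupts the leader
 * process: any interruptible wait it is in (e.g. a cost-delay sleep)
 * will therefore end earlier than requested.
 */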

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


