Re: Track the amount of time waiting due to cost_delay - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Track the amount of time waiting due to cost_delay
Date
Msg-id ZnbIIUQpFJKAJx2W@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Track the amount of time waiting due to cost_delay  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Track the amount of time waiting due to cost_delay
List pgsql-hackers
Hi,

On Thu, Jun 13, 2024 at 11:56:26AM +0000, Bertrand Drouvot wrote:
> ====================== Observations ===========================================
> 
> As compare to v2:
> 
> 1. scanning heap time is about the same
> 2. vacuuming indexes time is about 3 minutes longer on master
> 3. vacuuming heap time is about the same

I had a closer look to understand why the vacuuming indexes time has been about
3 minutes longer on master.

During the vacuuming indexes phase, the leader is helping vacuuming the indexes
until it reaches WaitForParallelWorkersToFinish() (means when all the remaining
indexes are currently handled by the parallel workers, the leader has nothing
more to do and so it is waiting for the parallel workers to finish).

During the time the leader process is involved in indexes vacuuming it is
also subject to wait due to cost delay.

But with v2 applied, the leader may be interrupted by the parallel workers while
it is waiting (due to the new pgstat_progress_parallel_incr_param() calls that
the patch is adding).

So, with v2 applied, the leader is waiting less (as interrupted while waiting)
when being involved in indexes vacuuming and that's why v2 is "faster" than
master.

To put some numbers, I did count the number of times the leader's pg_usleep() has
been interrupted (by counting the number of times the nanosleep() did return a
value < 0 in pg_usleep()). Here they are:

v2: the leader has been interrupted about 342605 times
master: the leader has been interrupted about 36 times

The ones on master are mainly coming from the pgstat_progress_parallel_incr_param() 
calls in parallel_vacuum_process_one_index().

The additional ones on v2 are coming from the pgstat_progress_parallel_incr_param()
calls added in vacuum_delay_point().

======== Conclusion ======

1. vacuuming indexes time has been longer on master because with v2, the leader
has been interrupted 342605 times while waiting, then making v2 "faster".

2. the leader being interrupted while waiting is also already happening on master
due to the pgstat_progress_parallel_incr_param() calls in
parallel_vacuum_process_one_index() (that have been added in 
46ebdfe164). It has been the case "only" 36 times during my test case.

I think that 2. is less of a concern but I think that 1. is something that needs
to be addressed because the leader process is not honouring its cost delay wait
time in a noticeable way (at least during my test case).

I did not think of a proposal yet, just sharing my investigation as to why
v2 has been faster than master during the vacuuming indexes phase.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Robins Tharakan
Date:
Subject: Re: recoveryCheck/008_fsm_truncation is failing on dodo in v14- (due to slow fsync?)
Next
From: Bruce Momjian
Date:
Subject: Re: New standby_slot_names GUC in PG 17