Thread: Restart pg_usleep when interrupted
- Attachment protected by Amazon:
- 0001-Handle-Sleep-interrupts.patch
Hi,
In the proposal by Bertrand [1] to implement vacuum cost delay tracking
in pg_stat_progress_vacuum, it was discovered that the vacuum cost delay
ends early on the leader process of a parallel vacuum due to parallel workers
reporting progress on index vacuuming, which was introduced in 17
with commit [2]. With that commit, every time a parallel worker
completes vacuuming an index, it sends a completion message to the leader.
The facility that allows a parallel worker to report progress to the leader was
introduced in commit [3].
In the case of the patch being proposed by Bertrand, the interrupts
will be much more frequent, as parallel workers would send a message to the leader
to update the vacuum delay counters on every vacuum_delay_point call.
Looking into this, the ideal solution appears to be to give a pg_usleep
call the ability to restart after being interrupted. Thanks to the use of nanosleep as
of commit [4], this should be trivial to do, as nanosleep
reports the remaining time, which can be used to restart the sleep. This
idea was also mentioned in [5].
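(For illustration, the core of the idea is just this retry loop; a minimal
sketch, not the attached POC patch:)

#include <errno.h>
#include <time.h>

/* Sleep for the requested number of microseconds, restarting nanosleep()
 * with the unslept remainder whenever a signal interrupts it. */
static void
sleep_restarting_on_interrupt(long microsec)
{
    struct timespec delay;
    struct timespec remain;

    delay.tv_sec = microsec / 1000000L;
    delay.tv_nsec = (microsec % 1000000L) * 1000;

    while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
        delay = remain;
}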
I am attaching a POC patch that does the $SUBJECT. Rather than changing the
existing pg_usleep, the patch introduces a function, pg_usleep_handle_interrupt,
that takes the sleep duration and a boolean flag indicating whether or not to
force handling of interrupts.
This function can replace pg_usleep inside vacuum_delay_point, and could be
useful in other cases in which it’s important to handle interrupts.
For Windows, the new function's “force” = “true” mode uses SleepEx, which,
from what I can tell based on the code comments, is a non-interruptible sleep.
Thoughts?
[1] https://www.postgresql.org/message-id/ZnbIIUQpFJKAJx2W%40ip-10-97-1-34.eu-west-3.compute.internal
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=46ebdfe164c61fbac961d1eb7f40e9a684289ae6
[4] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a948e49e2ef11815be0b211723bfc5b53b7f75a8
[5] https://www.postgresql.org/message-id/24848.1473607575%40sss.pgh.pa.us
Regards,
Sami Imseih
Amazon Web Services (AWS)
Hi, I think you need to find a way to disable this "Attachment protected by Amazon" thing: http://postgr.es/m/01000190606e3d2a-116ead16-84d2-4449-8d18-5053da66b1f4-000000@email.amazonses.com We want patches to be in the pgsql-hackers archives, not temporarily accessible via some link. ...Robert
> I think you need to find a way to disable this "Attachment protected > by Amazon" thing: Yes, I am looking into that. I only noticed after sending the email. Sorry about that. Sami
> We want patches to be in the pgsql-hackers archives, not temporarily > accessible via some link. > > …Robert > Moving to another email going forward. Reattaching the patch. Regards, Sami Imseih Amazon Web Services
Attachment
Sami Imseih <samimseih@gmail.com> writes: > Reattaching the patch. I feel like this is fundamentally a wrong solution, for the reasons cited in the comment for pg_usleep: long sleeps are a bad idea because of the resulting uncertainty about whether we'll respond to interrupts and such promptly. An example here is that if we get a query cancel interrupt, we should probably not insist on finishing out the current sleep before responding. Therefore, rather than "improving" pg_usleep (and uglifying its API), the right answer is to fix parallel vacuum leaders to not depend on pg_usleep in the first place. A better idea might be to use pg_sleep() or equivalent code. regards, tom lane
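(For context, "pg_sleep() or equivalent code" means a WaitLatch-style wait; the
following is a loose paraphrase of the pg_sleep() loop in
src/backend/utils/adt/misc.c, with the variable handling simplified:)

/* Recompute the remaining time after every wakeup and check for
 * interrupts on each iteration, so cancels are handled promptly. */
endtime = GetCurrentTimestamp() + (TimestampTz) (secs * USECS_PER_SEC);

for (;;)
{
    long        delay_ms;

    CHECK_FOR_INTERRUPTS();

    delay_ms = (long) ((endtime - GetCurrentTimestamp()) / 1000);
    if (delay_ms <= 0)
        break;

    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     delay_ms,
                     WAIT_EVENT_PG_SLEEP);
    ResetLatch(MyLatch);
}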
On Sat, Jun 29, 2024 at 9:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Therefore, rather than "improving" pg_usleep (and uglifying its API), > the right answer is to fix parallel vacuum leaders to not depend on > pg_usleep in the first place. A better idea might be to use > pg_sleep() or equivalent code. In case it's useful for someone looking into that, in earlier discussions we figured out that it is possible to have high resolution timeouts AND support latch multiplexing on Linux, FreeBSD, macOS: https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BhC9mFx8tEcBsyo7-cAfWgtbRy1eDizeFuff2K7T%3D4bA%40mail.gmail.com
> On Jun 28, 2024, at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Sami Imseih <samimseih@gmail.com> writes:
>> Reattaching the patch.
>
> I feel like this is fundamentally a wrong solution, for the reasons
> cited in the comment for pg_usleep: long sleeps are a bad idea
> because of the resulting uncertainty about whether we'll respond to
> interrupts and such promptly. An example here is that if we get
> a query cancel interrupt, we should probably not insist on finishing
> out the current sleep before responding.
>
> Therefore, rather than "improving" pg_usleep (and uglifying its API),
> the right answer is to fix parallel vacuum leaders to not depend on
> pg_usleep in the first place. A better idea might be to use
> pg_sleep() or equivalent code.

Yes, that is a good idea to explore and it will not require introducing
an awkward new API. I will look into using something similar to
pg_sleep.
Hi,

On Fri, Jun 28, 2024 at 05:50:20PM -0500, Sami Imseih wrote:
> Thanks for the feedback!
>
> > On Jun 28, 2024, at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > I feel like this is fundamentally a wrong solution, for the reasons
> > cited in the comment for pg_usleep: long sleeps are a bad idea [...]
>
> The case which brought up this discussion is the pg_usleep that
> is called within the vacuum_delay_point being interrupted.
>
> When I read the same code comment you cited, it sounded to me
> that “long sleeps” are those that are in seconds or minutes. The longest
> vacuum delay allowed is 100ms.

I think that with the proposed patch the actual wait time can be "long".
Indeed, the time between the interruptions and restarts of the nanosleep()
call will lead to drift (as mentioned in the nanosleep() man page). So, with
a large number of interruptions, the actual wait time could be way longer
than the expected wait time.

To put numbers on it, I did a few tests with the patch (and with v2 shared in
[1]): the cost delay is 1ms and the cost limit is 10.

With 50 indexes and 10 parallel workers I can see things like:

2024-07-02 08:22:23.789 UTC [2189616] LOG: expected 1.000000, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG: expected 0.100000, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG: expected 1.300000, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG: expected 1.000000, actual 225.980803

That means we waited more than the max allowed cost delay (100ms).

With 49 parallel workers, it's worse, as I can see things like:

2024-07-02 08:26:36.069 UTC [2189807] LOG: expected 1.000000, actual 1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG: expected 1.000000, actual 218.148985

The first actual wait time is about 1 second (it was interrupted about
16300 times during that second).

To avoid this drift, the nanosleep() man page suggests using clock_nanosleep()
with an absolute time value; that might be another idea to explore.

[1]: https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment
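(For concreteness, the man page's suggestion amounts to something like the
sketch below, assuming CLOCK_MONOTONIC. clock_nanosleep() is POSIX but not
universally available (macOS lacks it, for instance), which is part of the
portability concern discussed next.)

#include <errno.h>
#include <time.h>

/* Sleep until an absolute deadline; interrupted-and-restarted sleeps
 * then cannot accumulate drift past the deadline. */
static void
sleep_until_deadline(long microsec)
{
    struct timespec deadline;

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += microsec / 1000000L;
    deadline.tv_nsec += (microsec % 1000000L) * 1000;
    if (deadline.tv_nsec >= 1000000000L)
    {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* With TIMER_ABSTIME, simply retrying on EINTR cannot oversleep. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                           &deadline, NULL) == EINTR)
        ;
}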
> A more portable approach could be to continue using nanosleep and
> add checks to ensure that nanosleep exits whenever
> it goes past an absolute time. This was suggested by Bertrand in an offline
> conversation. I am not yet fully convinced of this idea, but posting the
> patch that implements this idea for anyone interested in looking.

oops, forgot to attach the patch. Here it is.
Attachment
Hi,

On Fri, Jul 05, 2024 at 11:49:45AM -0500, Sami Imseih wrote:
> > With 50 indexes and 10 parallel workers I can see things like: [...]
> > Means we waited more than the max allowed cost delay (100ms). [...]
> >
> > To avoid this drift, the nanosleep() man page suggests using clock_nanosleep()
> > with an absolute time value; that might be another idea to explore.
>
> A more portable approach could be to continue using nanosleep and
> add checks to ensure that nanosleep exits whenever
> it goes past an absolute time. This was suggested by Bertrand in an offline
> conversation. I am not yet fully convinced of this idea, but posting the
> patch that implements this idea for anyone interested in looking.

Thanks!

I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.

As for the patch (not thoroughly reviewed, as it's still one option among
others being discussed):

+ struct timespec current;
+ float time_diff;
+
+ clock_gettime(PG_INSTR_CLOCK, &current);
+
+ time_diff = (absolute.tv_sec - current.tv_sec) + (absolute.tv_nsec - current.tv_nsec) / 1000000000.0;

I think it could be "simplified" by making use of instr_time instead of
timespec for current and absolute. Then I think it would be enough to compare
their ticks.

> Since sub-millisecond sleep times are not guaranteed as suggested by
> the vacuum_cost_delay docs ( see below ), an alternative idea
> is to use clock_nanosleep for vacuum delay when it’s available, else
> fallback to WaitLatch.

Wouldn't that increase even more the cases where sub-millisecond won't be
guaranteed?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment
> Correct. I attached a v2 of this patch that uses instr_time to check the
> elapsed time and break out of the loop. It needs some more benchmarking.

My email client has an issue sending attachments, it seems. Reattaching.

Regards,

Sami
Attachment
+ /* + * We allow nanosleep to handle interrupts and retry with the remaining time. + * However, since nanosleep is susceptible to time drift when interrupted + * frequently, we add a safeguard to break out of the nanosleep whenever the + * total time of the sleep exceeds the requested sleep time. Using nanosleep + * is a more portable approach than clock_nanosleep. + */ I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at the bottom of the while loop to avoid needing this extra check. Also, I think we need some commentary about why we want to retry after an interrupt in this case. -- nathan
> I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
> the bottom of the while loop to avoid needing this extra check.

Can you elaborate further? I am not sure how this will work, since delay is
a timespec and elapsed time is an instr_time.

Also, in every iteration of the loop, the delay must be set to the remaining
time. The purpose of the elapsed_time is to make sure that we don’t surpass
the requested time delay, as an additional safeguard.

> Also, I
> think we need some commentary about why we want to retry after an interrupt
> in this case.

I will elaborate further in the comments for the next revision.

Regards,

Sami
On Thu, Jul 11, 2024 at 01:10:25PM -0500, Sami Imseih wrote:
>> I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
>> the bottom of the while loop to avoid needing this extra check.
>
> Can you elaborate further? I am not sure how this will work, since delay is
> a timespec and elapsed time is an instr_time.
>
> Also, in every iteration of the loop, the delay must be set to the remaining
> time. The purpose of the elapsed_time is to make sure that we don't surpass
> the requested time delay, as an additional safeguard.

I'm imagining something like this:

    struct timespec delay;
    TimestampTz end_time;

    end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

    do
    {
        long    secs;
        int     microsecs;

        TimestampDifference(GetCurrentTimestamp(), end_time,
                            &secs, &microsecs);

        delay.tv_sec = secs;
        delay.tv_nsec = microsecs * 1000;

    } while (nanosleep(&delay, NULL) == -1 && errno == EINTR);

--
nathan
Hi, On Thu, Jul 11, 2024 at 10:15:41AM -0500, Sami Imseih wrote: > > > I did a few tests with the patch and did not see any "large" drifts like the > > ones observed above. > > Thanks for testing. > > > I think it could be "simplified" by making use of instr_time instead of timespec > > for current and absolute. Then I think it would be enough to compare their > > ticks. > > Correct I attached a v2 of this patch that uses instr_time to check the elapsed > time and break out of the loop. It needs some more benchmarking. Thanks! Outside of Nathan's comment: 1 === + * However, since nanosleep is susceptible to time drift when interrupted + * frequently, we add a safeguard to break out of the nanosleep whenever the I'm not sure that "nanosleep is susceptible to time drift when interrupted frequently" is a correct wording. What about? " However, since the time between frequent interruptions and restarts of the nanosleep calls can substantially lead to drift in the time when the sleep finally completes, we add...." 2 === +static void vacuum_sleep(double msec); What about a more generic name that could be used outside of vacuum? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 2024-Jul-11, Nathan Bossart wrote:
> I'm imagining something like this:
>
>     struct timespec delay;
>     TimestampTz end_time;
>
>     end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);
>
>     do
>     {
>         long    secs;
>         int     microsecs;
>
>         TimestampDifference(GetCurrentTimestamp(), end_time,
>                             &secs, &microsecs);
>
>         delay.tv_sec = secs;
>         delay.tv_nsec = microsecs * 1000;
>
>     } while (nanosleep(&delay, NULL) == -1 && errno == EINTR);

This looks nicer. We could deal with clock drift easily (in case the
sysadmin winds the clock back) by testing that tv_sec+tv_nsec is not higher
than the initial time to sleep. I don't know how common this situation is
nowadays, but I remember debugging a system years ago where autovacuum was
sleeping for a very long time because of that. I can't remember now if we
did anything in the code to cope, or just told sysadmins not to do that
anymore :-)

FWIW my (Linux's) nanosleep() manpage contains this note:

    If the interval specified in req is not an exact multiple of the
    granularity underlying clock (see time(7)), then the interval will be
    rounded up to the next multiple. Furthermore, after the sleep completes,
    there may still be a delay before the CPU becomes free to once again
    execute the calling thread.

It's not clear to me what happens if the time to sleep is zero, so maybe
there should be a "if tv_sec == 0 && tv_nsec == 0 then break" statement at
the bottom of the loop, to quit without sleeping one more time than needed.

For Windows, this [1] looks like an interesting and possibly relevant read
(though maybe SleepEx already does what we want to do here.)

[1] https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)
With the original proposal (restarting with the remaining time), the
requested and actual sleep times (in ms) match closely:
LOG: 10.000000,10.013420
LOG: 10.000000,10.011188
LOG: 10.000000,10.010860
LOG: 10.000000,10.014839
LOG: 10.000000,10.004542
LOG: 10.000000,10.006035
LOG: 10.000000,10.012230
LOG: 10.000000,10.014535
LOG: 10.000000,10.009645
LOG: 10.000000,10.000817
LOG: 10.000000,10.002162
LOG: 10.000000,10.011721
LOG: 10.000000,10.011655
Using the approach mentioned by Nathan, there
are large differences between requested and actual time.
LOG: 10.000000,17.801778
LOG: 10.000000,12.795450
LOG: 10.000000,11.793723
LOG: 10.000000,11.796317
LOG: 10.000000,13.785993
LOG: 10.000000,11.803775
LOG: 10.000000,15.782767
LOG: 10.000000,31.783901
LOG: 10.000000,19.792440
LOG: 10.000000,21.795795
LOG: 10.000000,18.800412
LOG: 10.000000,16.782886
LOG: 10.000000,10.795197
LOG: 10.000000,14.793333
LOG: 10.000000,29.806556
LOG: 10.000000,18.810784
LOG: 10.000000,11.804956
LOG: 10.000000,24.809812
LOG: 10.000000,25.815600
LOG: 10.000000,22.809493
LOG: 10.000000,22.790908
LOG: 10.000000,19.699097
LOG: 10.000000,23.795613
LOG: 10.000000,24.797078
On Fri, Jul 12, 2024 at 12:14:56PM -0500, Sami Imseih wrote:
> 1/ TimestampDifference has a dependency on gettimeofday,
> while my proposal utilizes clock_gettime. There are old discussions
> that did not reach a conclusion comparing both mechanisms.
> My main conclusion from these hacker discussions [1], [2] and other
> online discussions on the topic is clock_gettime should replace
> gettimeofday when possible. Precision is the main reason.
>
> 2/ It no longer uses the remain time. I think the remain time
> is still required here. I did an unrealistic stress test which shows
> the original proposal can handle frequent interruptions much better.

My comment was mostly about coding style and not about gettimeofday()
versus clock_gettime(). What does your testing show when you don't have
the extra check, i.e.,

    struct timespec delay;
    struct timespec remain;

    delay.tv_sec = microsec / 1000000L;
    delay.tv_nsec = (microsec % 1000000L) * 1000;

    while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
        delay = remain;

--
nathan
On Fri, Jul 12, 2024 at 03:39:57PM -0500, Sami Imseih wrote:
> but Bertrand found long drifts [2], which I could not reproduce.
> To safeguard against the long drifts, continue to use the &remain time with
> an additional safeguard to make sure the actual sleep does not exceed the
> requested sleep time.

Bertrand, does this safeguard fix the long drifts you saw?

--
nathan
Hi, On Mon, Jul 15, 2024 at 12:20:29PM -0500, Nathan Bossart wrote: > On Fri, Jul 12, 2024 at 03:39:57PM -0500, Sami Imseih wrote: > > but Bertrand found long drifts [2[ which I could not reproduce. > > To safeguard the long drifts, continue to use the &remain time with an > > additional safeguard to make sure the actual sleep does not exceed the > > requested sleep time. > > Bertrand, does this safeguard fix the long drifts you saw? Yeah, it was the case with the first version using the safeguard (see [1]) and it's also the case with the last one shared in [2]. [1]: https://www.postgresql.org/message-id/Zo0UdeE3i9d0Wt5E%40ip-10-97-1-34.eu-west-3.compute.internal [2]: https://www.postgresql.org/message-id/C18017A1-EDFD-4F2F-BCA1-0572D4CCC92B%40gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
attaching the patch again. Something is strange with my email client. Regards, Sami
Attachment
Hi,

On Thu, Jul 25, 2024 at 05:27:15PM -0500, Sami Imseih wrote:
> I am attaching v3 of the patch which addresses the comments made
> earlier by Bertrand about the comment in the patch [1].

Thanks!

Looking at it:

1 ===

+ struct instr_time start_time;

I think we can get rid of the "struct" keyword here.

2 ===

+ struct instr_time current_time;
+ struct instr_time elapsed_time;

Same as above.

3 ===

I gave it more thought and I think it can be simplified a bit to reduce the
number of operations in the while loop.

What about relying on an "absolute" time that way:

    instr_time absolute;
    absolute.ticks = start_time.ticks + msec * 1000000;

and then in the while loop:

    while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
    {
        instr_time current_time;
        INSTR_TIME_SET_CURRENT(current_time);

        if (current_time.ticks > absolute.ticks)
        {
            break;

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
> On Jul 26, 2024, at 3:27 AM, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> 1 ===
>
> + struct instr_time start_time;
>
> I think we can get rid of the "struct" keyword here.
>
> 2 ===
>
> + struct instr_time current_time;
> + struct instr_time elapsed_time;
>
> Same as above.

Will fix those 2.

> 3 ===
>
> I gave it more thought and I think it can be simplified a bit to reduce the
> number of operations in the while loop.
>
> What about relying on an "absolute" time that way: [...]

While I agree this code is cleaner, my hesitation there is we don't
have any other place in which we access .ticks directly, and the
common practice is to use the instr_time.h APIs. What do you think?

Regards,

Sami
Hi,

On Mon, Jul 29, 2024 at 06:15:49PM -0500, Sami Imseih wrote:
> While I agree this code is cleaner, my hesitation there is we don't
> have any other place in which we access .ticks directly, and the
> common practice is to use the instr_time.h APIs.

yeah, we already have a few macros that access the .ticks, so maybe we could
add 2 new ones, say:

1. INSTR_TIME_ADD_MS(t1, msec)
2. INSTR_TIME_IS_GREATER(t1, t2)

I think the fewer operations done in the while loop, the better.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
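(For illustration, a rough sketch of what such macros could look like with
instr_time's int64 nanosecond "ticks" representation; the later patch
revisions use INSTR_TIME_ADD_MICROSEC rather than INSTR_TIME_ADD_MS, and note
the parenthesization fix discussed further down.)

/* Sketch only; assumes the existing NS_PER_US constant in instr_time.h. */

/* add t (in microseconds) to x */
#define INSTR_TIME_ADD_MICROSEC(x,t) \
    ((x).ticks += (t) * NS_PER_US)

/* is x greater than y? */
#define INSTR_TIME_IS_GREATER(x,y) \
    ((x).ticks > (y).ticks)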
> yeah, we already have a few macros that access the .ticks, so maybe we could
> add 2 new ones, say:
>
> 1. INSTR_TIME_ADD_MS(t1, msec)
> 2. INSTR_TIME_IS_GREATER(t1, t2)
>
> I think the fewer operations done in the while loop, the better.

See v4. It includes 2 new instr_time.h macros to simplify the code inside
the while loop.

Regards,

Sami
Attachment
Hi,

On Mon, Aug 05, 2024 at 03:07:34PM -0500, Sami Imseih wrote:
> See v4. It includes 2 new instr_time.h macros to simplify the
> code inside the while loop.

Thanks!

1 ===

+#define INSTR_TIME_IS_GREATER(x,y) \
+ ((bool) (x).ticks > (y).ticks)

Parentheses are missing around the comparison; that should be
((bool) ((x).ticks > (y).ticks)). (The cast binds more tightly than ">",
so as written the ticks value is first collapsed to 0 or 1 and then
compared.) I did not pay attention to it initially, but found it was the
culprit of breaks not occurring (while my test case produces some).

That said, I don't think the cast is necessary here and that we could get
rid of it.

2 ===

What about providing a quick comment about the 2 new macros in the header
of instr_time.h? (like it is done for the other macros)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
v5 takes care of the latest comments by Bertrand. Regards, Sami
Attachment
Hi, On Tue, Aug 06, 2024 at 12:36:44PM -0500, Sami Imseih wrote: > > v5 takes care of the latest comments by Bertrand. > Thanks! Please find attached a rebased version (due to 39a138fbef), and in passing I changed one comment: "add t in microseconds to a instr_time" -> "add t (in microseconds) to x" Does that make sense to you? That said, it looks good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
> On Aug 7, 2024, at 1:00 AM, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> "add t (in microseconds) to x"

I was attempting to be more verbose in the comment, but what you have above
matches the format of the other comments. I am ok with your revision.

Regards,

Sami
Hi, On Wed, Aug 07, 2024 at 09:11:19AM -0500, Sami Imseih wrote: > > > > On Aug 7, 2024, at 1:00 AM, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > > > add t (in microseconds) to x” > > > I was attempting to be more verbose in the comment, > but what you have above matches the format of > the other comments. I am ok with your revision. > Thanks! I'm marking the CF entry [1], as RFC. [1]: https://commitfest.postgresql.org/49/5161/ Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Aug 07, 2024 at 06:00:53AM +0000, Bertrand Drouvot wrote: > + SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE); I think this deserves a comment. > +#define INSTR_TIME_ADD_MICROSEC(x,t) \ > + ((x).ticks += t * NS_PER_US) I'd add parentheses around "t" to ensure any expressions given as "t" are evaluated first. Also, do we need to worry about overflow here? It looks like the rest of instr_time.h is oblivious about overflow, so maybe this is better discussed in a separate thread... -- nathan
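(To illustrate the parenthesization hazard with a hypothetical compound
argument, not taken from the patch:)

/* Without parentheses around t, a call such as */
INSTR_TIME_ADD_MICROSEC(end_time, base_us + extra_us);

/* expands to the following, where only extra_us gets scaled: */
((end_time).ticks += base_us + extra_us * NS_PER_US);

/* With (t) in the macro body, the whole sum is scaled, as intended. */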
> > On Wed, Aug 07, 2024 at 06:00:53AM +0000, Bertrand Drouvot wrote: >> + SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE); > > I think this deserves a comment. > Done >> +#define INSTR_TIME_ADD_MICROSEC(x,t) \ >> + ((x).ticks += t * NS_PER_US) > > I'd add parentheses around "t" to ensure any expressions given as "t" are > evaluated first. > Done > Also, do we need to worry about overflow here? It looks like the rest of > instr_time.h is oblivious about overflow, so maybe this is better discussed > in a separate thread... > I agree, this needs to be handled in a different thread. Please see v7. Regards, Sami
Attachment
Hi,

On Wed, Aug 07, 2024 at 09:36:59AM -0500, Nathan Bossart wrote:
> Also, do we need to worry about overflow here? It looks like the rest of
> instr_time.h is oblivious about overflow, so maybe this is better discussed
> in a separate thread...

Yeah, a separate thread would be better. FWIW, and just out of curiosity:

1. it seems to me that most of the time (always?) we are manipulating
instr_time(s) as interval(s), which (with int64) gives “space” for about 292
years of interval time measurement (if my maths are correct).

2. for "absolute" manipulation (if any) it would depend on the PG_INSTR_CLOCK.
A "man clock_gettime" mentions:

2.1 CLOCK_MONOTONIC: on Linux, time since the system was booted. Not sure
what the longest Linux uptime record is, but it can't reach back further than
the 1990s.

2.2 CLOCK_REALTIME: its time represents seconds and nanoseconds since the
Epoch. It means that we’re currently about 237 years away from the limit. So
even if we were to, say, add 2 "recent" ones, we are still about 183 years
away from the limit.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
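(For reference, the arithmetic behind the 292-year figure, assuming signed
64-bit nanosecond ticks: 2^63 ns ≈ 9.22 × 10^18 ns ≈ 9.22 × 10^9 s ≈ 292
years.)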
On Thu, Aug 08, 2024 at 06:49:27AM +0000, Bertrand Drouvot wrote: > 2.2 CLOCK_REALTIME: Its time represents seconds and nanoseconds since the Epoch. > It means that we´re currently about 237 years away from the limit. So even, > if we were to say add 2 "recents" of them we are still about 183 years away from > the limit. I hope to be retired before then. -- nathan
On Wed, Aug 07, 2024 at 04:22:22PM -0500, Sami Imseih wrote: > Please see v7. Thanks. This one looks pretty good to me, and so I plan to commit it in the near future unless anyone voices concerns about the approach. -- nathan
On Thu, Aug 08, 2024 at 03:01:20PM -0500, Nathan Bossart wrote:
> Thanks. This one looks pretty good to me, and so I plan to commit it in
> the near future unless anyone voices concerns about the approach.

As I am preparing this for commit, I'm wondering whether it makes sense to
keep the new function named vacuum_sleep() and private to vacuum.c. Nothing
about this function is terribly specific to vacuum, and it's not
inconceivable that it might be useful elsewhere. Perhaps we should move it
to pgsleep.c and rename it to something to the effect of
pg_usleep_non_interruptible().

--
nathan
Hi, On Fri, Aug 09, 2024 at 02:03:55PM -0500, Nathan Bossart wrote: > On Thu, Aug 08, 2024 at 03:01:20PM -0500, Nathan Bossart wrote: > > Thanks. This one looks pretty good to me, and so I plan to commit it in > > the near future unless anyone voices concerns about the approach. > > As I am preparing this for commit, I'm wondering whether it makes sense to > name the new function vacuum_sleep() and keep it private to vacuum.c. > Nothing about this function is terribly specific to vacuum, and it's not > inconceivable that it might be useful elsewhere. Perhaps we should move it > to pgsleep.c and rename it to something to the effect of > pg_usleep_non_interruptable(). Yeah, I had the same thought in [1], so +1. [1]: https://www.postgresql.org/message-id/ZpDhS4nFX66ItAze%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
> Yeah, I had the same thought in [1], so +1.
>
> [1]: https://www.postgresql.org/message-id/ZpDhS4nFX66ItAze%40ip-10-97-1-34.eu-west-3.compute.internal

The intention (see the start of the thread) was to make this a general API,
but I was not sure if there are use cases outside of vacuum.c.

In v8, I moved the function to pgsleep.c/signals.c and called it
pg_usleep_non_interruptible. The function, unlike vacuum_sleep, takes
microseconds as an argument, to match pg_usleep.

Regards,

Sami
Attachment
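(Piecing together the fragments reviewed upthread, i.e. the nanosleep retry
loop, the instr_time safeguard, and the SleepEx call, the function presumably
looks roughly like the sketch below; this is an illustration under those
assumptions, not the actual v8 patch.)

/*
 * Sketch of pg_usleep_non_interruptible(); assumes instr_time.h plus the
 * two new macros discussed above.
 */
void
pg_usleep_non_interruptible(long microsec)
{
    if (microsec <= 0)
        return;

#ifndef WIN32
    {
        struct timespec delay;
        struct timespec remain;
        instr_time  end_time;

        /* Absolute point past which we refuse to keep sleeping. */
        INSTR_TIME_SET_CURRENT(end_time);
        INSTR_TIME_ADD_MICROSEC(end_time, microsec);

        delay.tv_sec = microsec / 1000000L;
        delay.tv_nsec = (microsec % 1000000L) * 1000;

        /*
         * Unlike pg_usleep(), restart nanosleep() with the remaining time
         * after an interrupt, but never sleep past end_time: with frequent
         * interruptions, the restarts would otherwise drift well past the
         * requested duration.
         */
        while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
        {
            instr_time  current_time;

            INSTR_TIME_SET_CURRENT(current_time);
            if (INSTR_TIME_IS_GREATER(current_time, end_time))
                break;

            delay = remain;
        }
    }
#else
    /*
     * On Windows, round to milliseconds; SleepEx(..., FALSE) does not run
     * APCs, so the sleep is effectively non-interruptible already.
     */
    SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE);
#endif
}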
v9 has some minor corrections to the comments. Regards, Sami
Attachment
Hi, On Sat, Aug 10, 2024 at 08:27:56AM -0500, Sami Imseih wrote: > > v9 has some minor corrections to the comments. > Thanks! 1 === + * Unlike pg_usleep, This function continues s/This/this/? Apart from the above, LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
> > + * Unlike pg_usleep, This function continues > > s/This/this/? > > Apart from the above, LGTM. > Thanks for this catch. Uploaded v10. Regards, Sami
Attachment
My email client attached the last response for some reason :( v10, attached in the previous message, addresses Bertrand's last comment and replaces “This” with “this". Regards, Sami
Hi, On Mon, Aug 12, 2024 at 10:19:56AM -0500, Sami Imseih wrote: > v10 attached in the previous message addresses > Bertrands last comment and replaces “This” with “this" > Thanks, v10 LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Aug 12, 2024 at 03:56:18PM +0000, Bertrand Drouvot wrote: > Thanks, v10 LGTM. As cfbot points out [0], this is missing a Windows/frontend implementation. [0] https://cirrus-ci.com/task/6393555868975104 -- nathan
> As cfbot points out [0], this is missing a Windows/frontend implementation.
>
> [0] https://cirrus-ci.com/task/6393555868975104

Looks like the pgsleep implementation is missing the #ifndef WIN32. I
followed what is done in pg_usleep. v11 should now build on Windows,
hopefully. Strangely, v10 built on a Windows machine I have locally.

Regards,

Sami
Attachment
I'm trying to understand what the point of this patch is, so I went to read
this thread from the beginning:

> In the proposal by Bertrand [1] to implement vacuum cost delay tracking
> in pg_stat_progress_vacuum, it was discovered that the vacuum cost delay
> ends early on the leader process of a parallel vacuum due to parallel workers
> reporting progress on index vacuuming, which was introduced in 17
> with commit [2]. With that commit, every time a parallel worker
> completes vacuuming an index, it sends a completion message to the leader.

Ok, so we might sometimes skip the sleep, if an interrupt is received. I
agree that's a bit sloppy, but probably won't make any difference in practice.

> The facility that allows a parallel worker to report progress to the leader
> was introduced in commit [3].
>
> In the case of the patch being proposed by Bertrand, the interrupts
> will be much more frequent, as parallel workers would send a message to the
> leader to update the vacuum delay counters on every vacuum_delay_point call.

Hmm, I wonder if that's a good design, if it results in a lot of interrupts.

On the patch itself: Making the sleeps in vacuum uninterruptible means that
vacuum will be slower to respond to interrupts. If you SIGTERM a vacuum
process, or hit CTRL-C, you *would* want to exit the sleep ASAP.

Tom raised that concern earlier in this thread
(https://www.postgresql.org/message-id/2100439.1719610468%40sss.pgh.pa.us),
but it seems the discussion wandered off to the details of how to do
the sleep, and left that unaddressed.

--
Heikki Linnakangas
Neon (https://neon.tech)
On Tue, Aug 13, 2024 at 12:04:28AM +0300, Heikki Linnakangas wrote: >> In the case of the patch being proposed by Bertrand, the number of >> interrupts > will be much more frequent as parallel workers would send a >> message > to the leader >> to update the vacuum delay counters every vacuum_delay_point call. > > Hmm, I wonder if that's a good design, if it results in a lot of interrupts. Skimming the last few messages of that thread [0], it looks like Bertrand is exploring ways to avoid so many interrupts. I guess the unavoidable question is whether this work is still worthwhile given that improvement. > On the patch itself: Making the sleeps in vacuum uninterruptible means that > vacuum will be more slow to respond to interrupts. If you SIGTERM a vacuum > process, or hit CTRL-C, you *would* want to exit the sleep ASAP. Since the delay will typically be pretty small (2 milliseconds by default for autovacuum), I'm assuming this won't ordinarily be noticeable. But I do think it is an important consideration. [0] https://commitfest.postgresql.org/49/5027/ -- nathan
> Skimming the last few messages of that thread [0], it looks like Bertrand
> is exploring ways to avoid so many interrupts. I guess the unavoidable
> question is whether this work is still worthwhile given that improvement.

The way the instrumentation in [0] dealt with interrupts was too complex,
which is why it seemed better to handle restarting the remainder of the
sleep in the sleep function itself.

>> On the patch itself: Making the sleeps in vacuum uninterruptible means that
>> vacuum will be slower to respond to interrupts. If you SIGTERM a vacuum
>> process, or hit CTRL-C, you *would* want to exit the sleep ASAP.
>
> Since the delay will typically be pretty small (2 milliseconds by default
> for autovacuum), I'm assuming this won't ordinarily be noticeable. But I
> do think it is an important consideration.

At most, the sleep will be 100ms for vacuum.

> Tom raised that concern earlier in this thread
> (https://www.postgresql.org/message-id/2100439.1719610468%40sss.pgh.pa.us),
> but it seems the discussion wandered off to the details of how to do
> the sleep, and left that unaddressed.

Doing something like pg_sleep, using WaitLatch [1], was explored. However,
this does not support the microsecond sleeps that 720de00af49 allowed.
Thomas shared WaitLatchUs [2], which supports higher-precision sleeps, but
it requires epoll_pwait(2) on the system, thus it's not very portable.

Using nanosleep with the remain time and checking for drift was the most
portable approach.

Regards,

Sami

[0] https://commitfest.postgresql.org/49/5027/
[1] https://www.postgresql.org/message-id/67072E39-3B4E-4240-8373-AC45E23721E7%40gmail.com
[2] https://www.postgresql.org/message-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=LUfWPds8Xw@mail.gmail.com
Hi,

On Mon, Aug 12, 2024 at 05:04:02PM -0500, Nathan Bossart wrote:
> On Tue, Aug 13, 2024 at 12:04:28AM +0300, Heikki Linnakangas wrote:
> > Hmm, I wonder if that's a good design, if it results in a lot of interrupts.
>
> Skimming the last few messages of that thread [0], it looks like Bertrand
> is exploring ways to avoid so many interrupts.

Yeah, that was mainly to avoid the side effect of the interrupts making the
vacuum faster as compared to the master branch (as in [0], the leader is not
honouring the delays when the parallel workers report their delayed time):
that could be noticeable depending on the amount of work and the number of
parallel workers involved in the vacuum. That could be solved thanks to this
thread. Once this thread is over (and whatever the outcome is), I'll resume
my testing as far as the cost delay report is concerned.

> I guess the unavoidable
> question is whether this work is still worthwhile given that improvement.

The delay not being honored already affects the vacuum, since we allow a
parallel worker to report progress to the leader (see [1]). The interrupts
are far less frequent (as compared to [1]) though.

[0]: https://commitfest.postgresql.org/49/5027/
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1889729dd

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, Aug 12, 2024 at 05:35:08PM -0500, Imseih (AWS), Sami wrote: >> Skimming the last few messages of that thread [0], it looks like Bertrand >> is exploring ways to avoid so many interrupts. I guess the unavoidable >> question is whether this work is still worthwhile given that improvement. > > The way the instrumentation in [0] dealt with interrupts was too complex, > which is why it seemed better to handle the restart the remainder of the > sleep in the sleep function Can you elaborate on how it is too complex? -- nathan
On 8/13/24 10:09 AM, Nathan Bossart wrote:
> On Mon, Aug 12, 2024 at 05:35:08PM -0500, Imseih (AWS), Sami wrote:
> > The way the instrumentation in [0] dealt with interrupts was too complex,
> > which is why it seemed better to handle restarting the remainder of the
> > sleep in the sleep function itself.
>
> Can you elaborate on how it is too complex?

[0] made vacuum_delay_point more complex, as it has to instrument cost_delay
at an interval to reduce the number of interrupts to the leader. On the other
hand, with the sleep itself dealing with interrupts, no additional logic to
space out instrumentation is required.

Regards,

Sami

[0] https://commitfest.postgresql.org/49/5027/
On Tue, Aug 13, 2024 at 10:47:51AM -0500, Imseih (AWS), Sami wrote: > On 8/13/24 10:09 AM, Nathan Bossart wrote: >> On Mon, Aug 12, 2024 at 05:35:08PM -0500, Imseih (AWS), Sami wrote: >> > > Skimming the last few messages of that thread [0], it looks like Bertrand >> > > is exploring ways to avoid so many interrupts. I guess the unavoidable >> > > question is whether this work is still worthwhile given that improvement. >> > The way the instrumentation in [0] dealt with interrupts was too complex, >> > which is why it seemed better to handle the restart the remainder of the >> > sleep in the sleep function >> Can you elaborate on how it is too complex? >> > [0] made vacuum_delay_point more complex as it has to > instrument cost_delay at an interval to reduce the number > of interrupts to the leader. Sure, but looking at the patch [0], it adds maybe an extra 10 lines of code to limit the reports to 1 Hz. That doesn't strike me as too complex... [0] https://postgr.es/m/ZnlPZZZJCRu/8fka%40ip-10-97-1-34.eu-west-3.compute.internal -- nathan
On 8/13/24 10:57 AM, Nathan Bossart wrote: > On Tue, Aug 13, 2024 at 10:47:51AM -0500, Imseih (AWS), Sami wrote: >> On 8/13/24 10:09 AM, Nathan Bossart wrote: >>> On Mon, Aug 12, 2024 at 05:35:08PM -0500, Imseih (AWS), Sami wrote: >>>>> Skimming the last few messages of that thread [0], it looks like Bertrand >>>>> is exploring ways to avoid so many interrupts. I guess the unavoidable >>>>> question is whether this work is still worthwhile given that improvement. >>>> The way the instrumentation in [0] dealt with interrupts was too complex, >>>> which is why it seemed better to handle the restart the remainder of the >>>> sleep in the sleep function >>> Can you elaborate on how it is too complex? >>> >> [0] made vacuum_delay_point more complex as it has to >> instrument cost_delay at an interval to reduce the number >> of interrupts to the leader. > Sure, but looking at the patch [0], it adds maybe an extra 10 lines of code > to limit the reports to 1 Hz. That doesn't strike me as too complex... > > [0] https://postgr.es/m/ZnlPZZZJCRu/8fka%40ip-10-97-1-34.eu-west-3.compute.internal Perhaps "complex" may not be the correct way to describe it. Having to add special handling to space out instrumentation directly in vacuum_delay_point seems very odd to me. I don't think vacuum_delay_point should have to worry about this. Also, 1/ what is an appropriate interval to collect these stats? 2/ What if there are other callers in the future that wish to instrument parallel vacuum workers? they will need to implement similar logic. Regards, Sami
Please disregard this point from the last reply:

"""
2/ What if there are other callers in the future that wish
to instrument parallel vacuum workers? they will need to implement
similar logic.
"""

I misspoke about this, and this point does not matter since only the vacuum
sleep matters for this current discussion.

Regards,

Sami
On Tue, Aug 13, 2024 at 11:07:46AM -0500, Imseih (AWS), Sami wrote: > Having to add special handling to space out instrumentation > directly in vacuum_delay_point seems very odd to me. I don't > think vacuum_delay_point should have to worry about this. > > Also, > 1/ what is an appropriate interval to collect these stats? > 2/ What if there are other callers in the future that wish > to instrument parallel vacuum workers? they will need to implement > similar logic. None of this seems intractable to me. 1 Hz seems like an entirely reasonable place to start, and it is very easy to change (or to even make configurable). pg_stat_progress_vacuum might show slightly old values in this column, but that should be easy enough to explain in the docs if we are really concerned about it. If other callers want to do something similar, maybe we should add a more generic implementation in backend_progress.c. -- nathan
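(For reference, the 1 Hz throttling being discussed can be sketched roughly
as below. The helper name, the static state, and the progress index are
hypothetical; pgstat_progress_parallel_incr_param() is the existing
worker-to-leader reporting call, and the real patch lives in the linked
thread.)

static instr_time  last_report_time;
static int64       unreported_delay_ns;  /* delay not yet sent to leader */

static void
report_delay_throttled(int64 delay_ns)
{
    instr_time  now;

    unreported_delay_ns += delay_ns;

    INSTR_TIME_SET_CURRENT(now);
    if (INSTR_TIME_GET_MILLISEC(now) -
        INSTR_TIME_GET_MILLISEC(last_report_time) < 1000.0)
        return;                 /* at most one report per second */

    pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_DELAY_NS,
                                        unreported_delay_ns);
    unreported_delay_ns = 0;
    last_report_time = now;
}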
> None of this seems intractable to me. 1 Hz seems like an entirely > reasonable place to start, and it is very easy to change (or to even make > configurable). pg_stat_progress_vacuum might show slightly old values in > this column, but that should be easy enough to explain in the docs if we > are really concerned about it. If other callers want to do something > similar, maybe we should add a more generic implementation in > backend_progress.c. > I don't know if I agree. Making vacuum sleep more robust to handle interrupts seems like a cleaner general solution than to add even more code to handle this case or have to explain the behavior of cost delay instrumentation in docs. Regards, Sami
On Tue, Aug 13, 2024 at 01:12:30PM -0500, Imseih (AWS), Sami wrote:
>> None of this seems intractable to me. 1 Hz seems like an entirely
>> reasonable place to start, and it is very easy to change (or to even make
>> configurable). pg_stat_progress_vacuum might show slightly old values in
>> this column, but that should be easy enough to explain in the docs if we
>> are really concerned about it. If other callers want to do something
>> similar, maybe we should add a more generic implementation in
>> backend_progress.c.
>>
> I don't know if I agree. Making vacuum sleep more robust to handle
> interrupts seems like a cleaner general solution than to add
> even more code to handle this case or have to explain the behavior of
> cost delay instrumentation in docs.
Another concern is the huge number of PqMsg_Progress messages sent by
parallel workers with that approach. In Bertrand's tests, he was seeing
nearly 350K interrupts for a ~19 minute vacuum (~300 interrupts per
second). That seems a bit extreme to me. I don't see how anyone could
possibly need stats about vacuum delays with that level of accuracy.
--
nathan
Hi,

On Tue, Aug 13, 2024 at 04:30:46PM -0500, Nathan Bossart wrote:
> On Tue, Aug 13, 2024 at 01:12:30PM -0500, Imseih (AWS), Sami wrote:
> > I don't know if I agree. Making vacuum sleep more robust to handle
> > interrupts seems like a cleaner general solution than to add
> > even more code to handle this case or have to explain the behavior of
> > cost delay instrumentation in docs.
>
> Another concern is the huge number of PqMsg_Progress messages sent by
> parallel workers with that approach. In Bertrand's tests, he was seeing
> nearly 350K interrupts for a ~19 minute vacuum (~300 interrupts per
> second). That seems a bit extreme to me. I don't see how anyone could
> possibly need stats about vacuum delays with that level of accuracy.

I gave it more thought and I don't think we have to choose between the two.
The 1 Hz approach reduces the number of interrupts and Sami's patch provides
a way to get "accurate" delay in case of interrupts. I think both have their
own benefit.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Aug 14, 2024 at 06:00:06AM +0000, Bertrand Drouvot wrote: > I gave it more thoughts and I don't think we have to choose between the two. > The 1 Hz approach reduces the number of interrupts and Sami's patch provides a > way to get "accurate" delay in case of interrupts. I think both have their own > benefit. Is it really that important to delay with that level of accuracy? In most cases, the chances of actually interrupting a given vacuum delay point are pretty small. Even in the extreme scenario you tested with ~350K interrupts in a 19 minute vacuum, you only saw a 10-15% difference in total time. I wouldn't say I'm diametrically opposed to this patch, but I do think we need to carefully consider whether it's worth the extra code. Separately, I've been wondering whether it's worth allowing the sleep to be interrupted in certain cases, such as SIGINT and SIGTERM. That should address one of Heikki's points. -- nathan
> time. I wouldn't say I'm diametrically opposed to this patch, but I do
> think we need to carefully consider whether it's worth the extra code.

FWIW, besides the patch that Bertrand is proposing [1], there is another
parallel vacuum case being discussed, to allow for parallel heap scan [2].
Being able to support both instrumentation of sleep time by parallel workers
and ensuring that actual sleep times are as close as possible to the
requested times is a good thing, IMO.

> Separately, I've been wondering whether it's worth allowing the sleep to be
> interrupted in certain cases, such as SIGINT and SIGTERM. That should
> address one of Heikki's points.

An idea may be to check for pending interrupts inside the
pg_usleep_non_interruptible nanosleep loop. If there is a pending interrupt
and the interrupt is QueryCancelPending or ClientConnectionLost, we can break
out immediately. I am not sure yet how this can work for Windows, since for
this patch we are using a simple SleepEx call, which is non-interruptible
anyhow. Is it worth the effort and even more code to deal with specific
interrupts for such short sleeps (at most 100ms for vacuum)?

I am also thinking that the pg_usleep_non_interruptible routine should have
a cap on the sleep time allowed. That cap can be 100ms, to match the max
vacuum_cost_delay. This will prevent anyone from trying to use this API for
much longer sleeps.

What do you think?

[1] https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
[2] https://www.postgresql.org/message-id/CAD21AoAEfCNv-GgaDheDJ%2Bs-p_Lv1H24AiJeNoPGCmZNSwL1YA%40mail.gmail.com

Regards,

Sami
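(A rough sketch of that idea, POSIX side only; treating these particular
miscadmin.h flags as safe to test here is an assumption of the sketch, not
something from a posted patch:)

while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
{
    /* Respond promptly to cancel/termination-type interrupts, but keep
     * sleeping through progress-report wakeups. */
    if (QueryCancelPending || ProcDiePending || ClientConnectionLost)
        break;          /* let the caller's CHECK_FOR_INTERRUPTS() react */

    delay = remain;
}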
On Wed, Aug 14, 2024 at 9:30 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > Another concern is the huge number of PqMsg_Progress messages sent by > parallel workers with that approach. In Bertrand's tests, he was seeing > nearly 350K interrupts for a ~19 minute vacuum (~300 interrupts per > second). That seems a bit extreme to me. I don't see how anyone could > possibly need stats about vacuum delays with that level of accuracy. I suspect CF #5118 would fix lots of cases of ProcSignal() senders going berserk, because it deletes SendProcSignal(), and introduces SendInterrupt(), which calls SetLatch(), which doesn't send a signal if the latch is already set. Even if the latch is not already set, it only sends a signal if the latch is currently being waited on ("maybe_sleeping" flag). Even when it sends a signal, it goes to a signalfd, kqueue or NT event flag on common platforms. Of course that is only talking about the receiving side. I'm sure we can improve the senders too. There's nothing we can do about NOTIFY, because that's under user control, but that PqMsg_Progress case sounds pretty bad, and the recovery conflict system could probably be made more precise in its logic about who to wake up and when, etc. Other backends going bananas with SendProcSignal() is the reason dsm_impl_posix_resize() has to block signals while calling posix_fallocate(). Unlike nanosleep(), which you can fix by tracking remaining time, posix_fallocate() is all-or-nothing, it has no way to report partial progress, so it must therefore undo its work if interrupted, so its EINTR retry loop could get stuck forever when other backends are trigger-happy with signals, which was a real production issue. I guess both of these issues go away in practice if CF #5118 goes in.
On Sun, Aug 18, 2024 at 11:12 AM Thomas Munro <thomas.munro@gmail.com> wrote: > I guess both of these issues go away in practice if > CF #5118 goes in. To be more precise, if you just keep doing pg_usleep() the issue goes away, and likewise for posix_fallocate() it goes away... But if you switch to WaitLatchUs() so you can handle latch wakeups in vacuum delays, which really you should because the latch might be an urgent request for you to CHECK_FOR_INTERRUPTS(), because another backend is waiting for all backends to service a ProcSignalBarrier (we need a new name for that), well now you'll get wakeups, so you're back to square one if someone is sending them very fast.
Hi,

On Thu, Aug 15, 2024 at 04:13:29PM -0500, Nathan Bossart wrote:
> Is it really that important to delay with that level of accuracy? In most
> cases, the chances of actually interrupting a given vacuum delay point are
> pretty small. Even in the extreme scenario you tested with ~350K
> interrupts in a 19 minute vacuum, you only saw a 10-15% difference in total
> time. I wouldn't say I'm diametrically opposed to this patch, but I do
> think we need to carefully consider whether it's worth the extra code.

I'm not 100% sure that it is worth it, but OTOH I'm thinking it could be for
someone who cares enough to change the cost delay from, say, 2ms to 3ms. I
mean, given the granularity we expose in the cost delay, one could expect to
get an "accurate" delay. The doc is cautious enough to mention that "such
delays may not be measured accurately on older platforms", which makes me
think it could be worth implementing.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,

On Tue, Aug 13, 2024 at 11:40:27AM -0500, Nathan Bossart wrote:
> On Tue, Aug 13, 2024 at 11:07:46AM -0500, Imseih (AWS), Sami wrote:
> > Having to add special handling to space out instrumentation directly
> > in vacuum_delay_point seems very odd to me. I don't think
> > vacuum_delay_point should have to worry about this.
> >
> > Also,
> > 1/ what is an appropriate interval to collect these stats?
> > 2/ what if there are other callers in the future that wish to
> > instrument parallel vacuum workers? They would need to implement
> > similar logic.
>
> None of this seems intractable to me. 1 Hz seems like an entirely
> reasonable place to start, and it is very easy to change (or even to make
> configurable). pg_stat_progress_vacuum might show slightly old values in
> this column, but that should be easy enough to explain in the docs if we
> are really concerned about it. If other callers want to do something
> similar, maybe we should add a more generic implementation in
> backend_progress.c.

As it looks like we have a consensus that reducing the number of
interrupts also makes sense, I just provided a rebased version of the 1 Hz
patch (see [0]) that also makes clear in the docs that the new field might
show slightly old values.

[0]: https://www.postgresql.org/message-id/ZsSQnS9OW9EWSOk4%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
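[Editor's note: for readers following along, the "1 Hz" approach amounts
to rate-limiting the worker-to-leader progress messages, roughly as in the
hedged sketch below. The function name is illustrative, and the progress
index PROGRESS_VACUUM_DELAY_TIME is assumed from the proposal; neither is
the patch's actual code.]

    #include "postgres.h"
    #include "utils/backend_progress.h"
    #include "utils/timestamp.h"

    static TimestampTz last_report_time = 0;
    static int64 pending_delay_us = 0;  /* delay accrued since last report */

    static void
    report_delay_throttled(int64 delay_us)
    {
        TimestampTz now = GetCurrentTimestamp();

        pending_delay_us += delay_us;

        /*
         * Ship a PqMsg_Progress message to the leader at most once per
         * second, so the leader's sleeps aren't constantly interrupted.
         */
        if (!TimestampDifferenceExceeds(last_report_time, now, 1000))
            return;

        pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_DELAY_TIME,
                                            pending_delay_us);
        pending_delay_us = 0;
        last_report_time = now;
    }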
> As it looks like we have a consensus that reducing the number of
> interrupts also makes sense, I just provided a rebased version of the
> 1 Hz patch (see [0]) that also makes clear in the docs that the new field
> might show slightly old values.

That makes sense. However, I suspect the "1 Hz" code will no longer be
needed if CF entry 5118 [1], mentioned by Thomas [2] a few days back, goes
through, so maybe that extra work can be removed if [1] is committed. What
do you think?

With regards to CF 5118 and what it means for the current discussion,
below are my thoughts.

I tested with both CF 5118 [1] and the cost-delay tracking patch. With
that in place, pg_usleep is able to sleep for the full requested duration,
as mentioned by Thomas [3]. This is because certain interrupts, like
Parallel Message and others, are no longer signaled by SIGUSR1 but by
latches.

From this discussion, there is a desire for a sleep function that:
1/ sleeps for the full duration of the requested time, and
2/ continues to handle important interrupts during the sleep.

While something like CF 5118 will take care of point #1, it will not deal
with #2, and the v11 [4] patch does not do #2 either. So I think that in
the sleep loop we need a C_F_I call. The same type of loop can also be
used to call WaitForSingleObject on Windows.

If CF 5118 gets committed, we will still need a similar loop that calls
C_F_I, but the function will need to call WaitLatchUs [5].

Thoughts?

--
Sami

[1] https://commitfest.postgresql.org/49/5118/
[2] https://www.postgresql.org/message-id/CA%2BhUKG%2Bf-nEc_SowDLW1JMUa6Of5sCK-JZ%3Dv-KhL1xgXk83fiw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CA%2BhUKGKpo3fM%3DrnfdxHqt%2BjNGh_zUNcL1ob4hMsb%3DjFfKn9Aag%40mail.gmail.com
[4] https://www.postgresql.org/message-id/e3297b5e-0b33-4f45-afcd-4b00ba0b3547%40gmail.com
[5] https://www.postgresql.org/message-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=LUfWPds8Xw@mail.gmail.com
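[Editor's note: a hedged sketch of the loop being described, restarting
nanosleep() with the remaining time but running CHECK_FOR_INTERRUPTS() at
each wakeup; the function name is illustrative and the POC patch may
differ in detail.]

    #include "postgres.h"
    #include "miscadmin.h"

    #include <errno.h>
    #include <time.h>

    /* Illustrative only: sleep the full duration, yet stay responsive. */
    static void
    sleep_full_duration(long microsec)
    {
        struct timespec delay;
        struct timespec remain;

        delay.tv_sec = microsec / 1000000L;
        delay.tv_nsec = (microsec % 1000000L) * 1000;

        while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
        {
            /* Handle query cancel, ProcSignalBarrier requests, etc. */
            CHECK_FOR_INTERRUPTS();

            /* Then go back to sleep for whatever time is left. */
            delay = remain;
        }
    }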
Hi,

On Tue, Aug 20, 2024 at 02:25:19PM -0500, Sami Imseih wrote:
> > As it looks like we have a consensus that reducing the number of
> > interrupts also makes sense, I just provided a rebased version of the
> > 1 Hz patch (see [0]) that also makes clear in the docs that the new
> > field might show slightly old values.
>
> That makes sense. However, I suspect the "1 Hz" code will no longer be
> needed if CF entry 5118 [1], mentioned by Thomas [2] a few days back,
> goes through, so maybe that extra work can be removed if [1] is
> committed. What do you think?

Yeah, agreed that the "1 Hz" work wouldn't be needed anymore if CF entry
5118 goes through *and* vacuum delays keep doing pg_usleep() (as the
leader won't receive SIGUSR1 from the parallel workers while executing
nanosleep() anymore). It could still receive SIGHUP or such, but that's
outside of the PqMsg_Progress case.

> I tested with both CF 5118 [1] and the cost-delay tracking patch. With
> that in place, pg_usleep is able to sleep for the full requested
> duration, as mentioned by Thomas [3]. This is because certain
> interrupts, like Parallel Message and others, are no longer signaled by
> SIGUSR1 but by latches.

Yeah.

> From this discussion, there is a desire for a sleep function that:
> 1/ sleeps for the full duration of the requested time, and
> 2/ continues to handle important interrupts during the sleep.
>
> While something like CF 5118 will take care of point #1,

I'm not sure: even with CF entry 5118, nanosleep() could still be
interrupted. But I agree that the leader won't be interrupted by
PqMsg_Progress anymore.

> it will not deal with #2, and the v11 [4] patch does not do #2 either.
> So I think that in the sleep loop we need a C_F_I call. The same type of
> loop can also be used to call WaitForSingleObject on Windows.
>
> If CF 5118 gets committed, we will still need a similar loop that calls
> C_F_I, but the function will need to call WaitLatchUs [5].
>
> Thoughts?

If CF 5118 gets committed, then I think we would need to move to using
WaitLatchUs() to react to urgent requests. We'd also need to find a way to
ensure that we wait for the full requested duration, depending on the
reason the process woke up (well, only if we agree that 1/ is needed, and
I'm not sure we have a consensus on that).

So I think that:

1. We should still implement the "1 Hz" stuff, as 1.1/ it could be useful
if CF 5118 gets committed and we move to WaitLatchUs(), and 1.2/ it won't
break anything if CF 5118 gets committed and we don't move to
WaitLatchUs(). For 1.1, it would still prevent the leader from being woken
up too frequently by the parallel workers.

2. We should still discuss the "need" for, and get a consensus on, a sleep
that could guarantee the wait duration (in some cases, and depending on
the reason the process is woken up).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
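[Editor's note: if it helps the discussion, a full-duration wait on top of
the proposed API might look like the sketch below. This is an
assumption-heavy sketch: WaitLatchUs() exists only in the CF 5118 patch,
its signature is guessed to mirror WaitLatch() with a microsecond timeout,
and the function name is illustrative.]

    /* Sketch only; WaitLatchUs() and its signature are assumed. */
    static void
    latch_sleep_full_duration(long microsec)
    {
        /* TimestampTz counts microseconds, so plain arithmetic works. */
        TimestampTz end_time = GetCurrentTimestamp() + microsec;

        for (;;)
        {
            long    remaining_us;

            /* Service whatever woke us up (barrier, cancel, ...). */
            CHECK_FOR_INTERRUPTS();

            remaining_us = (long) (end_time - GetCurrentTimestamp());
            if (remaining_us <= 0)
                break;              /* full duration has elapsed */

            (void) WaitLatchUs(MyLatch,
                               WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                               remaining_us,
                               WAIT_EVENT_VACUUM_DELAY);
            ResetLatch(MyLatch);    /* re-arm, then loop and re-wait */
        }
    }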
> > From this discussion, there is a desire for a sleep function that:
> > 1/ sleeps for the full duration of the requested time, and
> > 2/ continues to handle important interrupts during the sleep.
> >
> > While something like CF 5118 will take care of point #1,
>
> I'm not sure: even with CF entry 5118, nanosleep() could still be
> interrupted. But I agree that the leader won't be interrupted by
> PqMsg_Progress anymore.

Correct.

> 1. We should still implement the "1 Hz" stuff, as 1.1/ it could be useful
> if CF 5118 gets committed and we move to WaitLatchUs(), and 1.2/ it won't
> break anything if CF 5118 gets committed and we don't move to
> WaitLatchUs(). For 1.1, it would still prevent the leader from being
> woken up too frequently by the parallel workers.

Yes. Regardless of any future changes to the behavior of pg_usleep,
preventing a leader from being woken up too frequently is good to have, so
the "1 Hz" work is still worthwhile.

> 2. We should still discuss the "need" for, and get a consensus on, a
> sleep that could guarantee the wait duration (in some cases, and
> depending on the reason the process is woken up).

Unless there is an objection, I will withdraw the CF entry [1] for this
patch next week. This discussion, however, should be one of the points
that CF 5118 must deal with; that is, 5118 will change the behavior of
pg_usleep when dealing with interrupts previously signaled by SIGUSR1.

[1] https://commitfest.postgresql.org/49/5161/

Regards,

Sami