Re: Restart pg_usleep when interrupted - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: Restart pg_usleep when interrupted
Date
Msg-id 01A15AEA-C35C-41DF-8E81-3B5A0B523939@gmail.com
In response to Re: Restart pg_usleep when interrupted  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Restart pg_usleep when interrupted
List pgsql-hackers

I'm imagining something like this:

   struct timespec delay;
   TimestampTz end_time;

   end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

   do
   {
       long        secs;
       int         microsecs;

       TimestampDifference(GetCurrentTimestamp(), end_time,
                           &secs, &microsecs);

       delay.tv_sec = secs;
       delay.tv_nsec = microsecs * 1000;

   } while (nanosleep(&delay, NULL) == -1 && errno == EINTR);


I do agree that this is cleaner code, but I am not sure I like this.


1/ TimestampDifference depends on gettimeofday, while my proposal
uses clock_gettime. Older discussions comparing the two mechanisms
did not reach a conclusion. My main takeaway from these hackers
discussions [1], [2] and other online discussions on the topic is
that clock_gettime should replace gettimeofday when possible.
Precision is the main reason.
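
To illustrate the precision point, here is a minimal standalone sketch
(not code from the patch). The helper names are hypothetical; only
gettimeofday, clock_gettime and CLOCK_MONOTONIC are real POSIX
interfaces. gettimeofday fills a struct timeval with microsecond
fields, while clock_gettime fills a struct timespec with nanosecond
fields and can read a monotonic clock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper: wall-clock time in microseconds.
 * struct timeval cannot represent anything finer than 1 usec.
 */
static long long
now_usec_gettimeofday(void)
{
    struct timeval tv;
    int         rc = gettimeofday(&tv, NULL);

    assert(rc == 0);
    (void) rc;
    return (long long) tv.tv_sec * 1000000LL + tv.tv_usec;
}

/*
 * Hypothetical helper: monotonic time in nanoseconds.
 * CLOCK_MONOTONIC is not affected by wall-clock adjustments.
 */
static long long
now_nsec_monotonic(void)
{
    struct timespec ts;
    int         rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void) rc;
    return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

The actual resolution of either clock depends on the platform, so this
only shows the difference in the interfaces, not a measured result.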

2/ It no longer uses the remaining time. I think the remaining time
is still required here. I ran an unrealistic stress test, which shows
that the original proposal handles frequent interruptions much better.
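
For reference, a remainder-based restart loop can be sketched roughly
as follows. This is a standalone illustration under my own
assumptions, not the patch itself: sleep_with_remainder and elapsed_ms
are hypothetical names, and the only real interfaces used are
nanosleep and struct timespec. On EINTR, nanosleep stores the unslept
time in its second argument, and the loop sleeps again for just that
amount.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper (not PostgreSQL's pg_usleep): sleep for usec
 * microseconds, restarting after a signal with the remainder that
 * nanosleep() reports.  Returns 0 on success, -1 on any other error.
 */
static int
sleep_with_remainder(long usec)
{
    struct timespec delay;
    struct timespec remain;

    assert(usec >= 0);

    delay.tv_sec = usec / 1000000L;
    delay.tv_nsec = (usec % 1000000L) * 1000L;

    while (nanosleep(&delay, &remain) == -1)
    {
        if (errno != EINTR)
            return -1;
        /* Interrupted: sleep again only for what is left. */
        delay = remain;
    }
    return 0;
}

/* Elapsed milliseconds between two clock readings. */
static double
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
        (end->tv_nsec - start->tv_nsec) / 1e6;
}
```

One known caveat with relative sleeps: the Linux nanosleep man page
notes that repeated restarts can let the completion time drift, so
this sketch does not claim to be drift-free.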

#1 In one session, I kicked off a vacuum:

    set vacuum_cost_delay = 10;
    set vacuum_cost_limit = 1;
    set client_min_messages = log;
    update large_tbl set version = 1;
    vacuum (verbose, parallel 4) large_tbl;

#2 In another session, I ran a loop to continually
interrupt the vacuum leader. This was during the
“heap scan” phase of the vacuum:

PID=< pid of vacuum leader >
while :
do
    kill -USR1 $PID
done


Using the proposed loop with the remainder, I noticed that
the actual time reported remains close to the requested
delay time.

LOG:  10.000000,10.013420
LOG:  10.000000,10.011188
LOG:  10.000000,10.010860
LOG:  10.000000,10.014839
LOG:  10.000000,10.004542
LOG:  10.000000,10.006035
LOG:  10.000000,10.012230
LOG:  10.000000,10.014535
LOG:  10.000000,10.009645
LOG:  10.000000,10.000817
LOG:  10.000000,10.002162
LOG:  10.000000,10.011721
LOG:  10.000000,10.011655


Using the approach mentioned by Nathan, there
are large differences between requested and actual time.


LOG:  10.000000,17.801778
LOG:  10.000000,12.795450
LOG:  10.000000,11.793723
LOG:  10.000000,11.796317
LOG:  10.000000,13.785993
LOG:  10.000000,11.803775
LOG:  10.000000,15.782767
LOG:  10.000000,31.783901
LOG:  10.000000,19.792440
LOG:  10.000000,21.795795
LOG:  10.000000,18.800412
LOG:  10.000000,16.782886
LOG:  10.000000,10.795197
LOG:  10.000000,14.793333
LOG:  10.000000,29.806556
LOG:  10.000000,18.810784
LOG:  10.000000,11.804956
LOG:  10.000000,24.809812
LOG:  10.000000,25.815600
LOG:  10.000000,22.809493
LOG:  10.000000,22.790908
LOG:  10.000000,19.699097
LOG:  10.000000,23.795613
LOG:  10.000000,24.797078


Let me know what you think.

[1] https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us



Regards,

Sami 
