Re: Suppressing useless wakeups in walreceiver - Mailing list pgsql-hackers

From: Thomas Munro
Subject: Re: Suppressing useless wakeups in walreceiver
Msg-id: CA+hUKGKEg53sAYhjjB3HE++bi9d5o4Dt1Hy-5rkU4WwAqSjNGA@mail.gmail.com
In response to: Re: Suppressing useless wakeups in walreceiver (Nathan Bossart <nathandbossart@gmail.com>)
List: pgsql-hackers
On Tue, Nov 15, 2022 at 12:14 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> On Tue, Nov 15, 2022 at 09:42:26AM +1300, Thomas Munro wrote:
> > That works for 020_pg_receivewal.  I wonder if there are also tests
> > that stream a bit of WAL first and then do wait_for_catchup that were
> > previously benefiting from the 100ms-after-startup message by
> > scheduling luck (as in, that was usually enough for replay)?  I might
> > go and teach Cluster.pm to log how much time is wasted in
> > wait_for_catchup to get some observability, and then try to figure out
> > how to optimise it properly.  We could perhaps put the 100ms duct tape
> > back temporarily though, if necessary.
>
> Oh, I see.  Since we don't check the apply position when determining
> whether to send a reply, tests may need to wait a full
> wal_receiver_status_interval.  FWIW with my patch, the runtime of the
> src/test/recovery tests seems to be back to what it was on my machine, but
> I certainly wouldn't rule out scheduling luck.

Yeah, what you have looks good in my experiments.  Here are some
statistics on waits on my system.  Times are in seconds; the first row
is from before the new timing logic went in, and the last row is your patch.

     name      | count |   sum   |  avg  | stddev |  min  |  max
---------------+-------+---------+-------+--------+-------+--------
 before        |   718 |  37.375 | 0.052 |  0.135 | 0.006 |  1.110
 master        |   718 | 173.100 | 0.241 |  1.374 | 0.006 | 10.004
 initial-100ms |   718 |  37.169 | 0.052 |  0.126 | 0.006 |  0.676
 initial-0ms   |   718 |  35.276 | 0.049 |  0.123 | 0.006 |  0.661
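
(For anyone wanting to reproduce this kind of measurement, here is a
minimal sketch of the sort of wait_for_catchup observability mentioned
upthread.  It is a hypothetical wrapper, not the actual change to
Cluster.pm; the helper name and log path are made up, and $node is
assumed to be a PostgreSQL::Test::Cluster instance.)

    # Hypothetical helper, not the actual Cluster.pm change: wrap
    # wait_for_catchup and append one elapsed-time record per call.
    use strict;
    use warnings;
    use Time::HiRes ();

    sub timed_wait_for_catchup
    {
        my ($node, $standby_name, $mode, $target_lsn) = @_;

        my $start = Time::HiRes::time();
        $node->wait_for_catchup($standby_name, $mode, $target_lsn);
        my $elapsed = Time::HiRes::time() - $start;

        # One tab-separated line per wait, to be aggregated later.
        open my $fh, '>>', '/tmp/wait_for_catchup.log' or die $!;
        printf $fh "%s\t%.3f\n", $standby_name, $elapsed;
        close $fh;

        return $elapsed;
    }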

The difference on master is explained by these 14 outliers:

                   name                   |  time
------------------------------------------+--------
 testrun/recovery/019_replslot_limit      | 10.004
 testrun/recovery/033_replay_tsp_drops    |  9.917
 testrun/recovery/033_replay_tsp_drops    |  9.957
 testrun/pg_basebackup/020_pg_receivewal  |  9.899
 testrun/pg_rewind/003_extrafiles         |  9.961
 testrun/pg_rewind/003_extrafiles         |  9.916
 testrun/pg_rewind/008_min_recovery_point |  9.929
 recovery/019_replslot_limit              | 10.004
 recovery/033_replay_tsp_drops            |  9.917
 recovery/033_replay_tsp_drops            |  9.957
 pg_basebackup/020_pg_receivewal          |  9.899
 pg_rewind/003_extrafiles                 |  9.961
 pg_rewind/003_extrafiles                 |  9.916
 pg_rewind/008_min_recovery_point         |  9.929
(14 rows)
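
Those outliers all cluster around 10 seconds, which matches Nathan's
point above about tests waiting out a full wal_receiver_status_interval
(10s by default).  As a quick sanity check (a hypothetical snippet, not
something from the thread; $node_standby is assumed to be the affected
standby's PostgreSQL::Test::Cluster object), one could shrink the
interval and see whether the ~10 s wait disappears:

    # Hypothetical sanity check: lower the standby's status interval so
    # replies (and hence wait_for_catchup) don't wait out the 10 s default.
    $node_standby->append_conf('postgresql.conf',
        'wal_receiver_status_interval = 1s');
    $node_standby->reload;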

Now that I can see what is going on, I'm going to try to see how much I
can squeeze out of this...


