Thread: Question regarding "Make archiver process an auxiliary process. commit"

Question regarding "Make archiver process an auxiliary process. commit"

From
Sravan Kumar
Date:
Hi,

I see that in the archiver code, in the function pgarch_MainLoop,
the archiver sleeps for a certain time or until there's a signal. The time
it sleeps for is represented by:

timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
It so happens that last_copy_time and curtime are always set at the same
time which always makes timeout equal (actually roughly equal) to
PGARCH_AUTOWAKE_INTERVAL.

I see that this behaviour was introduced as a part of the commit:
d75288fb27b8fe0a926aaab7d75816f091ecdc27. The discussion thread is:

The change was introduced in v31, with the following comment in the
discussion thread:

- pgarch_MainLoop start the loop with wakened = true when both
notified or timed out. Otherwise time_to_stop is set and exits from
the loop immediately. So the variable wakened is actually
useless. Removed it.

This behaviour was different before the commit: d75288fb27b8fe0a926aaab7d75816f091ecdc27,
in which the archiver keeps track of how much time has elapsed since last_copy_time
in case there was a signal, and it results in a smaller subsequent value of
timeout, until timeout is zero. This also avoids calling pgarch_ArchiverCopyLoop
before PGARCH_AUTOWAKE_INTERVAL in case there's an intermittent signal.

With the current changes it may be okay to always sleep for PGARCH_AUTOWAKE_INTERVAL,
but that means curtime and last_copy_time are no more needed.

I would like to validate if my understanding is correct, and which of the
behaviours we would like to retain.


Thanks & Regards,
Sravan Velagandula
The Enterprise PostgreSQL Company

Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Kyotaro Horiguchi
Date:
At Mon, 5 Dec 2022 12:06:11 +0530, Sravan Kumar <sravanvcybage@gmail.com> wrote in 
> timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
> It so happens that last_copy_time and curtime are always set at the same
> time which always makes timeout equal (actually roughly equal) to
> PGARCH_AUTOWAKE_INTERVAL.

Oooo *^^*.

> This behaviour was different before the commit:
> d75288fb27b8fe0a926aaab7d75816f091ecdc27,
> in which the archiver keeps track of how much time has elapsed since
> last_copy_time
> in case there was a signal, and it results in a smaller subsequent value of
> timeout, until timeout is zero. This also avoids calling
> pgarch_ArchiverCopyLoop
> before PGARCH_AUTOWAKE_INTERVAL in case there's an intermittent signal.

Yes, WaitLatch() (I believe) no longer makes a spurious wakeup.

> With the current changes it may be okay to always sleep for
> PGARCH_AUTOWAKE_INTERVAL,
> but that means curtime and last_copy_time are no more needed.

I think you're right.

> I would like to validate if my understanding is correct, and which of the
> behaviours we would like to retain.

As my understanding the patch didn't change the copying behavior of
the function. I think we should simplify the loop by removing
last_copy_time and curtime in the "if (!time_to_stop)" block. Then we
can remove the variable "timeout" and the "if (timeout > 0)"
branch. Are you willing to work on this?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Sravan Kumar
Date:
Thank you for the feedback.

I'll be glad to help with the fix. Here's the patch for review.


On Tue, Dec 6, 2022 at 1:54 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Mon, 5 Dec 2022 12:06:11 +0530, Sravan Kumar <sravanvcybage@gmail.com> wrote in
> timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
> It so happens that last_copy_time and curtime are always set at the same
> time which always makes timeout equal (actually roughly equal) to
> PGARCH_AUTOWAKE_INTERVAL.

Oooo *^^*.

> This behaviour was different before the commit:
> d75288fb27b8fe0a926aaab7d75816f091ecdc27,
> in which the archiver keeps track of how much time has elapsed since
> last_copy_time
> in case there was a signal, and it results in a smaller subsequent value of
> timeout, until timeout is zero. This also avoids calling
> pgarch_ArchiverCopyLoop
> before PGARCH_AUTOWAKE_INTERVAL in case there's an intermittent signal.

Yes, WaitLatch() (I believe) no longer makes a spurious wakeup.

> With the current changes it may be okay to always sleep for
> PGARCH_AUTOWAKE_INTERVAL,
> but that means curtime and last_copy_time are no more needed.

I think you're right.

> I would like to validate if my understanding is correct, and which of the
> behaviours we would like to retain.

As my understanding the patch didn't change the copying behavior of
the function. I think we should simplify the loop by removing
last_copy_time and curtime in the "if (!time_to_stop)" block. Then we
can remove the variable "timeout" and the "if (timeout > 0)"
branch. Are you willing to work on this?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


--
Thanks And Regards,
Sravan

Take life one day at a time.
Attachment

Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Bharath Rupireddy
Date:
On Tue, Dec 6, 2022 at 4:57 PM Sravan Kumar <sravanvcybage@gmail.com> wrote:
>
> Thank you for the feedback.
>
> I'll be glad to help with the fix. Here's the patch for review.

Thanks. +1 for fixing this.

I would like to quote recent discussions on reducing the useless
wakeups or increasing the sleep/hibernation times in various processes
to reduce the power savings [1] [2] [3] [4] [5]. With that in context,
does the archiver need to wake up every 60 sec at all when its latch
gets set (PgArchWakeup()) whenever the server switches to a new WAL
file? What happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely
on its latch being set? If required, we can spread PgArchWakeup() to
more places, no?

Before even answering the above questions, I think we need to see if
there're any cases where a process can miss SetLatch() calls (I don't
have an answer for that).

Or do we want to stick to what the below comment says?

    /*
     * There shouldn't be anything for the archiver to do except to wait for a
     * signal ... however, the archiver exists to protect our data, so she
     * wakes up occasionally to allow herself to be proactive.
     */

[1] https://www.postgresql.org/message-id/CA%2BhUKGJCbcv8AtujLw3kEO2wRB7Ffzo1fmwaGG-tQLuMOjf6qQ%40mail.gmail.com
[2]
commit cd4329d9393f84dce34f0bd2dd936adc8ffaa213
Author: Thomas Munro <tmunro@postgresql.org>
Date:   Tue Nov 29 11:28:08 2022 +1300

    Remove promote_trigger_file.

    Previously, an idle startup (recovery) process would wake up every 5
    seconds to have a chance to poll for promote_trigger_file, even if that
    GUC was not configured.  That promotion triggering mechanism was
    effectively superseded by pg_ctl promote and pg_promote() a long time
    ago.  There probably aren't many users left and it's very easy to change
    to the modern mechanisms, so we agreed to remove the feature.

    This is part of a campaign to reduce wakeups on idle systems.

[3] https://commitfest.postgresql.org/41/4035/
[4] https://commitfest.postgresql.org/41/4020/
[5]
commit 05a7be93558c614ab89c794cb1d301ea9ff33199
Author: Thomas Munro <tmunro@postgresql.org>
Date:   Tue Nov 8 20:36:36 2022 +1300

    Suppress useless wakeups in walreceiver.

    Instead of waking up 10 times per second to check for various timeout
    conditions, keep track of when we next have periodic work to do.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Sravan Kumar
Date:


On Tue, Dec 6, 2022 at 5:24 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Dec 6, 2022 at 4:57 PM Sravan Kumar <sravanvcybage@gmail.com> wrote:
>
> Thank you for the feedback.
>
> I'll be glad to help with the fix. Here's the patch for review.

Thanks. +1 for fixing this.
I would like to quote recent discussions on reducing the useless
wakeups or increasing the sleep/hibernation times in various processes
to reduce the power savings [1] [2] [3] [4] [5]. With that in context,
does the archiver need to wake up every 60 sec at all when its latch
gets set (PgArchWakeup()) whenever the server switches to a new WAL
file? What happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely
on its latch being set? If required, we can spread PgArchWakeup() to
more places, no?
 
I like the idea of not having to wake up intermittently and probably we should start a discussion about it.

I see the following comment in PgArchWakeup().

 /*
* We don't acquire ProcArrayLock here.  It's actually fine because
* procLatch isn't ever freed, so we just can potentially set the wrong
* process' (or no process') latch.  Even in that case the archiver will
* be relaunched shortly and will start archiving.
*/

While I do not fully understand the comment, it gives me an impression that
the SetLatch() done here is counting on the timeout to wake up the archiver
in some cases where the latch is wrongly set.

The proposed idea is a behaviour change while this thread intends to clean up some code that's
a result of the mentioned commit. So probably the proposed idea can be discussed as a separate thread.


Before even answering the above questions, I think we need to see if
there're any cases where a process can miss SetLatch() calls (I don't
have an answer for that).

Or do we want to stick to what the below comment says?

    /*
     * There shouldn't be anything for the archiver to do except to wait for a
     * signal ... however, the archiver exists to protect our data, so she
     * wakes up occasionally to allow herself to be proactive.
     */

[1] https://www.postgresql.org/message-id/CA%2BhUKGJCbcv8AtujLw3kEO2wRB7Ffzo1fmwaGG-tQLuMOjf6qQ%40mail.gmail.com
[2]
commit cd4329d9393f84dce34f0bd2dd936adc8ffaa213
Author: Thomas Munro <tmunro@postgresql.org>
Date:   Tue Nov 29 11:28:08 2022 +1300

    Remove promote_trigger_file.

    Previously, an idle startup (recovery) process would wake up every 5
    seconds to have a chance to poll for promote_trigger_file, even if that
    GUC was not configured.  That promotion triggering mechanism was
    effectively superseded by pg_ctl promote and pg_promote() a long time
    ago.  There probably aren't many users left and it's very easy to change
    to the modern mechanisms, so we agreed to remove the feature.

    This is part of a campaign to reduce wakeups on idle systems.

[3] https://commitfest.postgresql.org/41/4035/
[4] https://commitfest.postgresql.org/41/4020/
[5]
commit 05a7be93558c614ab89c794cb1d301ea9ff33199
Author: Thomas Munro <tmunro@postgresql.org>
Date:   Tue Nov 8 20:36:36 2022 +1300

    Suppress useless wakeups in walreceiver.

    Instead of waking up 10 times per second to check for various timeout
    conditions, keep track of when we next have periodic work to do.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


--
Thanks And Regards,
Sravan

Take life one day at a time.

Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Kyotaro Horiguchi
Date:
At Tue, 6 Dec 2022 17:23:50 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Thanks. +1 for fixing this.
> 
> I would like to quote recent discussions on reducing the useless
> wakeups or increasing the sleep/hibernation times in various processes
> to reduce the power savings [1] [2] [3] [4] [5]. With that in context,
> does the archiver need to wake up every 60 sec at all when its latch
> gets set (PgArchWakeup()) whenever the server switches to a new WAL
> file? What happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely
> on its latch being set? If required, we can spread PgArchWakeup() to
> more places, no?

I thought so first, but archiving may be interrupted for various
reasons (disk full I think is the most common one). So, only
intentional wakeups aren't sufficient.

> Before even answering the above questions, I think we need to see if
> there're any cases where a process can miss SetLatch() calls (I don't
> have an answer for that).

I read a recent Thomas' mail that says something like "should we
consider the case latch sets are missed?".  It is triggered by SIGURG
or SetEvent().  I'm not sure but I believe the former is now reliable
and the latter was born reliable.

> Or do we want to stick to what the below comment says?
> 
>     /*
>      * There shouldn't be anything for the archiver to do except to wait for a
>      * signal ... however, the archiver exists to protect our data, so she
>      * wakes up occasionally to allow herself to be proactive.
>      */

So I think this is still valid.  If we want to eliminate useless
wakeups, archiver needs to remember whether the last iteration was
fully done or not. But it seems to be a race condition is involved.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Kyotaro Horiguchi
Date:
At Wed, 7 Dec 2022 11:28:23 +0530, Sravan Kumar <sravanvcybage@gmail.com> wrote in 
> On Tue, Dec 6, 2022 at 5:24 PM Bharath Rupireddy <
> bharath.rupireddyforpostgres@gmail.com> wrote:
> 
> > On Tue, Dec 6, 2022 at 4:57 PM Sravan Kumar <sravanvcybage@gmail.com>
> > wrote:
> > >
> > > Thank you for the feedback.
> > >
> > > I'll be glad to help with the fix. Here's the patch for review.
> >
> > Thanks. +1 for fixing this.
> >> I would like to quote recent discussions on reducing the useless
> >> wakeups or increasing the sleep/hibernation times in various processes
> >> to reduce the power savings [1] [2] [3] [4] [5]. With that in context,
> >> does the archiver need to wake up every 60 sec at all when its latch
> >> gets set (PgArchWakeup()) whenever the server switches to a new WAL
> >> file? What happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely
> >> on its latch being set? If required, we can spread PgArchWakeup() to
> >> more places, no?
> >
> >
> I like the idea of not having to wake up intermittently and probably we
> should start a discussion about it.
> 
> I see the following comment in PgArchWakeup().
> 
>  /*
> * We don't acquire ProcArrayLock here.  It's actually fine because
> * procLatch isn't ever freed, so we just can potentially set the wrong
> * process' (or no process') latch.  Even in that case the archiver will
> * be relaunched shortly and will start archiving.
> */
> 
> While I do not fully understand the comment, it gives me an impression that
> the SetLatch() done here is counting on the timeout to wake up the archiver
> in some cases where the latch is wrongly set.

It is telling about the first iteration of archive process, not
periodical wakeups.  So I think it is doable by considering how to
handle incomplete archiving iterations.

> The proposed idea is a behaviour change while this thread intends to clean
> up some code that's
> a result of the mentioned commit. So probably the proposed idea can be
> discussed as a separate thread.

Agreed.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Sravan Kumar
Date:
I have added the thread to the commitfest: https://commitfest.postgresql.org/42/
Did you get a chance to review the patch? Please let me know if you
need anything from my end.

Thanks & Regards,
Sravan Velagandula
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On Wed, Dec 7, 2022 at 11:49 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 7 Dec 2022 11:28:23 +0530, Sravan Kumar <sravanvcybage@gmail.com> wrote in
> > On Tue, Dec 6, 2022 at 5:24 PM Bharath Rupireddy <
> > bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > On Tue, Dec 6, 2022 at 4:57 PM Sravan Kumar <sravanvcybage@gmail.com>
> > > wrote:
> > > >
> > > > Thank you for the feedback.
> > > >
> > > > I'll be glad to help with the fix. Here's the patch for review.
> > >
> > > Thanks. +1 for fixing this.
> > >> I would like to quote recent discussions on reducing the useless
> > >> wakeups or increasing the sleep/hibernation times in various processes
> > >> to reduce the power savings [1] [2] [3] [4] [5]. With that in context,
> > >> does the archiver need to wake up every 60 sec at all when its latch
> > >> gets set (PgArchWakeup()) whenever the server switches to a new WAL
> > >> file? What happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely
> > >> on its latch being set? If required, we can spread PgArchWakeup() to
> > >> more places, no?
> > >
> > >
> > I like the idea of not having to wake up intermittently and probably we
> > should start a discussion about it.
> >
> > I see the following comment in PgArchWakeup().
> >
> >  /*
> > * We don't acquire ProcArrayLock here.  It's actually fine because
> > * procLatch isn't ever freed, so we just can potentially set the wrong
> > * process' (or no process') latch.  Even in that case the archiver will
> > * be relaunched shortly and will start archiving.
> > */
> >
> > While I do not fully understand the comment, it gives me an impression that
> > the SetLatch() done here is counting on the timeout to wake up the archiver
> > in some cases where the latch is wrongly set.
>
> It is telling about the first iteration of archive process, not
> periodical wakeups.  So I think it is doable by considering how to
> handle incomplete archiving iterations.
>
> > The proposed idea is a behaviour change while this thread intends to clean
> > up some code that's
> > a result of the mentioned commit. So probably the proposed idea can be
> > discussed as a separate thread.
>
> Agreed.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center



--
Thanks & Regards,
Sravan Velagandula
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Nathan Bossart
Date:
On Wed, Jan 04, 2023 at 11:35:33AM +0530, Sravan Kumar wrote:
> I have added the thread to the commitfest: https://commitfest.postgresql.org/42/
> Did you get a chance to review the patch? Please let me know if you
> need anything from my end.

This seems like worthwhile simplification to me.  Ultimately, your patch
shouldn't result in any sort of signficant behavior change, and I don't see
any reason to further complicate the timeout calculation.  The copy loop
will run any time the archiver's latch is set, and it'll wait up to 60
seconds otherwise.  As discussed upthread, it might be possible to remove
the timeout completely, but that probably deserves its own thread.

I noticed that time.h is no longer needed by the archiver, so I removed
that and fixed an indentation nitpick in the attached v2.  I'm going to set
the commitfest entry to ready-for-committer shortly after sending this
message.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Nathan Bossart
Date:
On Fri, Jan 20, 2023 at 11:39:56AM -0800, Nathan Bossart wrote:
> I noticed that time.h is no longer needed by the archiver, so I removed
> that and fixed an indentation nitpick in the attached v2.  I'm going to set
> the commitfest entry to ready-for-committer shortly after sending this
> message.

I'm not sure why I thought time.h was no longer needed.  time() is clearly
used elsewhere in this file.  Here's a new version with that added back.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Question regarding "Make archiver process an auxiliary process. commit"

From
Michael Paquier
Date:
On Tue, Jan 31, 2023 at 08:30:13PM -0800, Nathan Bossart wrote:
> I'm not sure why I thought time.h was no longer needed.  time() is clearly
> used elsewhere in this file.  Here's a new version with that added back.

Ah, I see.  The key point is that curtime and last_copy_time will most
likely be the same value as time() is second-based, so timeout is
basically always PGARCH_AUTOWAKE_INTERVAL.  There is no need to care
about time_to_stop, as we just go through and exit if it happens to be
switched to true.  Applied v3, keeping time_to_stop as it is in v2 and
v3 so as we don't loop again on a postmaster death.
--
Michael

Attachment