Thread: Isn't it better with "autovacuum worker...." instead of "worker took too long to start; canceled" specific to "auto

Hi,

Is there a specific reason that we have a generic WARNING "worker took
too long to start; canceled" for an autovacuum worker? Isn't it better
with "autovacuum worker took too long to start; canceled"? It is
confusing to see the generic message in the server logs while
debugging an issue for a user who doesn't know the internals of
autovacuum code.

To be more informative about the message, how about the following:
1) ereport(WARNING,
                    (errmsg( "worker took too long to start"),
                     errdetail("Previous attempt to start autovacuum
worker was failed, canceled.")));
or
2) ereport(WARNING,
                    (errmsg( "worker took too long to start, canceled"),
                     errdetail("The postmaster couldn't start an
autovacuum worker.")));
or
3) ereport(WARNING,
                    (errmsg( "worker took too long to start, canceled"),
                     errdetail("Previous attempt to start autovacuum
worker was failed.")));
or
4) elog(WARNING, "postmaster couldn't start an autovacuum worker");

Thoughts?

Regards,
Bharath Rupireddy.



On 10/27/21, 9:29 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> Is there a specific reason that we have a generic WARNING "worker took
> too long to start; canceled" for an autovacuum worker? Isn't it better
> with "autovacuum worker took too long to start; canceled"? It is
> confusing to see the generic message in the server logs while
> debugging an issue for a user who doesn't know the internals of
> autovacuum code.

It looks like it has been this way for a while [0].  I don't know if
I've ever seen this message before, and from the comments near it, it
sounds like it is expected to rarely happen.

> To be more informative about the message, how about the following:

My vote is to just change it to

        ereport(WARNING,
                (errmsg("autovacuum worker took too long to start; canceled")));

and call it a day.  If we wanted to add errdetail(), I think we should
make sure it is providing useful context, but I'm not sure what that
might look like.

Nathan

[0] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bae0b56


At Wed, 27 Oct 2021 14:26:11 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in 
> On Wed, Oct 27, 2021 at 07:05:10PM +0000, Bossart, Nathan wrote:
> > My vote is to just change it to
> > 
> >         ereport(WARNING,
> >                 (errmsg("autovacuum worker took too long to start; canceled")));
> > 
> > and call it a day.  If we wanted to add errdetail(), I think we should
> > make sure it is providing useful context, but I'm not sure what that
> > might look like.
> 
> I think that's fine.

+1

> Note that the backend_type is illuminating for those who use CSV logs, or use
> P13+ and log_line_prefix += %b (see 70a7b4776).
> 
> session_line           | 1
> error_severity         | WARNING
> message                | worker took too long to start; canceled
> backend_type           | autovacuum launcher

Yeah, the additional "autovacuum" is not noisy at all even in that
context. Some other messages are prefixed with "autovacuum".

   "could not fork autovacuum worker process: %m"
   "autovacuum worker started without a worker entry"

By a quick look all occurances of "laucher" are prefixed with
"autovacuum" or "logical replcaion", which seems fine.

As a related topic, autovacuum.c has another use of bare "worker"s.

>    tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
>                                   "Start worker tmp cxt",
>                                   ALLOCSET_DEFAULT_SIZES);

>    AutovacMemCxt = AllocSetContextCreate(TopMemoryContext,
>                                          "AV worker",
>                                          ALLOCSET_DEFAULT_SIZES);

I'm not sure the former needs to be fixed, but the latter is actually
visible to users via pg_log_backend_memory_contexts().

LOG:  level: 1; AV worker: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Thu, Oct 28, 2021 at 7:11 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 27 Oct 2021 14:26:11 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
> > On Wed, Oct 27, 2021 at 07:05:10PM +0000, Bossart, Nathan wrote:
> > > My vote is to just change it to
> > >
> > >         ereport(WARNING,
> > >                 (errmsg("autovacuum worker took too long to start; canceled")));
> > >
> > > and call it a day.  If we wanted to add errdetail(), I think we should
> > > make sure it is providing useful context, but I'm not sure what that
> > > might look like.
> >
> > I think that's fine.
>
> +1

Done.

> > Note that the backend_type is illuminating for those who use CSV logs, or use
> > P13+ and log_line_prefix += %b (see 70a7b4776).
> >
> > session_line           | 1
> > error_severity         | WARNING
> > message                | worker took too long to start; canceled
> > backend_type           | autovacuum launcher
>
> Yeah, the additional "autovacuum" is not noisy at all even in that
> context. Some other messages are prefixed with "autovacuum".
>
>    "could not fork autovacuum worker process: %m"
>    "autovacuum worker started without a worker entry"
>
> By a quick look all occurances of "laucher" are prefixed with
> "autovacuum" or "logical replcaion", which seems fine.
>
> As a related topic, autovacuum.c has another use of bare "worker"s.
>
> >       tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
> >                                                                  "Start worker tmp cxt",
> >                                                                  ALLOCSET_DEFAULT_SIZES);
>
> >       AutovacMemCxt = AllocSetContextCreate(TopMemoryContext,
> >                                                                                 "AV worker",
> >                                                                                 ALLOCSET_DEFAULT_SIZES);
>
> I'm not sure the former needs to be fixed, but the latter is actually
> visible to users via pg_log_backend_memory_contexts().
>
> LOG:  level: 1; AV worker: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used

Good catch. I've seen the use of "AV" in some of the mem context
names, why that? Let's be specific and say "Autovacuum". Attached
patch does that. Please review it.

Regards,
Bharath Rupireddy.

Attachment
On Thu, Oct 28, 2021 at 8:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > LOG:  level: 1; AV worker: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
>
> Good catch. I've seen the use of "AV" in some of the mem context
> names, why that? Let's be specific and say "Autovacuum". Attached
> patch does that. Please review it.

+1, the error message and other improvements look good.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Thu, Oct 28, 2021 at 11:44 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Oct 28, 2021 at 7:11 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Wed, 27 Oct 2021 14:26:11 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
> > > On Wed, Oct 27, 2021 at 07:05:10PM +0000, Bossart, Nathan wrote:
> > > > My vote is to just change it to
> > > >
> > > >         ereport(WARNING,
> > > >                 (errmsg("autovacuum worker took too long to start; canceled")));
> > > >
> > > > and call it a day.  If we wanted to add errdetail(), I think we should
> > > > make sure it is providing useful context, but I'm not sure what that
> > > > might look like.
> > >
> > > I think that's fine.
> >
> > +1
>
> Done.
>
> > > Note that the backend_type is illuminating for those who use CSV logs, or use
> > > P13+ and log_line_prefix += %b (see 70a7b4776).
> > >
> > > session_line           | 1
> > > error_severity         | WARNING
> > > message                | worker took too long to start; canceled
> > > backend_type           | autovacuum launcher
> >
> > Yeah, the additional "autovacuum" is not noisy at all even in that
> > context. Some other messages are prefixed with "autovacuum".
> >
> >    "could not fork autovacuum worker process: %m"
> >    "autovacuum worker started without a worker entry"
> >
> > By a quick look all occurances of "laucher" are prefixed with
> > "autovacuum" or "logical replcaion", which seems fine.
> >
> > As a related topic, autovacuum.c has another use of bare "worker"s.
> >
> > >       tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
> > >                                                                  "Start worker tmp cxt",
> > >                                                                  ALLOCSET_DEFAULT_SIZES);
> >
> > >       AutovacMemCxt = AllocSetContextCreate(TopMemoryContext,
> > >                                                                                 "AV worker",
> > >                                                                                 ALLOCSET_DEFAULT_SIZES);
> >
> > I'm not sure the former needs to be fixed, but the latter is actually
> > visible to users via pg_log_backend_memory_contexts().
> >
> > LOG:  level: 1; AV worker: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
>
> Good catch. I've seen the use of "AV" in some of the mem context
> names, why that? Let's be specific and say "Autovacuum". Attached
> patch does that. Please review it.

+1. The patch looks good to me too.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Thu, Oct 28, 2021 at 12:41 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Oct 28, 2021 at 11:44 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Oct 28, 2021 at 7:11 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Wed, 27 Oct 2021 14:26:11 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
> > > > On Wed, Oct 27, 2021 at 07:05:10PM +0000, Bossart, Nathan wrote:
> > > > > My vote is to just change it to
> > > > >
> > > > >         ereport(WARNING,
> > > > >                 (errmsg("autovacuum worker took too long to start; canceled")));
> > > > >
> > > > > and call it a day.  If we wanted to add errdetail(), I think we should
> > > > > make sure it is providing useful context, but I'm not sure what that
> > > > > might look like.
> > > >
> > > > I think that's fine.
> > >
> > > +1
> >
> > Done.
> >
> > > > Note that the backend_type is illuminating for those who use CSV logs, or use
> > > > P13+ and log_line_prefix += %b (see 70a7b4776).
> > > >
> > > > session_line           | 1
> > > > error_severity         | WARNING
> > > > message                | worker took too long to start; canceled
> > > > backend_type           | autovacuum launcher
> > >
> > > Yeah, the additional "autovacuum" is not noisy at all even in that
> > > context. Some other messages are prefixed with "autovacuum".
> > >
> > >    "could not fork autovacuum worker process: %m"
> > >    "autovacuum worker started without a worker entry"
> > >
> > > By a quick look all occurances of "laucher" are prefixed with
> > > "autovacuum" or "logical replcaion", which seems fine.
> > >
> > > As a related topic, autovacuum.c has another use of bare "worker"s.
> > >
> > > >       tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
> > > >                                                                  "Start worker tmp cxt",
> > > >                                                                  ALLOCSET_DEFAULT_SIZES);
> > >
> > > >       AutovacMemCxt = AllocSetContextCreate(TopMemoryContext,
> > > >                                                                                 "AV worker",
> > > >                                                                                 ALLOCSET_DEFAULT_SIZES);
> > >
> > > I'm not sure the former needs to be fixed, but the latter is actually
> > > visible to users via pg_log_backend_memory_contexts().
> > >
> > > LOG:  level: 1; AV worker: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
> >
> > Good catch. I've seen the use of "AV" in some of the mem context
> > names, why that? Let's be specific and say "Autovacuum". Attached
> > patch does that. Please review it.
>
> +1. The patch looks good to me too.

Thanks all for reviewing this. Here's the CF entry -
https://commitfest.postgresql.org/35/3378/

Regards,
Bharath Rupireddy.



On 10/28/21, 5:42 AM, "Bharath Rupireddy" <bharath.rupireddyforpostgres@gmail.com> wrote:
> Thanks all for reviewing this. Here's the CF entry -
> https://commitfest.postgresql.org/35/3378/

I've marked this one as ready-for-committer.

Nathan


On 2021-Oct-28, Bharath Rupireddy wrote:

> Thanks all for reviewing this. Here's the CF entry -
> https://commitfest.postgresql.org/35/3378/

Thanks, pushed.  I changed a couple of things though -- notably changed
the elog() to ereport() as suggested by Nathan early on, but never
materialized in the submitted patch.  I also changed the wording of the
context names, since the proposed ones weren't much more satisfactory
than the existing ones.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Every machine is a smoke machine if you operate it wrong enough."
https://twitter.com/libseybieda/status/1541673325781196801