Re: Printing backtrace of postgres processes - Mailing list pgsql-hackers

From vignesh C
Subject Re: Printing backtrace of postgres processes
Date
Msg-id CALDaNm2eiNpx6H3ONum_XWBnF7ma1x=wiszEZak3gzmTgMT_bQ@mail.gmail.com
Whole thread Raw
In response to Re: Printing backtrace of postgres processes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Printing backtrace of postgres processes
List pgsql-hackers
On Tue, Apr 5, 2022 at 9:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 30, 2022 at 12:03 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Wed, Mar 30, 2022 at 11:53:52AM -0400, Greg Stark wrote:
> > > Sadly the cfbot is showing a patch conflict again. It's just a trivial
> > > conflict in the regression test schedule so I'm not going to update
> > > the status but it would be good to rebase it so we continue to get
> > > cfbot testing.
> >
> > Done.  No changes.
>
> + if (chk_auxiliary_proc)
> + ereport(WARNING,
> + errmsg("PID %d is not a PostgreSQL server process", pid));
> + else
> + ereport(WARNING,
> + errmsg("PID %d is not a PostgreSQL backend process", pid));
>
> This doesn't look right to me. I don't think that PostgreSQL server
> processes are one kind of thing and PostgreSQL backend processes are
> another kind of thing. I think they're two terms that are pretty
> nearly interchangeable, or maybe at best you want to argue that
> backend processes are some subset of server processes. I don't see
> this sort of thing adding any clarity.

I have changed it to "PID %d is not a PostgreSQL server process" which
is similar to the existing warning message in
pg_log_backend_memory_contexts.

> -static void
> +void
>  set_backtrace(ErrorData *edata, int num_skip)
>  {
>   StringInfoData errtrace;
> @@ -978,7 +978,18 @@ set_backtrace(ErrorData *edata, int num_skip)
>      "backtrace generation is not supported by this installation");
>  #endif
>
> - edata->backtrace = errtrace.data;
> + if (edata)
> + edata->backtrace = errtrace.data;
> + else
> + {
> + /*
> + * LOG_SERVER_ONLY is used to make sure the backtrace is never
> + * sent to client. We want to avoid messing up the other client
> + * session.
> + */
> + elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
> + pfree(errtrace.data);
> + }
>  }
>
> This looks like a grotty hack.

I have changed it so that the backtrace is set and returned to the
caller. The caller will take care of logging or setting it to the
error data context. Thoughts?

> - PGPROC    *proc = BackendPidGetProc(pid);
> -
> - /*
> - * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
> - * we reach kill(), a process for which we get a valid proc here might
> - * have terminated on its own.  There's no way to acquire a lock on an
> - * arbitrary process to prevent that. But since so far all the callers of
> - * this mechanism involve some request for ending the process anyway, that
> - * it might end on its own first is not a problem.
> - *
> - * Note that proc will also be NULL if the pid refers to an auxiliary
> - * process or the postmaster (neither of which can be signaled via
> - * pg_signal_backend()).
> - */
> - if (proc == NULL)
> - {
> - /*
> - * This is just a warning so a loop-through-resultset will not abort
> - * if one backend terminated on its own during the run.
> - */
> - ereport(WARNING,
> - (errmsg("PID %d is not a PostgreSQL backend process", pid)));
> + PGPROC  *proc;
>
> + /* Users can only signal valid backend or an auxiliary process. */
> + proc = CheckPostgresProcessId(pid, false, NULL);
> + if (!proc)
>   return SIGNAL_BACKEND_ERROR;
> - }
>
> Incidentally changing the behavior of pg_signal_backend() doesn't seem
> like a great idea. We can do that as a separate commit, after
> considering whether documentation changes are needed. But it's not
> something that should get folded into a commit on another topic.

Agreed. I have kept the logic of pg_signal_backend as it is.
pg_log_backtrace and pg_log_backend_memory_contexts use the same
functionality to check and send signal. I have added a new function
"CheckProcSendDebugSignal" which has the common code implementation
and will be called by both pg_log_backtrace and
pg_log_backend_memory_contexts.

> + /*
> + * BackendPidGetProc() and AuxiliaryPidGetProc() return NULL if the pid
> + * isn't valid; but by the time we reach kill(), a process for which we
> + * get a valid proc here might have terminated on its own.  There's no way
> + * to acquire a lock on an arbitrary process to prevent that. But since
> + * this mechanism is usually used to debug a backend or an auxiliary
> + * process running and consuming lots of memory, that it might end on its
> + * own first without logging the requested info is not a problem.
> + */
>
> This comment made a lot more sense where it used to be than it does
> where it is now. I think more work is required here than just cutting
> and pasting.

This function was called from pg_signal_backend earlier, this logic is
now moved to CheckProcSendDebugSignal which will be called only by
pg_log_backtrace and pg_log_backend_memory_contexts. This looks
appropriate in CheckProcSendDebugSignal with slight change to specify
it can consume lots of memory or a long running process.
Thoughts?

Attached v20 patch has the changes for the same.

Regards,
Vignesh

Attachment

pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: Defer selection of asynchronous subplans until the executor initialization stage
Next
From: Etsuro Fujita
Date:
Subject: Re: Defer selection of asynchronous subplans until the executor initialization stage