Re: Printing backtrace of postgres processes - Mailing list pgsql-hackers

From vignesh C
Subject Re: Printing backtrace of postgres processes
Date
Msg-id CALDaNm3aBt1hw+Q=achUxfQgyu3b6DmdxewvMVx5DEikNayAFA@mail.gmail.com
Whole thread Raw
In response to Re: Printing backtrace of postgres processes  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Thanks Bharath for your comments.

On Mon, Feb 1, 2021 at 6:14 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Jan 29, 2021 at 7:10 PM vignesh C <vignesh21@gmail.com> wrote:
> > > 4) How about following
> > > +                     errmsg("must be a superuser to print backtrace
> > > of backend process")));
> > > instead of
> > > +                     errmsg("must be a superuser to print backtrace
> > > of superuser query process")));
> > >
> >
> > Here the message should include superuser, we cannot remove it. Non
> > super user can log non super user provided if user has permissions for
> > it.
> >
> > > 5) How about following
> > >                      errmsg("must be a member of the role whose backed
> > > process's backtrace is being printed or member of
> > > pg_signal_backend")));
> > > instead of
> > > +                     errmsg("must be a member of the role whose
> > > backtrace is being logged or member of pg_signal_backend")));
> > >
> >
> > Modified it.
>
> Maybe I'm confused here to understand the difference between
> SIGNAL_BACKEND_NOSUPERUSER and SIGNAL_BACKEND_NOPERMISSION macros and
> corresponding error messages. Some clarification/use case to know in
> which scenarios we hit those error messages might help me. Did we try
> to add test cases that show up these error messages for
> pg_print_backtrace? If not, can we add?

I have tested this manually:

I have tested it manually, Here is the test I did:
Create 2 users:
create user test password 'test@123';
create user test1 password 'test@123';

Test1: Test print backtrace of a different user's session:
./psql -d postgres -U test
psql (14devel)
Type "help" for help.
postgres=> select pg_backend_pid();
 pg_backend_pid
----------------
          30142
(1 row)
------------------------------------------
./psql -d postgres -U test1
psql (14devel)
Type "help" for help.
postgres=> select pg_print_backtrace(30142);
ERROR:  must be a member of the role whose backtrace is being logged
or member of pg_signal_backend

The above will be successful after:
grant pg_signal_backend to test1;

Test1: Test for non super user trying to print backtrace of a super
user's session:
./psql -d postgres
psql (14devel)
Type "help" for help.
postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
          30211
(1 row)

./psql -d postgres -U test1
psql (14devel)
Type "help" for help.
postgres=> select pg_print_backtrace(30211);
ERROR:  must be a superuser to print backtrace of superuser process
I have not added any tests for this as we required 2 active sessions
and I did not see any existing framework for this.
This test should help in relating the behavior.

> > Attached v5 patch has the fixes for the same.
> > Thoughts?
>
> Thanks. Here are some comments on v5 patch:
>
> 1) typo - it's "process" not "porcess" +    a backend porcess. For example:
>

Modified.

> 2) select * from pg_print_backtrace(NULL);
> postgres=# select proname, proisstrict from pg_proc where proname =
> 'pg_print_backtrace';
>       proname       | proisstrict
> --------------------+-------------
>  pg_print_backtrace | t
>
>  See the documentation:
>  "proisstrict bool
>
> Function returns null if any call argument is null. In that case the
> function won't actually be called at all. Functions that are not
> “strict” must be prepared to handle null inputs."
> So below PG_ARGISNUL check is not needed, you can remove that, because
> pg_print_backtrace will not get called with null input.
> int            bt_pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
>

Modified.

> 3) Can we just set results = true instead of PG_RETURN_BOOL(true); so
> that it will be returned from PG_RETURN_BOOL(result); just for
> consistency?
>             if (!SendProcSignal(bt_pid, PROCSIG_BACKTRACE_PRINT,
> InvalidBackendId))
>                 PG_RETURN_BOOL(true);
>             else
>                 ereport(WARNING,
>                         (errmsg("failed to send signal to postmaster: %m")));
>         }
>
>         PG_RETURN_BOOL(result);
>

I felt existing is better as it will reduce one instruction of setting
first and then returning. There are only 2 places from where we
return. I felt we could directly return true or false.

> 4) Below is what happens if I request for a backtrace of the
> postmaster process? 1388210 is pid of postmaster.
> postgres=# select * from pg_print_backtrace(1388210);
> WARNING:  PID 1388210 is not a PostgreSQL server process
>  pg_print_backtrace
> --------------------
>  f
>
> Does it make sense to have a postmaster's backtrace? If yes, can we
> have something like below in sigusr1_handler()?
>     if (CheckPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT))
>     {
>         LogBackTrace();
>     }
>

We had a discussion about this in [1]  earlier and felt including this
is not very useful.

> 5) Can we have PROCSIG_PRINT_BACKTRACE instead of
> PROCSIG_BACKTRACE_PRINT, just for readability and consistency with
> function name?
>

PROCSIG_PRINT_BACKTRACE is better, I have modified it.

> 6) I think it's not the postmaster that prints backtrace of the
> requested backend and we don't send SIGUSR1 with
> PMSIGNAL_BACKTRACE_EMIT to postmaster, I think it's the backends, upon
> receiving SIGUSR1 with PMSIGNAL_BACKTRACE_EMIT signal, log their own
> backtrace. Am I missing anything here? If I'm correct, we need to
> change the below description and other places wherever we refer to
> this description.
>
> The idea here is to implement & expose pg_print_backtrace function, internally
> what this function does is, the connected backend will send SIGUSR1 signal by
> setting PMSIGNAL_BACKTRACE_EMIT to the postmaster process. Once the process
> receives this signal it will log the backtrace of the process.
>

Modified.

> 7) Can we get the parallel worker's backtrace? IIUC it's possible.
>

Yes we can get the parallel worker's backtrace. As this design is
driven based on CHECK_FOR_INTERRUPTS, it will be printed when
CHECK_FOR_INTERRUPTS is called.

> 8) What happens if a user runs pg_print_backtrace() on Windows or
> MacOS or some other platform? Do you want to say something about other
> platforms where gdb/addr2line doesn't exist?
> +</programlisting>
> +    You can get the file name and line number by using gdb/addr2line in
> +    linux platforms, as a prerequisite users must ensure gdb/addr2line is
> +    already installed:
> +<programlisting>
>

I have added the following:
This feature will be available, if the installer was generated on a
platform which had backtrace capturing capability. If not available it
will return false by throwing the following warning "WARNING:
backtrace generation is not supported by this installation"

> 9) Can't we reuse set_backtrace with just adding a flag to
> set_backtrace(ErrorData *edata, int num_skip, bool
> is_print_backtrace_function), making it a non-static function and call
> set_backtrace(NULL, 0, true)?
>
> void
> set_backtrace(ErrorData *edata, int num_skip, bool is_print_backtrace_function)
> {
>     StringInfoData errtrace;
> -------
> -------
>     if (is_print_backtrace_function)
>         elog(LOG_SERVER_ONLY, "current backtrace:%s", errtrace.data);
>     else
>         edata->backtrace = errtrace.data;
> }
>
> I think it will be good if we do this, because we can avoid duplicate
> code in set_backtrace and LogBackTrace.
>

Yes it is better, I have modified it to use the same function.

> 10) I think it's "pg_signal_backend" instead of "pg_print_backtrace"?
> +        backtrace is being printed or the calling role has been granted
> +        <literal>pg_print_backtrace</literal>, however only superusers can
>

Modified.

> 11) In the documentation added, isn't it good if we talk a bit about
> in which scenarios users can use this function? For instance,
> something like "Use pg_print_backtrace to  know exactly where it's
> currently waiting when a backend process hangs."?
>

Modified to include more details.

[1] - https:://www.postgresql.org/message-id/1778088.1606026308%40sss.pgh.pa.us
Attached v6 patch with the fixes.
Thoughts?

Regards,
Vignesh

Attachment

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Can we have a new SQL callable function to get Postmaster PID?
Next
From: David Rowley
Date:
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans