Re: On-demand running query plans using auto_explain and signals - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Re: On-demand running query plans using auto_explain and signals
Date
Msg-id CACACo5RYzTsFocaZNB_TUvEw=xcP1tyfEzYwE_9PZ-xoLqAx4Q@mail.gmail.com
Whole thread Raw
In response to Re: On-demand running query plans using auto_explain and signals  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Sep 17, 2015 at 2:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 16, 2015 at 10:31 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> I've also decided we really ought to suppress any possible ERROR level
> messages generated during the course of processing the status request in
> order not to prevent the originally running transaction to complete.  The
> errors so caught are just logged using LOG level and ignored in this new
> version of the patch.

This plan has almost zero chance of surviving committer-level
scrutiny, and if it somehow does, some other committer will scream
bloody murder as soon as they notice it.

It's not safe to just catch an error and continue on executing.  You
have to abort the (sub)transaction once an error happens.

Hm... is this really different from WHEN EXCEPTION clause in plpgsql?

Of course, this gets at a pretty crucial design question for this
patch, which is whether it's OK for one process to interrogate another
process's state, and what the consequences of that might be.  What
permissions are needed?

I think superuser or same user is pretty reasonable requirement.  Can be limited to superuser only if there are some specific concerns.

Can you DOS the guy running a query by
pinging him for status at a very high rate?

Probably, but if you've got enough permissions to do that pg_cancel/terminate_backend() would be easier.

The other question here is whether it's really safe to do this.
ProcessInterrupts() gets called at lots of places in the code, and we
may freely add more in the future.  How are you going to prove that
ProcessCmdStatusInfoRequest() is safe to call in all of those places?
How do you know that some data structure you find by chasing down from
ActivePortal or current_query_stack doesn't have a NULL pointer
someplace that you don't expect, because the code that initializes
that pointer hasn't run yet?

I'd like to see this made to work and be safe, but I think proving
that it's truly robust in all cases is going to be hard.

Yes, this is a perfectly good point.  For the purpose of obtaining the explain plans, I believe anything we capture in ExecutorRun ought to be safe to run Explain on: the plannedstmt is there and isn't going to change.

In a more general case, if we seek to extend this approach, then yes one should be very careful in what is allowed to run during ProcessInterrupts().

What we could do instead to be extra-safe, is make the running backend prepare the information for other backends to obtain from the shared memory.  Explain plans can be easily captured at the ExecutorRun hook.  The obvious downside is that you can no longer choose the explain format, or obtain partially collected instrumentation data.

--
Alex

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: On-demand running query plans using auto_explain and signals
Next
From: Jeff Janes
Date:
Subject: Re: creating extension including dependencies