Re: nested queries vs. pg_stat_activity - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: nested queries vs. pg_stat_activity
Date
Msg-id CABUevEzzwefAfJbJKvwYfJ73gp9SJd-5WAF-oNiVuY0+UjW=Qw@mail.gmail.com
Whole thread Raw
In response to nested queries vs. pg_stat_activity  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Mon, Aug 10, 2020 at 4:37 PM Robert Haas <robertmhaas@gmail.com> wrote:
Is there a way that we can show information about nested queries in
pg_stat_activity? It's often inconvenient for users when somebody's
executing a function and it doesn't seem to be finishing as quickly as
anticipated. You can't really tell where in that function things broke
down. There are a couple of possible ways to attack this problem, but
the one that I like best is to just try to advertise the query text of
all the nested queries that are in progress rather than only the
top-level query. This runs up against the problem that we have only a
fixed-length buffer with which to work, but that doesn't seem like a
huge problem: if the outer query fills the buffer, nested queries
won't be advertised. If not, the first level of nested query can be
advertised using the space remaining. If there's still space left,
this can be repeated for multiple levels of nested queries until we
run out of bytes. This requires some way of separating one query
string from the next, but that seems like a pretty solvable problem.
It also requires figuring out how this would show up in the output of
pg_stat_activity, which I'm not quite sure about. And it might have
performance issues too, for some use cases, but it could be an
optional feature, so that people who don't want to pay the cost of
updating the pg_stat_activity information more frequently do not need
to do so.

I'm curious to hear if other people agree that this is a problem, what
they think about the above ideas for improving things, and if they've
got any other suggestions.

This sounds very similar to the just-raised  problem of multiple semicolon-separated queries (see https://www.postgresql.org/message-id/030a4123-550a-9dc1-d326-3cd5c46bcc59%40amazon.com). They should definitely be considered in the same context, so we don't end up creating two incompatible ideas.

Another idea around this, which I haven't really thought through, but figured I'd throw out anyway. It doesn't have to be in pg_stat_activity, if we can access it. E.g. we could have something like "SELECT * FROM pg_querystack(<backend pid>)". In fact, I've also *often* wanted something like "SELECT * FROM pg_queryhistory(<backend pid>)" to see the last 2-3 things it did *before* reaching the current point, as a way of identifying where in an application this happened. That need has led to really ugly hacks like https://github.com/mhagander/pg_commandhistory.

This is not information you'd need all that often, so I think it'd be perfectly reasonable to say that you pay a higher price when you get it, if we can keep down the cost of keeping it updated. Of course, this reduces it down to "how can we get it". If each backend keeps this information locally, we could send it a signal to say something like "dump what you have now into shared memory over here" and read it from there -- which would be cleaner than my hack which dumps it to the log.

I'm sure I'm missing many things in that, but as a wild idea :)

--

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks
Next
From: "Drouvot, Bertrand"
Date:
Subject: Re: Add information to rm_redo_error_callback()