> On Wed, Jul 17, 2024 at 11:32:49AM +0200, Anthonin Bonnefoy wrote:
>> Wouldn't it be enough to call pgstat_report_query_id in ExecutorRun
>> and ProcessUtility? With those changes [1], both normal statements and
>> utility statements called through extended protocol will correctly
>> report the query_id.
>
> Interesting, and this position is really tempting. By doing so you
> would force the query ID to be set to the one from the CTAS and
> EXPLAIN, because these would be executed before the inner queries, and
> pgstat_report_query_id() with its non-force option does not overwrite
> what would be already set (aka what should be the top-level query ID).
>
> Using ExecutorRun() feels consistent with the closest thing I've
> touched in this area lately in 1d477a907e63, because that's the only
> code path that we are sure to take depending on the portal execution
> (two execution scenarios depending on how rows are retrieved, as far
> as I recall). The comment should be perhaps more consistent with the
> executor start counterpart. So I would be OK with that.. The
> location before the hook of ProcessUtility is tempting, as it would
> take care of the case of PortalRunMulti(). However.. Joining with a
> point from Sami upthread..
>
> This is still not enough in the case of where we have a holdStore, no?
> This is the case where we would do *one* ExecutorRun(), followed up by
> a scan of the tuplestore in more than one execute message. The v2
> proposed upthread, by positioning a query ID to be set in
> PortalRunSelect(), is still positioning that in two places.
Correct, I also don’t think ExecutorRun is enough. Another reason is we should also
be setting the queryId during bind, right before planning starts.
Planning could have significant impact on the server and I think we better
track the responsible queryId.
I have not tested the holdStore case. IIUC the holdStore deals with fetching a
WITH HOLD CURSOR. Why would this matter for this conversation?
> Hmm... How about being much more aggressive and just do the whole
> business in exec_execute_message(), just before we do the PortalRun()?
> I mean, that's the source of all our problems, and we know the
> statements that the portal will work on so we could go through the
> list, grab the first planned query and set the query ID based on that,
> without caring about the portal patterns we would need to think about.
Doing the work in exec_execute_message makes sense, although maybe
setting the queryId after pgstat_report_activity is better because it occurs earlier.
Also, we should do the same for exec_bind_message and set the queryId
right after pgstat_report_activity in this function as well.
We do have to account for the queryId changing after cache revalidation, so
we should still set the queryId inside GetCachedPlan in the case the query
underwent re-analysis. This means there is a chance that a queryId set at
the start of the exec_bind_message may be different by the time we complete
the function, in the case the query revalidation results in a different queryId.
See the attached v3.
Regards,
Sami