Re: Planning counters in pg_stat_statements (using pgss_store) - Mailing list pgsql-hackers

From Marco Slot
Subject Re: Planning counters in pg_stat_statements (using pgss_store)
Date
Msg-id CAFMSG9HJQr=H8doWJOp=wqyKbVqxMLkk_Qu2KfpmkKvS-Xn7qQ@mail.gmail.com
Whole thread Raw
In response to Re: Planning counters in pg_stat_statements (using pgss_store)  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: Planning counters in pg_stat_statements (using pgss_store)  (Julien Rouhaud <rjuju123@gmail.com>)
List pgsql-hackers
On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> There's at least the current version of IVM patchset that lacks the
> querytext.  Looking at various extensions, I see that pg_background
> and pglogical call pg_plan_query internally but shouldn't have any
> issue providing the query text.  But there's also citus extension,
> which don't keep around the query string at least when distributing
> plans, which makes sense since it's of no use and they're heavily
> modifying the original Query.  I think that citus folks opinion on the
> subject would be useful, so I'm Cc-ing Marco.

Most of the time we keep our Query * data structures in a form that
can be deparsed back into a query string by a modified copy of
ruleutils.c, so we could generate a correct query string if absolutely
necessary, though there are performance-sensitive cases where we'd
rather not have the deparsing overhead.

In case of INSERT..SELECT into a distributed table, we might call
pg_plan_query on the SELECT part (Query *) and send the output into a
DestReceiver that sends tuples into shards of the distributed table
via COPY. The fact that SELECT does not show up in pg_stat_statements
separately is generally fine because it's an implementation detail,
and it would probably be a little confusing because the user never ran
the SELECT query. Moreover, the call to pg_plan_query would already be
reflected in the planning or execution time of the top-level query, so
it would be double counted if it had its own entry.

Another case is when some of the shards turn out to be local to the
server that handles the distributed query. In that case we plan the
queries on those shards via pg_plan_query instead of deparsing and
sending the query string to a remote server. It would be less
confusing for these queries to show in pg_stat_statements, because the
queries on the shards on remote servers will show up as well. However,
this is a performance-sensitive code path where we'd rather avoid
deparsing.

In general, I'd prefer if there was no requirement to pass a correct
query string. I'm ok with passing "SELECT 'citus_internal'" or just ""
if that does not lead to issues. Passing NULL to signal that the
planner call should not be tracked separately does seem a bit cleaner.

Marco



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Next
From: Justin Pryzby
Date:
Subject: Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed atend-of-transaction