Re: [PATCHES] WIP: executor_hook for pg_stat_statements - Mailing list pgsql-hackers

From ITAGAKI Takahiro
Subject Re: [PATCHES] WIP: executor_hook for pg_stat_statements
Date
Msg-id 20080715150758.3016.52131E4D@oss.ntt.co.jp
Whole thread Raw
In response to Re: [PATCHES] WIP: executor_hook for pg_stat_statements  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [PATCHES] WIP: executor_hook for pg_stat_statements
Re: [PATCHES] WIP: executor_hook for pg_stat_statements
List pgsql-hackers
Simon Riggs <simon@2ndquadrant.com> wrote:

> > I wonder whether we ought to change things so that the real query
> > source text is available at the executor level.  Since we are (at least
> > usually) storing the query text in cached plans, I think this might just
> > require some API refactoring, not extra space and copying.  It would
> > amount to a permanent decision that we're willing to pay the overhead
> > of keeping the source text around, though.
> 
> I think its a reasonable decision to do that. Knowing what you're doing
> while you do it is pretty important.

I worked around to it, I found I can use ActivePortal->sourceText in some
situations. But there are still some problems:
- SQL functions:       They don't modify ActivePortal->sourceText, but we could get the       source from
SQLFunctionCache->src.If it is required, we might       need to add a new field in QueryDesc and copy the src to the
field.-Multiple queries:       Query text is not divided into each query.       Only the original combined text is
available.-RULEs:       There are similar issues with multiple queries.       Also, they don't have original query
texts.

The same can be said for planner_hook(). Only available query text is
debug_query_string in it, and it is the top-level query. We cannot use
the actual SQL text which the Query object comes from. The treu query
text might be SQL functions used in the top-level query, a part of multiple
queries, or another query rewritten by RULE.

For these reasons, now I'm thinking to collect only top-query level
statistics, not per-planner+executor level statistics. i.e, when we
receive a multiple query "SELECT 1; SELECT 2;", pg_stat_statements uses
the original combined text as a key. Comsumed resource associated with
the key is sum of resources used in both "SELECT 1" and "SELECT 2".


> > Also, after looking at the patch more closely, was there a good reason
> > for making the hook intercept ExecutePlan rather than ExecutorRun?
> 
> That raises the question of whether we should have ExecutorStart() and
> ExecutorEnd() hooks as well, to round things off.

Yeah, and also ExecutorRewind() hook. There are 4 interface functions in
executor. My addin only needs Run hook because it doesn't modify the actual
behavior of executor. However, when someone hope to replace the behavior,
they need all of the hooks. (Is multi-threaded executor project still alive?)

How about adding new Executor class
and ExecutorStart() returns an instance of Executor?
   typedef struct Executor   {       ExecutorRunFunc     run;       ExecutorEndFunc     end;       ExecutorRewindFunc
rewind;      /* there might be private fields. */   } Executor;      Executor *e = ExecutorStart_hook(...);
ExecutorRun(e,...) => { e->run(e, ...); }   ExecutorEnd(e, ...) => { e->end(e, ...); }
 

It could be make APIs cleaner because QueryDesc has 3 fields only for
executor (tupDesc, estate, planstate). We can move those fields to
Executor's private fields. Is this modification acceptable?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: PATCH: CITEXT 2.0 v3
Next
From: "Heikki Linnakangas"
Date:
Subject: Re: [PATCHES] VACUUM Improvements - WIP Patch