Re: BUG #17552: pg_stat_statements tracks internal FK check queries when COPY used to load data - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #17552: pg_stat_statements tracks internal FK check queries when COPY used to load data
Date
Msg-id 2869953.1698873621@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #17552: pg_stat_statements tracks internal FK check queries when COPY used to load data  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re:BUG #17552: pg_stat_statements tracks internal FK check queries when COPY used to load data  (Sergei Kornilov <sk@zsrv.org>)
List pgsql-bugs
I wrote:
> However, I then started to wonder whether it's really a good idea
> that the code treats PREPARE and EXECUTE alike for this purpose.
> In the case of EXECUTE, the idea is that we'll still be at top
> level when we reach the executor hooks, and they will do the right
> thing and then increment exec_nested_level before any lower-level
> statement can be reached.  But how does that concept apply to
> PREPARE, which won't reach the executor?  ISTM the net result
> is that if any subsidiary statements are reached during PREPARE
> (perhaps via const-simplification of some SQL or PL function),
> we will erroneously treat them as top-level.

Poking at that, I found that (a) PREPARE doesn't run the planner,
so probably nested statements are impossible, and (b) if we
remove the check so that it bumps the nesting level, then some
existing regression test outputs change.  PREPARE does run
parse analysis of the contained statement, and it seems that
that's sensitive to the is-top-level state.  So we do need to
keep the exclusion for PREPARE.  However, I'm still fairly down
on PGSS_HANDLED_UTILITY, because I think the rationale for
treating PREPARE and EXECUTE specially is different in each
place where that's being used, so tying them together is more
likely to cause future bugs than prevent bugs.

So v3-0001 attached revises the patch per those ideas.  It's only
cosmetically different from before, but I think the explanatory
comments are better.

Meanwhile, I realized that we have a second set of bugs.  As I
said above, it's possible to reach nested statements during
planning, as a consequence of const-simplification of an immutable
or stable function.  pg_stat_statements generally does the wrong
thing here, incorrectly treating such statements as top-level if
we weren't already nested.  This needs to be fixed by including
plan_nest_level as a reason to consider execution to be nested too.
What's more, testing this showed that pgss_planner has the same bug
as pgss_ProcessUtility: it needs to bump the nesting level even
if it chooses not to track planning time.

v3-0002 fixes this in a minimally invasive way by replacing each
check of exec_nested_level by plan_nested_level + exec_nested_level.
That's just for review/testing though.  What I think we actually
ought to do is fold plan_nested_level and exec_nested_level into
a single variable nested_level (or nesting_level would be better
English).  There's no reason to keep them separate; it just adds
cycles, complexity, and more risk of mistakes.

Comments?

            regards, tom lane

diff --git a/contrib/pg_stat_statements/expected/level_tracking.out
b/contrib/pg_stat_statements/expected/level_tracking.out
index d924c87b41..e7b29e9c99 100644
--- a/contrib/pg_stat_statements/expected/level_tracking.out
+++ b/contrib/pg_stat_statements/expected/level_tracking.out
@@ -67,6 +67,61 @@ SELECT toplevel, calls, query FROM pg_stat_statements
  t        |     1 | SET pg_stat_statements.track = 'all'
 (7 rows)

+-- DO block - top-level tracking without utility.
+SET pg_stat_statements.track = 'top';
+SET pg_stat_statements.track_utility = FALSE;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+DELETE FROM stats_track_tab;
+DO $$
+BEGIN
+  DELETE FROM stats_track_tab;
+END; $$;
+DO LANGUAGE plpgsql $$
+BEGIN
+  -- this is a SELECT
+  PERFORM 'hello world'::TEXT;
+END; $$;
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C", toplevel;
+ toplevel | calls |               query
+----------+-------+-----------------------------------
+ t        |     1 | DELETE FROM stats_track_tab
+ t        |     1 | SELECT pg_stat_statements_reset()
+(2 rows)
+
+-- DO block - all-level tracking without utility.
+SET pg_stat_statements.track = 'all';
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+DELETE FROM stats_track_tab;
+DO $$
+BEGIN
+  DELETE FROM stats_track_tab;
+END; $$;
+DO LANGUAGE plpgsql $$
+BEGIN
+  -- this is a SELECT
+  PERFORM 'hello world'::TEXT;
+END; $$;
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C", toplevel;
+ toplevel | calls |               query
+----------+-------+-----------------------------------
+ f        |     1 | DELETE FROM stats_track_tab
+ t        |     1 | DELETE FROM stats_track_tab
+ f        |     1 | SELECT $1::TEXT
+ t        |     1 | SELECT pg_stat_statements_reset()
+(4 rows)
+
 -- PL/pgSQL function - top-level tracking.
 SET pg_stat_statements.track = 'top';
 SET pg_stat_statements.track_utility = FALSE;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 767b6ceb10..da5ef7d0b7 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -99,13 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT    5    /* free this % of entries at once */
 #define IS_STICKY(c)    ((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)

-/*
- * Utility statements that pgss_ProcessUtility and pgss_post_parse_analyze
- * ignores.
- */
-#define PGSS_HANDLED_UTILITY(n)        (!IsA(n, ExecuteStmt) && \
-                                    !IsA(n, PrepareStmt))
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -840,12 +833,14 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
         return;

     /*
-     * Clear queryId for prepared statements related utility, as those will
-     * inherit from the underlying statement's one.
+     * If it's EXECUTE, clear the queryId so that stats will accumulate for
+     * the underlying PREPARE.  But don't do this if we're not tracking
+     * utility statements, to avoid messing up another extension that might be
+     * tracking them.
      */
     if (query->utilityStmt)
     {
-        if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt))
+        if (pgss_track_utility && IsA(query->utilityStmt, ExecuteStmt))
         {
             query->queryId = UINT64CONST(0);
             return;
@@ -1097,6 +1092,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
     uint64        saved_queryId = pstmt->queryId;
     int            saved_stmt_location = pstmt->stmt_location;
     int            saved_stmt_len = pstmt->stmt_len;
+    bool        enabled = pgss_track_utility && pgss_enabled(exec_nested_level);

     /*
      * Force utility statements to get queryId zero.  We do this even in cases
@@ -1112,7 +1108,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
      * that user configured another extension to handle utility statements
      * only.
      */
-    if (pgss_enabled(exec_nested_level) && pgss_track_utility)
+    if (enabled)
         pstmt->queryId = UINT64CONST(0);

     /*
@@ -1124,11 +1120,13 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
      * hash table entry for the PREPARE (with hash calculated from the query
      * string), and then a different one with the same query string (but hash
      * calculated from the query tree) would be used to accumulate costs of
-     * ensuing EXECUTEs.  This would be confusing, and inconsistent with other
-     * cases where planning time is not included at all.
+     * ensuing EXECUTEs.  This would be confusing.  Since PREPARE doesn't
+     * actually run the planner (only parse+rewrite), its costs are generally
+     * pretty negligible and it seems okay to just ignore it.
      */
-    if (pgss_track_utility && pgss_enabled(exec_nested_level) &&
-        PGSS_HANDLED_UTILITY(parsetree))
+    if (enabled &&
+        !IsA(parsetree, ExecuteStmt) &&
+        !IsA(parsetree, PrepareStmt))
     {
         instr_time    start;
         instr_time    duration;
@@ -1206,14 +1204,41 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
     }
     else
     {
-        if (prev_ProcessUtility)
-            prev_ProcessUtility(pstmt, queryString, readOnlyTree,
-                                context, params, queryEnv,
-                                dest, qc);
-        else
-            standard_ProcessUtility(pstmt, queryString, readOnlyTree,
+        /*
+         * Even though we're not tracking execution time for this statement,
+         * we must still increment the nesting level, to ensure that functions
+         * evaluated within it are not seen as top-level calls.  But don't do
+         * so for EXECUTE; that way, when control reaches pgss_planner or
+         * pgss_ExecutorStart, we will treat the costs as top-level if
+         * appropriate.  Likewise, don't bump for PREPARE, so that parse
+         * analysis will treat the statement as top-level if appropriate.
+         *
+         * To be absolutely certain we don't mess up the nesting level,
+         * evaluate the bump_level condition just once.
+         */
+        bool        bump_level =
+            !IsA(parsetree, ExecuteStmt) &&
+            !IsA(parsetree, PrepareStmt);
+
+        if (bump_level)
+            exec_nested_level++;
+        PG_TRY();
+        {
+            if (prev_ProcessUtility)
+                prev_ProcessUtility(pstmt, queryString, readOnlyTree,
                                     context, params, queryEnv,
                                     dest, qc);
+            else
+                standard_ProcessUtility(pstmt, queryString, readOnlyTree,
+                                        context, params, queryEnv,
+                                        dest, qc);
+        }
+        PG_FINALLY();
+        {
+            if (bump_level)
+                exec_nested_level--;
+        }
+        PG_END_TRY();
     }
 }

diff --git a/contrib/pg_stat_statements/sql/level_tracking.sql b/contrib/pg_stat_statements/sql/level_tracking.sql
index 0c20b8ce69..eab881a43b 100644
--- a/contrib/pg_stat_statements/sql/level_tracking.sql
+++ b/contrib/pg_stat_statements/sql/level_tracking.sql
@@ -33,6 +33,39 @@ END; $$;
 SELECT toplevel, calls, query FROM pg_stat_statements
   ORDER BY query COLLATE "C", toplevel;

+-- DO block - top-level tracking without utility.
+SET pg_stat_statements.track = 'top';
+SET pg_stat_statements.track_utility = FALSE;
+SELECT pg_stat_statements_reset();
+DELETE FROM stats_track_tab;
+DO $$
+BEGIN
+  DELETE FROM stats_track_tab;
+END; $$;
+DO LANGUAGE plpgsql $$
+BEGIN
+  -- this is a SELECT
+  PERFORM 'hello world'::TEXT;
+END; $$;
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C", toplevel;
+
+-- DO block - all-level tracking without utility.
+SET pg_stat_statements.track = 'all';
+SELECT pg_stat_statements_reset();
+DELETE FROM stats_track_tab;
+DO $$
+BEGIN
+  DELETE FROM stats_track_tab;
+END; $$;
+DO LANGUAGE plpgsql $$
+BEGIN
+  -- this is a SELECT
+  PERFORM 'hello world'::TEXT;
+END; $$;
+SELECT toplevel, calls, query FROM pg_stat_statements
+  ORDER BY query COLLATE "C", toplevel;
+
 -- PL/pgSQL function - top-level tracking.
 SET pg_stat_statements.track = 'top';
 SET pg_stat_statements.track_utility = FALSE;
diff --git a/contrib/pg_stat_statements/expected/level_tracking.out
b/contrib/pg_stat_statements/expected/level_tracking.out
index e7b29e9c99..0b94c71c9c 100644
--- a/contrib/pg_stat_statements/expected/level_tracking.out
+++ b/contrib/pg_stat_statements/expected/level_tracking.out
@@ -173,6 +173,31 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
      1 |    1 | SELECT pg_stat_statements_reset()
 (3 rows)

+-- immutable SQL function --- can be executed at plan time
+CREATE FUNCTION PLUS_THREE(i INTEGER) RETURNS INTEGER AS
+$$ SELECT i + 3 LIMIT 1 $$ IMMUTABLE LANGUAGE SQL;
+SELECT PLUS_THREE(8);
+ plus_three
+------------
+         11
+(1 row)
+
+SELECT PLUS_THREE(10);
+ plus_three
+------------
+         13
+(1 row)
+
+SELECT toplevel, calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+ toplevel | calls | rows |                                    query
+----------+-------+------+------------------------------------------------------------------------------
+ t        |     2 |    2 | SELECT PLUS_ONE($1)
+ t        |     2 |    2 | SELECT PLUS_THREE($1)
+ t        |     2 |    2 | SELECT PLUS_TWO($1)
+ t        |     1 |    3 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
+ t        |     1 |    1 | SELECT pg_stat_statements_reset()
+(5 rows)
+
 -- PL/pgSQL function - all-level tracking.
 SET pg_stat_statements.track = 'all';
 SELECT pg_stat_statements_reset();
@@ -184,6 +209,7 @@ SELECT pg_stat_statements_reset();
 -- we drop and recreate the functions to avoid any caching funnies
 DROP FUNCTION PLUS_ONE(INTEGER);
 DROP FUNCTION PLUS_TWO(INTEGER);
+DROP FUNCTION PLUS_THREE(INTEGER);
 -- PL/pgSQL function
 CREATE FUNCTION PLUS_TWO(i INTEGER) RETURNS INTEGER AS $$
 DECLARE
@@ -229,7 +255,34 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
      1 |    1 | SELECT pg_stat_statements_reset()
 (5 rows)

-DROP FUNCTION PLUS_ONE(INTEGER);
+-- immutable SQL function --- can be executed at plan time
+CREATE FUNCTION PLUS_THREE(i INTEGER) RETURNS INTEGER AS
+$$ SELECT i + 3 LIMIT 1 $$ IMMUTABLE LANGUAGE SQL;
+SELECT PLUS_THREE(8);
+ plus_three
+------------
+         11
+(1 row)
+
+SELECT PLUS_THREE(10);
+ plus_three
+------------
+         13
+(1 row)
+
+SELECT toplevel, calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+ toplevel | calls | rows |                                    query
+----------+-------+------+------------------------------------------------------------------------------
+ f        |     2 |    2 | SELECT (i + $2 + $3)::INTEGER
+ f        |     2 |    2 | SELECT (i + $2)::INTEGER LIMIT $3
+ t        |     2 |    2 | SELECT PLUS_ONE($1)
+ t        |     2 |    2 | SELECT PLUS_THREE($1)
+ t        |     2 |    2 | SELECT PLUS_TWO($1)
+ t        |     1 |    5 | SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"
+ f        |     2 |    2 | SELECT i + $2 LIMIT $3
+ t        |     1 |    1 | SELECT pg_stat_statements_reset()
+(8 rows)
+
 --
 -- pg_stat_statements.track = none
 --
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index da5ef7d0b7..f1cb02e463 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -829,7 +829,7 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
         prev_post_parse_analyze_hook(pstate, query, jstate);

     /* Safety check... */
-    if (!pgss || !pgss_hash || !pgss_enabled(exec_nested_level))
+    if (!pgss || !pgss_hash || !pgss_enabled(plan_nested_level + exec_nested_level))
         return;

     /*
@@ -954,12 +954,26 @@ pgss_planner(Query *parse,
     }
     else
     {
-        if (prev_planner_hook)
-            result = prev_planner_hook(parse, query_string, cursorOptions,
-                                       boundParams);
-        else
-            result = standard_planner(parse, query_string, cursorOptions,
-                                      boundParams);
+        /*
+         * Even though we're not tracking plan time for this statement, we
+         * must still increment the nesting level, to ensure that functions
+         * evaluated during planning are not seen as top-level calls.
+         */
+        plan_nested_level++;
+        PG_TRY();
+        {
+            if (prev_planner_hook)
+                result = prev_planner_hook(parse, query_string, cursorOptions,
+                                           boundParams);
+            else
+                result = standard_planner(parse, query_string, cursorOptions,
+                                          boundParams);
+        }
+        PG_FINALLY();
+        {
+            plan_nested_level--;
+        }
+        PG_END_TRY();
     }

     return result;
@@ -981,7 +995,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
      * counting of optimizable statements that are directly contained in
      * utility statements.
      */
-    if (pgss_enabled(exec_nested_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
+    if (pgss_enabled(plan_nested_level + exec_nested_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
     {
         /*
          * Set up to track total elapsed time in ExecutorRun.  Make sure the
@@ -1051,7 +1065,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
     uint64        queryId = queryDesc->plannedstmt->queryId;

     if (queryId != UINT64CONST(0) && queryDesc->totaltime &&
-        pgss_enabled(exec_nested_level))
+        pgss_enabled(plan_nested_level + exec_nested_level))
     {
         /*
          * Make sure stats accumulation is done.  (Note: it's okay if several
@@ -1092,7 +1106,7 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
     uint64        saved_queryId = pstmt->queryId;
     int            saved_stmt_location = pstmt->stmt_location;
     int            saved_stmt_len = pstmt->stmt_len;
-    bool        enabled = pgss_track_utility && pgss_enabled(exec_nested_level);
+    bool        enabled = pgss_track_utility && pgss_enabled(plan_nested_level + exec_nested_level);

     /*
      * Force utility statements to get queryId zero.  We do this even in cases
@@ -1296,7 +1310,7 @@ pgss_store(const char *query, uint64 queryId,
     key.userid = GetUserId();
     key.dbid = MyDatabaseId;
     key.queryid = queryId;
-    key.toplevel = (exec_nested_level == 0);
+    key.toplevel = (plan_nested_level + exec_nested_level == 0);

     /* Lookup the hash table entry with shared lock. */
     LWLockAcquire(pgss->lock, LW_SHARED);
diff --git a/contrib/pg_stat_statements/sql/level_tracking.sql b/contrib/pg_stat_statements/sql/level_tracking.sql
index eab881a43b..dcd0b04358 100644
--- a/contrib/pg_stat_statements/sql/level_tracking.sql
+++ b/contrib/pg_stat_statements/sql/level_tracking.sql
@@ -90,6 +90,15 @@ SELECT PLUS_ONE(10);

 SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";

+-- immutable SQL function --- can be executed at plan time
+CREATE FUNCTION PLUS_THREE(i INTEGER) RETURNS INTEGER AS
+$$ SELECT i + 3 LIMIT 1 $$ IMMUTABLE LANGUAGE SQL;
+
+SELECT PLUS_THREE(8);
+SELECT PLUS_THREE(10);
+
+SELECT toplevel, calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+
 -- PL/pgSQL function - all-level tracking.
 SET pg_stat_statements.track = 'all';
 SELECT pg_stat_statements_reset();
@@ -97,6 +106,7 @@ SELECT pg_stat_statements_reset();
 -- we drop and recreate the functions to avoid any caching funnies
 DROP FUNCTION PLUS_ONE(INTEGER);
 DROP FUNCTION PLUS_TWO(INTEGER);
+DROP FUNCTION PLUS_THREE(INTEGER);

 -- PL/pgSQL function
 CREATE FUNCTION PLUS_TWO(i INTEGER) RETURNS INTEGER AS $$
@@ -118,7 +128,15 @@ SELECT PLUS_ONE(3);
 SELECT PLUS_ONE(1);

 SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
-DROP FUNCTION PLUS_ONE(INTEGER);
+
+-- immutable SQL function --- can be executed at plan time
+CREATE FUNCTION PLUS_THREE(i INTEGER) RETURNS INTEGER AS
+$$ SELECT i + 3 LIMIT 1 $$ IMMUTABLE LANGUAGE SQL;
+
+SELECT PLUS_THREE(8);
+SELECT PLUS_THREE(10);
+
+SELECT toplevel, calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";

 --
 -- pg_stat_statements.track = none

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #17552: pg_stat_statements tracks internal FK check queries when COPY used to load data
Next
From: Alexander Korotkov
Date:
Subject: Re: BUG #18170: Unexpected error: no relation entry for relid 3