Re: Need a mentor, and a project. - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Need a mentor, and a project.
Date
Msg-id 200912120205.nBC252U25804@momjian.us
Whole thread Raw
In response to Re: Need a mentor, and a project.  (Ashish <abindra@u.washington.edu>)
Responses Re: Need a mentor, and a project.  (Robert Haas <robertmhaas@gmail.com>)
Re: Need a mentor, and a project.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Ashish wrote:
> I am thinking about starting with the following TODO item:
>
> --> Have EXPLAIN ANALYZE issue NOTICE messages when the estimated
> and actual row counts differ by a specified percentage.
>
> I picked this because it is somewhat related to query processing
> which is what I am most interested in. It also <seems> like a
> good start up project for a newbie like me. Before I start
> looking into what this would involve and start a conversation
> on designing a solution - I wanted to know what you guys think
> about this particular TODO, and it suitability to a newbie.
> Looking forward to your comments...

I even have a sample patch you can use as a start, attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/ref/explain.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/explain.sgml,v
retrieving revision 1.38
diff -c -c -r1.38 explain.sgml
*** doc/src/sgml/ref/explain.sgml    18 Sep 2006 19:54:01 -0000    1.38
--- doc/src/sgml/ref/explain.sgml    22 Dec 2006 17:09:05 -0000
***************
*** 64,72 ****
    <para>
     The <literal>ANALYZE</literal> option causes the statement to be actually executed, not only
     planned.  The total elapsed time expended within each plan node (in
!    milliseconds) and total number of rows it actually returned are added to
!    the display.  This is useful for seeing whether the planner's estimates
!    are close to reality.
    </para>

    <important>
--- 64,72 ----
    <para>
     The <literal>ANALYZE</literal> option causes the statement to be actually executed, not only
     planned.  The total elapsed time expended within each plan node (in
!    milliseconds) and total number of rows it actually returned and variance are added to
!    the display.  A sign of the variance indicates whether the estimate was too high or too low.
!    This is useful for seeing how close the planner's estimates are to reality.
    </para>

    <important>
***************
*** 222,229 ****

                                                         QUERY PLAN
   

-------------------------------------------------------------------------------------------------------------------------
!  HashAggregate  (cost=39.53..39.53 rows=1 width=8) (actual time=0.661..0.672 rows=7 loops=1)
!    ->  Index Scan using test_pkey on test  (cost=0.00..32.97 rows=1311 width=8) (actual time=0.050..0.395 rows=99
loops=1)
           Index Cond: ((id > $1) AND (id < $2))
   Total runtime: 0.851 ms
  (4 rows)
--- 222,229 ----

                                                         QUERY PLAN
   

-------------------------------------------------------------------------------------------------------------------------
!  HashAggregate  (cost=39.53..39.53 rows=1 width=8) (actual time=0.661..0.672 rows=7 var=-6.00 loops=1)
!    ->  Index Scan using test_pkey on test  (cost=0.00..32.97 rows=1311 width=8) (actual time=0.050..0.395 rows=99
var=+12.24loops=1) 
           Index Cond: ((id > $1) AND (id < $2))
   Total runtime: 0.851 ms
  (4 rows)
Index: src/backend/commands/explain.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/explain.c,v
retrieving revision 1.152
diff -c -c -r1.152 explain.c
*** src/backend/commands/explain.c    4 Oct 2006 00:29:51 -0000    1.152
--- src/backend/commands/explain.c    22 Dec 2006 17:09:09 -0000
***************
*** 57,62 ****
--- 57,63 ----
  static void show_sort_keys(Plan *sortplan, int nkeys, AttrNumber *keycols,
                 const char *qlabel,
                 StringInfo str, int indent, ExplainState *es);
+ static double ExplainVariance(double estimate, double actual);

  /*
   * ExplainQuery -
***************
*** 704,713 ****
      {
          double        nloops = planstate->instrument->nloops;

!         appendStringInfo(str, " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
                           1000.0 * planstate->instrument->startup / nloops,
                           1000.0 * planstate->instrument->total / nloops,
                           planstate->instrument->ntuples / nloops,
                           planstate->instrument->nloops);
      }
      else if (es->printAnalyze)
--- 705,716 ----
      {
          double        nloops = planstate->instrument->nloops;

!         appendStringInfo(str, " (actual time=%.3f..%.3f rows=%.0f var=%+.2f loops=%.0f)",
                           1000.0 * planstate->instrument->startup / nloops,
                           1000.0 * planstate->instrument->total / nloops,
                           planstate->instrument->ntuples / nloops,
+                          ExplainVariance(plan->plan_rows,
+                                     planstate->instrument->ntuples / nloops),
                           planstate->instrument->nloops);
      }
      else if (es->printAnalyze)
***************
*** 1205,1207 ****
--- 1208,1225 ----

      appendStringInfo(str, "\n");
  }
+
+
+ static double ExplainVariance(double estimate, double actual)
+ {
+     if (estimate == actual)
+         return 0;
+     else if (actual == 0)
+         return estimate;
+     else if (estimate == 0)
+         return -actual;
+     else if (estimate > actual)
+         return (estimate / actual) - 1;
+     else
+         return -(actual / estimate - 1);
+ }

pgsql-hackers by date:

Previous
From: Ashish
Date:
Subject: Re: Need a mentor, and a project.
Next
From: Robert Haas
Date:
Subject: Re: EXPLAIN BUFFERS