Thread: Proposal: new GUC paramter

Proposal: new GUC paramter

From
Tatsuo Ishii
Date:
I see following in src/backend/optimizer/plan/planner.c
if (child && IsA(child, FromExpr)){
    /*     * Yes, so do we want to merge it into parent?    Always do     * so if child has just one element (since
thatdoesn't     * make the parent's list any longer).  Otherwise we have     * to be careful about the increase in
planningtime     * caused by combining the two join search spaces into     * one.  Our heuristic is to merge if the
mergewill     * produce a join list no longer than GEQO_RELS/2.     * (Perhaps need an additional user parameter?)
*/

This is really annoying since:

o these code fragments actually controls the optimization efforts for subqueries and views, not related to GEQO at all.
Sousing GEQO parameters for this kind of purpose seems abuse for me.
 

o Even if geqo = false in postgresql.con, the code looks into the GEQO value. This is really confusing for users.

So I propose a new GUC parameter called "subquery_merge_threshold"
solely for this purpose.

Comments?
--
Tatsuo Ishii


Re: Proposal: new GUC paramter

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> This is really annoying since:
> o these code fragments actually controls the optimization efforts for
>   subqueries and views, not related to GEQO at all. So using GEQO
>   parameters for this kind of purpose seems abuse for me.

But GEQO_RELS is directly related to the maximum number of FROM-clause
entries that we want to try to handle by exhaustive search.  So I think
it's not completely unreasonable to use it for this additional purpose.

Still, if you want to do the work to create another GUC parameter,
I won't object.
        regards, tom lane


Re: Proposal: new GUC paramter

From
Bruce Momjian
Date:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > This is really annoying since:
> > o these code fragments actually controls the optimization efforts for
> >   subqueries and views, not related to GEQO at all. So using GEQO
> >   parameters for this kind of purpose seems abuse for me.
>
> But GEQO_RELS is directly related to the maximum number of FROM-clause
> entries that we want to try to handle by exhaustive search.  So I think
> it's not completely unreasonable to use it for this additional purpose.
>
> Still, if you want to do the work to create another GUC parameter,
> I won't object.

This is a tough call.  The GEQO value is used here to indicate a table
list that is very long and needs GEQO processing, so there is some
relationship.  If we get to a point where the number of tables is too
large, we do have problems.

However, the GEQO setting is set to the point where we want GEQO to take
over from the standard optimizer.  If GEQO was to be improved, this
value would be decreased but the point at which you would want to stop
increasing the target list probably would be the same.

The GEQO/2 is clearly just a ballpark estimate.  I can see the value as
a separate config parameter, but I can also see it as something that may
be confusing to users and <1% of people will want to change it.  In
fact, interestingly, even if GEQO is off, GEQO_THRESHHOLD can be changed
by users wishing to pull more of their subqueries into their target
list.

I started thinking of some more complex comparison we could do, such as
determining if:

    2 * (factorial(rels_in_upper_query) + factorial(rels_in_subquery)) <
    factorial(rels_in_upper_query + factorial(rels_in_subquery)

but this doesn't seem to generate good decisions.

I have applied the following documentation patch to at least document
the current behavior.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.89
diff -c -r1.89 runtime.sgml
*** doc/src/sgml/runtime.sgml    2001/10/09 18:46:00    1.89
--- doc/src/sgml/runtime.sgml    2001/10/11 21:08:59
***************
*** 719,725 ****
          this many FROM items involved.  (Note that a JOIN construct
      counts as only one FROM item.) The default is 11. For simpler
      queries it is usually best to use the
!         deterministic, exhaustive planner.
         </para>
        </listitem>
       </varlistentry>
--- 719,727 ----
          this many FROM items involved.  (Note that a JOIN construct
      counts as only one FROM item.) The default is 11. For simpler
      queries it is usually best to use the
!         deterministic, exhaustive planner.  This parameter also controls
!         how hard the optimizer will try to merge subquery
!         <literal>FROM</literal> clauses into the upper query.
         </para>
        </listitem>
       </varlistentry>