Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins) - Mailing list pgsql-hackers

From: Bruce Momjian
Subject: Re: [HACKERS] Optimizer speed and GEQO (was: nested loops in joins)
Msg-id: 199902022039.PAA13638@candle.pha.pa.us
In response to: Optimizer speed and GEQO (was: nested loops in joins)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
OK, I have modified the optimizer to count not only the tables, but also
the indexes.  The patch is applied.  The code is:

    {
        List       *temp;
        int         paths_to_consider = 0;

        foreach(temp, outer_rels)
        {
            RelOptInfo *rel = (RelOptInfo *) lfirst(temp);

            paths_to_consider += length(rel->pathlist);
        }

        if ((_use_geqo_) && paths_to_consider >= _use_geqo_rels_)
            /* returns _one_ RelOptInfo, so lcons it */
            return lcons(geqo(root), NIL);
    }


It is my understanding that it is the size of the pathlist that
determines the complexity of the optimization. (Wow, I actually
understood that sentence.)
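
For intuition, here is a standalone toy computation of the metric the
patch now uses (the per-table index counts are Charles' 7T+12I case;
this is an illustrative sketch only, and real pathlists can hold more
than one path per index):

    #include <stdio.h>

    int
    main(void)
    {
        /*
         * Toy model: one sequential-scan path per table, plus at least
         * one path per index -- six indexes on the primary table, one
         * on each of the other six tables.
         */
        int     indexes[] = {6, 1, 1, 1, 1, 1, 1};
        int     i,
                paths_to_consider = 0;

        for (i = 0; i < 7; i++)
            paths_to_consider += 1 + indexes[i];

        printf("paths_to_consider = %d\n", paths_to_consider);  /* 19 */
        return 0;
    }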

Tom, can you tell me if this looks good, and also suggest a possible
default threshold for GEQO optimization?  My guess is that the current
value of 6 is now too low.  It sounds like you have a good testbed for
determining this.

Old posting attached.  I will look at the functions you mentioned for
possible improvement.


> I have been looking at optimizer runtime using Charles Hornberger's
> example case, and what I find is that the number of tables involved in
> the query is a very inadequate predictor of the optimizer's runtime.
> Thus, it's not surprising that we are getting poor results from using
> number of tables as the GEQO threshold measure.
> 
> I started with Charles' test case as presented, then simplified it by
> removing indexes from the database.  This didn't change the final
> plan, which was a straight nested loop over sequential scans anyway
> (because the tables were so small).  But it drastically reduced the
> number of cases that the optimizer looked at.
> 
> Original test case: query involves seven tables with a total of twelve
> indexes (six on the primary table and one on each other table).
> 
> 7-index test case: I removed all but one of the indexes on the primary
> table, leaving one index per table.
> 
> No-index test case: I removed all indexes.
> 
> It took a certain amount of patience to run these cases with profiling
> turned on :-(, but here are the results:
> 
>                           7T+12I      7T+7I       7T
> 
> Runtime, sec              1800        457         59
> equal() calls             585 mil     157 mil     21.7 mil
> better_path() calls       72546       37418       14025
> bp->path_is_cheaper       668         668         668
> create_hashjoin_path      198         198         198
> create_mergejoin_path     2358        723         198
> create_nestloop_path      38227       20297       7765
> 
> Next, I removed the last table from Charles' query, producing these
> cases:
> 
>                           6T+11I      6T+6I       6T
> 
> Runtime, sec              34          12          2.3
> equal() calls             14.3 mil    4.7 mil     0.65 mil
> better_path() calls       10721       6172        2443
> bp->path_is_cheaper       225         225         225
> create_hashjoin_path      85          85          85
> create_mergejoin_path     500         236         85
> create_nestloop_path      5684        3344        1354
> 
> The 5T+10I case is down to a couple of seconds of runtime, so I
> didn't bother to do profiles for 5 tables.
> 
> A fairly decent approximation is that the runtime varies as the
> square of the number of create_nestloop_path calls.  That number
> in turn seems to vary as the factorial of the number of tables,
> with a weaker but still strong term involving the number of indexes.
> I understand the square and the factorial terms, but the form of
> the dependency on the index count isn't real clear.
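
A quick standalone check of that approximation against the no-index
profile numbers above (the constants here are just the measured values
from the tables; nothing in this sketch is optimizer source):

    #include <stdio.h>
    #include <math.h>

    int
    main(void)
    {
        /* no-index cases: create_nestloop_path calls and runtime */
        double  calls_7t = 7765, secs_7t = 59.0;    /* 7T */
        double  calls_6t = 1354, secs_6t = 2.3;     /* 6T */

        /*
         * If runtime varies as the square of the call count, the
         * measured time ratio should track the squared call ratio.
         */
        printf("squared call ratio  = %.1f\n",
               pow(calls_7t / calls_6t, 2.0));      /* about 32.9 */
        printf("measured time ratio = %.1f\n",
               secs_7t / secs_6t);                  /* about 25.7 */
        return 0;
    }

The two ratios agree to within what one would expect from profiled
runs, which supports the square-law reading of the numbers.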
> 
> It might be worthwhile to try a GEQO threshold based on the number of
> tables plus half the number of indexes on those tables.  I have no idea
> where to find the number of indexes, so I'll just offer the idea for
> someone else to try.
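
A minimal sketch of that suggestion; geqo_should_run() and its
arguments are hypothetical names for illustration, not existing
PostgreSQL code, and the caller would still have to find the index
counts somewhere:

    /* use GEQO once tables + indexes/2 reaches the threshold */
    int
    geqo_should_run(int num_tables, int num_indexes, int threshold)
    {
        return num_tables + num_indexes / 2 >= threshold;
    }

For the 7T+12I case this gives 7 + 6 = 13, while the plain 6T case
gives 6, so a threshold around 10 would separate the two.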
> 
> 
> The main thing that jumps out from the profiles is that on these complex
> searches, the optimizer spends practically *all* of its time down inside
> better_path, scanning the list of already known access paths to see if a
> proposed new path is a duplicate of one already known.  (Most of the time
> it is not, as shown by the small number of times path_is_cheaper gets
> called from better_path.)  This scanning is O(N^2) in the number of paths
> considered, whereas it seems that all the other work is no worse than O(N)
> in the number of paths.  The bottom of the duplicate check is equal(),
> which as you can see is getting called a horrendous number of times.
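
The pattern being described is roughly the following; this is an
illustrative sketch only (Path and path_equal() are stand-ins, and
is_known_path() is an invented name), not the actual better_path()
source:

    typedef struct Path
    {
        int     cost;               /* stand-in for the real fields */
    } Path;

    static int
    path_equal(const Path *a, const Path *b)
    {
        return a->cost == b->cost;  /* stand-in for the deep equal() */
    }

    /*
     * Every candidate is compared against every path already kept, so
     * adding N paths costs O(N^2) comparisons -- the equal() calls
     * that dominate the profiles above.
     */
    static int
    is_known_path(Path *const *kept, int nkept, const Path *candidate)
    {
        int     i;

        for (i = 0; i < nkept; i++)
            if (path_equal(kept[i], candidate)) /* rarely true */
                return 1;
        return 0;
    }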
> 
> It would be worth our while to try to eliminate this mostly-unsuccessful
> comparison operation.
> 
> I wonder whether it would work to simply always add the new path to the
> list, and rely on the later "prune" step to get rid of duplicates?
> The prune code is evidently far more efficient at getting rid of useless
> entries than better_path is, because it's nowhere near the top of the
> profile even though it's processing nearly as many paths.
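
A sketch of that alternative, again with invented names
(add_path_unchecked() is not real code): appending becomes O(1), and a
single later prune pass can discard duplicates instead of paying an
equal() scan on every insertion:

    #include <stdlib.h>

    typedef struct PathNode
    {
        struct PathNode *next;
        void            *path;      /* stand-in for Path * */
    } PathNode;

    /* O(1): push onto the list with no duplicate scan at all */
    static PathNode *
    add_path_unchecked(PathNode *head, void *path)
    {
        PathNode   *n = malloc(sizeof(PathNode));

        n->path = path;
        n->next = head;
        return n;
    }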
> 
> Anybody here know much about how the optimizer works?  I'm hesitant to
> hack on it myself.
> 
>             regards, tom lane
> 
> 


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

