Thread: Re: [PERFORM] Planner doesn't look at LIMIT?

From:
Tom Lane
Date:

I wrote:
> Dawid Kuroczko <> writes:
>> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

>> Limit  (cost=15912.20..15912.31 rows=1 width=272)
>> ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

>> If I set enable_hashjoin=false:

>> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;

>> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
>> rows=1 loops=1)
>> ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
>> width=272) (actual time=74.204..74.204 rows=1 loops=1)

> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.

After a suitable period of contemplating my navel, I figured out
what is going on here: the total costs involved are large enough that
the still-fairly-high startup cost of the hash is disregarded by
compare_fuzzy_path_costs(), and so the nestloop is discarded as not
having any significant potential advantage in startup time.

I think that this refutes the original scheme of using the same fuzz
factor for both startup and total cost comparisons, and therefore
propose the attached patch.

Comments?

            regards, tom lane

*** src/backend/optimizer/util/pathnode.c.orig    Fri Jul 15 13:09:25 2005
--- src/backend/optimizer/util/pathnode.c    Fri Jul 22 12:08:25 2005
***************
*** 98,157 ****
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
-     Cost        fuzz;
-
      /*
!      * The fuzz factor is set at one percent of the smaller total_cost,
!      * but not less than 0.01 cost units (just in case total cost is
!      * zero).
       *
       * XXX does this percentage need to be user-configurable?
       */
-     fuzz = Min(path1->total_cost, path2->total_cost) * 0.01;
-     fuzz = Max(fuzz, 0.01);
-
      if (criterion == STARTUP_COST)
      {
!         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
!         {
!             if (path1->startup_cost < path2->startup_cost)
!                 return -1;
!             else
!                 return +1;
!         }

          /*
           * If paths have the same startup cost (not at all unlikely),
           * order them by total cost.
           */
!         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
!         {
!             if (path1->total_cost < path2->total_cost)
!                 return -1;
!             else
!                 return +1;
!         }
      }
      else
      {
!         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
!         {
!             if (path1->total_cost < path2->total_cost)
!                 return -1;
!             else
!                 return +1;
!         }

          /*
           * If paths have the same total cost, order them by startup cost.
           */
!         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
!         {
!             if (path1->startup_cost < path2->startup_cost)
!                 return -1;
!             else
!                 return +1;
!         }
      }
      return 0;
  }
--- 98,138 ----
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
      /*
!      * We use a fuzz factor of 1% of the smaller cost.
       *
       * XXX does this percentage need to be user-configurable?
       */
      if (criterion == STARTUP_COST)
      {
!         if (path1->startup_cost > path2->startup_cost * 1.01)
!             return +1;
!         if (path2->startup_cost > path1->startup_cost * 1.01)
!             return -1;

          /*
           * If paths have the same startup cost (not at all unlikely),
           * order them by total cost.
           */
!         if (path1->total_cost > path2->total_cost * 1.01)
!             return +1;
!         if (path2->total_cost > path1->total_cost * 1.01)
!             return -1;
      }
      else
      {
!         if (path1->total_cost > path2->total_cost * 1.01)
!             return +1;
!         if (path2->total_cost > path1->total_cost * 1.01)
!             return -1;

          /*
           * If paths have the same total cost, order them by startup cost.
           */
!         if (path1->startup_cost > path2->startup_cost * 1.01)
!             return +1;
!         if (path2->startup_cost > path1->startup_cost * 1.01)
!             return -1;
      }
      return 0;
  }

From:
Simon Riggs
Date:

On Fri, 2005-07-22 at 12:20 -0400, Tom Lane wrote:
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?

Looks good. I think it explains a few other wierd perf reports also.

Best Regards, Simon Riggs


From:
Tom Lane
Date:

Simon Riggs <> writes:
> Looks good. I think it explains a few other wierd perf reports also.

Could be.  I went back to look at Sam Mason's report about three weeks
ago, and it definitely seems to explain his issue.  The "fuzzy cost
comparison" logic is new in 8.0 so it hasn't had all that much
testing...

            regards, tom lane

From:
Dawid Kuroczko
Date:

On 7/22/05, Tom Lane <> wrote:
> > This is quite strange.  The nestloop plan definitely should be preferred
> > in the context of the LIMIT, considering that it has far lower estimated
> > cost.  And it is preferred in simple tests for me.
>
> After a suitable period of contemplating my navel, I figured out
> what is going on here: the total costs involved are large enough that
> the still-fairly-high startup cost of the hash is disregarded by
> compare_fuzzy_path_costs(), and so the nestloop is discarded as not
> having any significant potential advantage in startup time.
>
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?

Works great!!!

With LIMIT below 4 000 000 rows (its 47-milion row table) it prefers
nested loops, then it starts to introduce merge joins.

   Regards,
      Dawid

From:
Sam Mason
Date:

Tom Lane wrote:
>Could be.  I went back to look at Sam Mason's report about three weeks
>ago, and it definitely seems to explain his issue.

I've just built a patched version as well and it appears to be doing
what I think is the right thing now.  I.e. actually picking the
plan with the lower cost.

Thanks!


  Sam

From:
Tom Lane
Date:

Ian Westmacott <> writes:
> In a nutshell, I have a LIMIT query where the planner
> seems to favor a merge join over a nested loop.

The planner is already estimating only one row out of the join, and so
the LIMIT doesn't affect its cost estimates at all.

It appears to me that the reason the nestloop plan is fast is just
chance: a suitable matching row is found very early in the scan of
tableB, so that the indexscan on it can stop after 29 rows, instead
of having to go through all 55000 rows in the given range of bim.
If it'd have had to go through, say, half of the rows to find a match,
the sort/merge plan would show up a lot better.

If this wasn't chance, but was expected because there are many matching
rows and not only one, then there's a statistical problem.

            regards, tom lane

From:
Ian Westmacott
Date:

I have a case that I though was an example of this issue,
and that this patch would correct.  I applied this patch
to an 8.0.3 source distribution, but it didn't seem to
solve my problem.

In a nutshell, I have a LIMIT query where the planner
seems to favor a merge join over a nested loop.  I've
simplified the query as much as possible:


itvtrackdata3=> \d tableA
Table "public.tableA"
 Column |   Type   | Modifiers
--------+----------+-----------
 foo    | bigint   | not null
 bar    | smallint | not null
 bap    | bigint   | not null
 bip    | bigint   | not null
 bom    | bigint   | not null
Indexes:
    "idx_tableA_bip" btree (bip) WHERE (bip =
9000000000000000000::bigint)
    "idx_tableA_foo" btree (foo)

itvtrackdata3=> \d tableB
Table "tableB"
 Column  |   Type   | Modifiers
---------+----------+-----------
 bim     | bigint   | not null
 bif     | smallint | not null
 baf     | smallint | not null
 bof     | smallint | not null
 buf     | smallint | not null
 foo     | bigint   | not null
Indexes:
    "idx_tableB_bim" btree ("bim", foo)

itvtrackdata3=> set default_statistics_target to 1000;
SET
Time: 0.448 ms
itvtrackdata3=> analyze tableA;
ANALYZE
Time: 4237.151 ms
itvtrackdata3=> analyze tableB;
ANALYZE
Time: 46672.939 ms
itvtrackdata3=> explain analyze SELECT * FROM tableB NATURAL JOIN tableA
WHERE bim>=72555896091359 AND bim<72555935412959 AND bim=bap ORDER BY
bim ASC LIMIT 1;
                                                                                      QUERY PLAN
                                                              

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=149626.57..252987.71 rows=1 width=50) (actual
time=5684.013..5684.013 rows=1 loops=1)
   ->  Merge Join  (cost=149626.57..252987.71 rows=1 width=50) (actual
time=5684.012..5684.012 rows=1 loops=1)
         Merge Cond: (("outer"."bim" = "inner"."bap") AND ("outer".foo =
"inner".foo))
         ->  Index Scan using idx_tableB_bim on tableB
(cost=0.00..97391.22 rows=55672 width=24) (actual time=0.017..0.059
rows=29 loops=1)
               Index Cond: (("bim" >= 72555896091359::bigint) AND ("bim"
< 72555935412959::bigint))
         ->  Sort  (cost=149626.57..151523.94 rows=758948 width=34)
(actual time=5099.300..5442.825 rows=560856 loops=1)
               Sort Key: tableA."bap", tableA.foo
               ->  Seq Scan on tableA  (cost=0.00..47351.48 rows=758948
width=34) (actual time=0.021..1645.204 rows=758948 loops=1)
 Total runtime: 5706.655 ms
(9 rows)

Time: 5729.984 ms
itvtrackdata3=> set enable_mergejoin to false;
SET
Time: 0.373 ms
itvtrackdata3=> explain analyze SELECT * FROM tableB NATURAL JOIN tableA
WHERE bim>=72555896091359 AND bim<72555935412959 AND bim=bap ORDER BY
bim ASC LIMIT 1;
                                                                                      QUERY PLAN
                                                              

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..432619.68 rows=1 width=50) (actual
time=11.149..11.150 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..432619.68 rows=1 width=50) (actual
time=11.148..11.148 rows=1 loops=1)
         Join Filter: ("outer"."bim" = "inner"."bap")
         ->  Index Scan using idx_tableB_bim on tableB
(cost=0.00..97391.22 rows=55672 width=24) (actual time=0.017..0.062
rows=29 loops=1)
               Index Cond: (("bim" >= 72555896091359::bigint) AND ("bim"
< 72555935412959::bigint))
         ->  Index Scan using idx_tableA_foo on tableA  (cost=0.00..6.01
rows=1 width=34) (actual time=0.007..0.379 rows=1 loops=29)
               Index Cond: ("outer".foo = tableA.foo)
 Total runtime: 11.215 ms
(8 rows)

Time: 32.007 ms


Have I just flubbed the patch, or is there something else
going on here?

Thanks,

    --Ian


On Fri, 2005-07-22 at 12:20, Tom Lane wrote:
> I wrote:
> > Dawid Kuroczko <> writes:
> >> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
>
> >> Limit  (cost=15912.20..15912.31 rows=1 width=272)
> >> ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
>
> >> If I set enable_hashjoin=false:
>
> >> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
>
> >> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
> >> rows=1 loops=1)
> >> ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
> >> width=272) (actual time=74.204..74.204 rows=1 loops=1)
>
> > This is quite strange.  The nestloop plan definitely should be preferred
> > in the context of the LIMIT, considering that it has far lower estimated
> > cost.  And it is preferred in simple tests for me.
>
> After a suitable period of contemplating my navel, I figured out
> what is going on here: the total costs involved are large enough that
> the still-fairly-high startup cost of the hash is disregarded by
> compare_fuzzy_path_costs(), and so the nestloop is discarded as not
> having any significant potential advantage in startup time.
>
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?
>
>             regards, tom lane
>
> *** src/backend/optimizer/util/pathnode.c.orig    Fri Jul 15 13:09:25 2005
> --- src/backend/optimizer/util/pathnode.c    Fri Jul 22 12:08:25 2005
> ***************
> *** 98,157 ****
>   static int
>   compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
>   {
> -     Cost        fuzz;
> -
>       /*
> !      * The fuzz factor is set at one percent of the smaller total_cost,
> !      * but not less than 0.01 cost units (just in case total cost is
> !      * zero).
>        *
>        * XXX does this percentage need to be user-configurable?
>        */
> -     fuzz = Min(path1->total_cost, path2->total_cost) * 0.01;
> -     fuzz = Max(fuzz, 0.01);
> -
>       if (criterion == STARTUP_COST)
>       {
> !         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
> !         {
> !             if (path1->startup_cost < path2->startup_cost)
> !                 return -1;
> !             else
> !                 return +1;
> !         }
>
>           /*
>            * If paths have the same startup cost (not at all unlikely),
>            * order them by total cost.
>            */
> !         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
> !         {
> !             if (path1->total_cost < path2->total_cost)
> !                 return -1;
> !             else
> !                 return +1;
> !         }
>       }
>       else
>       {
> !         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
> !         {
> !             if (path1->total_cost < path2->total_cost)
> !                 return -1;
> !             else
> !                 return +1;
> !         }
>
>           /*
>            * If paths have the same total cost, order them by startup cost.
>            */
> !         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
> !         {
> !             if (path1->startup_cost < path2->startup_cost)
> !                 return -1;
> !             else
> !                 return +1;
> !         }
>       }
>       return 0;
>   }
> --- 98,138 ----
>   static int
>   compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
>   {
>       /*
> !      * We use a fuzz factor of 1% of the smaller cost.
>        *
>        * XXX does this percentage need to be user-configurable?
>        */
>       if (criterion == STARTUP_COST)
>       {
> !         if (path1->startup_cost > path2->startup_cost * 1.01)
> !             return +1;
> !         if (path2->startup_cost > path1->startup_cost * 1.01)
> !             return -1;
>
>           /*
>            * If paths have the same startup cost (not at all unlikely),
>            * order them by total cost.
>            */
> !         if (path1->total_cost > path2->total_cost * 1.01)
> !             return +1;
> !         if (path2->total_cost > path1->total_cost * 1.01)
> !             return -1;
>       }
>       else
>       {
> !         if (path1->total_cost > path2->total_cost * 1.01)
> !             return +1;
> !         if (path2->total_cost > path1->total_cost * 1.01)
> !             return -1;
>
>           /*
>            * If paths have the same total cost, order them by startup cost.
>            */
> !         if (path1->startup_cost > path2->startup_cost * 1.01)
> !             return +1;
> !         if (path2->startup_cost > path1->startup_cost * 1.01)
> !             return -1;
>       }
>       return 0;
>   }
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq