Thread: Planner doesn't look at LIMIT?

From:
Dawid Kuroczko
Date:

Hello, I have PostgreSQL 8.0.3 running on a "workstation" with 768 MB
of RAM, under FreeBSD.  And I have a 47-milion row table:

qnex=# explain select * from log;
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
(1 row)

...which is joined with a few smaller ones, like:

qnex=# explain select * from useragents;
                            QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on useragents  (cost=0.00..9475.96 rows=364896 width=96)
(1 row)

shared_buffers = 5000
random_page_cost = 3
work_mem = 102400
effective_cache_size = 60000

Now, if I do a SELECT:

qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=15912.20..15912.31 rows=1 width=272)
   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
         Hash Cond: ("outer".useragent_id = "inner".useragent_id)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
         ->  Hash  (cost=9475.96..9475.96 rows=364896 width=96)
               ->  Seq Scan on useragents  (cost=0.00..9475.96
rows=364896 width=96)
(6 rows)

Or:

qnex=# EXPLAIN SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=15912.20..15912.31 rows=1 width=272)
   ->  Hash Left Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
         Hash Cond: ("outer".useragent_id = "inner".useragent_id)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
         ->  Hash  (cost=9475.96..9475.96 rows=364896 width=96)
               ->  Seq Scan on useragents  (cost=0.00..9475.96
rows=364896 width=96)
(6 rows)

Time: 2.688 ms

...the query seems to last forever (its hashing 47 million rows!)

If I set enable_hashjoin=false:

qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
                                                                QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
width=272) (actual time=74.204..74.204 rows=1 loops=1)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336
width=180) (actual time=23.270..23.270 rows=1 loops=1)
         ->  Index Scan using useragents_pkey on useragents
(cost=0.00..3.02 rows=1 width=96) (actual time=50.867..50.867 rows=1
loops=1)
               Index Cond: ("outer".useragent_id = useragents.useragent_id)
 Total runtime: 74.483 ms

...which is way faster.  Of course if I did:

qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents
WHERE logid = (SELECT logid FROM log LIMIT 1);
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.04..6.09 rows=1 width=272) (actual
time=61.403..61.419 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=4) (actual
time=0.029..0.032 rows=1 loops=1)
           ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336
width=4) (actual time=0.023..0.023 rows=1 loops=1)
   ->  Index Scan using log_pkey on log  (cost=0.00..3.02 rows=1
width=180) (actual time=61.316..61.319 rows=1 loops=1)
         Index Cond: (logid = $0)
   ->  Index Scan using useragents_pkey on useragents
(cost=0.00..3.02 rows=1 width=96) (actual time=0.036..0.042 rows=1
loops=1)
         Index Cond: ("outer".useragent_id = useragents.useragent_id)
 Total runtime: 61.741 ms
(9 rows)

...I tried tweaking cpu_*, work_mem, effective_cache and so on, but without
any luck.  47 milion table is huge compared to useragents (I actually need
to join the log with 3 similar to useragents tables, and create a view out
of it).  Also tried using LEFT/RIGHT JOINS insead of (inner) JOINs...
Of course the database is freshly vacuum analyzed, and statistics are
set at 50...

My view of the problem is that planner ignores the "LIMIT" part.  It assumes
it _needs_ to return all 47 million rows joined with the useragents table, so
the hashjoin is the only sane approach.  But chances are that unless I'll
use LIMIT 200000, the nested loop will be much faster.

Any ideas how to make it work (other than rewriting the query to use
subselects, use explicit id-rows, disabling hashjoin completely)?
Or is this a bug?

   Regards,
      Dawid

From:
PFC
Date:

    Which row do you want ? Do you want 'a row' at random ?
    I presume you want the N latest rows ?
    In that case you should use an ORDER BY on an indexed field, the serial
primary key will do nicely (ORDER BY id DESC) ; it's indexed so it will
use the index and it will fly.

> Any ideas how to make it work (other than rewriting the query to use
> subselects, use explicit id-rows, disabling hashjoin completely)?
> Or is this a bug?
>
>    Regards,
>       Dawid
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>



From:
Tom Lane
Date:

Dawid Kuroczko <> writes:
> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

>  Limit  (cost=15912.20..15912.31 rows=1 width=272)
>    ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

> If I set enable_hashjoin=false:

> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;

>  Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
> rows=1 loops=1)
>    ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
> width=272) (actual time=74.204..74.204 rows=1 loops=1)

This is quite strange.  The nestloop plan definitely should be preferred
in the context of the LIMIT, considering that it has far lower estimated
cost.  And it is preferred in simple tests for me.  It seems there must
be something specific to your installation that's causing the planner to
go wrong.  Can you develop a self-contained test case that behaves this
way for you?

I recall we saw a similar complaint a month or two back, but the
complainant never followed up with anything useful for tracking down
the problem.

            regards, tom lane

From:
Sam Mason
Date:

Dawid Kuroczko wrote:
>work_mem = 102400

>...I tried tweaking cpu_*, work_mem, effective_cache and so on, but without
>any luck.

I'm hoping you didn't tweak it enough!  I posted something similar
this a while ago, but haven't since got around to figuring out
a useful test case to send to the list.

Try reducing your work_mem down to 1000 or so and things should
start doing what you expect.


  Sam

From:
Dawid Kuroczko
Date:

On 7/22/05, Tom Lane <> wrote:
> Dawid Kuroczko <> writes:
> > qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
>
> >  Limit  (cost=15912.20..15912.31 rows=1 width=272)
> >    ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
>
> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.  It seems there must
> be something specific to your installation that's causing the planner to
> go wrong.  Can you develop a self-contained test case that behaves this
> way for you?

Why, certainly.  I did test it also on Gentoo Linux PostgreSQL 8.0.1 (yeah,
a bit older one), but the behaviour is the same.  The test looks like this:

-- First lets make a "small" lookup table -- 400000 rows.
CREATE TABLE lookup (
    lookup_id serial PRIMARY KEY,
    value integer NOT NULL
);
INSERT INTO lookup (value) SELECT * FROM generate_series(1, 400000);
VACUUM ANALYZE lookup;
-- Then lets make a huge data table...
CREATE TABLE huge_data (
    huge_data_id serial PRIMARY KEY,
    lookup_id integer NOT NULL
);
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM lookup;
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --    800 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  1 600 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  3 200 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  6 400 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000
-- You may want to put ANALYZE and EXPLAIN between each of these
-- steps.  In my cases, at 12.8 mln rows PostgreSQL seems to go for hashjoin
-- in each case.  YMMV, so you may try to push it up to 1024 mln rows.
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 25 600 000
ANALYZE huge_data;
EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;

My EXPLAIN FROM Linux (SMP P-III box), with PostgreSQL 8.0.1, during making
this test case:

qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
Limit  (cost=0.00..3.21 rows=1 width=12)
  ->  Nested Loop  (cost=0.00..19557596.04 rows=6094777 width=12)
        ->  Seq Scan on huge_data  (cost=0.00..95372.42 rows=6399942 width=8)
        ->  Index Scan using lookup_pkey on lookup  (cost=0.00..3.02
rows=1 width=8)
              Index Cond: ("outer".lookup_id = lookup.lookup_id)
(5 rows)

Time: 4,333 ms
qnex=# INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM
huge_data; -- 12 800 000
INSERT 0 6400000
Time: 501014,692 ms
qnex=# ANALYZE huge_data;
ANALYZE
Time: 4243,453 ms
qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
Limit  (cost=11719.00..11719.09 rows=1 width=12)
  ->  Hash Join  (cost=11719.00..1212739.73 rows=12800185 width=12)
        Hash Cond: ("outer".lookup_id = "inner".lookup_id)
        ->  Seq Scan on huge_data  (cost=0.00..190747.84 rows=12800184 width=8)
        ->  Hash  (cost=5961.00..5961.00 rows=400000 width=8)
              ->  Seq Scan on lookup  (cost=0.00..5961.00 rows=400000 width=8)
(6 rows)

   Regards,
     Dawid

From:
Tom Lane
Date:

I wrote:
> Dawid Kuroczko <> writes:
>> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

>> Limit  (cost=15912.20..15912.31 rows=1 width=272)
>> ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

>> If I set enable_hashjoin=false:

>> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;

>> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
>> rows=1 loops=1)
>> ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
>> width=272) (actual time=74.204..74.204 rows=1 loops=1)

> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.

After a suitable period of contemplating my navel, I figured out
what is going on here: the total costs involved are large enough that
the still-fairly-high startup cost of the hash is disregarded by
compare_fuzzy_path_costs(), and so the nestloop is discarded as not
having any significant potential advantage in startup time.

I think that this refutes the original scheme of using the same fuzz
factor for both startup and total cost comparisons, and therefore
propose the attached patch.

Comments?

            regards, tom lane

*** src/backend/optimizer/util/pathnode.c.orig    Fri Jul 15 13:09:25 2005
--- src/backend/optimizer/util/pathnode.c    Fri Jul 22 12:08:25 2005
***************
*** 98,157 ****
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
-     Cost        fuzz;
-
      /*
!      * The fuzz factor is set at one percent of the smaller total_cost,
!      * but not less than 0.01 cost units (just in case total cost is
!      * zero).
       *
       * XXX does this percentage need to be user-configurable?
       */
-     fuzz = Min(path1->total_cost, path2->total_cost) * 0.01;
-     fuzz = Max(fuzz, 0.01);
-
      if (criterion == STARTUP_COST)
      {
!         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
!         {
!             if (path1->startup_cost < path2->startup_cost)
!                 return -1;
!             else
!                 return +1;
!         }

          /*
           * If paths have the same startup cost (not at all unlikely),
           * order them by total cost.
           */
!         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
!         {
!             if (path1->total_cost < path2->total_cost)
!                 return -1;
!             else
!                 return +1;
!         }
      }
      else
      {
!         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
!         {
!             if (path1->total_cost < path2->total_cost)
!                 return -1;
!             else
!                 return +1;
!         }

          /*
           * If paths have the same total cost, order them by startup cost.
           */
!         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
!         {
!             if (path1->startup_cost < path2->startup_cost)
!                 return -1;
!             else
!                 return +1;
!         }
      }
      return 0;
  }
--- 98,138 ----
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
      /*
!      * We use a fuzz factor of 1% of the smaller cost.
       *
       * XXX does this percentage need to be user-configurable?
       */
      if (criterion == STARTUP_COST)
      {
!         if (path1->startup_cost > path2->startup_cost * 1.01)
!             return +1;
!         if (path2->startup_cost > path1->startup_cost * 1.01)
!             return -1;

          /*
           * If paths have the same startup cost (not at all unlikely),
           * order them by total cost.
           */
!         if (path1->total_cost > path2->total_cost * 1.01)
!             return +1;
!         if (path2->total_cost > path1->total_cost * 1.01)
!             return -1;
      }
      else
      {
!         if (path1->total_cost > path2->total_cost * 1.01)
!             return +1;
!         if (path2->total_cost > path1->total_cost * 1.01)
!             return -1;

          /*
           * If paths have the same total cost, order them by startup cost.
           */
!         if (path1->startup_cost > path2->startup_cost * 1.01)
!             return +1;
!         if (path2->startup_cost > path1->startup_cost * 1.01)
!             return -1;
      }
      return 0;
  }

From:
Simon Riggs
Date:

On Fri, 2005-07-22 at 12:20 -0400, Tom Lane wrote:
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?

Looks good. I think it explains a few other wierd perf reports also.

Best Regards, Simon Riggs


From:
Tom Lane
Date:

Simon Riggs <> writes:
> Looks good. I think it explains a few other wierd perf reports also.

Could be.  I went back to look at Sam Mason's report about three weeks
ago, and it definitely seems to explain his issue.  The "fuzzy cost
comparison" logic is new in 8.0 so it hasn't had all that much
testing...

            regards, tom lane

From:
Dawid Kuroczko
Date:

On 7/22/05, Tom Lane <> wrote:
> > This is quite strange.  The nestloop plan definitely should be preferred
> > in the context of the LIMIT, considering that it has far lower estimated
> > cost.  And it is preferred in simple tests for me.
>
> After a suitable period of contemplating my navel, I figured out
> what is going on here: the total costs involved are large enough that
> the still-fairly-high startup cost of the hash is disregarded by
> compare_fuzzy_path_costs(), and so the nestloop is discarded as not
> having any significant potential advantage in startup time.
>
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?

Works great!!!

With LIMIT below 4 000 000 rows (its 47-milion row table) it prefers
nested loops, then it starts to introduce merge joins.

   Regards,
      Dawid

From:
Sam Mason
Date:

Tom Lane wrote:
>Could be.  I went back to look at Sam Mason's report about three weeks
>ago, and it definitely seems to explain his issue.

I've just built a patched version as well and it appears to be doing
what I think is the right thing now.  I.e. actually picking the
plan with the lower cost.

Thanks!


  Sam

From:
Ian Westmacott
Date:

I have a case that I though was an example of this issue,
and that this patch would correct.  I applied this patch
to an 8.0.3 source distribution, but it didn't seem to
solve my problem.

In a nutshell, I have a LIMIT query where the planner
seems to favor a merge join over a nested loop.  I've
simplified the query as much as possible:


itvtrackdata3=> \d tableA
Table "public.tableA"
 Column |   Type   | Modifiers
--------+----------+-----------
 foo    | bigint   | not null
 bar    | smallint | not null
 bap    | bigint   | not null
 bip    | bigint   | not null
 bom    | bigint   | not null
Indexes:
    "idx_tableA_bip" btree (bip) WHERE (bip =
9000000000000000000::bigint)
    "idx_tableA_foo" btree (foo)

itvtrackdata3=> \d tableB
Table "tableB"
 Column  |   Type   | Modifiers
---------+----------+-----------
 bim     | bigint   | not null
 bif     | smallint | not null
 baf     | smallint | not null
 bof     | smallint | not null
 buf     | smallint | not null
 foo     | bigint   | not null
Indexes:
    "idx_tableB_bim" btree ("bim", foo)

itvtrackdata3=> set default_statistics_target to 1000;
SET
Time: 0.448 ms
itvtrackdata3=> analyze tableA;
ANALYZE
Time: 4237.151 ms
itvtrackdata3=> analyze tableB;
ANALYZE
Time: 46672.939 ms
itvtrackdata3=> explain analyze SELECT * FROM tableB NATURAL JOIN tableA
WHERE bim>=72555896091359 AND bim<72555935412959 AND bim=bap ORDER BY
bim ASC LIMIT 1;
                                                                                      QUERY PLAN
                                                              

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=149626.57..252987.71 rows=1 width=50) (actual
time=5684.013..5684.013 rows=1 loops=1)
   ->  Merge Join  (cost=149626.57..252987.71 rows=1 width=50) (actual
time=5684.012..5684.012 rows=1 loops=1)
         Merge Cond: (("outer"."bim" = "inner"."bap") AND ("outer".foo =
"inner".foo))
         ->  Index Scan using idx_tableB_bim on tableB
(cost=0.00..97391.22 rows=55672 width=24) (actual time=0.017..0.059
rows=29 loops=1)
               Index Cond: (("bim" >= 72555896091359::bigint) AND ("bim"
< 72555935412959::bigint))
         ->  Sort  (cost=149626.57..151523.94 rows=758948 width=34)
(actual time=5099.300..5442.825 rows=560856 loops=1)
               Sort Key: tableA."bap", tableA.foo
               ->  Seq Scan on tableA  (cost=0.00..47351.48 rows=758948
width=34) (actual time=0.021..1645.204 rows=758948 loops=1)
 Total runtime: 5706.655 ms
(9 rows)

Time: 5729.984 ms
itvtrackdata3=> set enable_mergejoin to false;
SET
Time: 0.373 ms
itvtrackdata3=> explain analyze SELECT * FROM tableB NATURAL JOIN tableA
WHERE bim>=72555896091359 AND bim<72555935412959 AND bim=bap ORDER BY
bim ASC LIMIT 1;
                                                                                      QUERY PLAN
                                                              

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..432619.68 rows=1 width=50) (actual
time=11.149..11.150 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..432619.68 rows=1 width=50) (actual
time=11.148..11.148 rows=1 loops=1)
         Join Filter: ("outer"."bim" = "inner"."bap")
         ->  Index Scan using idx_tableB_bim on tableB
(cost=0.00..97391.22 rows=55672 width=24) (actual time=0.017..0.062
rows=29 loops=1)
               Index Cond: (("bim" >= 72555896091359::bigint) AND ("bim"
< 72555935412959::bigint))
         ->  Index Scan using idx_tableA_foo on tableA  (cost=0.00..6.01
rows=1 width=34) (actual time=0.007..0.379 rows=1 loops=29)
               Index Cond: ("outer".foo = tableA.foo)
 Total runtime: 11.215 ms
(8 rows)

Time: 32.007 ms


Have I just flubbed the patch, or is there something else
going on here?

Thanks,

    --Ian


On Fri, 2005-07-22 at 12:20, Tom Lane wrote:
> I wrote:
> > Dawid Kuroczko <> writes:
> >> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
>
> >> Limit  (cost=15912.20..15912.31 rows=1 width=272)
> >> ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
>
> >> If I set enable_hashjoin=false:
>
> >> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
>
> >> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216
> >> rows=1 loops=1)
> >> ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336
> >> width=272) (actual time=74.204..74.204 rows=1 loops=1)
>
> > This is quite strange.  The nestloop plan definitely should be preferred
> > in the context of the LIMIT, considering that it has far lower estimated
> > cost.  And it is preferred in simple tests for me.
>
> After a suitable period of contemplating my navel, I figured out
> what is going on here: the total costs involved are large enough that
> the still-fairly-high startup cost of the hash is disregarded by
> compare_fuzzy_path_costs(), and so the nestloop is discarded as not
> having any significant potential advantage in startup time.
>
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?
>
>             regards, tom lane
>
> *** src/backend/optimizer/util/pathnode.c.orig    Fri Jul 15 13:09:25 2005
> --- src/backend/optimizer/util/pathnode.c    Fri Jul 22 12:08:25 2005
> ***************
> *** 98,157 ****
>   static int
>   compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
>   {
> -     Cost        fuzz;
> -
>       /*
> !      * The fuzz factor is set at one percent of the smaller total_cost,
> !      * but not less than 0.01 cost units (just in case total cost is
> !      * zero).
>        *
>        * XXX does this percentage need to be user-configurable?
>        */
> -     fuzz = Min(path1->total_cost, path2->total_cost) * 0.01;
> -     fuzz = Max(fuzz, 0.01);
> -
>       if (criterion == STARTUP_COST)
>       {
> !         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
> !         {
> !             if (path1->startup_cost < path2->startup_cost)
> !                 return -1;
> !             else
> !                 return +1;
> !         }
>
>           /*
>            * If paths have the same startup cost (not at all unlikely),
>            * order them by total cost.
>            */
> !         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
> !         {
> !             if (path1->total_cost < path2->total_cost)
> !                 return -1;
> !             else
> !                 return +1;
> !         }
>       }
>       else
>       {
> !         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
> !         {
> !             if (path1->total_cost < path2->total_cost)
> !                 return -1;
> !             else
> !                 return +1;
> !         }
>
>           /*
>            * If paths have the same total cost, order them by startup cost.
>            */
> !         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
> !         {
> !             if (path1->startup_cost < path2->startup_cost)
> !                 return -1;
> !             else
> !                 return +1;
> !         }
>       }
>       return 0;
>   }
> --- 98,138 ----
>   static int
>   compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
>   {
>       /*
> !      * We use a fuzz factor of 1% of the smaller cost.
>        *
>        * XXX does this percentage need to be user-configurable?
>        */
>       if (criterion == STARTUP_COST)
>       {
> !         if (path1->startup_cost > path2->startup_cost * 1.01)
> !             return +1;
> !         if (path2->startup_cost > path1->startup_cost * 1.01)
> !             return -1;
>
>           /*
>            * If paths have the same startup cost (not at all unlikely),
>            * order them by total cost.
>            */
> !         if (path1->total_cost > path2->total_cost * 1.01)
> !             return +1;
> !         if (path2->total_cost > path1->total_cost * 1.01)
> !             return -1;
>       }
>       else
>       {
> !         if (path1->total_cost > path2->total_cost * 1.01)
> !             return +1;
> !         if (path2->total_cost > path1->total_cost * 1.01)
> !             return -1;
>
>           /*
>            * If paths have the same total cost, order them by startup cost.
>            */
> !         if (path1->startup_cost > path2->startup_cost * 1.01)
> !             return +1;
> !         if (path2->startup_cost > path1->startup_cost * 1.01)
> !             return -1;
>       }
>       return 0;
>   }
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq


From:
Tom Lane
Date:

Ian Westmacott <> writes:
> In a nutshell, I have a LIMIT query where the planner
> seems to favor a merge join over a nested loop.

The planner is already estimating only one row out of the join, and so
the LIMIT doesn't affect its cost estimates at all.

It appears to me that the reason the nestloop plan is fast is just
chance: a suitable matching row is found very early in the scan of
tableB, so that the indexscan on it can stop after 29 rows, instead
of having to go through all 55000 rows in the given range of bim.
If it'd have had to go through, say, half of the rows to find a match,
the sort/merge plan would show up a lot better.

If this wasn't chance, but was expected because there are many matching
rows and not only one, then there's a statistical problem.

            regards, tom lane

From:
Ian Westmacott
Date:

On Wed, 2005-08-10 at 18:55, Tom Lane wrote:
> Ian Westmacott <> writes:
> > In a nutshell, I have a LIMIT query where the planner
> > seems to favor a merge join over a nested loop.
>
> The planner is already estimating only one row out of the join, and so
> the LIMIT doesn't affect its cost estimates at all.
>
> It appears to me that the reason the nestloop plan is fast is just
> chance: a suitable matching row is found very early in the scan of
> tableB, so that the indexscan on it can stop after 29 rows, instead
> of having to go through all 55000 rows in the given range of bim.
> If it'd have had to go through, say, half of the rows to find a match,
> the sort/merge plan would show up a lot better.

Oh, I see.  Thanks, that clears up some misconceptions I
had about the explain output.

> If this wasn't chance, but was expected because there are many matching
> rows and not only one, then there's a statistical problem.

Well, there are in fact almost 300 of them in this case.
So I guess what I need to do is give the planner more
information to correctly predict that.

Thanks,

    --Ian