Thread: Optimizing nested loops in the query plan

Optimizing nested loops in the query plan

From: Fran Fabrizio
Date: Wed, 06 Feb 2002 17:36:43 -0500
Hello,

This is a query I'm working on at the moment.  It's taking about 4
seconds to run, and I see that most of the cost is in the Nested Loop
step of the explain output.  Is there a better way to write this query
to improve execution time?  The tables aren't that large, and I feel
there's a good chance I can get it to run faster.  I'm not too familiar
with which part of the query causes the nested loop or how to optimize
that sort of thing.  findregion() is a function I wrote which executes
very quickly; 'select findregion(entity_id) from current_status'
returns in subsecond time, so I don't think this is the problem.
Thanks for any tips you can give me.

explain select findregion(cs.entity_id) as region, r.name, cs.status, count(*)
  from current_status cs, region r
 where r.region_id = findregion(cs.entity_id)
 group by region, r.name, cs.status;
NOTICE:  QUERY PLAN:

Aggregate  (cost=440.47..455.46 rows=150 width=24)
  ->  Group  (cost=440.47..451.71 rows=1499 width=24)
        ->  Sort  (cost=440.47..440.47 rows=1499 width=24)
              ->  Nested Loop  (cost=0.00..361.40 rows=1499 width=24)
                    ->  Seq Scan on region  (cost=0.00..1.07 rows=7 width=16)
                    ->  Seq Scan on current_status  (cost=0.00..28.99 rows=1499 width=8)

EXPLAIN

Thanks,
Fran


Re: Optimizing nested loops in the query plan

From: Masaru Sugawara
Date:
On Wed, 06 Feb 2002 17:36:43 -0500
Fran Fabrizio <ffabrizio@mmrd.com> wrote:


> This is a query I'm working on at the moment.  It's taking about 4
> seconds to run, and I see that most of the cost is on the Nested Loop
> area of the explain output.  Is there a better way to write this query
> to improve execution time, the tables aren't that large and I feel like
> there's a good chance I can get it to run faster.   I'm not too familiar
> with which part of the query is causing the nested loop and how to
> optimize that sort of thing.  findregion() is a function I wrote which
> executes very quickly.  'select findregion(entity_id) from
> current_status' returns in subsecond time so I don't think this is the
> problem.  Thanks for any tips you can give me.
>
> explain select findregion(cs.entity_id) as region, r.name, cs.status,
> count(*) from current_status cs, region r where r.region_id =
> findregion(cs.entity_id) group by region, r.name, cs.status;


 I don't know how many rows the tables hold or how many rows the query
 returns, but the more rows there are, the more room there seems to be
 to reduce the execution time.  With small data, on the other hand, the
 execution time of the following query may be longer than yours.


-- If the region table doesn't have indices
(1)create index idx_region_region_id on region(region_id);

(2)explain
   select cs.entity_id as region, r.name, cs.status, count(*)
    from region r inner join
         (select distinct findregion(entity_id) as entity_id, status
          from current_status
          order by 1
         ) as cs on  r.region_id = cs.entity_id
   group by region, r.name, cs.status;

NOTICE:  QUERY PLAN:
Aggregate  (cost=13688.40..14338.40 rows=6500 width=24)
  ->  Group  (cost=13688.40..14175.90 rows=65000 width=24)
        ->  Sort  (cost=13688.40..13688.40 rows=65000 width=24)
              ->  Merge Join  (cost=7522.19..7674.94 rows=65000 width=24)
                    ->  Index Scan using region_pkey on region r  (cost=0.00..59.00 rows=1000 width=16)
                    ->  Sort  (cost=7522.19..7522.19 rows=6500 width=8)
                          ->  Subquery Scan cs  (cost=6785.54..7110.54 rows=6500 width=8)
                                ->  Unique  (cost=6785.54..7110.54 rows=6500 width=8)
                                      ->  Sort  (cost=6785.54..6785.54 rows=65000 width=8)
                                            ->  Seq Scan on current_status  (cost=0.00..1065.00 rows=65000 width=8)



-- And, if the findregion() function always returns the same result
-- when given the same arguments, you'll be able to create an index
-- on the function. The query time may become shorter.

(3)create function findregion(int4) returns int4 as '
   ...
   ' language 'plpgsql' with (iscachable);

(4)create index idx_findregion on current_status(findregion(entity_id));

(5)explain
   select cs.entity_id as region, r.name, cs.status, count(*)
    from region r inner join
         (select findregion(entity_id) as entity_id, status
          from current_status
          order by 1
         ) as cs on  r.region_id = cs.entity_id
   group by region, r.name, cs.status;

NOTICE:  QUERY PLAN:
Aggregate  (cost=106808.97..113308.97 rows=65000 width=24)
  ->  Group  (cost=106808.97..111683.97 rows=650000 width=24)
        ->  Sort  (cost=106808.97..106808.97 rows=650000 width=24)
              ->  Merge Join  (cost=9077.21..9961.21 rows=650000 width=24)
                    ->  Index Scan using region_pkey on region2 r  (cost=0.00..59.00 rows=1000 width=16)
                    ->  Sort  (cost=9077.21..9077.21 rows=65000 width=8)
                          ->  Subquery Scan cs  (cost=0.00..3356.67 rows=65000 width=8)
                                ->  Index Scan using idx_findregion on current_status2  (cost=0.00..3356.67 rows=65000 width=8)
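
 The thread never shows the body of findregion(), so as a purely
 hypothetical sketch of what an iscachable definition in step (3) might
 look like -- assuming, for illustration only, that the region id can be
 computed from the entity id alone:

```sql
-- hypothetical body; Fran's actual findregion() is not shown in the thread
create function findregion(int4) returns int4 as '
begin
    -- illustration only: pretend the region id is encoded in the entity id
    return $1 / 1000;
end;
' language 'plpgsql' with (iscachable);
```

 A function like this, which depends only on its argument, is safe to
 mark iscachable; one that reads other tables generally is not.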


Regards,
Masaru Sugawara


Re: Optimizing nested loops in the query plan

From: Masaru Sugawara
Date:
On Mon, 11 Feb 2002 12:25:25 -0500
Fran Fabrizio <ffabrizio@mmrd.com> wrote:

> Thank you, it was this step that really made the difference!  It runs in
> 0.64 seconds now.
>
> Interestingly, adding with (iscachable) to the findregion() function
> made the query run twice as slow (even after first execution).  Strange.

 Twice as slow -- I don't know why that case gets so much slower.
 But functions used in indexes do seem to require iscachable explicitly,
 as noted in a comment in ../src/backend/commands/indexcmds.c as of 7.2.

/*
 * Require that the function be marked cachable. Using a noncachable
 * function for a functional index is highly questionable, since if
 * you aren't going to get the same result for the same data every
 * time, it's not clear what the index entries mean at all.
 */
 if (!func_iscachable(funcid))
   elog(ERROR, "DefineIndex: index function must be marked iscachable");
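
 So on 7.2 the index build itself fails unless the function is marked
 cachable.  A minimal sketch (table and function names here are
 hypothetical, not from the thread):

```sql
-- a trivial plpgsql function, *not* marked iscachable
create function twice(int4) returns int4 as '
begin
    return $1 * 2;
end;
' language 'plpgsql';

-- 7.2 rejects the functional index:
--   ERROR:  DefineIndex: index function must be marked iscachable
create index idx_twice on some_table(twice(some_col));

-- recreating the function with (iscachable) lets the index build succeed
drop function twice(int4);
create function twice(int4) returns int4 as '
begin
    return $1 * 2;
end;
' language 'plpgsql' with (iscachable);

create index idx_twice on some_table(twice(some_col));
```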


Regards,
Masaru Sugawara