Nested loop question - Mailing list pgsql-performance

From Nick Fankhauser - Doxpop
Subject Nested loop question
Date
Msg-id NEBBLAAHGLEEPCGOBHDGEEJMKBAA.nickf@doxpop.com
Whole thread Raw
Responses Re: Nested loop question
List pgsql-performance
Hi-

I'm trying to optimize a query that I *think* should run very fast.
Essentially, I'm joining two tables that have very selective indexes and
constraining the query on an indexed field. (There's a third small lookup
table in the mix, but it doesn't really affect the bottom line.)

actor is a table containing roughly 3 million rows with an index on
actor_full_name_uppercase and a unique index on actor_id.

actor_summary also contains roughly 3 million rows. Its PK is a unique
combined index on (actor_id, county_id, case_disp_global_code).

The vast majority of the rows in actor correspond to a single row in
actor_summary I'd estimate this at 95% or more. The remaining actors with
multiple records generally have two corresponding rows in actor summary.
Actor summary was created as a performance enhancer, where we can store some
pre-calculated values such as the number of court cases an actor is involved
in.

The constraint is applied first, with reasonable speed. In the example
below, it takes about 15 seconds to gather the matches in actor.

I'm unsure what is happening next. I notice that an index scan is occurring
on actor_summary_pk, with an "actual time" of 9.15, but then it looks like a
nested loop occurs at the next level to join these tables. Does this mean
that each probe of the actor_summary index will take 9.15 msec, but the
nested loop is going to do this once for each actor_id?

The nested loop appears to be where most of my time is going, so I'm
focusing on this area, but don't know if there is a better approach to this
join.

Is there a more efficient means than a nested loop to handle such a join?
Would a different method be chosen if there was exactly one row in
actor_summary for every row in actor?

-Nick

The query & explain analyze:


alpha=#
alpha=#
alpha=# explain analyze
alpha-#     select
alpha-#     min(actor.actor_id) as actor_id,
alpha-#     min(actor.actor_entity_type) as actor_entity_type,
alpha-#     min(actor.role_class_code) as role_class_code,
alpha-#     min(actor.actor_full_name) as actor_full_name,
alpha-#     min(actor.actor_person_date_of_birth) as
actor_person_date_of_birth,
alpha-#     min(actor.actor_entity_acronym) as actor_entity_acronym,
alpha-#     min(actor.actor_person_last_name) as actor_person_last_name,
alpha-#     min(actor.actor_person_first_name) as actor_person_first_name,
alpha-#     min(actor.actor_person_middle_name) as actor_person_middle_name,
alpha-#     min(actor.actor_person_name_suffix) as actor_person_name_suffix,
alpha-#     min(actor.actor_person_place_of_birth) as
actor_person_place_of_birth,
alpha-#     min(actor.actor_person_height) as actor_person_height,
alpha-#     min(actor.actor_person_height_unit) as actor_person_height_unit,
alpha-#     min(actor.actor_person_weight) as actor_person_weight,
alpha-#     min(actor.actor_person_weight_unit) as actor_person_weight_unit,
alpha-#     min(actor.actor_person_ethnicity) as actor_person_ethnicity,
alpha-#     min(actor.actor_person_citizenship_count) as
actor_person_citizenship_count,
alpha-#     min(actor.actor_person_hair_color) as actor_person_hair_color,
alpha-#     min(actor.actor_person_scars_marks_tatto) as
actor_person_scars_marks_tatto,
alpha-#     min(actor.actor_person_marital_status) as
actor_person_marital_status,
alpha-#     min(actor.actor_alias_for_actor_id) as actor_alias_for_actor_id,
alpha-#     min(to_char(data_source.source_last_update, 'MM/DD/YYYY HH12:MI
AM TZ')) as last_update,
alpha-#     min(actor_summary.single_case_public_id) as case_public_id,
alpha-#     min(actor_summary.single_case_id) as case_id,
alpha-#     sum(actor_summary.case_count)as case_count
alpha-#   from
alpha-#     actor,
alpha-#     actor_summary,
alpha-#     data_source
alpha-#   where
alpha-#     actor.actor_id = actor_summary.actor_id
alpha-#     and data_source.source_id = actor.source_id
alpha-#     and actor_full_name_uppercase like upper('sanders%')
alpha-#   group by
alpha-#     actor.actor_id
alpha-#   order by
alpha-#     min(actor.actor_full_name_uppercase),
alpha-#     case_count desc,
alpha-#     min(actor_summary.case_disp_global_code)
alpha-#   limit
alpha-#     1000
alpha-# ;



QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
-------------------------------
 Limit  (cost=2555.58..2555.59 rows=1 width=547) (actual
time=48841.76..48842.90 rows=1000 loops=1)
   ->  Sort  (cost=2555.58..2555.59 rows=1 width=547) (actual
time=48841.76..48842.18 rows=1001 loops=1)
         Sort Key: min((actor.actor_full_name_uppercase)::text),
sum(actor_summary.case_count),
min((actor_summary.case_disp_global_code)::text)
         ->  Aggregate  (cost=2555.50..2555.57 rows=1 width=547) (actual
time=48604.17..48755.28 rows=3590 loops=1)
               ->  Group  (cost=2555.50..2555.50 rows=1 width=547) (actual
time=48604.04..48647.91 rows=3594 loops=1)
                     ->  Sort  (cost=2555.50..2555.50 rows=1 width=547)
(actual time=48604.01..48605.70 rows=3594 loops=1)
                           Sort Key: actor.actor_id
                           ->  Nested Loop  (cost=1.14..2555.49 rows=1
width=547) (actual time=69.09..48585.83 rows=3594 loops=1)
                                 ->  Hash Join  (cost=1.14..900.39 rows=204
width=475) (actual time=46.92..15259.02 rows=3639 loops=1)
                                       Hash Cond: ("outer".source_id =
"inner".source_id)
                                       ->  Index Scan using
actor_full_name_uppercase on actor  (cost=0.00..895.04 rows=222 width=463)
(actual time=46.54..15220.77 rows=3639 loops=1)
                                             Index Cond:
((actor_full_name_uppercase >= 'SANDERS'::character varying) AND
(actor_full_name_uppercase < 'SANDERT'::character varying))
                                             Filter:
(actor_full_name_uppercase ~~ 'SANDERS%'::text)
                                       ->  Hash  (cost=1.11..1.11 rows=11
width=12) (actual time=0.05..0.05 rows=0 loops=1)
                                             ->  Seq Scan on data_source
(cost=0.00..1.11 rows=11 width=12) (actual time=0.02..0.04 rows=11 loops=1)
                                 ->  Index Scan using actor_summary_pk on
actor_summary  (cost=0.00..8.11 rows=1 width=72) (actual time=9.14..9.15
rows=1 loops=3639)
                                       Index Cond: ("outer".actor_id =
actor_summary.actor_id)
 Total runtime: 48851.85 msec
(18 rows)


---------------------------------------------------------------------
Nick Fankhauser

    nickf@doxpop.com  Phone 1.765.965.7363  Fax 1.765.962.9788
doxpop - Court records at your fingertips - http://www.doxpop.com/



pgsql-performance by date:

Previous
From: "David Shadovitz"
Date:
Subject: Re: Measuring execution time for sql called from PL/pgSQL
Next
From: "Zeugswetter Andreas SB SD"
Date:
Subject: Re: [HACKERS] fsync method checking