Nested loop performance - Mailing list pgsql-performance
From | Nick Fankhauser |
---|---|
Subject | Nested loop performance |
Date | |
Msg-id | NEBBLAAHGLEEPCGOBHDGOEJMKBAA.nickf@ontko.com Whole thread Raw |
Responses |
Re: Nested loop performance
Re: Nested loop performance |
List | pgsql-performance |
Hi- I'm trying to optimize a query that I *think* should run very fast. Essentially, I'm joining two tables that have very selective indexes and constraining the query on an indexed field. (There's a third small lookup table in the mix, but it doesn't really affect the bottom line.) actor is a table containing roughly 3 million rows with an index on actor_full_name_uppercase and a unique index on actor_id. actor_summary also contains roughly 3 million rows. Its PK is a unique combined index on (actor_id, county_id, case_disp_global_code). The vast majority of the rows in actor correspond to a single row in actor_summary I'd estimate this at 95% or more. The remaining actors with multiple records generally have two corresponding rows in actor summary. Actor summary was created as a performance enhancer, where we can store some pre-calculated values such as the number of court cases an actor is involved in. The constraint is applied first, with reasonable speed. In the example below, it takes about 15 seconds to gather the matches in actor. I'm unsure what is happening next. I notice that an index scan is occurring on actor_summary_pk, with an "actual time" of 9.15, but then it looks like a nested loop occurs at the next level to join these tables. Does this mean that each probe of the actor_summary index will take 9.15 msec, but the nested loop is going to do this once for each actor_id? The nested loop appears to be where most of my time is going, so I'm focusing on this area, but don't know if there is a better approach to this join. Is there a more efficient means than a nested loop to handle such a join? Would a different method be chosen if there was exactly one row in actor_summary for every row in actor? -Nick The query & explain analyze: alpha=# alpha=# alpha=# explain analyze alpha-# select alpha-# min(actor.actor_id) as actor_id, alpha-# min(actor.actor_entity_type) as actor_entity_type, alpha-# min(actor.role_class_code) as role_class_code, alpha-# min(actor.actor_full_name) as actor_full_name, alpha-# min(actor.actor_person_date_of_birth) as actor_person_date_of_birth, alpha-# min(actor.actor_entity_acronym) as actor_entity_acronym, alpha-# min(actor.actor_person_last_name) as actor_person_last_name, alpha-# min(actor.actor_person_first_name) as actor_person_first_name, alpha-# min(actor.actor_person_middle_name) as actor_person_middle_name, alpha-# min(actor.actor_person_name_suffix) as actor_person_name_suffix, alpha-# min(actor.actor_person_place_of_birth) as actor_person_place_of_birth, alpha-# min(actor.actor_person_height) as actor_person_height, alpha-# min(actor.actor_person_height_unit) as actor_person_height_unit, alpha-# min(actor.actor_person_weight) as actor_person_weight, alpha-# min(actor.actor_person_weight_unit) as actor_person_weight_unit, alpha-# min(actor.actor_person_ethnicity) as actor_person_ethnicity, alpha-# min(actor.actor_person_citizenship_count) as actor_person_citizenship_count, alpha-# min(actor.actor_person_hair_color) as actor_person_hair_color, alpha-# min(actor.actor_person_scars_marks_tatto) as actor_person_scars_marks_tatto, alpha-# min(actor.actor_person_marital_status) as actor_person_marital_status, alpha-# min(actor.actor_alias_for_actor_id) as actor_alias_for_actor_id, alpha-# min(to_char(data_source.source_last_update, 'MM/DD/YYYY HH12:MI AM TZ')) as last_update, alpha-# min(actor_summary.single_case_public_id) as case_public_id, alpha-# min(actor_summary.single_case_id) as case_id, alpha-# sum(actor_summary.case_count)as case_count alpha-# from alpha-# actor, alpha-# actor_summary, alpha-# data_source alpha-# where alpha-# actor.actor_id = actor_summary.actor_id alpha-# and data_source.source_id = actor.source_id alpha-# and actor_full_name_uppercase like upper('sanders%') alpha-# group by alpha-# actor.actor_id alpha-# order by alpha-# min(actor.actor_full_name_uppercase), alpha-# case_count desc, alpha-# min(actor_summary.case_disp_global_code) alpha-# limit alpha-# 1000 alpha-# ; QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------------------------------------------- ------------------------------- Limit (cost=2555.58..2555.59 rows=1 width=547) (actual time=48841.76..48842.90 rows=1000 loops=1) -> Sort (cost=2555.58..2555.59 rows=1 width=547) (actual time=48841.76..48842.18 rows=1001 loops=1) Sort Key: min((actor.actor_full_name_uppercase)::text), sum(actor_summary.case_count), min((actor_summary.case_disp_global_code)::text) -> Aggregate (cost=2555.50..2555.57 rows=1 width=547) (actual time=48604.17..48755.28 rows=3590 loops=1) -> Group (cost=2555.50..2555.50 rows=1 width=547) (actual time=48604.04..48647.91 rows=3594 loops=1) -> Sort (cost=2555.50..2555.50 rows=1 width=547) (actual time=48604.01..48605.70 rows=3594 loops=1) Sort Key: actor.actor_id -> Nested Loop (cost=1.14..2555.49 rows=1 width=547) (actual time=69.09..48585.83 rows=3594 loops=1) -> Hash Join (cost=1.14..900.39 rows=204 width=475) (actual time=46.92..15259.02 rows=3639 loops=1) Hash Cond: ("outer".source_id = "inner".source_id) -> Index Scan using actor_full_name_uppercase on actor (cost=0.00..895.04 rows=222 width=463) (actual time=46.54..15220.77 rows=3639 loops=1) Index Cond: ((actor_full_name_uppercase >= 'SANDERS'::character varying) AND (actor_full_name_uppercase < 'SANDERT'::character varying)) Filter: (actor_full_name_uppercase ~~ 'SANDERS%'::text) -> Hash (cost=1.11..1.11 rows=11 width=12) (actual time=0.05..0.05 rows=0 loops=1) -> Seq Scan on data_source (cost=0.00..1.11 rows=11 width=12) (actual time=0.02..0.04 rows=11 loops=1) -> Index Scan using actor_summary_pk on actor_summary (cost=0.00..8.11 rows=1 width=72) (actual time=9.14..9.15 rows=1 loops=3639) Index Cond: ("outer".actor_id = actor_summary.actor_id) Total runtime: 48851.85 msec (18 rows) --------------------------------------------------------------------- Nick Fankhauser nickf@doxpop.com Phone 1.765.965.7363 Fax 1.765.962.9788 doxpop - Court records at your fingertips - http://www.doxpop.com/
pgsql-performance by date:
Next
From: "David Shadovitz"Date:
Subject: Why is VACUUM ANALYZE