A plan returned by explain doesn't make sense to me - Mailing list pgsql-admin
From | Nick Fankhauser |
---|---|
Subject | A plan returned by explain doesn't make sense to me |
Date | |
Msg-id | NEBBLAAHGLEEPCGOBHDGIEFNELAA.nickf@ontko.com Whole thread Raw |
Responses |
Re: A plan returned by explain doesn't make sense to me
|
List | pgsql-admin |
Hi- I'm trying to squeeze better performance out of a query, but the explain returns a plan that really puzzles me. A little background is in order- the table that I think is central to my problem is what I call an "assignment table" I'm working with court data, so I have people and cases. A person can be involved in 0 or more cases and a case can involve 1 or more persons. To represent this, I have the actor_case_assignment table, which is essentially three fields- a primary key, an actor_id (person), and a case_id. I have four indexes: One for each id, and then a *unique* index on (actor_id, case_id) because although either id on it's own may be duplicated many times in the assignment table, no person should ever be assigned to a case more than once, The problem is that I have a view using this table that runs way too slow. It looks like this: create view criminal_detail as select case_data.case_id, case_data.court_id, actor_case_assignment.actor_id, charge.charge_count, charge.charge_number, charge.charge_type, charge.charge_version, charge.charge_class, charge.charge_text, charge.charge_qualifier, charge.charge_code, charge.charge_desc, charge.plea_code, charge.plea_desc, charge.charge_date, charge.statute_reference, charge.charge_status_desc, disposition_number, cr_disposition_date, cr_disposition_desc, cr_disposition_code, cr_disposition_type, cr_disposition_mode from (charge left outer join criminal_disposition on (charge.charge_id = criminal_disposition.charge_id) ), actor_case_assignment, case_data where actor_case_assignment.actor_case_assignment_id = charge.actor_case_assignment_id and actor_case_assignment.case_id = case_data.case_id ; The explain looks like this: develop=# explain SELECT * FROM CRIMINAL_DETAIL WHERE case_id = '102SC01353'; NOTICE: QUERY PLAN: Hash Join (cost=155.06..9263.84 rows=1 width=320) -> Hash Join (cost=120.53..6034.05 rows=79880 width=260) -> Seq Scan on charge (cost=0.00..2664.80 rows=79880 width=184) -> Hash (cost=109.82..109.82 rows=4282 width=76) -> Seq Scan on criminal_disposition (cost=0.00..109.82 rows=4282 width=76) -> Hash (cost=34.53..34.53 rows=4 width=60) -> Nested Loop (cost=0.00..34.53 rows=4 width=60) -> Index Scan using case_data_case_id on case_data (cost=0.00..4.01 rows=1 width=24) -> Index Scan using actor_case_assignment_both on actor_case_assignment (cost=0.00..30.42 rows=7 width=36) The part that looks wrong to me is that on the last line of the explain, it is using the index "actor_case_assignment_both" (the index on actor_id,case_id) even though actor_id is never referenced in this query and and index on case_id alone is available. Looking from the outside in, this query should be very fast- an index scan on case_id reduces the number of matches in actor_case_assignment to about 5, and then then joins these 5 rows to other tables based on indexed primary (unique) keys. The question is, how can I re-arrange my query to help the planner come to the same conclusion, and most puzzling, why does the planner choose an index that involves actor_id? Many thanks to those of you who read through all of this! Any suggestions? -Nick -------------------------------------------------------------------------- Nick Fankhauser nickf@ontko.com Phone 1.765.935.4283 Fax 1.765.962.9788 Ray Ontko & Co. Software Consulting Services http://www.ontko.com/
pgsql-admin by date: