More question about plans & explain (long) - Mailing list pgsql-admin
From | Nick Fankhauser |
---|---|
Subject | More question about plans & explain (long) |
Date | |
Msg-id | NEBBLAAHGLEEPCGOBHDGMEOFELAA.nickf@ontko.com Whole thread Raw |
Responses |
Re: More question about plans & explain (long)
|
List | pgsql-admin |
Thanks to everyone who responded to my question about outer joins. Now I have some more <grin>. I'm trying to learn what explain can tell me & whether there are any reasonable ways to push it's choices in one direction based on my knowledge of the tables without really screwing up general performance. I haven't found many resources on the explain info in the documentation, so I'm making a few guesses about what is meant. Corrections are, of course, welcome! I'm trying to get better performance out of this query: explain select event.event_date_time, event.event_duration, event.event_ical_status, event.event_location_name, event.event_type_code, event.event_hearing_type_desc, event.event_summary, case_data.case_public_id || ' ' ||case_data.case_title, court_config.court_name, event.event_id, case_data.case_id, court_config.court_id, actor.actor_id, actor_identifier.actor_identifier_text, actor_identifier.actor_identifier_type from actor_identifier, actor, actor_case_assignment, case_data, event, court_config where actor_identifier.actor_id = actor.actor_id and actor.actor_id = actor_case_assignment.actor_id and actor_case_assignment.case_id = case_data.case_id and case_data.case_id = event.case_id and case_data.court_id = court_config.court_id and actor_identifier_text = '7313 53' and actor_identifier_type = 'AttorneyStateBarID' and event_date_time > '4/1/2002' and event_date_time < '6/1/2002'; All of the columns used for a join or constraint have an index. A vaccum analyze was done just before this explain. The explain looks like this (I've numbered the lines so I can refer to them): 1)Merge Join (cost=1399914.02..1460681.09 rows=154522243 width=248) 2) -> Sort (cost=1374154.11..1374154.11 rows=3605411 width=72) 3) -> Merge Join (cost=138124.99..144473.64 rows=3605411 width=72) 4) -> Sort (cost=93198.01..93198.01 rows=118189 width=60) 5) -> Hash Join (cost=3285.17..75120.12 rows=118189 width=60) 6) -> Seq Scan on actor_case_assignment (cost=0.00..18880.77 rows=814677 width=24) 7) -> Hash (cost=3285.13..3285.13 rows=15 width=36) 8) -> Index Scan using actor_identifier_actor_id_text on actor_identifier (cost=0.00..3285.13 rows=15 width=36) 9) -> Sort (cost=44926.98..44926.98 rows=305054 width=12) 10) -> Seq Scan on actor (cost=0.00..7728.54 rows=305054 width=12) 11) -> Materialize (cost=26017.06..26017.06 rows=428584 width=176) 12) -> Merge Join (cost=25759.91..26017.06 rows=428584 width=176) 13) -> Sort (cost=18978.66..18978.66 rows=18219 width=72) 14) -> Hash Join (cost=1.12..17251.58 rows=18219 width=72) 15) -> Seq Scan on case_data (cost=0.00..5407.91 rows=182191 width=48) 16) -> Hash (cost=1.10..1.10 rows=10 width=24) 17) -> Seq Scan on court_config (cost=0.00..1.10 rows=10 width=24) 18) -> Sort (cost=6781.25..6781.25 rows=2352 width=104) 19) -> Index Scan using event_event_date_time on event (cost=0.00..6649.52 rows=2352 width=104) Here are my questions- (14-17)It looks like on lines 14-17, case_data and court_config are being joined without any constraint, thus returning 182191 rows, but the join & sort on 13 & 14 indicate 18219 rows are expected - why does the planner expect 1/10th of what I expect? Is there a way to affect this expectation? 10)This table has a unique index on the field being scanned, and this field is also the only row returned from the table- why is a seq scan chosen here? 15)This table has a unique index on the field being scanned, and only 3 rows are returned to the query- why the seq scan? (18-19)Is the row count of 2352 from lines 18 and 19 based on a reasonable guess that out of the 235239 rows in the table, we'll select one out of 100? Again, where does this number come from, and can I affect it? 12)Line 12 seems to indicate that the planner expects this join to produce more rows rather than fewer as in the case of line 14 and line 18. In fact, the real situation is that the join on line 14 will produce exactly 182191 rows, and since the results of line 19 will produce 8537 rows, 12 will also produce about 8537 rows. So... how does the planner decide whether a join will increase or decrease the number of rows? 8)The planner seems to expect 15 rows when the table actually has 66559. This is so much different from the 1:100 ratio in 19 that I wonder how it was arrived at? (It *is* a very good guess, since the actual number retturn will be 1.) 6)Indicates a seq scan occuring on a field that is indexed. The table has 800K rows, but it is only 4 columns wide. Is the narrow width making this choice a good one? 5)I would expect this join to typically result in 1 to 1000 cases, so the estimate seems very high in this case. Some other questions- What is the difference between a hash join & a merge join? Does materialize fetch the other fields given a list of index values? Does a seq scan read the entire table, or can it just retrieve the field? I guess that's enough questions for now. If you've made it this far, thanks for reading all of this! I look forward to hearing your thoughts. -Nick -------------------------------------------------------------------------- Nick Fankhauser nickf@ontko.com Phone 1.765.935.4283 Fax 1.765.962.9788 Ray Ontko & Co. Software Consulting Services http://www.ontko.com/
pgsql-admin by date: