Thread: Index scan never executed?
I have never been very good at reading these query plans, but I have a bit of a problem w/ my query. So any help is appreciated. The query is fairly self explanitory i think. 2 tables, call_results ( 6.5 Million records ) and lists ( 11 Million records ) weblink=# explain analyze weblink-# select count(*) as count weblink-# from call_results cr join lists l on cr.list_id = l.id weblink-# where cr.project_id = '55' weblink-# and cr.start_time between '4/4/2003 0:0' and now() weblink-# and l.list_of_lists_id = '691'; QUERY PLAN ---------------------------------------------------------------------------- ---- -------------------------------------------------------------------------Aggregate (cost=2519.58..2519.58 rows=1 width=16)(actual time=110715.45..110715.46 rows=1 loops=1) -> Nested Loop (cost=0.00..2519.58 rows=1 width=16) (actual time=110715.43..110715.43 rows=0 loops=1) -> Index Scan using start_time_idx on call_results cr (cost=0.00..2021.00 rows=164 width=8) (actual time=110715.42..110715.42 rows=0 loops=1) Index Cond: ((start_time >= '2003-04-04 00:00:00-07'::timestamp with time zone) AND (start_time <= now())) Filter: (project_id = 55::bigint) -> Index Scan using lists_pkey on lists l (cost=0.00..3.03 rows=1 width=8) (never executed) Index Cond: ("outer".list_id = l.id) Filter: (list_of_lists_id = 691::bigint)Totalruntime: 110747.58 msec (9 rows) The big thing I dont understand is why it tells me (never executed) on lists_pkey. I also dont see where all the time is being taken up. I thought that (actual time=110715.42..110715.42) meant from millisecond this... TO millisecond that, but that would mean that this index scan took no time. So as you can see I am very confused. :-) TIA for any suggestions on how to make this query faster. Chad PS I have run vacuum full and analyze as reciently as last night :-)
On Thu, 29 May 2003, Chad Thompson wrote: > I have never been very good at reading these query plans, but I have a bit > of a problem w/ my query. So any help is appreciated. > > The query is fairly self explanitory i think. 2 tables, call_results ( 6.5 > Million records ) and lists ( 11 Million records ) > > weblink=# explain analyze > weblink-# select count(*) as count > weblink-# from call_results cr join lists l on cr.list_id = l.id > weblink-# where cr.project_id = '55' > weblink-# and cr.start_time between '4/4/2003 0:0' and now() > weblink-# and l.list_of_lists_id = '691'; > > > QUERY PLAN > ---------------------------------------------------------------------------- > ---- > ------------------------------------------------------------------------- > Aggregate (cost=2519.58..2519.58 rows=1 width=16) (actual > time=110715.45..110715.46 rows=1 loops=1) > -> Nested Loop (cost=0.00..2519.58 rows=1 width=16) (actual > time=110715.43..110715.43 rows=0 loops=1) > -> Index Scan using start_time_idx on call_results cr > (cost=0.00..2021.00 rows=164 width=8) (actual time=110715.42..110715.42 > rows=0 loops=1) > Index Cond: ((start_time >= '2003-04-04 > 00:00:00-07'::timestamp with time zone) AND (start_time <= now())) > Filter: (project_id = 55::bigint) > -> Index Scan using lists_pkey on lists l (cost=0.00..3.03 rows=1 > width=8) (never executed) > Index Cond: ("outer".list_id = l.id) > Filter: (list_of_lists_id = 691::bigint) > Total runtime: 110747.58 msec > (9 rows) > > The big thing I dont understand is why it tells me (never executed) on > lists_pkey. It's doing a nested loop. AFAIK, it's saying for each row returned from the index scan on call_results it does a scan on lists. Since no rows were returned from call_results it didn't need to ever scan lists. I think you want an index on (start_time, project_id). It looks like there are lots of rows matching start_time conditions but none of those have project_id 55, so you'd be doing alot of extra disk access.
"Chad Thompson" <chad@weblinkservices.com> writes: > Aggregate (cost=2519.58..2519.58 rows=1 width=16) (actual > time=110715.45..110715.46 rows=1 loops=1) > -> Nested Loop (cost=0.00..2519.58 rows=1 width=16) (actual > time=110715.43..110715.43 rows=0 loops=1) > -> Index Scan using start_time_idx on call_results cr > (cost=0.00..2021.00 rows=164 width=8) (actual time=110715.42..110715.42 > rows=0 loops=1) > Index Cond: ((start_time >= '2003-04-04 > 00:00:00-07'::timestamp with time zone) AND (start_time <= now())) > Filter: (project_id = 55::bigint) > -> Index Scan using lists_pkey on lists l (cost=0.00..3.03 rows=1 > width=8) (never executed) > Index Cond: ("outer".list_id = l.id) > Filter: (list_of_lists_id = 691::bigint) > Total runtime: 110747.58 msec > (9 rows) > The big thing I dont understand is why it tells me (never executed) on > lists_pkey. Because it was never executed --- the outer scan on call_results produced zero rows, so there was never a reason to run the inner scan. A nestloop join scans the inner table once for each row output by the outer table. > I also dont see where all the time is being taken up. I thought that > (actual time=110715.42..110715.42) meant from millisecond this... TO > millisecond that, but that would mean that this index scan took no time. No, the first "actual time" number means the time required to produce the first output row. So the call_results scan is what is taking all the time (as you'd expect, given that there's nothing for any of the other steps to do). I guess it's a little unclear what to print for the first number when no rows are output at all. The code evidently is using the total time spent in the plan node, but I think it would be at least as justifiable to print a zero instead. Would you have found that less confusing? Anyone else have an opinion about whether to change that detail? > TIA for any suggestions on how to make this query faster. Doesn't seem like the index on start_time is doing you much good --- you're obviously scanning a huge lot of rows and not finding anything. Perhaps an index on project_id would be more helpful, or a two-column index on (project_id, start_time). regards, tom lane
> > I guess it's a little unclear what to print for the first number when no > rows are output at all. The code evidently is using the total time spent > in the plan node, but I think it would be at least as justifiable to > print a zero instead. Would you have found that less confusing? Anyone > else have an opinion about whether to change that detail? > No, that makes perfect sense now that it has been explained. > Perhaps an index on project_id would be more helpful, or a two-column > index on (project_id, start_time). As usual, you are in fine form. A two-column index brought the query from 115 sec to 6. Thanks! Chad
Hi, This may make the query faster. try to split the query into two parts: first fetch all list-id's into a temp table (where cr.project_id = '55' and cr.start_time between '4/4/2003 0:0' and now()) then, join the temp table on lists (where l.list_of_lists_id = '691'). Morten -----Opprinnelig melding----- Fra: Chad Thompson [mailto:chad@weblinkservices.com] Sendt: 29. mai 2003 19:20 Til: pgsql-sql Emne: [SQL] Index scan never executed? I have never been very good at reading these query plans, but I have a bit of a problem w/ my query. So any help is appreciated. The query is fairly self explanitory i think. 2 tables, call_results ( 6.5 Million records ) and lists ( 11 Million records ) weblink=# explain analyze weblink-# select count(*) as count weblink-# from call_results cr join lists l on cr.list_id = l.id weblink-# where cr.project_id = '55' weblink-# and cr.start_time between '4/4/2003 0:0' and now() weblink-# and l.list_of_lists_id = '691'; QUERY PLAN ---------------------------------------------------------------------------- ---- -------------------------------------------------------------------------Aggregate (cost=2519.58..2519.58 rows=1 width=16)(actual time=110715.45..110715.46 rows=1 loops=1) -> Nested Loop (cost=0.00..2519.58 rows=1 width=16) (actual time=110715.43..110715.43 rows=0 loops=1) -> Index Scan using start_time_idx on call_results cr (cost=0.00..2021.00 rows=164 width=8) (actual time=110715.42..110715.42 rows=0 loops=1) Index Cond: ((start_time >= '2003-04-04 00:00:00-07'::timestamp with time zone) AND (start_time <= now())) Filter: (project_id = 55::bigint) -> Index Scan using lists_pkey on lists l (cost=0.00..3.03 rows=1 width=8) (never executed) Index Cond: ("outer".list_id = l.id) Filter: (list_of_lists_id = 691::bigint)Totalruntime: 110747.58 msec (9 rows) The big thing I dont understand is why it tells me (never executed) on lists_pkey. I also dont see where all the time is being taken up. I thought that (actual time=110715.42..110715.42) meant from millisecond this... TO millisecond that, but that would mean that this index scan took no time. So as you can see I am very confused. :-) TIA for any suggestions on how to make this query faster. Chad PS I have run vacuum full and analyze as reciently as last night :-) ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly