Thread: Query Plan - Index Scan & Seq Scan
When joining two tables the query plan is doing a seq scan rather than index scan. I do have indexes on the columns used for joining the tables. Example: SELECT a.id FROM a, b WHERE a.id = b.id; QUERY PLAN ----------------------------------------------------------------------------------------- Hash Join (cost=13865.30..326413.23 rows=6451 width=18) Hash Cond: ("outer".id = "inner".id) -> Seq Scan on a (cost=0.00..125076.37 rows=6450937 width=18) -> Hash (cost=10168.64..10168.64 rows=500664 width=4) -> Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4) (5 rows) The planner used to perform a index scan. I have added a lot of data in those two tables. Right now both tables have millions of records. After adding the new records the planner is going for a seq scan while doing the join. Is there any tunning I can do so that the query planner would do a index scan? I did a vacuum analyze but no change. Thanks, -Prasanth.
Prasanth <dbadmin@nqadmin.com> writes: > Hash Join (cost=13865.30..326413.23 rows=6451 width=18) > Hash Cond: ("outer".id = "inner".id) > -> Seq Scan on a (cost=0.00..125076.37 rows=6450937 width=18) > -> Hash (cost=10168.64..10168.64 rows=500664 width=4) > -> Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4) > (5 rows) > The planner used to perform a index scan. I have added a lot of data in those > two tables. Right now both tables have millions of records. It is highly unlikely that you want an index scan for joining millions of records ... However, if you do have millions in both tables, why does the planner think there are only 500664 rows in b? Maybe you are overdue for ANALYZE. regards, tom lane
On Thu, 2005-05-12 at 10:05, Prasanth wrote: > When joining two tables the query plan is doing a seq scan rather than index > scan. I do have indexes on the columns used for joining the tables. > > Example: > SELECT a.id FROM a, b WHERE a.id = b.id; > > QUERY PLAN > ----------------------------------------------------------------------------------------- > Hash Join (cost=13865.30..326413.23 rows=6451 width=18) > Hash Cond: ("outer".id = "inner".id) > -> Seq Scan on a (cost=0.00..125076.37 rows=6450937 width=18) > -> Hash (cost=10168.64..10168.64 rows=500664 width=4) > -> Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4) > (5 rows) > > > The planner used to perform a index scan. I have added a lot of data in those > two tables. Right now both tables have millions of records. After adding the new > records the planner is going for a seq scan while doing the join. > > Is there any tunning I can do so that the query planner would do a index scan? > > I did a vacuum analyze but no change. try this: explain analyze SELECT a.id FROM a, b WHERE a.id = b.id; set enable_seqscan=off; explain analyze SELECT a.id FROM a, b WHERE a.id = b.id; and see which is faster. It's quite likely that using an index here makes no sense, since there's no selectivity happening, and you need all the data anyway.
Thanks for the prompt reply. Table a has about 6 million and table b had a little more than half a million. Sorry I wasn't exact about my numbers before. I will be having the where conditions on both the tables that would bring down the count drastically. Even in this case the planner is going for a seq scan. A where condition I always use is shown below. This is bringing down the number of rows from 6.5m to 1210. I have an index on code also. Even here it is going for seq scan. EXPLAIN ANALYZE SELECT count(*) fROM a where Code >2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=141321.09..141321.09 rows=1 width=0) (actual time=6454.063..6454.064 rows=1 loops=1) -> Seq Scan on a (cost=0.00..141318.40 rows=1076 width=0) (actual time=15.687..6453.292 rows=1210 loops=1) Filter: (code > 2) Total runtime: 6454.140 ms (4 rows) Below is example where I have where conditions on both the tables this in effect is limiting the number of rows from each tables a & b to 171 & 1076 respectively. EXPLAIN (SELECT fund_value FROM b INNER JOIN a ON a.id = b.id WHERE code >2 AND b.account_id = 16221); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=141372.58..141462.28 rows=1 width=8) Merge Cond: ("outer".id = "inner".id) -> Index Scan using id_idx on b (cost=0.00..14415.96 rows=171 width=4) Filter: (account_id = 16221) -> Sort (cost=141372.58..141375.27 rows=1076 width=12) Sort Key: a.id -> Seq Scan on a (cost=0.00..141318.40 rows=1076 width=12) Filter: (code > 2) (8 rows) Thanks, -Prasanth. Tom Lane wrote: > Prasanth <dbadmin@nqadmin.com> writes: > >> Hash Join (cost=13865.30..326413.23 rows=6451 width=18) >> Hash Cond: ("outer".id = "inner".id) >> -> Seq Scan on a (cost=0.00..125076.37 rows=6450937 width=18) >> -> Hash (cost=10168.64..10168.64 rows=500664 width=4) >> -> Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4) >>(5 rows) > > >>The planner used to perform a index scan. I have added a lot of data in those >>two tables. Right now both tables have millions of records. > > > It is highly unlikely that you want an index scan for joining millions > of records ... > > However, if you do have millions in both tables, why does the planner > think there are only 500664 rows in b? Maybe you are overdue for ANALYZE. > > regards, tom lane > >
I agree with you. But I have the where conditions on the tables I was expecting the planner to user index scan but it went for seq scan. I did a little testing using what you said. Below are the results. SELECT a.id FROM a,b WHERE a.id = b.id AND a.code >2 AND b.account_id = 16221; Total runtime: 18194.936 ms Then I set the seqscan off and ran the same query. Total runtime: 27.554 ms Thanks, -Prasanth. Scott Marlowe wrote: > On Thu, 2005-05-12 at 10:05, Prasanth wrote: > >>When joining two tables the query plan is doing a seq scan rather than index >>scan. I do have indexes on the columns used for joining the tables. >> >>Example: >>SELECT a.id FROM a, b WHERE a.id = b.id; >> >> QUERY PLAN >>----------------------------------------------------------------------------------------- >> Hash Join (cost=13865.30..326413.23 rows=6451 width=18) >> Hash Cond: ("outer".id = "inner".id) >> -> Seq Scan on a (cost=0.00..125076.37 rows=6450937 width=18) >> -> Hash (cost=10168.64..10168.64 rows=500664 width=4) >> -> Seq Scan on b (cost=0.00..10168.64 rows=500664 width=4) >>(5 rows) >> >> >>The planner used to perform a index scan. I have added a lot of data in those >>two tables. Right now both tables have millions of records. After adding the new >>records the planner is going for a seq scan while doing the join. >> >>Is there any tunning I can do so that the query planner would do a index scan? >> >>I did a vacuum analyze but no change. > > > try this: > > explain analyze SELECT a.id FROM a, b WHERE a.id = b.id; > set enable_seqscan=off; > explain analyze SELECT a.id FROM a, b WHERE a.id = b.id; > > and see which is faster. > > It's quite likely that using an index here makes no sense, since there's > no selectivity happening, and you need all the data anyway. > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > >
On Thu, 2005-05-12 at 10:51, Prasanth wrote: > I agree with you. > > But I have the where conditions on the tables I was expecting the planner to > user index scan but it went for seq scan. > > I did a little testing using what you said. > > Below are the results. > > SELECT a.id FROM a,b WHERE a.id = b.id AND a.code >2 AND b.account_id = 16221; > > Total runtime: 18194.936 ms > > Then I set the seqscan off and ran the same query. > > Total runtime: 27.554 ms Good! This tells us two things, 1: Your database can use the indexes (sometimes indexes can't be used for various reasons, which are quickly disappearing by the way.) and 2: Your database is making the wrong choice about when to use a seq scan versus an index. What does the explain analyze output from that query say about row estimates versus actual rows returned?
Prasanth <dbadmin@nqadmin.com> writes: > A where condition I always use is shown below. This is bringing down the number > of rows from 6.5m to 1210. I have an index on code also. Even here it is going > for seq scan. > EXPLAIN ANALYZE SELECT count(*) fROM a where Code >2; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=141321.09..141321.09 rows=1 width=0) (actual > time=6454.063..6454.064 rows=1 loops=1) > -> Seq Scan on a (cost=0.00..141318.40 rows=1076 width=0) (actual > time=15.687..6453.292 rows=1210 loops=1) > Filter: (code > 2) > Total runtime: 6454.140 ms > (4 rows) I'm going to hazard a guess that "code" is not of type integer, and that you're using a pre-8.0 PG release. Cross-type comparisons are not indexable before 8.0, so you need to cast the integer constant 2 to whatever type "code" is. regards, tom lane
EXPLAIN ANALYZE (SELECT id FROM a,b WHERE a.id = b.id AND code >2 AND b.account_id = 16221); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=141372.58..141462.28 rows=1 width=8) (actual time=726.172..726.172 rows=0 loops=1) Merge Cond: ("outer".id = "inner".id) -> Index Scan using id_idx on b (cost=0.00..14415.96 rows=171 width=4) (actual time=726.168..726.168 rows=0 loops=1) Filter: (account_id = 16221) -> Sort (cost=141372.58..141375.27 rows=1076 width=12) (never executed) Sort Key: a.id -> Seq Scan on a (cost=0.00..141318.40 rows=1076 width=12) (never executed) Filter: (code > 2) Total runtime: 726.253 ms (9 rows) Thanks, -Prasanth. Scott Marlowe wrote: > On Thu, 2005-05-12 at 10:51, Prasanth wrote: > >>I agree with you. >> >>But I have the where conditions on the tables I was expecting the planner to >>user index scan but it went for seq scan. >> >>I did a little testing using what you said. >> >>Below are the results. >> >>SELECT a.id FROM a,b WHERE a.id = b.id AND a.code >2 AND b.account_id = 16221; >> >>Total runtime: 18194.936 ms >> >>Then I set the seqscan off and ran the same query. >> >>Total runtime: 27.554 ms > > > Good! This tells us two things, 1: Your database can use the indexes > (sometimes indexes can't be used for various reasons, which are quickly > disappearing by the way.) and 2: Your database is making the wrong > choice about when to use a seq scan versus an index. > > What does the explain analyze output from that query say about row > estimates versus actual rows returned? > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > >
code data type is int2. I am sorry that I did not mention the version number I am using 7.4.7. Thanks, -Prasanth. Tom Lane wrote: > Prasanth <dbadmin@nqadmin.com> writes: > >>A where condition I always use is shown below. This is bringing down the number >>of rows from 6.5m to 1210. I have an index on code also. Even here it is going >>for seq scan. > > >>EXPLAIN ANALYZE SELECT count(*) fROM a where Code >2; >> QUERY PLAN >>------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=141321.09..141321.09 rows=1 width=0) (actual >>time=6454.063..6454.064 rows=1 loops=1) >> -> Seq Scan on a (cost=0.00..141318.40 rows=1076 width=0) (actual >>time=15.687..6453.292 rows=1210 loops=1) >> Filter: (code > 2) >> Total runtime: 6454.140 ms >>(4 rows) > > > I'm going to hazard a guess that "code" is not of type integer, and that > you're using a pre-8.0 PG release. Cross-type comparisons are not > indexable before 8.0, so you need to cast the integer constant 2 to > whatever type "code" is. > > regards, tom lane > >
Prasanth <dbadmin@nqadmin.com> writes: > code data type is int2. Ah-hah. So "where code > 2::int2" should work noticeably better for you. Or you could do "where code > '2'" to avoid hard-wiring the data type knowledge into your queries. Or just change it to int4 ;-) Or update to 8.0. regards, tom lane
Sorry seems like I am missing some thing here. What is the difference between int2 & int4 as far as index scan is concerned? I did try to update to 8.0 but it is not taking my dump from 7.4.7. If I remember right it was complaining about pg_database being not present. I posted a message on forums no response so I had to abort it. I will give it a short again when I have some time. I think pg_database belongs to pgadmin. I can drop those tables and create a dump but then if for some reason I have to restore a old dump them I will be in trouble. Thanks, -Prasanth. Tom Lane wrote: > Prasanth <dbadmin@nqadmin.com> writes: > >>code data type is int2. > > > Ah-hah. So "where code > 2::int2" should work noticeably better for > you. Or you could do "where code > '2'" to avoid hard-wiring the data > type knowledge into your queries. Or just change it to int4 ;-) > Or update to 8.0. > > regards, tom lane > >
Thanks Tom. Seems like having int2 after code is doing the trick. Can you please explain the reasons behind this. I Really appreciate your time. Thanks, -Prasanth. Tom Lane wrote: > Prasanth <dbadmin@nqadmin.com> writes: > >>code data type is int2. > > > Ah-hah. So "where code > 2::int2" should work noticeably better for > you. Or you could do "where code > '2'" to avoid hard-wiring the data > type knowledge into your queries. Or just change it to int4 ;-) > Or update to 8.0. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > >
Ok read about the indexes and type matches. So is this fixed in 8.0? Thanks, -Prasanth. Tom Lane wrote: > Prasanth <dbadmin@nqadmin.com> writes: > >>code data type is int2. > > > Ah-hah. So "where code > 2::int2" should work noticeably better for > you. Or you could do "where code > '2'" to avoid hard-wiring the data > type knowledge into your queries. Or just change it to int4 ;-) > Or update to 8.0. > > regards, tom lane > >
Prasanth <dbadmin@nqadmin.com> writes: > Seems like having int2 after code is doing the trick. > Can you please explain the reasons behind this. When you write "int2col > 2", the operator that is selected is int2-gt-int4. However the index on an int2 column can only deal with int2-gt-int2. 8.0 generalized the index mechanism enough to allow int2 indexes to deal with int2-gt-int4, but in earlier releases you have to take care to make the constant an int2. int8 columns have the same issue from the other direction ... regards, tom lane