Thread: Index not used in join.. (example included)
Hi, I don't understand why my query doesn't use the index on the name column: CREATE TABLE celeb ( id serial, name varchar(255) NOT NULL, PRIMARY KEY (id) ); create index celeb_name_index on celeb (name); INSERT INTO celeb (name) values ('Cameron Diaz'); INSERT INTO celeb (name) values ('Angelina Jolie'); INSERT INTO celeb (name) values ('Reese Witherspoon'); CREATE TABLE detail ( id serial, name varchar(255) NOT NULL, award date, PRIMARY KEY (id) ); create index detail_name_index on detail (name); INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-01-01'); INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-02-01'); INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-03-01'); INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-01-01'); INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-02-01'); INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-03-01'); INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-04-01'); INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-01-01'); INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-02-01'); INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-03-01'); INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-04-01'); INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-05-01'); stium=# explain select * from celeb left join detail on detail.name = celeb.name; QUERY PLAN ----------------------------------------------------------------------- Hash Left Join (cost=11.75..25.30 rows=140 width=1044) Hash Cond: (("outer".name)::text = ("inner".name)::text) -> Seq Scan on celeb (cost=0.00..11.40 rows=140 width=520) -> Hash (cost=11.40..11.40 rows=140 width=524) -> Seq Scan on detail (cost=0.00..11.40 rows=140 width=524) (5 rows) Now, why does it use sequential scans?? I don't get it. It also does this on a very large table which is ANALYZE'd. Thanks, Ron -- NeoNova BV innovatieve internetoplossingen http://www.neonova.nl Science Park 140 1098 XG Amsterdam info: 020-5611300 servicedesk: 020-5611302 fax: 020-5611301 KvK Amsterdam 34151241 Op dit bericht is de volgende disclaimer van toepassing: http://www.neonova.nl/maildisclaimer
In response to Ron Arts : > Hi, > > I don't understand why my query doesn't use the index on the name column: > > CREATE TABLE celeb ( > id serial, > name varchar(255) NOT NULL, > PRIMARY KEY (id) > ); > > create index celeb_name_index on celeb (name); > INSERT INTO celeb (name) values ('Cameron Diaz'); > INSERT INTO celeb (name) values ('Angelina Jolie'); > INSERT INTO celeb (name) values ('Reese Witherspoon'); > > CREATE TABLE detail ( > id serial, > name varchar(255) NOT NULL, > award date, > PRIMARY KEY (id) > ); > > create index detail_name_index on detail (name); > INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-01-01'); > INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-02-01'); > INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-03-01'); > INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-01-01'); > INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-02-01'); > INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-03-01'); > INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-04-01'); > INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-01-01'); > INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-02-01'); > INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-03-01'); > INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-04-01'); > INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-05-01'); > > > stium=# explain select * from celeb left join detail on detail.name = > celeb.name; > QUERY PLAN > ----------------------------------------------------------------------- > Hash Left Join (cost=11.75..25.30 rows=140 width=1044) > Hash Cond: (("outer".name)::text = ("inner".name)::text) > -> Seq Scan on celeb (cost=0.00..11.40 rows=140 width=520) > -> Hash (cost=11.40..11.40 rows=140 width=524) > -> Seq Scan on detail (cost=0.00..11.40 rows=140 width=524) > (5 rows) > > > Now, why does it use sequential scans?? I don't get it. It also does this > on a very large table which is ANALYZE'd. First, bad design. Both tables contains the text-column name, waste of space. You should better use the PK from celeb as foreign key in detail. Furthermore, all distinct values for name in detail are in the result set, so a seq-scan is the best solution. You can try to force an index-scan by disabling the seq-scan (set enable_seqscan='no';) and re-run the query. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
A. Kretschmer schreef: > In response to Ron Arts : >> Hi, >> >> I don't understand why my query doesn't use the index on the name column: >> >> CREATE TABLE celeb ( >> id serial, >> name varchar(255) NOT NULL, >> PRIMARY KEY (id) >> ); >> >> create index celeb_name_index on celeb (name); >> INSERT INTO celeb (name) values ('Cameron Diaz'); >> INSERT INTO celeb (name) values ('Angelina Jolie'); >> INSERT INTO celeb (name) values ('Reese Witherspoon'); >> >> CREATE TABLE detail ( >> id serial, >> name varchar(255) NOT NULL, >> award date, >> PRIMARY KEY (id) >> ); >> >> create index detail_name_index on detail (name); >> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-01-01'); >> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-02-01'); >> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-03-01'); >> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-01-01'); >> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-02-01'); >> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-03-01'); >> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-04-01'); >> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-01-01'); >> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-02-01'); >> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-03-01'); >> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-04-01'); >> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-05-01'); >> >> >> stium=# explain select * from celeb left join detail on detail.name = >> celeb.name; >> QUERY PLAN >> ----------------------------------------------------------------------- >> Hash Left Join (cost=11.75..25.30 rows=140 width=1044) >> Hash Cond: (("outer".name)::text = ("inner".name)::text) >> -> Seq Scan on celeb (cost=0.00..11.40 rows=140 width=520) >> -> Hash (cost=11.40..11.40 rows=140 width=524) >> -> Seq Scan on detail (cost=0.00..11.40 rows=140 width=524) >> (5 rows) >> >> >> Now, why does it use sequential scans?? I don't get it. It also does this >> on a very large table which is ANALYZE'd. > > First, bad design. Both tables contains the text-column name, waste of > space. You should better use the PK from celeb as foreign key in detail. > Yes, bad design, although in the real tables there are multiple records for each celeb. Still bad design, I know. > Furthermore, all distinct values for name in detail are in the result > set, so a seq-scan is the best solution. > If I add a column to celeb, say 'birthdate', and use that in the where clause it still uses seq scan, even though not all distinct values are in the result set, aren't they? stium=# explain select * from celeb left join detail on detail.name = celeb.name where celeb.birthdate > '1973-01-01'; QUERY PLAN ----------------------------------------------------------------------- Merge Left Join (cost=29.45..30.85 rows=47 width=1048) Merge Cond: ("outer"."?column4?" = "inner"."?column4?") -> Sort (cost=13.06..13.17 rows=47 width=524) Sort Key: (celeb.name)::text -> Seq Scan on celeb (cost=0.00..11.75 rows=47 width=524) Filter: (birthdate > '1973-01-01'::date) -> Sort (cost=16.39..16.74 rows=140 width=524) Sort Key: (detail.name)::text -> Seq Scan on detail (cost=0.00..11.40 rows=140 width=524) (9 rows) I'd expect it would take all records above the given date, join the records from detail using the index on that. Am I wrong? > You can try to force an index-scan by disabling the seq-scan (set > enable_seqscan='no';) and re-run the query. That does force it to use the index, but doesn't speed up my query. > > Andreas Thanks for responding. Ron -- NeoNova BV innovatieve internetoplossingen http://www.neonova.nl Science Park 140 1098 XG Amsterdam info: 020-5611300 servicedesk: 020-5611302 fax: 020-5611301 KvK Amsterdam 34151241 Op dit bericht is de volgende disclaimer van toepassing: http://www.neonova.nl/maildisclaimer
2009/9/18 Ron Arts <ron.arts@neonova.nl>: > A. Kretschmer schreef: [...] >> You can try to force an index-scan by disabling the seq-scan (set >> enable_seqscan='no';) and re-run the query. > > That does force it to use the index, but doesn't speed up my query. Would it help to add an index on the name column of the detail table? And maybe to make it a foreign key referencing the first table? -- Michael Wood <esiotrot@gmail.com>
2009/9/18 Michael Wood <esiotrot@gmail.com>: > 2009/9/18 Ron Arts <ron.arts@neonova.nl>: >> A. Kretschmer schreef: > [...] >>> You can try to force an index-scan by disabling the seq-scan (set >>> enable_seqscan='no';) and re-run the query. >> >> That does force it to use the index, but doesn't speed up my query. > > Would it help to add an index on the name column of the detail table? > And maybe to make it a foreign key referencing the first table? Sorry, I just had another look and see you do have an index on the name column in detail. I wonder if it would make a difference if the index on the first table was a unique index. -- Michael Wood <esiotrot@gmail.com>
Michael Wood schreef: > 2009/9/18 Michael Wood <esiotrot@gmail.com>: >> 2009/9/18 Ron Arts <ron.arts@neonova.nl>: >>> A. Kretschmer schreef: >> [...] >>>> You can try to force an index-scan by disabling the seq-scan (set >>>> enable_seqscan='no';) and re-run the query. >>> That does force it to use the index, but doesn't speed up my query. >> Would it help to add an index on the name column of the detail table? >> And maybe to make it a foreign key referencing the first table? > > Sorry, I just had another look and see you do have an index on the > name column in detail. > > I wonder if it would make a difference if the index on the first table > was a unique index. > > -- > Michael Wood <esiotrot@gmail.com> > Well, the name columns are not unique in both tables. So I cannot make it an unique index. Ron -- NeoNova BV innovatieve internetoplossingen http://www.neonova.nl Science Park 140 1098 XG Amsterdam info: 020-5611300 servicedesk: 020-5611302 fax: 020-5611301 KvK Amsterdam 34151241 Op dit bericht is de volgende disclaimer van toepassing: http://www.neonova.nl/maildisclaimer
Ron Arts <ron.arts@neonova.nl> writes: > A. Kretschmer schreef: >> You can try to force an index-scan by disabling the seq-scan (set >> enable_seqscan='no';) and re-run the query. > That does force it to use the index, but doesn't speed up my query. IOW, the planner chose the right plan. Don't complain... I think the real answer here is that the planner thinks the table is too small to bother with an index. If you load up a non-toy amount of data you might get a different plan. regards, tom lane
Sorry. Maybe I should refrase my question. I have two tables A and B. Each table contains a non-unique indexed varchar column. Table A contains around 500000 rows table B contains around 4 million rows for a given value of col there are typically 3 records in A and 20 records in B (give or take) I want to do something like this: select a.somecol from A left join B on B.col = A.col where A.date > '2001-01-01'; But this query runs for hours on an eight core server with 4G etc etc. I configured PG to use 128Mb shared memory, but fiddling with that value doesn't change much.. And I continuously see the planner choosing for seq scans. I don't get it. Ron -- NeoNova BV innovatieve internetoplossingen http://www.neonova.nl Science Park 140 1098 XG Amsterdam info: 020-5611300 servicedesk: 020-5611302 fax: 020-5611301 KvK Amsterdam 34151241 Op dit bericht is de volgende disclaimer van toepassing: http://www.neonova.nl/maildisclaimer
Ron Arts <ron.arts@neonova.nl> writes: > I have two tables A and B. > Each table contains a non-unique indexed varchar column. > Table A contains around 500000 rows > table B contains around 4 million rows > for a given value of col there are typically 3 records in A and > 20 records in B (give or take) > I want to do something like this: > select a.somecol from A left join B on B.col = A.col where A.date > '2001-01-01'; > But this query runs for hours on an eight core server with 4G etc etc. > I configured PG to use 128Mb shared memory, but fiddling with that value > doesn't change much.. > And I continuously see the planner choosing for seq scans. > I don't get it. Given those numbers, index scans wouldn't help. You have not shown us the EXPLAIN results, but I suspect that the best plan is a hash join, or possibly a merge join. In either case the way to make it go faster is to raise work_mem --- there is no other parameter that is likely to change things much. You could probably profitably use work_mem up to 100MB or so on this problem. I would not recommend setting work_mem so high as a global setting, but you can set it locally in the session that's doing this join. regards, tom lane