Thread: Index not used in join.. (example included)

Index not used in join.. (example included)

From
Ron Arts
Date:
Hi,

I don't understand why my query doesn't use the index on the name column:

CREATE TABLE celeb (
  id serial,
  name varchar(255) NOT NULL,
  PRIMARY KEY  (id)
);

create index celeb_name_index on celeb (name);
INSERT INTO celeb (name) values ('Cameron Diaz');
INSERT INTO celeb (name) values ('Angelina Jolie');
INSERT INTO celeb (name) values ('Reese Witherspoon');

CREATE TABLE detail (
  id serial,
  name varchar(255) NOT NULL,
  award date,
  PRIMARY KEY  (id)
);

create index detail_name_index on detail (name);
INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-01-01');
INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-02-01');
INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-03-01');
INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-01-01');
INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-02-01');
INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-03-01');
INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-04-01');
INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-01-01');
INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-02-01');
INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-03-01');
INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-04-01');
INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-05-01');


stium=# explain select * from celeb left join detail on detail.name = celeb.name;
                               QUERY PLAN
-----------------------------------------------------------------------
  Hash Left Join  (cost=11.75..25.30 rows=140 width=1044)
    Hash Cond: (("outer".name)::text = ("inner".name)::text)
    ->  Seq Scan on celeb  (cost=0.00..11.40 rows=140 width=520)
    ->  Hash  (cost=11.40..11.40 rows=140 width=524)
          ->  Seq Scan on detail  (cost=0.00..11.40 rows=140 width=524)
(5 rows)


Now, why does it use sequential scans?? I don't get it. It also does this
on a very large table which is ANALYZE'd.

Thanks,
Ron


--
NeoNova BV
innovatieve internetoplossingen

http://www.neonova.nl  Science Park 140           1098 XG Amsterdam
info: 020-5611300      servicedesk: 020-5611302   fax: 020-5611301
KvK Amsterdam 34151241

Op dit bericht is de volgende disclaimer van toepassing:
http://www.neonova.nl/maildisclaimer

Re: Index not used in join.. (example included)

From
"A. Kretschmer"
Date:
In response to Ron Arts :
> Hi,
>
> I don't understand why my query doesn't use the index on the name column:
>
> CREATE TABLE celeb (
>  id serial,
>  name varchar(255) NOT NULL,
>  PRIMARY KEY  (id)
> );
>
> create index celeb_name_index on celeb (name);
> INSERT INTO celeb (name) values ('Cameron Diaz');
> INSERT INTO celeb (name) values ('Angelina Jolie');
> INSERT INTO celeb (name) values ('Reese Witherspoon');
>
> CREATE TABLE detail (
>  id serial,
>  name varchar(255) NOT NULL,
>  award date,
>  PRIMARY KEY  (id)
> );
>
> create index detail_name_index on detail (name);
> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-01-01');
> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-02-01');
> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-03-01');
> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-01-01');
> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-02-01');
> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-03-01');
> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-04-01');
> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-01-01');
> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-02-01');
> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-03-01');
> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-04-01');
> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-05-01');
>
>
> stium=# explain select * from celeb left join detail on detail.name =
> celeb.name;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Hash Left Join  (cost=11.75..25.30 rows=140 width=1044)
>    Hash Cond: (("outer".name)::text = ("inner".name)::text)
>    ->  Seq Scan on celeb  (cost=0.00..11.40 rows=140 width=520)
>    ->  Hash  (cost=11.40..11.40 rows=140 width=524)
>          ->  Seq Scan on detail  (cost=0.00..11.40 rows=140 width=524)
> (5 rows)
>
>
> Now, why does it use sequential scans?? I don't get it. It also does this
> on a very large table which is ANALYZE'd.

First, bad design. Both tables contains the text-column name, waste of
space. You should better use the PK from celeb as foreign key in detail.

Furthermore, all distinct values for name in detail are in the result
set, so a seq-scan is the best solution.

You can try to force an index-scan by disabling the seq-scan (set
enable_seqscan='no';) and re-run the query.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)

Re: Index not used in join.. (example included)

From
Ron Arts
Date:
A. Kretschmer schreef:
> In response to Ron Arts :
>> Hi,
>>
>> I don't understand why my query doesn't use the index on the name column:
>>
>> CREATE TABLE celeb (
>>  id serial,
>>  name varchar(255) NOT NULL,
>>  PRIMARY KEY  (id)
>> );
>>
>> create index celeb_name_index on celeb (name);
>> INSERT INTO celeb (name) values ('Cameron Diaz');
>> INSERT INTO celeb (name) values ('Angelina Jolie');
>> INSERT INTO celeb (name) values ('Reese Witherspoon');
>>
>> CREATE TABLE detail (
>>  id serial,
>>  name varchar(255) NOT NULL,
>>  award date,
>>  PRIMARY KEY  (id)
>> );
>>
>> create index detail_name_index on detail (name);
>> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-01-01');
>> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-02-01');
>> INSERT INTO detail (name, award) values ('Cameron Diaz', '2009-03-01');
>> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-01-01');
>> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-02-01');
>> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-03-01');
>> INSERT INTO detail (name, award) values ('Angelina Jolie', '2009-04-01');
>> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-01-01');
>> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-02-01');
>> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-03-01');
>> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-04-01');
>> INSERT INTO detail (name, award) values ('Reese Witherspoon', '2009-05-01');
>>
>>
>> stium=# explain select * from celeb left join detail on detail.name =
>> celeb.name;
>>                               QUERY PLAN
>> -----------------------------------------------------------------------
>>  Hash Left Join  (cost=11.75..25.30 rows=140 width=1044)
>>    Hash Cond: (("outer".name)::text = ("inner".name)::text)
>>    ->  Seq Scan on celeb  (cost=0.00..11.40 rows=140 width=520)
>>    ->  Hash  (cost=11.40..11.40 rows=140 width=524)
>>          ->  Seq Scan on detail  (cost=0.00..11.40 rows=140 width=524)
>> (5 rows)
>>
>>
>> Now, why does it use sequential scans?? I don't get it. It also does this
>> on a very large table which is ANALYZE'd.
>
> First, bad design. Both tables contains the text-column name, waste of
> space. You should better use the PK from celeb as foreign key in detail.
>

Yes, bad design, although in the real tables there are multiple records
for each celeb. Still bad design, I know.

> Furthermore, all distinct values for name in detail are in the result
> set, so a seq-scan is the best solution.
>

If I add a column to celeb, say 'birthdate', and use that in the where clause
it still uses seq scan, even though not all distinct values are in the result set,
aren't they?


stium=# explain select * from celeb left join detail on detail.name = celeb.name where celeb.birthdate > '1973-01-01';
                               QUERY PLAN
-----------------------------------------------------------------------
  Merge Left Join  (cost=29.45..30.85 rows=47 width=1048)
    Merge Cond: ("outer"."?column4?" = "inner"."?column4?")
    ->  Sort  (cost=13.06..13.17 rows=47 width=524)
          Sort Key: (celeb.name)::text
          ->  Seq Scan on celeb  (cost=0.00..11.75 rows=47 width=524)
                Filter: (birthdate > '1973-01-01'::date)
    ->  Sort  (cost=16.39..16.74 rows=140 width=524)
          Sort Key: (detail.name)::text
          ->  Seq Scan on detail  (cost=0.00..11.40 rows=140 width=524)
(9 rows)


I'd expect it would take all records above the given date, join the
records from detail using the index on that. Am I wrong?

> You can try to force an index-scan by disabling the seq-scan (set
> enable_seqscan='no';) and re-run the query.


That does force it to use the index, but doesn't speed up my query.
>
> Andreas

Thanks for responding.

Ron

--
NeoNova BV
innovatieve internetoplossingen

http://www.neonova.nl  Science Park 140           1098 XG Amsterdam
info: 020-5611300      servicedesk: 020-5611302   fax: 020-5611301
KvK Amsterdam 34151241

Op dit bericht is de volgende disclaimer van toepassing:
http://www.neonova.nl/maildisclaimer

Re: Index not used in join.. (example included)

From
Michael Wood
Date:
2009/9/18 Ron Arts <ron.arts@neonova.nl>:
> A. Kretschmer schreef:
[...]
>> You can try to force an index-scan by disabling the seq-scan (set
>> enable_seqscan='no';) and re-run the query.
>
> That does force it to use the index, but doesn't speed up my query.

Would it help to add an index on the name column of the detail table?
And maybe to make it a foreign key referencing the first table?

--
Michael Wood <esiotrot@gmail.com>

Re: Index not used in join.. (example included)

From
Michael Wood
Date:
2009/9/18 Michael Wood <esiotrot@gmail.com>:
> 2009/9/18 Ron Arts <ron.arts@neonova.nl>:
>> A. Kretschmer schreef:
> [...]
>>> You can try to force an index-scan by disabling the seq-scan (set
>>> enable_seqscan='no';) and re-run the query.
>>
>> That does force it to use the index, but doesn't speed up my query.
>
> Would it help to add an index on the name column of the detail table?
> And maybe to make it a foreign key referencing the first table?

Sorry, I just had another look and see you do have an index on the
name column in detail.

I wonder if it would make a difference if the index on the first table
was a unique index.

--
Michael Wood <esiotrot@gmail.com>

Re: Index not used in join.. (example included)

From
Ron Arts
Date:
Michael Wood schreef:
> 2009/9/18 Michael Wood <esiotrot@gmail.com>:
>> 2009/9/18 Ron Arts <ron.arts@neonova.nl>:
>>> A. Kretschmer schreef:
>> [...]
>>>> You can try to force an index-scan by disabling the seq-scan (set
>>>> enable_seqscan='no';) and re-run the query.
>>> That does force it to use the index, but doesn't speed up my query.
>> Would it help to add an index on the name column of the detail table?
>> And maybe to make it a foreign key referencing the first table?
>
> Sorry, I just had another look and see you do have an index on the
> name column in detail.
>
> I wonder if it would make a difference if the index on the first table
> was a unique index.
>
> --
> Michael Wood <esiotrot@gmail.com>
>

Well, the name columns are not unique in both tables. So I cannot make it
an unique index.

Ron

--
NeoNova BV
innovatieve internetoplossingen

http://www.neonova.nl  Science Park 140           1098 XG Amsterdam
info: 020-5611300      servicedesk: 020-5611302   fax: 020-5611301
KvK Amsterdam 34151241

Op dit bericht is de volgende disclaimer van toepassing:
http://www.neonova.nl/maildisclaimer

Re: Index not used in join.. (example included)

From
Tom Lane
Date:
Ron Arts <ron.arts@neonova.nl> writes:
> A. Kretschmer schreef:
>> You can try to force an index-scan by disabling the seq-scan (set
>> enable_seqscan='no';) and re-run the query.

> That does force it to use the index, but doesn't speed up my query.

IOW, the planner chose the right plan.  Don't complain...

I think the real answer here is that the planner thinks the table
is too small to bother with an index.  If you load up a non-toy
amount of data you might get a different plan.

            regards, tom lane

Re: Index not used in join.. (example included).

From
Ron Arts
Date:
Sorry. Maybe I should refrase my question.

I have two tables A and B.
Each table contains a non-unique indexed varchar column.

Table A contains around 500000 rows
table B contains around 4 million rows

for a given value of col there are typically 3 records in A and
20 records in B (give or take)

I want to do something like this:

select a.somecol from A left join B on B.col = A.col where A.date > '2001-01-01';

But this query runs for hours on an eight core server with 4G etc etc.
I configured PG to use 128Mb shared memory, but fiddling with that value
doesn't change much..

And I continuously see the planner choosing for seq scans.

I don't get it.

Ron

--
NeoNova BV
innovatieve internetoplossingen

http://www.neonova.nl  Science Park 140           1098 XG Amsterdam
info: 020-5611300      servicedesk: 020-5611302   fax: 020-5611301
KvK Amsterdam 34151241

Op dit bericht is de volgende disclaimer van toepassing:
http://www.neonova.nl/maildisclaimer

Re: Index not used in join.. (example included).

From
Tom Lane
Date:
Ron Arts <ron.arts@neonova.nl> writes:
> I have two tables A and B.
> Each table contains a non-unique indexed varchar column.

> Table A contains around 500000 rows
> table B contains around 4 million rows

> for a given value of col there are typically 3 records in A and
> 20 records in B (give or take)

> I want to do something like this:

> select a.somecol from A left join B on B.col = A.col where A.date > '2001-01-01';

> But this query runs for hours on an eight core server with 4G etc etc.
> I configured PG to use 128Mb shared memory, but fiddling with that value
> doesn't change much..

> And I continuously see the planner choosing for seq scans.

> I don't get it.

Given those numbers, index scans wouldn't help.  You have not shown us
the EXPLAIN results, but I suspect that the best plan is a hash join,
or possibly a merge join.  In either case the way to make it go faster
is to raise work_mem --- there is no other parameter that is likely
to change things much.  You could probably profitably use work_mem up to
100MB or so on this problem.  I would not recommend setting work_mem so
high as a global setting, but you can set it locally in the session
that's doing this join.

            regards, tom lane