Thread: Index usage question

Index usage question

From
Jefim Matskin
Date:
I have a question on index usage:
I have 2 tables:

CREATE TABLE tblAccountAvailablePlugins (
       nAcctPluginDirID     int4,
       nAvailPluginID       int4,
       sPluginKey           varchar(255),
       nMaxInstances        int4,
       bEnable              int2 DEFAULT 0
);

CREATE UNIQUE INDEX XPKtblAccountAvailablePlugins ON
tblAccountAvailablePlugins
(
       nAvailPluginID
);

CREATE UNIQUE INDEX XAK1tblAccountAvailablePlugins ON
tblAccountAvailablePlugins
(
       nAcctPluginDirID,
       sPluginKey
);

CREATE TABLE tblAccountPluginDir (
       nAcctPluginDirID     int4,
       nAccountID           int4,
       sPluginDirKey        varchar(32)
);

CREATE UNIQUE INDEX XPKtblAccountPluginDir ON tblAccountPluginDir
(
       nAcctPluginDirID
);

CREATE UNIQUE INDEX XAK1tblAccountPluginDir ON tblAccountPluginDir
(
       nAccountID,
       sPluginDirKey
);


When I execute the explain on a simple join query I see that the indices are
NOT used for
the join:

explain select tblAccountAvailablePlugins.nAcctPluginDirID,
tblAccountPluginDir.nAcctPluginDirID FROM tblAccountAvailablePlugins,
tblAccountPluginDir   WHERE
tblAccountPluginDir.nAcctPluginDirID=tblAccountAvailablePlugins.nAcctPluginD
irID;
NOTICE:  QUERY PLAN:

Hash Join  (cost=21.45..640.50 rows=6530 width=8)
  ->  Seq Scan on tblaccountavailableplugins  (cost=0.00..187.52 rows=10452
width=4)
  ->  Hash  (cost=18.76..18.76 rows=1076 width=4)
        ->  Seq Scan on tblaccountplugindir  (cost=0.00..18.76 rows=1076
width=4)

EXPLAIN

can anyone explain me what is wrong with my query?


select version();
                            version
---------------------------------------------------------------
 PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.3

> Jefim Matskin
> ---------------------------------------------
> Senior SW engeneer
> Sphera Corporation
> Tel: +972.3.613.2424 Ext:104
> mailto:mjefim@sphera.com
> http://www.sphera.com/
>

Re: Index usage question

From
"Mitch Vincent"
Date:
> When I execute the explain on a simple join query I see that the indices
are
> NOT used for
> the join:
>
> explain select tblAccountAvailablePlugins.nAcctPluginDirID,
> tblAccountPluginDir.nAcctPluginDirID FROM tblAccountAvailablePlugins,
> tblAccountPluginDir   WHERE
>
tblAccountPluginDir.nAcctPluginDirID=tblAccountAvailablePlugins.nAcctPluginD
> irID;
> NOTICE:  QUERY PLAN:
>
> Hash Join  (cost=21.45..640.50 rows=6530 width=8)
>   ->  Seq Scan on tblaccountavailableplugins  (cost=0.00..187.52
rows=10452
> width=4)
>   ->  Hash  (cost=18.76..18.76 rows=1076 width=4)
>         ->  Seq Scan on tblaccountplugindir  (cost=0.00..18.76 rows=1076
> width=4)
>
> EXPLAIN
>
> can anyone explain me what is wrong with my query?

If you haven't VACUUM ANALYZE 'd recently, do so to update the statistics on
your table(s)... The reason it's doing a sequential scan is that a seq scan
is (probably) better if that query really does return as many rows as it
thinks it does.. After you update the statistics, try again and see what the
query plan is then..

-Mitch


Re: Index usage question

From
Einar Karttunen
Date:
I tested it and it was using an index scan. Have you
vacuum analyzed lately? I reformatted the tables to
look more friendly :-)

CREATE TABLE AvailablePlugins (
       DirID            int4,
       ID               int4 primary key,
       Key              text,
       nMaxInstances    int4,
       bEnable          int2 DEFAULT 0,
       unique(DirID,Key)
);

CREATE TABLE PluginDir (
       DirID                int4 primary key,
       nAccountID           int4,
       sPluginDirKey        varchar(32)
);


explain select ap.DirID,pd.DirID
FROM AvailablePlugins ap, PluginDir pd
WHERE pd.DirID=ap.DirID;

NOTICE:  QUERY PLAN:
Merge Join  (cost=0.00..143.01 rows=10000 width=8)
  ->  Index Scan using availableplugins_dirid_key on availableplugins ap  (cost=0.00..59.00 rows=1000 width=4)
  ->  Index Scan using plugindir_pkey on plugindir pd  (cost=0.00..59.00 rows=1000 width=4)
EXPLAIN

- Einar Karttunen

On Tue, Sep 04, 2001 at 07:36:51PM +0200, Jefim Matskin wrote:
>
> I have a question on index usage:
> I have 2 tables:
>
> CREATE TABLE tblAccountAvailablePlugins (
>        nAcctPluginDirID     int4,
>        nAvailPluginID       int4,
>        sPluginKey           varchar(255),
>        nMaxInstances        int4,
>        bEnable              int2 DEFAULT 0
> );
>
> CREATE UNIQUE INDEX XPKtblAccountAvailablePlugins ON
> tblAccountAvailablePlugins
> (
>        nAvailPluginID
> );
>
> CREATE UNIQUE INDEX XAK1tblAccountAvailablePlugins ON
> tblAccountAvailablePlugins
> (
>        nAcctPluginDirID,
>        sPluginKey
> );
>
> CREATE TABLE tblAccountPluginDir (
>        nAcctPluginDirID     int4,
>        nAccountID           int4,
>        sPluginDirKey        varchar(32)
> );
>
> CREATE UNIQUE INDEX XPKtblAccountPluginDir ON tblAccountPluginDir
> (
>        nAcctPluginDirID
> );
>
> CREATE UNIQUE INDEX XAK1tblAccountPluginDir ON tblAccountPluginDir
> (
>        nAccountID,
>        sPluginDirKey
> );
>
>
> When I execute the explain on a simple join query I see that the indices are
> NOT used for
> the join:
>
> explain select tblAccountAvailablePlugins.nAcctPluginDirID,
> tblAccountPluginDir.nAcctPluginDirID FROM tblAccountAvailablePlugins,
> tblAccountPluginDir   WHERE
> tblAccountPluginDir.nAcctPluginDirID=tblAccountAvailablePlugins.nAcctPluginD
> irID;
> NOTICE:  QUERY PLAN:
>
> Hash Join  (cost=21.45..640.50 rows=6530 width=8)
>   ->  Seq Scan on tblaccountavailableplugins  (cost=0.00..187.52 rows=10452
> width=4)
>   ->  Hash  (cost=18.76..18.76 rows=1076 width=4)
>         ->  Seq Scan on tblaccountplugindir  (cost=0.00..18.76 rows=1076
> width=4)
>
> EXPLAIN
>
> can anyone explain me what is wrong with my query?
>
>
> select version();
>                             version
> ---------------------------------------------------------------
>  PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.3
>
> > Jefim Matskin
> > ---------------------------------------------
> > Senior SW engeneer
> > Sphera Corporation
> > Tel: +972.3.613.2424 Ext:104
> > mailto:mjefim@sphera.com
> > http://www.sphera.com/
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

Re: Index usage question

From
"Norbert Zoltan Toth"
Date:
I have experienced the following problem, maybe they are related:

I create a table with
    create table test (id int, name char(10));

then I create an index on it with
    create index test_idx on test (id);

After populating my table, the query
    select id from test where id='1';

uses index scan.

However when I only create the index AFTER inserting rows into the table,
the index is not being used for the evaluation,
even if I run vacuum / vaccum analyze on test;

I would also appreciate some help with this.

Thank you,
Norbert


Re: Index usage question

From
"Mitch Vincent"
Date:
The fastest operation for the number of rows returned by the query is
attempted -- granted that it may not always work but an index scan isn't
always the best answer for large sets of data..You actually have more
overhead using an index in some situations.

You said you enter some rows, but how many rows got returned by your query?

Though I'm not sure, PG might not treat that 1 as the integer 1, since it's
in single quotes.. The index was on an integer field, perhaps that is part
of the problem too...

-Mitch

> I have experienced the following problem, maybe they are related:
>
> I create a table with
>     create table test (id int, name char(10));
>
> then I create an index on it with
>     create index test_idx on test (id);
>
> After populating my table, the query
>     select id from test where id='1';
>
> uses index scan.
>
> However when I only create the index AFTER inserting rows into the table,
> the index is not being used for the evaluation,
> even if I run vacuum / vaccum analyze on test;
>
> I would also appreciate some help with this.



Re: Index usage question

From
Tom Lane
Date:
Jefim Matskin <mjefim@sphera.com> writes:
> can anyone explain me what is wrong with my query?

Nothing.  Hash join is a perfectly respectable way to implement this
query.  If the size estimates quoted in the EXPLAIN are in the right
ballpark, I'd not be surprised to find that the planner made the right
choice --- nestloop will certainly be slower, and there's no reason to
think that a merge join based on index scans would be faster either.

You can try "set enable_hashjoin to off" if you want to experiment with
alternate plans, but you should check the actual timing before assuming
that you know better than the planner.

            regards, tom lane

Re: Index usage question

From
Ryan Mahoney
Date:
What does your data look like?  If you have a lot of duplicate id's, a
sequential scan may be better than an index scan.
If you are not sure if this is the case, try:

SELECT id, count(*) AS count FROM test GROUP BY id ORDER BY count DESC
LIMIT 50;

This should show you the top 50 most duplicated records in your table.

-r

At 08:52 PM 9/4/01 +0200, Norbert Zoltan Toth wrote:

>I have experienced the following problem, maybe they are related:
>
>I create a table with
>     create table test (id int, name char(10));
>
>then I create an index on it with
>     create index test_idx on test (id);
>
>After populating my table, the query
>     select id from test where id='1';
>
>uses index scan.
>
>However when I only create the index AFTER inserting rows into the table,
>the index is not being used for the evaluation,
>even if I run vacuum / vaccum analyze on test;
>
>I would also appreciate some help with this.
>
>Thank you,
>Norbert
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 6: Have you searched our list archives?
>
>http://www.postgresql.org/search.mpl
>
>
>
>---
>Incoming mail is certified Virus Free.
>Checked by AVG anti-virus system (http://www.grisoft.com).
>Version: 6.0.251 / Virus Database: 124 - Release Date: 4/26/01

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.251 / Virus Database: 124 - Release Date: 4/26/01

Re: Index usage question

From
"Norbert Zoltan Toth"
Date:
> The fastest operation for the number of rows returned by the query is
> attempted -- granted that it may not always work but an index scan isn't
> always the best answer for large sets of data..You actually have more
> overhead using an index in some situations.
>
> You said you enter some rows, but how many rows got returned by your
query?
>
> Though I'm not sure, PG might not treat that 1 as the integer 1, since
it's
> in single quotes.. The index was on an integer field, perhaps that is part
> of the problem too...
>
> -Mitch
>
> > I have experienced the following problem, maybe they are related:
> >
> > I create a table with
> >     create table test (id int, name char(10));
> >
> > then I create an index on it with
> >     create index test_idx on test (id);
> >
> > After populating my table, the query
> >     select id from test where id='1';
> >
> > uses index scan.
> >
> > However when I only create the index AFTER inserting rows into the
table,
> > the index is not being used for the evaluation,
> > even if I run vacuum / vaccum analyze on test;
> >
> > I would also appreciate some help with this.
>
>
>


Re: Index usage question - Norbert

From
"Norbert Zoltan Toth"
Date:
(Sorry for reposting...)

> You said you enter some rows, but how many rows got returned by your
query?

You're right, it does make all the difference.  With only a few rows,
indexing is somehow used in the first case only (for my example), but with
larger tables index scan is used in both ways.

Many thanks,
Norbert

>
> Though I'm not sure, PG might not treat that 1 as the integer 1, since
it's
> in single quotes.. The index was on an integer field, perhaps that is part
> of the problem too...
>
> -Mitch
>
> > I have experienced the following problem, maybe they are related:
> >
> > I create a table with
> >     create table test (id int, name char(10));
> >
> > then I create an index on it with
> >     create index test_idx on test (id);
> >
> > After populating my table, the query
> >     select id from test where id='1';
> >
> > uses index scan.
> >
> > However when I only create the index AFTER inserting rows into the
table,
> > the index is not being used for the evaluation,
> > even if I run vacuum / vaccum analyze on test;
> >
> > I would also appreciate some help with this.
>
>
>