Thread: Index usage question
I have a question on index usage: I have 2 tables: CREATE TABLE tblAccountAvailablePlugins ( nAcctPluginDirID int4, nAvailPluginID int4, sPluginKey varchar(255), nMaxInstances int4, bEnable int2 DEFAULT 0 ); CREATE UNIQUE INDEX XPKtblAccountAvailablePlugins ON tblAccountAvailablePlugins ( nAvailPluginID ); CREATE UNIQUE INDEX XAK1tblAccountAvailablePlugins ON tblAccountAvailablePlugins ( nAcctPluginDirID, sPluginKey ); CREATE TABLE tblAccountPluginDir ( nAcctPluginDirID int4, nAccountID int4, sPluginDirKey varchar(32) ); CREATE UNIQUE INDEX XPKtblAccountPluginDir ON tblAccountPluginDir ( nAcctPluginDirID ); CREATE UNIQUE INDEX XAK1tblAccountPluginDir ON tblAccountPluginDir ( nAccountID, sPluginDirKey ); When I execute the explain on a simple join query I see that the indices are NOT used for the join: explain select tblAccountAvailablePlugins.nAcctPluginDirID, tblAccountPluginDir.nAcctPluginDirID FROM tblAccountAvailablePlugins, tblAccountPluginDir WHERE tblAccountPluginDir.nAcctPluginDirID=tblAccountAvailablePlugins.nAcctPluginD irID; NOTICE: QUERY PLAN: Hash Join (cost=21.45..640.50 rows=6530 width=8) -> Seq Scan on tblaccountavailableplugins (cost=0.00..187.52 rows=10452 width=4) -> Hash (cost=18.76..18.76 rows=1076 width=4) -> Seq Scan on tblaccountplugindir (cost=0.00..18.76 rows=1076 width=4) EXPLAIN can anyone explain me what is wrong with my query? select version(); version --------------------------------------------------------------- PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.3 > Jefim Matskin > --------------------------------------------- > Senior SW engeneer > Sphera Corporation > Tel: +972.3.613.2424 Ext:104 > mailto:mjefim@sphera.com > http://www.sphera.com/ >
> When I execute the explain on a simple join query I see that the indices are > NOT used for > the join: > > explain select tblAccountAvailablePlugins.nAcctPluginDirID, > tblAccountPluginDir.nAcctPluginDirID FROM tblAccountAvailablePlugins, > tblAccountPluginDir WHERE > tblAccountPluginDir.nAcctPluginDirID=tblAccountAvailablePlugins.nAcctPluginD > irID; > NOTICE: QUERY PLAN: > > Hash Join (cost=21.45..640.50 rows=6530 width=8) > -> Seq Scan on tblaccountavailableplugins (cost=0.00..187.52 rows=10452 > width=4) > -> Hash (cost=18.76..18.76 rows=1076 width=4) > -> Seq Scan on tblaccountplugindir (cost=0.00..18.76 rows=1076 > width=4) > > EXPLAIN > > can anyone explain me what is wrong with my query? If you haven't VACUUM ANALYZE 'd recently, do so to update the statistics on your table(s)... The reason it's doing a sequential scan is that a seq scan is (probably) better if that query really does return as many rows as it thinks it does.. After you update the statistics, try again and see what the query plan is then.. -Mitch
I tested it and it was using an index scan. Have you vacuum analyzed lately? I reformatted the tables to look more friendly :-) CREATE TABLE AvailablePlugins ( DirID int4, ID int4 primary key, Key text, nMaxInstances int4, bEnable int2 DEFAULT 0, unique(DirID,Key) ); CREATE TABLE PluginDir ( DirID int4 primary key, nAccountID int4, sPluginDirKey varchar(32) ); explain select ap.DirID,pd.DirID FROM AvailablePlugins ap, PluginDir pd WHERE pd.DirID=ap.DirID; NOTICE: QUERY PLAN: Merge Join (cost=0.00..143.01 rows=10000 width=8) -> Index Scan using availableplugins_dirid_key on availableplugins ap (cost=0.00..59.00 rows=1000 width=4) -> Index Scan using plugindir_pkey on plugindir pd (cost=0.00..59.00 rows=1000 width=4) EXPLAIN - Einar Karttunen On Tue, Sep 04, 2001 at 07:36:51PM +0200, Jefim Matskin wrote: > > I have a question on index usage: > I have 2 tables: > > CREATE TABLE tblAccountAvailablePlugins ( > nAcctPluginDirID int4, > nAvailPluginID int4, > sPluginKey varchar(255), > nMaxInstances int4, > bEnable int2 DEFAULT 0 > ); > > CREATE UNIQUE INDEX XPKtblAccountAvailablePlugins ON > tblAccountAvailablePlugins > ( > nAvailPluginID > ); > > CREATE UNIQUE INDEX XAK1tblAccountAvailablePlugins ON > tblAccountAvailablePlugins > ( > nAcctPluginDirID, > sPluginKey > ); > > CREATE TABLE tblAccountPluginDir ( > nAcctPluginDirID int4, > nAccountID int4, > sPluginDirKey varchar(32) > ); > > CREATE UNIQUE INDEX XPKtblAccountPluginDir ON tblAccountPluginDir > ( > nAcctPluginDirID > ); > > CREATE UNIQUE INDEX XAK1tblAccountPluginDir ON tblAccountPluginDir > ( > nAccountID, > sPluginDirKey > ); > > > When I execute the explain on a simple join query I see that the indices are > NOT used for > the join: > > explain select tblAccountAvailablePlugins.nAcctPluginDirID, > tblAccountPluginDir.nAcctPluginDirID FROM tblAccountAvailablePlugins, > tblAccountPluginDir WHERE > tblAccountPluginDir.nAcctPluginDirID=tblAccountAvailablePlugins.nAcctPluginD > irID; > NOTICE: QUERY PLAN: > > Hash Join (cost=21.45..640.50 rows=6530 width=8) > -> Seq Scan on tblaccountavailableplugins (cost=0.00..187.52 rows=10452 > width=4) > -> Hash (cost=18.76..18.76 rows=1076 width=4) > -> Seq Scan on tblaccountplugindir (cost=0.00..18.76 rows=1076 > width=4) > > EXPLAIN > > can anyone explain me what is wrong with my query? > > > select version(); > version > --------------------------------------------------------------- > PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC 2.95.3 > > > Jefim Matskin > > --------------------------------------------- > > Senior SW engeneer > > Sphera Corporation > > Tel: +972.3.613.2424 Ext:104 > > mailto:mjefim@sphera.com > > http://www.sphera.com/ > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
I have experienced the following problem, maybe they are related: I create a table with create table test (id int, name char(10)); then I create an index on it with create index test_idx on test (id); After populating my table, the query select id from test where id='1'; uses index scan. However when I only create the index AFTER inserting rows into the table, the index is not being used for the evaluation, even if I run vacuum / vaccum analyze on test; I would also appreciate some help with this. Thank you, Norbert
The fastest operation for the number of rows returned by the query is attempted -- granted that it may not always work but an index scan isn't always the best answer for large sets of data..You actually have more overhead using an index in some situations. You said you enter some rows, but how many rows got returned by your query? Though I'm not sure, PG might not treat that 1 as the integer 1, since it's in single quotes.. The index was on an integer field, perhaps that is part of the problem too... -Mitch > I have experienced the following problem, maybe they are related: > > I create a table with > create table test (id int, name char(10)); > > then I create an index on it with > create index test_idx on test (id); > > After populating my table, the query > select id from test where id='1'; > > uses index scan. > > However when I only create the index AFTER inserting rows into the table, > the index is not being used for the evaluation, > even if I run vacuum / vaccum analyze on test; > > I would also appreciate some help with this.
Jefim Matskin <mjefim@sphera.com> writes: > can anyone explain me what is wrong with my query? Nothing. Hash join is a perfectly respectable way to implement this query. If the size estimates quoted in the EXPLAIN are in the right ballpark, I'd not be surprised to find that the planner made the right choice --- nestloop will certainly be slower, and there's no reason to think that a merge join based on index scans would be faster either. You can try "set enable_hashjoin to off" if you want to experiment with alternate plans, but you should check the actual timing before assuming that you know better than the planner. regards, tom lane
What does your data look like? If you have a lot of duplicate id's, a sequential scan may be better than an index scan. If you are not sure if this is the case, try: SELECT id, count(*) AS count FROM test GROUP BY id ORDER BY count DESC LIMIT 50; This should show you the top 50 most duplicated records in your table. -r At 08:52 PM 9/4/01 +0200, Norbert Zoltan Toth wrote: >I have experienced the following problem, maybe they are related: > >I create a table with > create table test (id int, name char(10)); > >then I create an index on it with > create index test_idx on test (id); > >After populating my table, the query > select id from test where id='1'; > >uses index scan. > >However when I only create the index AFTER inserting rows into the table, >the index is not being used for the evaluation, >even if I run vacuum / vaccum analyze on test; > >I would also appreciate some help with this. > >Thank you, >Norbert > > >---------------------------(end of broadcast)--------------------------- >TIP 6: Have you searched our list archives? > >http://www.postgresql.org/search.mpl > > > >--- >Incoming mail is certified Virus Free. >Checked by AVG anti-virus system (http://www.grisoft.com). >Version: 6.0.251 / Virus Database: 124 - Release Date: 4/26/01 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.251 / Virus Database: 124 - Release Date: 4/26/01
> The fastest operation for the number of rows returned by the query is > attempted -- granted that it may not always work but an index scan isn't > always the best answer for large sets of data..You actually have more > overhead using an index in some situations. > > You said you enter some rows, but how many rows got returned by your query? > > Though I'm not sure, PG might not treat that 1 as the integer 1, since it's > in single quotes.. The index was on an integer field, perhaps that is part > of the problem too... > > -Mitch > > > I have experienced the following problem, maybe they are related: > > > > I create a table with > > create table test (id int, name char(10)); > > > > then I create an index on it with > > create index test_idx on test (id); > > > > After populating my table, the query > > select id from test where id='1'; > > > > uses index scan. > > > > However when I only create the index AFTER inserting rows into the table, > > the index is not being used for the evaluation, > > even if I run vacuum / vaccum analyze on test; > > > > I would also appreciate some help with this. > > >
(Sorry for reposting...) > You said you enter some rows, but how many rows got returned by your query? You're right, it does make all the difference. With only a few rows, indexing is somehow used in the first case only (for my example), but with larger tables index scan is used in both ways. Many thanks, Norbert > > Though I'm not sure, PG might not treat that 1 as the integer 1, since it's > in single quotes.. The index was on an integer field, perhaps that is part > of the problem too... > > -Mitch > > > I have experienced the following problem, maybe they are related: > > > > I create a table with > > create table test (id int, name char(10)); > > > > then I create an index on it with > > create index test_idx on test (id); > > > > After populating my table, the query > > select id from test where id='1'; > > > > uses index scan. > > > > However when I only create the index AFTER inserting rows into the table, > > the index is not being used for the evaluation, > > even if I run vacuum / vaccum analyze on test; > > > > I would also appreciate some help with this. > > >