Thread: Bug #474: Index using problem

Bug #474: Index using problem

From

pgsql-bugs@postgresql.org

Date:

05 October 2001, 03:46:42

JoE Olcsak (joe78@freemail.hu) reports a bug with a severity of 4
The lower the number the more severe it is.

Short Description
Index using problem

Long Description
I have a problem when using indexes under PostgreSQL 7.1.1 ...

The problem is :

  I have a field of table ... This is an INT4 type field ... and I
  created an index for this field like :
        create index idx_of_field on cim(utod_cim_id);
  ... and I executed a simple query ...

  explain select * from cim where utod_cim_id=0;

  NOTICE:  QUERY PLAN:

  Seq Scan on cim  (cost=0.00..6555.41 rows=253633 width=118)

  EXPLAIN

  The query optimizer does not use my index..

  I'm executed vacuum analyze command for this table but this not   helps me ...

  Then I created another index for this field:

  create index idx_of_field2 on cim(int4(utod_cim_id));

  ... and I executed another simple query :

   explain select * from cim where int4(utod_cim_id)=0;
   NOTICE:  QUERY PLAN:

   Index Scan using idx_of_field2 on cim  (cost=0.00..6499.70    rows=2536 width=118)

   EXPLAIN

What is wrong in the first case ?



Sample Code


No file was uploaded with this report

Re: Bug #474: Index using problem

From

Andreas Wernitznig

Date:

05 October 2001, 05:04:53

The query optimizer uses the index only if enough data are present in the table.
If only a few data are available a Seq Scan performs better and is therefore used.

Further one of the problems (which is hopfully solved in version 7.2) is that the query optimizer used for primary
keys/foreignkeys is not updated for an uninterrupted database connection. 

E.g.:
If have an empty database "Seq Scan" is used in any case. Then I connect to the database, fill in some data, and
executean "VACUUM ANALYZE" (which updates the statistics for the query optimizer). 
If I don't disconnect from the database but use the first connection again, the database still uses the (now) slower
"seqscan" for "primary/foreign key" checking. In this case the query optimizer statistics are not updated for
establishedconnections.  

My suggestion and question to the postgresql-programmers is now:
Why don't you skip the automatic index creation for primary keys and let the user decide to create an index,
that should be used in any case, regardless whether the table is full or empty ?

Andreas

On Fri, 5 Oct 2001 03:21:01 -0400 (EDT)
pgsql-bugs@postgresql.org wrote:

> JoE Olcsak (joe78@freemail.hu) reports a bug with a severity of 4
> The lower the number the more severe it is.
>
> Short Description
> Index using problem
>
> Long Description
> I have a problem when using indexes under PostgreSQL 7.1.1 ...
>
> The problem is :
>
>   I have a field of table ... This is an INT4 type field ... and I
>   created an index for this field like :
>         create index idx_of_field on cim(utod_cim_id);
>   ... and I executed a simple query ...
>
>   explain select * from cim where utod_cim_id=0;
>
>   NOTICE:  QUERY PLAN:
>
>   Seq Scan on cim  (cost=0.00..6555.41 rows=253633 width=118)
>
>   EXPLAIN
>
>   The query optimizer does not use my index..
>
>   I'm executed vacuum analyze command for this table but this not   helps me ...
>
>   Then I created another index for this field:
>
>   create index idx_of_field2 on cim(int4(utod_cim_id));
>
>   ... and I executed another simple query :
>
>    explain select * from cim where int4(utod_cim_id)=0;
>    NOTICE:  QUERY PLAN:
>
>    Index Scan using idx_of_field2 on cim  (cost=0.00..6499.70    rows=2536 width=118)
>
>    EXPLAIN
>
> What is wrong in the first case ?
>
>
>
> Sample Code
>
>
> No file was uploaded with this report
>

Re: Bug #474: Index using problem

From

Tom Lane

Date:

05 October 2001, 09:49:28

pgsql-bugs@postgresql.org writes:
>   explain select * from cim where utod_cim_id=0;
>   NOTICE:  QUERY PLAN:
>   Seq Scan on cim  (cost=0.00..6555.41 rows=253633 width=118)

>   The query optimizer does not use my index..

How many rows are there where utod_cim_id=0?  The planner guesses that
there are a lot (253633, to be exact) and concludes that using the index
is not profitable for this query.

            regards, tom lane

Re: Bug #474: Index using problem

From

Tom Lane

Date:

05 October 2001, 09:52:33

Andreas Wernitznig <andreas@insilico.com> writes:
> If I don't disconnect from the database but use the first connection
> again, the database still uses the (now) slower "seq scan" for
> "primary/foreign key" checking. In this case the query optimizer
> statistics are not updated for established connections.

Sure they are --- in my tests, anyway.  What did you do *exactly*?

            regards, tom lane

Re: Bug #474: Index using problem

From

Andreas Wernitznig

Date:

05 October 2001, 14:45:57

1. I created a small database consisting of two connected tables:

create table table1 (
index1        int4    not null,
textfield      varchar(1000) not null,
constraint PK_table1 primary key (index1)
);

create table table2 (
index2        int4    not null,
index1        int4    not null,
textfield     varchar(1000) not null,
constraint PK_table2 primary key (index2),
constraint FK_table1 foreign key (index1)
   references table1 (index1)
   on delete restrict on update restrict
);

2. Then I insert 100 rows ($n=1..100) in each of these tables:

insert into table1 VALUES ($n, '123456789');
insert into table2 VALUES ($n, $n, '123456789');

3. then I send a "vacuum analyze" and an "explain select * from table1 where index1 = 543;"
The output is:
NOTICE:  QUERY PLAN:
Index Scan using pk_table1 on table1  (cost=0.00..2.01 rows=1 width=16)

4. Then I insert 4900 rows into each of these tables like in step 2.

----------------------------

-- Test A: --
Then I send a "vacuum analyze;"
          and "delete from table2;",
          and "delete from table1;"
and rerun steps 2 to 4 -> step 4 takes 39 seconds.

-- Test B: --
Then I send "delete from table2;",
        and "delete from table1;",
        and a "vacuum analyze;"
and rerun steps 2 to 4 -> step 4 takes 81 seconds.

Although the "explain" command tells me that an Index Scan is used, step 4 is much slower in Test B.
For both tests (steps 2-4) I use one connection to the database.
If I quit the connection after step 3 and establish a new connection for step 4 it takes 39 seconds in either cases.
-> Using one connection the optimizer for pk/fk-checking is not updated by a "vacuum analyze".

On Fri, 05 Oct 2001 09:52:20 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andreas Wernitznig <andreas@insilico.com> writes:
> > If I don't disconnect from the database but use the first connection
> > again, the database still uses the (now) slower "seq scan" for
> > "primary/foreign key" checking. In this case the query optimizer
> > statistics are not updated for established connections.
>
> Sure they are --- in my tests, anyway.  What did you do *exactly*?
>
>             regards, tom lane
>

Re: Bug #474: Index using problem

From

Tom Lane

Date:

05 October 2001, 15:15:23

Andreas Wernitznig <andreas@insilico.com> writes:
> -> Using one connection the optimizer for pk/fk-checking is not
> updated by a "vacuum analyze".

Oh, I misunderstood you the first time: I thought you were saying that
*other* backends couldn't see the results of the VACUUM.

The reason for this behavior is that the foreign key checker caches a
plan for each foreign-key-checking query the first time it needs to
use that query (within a given backend).  There should be a mechanism
to flush those cached plans when circumstances change ... but currently
there isn't.

            regards, tom lane

Re: Bug #474: Index using problem

From

Andreas Wernitznig

Date:

05 October 2001, 17:58:30

Why don't you skip the automatic index creation for primary keys and let the user decide to create an index,
that should be used in any case, regardless what the query planner recommends ?

On Fri, 05 Oct 2001 15:15:06 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andreas Wernitznig <andreas@insilico.com> writes:
> > -> Using one connection the optimizer for pk/fk-checking is not
> > updated by a "vacuum analyze".
>
> Oh, I misunderstood you the first time: I thought you were saying that
> *other* backends couldn't see the results of the VACUUM.
>
> The reason for this behavior is that the foreign key checker caches a
> plan for each foreign-key-checking query the first time it needs to
> use that query (within a given backend).  There should be a mechanism
> to flush those cached plans when circumstances change ... but currently
> there isn't.
>
>             regards, tom lane

Re: Bug #474: Index using problem

From

Tom Lane

Date:

05 October 2001, 18:01:23

Andreas Wernitznig <andreas@insilico.com> writes:
> Why don't you skip the automatic index creation for primary keys

And how then would we enforce uniqueness of the primary key?

> and let the user decide to create an index, that should be used in any
> case, regardless what the query planner recommends ?

What?  AFAIR the planner does not even *know* whether an index is
associated with a primary key.  It certainly does not give such an
index any special preference.

            regards, tom lane

Re: Bug #474: Index using problem

From

Andreas Wernitznig

Date:

05 October 2001, 18:37:20

I don't know how the primary key checking and the index scan are related in postgresql (but I am keen to learn more
aboutit).  
From Sybase ASA (SQL Anywhere) I know, that these two functions (pk checking, index search) are decoupled.
(that means even a primary key without an index is possible. This makes only sense in tables with a few rows).

The pg-planner takes the current number of datasets to calculate a query plan to reach the best performance.
If the number of datasets changes (and the user/db-programmer usually knows if it changes) the planner sometimes make
wrongdecisions. 
Then I have to execute a "vacuum analyze" or reconnect in case of foreign key checking.

I would like to tune postgresql to use an index in any case if it is available.

On Fri, 05 Oct 2001 18:01:08 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Andreas Wernitznig <andreas@insilico.com> writes:
> > Why don't you skip the automatic index creation for primary keys
>
> And how then would we enforce uniqueness of the primary key?
>
> > and let the user decide to create an index, that should be used in any
> > case, regardless what the query planner recommends ?
>
> What?  AFAIR the planner does not even *know* whether an index is
> associated with a primary key.  It certainly does not give such an
> index any special preference.
>
>             regards, tom lane
>