Thread: Indices get ignored in large tables

Indices get ignored in large tables

From
Apua Paquola
Date:
Hi,

I found a case where psql 7.1.3 doesn't use indices when the number of
rows is very large.

In the FIRST SESSION (below) I create the table foo and fill it with
200 rows.  Then, I create an index in the 'bar' field and run some
explains and they all return 'Index Scan'. So far, everything is fine.

In the SECOND SESSION, however, I fill the table with 600,000 rows.
After creating the index in the 'bar' field, explain commands report
'Seq Scan' for both fields 'id' and 'bar'.

In the MORE DETAILS section there is an uname -a and a psql --version.

This test gave the same results with psql 7.1.3 compiled on an alpha.
(uname -a: OSF1 verjo19 V4.0 1530 alpha)

If you cannot reproduce this bug or want me to run more tests, please
send an email.

Best regards,
Apuã Paquola
IQ-USP Bioinformatics Lab.



create database bugreport;

------------------------------ foo.sql ---------------------------------------
create table foo
(
id int,
bar int,
primary key(id)
);
------------------------------------------------------------------------------


------------------------------ generate_data.pl ------------------------------
#!/usr/bin/perl -w
use strict;
my $len=shift;
my $i;

print "COPY \"foo\"  FROM stdin;\n";
for($i=0; $i<$len; $i++)
{
    print $i+1,"\t",$len-$i,"\n";
}
print "\\.\n";
------------------------------------------------------------------------------


------------------------------ FIRST SESSION ---------------------------------

$ psql bugreport < foo.sql
NOTICE:  CREATE TABLE/PRIMARY KEY will create implicit index 'foo_pkey' for table 'foo'
CREATE
$ ./generate_data.pl 200 | psql bugreport
$ psql bugreport

bugreport=> explain select * from foo where id=34;
NOTICE:  QUERY PLAN:

Index Scan using foo_pkey on foo  (cost=0.00..8.14 rows=10 width=8)

EXPLAIN
bugreport=> create index bar_key on foo(bar);
CREATE
bugreport=> explain select * from foo where bar=34;
NOTICE:  QUERY PLAN:

Index Scan using bar_key on foo  (cost=0.00..3.03 rows=2 width=8)

EXPLAIN
bugreport=> explain select * from foo where id=34;
NOTICE:  QUERY PLAN:

Index Scan using foo_pkey on foo  (cost=0.00..3.06 rows=2 width=8)

EXPLAIN

------------------------------------------------------------------------------



------------------------------ SECOND SESSION --------------------------------
$ psql bugreport
bugreport=> drop table foo;
DROP

$ psql bugreport < foo.sql
NOTICE:  CREATE TABLE/PRIMARY KEY will create implicit index 'foo_pkey' for table 'foo'
CREATE
$ ./generate_data.pl 600000 | psql bugreport
$ psql bugreport

bugreport=> explain select * from foo where id=34;
NOTICE:  QUERY PLAN:

Index Scan using foo_pkey on foo  (cost=0.00..8.14 rows=10 width=8)

EXPLAIN
bugreport=> create index bar_key on foo(bar);
CREATE
bugreport=> explain select * from foo where id=34;
NOTICE:  QUERY PLAN:

Seq Scan on foo  (cost=0.00..10744.00 rows=6000 width=8)

EXPLAIN
bugreport=> explain select * from foo where bar=34;
NOTICE:  QUERY PLAN:

Seq Scan on foo  (cost=0.00..10744.00 rows=6000 width=8)

EXPLAIN
------------------------------------------------------------------------------



------------------------------ MORE DETAILS ----------------------------------

psql here is installed as a standard debian package.
I'm using the latest (updated daily) debian unstable distribution.

$ psql --version
psql (PostgreSQL) 7.1.3
contains readline, history, multibyte support
Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group
Portions Copyright (c) 1996 Regents of the University of California
Read the file COPYRIGHT or use the command \copyright to see the
usage and distribution terms.

$ uname -a
Linux verjo14 2.4.5 #1 Mon Jun 11 13:12:36 BRT 2001 i686 unknown

------------------------------------------------------------------------------

Re: Indices get ignored in large tables

From
Stephan Szabo
Date:
You probably need to run vacuum analyze after filling in the data.
Otherwise it's using bogus selectivity (see the number of expected
rows returned by the explain output, it's expecting 6000 rows after
the create index, which is large enough to make it guess that
it'll be cheaper to sequence scan the table.