Thread: Difference in indexes

Difference in indexes

From
"A.j. Langereis"
Date:
Dear all,
 
I'm using a PostgreSQL 8.1.0 dabase on a Fedora Core 3 machine here. In this table there is a table hosts:
 
CREATE TABLE hosts
(
  hostid int4 NOT NULL DEFAULT nextval('hosts_hostid_seq'::regclass),
  hostip cidr NOT NULL,
  hostname varchar(50),
  lastseen timestamp DEFAULT '1970-01-01 01:00:00'::timestamp without time zone,
  total int4 DEFAULT 0,
  image varchar(20) DEFAULT 'hosts/unknown.png'::character varying,
  CONSTRAINT hosts_pkey PRIMARY KEY (hostid)
)
WITHOUT OIDS;
ALTER TABLE hosts OWNER TO root;
 
CREATE INDEX hosts_hostip
  ON hosts
  USING btree
  (hostip);
CREATE INDEX hosts_hostname
  ON hosts
  USING btree
  (hostname);
 
When I run 2 queries on this table:
select * from hosts where hostname='Fabian'
select * from hosts where hostname='Foo'
 
I got 2 differen explain plans:
"Seq Scan on hosts  (cost=0.00..10.25 rows=21 width=59) (actual time=0.048..0.600 rows=21 loops=1)"
"  Filter: ((hostname)::text = 'Fabian'::text)"
"Total runtime: 0.794 ms"
"Index Scan using hosts_hostname on hosts  (cost=0.00..9.04 rows=2 width=59) (actual time=0.057..0.057 rows=0 loops=1)"
"  Index Cond: ((hostname)::text = 'Foo'::text)"
"Total runtime: 0.185 ms"
What is happening here? What am I overlooking? The length does not seem to be the problem: 'FooFooFoo' also uses the index..
Also the fact whenever there are results or not does not seem to influence the planner..
 
Yours,
 
Aarjan Langereis

Re: Difference in indexes

From
"Qingqing Zhou"
Date:
""A.j. Langereis"" <a.j.langereis@inter.nl.net> wrote
>
> What is happening here? What am I overlooking? The length does not
> seem to be the problem: 'FooFooFoo' also uses the index..
> Also the fact whenever there are results or not does not seem to
> influence the planner..
>

Check out this thread:

http://archives.postgresql.org/pgsql-bugs/2005-11/msg00032.php

Regards,
Qingqing



Re: Difference in indexes

From
"A.j. Langereis"
Date:
Tnks Qingqing, I've read them! Nevertheless I think there are some
differences here:

* The difference in the amout of rows is much smaller. Nevertheless the
table is small as well: only 100 rows
* There is a faster query plan to solve the query:

"Bitmap Heap Scan on hosts  (cost=2.07..11.34 rows=21 width=59) (actual
time=0.175..0.287 rows=21 loops=1)"
"  Recheck Cond: ((hostname)::text = 'Fabian'::text)"
"  ->  Bitmap Index Scan on hosts_hostname  (cost=0.00..2.07 rows=21
width=0) (actual time=0.145..0.145 rows=21 loops=1)"
"        Index Cond: ((hostname)::text = 'Fabian'::text)"
"Total runtime: 0.510 ms"

This result was achieved by setting enable_seqscan to off (postgresql.conf).
Turning off enable_bitmapscan as well resulted in a index scan which was
even more faster:

"Index Scan using hosts_hostname on hosts  (cost=0.00..37.28 rows=21
width=59) (actual time=0.068..0.281 rows=21 loops=1)"
"  Index Cond: ((hostname)::text = 'Fabian'::text)"
"Total runtime: 0.492 ms"

Yours,

Aarjan

----- Original Message -----
From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
To: <pgsql-general@postgresql.org>
Sent: Tuesday, November 22, 2005 12:38 AM
Subject: Re: [GENERAL] Difference in indexes


>
> ""A.j. Langereis"" <a.j.langereis@inter.nl.net> wrote
> >
> > What is happening here? What am I overlooking? The length does not
> > seem to be the problem: 'FooFooFoo' also uses the index..
> > Also the fact whenever there are results or not does not seem to
> > influence the planner..
> >
>
> Check out this thread:
>
> http://archives.postgresql.org/pgsql-bugs/2005-11/msg00032.php
>
> Regards,
> Qingqing
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
>



Re: Difference in indexes

From
"Qingqing Zhou"
Date:
""A.j. Langereis"" <a.j.langereis@inter.nl.net> wrote
>
> "Bitmap Heap Scan on hosts  (cost=2.07..11.34 rows=21 width=59) (actual
> time=0.175..0.287 rows=21 loops=1)"
> "  Recheck Cond: ((hostname)::text = 'Fabian'::text)"
> "  ->  Bitmap Index Scan on hosts_hostname  (cost=0.00..2.07 rows=21
> width=0) (actual time=0.145..0.145 rows=21 loops=1)"
> "        Index Cond: ((hostname)::text = 'Fabian'::text)"
> "Total runtime: 0.510 ms"
>
> This result was achieved by setting enable_seqscan to off
> (postgresql.conf).
> Turning off enable_bitmapscan as well resulted in a index scan which was
> even more faster:
>
> "Index Scan using hosts_hostname on hosts  (cost=0.00..37.28 rows=21
> width=59) (actual time=0.068..0.281 rows=21 loops=1)"
> "  Index Cond: ((hostname)::text = 'Fabian'::text)"
> "Total runtime: 0.492 ms"
>

If you compare the difference among the *estimated* cost ("cost=0.00 .."):

seqscan: cost=0.00..10.25
Bitmap: cost=2.07..11.34
indexscan: cost=0.00..37.28

Then you will know why the optimizer prefers sequential scan.  Yes, in your
case, the *real* cost("actual time = ...") is quite different from the
estimated cost. That's because the optimizer can't collect enough
information of the environment at execution. For example, the optimizer does
not know if a data page is in buffer or not(which will incurs IO cost) and
it always assumes not. There is a long story about the why the optimizer
does this. In short, since PG uses small buffer pool and the optimizer is
mainly useful for big tables, so this assumption is reasonable -- but for
small tables, may not that good.

Regards,
Qingqing






Re: Difference in indexes

From
"A.j. Langereis"
Date:
Ok, I didn't look at it from that point of view. It makes it all clear!

Thanks for the explanation!

Yours,

Aarjan Langereis

----- Original Message -----
From: "Qingqing Zhou" <zhouqq@cs.toronto.edu>
To: <pgsql-general@postgresql.org>
Sent: Tuesday, November 22, 2005 2:23 AM
Subject: Re: [GENERAL] Difference in indexes


>
> ""A.j. Langereis"" <a.j.langereis@inter.nl.net> wrote
> >
> > "Bitmap Heap Scan on hosts  (cost=2.07..11.34 rows=21 width=59) (actual
> > time=0.175..0.287 rows=21 loops=1)"
> > "  Recheck Cond: ((hostname)::text = 'Fabian'::text)"
> > "  ->  Bitmap Index Scan on hosts_hostname  (cost=0.00..2.07 rows=21
> > width=0) (actual time=0.145..0.145 rows=21 loops=1)"
> > "        Index Cond: ((hostname)::text = 'Fabian'::text)"
> > "Total runtime: 0.510 ms"
> >
> > This result was achieved by setting enable_seqscan to off
> > (postgresql.conf).
> > Turning off enable_bitmapscan as well resulted in a index scan which was
> > even more faster:
> >
> > "Index Scan using hosts_hostname on hosts  (cost=0.00..37.28 rows=21
> > width=59) (actual time=0.068..0.281 rows=21 loops=1)"
> > "  Index Cond: ((hostname)::text = 'Fabian'::text)"
> > "Total runtime: 0.492 ms"
> >
>
> If you compare the difference among the *estimated* cost ("cost=0.00 .."):
>
> seqscan: cost=0.00..10.25
> Bitmap: cost=2.07..11.34
> indexscan: cost=0.00..37.28
>
> Then you will know why the optimizer prefers sequential scan.  Yes, in
your
> case, the *real* cost("actual time = ...") is quite different from the
> estimated cost. That's because the optimizer can't collect enough
> information of the environment at execution. For example, the optimizer
does
> not know if a data page is in buffer or not(which will incurs IO cost) and
> it always assumes not. There is a long story about the why the optimizer
> does this. In short, since PG uses small buffer pool and the optimizer is
> mainly useful for big tables, so this assumption is reasonable -- but for
> small tables, may not that good.
>
> Regards,
> Qingqing
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>