Thread: A question about indexes...

A question about indexes...

From
Alexaki Sofia
Date:
Hello,
I have the  following tables in my db
Painter (id integer,  uri varchar(256)) and
paints (id1 integer, id2 integer)
I  want to optimize the question  select id from Painter where uri = 'xxxxx';What kind of index (Btree or Hash) is more
efficientto create on field
 
uri  since it's a string?
I also want  to optimize the join between the ables Painter and paints 
on the fields id and id1 respectively?
I can either define the field id as a Primary Key or create an Btree index
on it. What is more effient?? 
From  my test I see that creating  Btree index is  a bit faster!!. 
Would the performance (of the join) be  improved if  I created indexes
both on field id and id1 or it's sufficient  to create one of the two
indexes ?
As far as I can see the performance is improved if I have a Primary Key on
Painter.id and a BTree index on paints.id1. However when I create a 
Btree index on Painter.id and a BTree index on paints.id1 performance
gets worst.


thank you in advance for your help
Sofia Alexaki
alexaki@ics.forth.gr



Re: A question about indexes...

From
Tom Lane
Date:
Alexaki Sofia <alexaki@ics.forth.gr> writes:
> I can either define the field id as a Primary Key or create an Btree index
> on it. What is more effient?? 
> From  my test I see that creating  Btree index is  a bit faster!!. 

I think you're seeing things.  Declaring a field primary key creates
a btree index on it (and also enables UNIQUE and NOT NULL checks, but
those don't affect the speed of lookups).  There isn't going to be
any difference between the two ways of doing it --- whatever difference
you measured was due to other factors, eg, disk pages already in cache.

As for your other point I'd generally recommend btree over hash indexes.
The btree code is much more thoroughly tested, supports concurrent
updates which hash indexes don't, and allows order-based index scans
which hash doesn't.  I don't see any redeeming social value in a hash
index, actually...
        regards, tom lane


Re: A question about indexes...

From
Alexaki Sofia
Date:
Hello,

I have the following  tables in my database
Painter(id integer, uri varchar(256))
paints(id1 integer, id2 integer)

in order to speed up the join (select * from painter, paints where
painter.id= paints.id1)  between these two tables I have created indexes
on the field painter.id and/or paints.id1.

But as I see from the query plan the indexes are not used, instead
sequential search  is done either I define indexes or not.
As you can see below the query plan remains the same.
Is that reasonable??? Shouldn't Postgresql use the indexes in order 
to optimize question???I can't see why is better to make sequential search
since the size of tables is relatively big.


A)  No indexes are defined on the tables 
Hash Join  (cost=12269.78 rows=60014 width=24) ->  Seq Scan on painter1  (cost=4234.97 rows=99999 width=16) ->  Hash
(cost=1931.92rows=50331 width=8)       ->  Seq Scan on paints  (cost=1931.92 rows=50331 width=8)
 
B1)
BTree index on painter.id
Hash Join  (cost=12269.78 rows=60014 width=24) ->  Seq Scan on painter  (cost=4234.97 rows=99999 width=16) ->  Hash
(cost=1931.92rows=50331 width=8)       ->  Seq Scan on paints  (cost=1931.92 rows=50331 width=8)
 

B2) 
Primary Key on painter.id
Hash Join  (cost=12269.78 rows=60014 width=24) ->  Seq Scan on painter  (cost=4234.97 rows=99999 width=16) ->  Hash
(cost=1931.92rows=50331 width=8)       ->  Seq Scan on paints  (cost=1931.92 rows=50331 width=8)
 
C1)
BTree index on painter.id and Btree on paints.id1
Hash Join  (cost=12269.78 rows=60014 width=24) ->  Seq Scan on painter  (cost=4234.97 rows=99999 width=16) ->  Hash
(cost=1931.92rows=50331 width=8)       ->  Seq Scan on paints  (cost=1931.92 rows=50331 width=8)
 

C2)
Primary Key on painter.id and Btree on paints.id1
Hash Join  (cost=12269.78 rows=60014 width=24) ->  Seq Scan on painter  (cost=4234.97 rows=99999 width=16) ->  Hash
(cost=1931.92rows=50331 width=8)       ->  Seq Scan on paints  (cost=1931.92 rows=50331 width=8)
 

Regards,
Sofia Alexaki




Re: A question about indexes...

From
Tom Lane
Date:
Alexaki Sofia <alexaki@ics.forth.gr> writes:
> But as I see from the query plan the indexes are not used, instead
> sequential search  is done either I define indexes or not.
> As you can see below the query plan remains the same.
> Is that reasonable??? Shouldn't Postgresql use the indexes in order 
> to optimize question???

Not necessarily.  Since you're just doing a join without restricting
the query to a subset of either table, the indexes would only be
useful as a means of ordering the inputs to a mergejoin --- and an
indexscan over a whole table is *not* particularly fast, because of
all the random seeks involved.

The plausible plans for this sort of query are basically

Merge Join-> Index Scan on t1-> Index Scan on t2

Merge Join-> Sort    -> Seq Scan on t1-> Sort    -> Seq Scan on t2

Hash Join-> Seq Scan on t1-> Seq Scan on t2

(Postgres also considers mergejoins with indexscan on one side and
explicit sort on the other, but for brevity I ignore that possibility.)

Any of these might be the best choice depending on number of rows,
width of each row, and harder-to-predict factors like how well-ordered
the tuples are already.  The planner's cost models are evidently
predicting that the hash join will be the quickest.  You could
experiment, if you're interested, by forcing the choice by setting
ENABLE_HASHJOIN and ENABLE_SORT on or off, and then comparing the
estimated costs shown by EXPLAIN and the actual measured query
runtimes.  If the estimated-cost ratios are wildly at variance with
the real runtimes then you have a legitimate gripe.  But your gripe
should be that the cost models don't reflect reality, not that Postgres
ignores your indexes.
        regards, tom lane