Thread: indexes

indexes

From
Remigiusz Sokolowski
Date:
Hi!
I try to optimize following query 
SELECT DISTINCT e1.name_ent AS nazwa_grupy
FROM ent e1, binds b1, ent e2
WHERE e1.id_ent=b1.id_parent AND b1.id_child=e2.id_ent AND b1.id_links=0
AND e2.name_ent='SERWIS';

Is it possible to get better results? From explain notes it looks, that
engine uses indexes on ent table, but I can't constrain it to use any
index on binds table - this causes main problem.
Or may be query should look in other way?

EXPLAIN gives following information:
NOTICE:  QUERY PLAN:
Unique  (cost=77.02 size=0 width=0) ->  Sort  (cost=77.02 size=0 width=0)       ->  Nested Loop  (cost=77.02 size=1
width=28)            ->  Nested Loop  (cost=74.97 size=1 width=12)                   ->  Seq Scan on b1  (cost=72.97
size=1width=8)                   ->  Index Scan on e2  (cost=2.00 size=1 width=4)             ->  Index Scan on e1
(cost=2.05size=1304 width=16)
 

and I have following indexes
Table    = ent2_idx
+--------------------+--------------------+------+
|Field               |Type                |Length|
+--------------------+--------------------+------+
| id_ent             | int4               |4     |
| name_ent           | text               |var   |
+--------------------+--------------------+------+

and 
Table    = binds2_idx
+--------------------+----------------------------------+------+
|Field               |              Type                |Length|
+--------------------+----------------------------------+------+
| id_parent          | int4                             |4     |
| id_child           | int4                             |4     |
| id_links           | int4                             |4     |
+--------------------+----------------------------------+------+

TIARem

-------------------------------------------------------------------*------------
Remigiusz Sokolowski      e-mail: rems@gdansk.sprint.pl           * *        
-----------------------------------------------------------------*****----------



Re: [SQL] indexes

From
Vadim Mikheev
Date:
Remigiusz Sokolowski wrote:
> 
> Hi!
> I try to optimize following query
> SELECT DISTINCT e1.name_ent AS nazwa_grupy
> FROM ent e1, binds b1, ent e2
> WHERE e1.id_ent=b1.id_parent AND b1.id_child=e2.id_ent AND b1.id_links=0
             ^^^^^^^^^^^^^
 
> AND e2.name_ent='SERWIS';
> 
> Is it possible to get better results? From explain notes it looks, that
> engine uses indexes on ent table, but I can't constrain it to use any
> index on binds table - this causes main problem.
> Or may be query should look in other way?
> 
...
> Table    = binds2_idx
> +--------------------+----------------------------------+------+
> |Field               |              Type                |Length|
> +--------------------+----------------------------------+------+
> | id_parent          | int4                             |4     |
> | id_child           | int4                             |4     |
> | id_links           | int4                             |4     |
> +--------------------+----------------------------------+------+

Try to create index on binds (id_links, id_parent, id_child)

Vadim


Re: [SQL] indexes

From
Remigiusz Sokolowski
Date:
> > Hi!
> > I try to optimize following query
> > SELECT DISTINCT e1.name_ent AS nazwa_grupy
> > FROM ent e1, binds b1, ent e2
> > WHERE e1.id_ent=b1.id_parent AND b1.id_child=e2.id_ent AND b1.id_links=0
> > AND e2.name_ent='SERWIS';
> > 
> > Table    = binds2_idx
> > +--------------------+----------------------------------+------+
> > |Field               |              Type                |Length|
> > +--------------------+----------------------------------+------+
> > | id_parent          | int4                             |4     |
> > | id_child           | int4                             |4     |
> > | id_links           | int4                             |4     |
> > +--------------------+----------------------------------+------+
> 
> Try to create index on binds (id_links, id_parent, id_child)
> 
> Vadim

Wow - works great - big thanks!
Could You throw some hints - why index should look like as You wrote?
I looked in docs, but there is no clues how to build indices (or I just
didn't find them)Thanks once more :-)Rem
-------------------------------------------------------------------*------------
Remigiusz Sokolowski      e-mail: rems@gdansk.sprint.pl           * *        
-----------------------------------------------------------------*****----------




Re: [SQL] indexes

From
Vadim Mikheev
Date:
Remigiusz Sokolowski wrote:
> 
> > > Hi!
> > > I try to optimize following query
> > > SELECT DISTINCT e1.name_ent AS nazwa_grupy
> > > FROM ent e1, binds b1, ent e2
> > > WHERE e1.id_ent=b1.id_parent AND b1.id_child=e2.id_ent AND b1.id_links=0
> > > AND e2.name_ent='SERWIS';
> > >
> > > Table    = binds2_idx
> > > +--------------------+----------------------------------+------+
> > > |Field               |              Type                |Length|
> > > +--------------------+----------------------------------+------+
> > > | id_parent          | int4                             |4     |
> > > | id_child           | int4                             |4     |
> > > | id_links           | int4                             |4     |
> > > +--------------------+----------------------------------+------+
> >
> > Try to create index on binds (id_links, id_parent, id_child)
> >
> > Vadim
> 
> Wow - works great - big thanks!
> Could You throw some hints - why index should look like as You wrote?
> I looked in docs, but there is no clues how to build indices (or I just
> didn't find them)

Index can be used only if first index key is defined. In your query
b1.id_parent (your old first index key) is used in join clause and
so binds index could be used only in inner plan: after a tuple
is fetched from e1 (outer plan) making b1.id_parent defined, but
optimizer decided that this is not right way.

b1.id_links has constant value and so binds index with id_links as
first key may be used in outer plan (scanned first). Other keys of
new binds index will not be used in index scan of this query anyway. 
After a binds tuple is fetched, keys for e1/e2 indices is defined and
so these indices can be used, and more of that - both keys of
e2 index will be used in scan.

Vadim


Re: [SQL] indexes

From
Tom Lane
Date:
Remigiusz Sokolowski <rems@gdansk.sprint.pl> writes:
> NOTICE:  QUERY PLAN:
> Unique  (cost=77.02 size=0 width=0)
>   ->  Sort  (cost=77.02 size=0 width=0)
>         ->  Nested Loop  (cost=77.02 size=1 width=28)
>               ->  Nested Loop  (cost=74.97 size=1 width=12)
>                     ->  Seq Scan on b1  (cost=72.97 size=1 width=8)
>                     ->  Index Scan on e2  (cost=2.00 size=1 width=4)
>               ->  Index Scan on e1  (cost=2.05 size=1304 width=16)
> 

I think the real problem here is that the optimizer thinks your tables
are small (notice the size=1 estimates in the inner loop).  Have you
done a VACUUM lately?  You need that to update the statistics that the
optimizer uses.

Unless you are dealing with very small tables, you don't want to see
nested-loop joins (that means scanning the lower table once for each
tuple in the upper table!).  You want to see merge joins or hash joins.

Vadim's suggestion of a better-adapted index was a good one, but I
wonder whether the speedup you saw wasn't just a side effect from
CREATE INDEX having updated the optimizer's stats, so that it stopped
using nested loops...
        regards, tom lane


Re: [SQL] indexes

From
Heiko Wilms
Date:
Hi all,

indices speed up database queries. Tell me if I'm wrong.
Does it make sense to create indices on all fields involved in
a query? If so, what is the best access method (btree,rtree,hash)?

Heiko

--
H. Wilms

Homepage schering.si2.uni-hannover.de/hugo
email    wilms@stud.fh-hannover.de





Re: [SQL] indexes

From
Tom Lane
Date:
Heiko Wilms <wilms@stud.fh-hannover.de> writes:
> indices speed up database queries. Tell me if I'm wrong.
> Does it make sense to create indices on all fields involved in
> a query?

Well, indexes can speed up queries, but they also slow down inserts
and updates, because each index has to be updated whenever you update
the table.  So it's a tradeoff.  You probably don't want to make indexes
that are only useful for seldom-used queries, unless updates to the
table are even less frequent than the queries.

If you are wondering *which* fields to make indexes on, one way to
decide is to create a whole set of indexes and then run EXPLAIN on
the kinds of queries you do most often.  EXPLAIN will show you which
indexes the optimizer is actually using --- then you can get rid of
the rest.  (Be sure to load the tables to representative sizes and
do a VACUUM ANALYZE before you trust the results from EXPLAIN very
much.)

> If so, what is the best access method (btree,rtree,hash)?

btree is the best general-purpose index type.  rtree is only suitable
for 2-D data (polygons and similar data types).  hash is less flexible
than btree because you cannot use it for an ordered scan of the table.
        regards, tom lane


Re: [SQL] indexes

From
Remigiusz Sokolowski
Date:
> Remigiusz Sokolowski <rems@gdansk.sprint.pl> writes:
> > NOTICE:  QUERY PLAN:
> > Unique  (cost=77.02 size=0 width=0)
> >   ->  Sort  (cost=77.02 size=0 width=0)
> >         ->  Nested Loop  (cost=77.02 size=1 width=28)
> >               ->  Nested Loop  (cost=74.97 size=1 width=12)
> >                     ->  Seq Scan on b1  (cost=72.97 size=1 width=8)
> >                     ->  Index Scan on e2  (cost=2.00 size=1 width=4)
> >               ->  Index Scan on e1  (cost=2.05 size=1304 width=16)
> > 
> 
> I think the real problem here is that the optimizer thinks your tables
> are small (notice the size=1 estimates in the inner loop).  Have you
> done a VACUUM lately?  You need that to update the statistics that the
> optimizer uses.

Yea - I've just done VACUUM during my efforts to speed up query (btw.
should I do something with notices from VACUUM like that one:
NOTICE:  Ind binds_idx: NUMBER OF INDEX' TUPLES (4) IS NOT THE SAME AS
HEAP' (1787))
> Unless you are dealing with very small tables, you don't want to see
> nested-loop joins (that means scanning the lower table once for each
> tuple in the upper table!).  You want to see merge joins or hash joins.
> 
> Vadim's suggestion of a better-adapted index was a good one, but I
> wonder whether the speedup you saw wasn't just a side effect from
> CREATE INDEX having updated the optimizer's stats, so that it stopped
> using nested loops...
> 
I have too less experience to know at what quantity of records use which
join.
In this case table ent has 1304 rows and table binds 1787.
But without Vadim's suggestion I've got still similar results - sometimes
optimizer has used index on e1 and on b1, sometimes on e1 and e2, but
never on e1,b1 and e2 at once (note: this was join on ent e1, binds b1 and
ent e2) and whole query was performed at  cost about 70. After rebuilding
index at cost 6-7 (what is enough for me)Rem


-------------------------------------------------------------------*------------
Remigiusz Sokolowski      e-mail: rems@gdansk.sprint.pl           * *        
-----------------------------------------------------------------*****----------



Re: [SQL] indexes

From
Vadim Mikheev
Date:
Remigiusz Sokolowski wrote:
> 
> Yea - I've just done VACUUM during my efforts to speed up query (btw.
> should I do something with notices from VACUUM like that one:
> NOTICE:  Ind binds_idx: NUMBER OF INDEX' TUPLES (4) IS NOT THE SAME AS
> HEAP' (1787))

Ops. Re-build binds_idx! And use latest 6.5 snapshot if you use
6.5 beta...

Vadim


Re: [SQL] indexes

From
Remigiusz Sokolowski
Date:
> > 
> > Yea - I've just done VACUUM during my efforts to speed up query (btw.
> > should I do something with notices from VACUUM like that one:
> > NOTICE:  Ind binds_idx: NUMBER OF INDEX' TUPLES (4) IS NOT THE SAME AS
> > HEAP' (1787))
> 
> Ops. Re-build binds_idx! And use latest 6.5 snapshot if you use
> 6.5 beta...
> 
> Vadim

I use ver. 6.3.2 and have rather little influence on which version is used
- my admin decide about such things.
Is this possible to solve this manually?Rem

-------------------------------------------------------------------*------------
Remigiusz Sokolowski      e-mail: rems@gdansk.sprint.pl           * *        
-----------------------------------------------------------------*****----------



Re: [SQL] indexes

From
Tom Lane
Date:
Remigiusz Sokolowski <rems@gdansk.sprint.pl> writes:
> But without Vadim's suggestion I've got still similar results - sometimes
> optimizer has used index on e1 and on b1, sometimes on e1 and e2, but
> never on e1,b1 and e2 at once (note: this was join on ent e1, binds b1 and
> ent e2)

You won't ever see a three-way join in EXPLAIN's output, if that was
what you were looking for.  Postgres always handles joins by combining
two tables, then combining the result with another table, etc.
        regards, tom lane