Thread: indexes
Hi! I try to optimize following query SELECT DISTINCT e1.name_ent AS nazwa_grupy FROM ent e1, binds b1, ent e2 WHERE e1.id_ent=b1.id_parent AND b1.id_child=e2.id_ent AND b1.id_links=0 AND e2.name_ent='SERWIS'; Is it possible to get better results? From explain notes it looks, that engine uses indexes on ent table, but I can't constrain it to use any index on binds table - this causes main problem. Or may be query should look in other way? EXPLAIN gives following information: NOTICE: QUERY PLAN: Unique (cost=77.02 size=0 width=0) -> Sort (cost=77.02 size=0 width=0) -> Nested Loop (cost=77.02 size=1 width=28) -> Nested Loop (cost=74.97 size=1 width=12) -> Seq Scan on b1 (cost=72.97 size=1width=8) -> Index Scan on e2 (cost=2.00 size=1 width=4) -> Index Scan on e1 (cost=2.05size=1304 width=16) and I have following indexes Table = ent2_idx +--------------------+--------------------+------+ |Field |Type |Length| +--------------------+--------------------+------+ | id_ent | int4 |4 | | name_ent | text |var | +--------------------+--------------------+------+ and Table = binds2_idx +--------------------+----------------------------------+------+ |Field | Type |Length| +--------------------+----------------------------------+------+ | id_parent | int4 |4 | | id_child | int4 |4 | | id_links | int4 |4 | +--------------------+----------------------------------+------+ TIARem -------------------------------------------------------------------*------------ Remigiusz Sokolowski e-mail: rems@gdansk.sprint.pl * * -----------------------------------------------------------------*****----------
Remigiusz Sokolowski wrote: > > Hi! > I try to optimize following query > SELECT DISTINCT e1.name_ent AS nazwa_grupy > FROM ent e1, binds b1, ent e2 > WHERE e1.id_ent=b1.id_parent AND b1.id_child=e2.id_ent AND b1.id_links=0 ^^^^^^^^^^^^^ > AND e2.name_ent='SERWIS'; > > Is it possible to get better results? From explain notes it looks, that > engine uses indexes on ent table, but I can't constrain it to use any > index on binds table - this causes main problem. > Or may be query should look in other way? > ... > Table = binds2_idx > +--------------------+----------------------------------+------+ > |Field | Type |Length| > +--------------------+----------------------------------+------+ > | id_parent | int4 |4 | > | id_child | int4 |4 | > | id_links | int4 |4 | > +--------------------+----------------------------------+------+ Try to create index on binds (id_links, id_parent, id_child) Vadim
> > Hi! > > I try to optimize following query > > SELECT DISTINCT e1.name_ent AS nazwa_grupy > > FROM ent e1, binds b1, ent e2 > > WHERE e1.id_ent=b1.id_parent AND b1.id_child=e2.id_ent AND b1.id_links=0 > > AND e2.name_ent='SERWIS'; > > > > Table = binds2_idx > > +--------------------+----------------------------------+------+ > > |Field | Type |Length| > > +--------------------+----------------------------------+------+ > > | id_parent | int4 |4 | > > | id_child | int4 |4 | > > | id_links | int4 |4 | > > +--------------------+----------------------------------+------+ > > Try to create index on binds (id_links, id_parent, id_child) > > Vadim Wow - works great - big thanks! Could You throw some hints - why index should look like as You wrote? I looked in docs, but there is no clues how to build indices (or I just didn't find them)Thanks once more :-)Rem -------------------------------------------------------------------*------------ Remigiusz Sokolowski e-mail: rems@gdansk.sprint.pl * * -----------------------------------------------------------------*****----------
Remigiusz Sokolowski wrote: > > > > Hi! > > > I try to optimize following query > > > SELECT DISTINCT e1.name_ent AS nazwa_grupy > > > FROM ent e1, binds b1, ent e2 > > > WHERE e1.id_ent=b1.id_parent AND b1.id_child=e2.id_ent AND b1.id_links=0 > > > AND e2.name_ent='SERWIS'; > > > > > > Table = binds2_idx > > > +--------------------+----------------------------------+------+ > > > |Field | Type |Length| > > > +--------------------+----------------------------------+------+ > > > | id_parent | int4 |4 | > > > | id_child | int4 |4 | > > > | id_links | int4 |4 | > > > +--------------------+----------------------------------+------+ > > > > Try to create index on binds (id_links, id_parent, id_child) > > > > Vadim > > Wow - works great - big thanks! > Could You throw some hints - why index should look like as You wrote? > I looked in docs, but there is no clues how to build indices (or I just > didn't find them) Index can be used only if first index key is defined. In your query b1.id_parent (your old first index key) is used in join clause and so binds index could be used only in inner plan: after a tuple is fetched from e1 (outer plan) making b1.id_parent defined, but optimizer decided that this is not right way. b1.id_links has constant value and so binds index with id_links as first key may be used in outer plan (scanned first). Other keys of new binds index will not be used in index scan of this query anyway. After a binds tuple is fetched, keys for e1/e2 indices is defined and so these indices can be used, and more of that - both keys of e2 index will be used in scan. Vadim
Remigiusz Sokolowski <rems@gdansk.sprint.pl> writes: > NOTICE: QUERY PLAN: > Unique (cost=77.02 size=0 width=0) > -> Sort (cost=77.02 size=0 width=0) > -> Nested Loop (cost=77.02 size=1 width=28) > -> Nested Loop (cost=74.97 size=1 width=12) > -> Seq Scan on b1 (cost=72.97 size=1 width=8) > -> Index Scan on e2 (cost=2.00 size=1 width=4) > -> Index Scan on e1 (cost=2.05 size=1304 width=16) > I think the real problem here is that the optimizer thinks your tables are small (notice the size=1 estimates in the inner loop). Have you done a VACUUM lately? You need that to update the statistics that the optimizer uses. Unless you are dealing with very small tables, you don't want to see nested-loop joins (that means scanning the lower table once for each tuple in the upper table!). You want to see merge joins or hash joins. Vadim's suggestion of a better-adapted index was a good one, but I wonder whether the speedup you saw wasn't just a side effect from CREATE INDEX having updated the optimizer's stats, so that it stopped using nested loops... regards, tom lane
Hi all, indices speed up database queries. Tell me if I'm wrong. Does it make sense to create indices on all fields involved in a query? If so, what is the best access method (btree,rtree,hash)? Heiko -- H. Wilms Homepage schering.si2.uni-hannover.de/hugo email wilms@stud.fh-hannover.de
Heiko Wilms <wilms@stud.fh-hannover.de> writes: > indices speed up database queries. Tell me if I'm wrong. > Does it make sense to create indices on all fields involved in > a query? Well, indexes can speed up queries, but they also slow down inserts and updates, because each index has to be updated whenever you update the table. So it's a tradeoff. You probably don't want to make indexes that are only useful for seldom-used queries, unless updates to the table are even less frequent than the queries. If you are wondering *which* fields to make indexes on, one way to decide is to create a whole set of indexes and then run EXPLAIN on the kinds of queries you do most often. EXPLAIN will show you which indexes the optimizer is actually using --- then you can get rid of the rest. (Be sure to load the tables to representative sizes and do a VACUUM ANALYZE before you trust the results from EXPLAIN very much.) > If so, what is the best access method (btree,rtree,hash)? btree is the best general-purpose index type. rtree is only suitable for 2-D data (polygons and similar data types). hash is less flexible than btree because you cannot use it for an ordered scan of the table. regards, tom lane
> Remigiusz Sokolowski <rems@gdansk.sprint.pl> writes: > > NOTICE: QUERY PLAN: > > Unique (cost=77.02 size=0 width=0) > > -> Sort (cost=77.02 size=0 width=0) > > -> Nested Loop (cost=77.02 size=1 width=28) > > -> Nested Loop (cost=74.97 size=1 width=12) > > -> Seq Scan on b1 (cost=72.97 size=1 width=8) > > -> Index Scan on e2 (cost=2.00 size=1 width=4) > > -> Index Scan on e1 (cost=2.05 size=1304 width=16) > > > > I think the real problem here is that the optimizer thinks your tables > are small (notice the size=1 estimates in the inner loop). Have you > done a VACUUM lately? You need that to update the statistics that the > optimizer uses. Yea - I've just done VACUUM during my efforts to speed up query (btw. should I do something with notices from VACUUM like that one: NOTICE: Ind binds_idx: NUMBER OF INDEX' TUPLES (4) IS NOT THE SAME AS HEAP' (1787)) > Unless you are dealing with very small tables, you don't want to see > nested-loop joins (that means scanning the lower table once for each > tuple in the upper table!). You want to see merge joins or hash joins. > > Vadim's suggestion of a better-adapted index was a good one, but I > wonder whether the speedup you saw wasn't just a side effect from > CREATE INDEX having updated the optimizer's stats, so that it stopped > using nested loops... > I have too less experience to know at what quantity of records use which join. In this case table ent has 1304 rows and table binds 1787. But without Vadim's suggestion I've got still similar results - sometimes optimizer has used index on e1 and on b1, sometimes on e1 and e2, but never on e1,b1 and e2 at once (note: this was join on ent e1, binds b1 and ent e2) and whole query was performed at cost about 70. After rebuilding index at cost 6-7 (what is enough for me)Rem -------------------------------------------------------------------*------------ Remigiusz Sokolowski e-mail: rems@gdansk.sprint.pl * * -----------------------------------------------------------------*****----------
Remigiusz Sokolowski wrote: > > Yea - I've just done VACUUM during my efforts to speed up query (btw. > should I do something with notices from VACUUM like that one: > NOTICE: Ind binds_idx: NUMBER OF INDEX' TUPLES (4) IS NOT THE SAME AS > HEAP' (1787)) Ops. Re-build binds_idx! And use latest 6.5 snapshot if you use 6.5 beta... Vadim
> > > > Yea - I've just done VACUUM during my efforts to speed up query (btw. > > should I do something with notices from VACUUM like that one: > > NOTICE: Ind binds_idx: NUMBER OF INDEX' TUPLES (4) IS NOT THE SAME AS > > HEAP' (1787)) > > Ops. Re-build binds_idx! And use latest 6.5 snapshot if you use > 6.5 beta... > > Vadim I use ver. 6.3.2 and have rather little influence on which version is used - my admin decide about such things. Is this possible to solve this manually?Rem -------------------------------------------------------------------*------------ Remigiusz Sokolowski e-mail: rems@gdansk.sprint.pl * * -----------------------------------------------------------------*****----------
Remigiusz Sokolowski <rems@gdansk.sprint.pl> writes: > But without Vadim's suggestion I've got still similar results - sometimes > optimizer has used index on e1 and on b1, sometimes on e1 and e2, but > never on e1,b1 and e2 at once (note: this was join on ent e1, binds b1 and > ent e2) You won't ever see a three-way join in EXPLAIN's output, if that was what you were looking for. Postgres always handles joins by combining two tables, then combining the result with another table, etc. regards, tom lane