Home > mailing lists

Re: GiST index performance - Mailing list pgsql-performance

From	dforum
Subject	Re: GiST index performance
Date	April 16, 2009 17:26:10
Msg-id	49E76896.7080001@vieonet.com Whole thread Raw
In response to	GiST index performance (Matthew Wakeling <matthew@flymine.org>)
Responses	Re: GiST index performance (Tom Lane <tgl@sss.pgh.pa.us>) Re: GiST index performance (Craig Ringer <craig@postnewspapers.com.au>)
List	pgsql-performance

Tree view

hello,

there is other performance problem on this request.

If you analyse query plan, you see that most of the time are lost during
sequencial scan, and you have 2 seq scan.

You have to create other indexes to match the request.

Postgresq is totally dependant on index to reach is performance.

Regarding gist or btree, I personnaly had better performance with btree.

Regards

david

Matthew Wakeling a écrit :
>
> I have been doing some queries that are best answered with GiST
> indexes, however I have found that their performance is a little
> lacking. I thought I would do a direct comparison on a level playing
> field. Here are two EXPLAIN ANALYSE results for the same query, with
> two different indexes. The two indexes are identical except that one
> is btree and the other GiST.
>
> Here is the query:
>
> SELECT *
> FROM
>     location l1,
>     location l2,
>     gene,
>     primer
> WHERE
>         l1.subjectid <> l2.subjectid
>     AND l1.objectid = l2.objectid
>     AND l1.subjectid = gene.id
>     AND l2.subjectid = primer.id
>     AND l2.intermine_start <= l1.intermine_start
>     AND l2.intermine_end >= l1.intermine_start
>
> Here is the btree index:
>
> CREATE INDEX location_object_start ON location (objectid,
> intermine_start);
>
> QUERY PLAN
> ----------------------------------------------------------------------
>  Hash Join
>    (cost=26213.16..135980894.76 rows=3155740824 width=484)
>    (actual time=2799.260..14256.588 rows=2758 loops=1)
>    Hash Cond: (l1.subjectid = gene.id)
>    ->  Nested Loop
>          (cost=0.00..4364485.01 rows=8891802645 width=324)
>          (actual time=9.748..10418.807 rows=390695 loops=1)
>          Join Filter: (l1.subjectid <> l2.subjectid)
>          ->  Nested Loop
>                (cost=0.00..446862.58 rows=572239 width=259)
>                (actual time=9.720..4226.117 rows=211880 loops=1)
>                ->  Seq Scan on primer
>                      (cost=0.00..15358.80 rows=211880 width=194)
>                      (actual time=9.678..579.877 rows=211880 loops=1)
>                ->  Index Scan using location__key_all on location l2
>                      (cost=0.00..2.00 rows=3 width=65)
>                      (actual time=0.004..0.007 rows=1 loops=211880)
>                      Index Cond: (l2.subjectid = primer.id)
>          ->  Index Scan using location_object_start on location l1
>                (cost=0.00..3.85 rows=150 width=65)
>                (actual time=0.005..0.012 rows=3 loops=211880)
>                Index Cond: ((l1.objectid = l2.objectid) AND
> (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >=
> l1.intermine_start))
>    ->  Hash
>          (cost=20496.96..20496.96 rows=457296 width=160)
>          (actual time=2788.698..2788.698 rows=457296 loops=1)
>          ->  Seq Scan on gene
>                (cost=0.00..20496.96 rows=457296 width=160)
>                (actual time=0.038..1420.604 rows=457296 loops=1)
>  Total runtime: 14263.846 ms
> (13 rows)
>
>
> Here is the GiST index:
>
> CREATE INDEX location_object_start_gist ON location USING gist
> (objectid, intermine_start);
>
> QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join
>    (cost=26213.16..136159960.32 rows=3155740824 width=484)
>    (actual time=2576.109..2300486.267 rows=2758 loops=1)
>    Hash Cond: (l1.subjectid = gene.id)
>    ->  Nested Loop
>          (cost=0.00..4543550.56 rows=8891802645 width=324)
>          (actual time=366.121..2296668.740 rows=390695 loops=1)
>          Join Filter: (l1.subjectid <> l2.subjectid)
>          ->  Nested Loop
>                (cost=0.00..446862.58 rows=572239 width=259)
>                (actual time=362.774..13423.443 rows=211880 loops=1)
>                ->  Seq Scan on primer
>                      (cost=0.00..15358.80 rows=211880 width=194)
>                      (actual time=319.559..1296.907 rows=211880 loops=1)
>                ->  Index Scan using location__key_all on location l2
>                      (cost=0.00..2.00 rows=3 width=65)
>                      (actual time=0.041..0.045 rows=1 loops=211880)
>                      Index Cond: (l2.subjectid = primer.id)
>          ->  Index Scan using location_object_start_gist on location l1
>                (cost=0.00..4.16 rows=150 width=65)
>                (actual time=3.354..10.757 rows=3 loops=211880)
>                Index Cond: ((l1.objectid = l2.objectid) AND
> (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >=
> l1.intermine_start))
>    ->  Hash
>          (cost=20496.96..20496.96 rows=457296 width=160)
>          (actual time=2157.914..2157.914 rows=457296 loops=1)
>          ->  Seq Scan on gene
>                (cost=0.00..20496.96 rows=457296 width=160)
>                (actual time=3.904..1206.907 rows=457296 loops=1)
>  Total runtime: 2300510.674 ms
> (13 rows)
>
> The query plans are identical except in the type of index used, but
> there is a factor of a few hundred in execute time. Is this the kind
> of factor that would be expected, or is there something amiss? Is this
> seen as something that might be improved in the future?
>
> Matthew
>

pgsql-performance by date:

From: Matthew Wakeling
Date: 16 April 2009, 17:23:54
Subject: Re: GiST index performance

From: Tom Lane
Date: 16 April 2009, 17:52:42
Subject: Re: GiST index performance

Re: GiST index performance - Mailing list pgsql-performance

Previous

Next