Index problem or function problem? - Mailing list pgsql-performance

From LIANHE SHAO
Subject Index problem or function problem?
Date
Msg-id 4595eb458230.4582304595eb@jhmimail.jhmi.edu
Whole thread Raw
Responses Re: Index problem or function problem?
List pgsql-performance
Hello,
Today I met a very strange query problem, which I
spend several hours on it but have no clue. To make
thing clear, let me write somewhat in detail.

I have two almost exactly same queries, except that
one is: lower(annotation) = lower (chip), another
is: annotation = chip. While the first one can get
result in less 10 seconds, the second one will hange
for more that 5 minutes. What a big differents !!

I checked the indexes, there are both index for
lower() and without lower(). I even droped these
indexes and recreated them, then use vacuum analyze,
reindex, but thing does not change. the query plan
give quite different paths.

Could somebody give any clues where difference comes
from?  Thanks a lot.

The first query, which get results in less than 10
seconds

   PGA=> explain select ei.expid, er.geneid,
er.sampleid, ei.annotation, si.samplename,
ei.title as exp_name, aaa.chip,
aaa.sequence_derived_from as accession_number,
aaa.gene_symbol, aaa.title as gene_function,
er.exprs, er.mas5exprs from expressiondata er,
experimentinfo ei, sampleinfo si,
affy_array_annotation aaa where exists (select
distinct ei.expid from experimentinfo) and
lower(ei.annotation) = lower (aaa.chip) and (lower
(aaa.title) like '%mif%' or
lower(aaa.sequence_description) like '%mif%') and
exists (select distinct ei.annotation from
experimentinfo) and ei.expid = er.expid and er.expid
= si.expid and er.sampleid = si.sampleid and
er.geneid = aaa.probeset_id order by si.sampleid
limit 20;

                   QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------
-------------------
 Limit  (cost=24289.05..24289.10 rows=19 width=256)
   ->  Sort  (cost=24289.05..24289.10 rows=19 width=256)
         Sort Key: si.sampleid
         ->  Hash Join  (cost=6.11..24288.64 rows=19
width=256)
               Hash Cond: ("outer".expid =
"inner".expid)
               Join Filter: ("outer".sampleid =
"inner".sampleid)
               ->  Nested Loop  (cost=0.00..24278.66
rows=27 width=217)
                     Join Filter: ("outer".expid =
"inner".expid)
                     ->  Nested Loop
(cost=0.00..18378.77 rows=45 width=180)
                           ->  Seq Scan on
experimentinfo ei  (cost=0.00..374.50 rows=5 width=99)
                                 Filter: ((subplan)
AND (subplan))
                                 SubPlan
                                   ->  Unique
(cost=8.67..8.78 rows=2 width=0)
                                         ->  Sort
(cost=8.67..8.72 rows=21 width=0)
                                               Sort
Key: $0
                                               ->
Seq Scan on experimentinfo  (cost=0.00..8.21 rows=21
width=0)
                                   ->  Unique
(cost=8.67..8.78 rows=2 width=0)
                                         ->  Sort
(cost=8.67..8.72 rows=21 width=0)
                                               Sort
Key: $1
                                               ->
Seq Scan on experimentinfo  (cost=0.00..8.21 rows=21
width=0)
                           ->  Index Scan using
affy_array_annotation_lower_chip_idx on
affy_array_annotation aaa  (cost=0.00..3429.2
4 rows=9 width=81)
                                 Index Cond:
(lower(("outer".annotation)::text) =
lower((aaa.chip)::text))
                                 Filter:
((lower(title) ~~ '%mif%'::text) OR
(lower(sequence_description) ~~ '%mif%'::text))
                     ->  Index Scan using
expressiondata_geneid_idx on expressiondata er
(cost=0.00..130.96 rows=34 width=37)
                           Index Cond: (er.geneid =
"outer".probeset_id)
               ->  Hash  (cost=4.55..4.55 rows=155
width=39)
                     ->  Seq Scan on sampleinfo si
(cost=0.00..4.55 rows=155 width=39)
(27 rows)

=====================
The second query, which hangs.


PGA=> explain select ei.expid, er.geneid,
er.sampleid, ei.annotation, si.samplename,
ei.title as exp_name, aaa.chip,
aaa.sequence_derived_from as accession_number,
aaa.gene_symbol, aaa.title as gene_function,
er.exprs, er.mas5exprs from expressiondata er,
experimentinfo ei, sampleinfo si,
affy_array_annotation aaa where exists (select
distinct ei.expid from experimentinfo) and
ei.annotation =  aaa.chip and (lower (aaa.title)
like '%mif%' or lower(aaa.sequence_description) like
'%mif%') and exists (select distinct ei.annotation
from experimentinfo) and ei.expid = er.expid and
er.expid = si.expid and er.sampleid = si.sampleid
and er.geneid = aaa.probeset_id order by si.sampleid
limit 20;

              QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=157127.91..157128.38 rows=20 width=256)
   ->  Merge Join  (cost=157127.91..157137.33
rows=401 width=256)
         Merge Cond: (("outer".sampleid =
"inner".sampleid) AND ("outer".expid = "inner".expid))
         ->  Sort  (cost=157117.73..157119.11
rows=553 width=217)
               Sort Key: er.sampleid, er.expid
               ->  Merge Join
(cost=154417.78..157092.52 rows=553 width=217)
                     Merge Cond:
(("outer".annotation = "inner".chip) AND
("outer".geneid = "inner".probeset_id))
                     ->  Sort
(cost=96501.38..97830.62 rows=531694 width=136)
                           Sort Key: ei.annotation,
er.geneid
                           ->  Nested Loop
(cost=0.00..20188.81 rows=531694 width=136)
                                 ->  Seq Scan on
experimentinfo ei  (cost=0.00..374.50 rows=5 width=99)
                                       Filter:
((subplan) AND (subplan))
                                       SubPlan
                                         ->  Unique
 (cost=8.67..8.78 rows=2 width=0)
                                               ->
Sort  (cost=8.67..8.72 rows=21 width=0)

 Sort Key: $0

 ->  Seq Scan on experimentinfo  (cost=0.00..8.21
rows=21 width=0)
                                         ->  Unique
 (cost=8.67..8.78 rows=2 width=0)
                                               ->
Sort  (cost=8.67..8.72 rows=21 width=0)

 Sort Key: $1

 ->  Seq Scan on experimentinfo  (cost=0.00..8.21
rows=21 width=0)
                                 ->  Index Scan
using expressiondata_expid_idx on expressiondata er
 (cost=0.00..2508.21 rows=101275 width=37)
                                       Index Cond:
("outer".expid = er.expid)
                     ->  Sort
(cost=57916.40..57920.67 rows=1710 width=81)
                           Sort Key: aaa.chip,
aaa.probeset_id
                           ->  Seq Scan on
affy_array_annotation aaa  (cost=0.00..57824.60
rows=1710 width=81)
                                 Filter:
((lower(title) ~~ '%mif%'::text) OR
(lower(sequence_description) ~~ '%mif%'::text))
         ->  Sort  (cost=10.19..10.58 rows=155 width=39)
               Sort Key: si.sampleid, si.expid
               ->  Seq Scan on sampleinfo si
(cost=0.00..4.55 rows=155 width=39)
(30 rows)

=================
The related tables:

         Table "public.experimentinfo"
    Column     |          Type          | Modifiers
---------------+------------------------+-----------
 expid         | integer                |
 name          | character varying(128) |
 lab           | character varying(128) |
 contact       | character varying(128) |
 title         | character varying(128) |
 abstract      | text                   |
 nsamples      | integer                |
 disease_type  | character varying(32)  |
 annotation    | character varying(32)  |
Indexes: experimetininfo_annotation_idx btree
(annotation),
         experimetininfo_lower_annotation_idx btree
(lower(annotation)),
         expinfo btree (expid)


                Table "public.affy_array_annotation"
              Column               |          Type
        | Modifiers
-----------------------------------+------------------------+-----------
 chip                              | character
varying(32)  | not null
 organism                          | character
varying(24)  |
 annotation_date                   | character
varying(24)  |
 sequence_type                     | character
varying(24)  |
 sequence_source                   | character
varying(32)  |
 sequence_derived_from             | character
varying(32)  |
 sequence_description              | text
        |
 sequence_id                       | text
        |
 transcript_id                     | character
varying(32)  |
 group_id                          | character
varying(64)  |
 title                             | text
        |
 gene_symbol                       | character
varying(64)  |

Indexes: affy_array_annotation_chip_idx btree (chip),
  affy_array_annotation_idx_gene_symbol btree
(gene_symbol),
         affy_array_annotation_idx_locuslink btree
(locuslink),
         affy_array_annotation_idx_omim btree (omim),
         affy_array_annotation_idx_pfam btree (pfam),

affy_array_annotation_idx_sequence_derived_from
btree (sequence_derived_from),

affy_array_annotation_idx_sequence_description btree
(sequence_description),

         affy_array_annotation_idx_title btree (title),

         affy_array_annotation_lower_chip_idx btree
(lower(chip)),
         affy_array_annotation_lower_gene_symbol_idx
btree (lower(gene_symbol)),

         affy_array_annotation_lower_probeset_id_idx
btree (lower(probeset_id)),

affy_array_annotation_lower_sequence_description_idx
btree (lower(sequence_description)),
         affy_array_annotation_lower_title_idx btree
(lower(title)),

         affy_array_annotation_pkey btree
(probeset_id, chip),
         affy_array_annotation_probeset_id_idx btree
(probeset_id),






Regards,
William


pgsql-performance by date:

Previous
From: "Passynkov, Vadim"
Date:
Subject: Async Query Processing on Solaris
Next
From: Mark Kirkwood
Date:
Subject: Solaris Performance (Again)