Index problem or function problem? - Mailing list pgsql-performance
From | LIANHE SHAO |
---|---|
Subject | Index problem or function problem? |
Date | |
Msg-id | 4595eb458230.4582304595eb@jhmimail.jhmi.edu Whole thread Raw |
Responses |
Re: Index problem or function problem?
|
List | pgsql-performance |
Hello, Today I met a very strange query problem, which I spend several hours on it but have no clue. To make thing clear, let me write somewhat in detail. I have two almost exactly same queries, except that one is: lower(annotation) = lower (chip), another is: annotation = chip. While the first one can get result in less 10 seconds, the second one will hange for more that 5 minutes. What a big differents !! I checked the indexes, there are both index for lower() and without lower(). I even droped these indexes and recreated them, then use vacuum analyze, reindex, but thing does not change. the query plan give quite different paths. Could somebody give any clues where difference comes from? Thanks a lot. The first query, which get results in less than 10 seconds PGA=> explain select ei.expid, er.geneid, er.sampleid, ei.annotation, si.samplename, ei.title as exp_name, aaa.chip, aaa.sequence_derived_from as accession_number, aaa.gene_symbol, aaa.title as gene_function, er.exprs, er.mas5exprs from expressiondata er, experimentinfo ei, sampleinfo si, affy_array_annotation aaa where exists (select distinct ei.expid from experimentinfo) and lower(ei.annotation) = lower (aaa.chip) and (lower (aaa.title) like '%mif%' or lower(aaa.sequence_description) like '%mif%') and exists (select distinct ei.annotation from experimentinfo) and ei.expid = er.expid and er.expid = si.expid and er.sampleid = si.sampleid and er.geneid = aaa.probeset_id order by si.sampleid limit 20; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- ------------------- Limit (cost=24289.05..24289.10 rows=19 width=256) -> Sort (cost=24289.05..24289.10 rows=19 width=256) Sort Key: si.sampleid -> Hash Join (cost=6.11..24288.64 rows=19 width=256) Hash Cond: ("outer".expid = "inner".expid) Join Filter: ("outer".sampleid = "inner".sampleid) -> Nested Loop (cost=0.00..24278.66 rows=27 width=217) Join Filter: ("outer".expid = "inner".expid) -> Nested Loop (cost=0.00..18378.77 rows=45 width=180) -> Seq Scan on experimentinfo ei (cost=0.00..374.50 rows=5 width=99) Filter: ((subplan) AND (subplan)) SubPlan -> Unique (cost=8.67..8.78 rows=2 width=0) -> Sort (cost=8.67..8.72 rows=21 width=0) Sort Key: $0 -> Seq Scan on experimentinfo (cost=0.00..8.21 rows=21 width=0) -> Unique (cost=8.67..8.78 rows=2 width=0) -> Sort (cost=8.67..8.72 rows=21 width=0) Sort Key: $1 -> Seq Scan on experimentinfo (cost=0.00..8.21 rows=21 width=0) -> Index Scan using affy_array_annotation_lower_chip_idx on affy_array_annotation aaa (cost=0.00..3429.2 4 rows=9 width=81) Index Cond: (lower(("outer".annotation)::text) = lower((aaa.chip)::text)) Filter: ((lower(title) ~~ '%mif%'::text) OR (lower(sequence_description) ~~ '%mif%'::text)) -> Index Scan using expressiondata_geneid_idx on expressiondata er (cost=0.00..130.96 rows=34 width=37) Index Cond: (er.geneid = "outer".probeset_id) -> Hash (cost=4.55..4.55 rows=155 width=39) -> Seq Scan on sampleinfo si (cost=0.00..4.55 rows=155 width=39) (27 rows) ===================== The second query, which hangs. PGA=> explain select ei.expid, er.geneid, er.sampleid, ei.annotation, si.samplename, ei.title as exp_name, aaa.chip, aaa.sequence_derived_from as accession_number, aaa.gene_symbol, aaa.title as gene_function, er.exprs, er.mas5exprs from expressiondata er, experimentinfo ei, sampleinfo si, affy_array_annotation aaa where exists (select distinct ei.expid from experimentinfo) and ei.annotation = aaa.chip and (lower (aaa.title) like '%mif%' or lower(aaa.sequence_description) like '%mif%') and exists (select distinct ei.annotation from experimentinfo) and ei.expid = er.expid and er.expid = si.expid and er.sampleid = si.sampleid and er.geneid = aaa.probeset_id order by si.sampleid limit 20; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=157127.91..157128.38 rows=20 width=256) -> Merge Join (cost=157127.91..157137.33 rows=401 width=256) Merge Cond: (("outer".sampleid = "inner".sampleid) AND ("outer".expid = "inner".expid)) -> Sort (cost=157117.73..157119.11 rows=553 width=217) Sort Key: er.sampleid, er.expid -> Merge Join (cost=154417.78..157092.52 rows=553 width=217) Merge Cond: (("outer".annotation = "inner".chip) AND ("outer".geneid = "inner".probeset_id)) -> Sort (cost=96501.38..97830.62 rows=531694 width=136) Sort Key: ei.annotation, er.geneid -> Nested Loop (cost=0.00..20188.81 rows=531694 width=136) -> Seq Scan on experimentinfo ei (cost=0.00..374.50 rows=5 width=99) Filter: ((subplan) AND (subplan)) SubPlan -> Unique (cost=8.67..8.78 rows=2 width=0) -> Sort (cost=8.67..8.72 rows=21 width=0) Sort Key: $0 -> Seq Scan on experimentinfo (cost=0.00..8.21 rows=21 width=0) -> Unique (cost=8.67..8.78 rows=2 width=0) -> Sort (cost=8.67..8.72 rows=21 width=0) Sort Key: $1 -> Seq Scan on experimentinfo (cost=0.00..8.21 rows=21 width=0) -> Index Scan using expressiondata_expid_idx on expressiondata er (cost=0.00..2508.21 rows=101275 width=37) Index Cond: ("outer".expid = er.expid) -> Sort (cost=57916.40..57920.67 rows=1710 width=81) Sort Key: aaa.chip, aaa.probeset_id -> Seq Scan on affy_array_annotation aaa (cost=0.00..57824.60 rows=1710 width=81) Filter: ((lower(title) ~~ '%mif%'::text) OR (lower(sequence_description) ~~ '%mif%'::text)) -> Sort (cost=10.19..10.58 rows=155 width=39) Sort Key: si.sampleid, si.expid -> Seq Scan on sampleinfo si (cost=0.00..4.55 rows=155 width=39) (30 rows) ================= The related tables: Table "public.experimentinfo" Column | Type | Modifiers ---------------+------------------------+----------- expid | integer | name | character varying(128) | lab | character varying(128) | contact | character varying(128) | title | character varying(128) | abstract | text | nsamples | integer | disease_type | character varying(32) | annotation | character varying(32) | Indexes: experimetininfo_annotation_idx btree (annotation), experimetininfo_lower_annotation_idx btree (lower(annotation)), expinfo btree (expid) Table "public.affy_array_annotation" Column | Type | Modifiers -----------------------------------+------------------------+----------- chip | character varying(32) | not null organism | character varying(24) | annotation_date | character varying(24) | sequence_type | character varying(24) | sequence_source | character varying(32) | sequence_derived_from | character varying(32) | sequence_description | text | sequence_id | text | transcript_id | character varying(32) | group_id | character varying(64) | title | text | gene_symbol | character varying(64) | Indexes: affy_array_annotation_chip_idx btree (chip), affy_array_annotation_idx_gene_symbol btree (gene_symbol), affy_array_annotation_idx_locuslink btree (locuslink), affy_array_annotation_idx_omim btree (omim), affy_array_annotation_idx_pfam btree (pfam), affy_array_annotation_idx_sequence_derived_from btree (sequence_derived_from), affy_array_annotation_idx_sequence_description btree (sequence_description), affy_array_annotation_idx_title btree (title), affy_array_annotation_lower_chip_idx btree (lower(chip)), affy_array_annotation_lower_gene_symbol_idx btree (lower(gene_symbol)), affy_array_annotation_lower_probeset_id_idx btree (lower(probeset_id)), affy_array_annotation_lower_sequence_description_idx btree (lower(sequence_description)), affy_array_annotation_lower_title_idx btree (lower(title)), affy_array_annotation_pkey btree (probeset_id, chip), affy_array_annotation_probeset_id_idx btree (probeset_id), Regards, William
pgsql-performance by date: