Re: Index problem or function problem? - Mailing list pgsql-performance

From Tom Lane
Subject Re: Index problem or function problem?
Date
Msg-id 13828.1071511551@sss.pgh.pa.us
Whole thread Raw
In response to Index problem or function problem?  (LIANHE SHAO <lshao2@jhmi.edu>)
List pgsql-performance
LIANHE SHAO <lshao2@jhmi.edu> writes:
> PGA=> explain select ei.expid, er.geneid,
> er.sampleid, ei.annotation, si.samplename,
> ei.title as exp_name, aaa.chip,
> aaa.sequence_derived_from as accession_number,
> aaa.gene_symbol, aaa.title as gene_function,
> er.exprs, er.mas5exprs from expressiondata er,
> experimentinfo ei, sampleinfo si,
> affy_array_annotation aaa where exists (select
> distinct ei.expid from experimentinfo) and
> ei.annotation =  aaa.chip and (lower (aaa.title)
> like '%mif%' or lower(aaa.sequence_description) like
> '%mif%') and exists (select distinct ei.annotation
> from experimentinfo) and ei.expid = er.expid and
> er.expid = si.expid and er.sampleid = si.sampleid
> and er.geneid = aaa.probeset_id order by si.sampleid
> limit 20;

What is the purpose of the EXISTS() clauses?  They are almost surely not
doing what you intended, because AFAICS they are just an extremely
expensive means of producing a constant-TRUE result.  In
    exists (select distinct ei.expid from experimentinfo)
"ei.expid" is an outer reference, which will necessarily be the same
value over all rows of the sub-select.  After computing this same value
for every row of experimentinfo, the system performs a DISTINCT
operation (sort + unique, not cheap) ... and then all it checks for is
whether at least one row was produced, which means the DISTINCT
operation was completely unnecessary.  The only way the EXISTS could
return false is if experimentinfo were empty, but if it were so then the
outer FROM would've produced no rows and we'd not have got to WHERE
anyway.

I'm not sure why you get a worse plan for the simpler variant of the
query; it would help to see EXPLAIN ANALYZE rather than EXPLAIN output.
But it's not worth trying to improve the performance until you are
calculating correct answers, and I suspect the above is not doing
what you are after at all.

            regards, tom lane

pgsql-performance by date:

Previous
From: Rhaoni Chiu Pereira
Date:
Subject: Unsubscribe
Next
From: Tom Lane
Date:
Subject: Re: TRUNCATE veeeery slow compared to DELETE in 7.4