Re: [Gmod-schema] Re: performace problem after VACUUM - Mailing list pgsql-performance

From Scott Cain
Subject Re: [Gmod-schema] Re: performace problem after VACUUM
Date
Msg-id 1045341390.3944.678.camel@localhost.localdomain
Whole thread Raw
In response to Re: [Gmod-schema] Re: performace problem after VACUUM ANALYZE  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hello Tom,

Here's the short answer: I've got it working much faster now (>100 msec
for the query by explain analyze).

Here's the long answer: I reworked the table, horribly denormalizing
it.  I changed the coordinate system, so that start is always less than
end, regardless of strand.  Here is the original query:

select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
    from feature f, featureloc fl
    where
      fl.srcfeature_id = 1 and
      ((fl.strand=1  and fl.nbeg <= 393164 and fl.nend >= 390956) OR
      (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
      f.feature_id  = fl.feature_id

and here is the equivalent query in the new coordinate system:

 select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
   from feature f, featureloc fl
     where
     fl.srcfeature_id = 1 and
     f.feature_id  = fl.feature_id and
     fl.max >= 390956 and
     fl.min <= 393164

Notice that it is MUCH simpler, and the query planner uses exactly the
indexes I want, and as noted above, runs much faster.  Of course, this
also means that I have to rewrite my database adaptor, but it shouldn't
be too bad.

For those on the GMOD list, here is how I changed the table:

alter table featureloc add column min int;
alter table featureloc add column max int;
update featureloc set min=nbeg where strand=1;
update featureloc set max=nend where strand=1;
update featureloc set max=nbeg where strand=-1;
update featureloc set min=nend where strand=-1;
update featureloc set min=nbeg where (strand=0 or strand is null) and nbeg<nend;
update featureloc set max=nend where (strand=0 or strand is null) and nbeg<nend;
update featureloc set min=nend where (strand=0 or strand is null) and nbeg>nend;
update featureloc set max=nbeg where (strand=0 or strand is null) and nbeg>nend;
create index featureloc_src_min_max on featureloc (srcfeature_id,min,max);
select count(*) from featureloc where min is null and nbeg is not null;

The last select is just a test to make sure I didn't miss anything, and
it did return zero.  Also, it doesn't appear that there are any features
that are strandless.  I found that a little surprising, but included
those updates for completeness.

Tom, thank you much for your help.  Hopefully, I will get the group to
buy into this schema change, and life will be good.

Scott

On Fri, 2003-02-14 at 19:11, Tom Lane wrote:
> SLetovsky@aol.com writes:
> > Am I correct in interpreting your comments as saying you believe that
> > if we could lose the OR and the strand constraint PG would probably
> > use the index properly?
>
> No, I said I thought it could do it without that ;-).  But yes, you'd
> have a much less fragile query if you could lose the OR condition.
>
> Have you looked into using a UNION ALL instead of OR to merge the two
> sets of results?  It sounds grotty, but might be faster...
>
>             regards, tom lane
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


pgsql-performance by date:

Previous
From: Curt Sampson
Date:
Subject: Re: [HACKERS] More benchmarking of wal_buffers
Next
From: Mariusz Czułada
Date:
Subject: Views with unions