Thread: REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

From
Susan Russo
Date:

Hi again,

Very mixed news to report...

Recap:


I'd reported:
> Despite numerous efforts, we're unable to solve a severe performance
>limitation between Pg 7.3.2
> and Pg 8.1.4.
>
> The query and 'explain analyze' plan below, runs in
>     26.20 msec on Pg 7.3.2, and
>     2463.968 ms on Pg 8.1.4,
>


Tom Lane responded:
>You're not getting the indexscan optimization of the LIKE clause, which
>is most likely due to having initdb'd the 8.1 installation in something
>other than C locale.  You can either redo the initdb in C locale (which
>might be a good move to fix other inconsistencies from the 7.3 behavior
>you're used to) or create a varchar_pattern_ops index on the column(s)
>you're using LIKE with.


Steinar H. Gunderson suggested:
>You could always try
>
>  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);


I'd responded:
>>You could always try
>>
>>  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
>
>WOW!  we're now at runtime 0.367ms on Pg8
>
>Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
>
>Thanks again - will report back soon.


Alvaro Herrera pointed out:
>> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
>>
>That's alternative to the pattern_ops index; it won't help you obtain a
>plan faster than this one.



Tom concurred:
>>> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
>>
>> That's alternative to the pattern_ops index; it won't help you obtain
>> a plan faster than this one.
>
>No, but since their old DB was evidently running in C locale, this
>seems like a prudent thing to do to avoid other surprising
>changes in behavior.

=====

We reconfigured the server, as follows:

initdb -D /var/lib/pgsql/data --encoding=UTF8 --locale=C

 -----I'm wondering if this was incorrect (?). our Pg7 servers encode SQL_ASCII -----


NEXT, loaded db, and the good news is the query showed:
    Total runtime:  0.372 ms


As mentioned in original post, this query is just part of a longer procedure.
reminder:
         The longer procedure was taking >10 *hours* to run on Pg8.1.4
                    This same longer procedure runs in ~22 minutes on Pg7.3.2 server.


=====

Before redoing the initdb with C-locale, I did a CREATE INDEX on the 8.1.4
server, which resulted not only in much faster query times, but in a drastic
improvement in the time of the overall/longer procedure (<11mins).

With the initdb locale C Pg8.1.4 server, it ran for 6 hours before I killed it (and output
file was <.25 expected  end size).

======


I'm perplexed we're not seeing better performance on Pg8.1.4 server given the
22 minutes runtime we're seeing on the Pg7.3.2 servers (on older hardware and OS).


So, while initdb  C locale helped the initial query, it seems to have had no positive affect
on the longer procedure.


Is there some other difference between 7.3.2 and 8.1.4  we're missing?


Thanks for any help.
Regards,
Susan Russo


=======
I enclose the db calls (selects) contained in the 'overall procedure' referred to above (taken directly
from a perl script):  THOUGH THIS RUNS IN 22 mins on Pg7.3.2, and >10 hours on Pg8.1.4...

       my $aq = $dbh->prepare(sprintf("SELECT * from dbxref dx, db where accession = '%s' and dx.db_id = db.db_id and
db.name= 'GB_protein'",$rec)); 


        my $pq = $dbh->prepare(sprintf("SELECT o.genus, o.species, f.feature_id, f.uniquename, f.name, accession,
is_currentfrom feature f, feature_dbxref fd, dbxref d, cvterm cvt, organism o where accession = '%s' and d.dbxref_id =
fd.dbxref_idand fd.feature_id = f.feature_id and f.uniquename like '%s' and f.organism_id = o.organism_id and f.type_id
=cvt.cvterm_id and cvt.name = 'gene'",$rec,$fbgnwc)); 


                my $uq = $dbh2->prepare(sprintf("SELECT db.name, accession, version, is_current from feature_dbxref fd,
dbxrefdx, db where fd.feature_id = %d and fd.dbxref_id = dx.dbxref_id and dx.db_id = db.db_id and db.name =
'%s'",$pr{feature_id},$uds{$uh{$rec}{stflag}}));



       my $cq = $dbh2->prepare(sprintf("SELECT f.uniquename, f.name, cvt.name as ntype, dx.db_id, dx.accession,
fd.is_currentfrom dbxref dx, feature f, feature_dbxref fd, cvte 
rm cvt where accession like '%s' and dx.dbxref_id = fd.dbxref_id and fd.feature_id = f.feature_id and f.type_id =
cvt.cvterm_idand cvt.name not in
('gene','protein','natural_transposable_element','chromosome_structure_variation','chromosome_arm','repeat_region')",$nacc));




Re: REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

From
"Steinar H. Gunderson"
Date:
On Thu, May 10, 2007 at 09:23:03AM -0400, Susan Russo wrote:
>        my $aq = $dbh->prepare(sprintf("SELECT * from dbxref dx, db where accession = '%s' and dx.db_id = db.db_id and
db.name= 'GB_protein'",$rec)); 

This is not related to your performance issues, but it usually considered bad
form to use sprintf like this (mainly for security reasons). The usual way of
doing this would be:

  my $aq = $dbh->prepare("SELECT * from dbxref dx, db where accession = ? and dx.db_id = db.db_id and db.name =
'GB_protein'");
  $aq->execute($rec);

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

From
Bill Moran
Date:
In response to Susan Russo <russo@morgan.harvard.edu>:

>
>
> Hi again,
>
> Very mixed news to report...
>
> Recap:
>
>
> I'd reported:
> > Despite numerous efforts, we're unable to solve a severe performance
> >limitation between Pg 7.3.2
> > and Pg 8.1.4.
> >
> > The query and 'explain analyze' plan below, runs in
> >     26.20 msec on Pg 7.3.2, and
> >     2463.968 ms on Pg 8.1.4,
> >
>
>
> Tom Lane responded:
> >You're not getting the indexscan optimization of the LIKE clause, which
> >is most likely due to having initdb'd the 8.1 installation in something
> >other than C locale.  You can either redo the initdb in C locale (which
> >might be a good move to fix other inconsistencies from the 7.3 behavior
> >you're used to) or create a varchar_pattern_ops index on the column(s)
> >you're using LIKE with.
>
>
> Steinar H. Gunderson suggested:
> >You could always try
> >
> >  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
>
>
> I'd responded:
> >>You could always try
> >>
> >>  CREATE INDEX test_index ON dbxref (accession varchar_pattern_ops);
> >
> >WOW!  we're now at runtime 0.367ms on Pg8
> >
> >Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >
> >Thanks again - will report back soon.
>
>
> Alvaro Herrera pointed out:
> >> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >>
> >That's alternative to the pattern_ops index; it won't help you obtain a
> >plan faster than this one.
>
>
>
> Tom concurred:
> >>> Next step is to initdb w/C Locale (tonight) (Thanks Tom et al.!).
> >>
> >> That's alternative to the pattern_ops index; it won't help you obtain
> >> a plan faster than this one.
> >
> >No, but since their old DB was evidently running in C locale, this
> >seems like a prudent thing to do to avoid other surprising
> >changes in behavior.
>
> =====
>
> We reconfigured the server, as follows:
>
> initdb -D /var/lib/pgsql/data --encoding=UTF8 --locale=C
>
>  -----I'm wondering if this was incorrect (?). our Pg7 servers encode SQL_ASCII -----
>
>
> NEXT, loaded db, and the good news is the query showed:
>     Total runtime:  0.372 ms
>
>
> As mentioned in original post, this query is just part of a longer procedure.
> reminder:
>          The longer procedure was taking >10 *hours* to run on Pg8.1.4
>                     This same longer procedure runs in ~22 minutes on Pg7.3.2 server.
>
>
> =====
>
> Before redoing the initdb with C-locale, I did a CREATE INDEX on the 8.1.4
> server, which resulted not only in much faster query times, but in a drastic
> improvement in the time of the overall/longer procedure (<11mins).
>
> With the initdb locale C Pg8.1.4 server, it ran for 6 hours before I killed it (and output
> file was <.25 expected  end size).

Quick reminders:
*) Did you recreate all the indexes on the new system after the initdb?
*) Did you vacuum and analyze after loading your data?

>
> ======
>
>
> I'm perplexed we're not seeing better performance on Pg8.1.4 server given the
> 22 minutes runtime we're seeing on the Pg7.3.2 servers (on older hardware and OS).
>
>
> So, while initdb  C locale helped the initial query, it seems to have had no positive affect
> on the longer procedure.
>
>
> Is there some other difference between 7.3.2 and 8.1.4  we're missing?

I suggest you provide "explain analyze" output for the query on both versions.

>
>
> Thanks for any help.
> Regards,
> Susan Russo
>
>
> =======
> I enclose the db calls (selects) contained in the 'overall procedure' referred to above (taken directly
> from a perl script):  THOUGH THIS RUNS IN 22 mins on Pg7.3.2, and >10 hours on Pg8.1.4...
>
>        my $aq = $dbh->prepare(sprintf("SELECT * from dbxref dx, db where accession = '%s' and dx.db_id = db.db_id and
db.name= 'GB_protein'",$rec)); 
>
>
>         my $pq = $dbh->prepare(sprintf("SELECT o.genus, o.species, f.feature_id, f.uniquename, f.name, accession,
is_currentfrom feature f, feature_dbxref fd, dbxref d, cvterm cvt, organism o where accession = '%s' and d.dbxref_id =
fd.dbxref_idand fd.feature_id = f.feature_id and f.uniquename like '%s' and f.organism_id = o.organism_id and f.type_id
=cvt.cvterm_id and cvt.name = 'gene'",$rec,$fbgnwc)); 
>
>
>                 my $uq = $dbh2->prepare(sprintf("SELECT db.name, accession, version, is_current from feature_dbxref
fd,dbxref dx, db where fd.feature_id = %d and fd.dbxref_id = dx.dbxref_id and dx.db_id = db.db_id and db.name =
'%s'",$pr{feature_id},$uds{$uh{$rec}{stflag}}));
>
>
>
>        my $cq = $dbh2->prepare(sprintf("SELECT f.uniquename, f.name, cvt.name as ntype, dx.db_id, dx.accession,
fd.is_currentfrom dbxref dx, feature f, feature_dbxref fd, cvte 
> rm cvt where accession like '%s' and dx.dbxref_id = fd.dbxref_id and fd.feature_id = f.feature_id and f.type_id =
cvt.cvterm_idand cvt.name not in
('gene','protein','natural_transposable_element','chromosome_structure_variation','chromosome_arm','repeat_region')",$nacc));
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>
>
>
>
>


--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

****************************************************************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
****************************************************************

Re: REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

From
Susan Russo
Date:
>Quick reminders:
>*) Did you recreate all the indexes on the new system after the initdb?
>*) Did you vacuum and analyze after loading your data?

No, I didn't - am reindexing db now and will run vacuum analyze afterwards.

>I suggest you provide "explain analyze" output for the query on both versions.

Pg8:

-----------------------------------------------------------------
 Merge Join  (cost=151939.73..156342.67 rows=10131 width=1585) (actual time=0.129..0.129 rows=0 loops=1)
   Merge Cond: ("outer".cvterm_id = "inner".type_id)
   ->  Index Scan using cvterm_pkey on cvterm cvt  (cost=0.00..4168.22 rows=32478 width=520) (actual time=0.044..0.044
rows=1loops=1) 
         Filter: (((name)::text <> 'gene'::text) AND ((name)::text <> 'protein'::text) AND ((name)::text <>
'natural_transposable_element'::text) AND ((name)::text <> 'chromosome_structure_variation'::text) AND ((name)::text <>

'chromosome_arm'::text) AND ((name)::text <> 'repeat_region'::text))
   ->  Sort  (cost=151939.73..151965.83 rows=10441 width=1073) (actual time=0.079..0.079 rows=0 loops=1)
         Sort Key: f.type_id
        ->  Nested Loop  (cost=17495.27..151242.80 rows=10441 width=1073) (actual time=0.070..0.070 rows=0 loops=1)
               ->  Hash Join  (cost=17495.27..88325.38 rows=10441 width=525) (actual time=0.068..0.068 rows=0 loops=1)
                     Hash Cond: ("outer".dbxref_id = "inner".dbxref_id)
                    ->  Seq Scan on feature_dbxref fd  (cost=0.00..34182.71 rows=2088171 width=9) (actual
time=0.008..0.008rows=1 loops=1) 
                     ->  Hash  (cost=17466.34..17466.34 rows=11572 width=524) (actual time=0.042..0.042 rows=0 loops=1)
                           ->  Bitmap Heap Scan on dbxref dx  (cost=117.43..17466.34 rows=11572 width=524) (actual
time=0.041..0.041rows=0 loops=1) 
                                 Filter: ((accession)::text ~~ 'AY851043%'::text)
                                 ->  Bitmap Index Scan on dbxref_idx2  (cost=0.00..117.43 rows=11572 width=0) (actual
time=0.037..0.037rows=0 loops=1) 
                                       Index Cond: (((accession)::text >= 'AY851043'::character varying) AND
((accession)::text< 'AY851044'::character varying)) 
               ->  Index Scan using feature_pkey on feature f  (cost=0.00..6.01 rows=1 width=556) (never executed)
                     Index Cond: ("outer".feature_id = f.feature_id)
 Total runtime: 0.381 ms
(18 rows)


=======

Pg7:

-----------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..23.45 rows=1 width=120) (actual time=0.08..0.08 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..17.49 rows=1 width=82) (actual time=0.08..0.08 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..11.93 rows=1 width=30) (actual time=0.08..0.08 rows=0 loops=1)
               ->  Index Scan using dbxref_idx2 on dbxref dx  (cost=0.00..5.83 rows=1 width=21) (actual time=0.08..0.08
rows=0loops=1) 
                     Index Cond: ((accession >= 'AY851043'::character varying) AND (accession < 'AY851044'::character
varying))
                     Filter: (accession ~~ 'AY851043%'::text)
               ->  Index Scan using feature_dbxref_idx2 on feature_dbxref fd  (cost=0.00..6.05 rows=5 width=9) (never
executed)
                     Index Cond: (fd.dbxref_id = "outer".dbxref_id)
        ->  Index Scan using feature_pkey on feature f  (cost=0.00..5.54 rows=1 width=52) (never executed)
               Index Cond: ("outer".feature_id = f.feature_id)
   ->  Index Scan using cvterm_pkey on cvterm cvt  (cost=0.00..5.94 rows=1 width=38) (never executed)
         Index Cond: ("outer".type_id = cvt.cvterm_id)
        Filter: ((name <> 'gene'::character varying) AND (name <> 'protein'::character varying) AND (name <>
'natural_transposable_element'::charactervarying) AND (name <> 'chromosome_structure_variation'::character varying) 
AND (name <> 'chromosome_arm'::character varying) AND (name <> 'repeat_region'::character varying))
 Total runtime: 0.36 msec
(14 rows)


Re: REVISIT specific query (not all) on Pg8 MUCH slower than Pg7

From
Susan Russo
Date:
Hello again -
vacuum analyze of db did the trick, thanks!
longer procedure went from over 6 hours to ~11 minutes....quite dramatic.

Reindexing wasn't necessary (did test on one db -slog-slog-, though).

Regards,
Susan