Thread: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

From
bruc@stone.congenomics.com (Robert E. Bruccoleri)
Date:
Dear Tom,
    I am writing to you because you are the maintainer of the
query optimizer and planner.
    I have found a very significant performance degradation
between PostgreSQL 6.5.3 and 7.1beta3, which will severely impact two
large applications that I have developed and maintain for several
clients.  The performance difference is seen with the use of indices in
SELECT statements, whereby the current release does not make effective
use of the indices and 6.5.3 does. All of these tests were run on a SGI
R10000 Indigo2 system running Irix 6.5. All the regression tests passed
as expected for both versions.
    I have followed the discussion in pgsql-hackers over the previous
months and others have noted some performance problems, and the response
has typically been to VACUUM the tables. Unfortunately, this is not a
practical option for my applications. They are very large -- I have one
table that is 17GB in length, and the applications are used frequently.
More importantly, PostgreSQL 6.5.3 works very, very well without
VACUUM'ing.
    In order to assist you to diagnosing and correcting this
problem, I have prepared a test database that shows the problems. I
will attach three files; the test script, the log from running it on
version 6.5.3, and the log from running it on version 7.1beta3. In
addition, I have setup an anonymous FTP directory on
ftp.congen.com:/pub/pg_perf which contains all of these files as well
as the compressed table dumps used to build the test database. (When
you have finished copying the files, please let me know.)
    The test script creates the database including the necessary
indexing, and then runs EXPLAIN on each of the queries followed by
actually executing the queries with "timex" commands to report elapsed
times.  The striking difference in the query plans is that 7.1 uses
only sequential searches for the SELECT's whereas 6.5.3 uses index
scans. As a result, 7.1 is almost two orders of magnitude slower than
6.5.3 with exactly the same data, schema, and queries.

    I plead with you to revisit this question of performance and
fix PostgreSQL 7.1 to work as well as PostgreSQL 6.5.3 does. I depend
upon PostgreSQL for much of my work, and I do not want to abandon it
because of this performance problem which arose only recently. Thank
you.

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+
#!/bin/csh

createdb perf_test

gunzip <proteins.dmp.gz | timex psql -e perf_test
gunzip <comparisons_4.dmp.gz | timex psql -e perf_test
gunzip <concordance_91.dmp.gz | timex psql -e perf_test

psql -e perf_test <<EOF
explain select * from comparisons_4 where name1 = 'HI0001';
explain select count(*) from comparisons_4 where code = 80003;
explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name
=p.name; 
EOF

timex psql -e -c "select * from comparisons_4 where name1 = 'HI0001'" perf_test
timex psql -e -c "select count(*) from comparisons_4 where code = 80003" perf_test
timex psql -e -c "select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 =
p.name"perf_test 
timex psql -e -c "select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where
c.matched_name= p.name" perf_test 

CREATE TABLE "proteins" (
    "name" character varying(16),
    "organism" text,
    "start_position" int4,
    "last_position" int4,
    "seq" text,
    "purpose" text,
    "alternate_key" character varying(16),
    "comment" text,
    "compared" bool,
    "complement" bool,
    "chromosome" character varying(4),
    "essentiality" float8);
QUERY: CREATE TABLE "proteins" (
    "name" character varying(16),
    "organism" text,
    "start_position" int4,
    "last_position" int4,
    "seq" text,
    "purpose" text,
    "alternate_key" character varying(16),
    "comment" text,
    "compared" bool,
    "complement" bool,
    "chromosome" character varying(4),
    "essentiality" float8);
COPY "proteins" FROM stdin;
QUERY: COPY "proteins" FROM stdin;
CREATE  INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
QUERY: CREATE  INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
CREATE  INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
QUERY: CREATE  INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
CREATE
CREATE
CREATE
EOF

real     1:11.42
user        3.15
sys         0.53

CREATE TABLE "comparisons_4" (
    "name1" character varying(16),
    "name2" character varying(16),
    "z_score" float8,
    "expected" float8,
    "local_overlap_ratio" float8,
    "local_overlap_count" int4,
    "overlap_ratio" float8,
    "code" int4);
QUERY: CREATE TABLE "comparisons_4" (
    "name1" character varying(16),
    "name2" character varying(16),
    "z_score" float8,
    "expected" float8,
    "local_overlap_ratio" float8,
    "local_overlap_count" int4,
    "overlap_ratio" float8,
    "code" int4);
COPY "comparisons_4" FROM stdin;
QUERY: COPY "comparisons_4" FROM stdin;
CREATE  INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
QUERY: CREATE  INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
CREATE  INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
QUERY: CREATE  INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
CREATE  INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
QUERY: CREATE  INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
CREATE
CREATE
CREATE
CREATE
EOF

real    16:42.13
user        5.86
sys         0.96

CREATE TABLE "concordance_91" (
    "target_name" character varying(16),
    "matched_name" character varying(16),
    "score" text);
QUERY: CREATE TABLE "concordance_91" (
    "target_name" character varying(16),
    "matched_name" character varying(16),
    "score" text);
REVOKE ALL on "concordance_91" from PUBLIC;
QUERY: REVOKE ALL on "concordance_91" from PUBLIC;
GRANT ALL on "concordance_91" to PUBLIC;
QUERY: GRANT ALL on "concordance_91" to PUBLIC;
COPY "concordance_91" FROM stdin;
QUERY: COPY "concordance_91" FROM stdin;
CREATE
CHANGE
CHANGE
EOF

real        0.30
user        0.02
sys         0.04

explain select * from comparisons_4 where name1 = 'HI0001';
QUERY: explain select * from comparisons_4 where name1 = 'HI0001';
NOTICE:  QUERY PLAN:

Index Scan using comparisons_4_name1 on comparisons_4  (cost=2.05 rows=1 width=64)

explain select count(*) from comparisons_4 where code = 80003;
QUERY: explain select count(*) from comparisons_4 where code = 80003;
NOTICE:  QUERY PLAN:

Aggregate  (cost=2.05 rows=1 width=12)
  ->  Index Scan using comparisons_4_code on comparisons_4  (cost=2.05 rows=1 width=12)

explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
QUERY: explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=4.10 rows=1 width=36)
  ->  Index Scan using comparisons_4_name1 on comparisons_4 c  (cost=2.05 rows=1 width=12)
  ->  Index Scan using protein_names on proteins p  (cost=2.05 rows=36840 width=24)

explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name
=p.name; 
QUERY: explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where
c.matched_name= p.name; 
NOTICE:  QUERY PLAN:

Nested Loop  (cost=2093.00 rows=36840 width=60)
  ->  Seq Scan on concordance_91 c  (cost=43.00 rows=1000 width=36)
  ->  Index Scan using protein_names on proteins p  (cost=2.05 rows=36840 width=24)

EXPLAIN
EXPLAIN
EXPLAIN
EXPLAIN
EOF
QUERY: select * from comparisons_4 where name1 = 'HI0001'
name1 |name2          |z_score|expected|local_overlap_ratio|local_overlap_count|overlap_ratio| code
------+---------------+-------+--------+-------------------+-------------------+-------------+-----
HI0001|PDB2DBV_O      | 1217.4|       0|            0.56716|                335|     0.560468|30012
HI0001|PDB4DBV_O      |   1207|       0|            0.56418|                335|     0.557523|30012
HI0001|PDB2GD1_P      | 1226.4|       0|            0.57015|                335|     0.563423|30012
HI0001|PDB1GAE_O      | 1861.8|       0|            0.83133|                332|     0.814164|30012
HI0001|PDB4GPD_1      | 1357.8|       0|            0.64865|                333|     0.637169|30012
HI0001|HP1346         |  850.3| 6.9e-41|            0.39222|                334|     0.386435|30005
HI0001|TP0844         |  780.3| 5.8e-37|            0.46307|                352|     0.465716|30014
HI0001|PDB1HDG_O      | 1020.4|       0|            0.48024|                329|     0.466074|30012
HI0001|SCPIR-DEBYG1   | 1405.2|       0|             0.6497|                334|     0.640117|30000
HI0001|Rv1436         |  970.4|       0|            0.49558|                339|      0.49558|30010
HI0001|PDB1CER_O      |  949.7|       0|            0.47734|                331|     0.466075|30012
HI0001|PDB1NLH_       |  935.1|       0|            0.46847|                333|     0.458825|30012
HI0001|PDB1GGA_A      |    918|       0|            0.52125|                353|      0.51397|30012
HI0001|PDB1GAD_O      | 1869.5|       0|            0.83434|                332|     0.817112|30012
HI0001|PDB1GYP_A      |  900.1|       0|            0.51275|                353|     0.505589|30012
HI0001|MG301          |  866.7|       0|            0.43155|                336|     0.427731|30004
HI0001|SCSW-G3P1_YEAST| 1425.3|       0|            0.65868|                334|     0.648965|30000
HI0001|ScTDH1         | 1424.6|       0|            0.65868|                334|     0.648965|30013
HI0001|ScTDH2         | 1405.2|       0|             0.6497|                334|     0.640117|30013
HI0001|SCSW-G3P3_YEAST| 1417.5|       0|            0.65868|                334|     0.648965|30000
HI0001|ScTDH3         | 1416.8|       0|            0.65868|                334|     0.648965|30013
HI0001|SCGP-3720      | 1416.8|       0|            0.66168|                334|     0.651921|30000
HI0001|SCGP-E243731   | 1416.8|       0|            0.65868|                334|     0.648965|30000
HI0001|SCSW-G3P2_YEAST| 1405.9|       0|             0.6497|                334|     0.640117|30000
HI0001|SCGP-1008189   | 1424.6|       0|            0.65868|                334|     0.648965|30000
HI0001|SCGP-3726      | 1398.7|       0|             0.6497|                334|     0.640117|30000
HI0001|PDB3GPD_R      | 1432.2|       0|            0.63772|                334|     0.628314|30012
HI0001|HP0921         |  762.6| 5.6e-36|            0.40407|                344|      0.41003|30005
HI0001|MJ1146         |  124.7|     1.9|            0.25094|                267|     0.195338|30007
HI0001|SCGP-3724      | 1371.5|       0|            0.63772|                334|     0.628314|30000
(30 rows)


real        0.18
user        0.02
sys         0.03

QUERY: select count(*) from comparisons_4 where code = 80003
count
-----
 3231
(1 row)


real        0.34
user        0.02
sys         0.03

QUERY: select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name
name   |purpose

                                                                                              

-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
MG263  |hypothetical protein

                                                                                              
HP0652 |phosphoserine phosphatase

                                                                                              
MJ1594 |phosphoserine phosphatase

                                                                                              
MG125  |hypothetical protein

                                                                                              
TP0290 |conserved hypothetical protein

                                                                                              
HI1033 |phosphoserine phosphatase (o-phosphoserine phosphohydrolase)

                                                                                              
HI0597 |hypothetical protein

                                                                                              
Rv3813c|(MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2
kdprotein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap
                                                                                               
Rv3042c|(MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g.
P06862|SERB_ECOLI.N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE
(EC3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 
MG265  |hypothetical protein

                                                                                              
(10 rows)


real        0.24
user        0.02
sys         0.03

QUERY: select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name
=p.name 
target_name|matched_name|   score|purpose

                    

-----------+------------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ECinfA     |BSInfA      |0.680556|initiation factor IF-1

                    
ECinfA     |HI0548      | 0.80952|initiation factor IF-1

                    
ECinfA     |HP1298      | 0.61111|translation initiation factor EF-1

                    
ECinfA     |Rv3462c     |0.684936|(MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results:
identicalto IF1_MYCBO P45957 initiation factor if-1 (72 aa)
                       
ECrpmA     |BB0780      |0.635297|ribosomal protein L27

                    
ECrpmA     |HI0879      | 0.87059|ribosomal protein L27

                    
ECrpmA     |HP0297      |0.613632|ribosomal protein L27

                    
ECrpmA     |Rv2441c     |0.616278|(MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomal
proteinl27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831
Ribosomalprotein L27 signature 
(8 rows)


real        0.17
user        0.02
sys         0.03

CREATE DATABASE
CREATE TABLE "proteins" (
    "name" character varying(16),
    "organism" text,
    "start_position" int4,
    "last_position" int4,
    "seq" text,
    "purpose" text,
    "alternate_key" character varying(16),
    "comment" text,
    "compared" bool,
    "complement" bool,
    "chromosome" character varying(4),
    "essentiality" float8);
CREATE
COPY "proteins" FROM stdin;
CREATE  INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
CREATE
CREATE  INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
CREATE

real     1:04.49
user        3.14
sys         0.57

CREATE TABLE "comparisons_4" (
    "name1" character varying(16),
    "name2" character varying(16),
    "z_score" float8,
    "expected" float8,
    "local_overlap_ratio" float8,
    "local_overlap_count" int4,
    "overlap_ratio" float8,
    "code" int4);
CREATE
COPY "comparisons_4" FROM stdin;
CREATE  INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
CREATE
CREATE  INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
CREATE
CREATE  INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
CREATE

real     7:04.43
user        5.87
sys         1.03

CREATE TABLE "concordance_91" (
    "target_name" character varying(16),
    "matched_name" character varying(16),
    "score" text);
CREATE
REVOKE ALL on "concordance_91" from PUBLIC;
CHANGE
GRANT ALL on "concordance_91" to PUBLIC;
CHANGE
COPY "concordance_91" FROM stdin;

real        0.60
user        0.01
sys         0.03

explain select * from comparisons_4 where name1 = 'HI0001';
NOTICE:  QUERY PLAN:

Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=5918 width=64)

EXPLAIN
explain select count(*) from comparisons_4 where code = 80003;
NOTICE:  QUERY PLAN:

Aggregate  (cost=15655.61..15655.61 rows=1 width=0)
  ->  Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=5918 width=0)

EXPLAIN
explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
NOTICE:  QUERY PLAN:

Merge Join  (cost=22495.22..23029.70 rows=2180283 width=36)
  ->  Sort  (cost=16011.62..16011.62 rows=5918 width=12)
        ->  Seq Scan on comparisons_4 c  (cost=0.00..15640.81 rows=5918 width=12)
  ->  Sort  (cost=6483.60..6483.60 rows=36840 width=24)
        ->  Seq Scan on proteins p  (cost=0.00..3247.40 rows=36840 width=24)

EXPLAIN
explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name
=p.name; 
NOTICE:  QUERY PLAN:

Merge Join  (cost=6553.43..7026.43 rows=368400 width=60)
  ->  Sort  (cost=69.83..69.83 rows=1000 width=36)
        ->  Seq Scan on concordance_91 c  (cost=0.00..20.00 rows=1000 width=36)
  ->  Sort  (cost=6483.60..6483.60 rows=36840 width=24)
        ->  Seq Scan on proteins p  (cost=0.00..3247.40 rows=36840 width=24)

EXPLAIN
select * from comparisons_4 where name1 = 'HI0001'
 name1  |      name2      | z_score | expected | local_overlap_ratio | local_overlap_count | overlap_ratio | code
--------+-----------------+---------+----------+---------------------+---------------------+---------------+-------
 HI0001 | PDB1GAD_O       |  1869.5 |        0 |             0.83434 |                 332 |      0.817112 | 30012
 HI0001 | PDB1GAE_O       |  1861.8 |        0 |             0.83133 |                 332 |      0.814164 | 30012
 HI0001 | PDB3GPD_R       |  1432.2 |        0 |             0.63772 |                 334 |      0.628314 | 30012
 HI0001 | SCSW-G3P1_YEAST |  1425.3 |        0 |             0.65868 |                 334 |      0.648965 | 30000
 HI0001 | SCGP-1008189    |  1424.6 |        0 |             0.65868 |                 334 |      0.648965 | 30000
 HI0001 | ScTDH1          |  1424.6 |        0 |             0.65868 |                 334 |      0.648965 | 30013
 HI0001 | SCSW-G3P3_YEAST |  1417.5 |        0 |             0.65868 |                 334 |      0.648965 | 30000
 HI0001 | ScTDH3          |  1416.8 |        0 |             0.65868 |                 334 |      0.648965 | 30013
 HI0001 | SCGP-3720       |  1416.8 |        0 |             0.66168 |                 334 |      0.651921 | 30000
 HI0001 | SCGP-E243731    |  1416.8 |        0 |             0.65868 |                 334 |      0.648965 | 30000
 HI0001 | SCSW-G3P2_YEAST |  1405.9 |        0 |              0.6497 |                 334 |      0.640117 | 30000
 HI0001 | ScTDH2          |  1405.2 |        0 |              0.6497 |                 334 |      0.640117 | 30013
 HI0001 | SCPIR-DEBYG1    |  1405.2 |        0 |              0.6497 |                 334 |      0.640117 | 30000
 HI0001 | SCGP-3726       |  1398.7 |        0 |              0.6497 |                 334 |      0.640117 | 30000
 HI0001 | SCGP-3724       |  1371.5 |        0 |             0.63772 |                 334 |      0.628314 | 30000
 HI0001 | PDB4GPD_1       |  1357.8 |        0 |             0.64865 |                 333 |      0.637169 | 30012
 HI0001 | PDB2GD1_P       |  1226.4 |        0 |             0.57015 |                 335 |      0.563423 | 30012
 HI0001 | PDB2DBV_O       |  1217.4 |        0 |             0.56716 |                 335 |      0.560468 | 30012
 HI0001 | PDB4DBV_O       |    1207 |        0 |             0.56418 |                 335 |      0.557523 | 30012
 HI0001 | PDB1HDG_O       |  1020.4 |        0 |             0.48024 |                 329 |      0.466074 | 30012
 HI0001 | Rv1436          |   970.4 |        0 |             0.49558 |                 339 |       0.49558 | 30010
 HI0001 | PDB1CER_O       |   949.7 |        0 |             0.47734 |                 331 |      0.466075 | 30012
 HI0001 | PDB1NLH_        |   935.1 |        0 |             0.46847 |                 333 |      0.458825 | 30012
 HI0001 | PDB1GGA_A       |     918 |        0 |             0.52125 |                 353 |       0.51397 | 30012
 HI0001 | PDB1GYP_A       |   900.1 |        0 |             0.51275 |                 353 |      0.505589 | 30012
 HI0001 | MG301           |   866.7 |        0 |             0.43155 |                 336 |      0.427731 | 30004
 HI0001 | HP1346          |   850.3 |  6.9e-41 |             0.39222 |                 334 |      0.386435 | 30005
 HI0001 | TP0844          |   780.3 |  5.8e-37 |             0.46307 |                 352 |      0.465716 | 30014
 HI0001 | HP0921          |   762.6 |  5.6e-36 |             0.40407 |                 344 |       0.41003 | 30005
 HI0001 | MJ1146          |   124.7 |      1.9 |             0.25094 |                 267 |      0.195338 | 30007
(30 rows)


real       22.68
user        0.01
sys         0.03

select count(*) from comparisons_4 where code = 80003
 count
-------
  3231
(1 row)


real       21.49
user        0.01
sys         0.03

select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name
  name   |
                                                  purpose
                                                                                                  

---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HI0597  | hypothetical protein
 HI1033  | phosphoserine phosphatase (o-phosphoserine phosphohydrolase)
 HP0652  | phosphoserine phosphatase
 MG125   | hypothetical protein
 MG263   | hypothetical protein
 MG265   | hypothetical protein
 MJ1594  | phosphoserine phosphatase
 Rv3042c | (MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g.
P06862|SERB_ECOLI.N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE
(EC3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 
 Rv3813c | (MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical
30.2kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa
overlap
 TP0290  | conserved hypothetical protein
(10 rows)


real       23.13
user        0.01
sys         0.03

select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name =
p.name
 target_name | matched_name |  score   |
                              purpose
                            

-------------+--------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 ECrpmA      | BB0780       | 0.635297 | ribosomal protein L27
 ECinfA      | BSInfA       | 0.680556 | initiation factor IF-1
 ECinfA      | HI0548       | 0.80952  | initiation factor IF-1
 ECrpmA      | HI0879       | 0.87059  | ribosomal protein L27
 ECrpmA      | HP0297       | 0.613632 | ribosomal protein L27
 ECinfA      | HP1298       | 0.61111  | translation initiation factor EF-1
 ECrpmA      | Rv2441c      | 0.616278 | (MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s
ribosomalprotein l27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains
PS00831Ribosomal protein L27 signature 
 ECinfA      | Rv3462c      | 0.684936 | (MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA
results:identical to IF1_MYCBO P45957 initiation factor if-1 (72 aa) 
(8 rows)


real       11.16
user        0.01
sys         0.03


Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

From
Tom Lane
Date:
bruc@stone.congenomics.com (Robert E. Bruccoleri) writes:
>     I have followed the discussion in pgsql-hackers over the previous
> months and others have noted some performance problems, and the response
> has typically been to VACUUM the tables. Unfortunately, this is not a
> practical option for my applications. They are very large -- I have one
> table that is 17GB in length, and the applications are used frequently.

You can't afford to run a VACUUM ANALYZE even once in the lifetime of
the table?

> More importantly, PostgreSQL 6.5.3 works very, very well without
> VACUUM'ing.

6.5 effectively assumes that "foo = constant" will select exactly one
row, if it has no statistics to prove otherwise.  I don't regard that
as a well-chosen default, even if it does happen to work OK for your
application.  Selecting an indexscan when a seqscan is needed is just
as evil as doing the reverse; what's much worse is that 6.5 will
pick incredibly bad join plans (ie, nested loops) because it thinks
that very little data is coming out of the scans.

If you want to revert to the 6.5 behavior without doing a VACUUM, you
could probably get pretty close withupdate pg_attribute set attdispersion = -1.0;

Stats-gathering and planning certainly does need a great deal of
additional work, but I'm afraid that none of that will happen before
7.1.
        regards, tom lane


Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

From
mlw
Date:
"Robert E. Bruccoleri" wrote:

You can try starting postmaster with the "-o -fs" option. This will disable sequential scans if there is an index.
Thereis also an environment variable you can set, prior to the operation. I have run into this same problem.
 


> Dear Tom,
>         I am writing to you because you are the maintainer of the
> query optimizer and planner.
>         I have found a very significant performance degradation
> between PostgreSQL 6.5.3 and 7.1beta3, which will severely impact two
> large applications that I have developed and maintain for several
> clients.  The performance difference is seen with the use of indices in
> SELECT statements, whereby the current release does not make effective
> use of the indices and 6.5.3 does. All of these tests were run on a SGI
> R10000 Indigo2 system running Irix 6.5. All the regression tests passed
> as expected for both versions.
>         I have followed the discussion in pgsql-hackers over the previous
> months and others have noted some performance problems, and the response
> has typically been to VACUUM the tables. Unfortunately, this is not a
> practical option for my applications. They are very large -- I have one
> table that is 17GB in length, and the applications are used frequently.
> More importantly, PostgreSQL 6.5.3 works very, very well without
> VACUUM'ing.
>         In order to assist you to diagnosing and correcting this
> problem, I have prepared a test database that shows the problems. I
> will attach three files; the test script, the log from running it on
> version 6.5.3, and the log from running it on version 7.1beta3. In
> addition, I have setup an anonymous FTP directory on
> ftp.congen.com:/pub/pg_perf which contains all of these files as well
> as the compressed table dumps used to build the test database. (When
> you have finished copying the files, please let me know.)
>         The test script creates the database including the necessary
> indexing, and then runs EXPLAIN on each of the queries followed by
> actually executing the queries with "timex" commands to report elapsed
> times.  The striking difference in the query plans is that 7.1 uses
> only sequential searches for the SELECT's whereas 6.5.3 uses index
> scans. As a result, 7.1 is almost two orders of magnitude slower than
> 6.5.3 with exactly the same data, schema, and queries.
>
>         I plead with you to revisit this question of performance and
> fix PostgreSQL 7.1 to work as well as PostgreSQL 6.5.3 does. I depend
> upon PostgreSQL for much of my work, and I do not want to abandon it
> because of this performance problem which arose only recently. Thank
> you.
>
> +----------------------------------+------------------------------------+
> | Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
> | President, Congenomics, Inc.     | Fax:   609 737 7528                |
> | 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
> | P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
> | Pennington, NJ 08534             |                                    |
> +----------------------------------+------------------------------------+
>
>   ------------------------------------------------------------------------
> #!/bin/csh
>
> createdb perf_test
>
> gunzip <proteins.dmp.gz | timex psql -e perf_test
> gunzip <comparisons_4.dmp.gz | timex psql -e perf_test
> gunzip <concordance_91.dmp.gz | timex psql -e perf_test
>
> psql -e perf_test <<EOF
> explain select * from comparisons_4 where name1 = 'HI0001';
> explain select count(*) from comparisons_4 where code = 80003;
> explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
> explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where
c.matched_name= p.name;
 
> EOF
>
> timex psql -e -c "select * from comparisons_4 where name1 = 'HI0001'" perf_test
> timex psql -e -c "select count(*) from comparisons_4 where code = 80003" perf_test
> timex psql -e -c "select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 =
p.name"perf_test
 
> timex psql -e -c "select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where
c.matched_name= p.name" perf_test
 
>
>   ------------------------------------------------------------------------
> CREATE TABLE "proteins" (
>         "name" character varying(16),
>         "organism" text,
>         "start_position" int4,
>         "last_position" int4,
>         "seq" text,
>         "purpose" text,
>         "alternate_key" character varying(16),
>         "comment" text,
>         "compared" bool,
>         "complement" bool,
>         "chromosome" character varying(4),
>         "essentiality" float8);
> QUERY: CREATE TABLE "proteins" (
>         "name" character varying(16),
>         "organism" text,
>         "start_position" int4,
>         "last_position" int4,
>         "seq" text,
>         "purpose" text,
>         "alternate_key" character varying(16),
>         "comment" text,
>         "compared" bool,
>         "complement" bool,
>         "chromosome" character varying(4),
>         "essentiality" float8);
> COPY "proteins" FROM stdin;
> QUERY: COPY "proteins" FROM stdin;
> CREATE  INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
> QUERY: CREATE  INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
> CREATE  INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
> QUERY: CREATE  INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
> CREATE
> CREATE
> CREATE
> EOF
>
> real     1:11.42
> user        3.15
> sys         0.53
>
> CREATE TABLE "comparisons_4" (
>         "name1" character varying(16),
>         "name2" character varying(16),
>         "z_score" float8,
>         "expected" float8,
>         "local_overlap_ratio" float8,
>         "local_overlap_count" int4,
>         "overlap_ratio" float8,
>         "code" int4);
> QUERY: CREATE TABLE "comparisons_4" (
>         "name1" character varying(16),
>         "name2" character varying(16),
>         "z_score" float8,
>         "expected" float8,
>         "local_overlap_ratio" float8,
>         "local_overlap_count" int4,
>         "overlap_ratio" float8,
>         "code" int4);
> COPY "comparisons_4" FROM stdin;
> QUERY: COPY "comparisons_4" FROM stdin;
> CREATE  INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
> QUERY: CREATE  INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
> CREATE  INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
> QUERY: CREATE  INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
> CREATE  INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
> QUERY: CREATE  INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
> CREATE
> CREATE
> CREATE
> CREATE
> EOF
>
> real    16:42.13
> user        5.86
> sys         0.96
>
> CREATE TABLE "concordance_91" (
>         "target_name" character varying(16),
>         "matched_name" character varying(16),
>         "score" text);
> QUERY: CREATE TABLE "concordance_91" (
>         "target_name" character varying(16),
>         "matched_name" character varying(16),
>         "score" text);
> REVOKE ALL on "concordance_91" from PUBLIC;
> QUERY: REVOKE ALL on "concordance_91" from PUBLIC;
> GRANT ALL on "concordance_91" to PUBLIC;
> QUERY: GRANT ALL on "concordance_91" to PUBLIC;
> COPY "concordance_91" FROM stdin;
> QUERY: COPY "concordance_91" FROM stdin;
> CREATE
> CHANGE
> CHANGE
> EOF
>
> real        0.30
> user        0.02
> sys         0.04
>
> explain select * from comparisons_4 where name1 = 'HI0001';
> QUERY: explain select * from comparisons_4 where name1 = 'HI0001';
> NOTICE:  QUERY PLAN:
>
> Index Scan using comparisons_4_name1 on comparisons_4  (cost=2.05 rows=1 width=64)
>
> explain select count(*) from comparisons_4 where code = 80003;
> QUERY: explain select count(*) from comparisons_4 where code = 80003;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=2.05 rows=1 width=12)
>   ->  Index Scan using comparisons_4_code on comparisons_4  (cost=2.05 rows=1 width=12)
>
> explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
> QUERY: explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 =
p.name;
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=4.10 rows=1 width=36)
>   ->  Index Scan using comparisons_4_name1 on comparisons_4 c  (cost=2.05 rows=1 width=12)
>   ->  Index Scan using protein_names on proteins p  (cost=2.05 rows=36840 width=24)
>
> explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where
c.matched_name= p.name;
 
> QUERY: explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where
c.matched_name= p.name;
 
> NOTICE:  QUERY PLAN:
>
> Nested Loop  (cost=2093.00 rows=36840 width=60)
>   ->  Seq Scan on concordance_91 c  (cost=43.00 rows=1000 width=36)
>   ->  Index Scan using protein_names on proteins p  (cost=2.05 rows=36840 width=24)
>
> EXPLAIN
> EXPLAIN
> EXPLAIN
> EXPLAIN
> EOF
> QUERY: select * from comparisons_4 where name1 = 'HI0001'
> name1 |name2          |z_score|expected|local_overlap_ratio|local_overlap_count|overlap_ratio| code
> ------+---------------+-------+--------+-------------------+-------------------+-------------+-----
> HI0001|PDB2DBV_O      | 1217.4|       0|            0.56716|                335|     0.560468|30012
> HI0001|PDB4DBV_O      |   1207|       0|            0.56418|                335|     0.557523|30012
> HI0001|PDB2GD1_P      | 1226.4|       0|            0.57015|                335|     0.563423|30012
> HI0001|PDB1GAE_O      | 1861.8|       0|            0.83133|                332|     0.814164|30012
> HI0001|PDB4GPD_1      | 1357.8|       0|            0.64865|                333|     0.637169|30012
> HI0001|HP1346         |  850.3| 6.9e-41|            0.39222|                334|     0.386435|30005
> HI0001|TP0844         |  780.3| 5.8e-37|            0.46307|                352|     0.465716|30014
> HI0001|PDB1HDG_O      | 1020.4|       0|            0.48024|                329|     0.466074|30012
> HI0001|SCPIR-DEBYG1   | 1405.2|       0|             0.6497|                334|     0.640117|30000
> HI0001|Rv1436         |  970.4|       0|            0.49558|                339|      0.49558|30010
> HI0001|PDB1CER_O      |  949.7|       0|            0.47734|                331|     0.466075|30012
> HI0001|PDB1NLH_       |  935.1|       0|            0.46847|                333|     0.458825|30012
> HI0001|PDB1GGA_A      |    918|       0|            0.52125|                353|      0.51397|30012
> HI0001|PDB1GAD_O      | 1869.5|       0|            0.83434|                332|     0.817112|30012
> HI0001|PDB1GYP_A      |  900.1|       0|            0.51275|                353|     0.505589|30012
> HI0001|MG301          |  866.7|       0|            0.43155|                336|     0.427731|30004
> HI0001|SCSW-G3P1_YEAST| 1425.3|       0|            0.65868|                334|     0.648965|30000
> HI0001|ScTDH1         | 1424.6|       0|            0.65868|                334|     0.648965|30013
> HI0001|ScTDH2         | 1405.2|       0|             0.6497|                334|     0.640117|30013
> HI0001|SCSW-G3P3_YEAST| 1417.5|       0|            0.65868|                334|     0.648965|30000
> HI0001|ScTDH3         | 1416.8|       0|            0.65868|                334|     0.648965|30013
> HI0001|SCGP-3720      | 1416.8|       0|            0.66168|                334|     0.651921|30000
> HI0001|SCGP-E243731   | 1416.8|       0|            0.65868|                334|     0.648965|30000
> HI0001|SCSW-G3P2_YEAST| 1405.9|       0|             0.6497|                334|     0.640117|30000
> HI0001|SCGP-1008189   | 1424.6|       0|            0.65868|                334|     0.648965|30000
> HI0001|SCGP-3726      | 1398.7|       0|             0.6497|                334|     0.640117|30000
> HI0001|PDB3GPD_R      | 1432.2|       0|            0.63772|                334|     0.628314|30012
> HI0001|HP0921         |  762.6| 5.6e-36|            0.40407|                344|      0.41003|30005
> HI0001|MJ1146         |  124.7|     1.9|            0.25094|                267|     0.195338|30007
> HI0001|SCGP-3724      | 1371.5|       0|            0.63772|                334|     0.628314|30000
> (30 rows)
>
> real        0.18
> user        0.02
> sys         0.03
>
> QUERY: select count(*) from comparisons_4 where code = 80003
> count
> -----
>  3231
> (1 row)
>
> real        0.34
> user        0.02
> sys         0.03
>
> QUERY: select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name
> name   |purpose
>
-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> MG263  |hypothetical protein
> HP0652 |phosphoserine phosphatase
> MJ1594 |phosphoserine phosphatase
> MG125  |hypothetical protein
> TP0290 |conserved hypothetical protein
> HI1033 |phosphoserine phosphatase (o-phosphoserine phosphohydrolase)
> HI0597 |hypothetical protein
> Rv3813c|(MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical
30.2kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa
overlap
> Rv3042c|(MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g.
P06862|SERB_ECOLI.N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE
(EC3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884
 
> MG265  |hypothetical protein
> (10 rows)
>
> real        0.24
> user        0.02
> sys         0.03
>
> QUERY: select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where
c.matched_name= p.name
 
> target_name|matched_name|   score|purpose
>
-----------+------------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> ECinfA     |BSInfA      |0.680556|initiation factor IF-1
> ECinfA     |HI0548      | 0.80952|initiation factor IF-1
> ECinfA     |HP1298      | 0.61111|translation initiation factor EF-1
> ECinfA     |Rv3462c     |0.684936|(MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results:
identicalto IF1_MYCBO P45957 initiation factor if-1 (72 aa)
 
> ECrpmA     |BB0780      |0.635297|ribosomal protein L27
> ECrpmA     |HI0879      | 0.87059|ribosomal protein L27
> ECrpmA     |HP0297      |0.613632|ribosomal protein L27
> ECrpmA     |Rv2441c     |0.616278|(MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomal
proteinl27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831
Ribosomalprotein L27 signature
 
> (8 rows)
>
> real        0.17
> user        0.02
> sys         0.03
>
>   ------------------------------------------------------------------------
> CREATE DATABASE
> CREATE TABLE "proteins" (
>         "name" character varying(16),
>         "organism" text,
>         "start_position" int4,
>         "last_position" int4,
>         "seq" text,
>         "purpose" text,
>         "alternate_key" character varying(16),
>         "comment" text,
>         "compared" bool,
>         "complement" bool,
>         "chromosome" character varying(4),
>         "essentiality" float8);
> CREATE
> COPY "proteins" FROM stdin;
> CREATE  INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" );
> CREATE
> CREATE  INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" );
> CREATE
>
> real     1:04.49
> user        3.14
> sys         0.57
>
> CREATE TABLE "comparisons_4" (
>         "name1" character varying(16),
>         "name2" character varying(16),
>         "z_score" float8,
>         "expected" float8,
>         "local_overlap_ratio" float8,
>         "local_overlap_count" int4,
>         "overlap_ratio" float8,
>         "code" int4);
> CREATE
> COPY "comparisons_4" FROM stdin;
> CREATE  INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" );
> CREATE
> CREATE  INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" );
> CREATE
> CREATE  INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" );
> CREATE
>
> real     7:04.43
> user        5.87
> sys         1.03
>
> CREATE TABLE "concordance_91" (
>         "target_name" character varying(16),
>         "matched_name" character varying(16),
>         "score" text);
> CREATE
> REVOKE ALL on "concordance_91" from PUBLIC;
> CHANGE
> GRANT ALL on "concordance_91" to PUBLIC;
> CHANGE
> COPY "concordance_91" FROM stdin;
>
> real        0.60
> user        0.01
> sys         0.03
>
> explain select * from comparisons_4 where name1 = 'HI0001';
> NOTICE:  QUERY PLAN:
>
> Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=5918 width=64)
>
> EXPLAIN
> explain select count(*) from comparisons_4 where code = 80003;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=15655.61..15655.61 rows=1 width=0)
>   ->  Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=5918 width=0)
>
> EXPLAIN
> explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name;
> NOTICE:  QUERY PLAN:
>
> Merge Join  (cost=22495.22..23029.70 rows=2180283 width=36)
>   ->  Sort  (cost=16011.62..16011.62 rows=5918 width=12)
>         ->  Seq Scan on comparisons_4 c  (cost=0.00..15640.81 rows=5918 width=12)
>   ->  Sort  (cost=6483.60..6483.60 rows=36840 width=24)
>         ->  Seq Scan on proteins p  (cost=0.00..3247.40 rows=36840 width=24)
>
> EXPLAIN
> explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where
c.matched_name= p.name;
 
> NOTICE:  QUERY PLAN:
>
> Merge Join  (cost=6553.43..7026.43 rows=368400 width=60)
>   ->  Sort  (cost=69.83..69.83 rows=1000 width=36)
>         ->  Seq Scan on concordance_91 c  (cost=0.00..20.00 rows=1000 width=36)
>   ->  Sort  (cost=6483.60..6483.60 rows=36840 width=24)
>         ->  Seq Scan on proteins p  (cost=0.00..3247.40 rows=36840 width=24)
>
> EXPLAIN
> select * from comparisons_4 where name1 = 'HI0001'
>  name1  |      name2      | z_score | expected | local_overlap_ratio | local_overlap_count | overlap_ratio | code
> --------+-----------------+---------+----------+---------------------+---------------------+---------------+-------
>  HI0001 | PDB1GAD_O       |  1869.5 |        0 |             0.83434 |                 332 |      0.817112 | 30012
>  HI0001 | PDB1GAE_O       |  1861.8 |        0 |             0.83133 |                 332 |      0.814164 | 30012
>  HI0001 | PDB3GPD_R       |  1432.2 |        0 |             0.63772 |                 334 |      0.628314 | 30012
>  HI0001 | SCSW-G3P1_YEAST |  1425.3 |        0 |             0.65868 |                 334 |      0.648965 | 30000
>  HI0001 | SCGP-1008189    |  1424.6 |        0 |             0.65868 |                 334 |      0.648965 | 30000
>  HI0001 | ScTDH1          |  1424.6 |        0 |             0.65868 |                 334 |      0.648965 | 30013
>  HI0001 | SCSW-G3P3_YEAST |  1417.5 |        0 |             0.65868 |                 334 |      0.648965 | 30000
>  HI0001 | ScTDH3          |  1416.8 |        0 |             0.65868 |                 334 |      0.648965 | 30013
>  HI0001 | SCGP-3720       |  1416.8 |        0 |             0.66168 |                 334 |      0.651921 | 30000
>  HI0001 | SCGP-E243731    |  1416.8 |        0 |             0.65868 |                 334 |      0.648965 | 30000
>  HI0001 | SCSW-G3P2_YEAST |  1405.9 |        0 |              0.6497 |                 334 |      0.640117 | 30000
>  HI0001 | ScTDH2          |  1405.2 |        0 |              0.6497 |                 334 |      0.640117 | 30013
>  HI0001 | SCPIR-DEBYG1    |  1405.2 |        0 |              0.6497 |                 334 |      0.640117 | 30000
>  HI0001 | SCGP-3726       |  1398.7 |        0 |              0.6497 |                 334 |      0.640117 | 30000
>  HI0001 | SCGP-3724       |  1371.5 |        0 |             0.63772 |                 334 |      0.628314 | 30000
>  HI0001 | PDB4GPD_1       |  1357.8 |        0 |             0.64865 |                 333 |      0.637169 | 30012
>  HI0001 | PDB2GD1_P       |  1226.4 |        0 |             0.57015 |                 335 |      0.563423 | 30012
>  HI0001 | PDB2DBV_O       |  1217.4 |        0 |             0.56716 |                 335 |      0.560468 | 30012
>  HI0001 | PDB4DBV_O       |    1207 |        0 |             0.56418 |                 335 |      0.557523 | 30012
>  HI0001 | PDB1HDG_O       |  1020.4 |        0 |             0.48024 |                 329 |      0.466074 | 30012
>  HI0001 | Rv1436          |   970.4 |        0 |             0.49558 |                 339 |       0.49558 | 30010
>  HI0001 | PDB1CER_O       |   949.7 |        0 |             0.47734 |                 331 |      0.466075 | 30012
>  HI0001 | PDB1NLH_        |   935.1 |        0 |             0.46847 |                 333 |      0.458825 | 30012
>  HI0001 | PDB1GGA_A       |     918 |        0 |             0.52125 |                 353 |       0.51397 | 30012
>  HI0001 | PDB1GYP_A       |   900.1 |        0 |             0.51275 |                 353 |      0.505589 | 30012
>  HI0001 | MG301           |   866.7 |        0 |             0.43155 |                 336 |      0.427731 | 30004
>  HI0001 | HP1346          |   850.3 |  6.9e-41 |             0.39222 |                 334 |      0.386435 | 30005
>  HI0001 | TP0844          |   780.3 |  5.8e-37 |             0.46307 |                 352 |      0.465716 | 30014
>  HI0001 | HP0921          |   762.6 |  5.6e-36 |             0.40407 |                 344 |       0.41003 | 30005
>  HI0001 | MJ1146          |   124.7 |      1.9 |             0.25094 |                 267 |      0.195338 | 30007
> (30 rows)
>
> real       22.68
> user        0.01
> sys         0.03
>
> select count(*) from comparisons_4 where code = 80003
>  count
> -------
>   3231
> (1 row)
>
> real       21.49
> user        0.01
> sys         0.03
>
> select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name
>   name   |
                                                    purpose
 
>
---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  HI0597  | hypothetical protein
>  HI1033  | phosphoserine phosphatase (o-phosphoserine phosphohydrolase)
>  HP0652  | phosphoserine phosphatase
>  MG125   | hypothetical protein
>  MG263   | hypothetical protein
>  MG265   | hypothetical protein
>  MJ1594  | phosphoserine phosphatase
>  Rv3042c | (MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g.
P06862|SERB_ECOLI.N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE
(EC3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 
>  Rv3813c | (MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical
30.2kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa
overlap
>  TP0290  | conserved hypothetical protein
> (10 rows)
>
> real       23.13
> user        0.01
> sys         0.03
>
> select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name =
p.name
>  target_name | matched_name |  score   |
                                purpose
 
>
-------------+--------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  ECrpmA      | BB0780       | 0.635297 | ribosomal protein L27
>  ECinfA      | BSInfA       | 0.680556 | initiation factor IF-1
>  ECinfA      | HI0548       | 0.80952  | initiation factor IF-1
>  ECrpmA      | HI0879       | 0.87059  | ribosomal protein L27
>  ECrpmA      | HP0297       | 0.613632 | ribosomal protein L27
>  ECinfA      | HP1298       | 0.61111  | translation initiation factor EF-1
>  ECrpmA      | Rv2441c      | 0.616278 | (MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s
ribosomalprotein l27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains
PS00831Ribosomal protein L27 signature
 
>  ECinfA      | Rv3462c      | 0.684936 | (MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA
results:identical to IF1_MYCBO P45957 initiation factor if-1 (72 aa)
 
> (8 rows)
>
> real       11.16
> user        0.01
> sys         0.03



Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

From
bruc@stone.congenomics.com (Robert E. Bruccoleri)
Date:
Dear Tom,

> You can't afford to run a VACUUM ANALYZE even once in the lifetime of
> the table?

Not very often at best, and certainly not routinely. Some of my
tables exceed 10GB and have multiple indices.

However, to test your suggestion, I modified my performance test
script to "VACUUM ANALYZE" all the tables prior to invoking EXPLAIN,
and it improves all of the searches except this one (EXPLAIN output
also included):

explain select count(*) from comparisons_4 where code = 80003;
NOTICE:  QUERY PLAN:

Aggregate  (cost=15659.29..15659.29 rows=1 width=0) ->  Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=7391
width=0)

EXPLAIN

The choice of sequential scan here takes 30x longer.

> 
> > More importantly, PostgreSQL 6.5.3 works very, very well without
> > VACUUM'ing.
> 
> 6.5 effectively assumes that "foo = constant" will select exactly one
> row, if it has no statistics to prove otherwise.  I don't regard that
> as a well-chosen default, even if it does happen to work OK for your
> application.  Selecting an indexscan when a seqscan is needed is just
> as evil as doing the reverse; what's much worse is that 6.5 will
> pick incredibly bad join plans (ie, nested loops) because it thinks
> that very little data is coming out of the scans.

I've tuned my applications to work well with these defaults (and
demonstrate to my peers that PostgreSQL is comparable to Oracle in
performance for these types of queries). I am willing to make changes
to my applications to make them work as well with 7.1, but the
performance of the query above worries me. I think the current planner
will make the wrong decision more often than the right one.  To test
this further on this table, I went through the comparisons_4 table and
found that code 13 appears the most (73912) out of 591825 rows. In
this case, 6.5.3 takes 8.56 seconds to return its answer, whereas 7.1
takes 12.11 seconds. Even in the worst case for this table, the
indexed scan is faster, but the optimizer decides on the sequential
scan. It appears that the decision point for the switch to sequential
scans isn't set properly.  To me, this is a bug. 

> If you want to revert to the 6.5 behavior without doing a VACUUM, you
> could probably get pretty close with
>     update pg_attribute set attdispersion = -1.0;

Does VACUUM ANALYZE set this column to its calculated value?  What
kinds of queries would not give 6.5 behavior if I set this column as
you suggest?

Alternatively, how hard would it be to add another SET variable like
USE_6_5_PLANNING_RULES? Personally, that would be most helpful from an
application development viewpoint because I could switch to PostgreSQL
7.1 without destroying the performance of my applications, and then
test new versions with the 7.1 planner with less potential for service
disruption.

> Stats-gathering and planning certainly does need a great deal of
> additional work, but I'm afraid that none of that will happen before
> 7.1.

As I said above, I've put a lot of effort into making my applications
work quickly with Postgres, and I'm looking forward to using the new
features that are available with version 7.1. However, I'm very
concerned that I will not be able to achieve the same performance
without detailed knowledge of the internals.  Shouldn't I be assured
that I will improve the performance of a query by creating a index on
the fields used for selecting the row? That is not the case for the
query above.

Finally, I apologize for being a little strident here. I've been
advocating for and using Postgres for four years, and it's frustrating
when a new version results in a serious and noticeable performance
degradation.

Sincerely,
Bob

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+


Re: Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

From
Hannu Krosing
Date:
"Robert E. Bruccoleri" wrote:
> 
> explain select count(*) from comparisons_4 where code = 80003;
> NOTICE:  QUERY PLAN:
> 
> Aggregate  (cost=15659.29..15659.29 rows=1 width=0)
>   ->  Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=7391 width=0)
> 
> EXPLAIN

What is the type of field "code" ?

---------------
Hannu


Re: Re: Performance degradation in PostgreSQL 7.1beta3 vs

From
bruc@stone.congenomics.com (Robert E. Bruccoleri)
Date:
Dear Hannu,
> 
> "Robert E. Bruccoleri" wrote:
> > 
> > explain select count(*) from comparisons_4 where code = 80003;
> > NOTICE:  QUERY PLAN:
> > 
> > Aggregate  (cost=15659.29..15659.29 rows=1 width=0)
> >   ->  Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=7391 width=0)
> > 
> > EXPLAIN
> 
> What is the type of field "code" ?

int4

Do you think that should make a difference?

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+


Re: Re: Performance degradation in PostgreSQL 7.1beta3 vs

From
Hannu Krosing
Date:
"Robert E. Bruccoleri" wrote:
> 
> Dear Hannu,
> >
> > "Robert E. Bruccoleri" wrote:
> > >
> > > explain select count(*) from comparisons_4 where code = 80003;
> > > NOTICE:  QUERY PLAN:
> > >
> > > Aggregate  (cost=15659.29..15659.29 rows=1 width=0)
> > >   ->  Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=7391 width=0)
> > >
> > > EXPLAIN
> >
> > What is the type of field "code" ?
> 
> int4
> 
> Do you think that should make a difference?

Probably not here.

Sometimes it has made difference if the system does not recognize 
the other side of comparison (80003) as being of the same type as 
the index.

what are the cost estimates when you run explain with seqscan disabled ?
do => SET ENABLE_SEQSCAN TO OFF;
see:
(http://www.postgresql.org/devel-corner/docs/admin/runtime-config.htm#RUNTIME-CONFIG-OPTIMIZER)
-----------------
Hannu


Re: Re: Performance degradation in PostgreSQL 7.1beta3 vs

From
bruc@stone.congenomics.com (Robert E. Bruccoleri)
Date:
Dear Hannu,
> 
> "Robert E. Bruccoleri" wrote:
> > 
> > Dear Hannu,
> > >
> > > "Robert E. Bruccoleri" wrote:
> > > >
> > > > explain select count(*) from comparisons_4 where code = 80003;
> > > > NOTICE:  QUERY PLAN:
> > > >
> > > > Aggregate  (cost=15659.29..15659.29 rows=1 width=0)
> > > >   ->  Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=7391 width=0)
> > > >
> > > > EXPLAIN
> > >
> > > What is the type of field "code" ?
> > 
> > int4
> > 
> > Do you think that should make a difference?
> 
> Probably not here.
> 
> Sometimes it has made difference if the system does not recognize 
> the other side of comparison (80003) as being of the same type as 
> the index.
> 
> what are the cost estimates when you run explain with seqscan disabled ?
> do => SET ENABLE_SEQSCAN TO OFF;
> see:
> (http://www.postgresql.org/devel-corner/docs/admin/runtime-config.htm#RUNTIME-CONFIG-OPTIMIZER)

Here's the result from EXPLAIN:

Aggregate  (cost=19966.21..19966.21 rows=1 width=0) ->  Index Scan using comparisons_4_code on comparisons_4
(cost=0.00..19947.73rows=7391 width=0)
 

The estimates are too high.

--Bob

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+


Re: Re: Performance degradation in PostgreSQL 7.1beta3 vs

From
Hannu Krosing
Date:
"Robert E. Bruccoleri" wrote:
> 
> >
> > what are the cost estimates when you run explain with seqscan disabled ?
> > do => SET ENABLE_SEQSCAN TO OFF;
> > see:
> > (http://www.postgresql.org/devel-corner/docs/admin/runtime-config.htm#RUNTIME-CONFIG-OPTIMIZER)
> 
> Here's the result from EXPLAIN:
> 
> Aggregate  (cost=19966.21..19966.21 rows=1 width=0)
>   ->  Index Scan using comparisons_4_code on comparisons_4  (cost=0.00..19947.73 rows=7391 width=0)
> 
> The estimates are too high.

You could try experimenting with 

SET RANDOM_PAGE_COST TO x.x;

from the page above

RANDOM_PAGE_COST (floating point)
      Sets the query optimizer's estimate of the cost of a
nonsequentially fetched disk page.       this is measured as a multiple of the cost of a sequential page
fetch. 
      Note: Unfortunately, there is no well-defined method of
determining ideal values for       the family of "COST" variables that were just described. You are
encouraged to      experiment and share your findings.


-------------
Hannu