Thread: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3
Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3
From
bruc@stone.congenomics.com (Robert E. Bruccoleri)
Date:
Dear Tom, I am writing to you because you are the maintainer of the query optimizer and planner. I have found a very significant performance degradation between PostgreSQL 6.5.3 and 7.1beta3, which will severely impact two large applications that I have developed and maintain for several clients. The performance difference is seen with the use of indices in SELECT statements, whereby the current release does not make effective use of the indices and 6.5.3 does. All of these tests were run on a SGI R10000 Indigo2 system running Irix 6.5. All the regression tests passed as expected for both versions. I have followed the discussion in pgsql-hackers over the previous months and others have noted some performance problems, and the response has typically been to VACUUM the tables. Unfortunately, this is not a practical option for my applications. They are very large -- I have one table that is 17GB in length, and the applications are used frequently. More importantly, PostgreSQL 6.5.3 works very, very well without VACUUM'ing. In order to assist you to diagnosing and correcting this problem, I have prepared a test database that shows the problems. I will attach three files; the test script, the log from running it on version 6.5.3, and the log from running it on version 7.1beta3. In addition, I have setup an anonymous FTP directory on ftp.congen.com:/pub/pg_perf which contains all of these files as well as the compressed table dumps used to build the test database. (When you have finished copying the files, please let me know.) The test script creates the database including the necessary indexing, and then runs EXPLAIN on each of the queries followed by actually executing the queries with "timex" commands to report elapsed times. The striking difference in the query plans is that 7.1 uses only sequential searches for the SELECT's whereas 6.5.3 uses index scans. As a result, 7.1 is almost two orders of magnitude slower than 6.5.3 with exactly the same data, schema, and queries. I plead with you to revisit this question of performance and fix PostgreSQL 7.1 to work as well as PostgreSQL 6.5.3 does. I depend upon PostgreSQL for much of my work, and I do not want to abandon it because of this performance problem which arose only recently. Thank you. +----------------------------------+------------------------------------+ | Robert E. Bruccoleri, Ph.D. | Phone: 609 737 6383 | | President, Congenomics, Inc. | Fax: 609 737 7528 | | 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org | | P.O. Box 314 | URL: http://www.congen.com/~bruc | | Pennington, NJ 08534 | | +----------------------------------+------------------------------------+ #!/bin/csh createdb perf_test gunzip <proteins.dmp.gz | timex psql -e perf_test gunzip <comparisons_4.dmp.gz | timex psql -e perf_test gunzip <concordance_91.dmp.gz | timex psql -e perf_test psql -e perf_test <<EOF explain select * from comparisons_4 where name1 = 'HI0001'; explain select count(*) from comparisons_4 where code = 80003; explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name =p.name; EOF timex psql -e -c "select * from comparisons_4 where name1 = 'HI0001'" perf_test timex psql -e -c "select count(*) from comparisons_4 where code = 80003" perf_test timex psql -e -c "select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name"perf_test timex psql -e -c "select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name= p.name" perf_test CREATE TABLE "proteins" ( "name" character varying(16), "organism" text, "start_position" int4, "last_position" int4, "seq" text, "purpose" text, "alternate_key" character varying(16), "comment" text, "compared" bool, "complement" bool, "chromosome" character varying(4), "essentiality" float8); QUERY: CREATE TABLE "proteins" ( "name" character varying(16), "organism" text, "start_position" int4, "last_position" int4, "seq" text, "purpose" text, "alternate_key" character varying(16), "comment" text, "compared" bool, "complement" bool, "chromosome" character varying(4), "essentiality" float8); COPY "proteins" FROM stdin; QUERY: COPY "proteins" FROM stdin; CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); QUERY: CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); QUERY: CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); CREATE CREATE CREATE EOF real 1:11.42 user 3.15 sys 0.53 CREATE TABLE "comparisons_4" ( "name1" character varying(16), "name2" character varying(16), "z_score" float8, "expected" float8, "local_overlap_ratio" float8, "local_overlap_count" int4, "overlap_ratio" float8, "code" int4); QUERY: CREATE TABLE "comparisons_4" ( "name1" character varying(16), "name2" character varying(16), "z_score" float8, "expected" float8, "local_overlap_ratio" float8, "local_overlap_count" int4, "overlap_ratio" float8, "code" int4); COPY "comparisons_4" FROM stdin; QUERY: COPY "comparisons_4" FROM stdin; CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); QUERY: CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); QUERY: CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); QUERY: CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); CREATE CREATE CREATE CREATE EOF real 16:42.13 user 5.86 sys 0.96 CREATE TABLE "concordance_91" ( "target_name" character varying(16), "matched_name" character varying(16), "score" text); QUERY: CREATE TABLE "concordance_91" ( "target_name" character varying(16), "matched_name" character varying(16), "score" text); REVOKE ALL on "concordance_91" from PUBLIC; QUERY: REVOKE ALL on "concordance_91" from PUBLIC; GRANT ALL on "concordance_91" to PUBLIC; QUERY: GRANT ALL on "concordance_91" to PUBLIC; COPY "concordance_91" FROM stdin; QUERY: COPY "concordance_91" FROM stdin; CREATE CHANGE CHANGE EOF real 0.30 user 0.02 sys 0.04 explain select * from comparisons_4 where name1 = 'HI0001'; QUERY: explain select * from comparisons_4 where name1 = 'HI0001'; NOTICE: QUERY PLAN: Index Scan using comparisons_4_name1 on comparisons_4 (cost=2.05 rows=1 width=64) explain select count(*) from comparisons_4 where code = 80003; QUERY: explain select count(*) from comparisons_4 where code = 80003; NOTICE: QUERY PLAN: Aggregate (cost=2.05 rows=1 width=12) -> Index Scan using comparisons_4_code on comparisons_4 (cost=2.05 rows=1 width=12) explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; QUERY: explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; NOTICE: QUERY PLAN: Nested Loop (cost=4.10 rows=1 width=36) -> Index Scan using comparisons_4_name1 on comparisons_4 c (cost=2.05 rows=1 width=12) -> Index Scan using protein_names on proteins p (cost=2.05 rows=36840 width=24) explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name =p.name; QUERY: explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name= p.name; NOTICE: QUERY PLAN: Nested Loop (cost=2093.00 rows=36840 width=60) -> Seq Scan on concordance_91 c (cost=43.00 rows=1000 width=36) -> Index Scan using protein_names on proteins p (cost=2.05 rows=36840 width=24) EXPLAIN EXPLAIN EXPLAIN EXPLAIN EOF QUERY: select * from comparisons_4 where name1 = 'HI0001' name1 |name2 |z_score|expected|local_overlap_ratio|local_overlap_count|overlap_ratio| code ------+---------------+-------+--------+-------------------+-------------------+-------------+----- HI0001|PDB2DBV_O | 1217.4| 0| 0.56716| 335| 0.560468|30012 HI0001|PDB4DBV_O | 1207| 0| 0.56418| 335| 0.557523|30012 HI0001|PDB2GD1_P | 1226.4| 0| 0.57015| 335| 0.563423|30012 HI0001|PDB1GAE_O | 1861.8| 0| 0.83133| 332| 0.814164|30012 HI0001|PDB4GPD_1 | 1357.8| 0| 0.64865| 333| 0.637169|30012 HI0001|HP1346 | 850.3| 6.9e-41| 0.39222| 334| 0.386435|30005 HI0001|TP0844 | 780.3| 5.8e-37| 0.46307| 352| 0.465716|30014 HI0001|PDB1HDG_O | 1020.4| 0| 0.48024| 329| 0.466074|30012 HI0001|SCPIR-DEBYG1 | 1405.2| 0| 0.6497| 334| 0.640117|30000 HI0001|Rv1436 | 970.4| 0| 0.49558| 339| 0.49558|30010 HI0001|PDB1CER_O | 949.7| 0| 0.47734| 331| 0.466075|30012 HI0001|PDB1NLH_ | 935.1| 0| 0.46847| 333| 0.458825|30012 HI0001|PDB1GGA_A | 918| 0| 0.52125| 353| 0.51397|30012 HI0001|PDB1GAD_O | 1869.5| 0| 0.83434| 332| 0.817112|30012 HI0001|PDB1GYP_A | 900.1| 0| 0.51275| 353| 0.505589|30012 HI0001|MG301 | 866.7| 0| 0.43155| 336| 0.427731|30004 HI0001|SCSW-G3P1_YEAST| 1425.3| 0| 0.65868| 334| 0.648965|30000 HI0001|ScTDH1 | 1424.6| 0| 0.65868| 334| 0.648965|30013 HI0001|ScTDH2 | 1405.2| 0| 0.6497| 334| 0.640117|30013 HI0001|SCSW-G3P3_YEAST| 1417.5| 0| 0.65868| 334| 0.648965|30000 HI0001|ScTDH3 | 1416.8| 0| 0.65868| 334| 0.648965|30013 HI0001|SCGP-3720 | 1416.8| 0| 0.66168| 334| 0.651921|30000 HI0001|SCGP-E243731 | 1416.8| 0| 0.65868| 334| 0.648965|30000 HI0001|SCSW-G3P2_YEAST| 1405.9| 0| 0.6497| 334| 0.640117|30000 HI0001|SCGP-1008189 | 1424.6| 0| 0.65868| 334| 0.648965|30000 HI0001|SCGP-3726 | 1398.7| 0| 0.6497| 334| 0.640117|30000 HI0001|PDB3GPD_R | 1432.2| 0| 0.63772| 334| 0.628314|30012 HI0001|HP0921 | 762.6| 5.6e-36| 0.40407| 344| 0.41003|30005 HI0001|MJ1146 | 124.7| 1.9| 0.25094| 267| 0.195338|30007 HI0001|SCGP-3724 | 1371.5| 0| 0.63772| 334| 0.628314|30000 (30 rows) real 0.18 user 0.02 sys 0.03 QUERY: select count(*) from comparisons_4 where code = 80003 count ----- 3231 (1 row) real 0.34 user 0.02 sys 0.03 QUERY: select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name name |purpose -------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- MG263 |hypothetical protein HP0652 |phosphoserine phosphatase MJ1594 |phosphoserine phosphatase MG125 |hypothetical protein TP0290 |conserved hypothetical protein HI1033 |phosphoserine phosphatase (o-phosphoserine phosphohydrolase) HI0597 |hypothetical protein Rv3813c|(MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2 kdprotein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap Rv3042c|(MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g. P06862|SERB_ECOLI.N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE (EC3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 MG265 |hypothetical protein (10 rows) real 0.24 user 0.02 sys 0.03 QUERY: select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name =p.name target_name|matched_name| score|purpose -----------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ECinfA |BSInfA |0.680556|initiation factor IF-1 ECinfA |HI0548 | 0.80952|initiation factor IF-1 ECinfA |HP1298 | 0.61111|translation initiation factor EF-1 ECinfA |Rv3462c |0.684936|(MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results: identicalto IF1_MYCBO P45957 initiation factor if-1 (72 aa) ECrpmA |BB0780 |0.635297|ribosomal protein L27 ECrpmA |HI0879 | 0.87059|ribosomal protein L27 ECrpmA |HP0297 |0.613632|ribosomal protein L27 ECrpmA |Rv2441c |0.616278|(MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomal proteinl27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831 Ribosomalprotein L27 signature (8 rows) real 0.17 user 0.02 sys 0.03 CREATE DATABASE CREATE TABLE "proteins" ( "name" character varying(16), "organism" text, "start_position" int4, "last_position" int4, "seq" text, "purpose" text, "alternate_key" character varying(16), "comment" text, "compared" bool, "complement" bool, "chromosome" character varying(4), "essentiality" float8); CREATE COPY "proteins" FROM stdin; CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); CREATE CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); CREATE real 1:04.49 user 3.14 sys 0.57 CREATE TABLE "comparisons_4" ( "name1" character varying(16), "name2" character varying(16), "z_score" float8, "expected" float8, "local_overlap_ratio" float8, "local_overlap_count" int4, "overlap_ratio" float8, "code" int4); CREATE COPY "comparisons_4" FROM stdin; CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); CREATE CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); CREATE CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); CREATE real 7:04.43 user 5.87 sys 1.03 CREATE TABLE "concordance_91" ( "target_name" character varying(16), "matched_name" character varying(16), "score" text); CREATE REVOKE ALL on "concordance_91" from PUBLIC; CHANGE GRANT ALL on "concordance_91" to PUBLIC; CHANGE COPY "concordance_91" FROM stdin; real 0.60 user 0.01 sys 0.03 explain select * from comparisons_4 where name1 = 'HI0001'; NOTICE: QUERY PLAN: Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=5918 width=64) EXPLAIN explain select count(*) from comparisons_4 where code = 80003; NOTICE: QUERY PLAN: Aggregate (cost=15655.61..15655.61 rows=1 width=0) -> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=5918 width=0) EXPLAIN explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; NOTICE: QUERY PLAN: Merge Join (cost=22495.22..23029.70 rows=2180283 width=36) -> Sort (cost=16011.62..16011.62 rows=5918 width=12) -> Seq Scan on comparisons_4 c (cost=0.00..15640.81 rows=5918 width=12) -> Sort (cost=6483.60..6483.60 rows=36840 width=24) -> Seq Scan on proteins p (cost=0.00..3247.40 rows=36840 width=24) EXPLAIN explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name =p.name; NOTICE: QUERY PLAN: Merge Join (cost=6553.43..7026.43 rows=368400 width=60) -> Sort (cost=69.83..69.83 rows=1000 width=36) -> Seq Scan on concordance_91 c (cost=0.00..20.00 rows=1000 width=36) -> Sort (cost=6483.60..6483.60 rows=36840 width=24) -> Seq Scan on proteins p (cost=0.00..3247.40 rows=36840 width=24) EXPLAIN select * from comparisons_4 where name1 = 'HI0001' name1 | name2 | z_score | expected | local_overlap_ratio | local_overlap_count | overlap_ratio | code --------+-----------------+---------+----------+---------------------+---------------------+---------------+------- HI0001 | PDB1GAD_O | 1869.5 | 0 | 0.83434 | 332 | 0.817112 | 30012 HI0001 | PDB1GAE_O | 1861.8 | 0 | 0.83133 | 332 | 0.814164 | 30012 HI0001 | PDB3GPD_R | 1432.2 | 0 | 0.63772 | 334 | 0.628314 | 30012 HI0001 | SCSW-G3P1_YEAST | 1425.3 | 0 | 0.65868 | 334 | 0.648965 | 30000 HI0001 | SCGP-1008189 | 1424.6 | 0 | 0.65868 | 334 | 0.648965 | 30000 HI0001 | ScTDH1 | 1424.6 | 0 | 0.65868 | 334 | 0.648965 | 30013 HI0001 | SCSW-G3P3_YEAST | 1417.5 | 0 | 0.65868 | 334 | 0.648965 | 30000 HI0001 | ScTDH3 | 1416.8 | 0 | 0.65868 | 334 | 0.648965 | 30013 HI0001 | SCGP-3720 | 1416.8 | 0 | 0.66168 | 334 | 0.651921 | 30000 HI0001 | SCGP-E243731 | 1416.8 | 0 | 0.65868 | 334 | 0.648965 | 30000 HI0001 | SCSW-G3P2_YEAST | 1405.9 | 0 | 0.6497 | 334 | 0.640117 | 30000 HI0001 | ScTDH2 | 1405.2 | 0 | 0.6497 | 334 | 0.640117 | 30013 HI0001 | SCPIR-DEBYG1 | 1405.2 | 0 | 0.6497 | 334 | 0.640117 | 30000 HI0001 | SCGP-3726 | 1398.7 | 0 | 0.6497 | 334 | 0.640117 | 30000 HI0001 | SCGP-3724 | 1371.5 | 0 | 0.63772 | 334 | 0.628314 | 30000 HI0001 | PDB4GPD_1 | 1357.8 | 0 | 0.64865 | 333 | 0.637169 | 30012 HI0001 | PDB2GD1_P | 1226.4 | 0 | 0.57015 | 335 | 0.563423 | 30012 HI0001 | PDB2DBV_O | 1217.4 | 0 | 0.56716 | 335 | 0.560468 | 30012 HI0001 | PDB4DBV_O | 1207 | 0 | 0.56418 | 335 | 0.557523 | 30012 HI0001 | PDB1HDG_O | 1020.4 | 0 | 0.48024 | 329 | 0.466074 | 30012 HI0001 | Rv1436 | 970.4 | 0 | 0.49558 | 339 | 0.49558 | 30010 HI0001 | PDB1CER_O | 949.7 | 0 | 0.47734 | 331 | 0.466075 | 30012 HI0001 | PDB1NLH_ | 935.1 | 0 | 0.46847 | 333 | 0.458825 | 30012 HI0001 | PDB1GGA_A | 918 | 0 | 0.52125 | 353 | 0.51397 | 30012 HI0001 | PDB1GYP_A | 900.1 | 0 | 0.51275 | 353 | 0.505589 | 30012 HI0001 | MG301 | 866.7 | 0 | 0.43155 | 336 | 0.427731 | 30004 HI0001 | HP1346 | 850.3 | 6.9e-41 | 0.39222 | 334 | 0.386435 | 30005 HI0001 | TP0844 | 780.3 | 5.8e-37 | 0.46307 | 352 | 0.465716 | 30014 HI0001 | HP0921 | 762.6 | 5.6e-36 | 0.40407 | 344 | 0.41003 | 30005 HI0001 | MJ1146 | 124.7 | 1.9 | 0.25094 | 267 | 0.195338 | 30007 (30 rows) real 22.68 user 0.01 sys 0.03 select count(*) from comparisons_4 where code = 80003 count ------- 3231 (1 row) real 21.49 user 0.01 sys 0.03 select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name name | purpose ---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- HI0597 | hypothetical protein HI1033 | phosphoserine phosphatase (o-phosphoserine phosphohydrolase) HP0652 | phosphoserine phosphatase MG125 | hypothetical protein MG263 | hypothetical protein MG265 | hypothetical protein MJ1594 | phosphoserine phosphatase Rv3042c | (MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g. P06862|SERB_ECOLI.N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE (EC3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 Rv3813c | (MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap TP0290 | conserved hypothetical protein (10 rows) real 23.13 user 0.01 sys 0.03 select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name target_name | matched_name | score | purpose -------------+--------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ECrpmA | BB0780 | 0.635297 | ribosomal protein L27 ECinfA | BSInfA | 0.680556 | initiation factor IF-1 ECinfA | HI0548 | 0.80952 | initiation factor IF-1 ECrpmA | HI0879 | 0.87059 | ribosomal protein L27 ECrpmA | HP0297 | 0.613632 | ribosomal protein L27 ECinfA | HP1298 | 0.61111 | translation initiation factor EF-1 ECrpmA | Rv2441c | 0.616278 | (MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomalprotein l27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831Ribosomal protein L27 signature ECinfA | Rv3462c | 0.684936 | (MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results:identical to IF1_MYCBO P45957 initiation factor if-1 (72 aa) (8 rows) real 11.16 user 0.01 sys 0.03
bruc@stone.congenomics.com (Robert E. Bruccoleri) writes: > I have followed the discussion in pgsql-hackers over the previous > months and others have noted some performance problems, and the response > has typically been to VACUUM the tables. Unfortunately, this is not a > practical option for my applications. They are very large -- I have one > table that is 17GB in length, and the applications are used frequently. You can't afford to run a VACUUM ANALYZE even once in the lifetime of the table? > More importantly, PostgreSQL 6.5.3 works very, very well without > VACUUM'ing. 6.5 effectively assumes that "foo = constant" will select exactly one row, if it has no statistics to prove otherwise. I don't regard that as a well-chosen default, even if it does happen to work OK for your application. Selecting an indexscan when a seqscan is needed is just as evil as doing the reverse; what's much worse is that 6.5 will pick incredibly bad join plans (ie, nested loops) because it thinks that very little data is coming out of the scans. If you want to revert to the 6.5 behavior without doing a VACUUM, you could probably get pretty close withupdate pg_attribute set attdispersion = -1.0; Stats-gathering and planning certainly does need a great deal of additional work, but I'm afraid that none of that will happen before 7.1. regards, tom lane
"Robert E. Bruccoleri" wrote: You can try starting postmaster with the "-o -fs" option. This will disable sequential scans if there is an index. Thereis also an environment variable you can set, prior to the operation. I have run into this same problem. > Dear Tom, > I am writing to you because you are the maintainer of the > query optimizer and planner. > I have found a very significant performance degradation > between PostgreSQL 6.5.3 and 7.1beta3, which will severely impact two > large applications that I have developed and maintain for several > clients. The performance difference is seen with the use of indices in > SELECT statements, whereby the current release does not make effective > use of the indices and 6.5.3 does. All of these tests were run on a SGI > R10000 Indigo2 system running Irix 6.5. All the regression tests passed > as expected for both versions. > I have followed the discussion in pgsql-hackers over the previous > months and others have noted some performance problems, and the response > has typically been to VACUUM the tables. Unfortunately, this is not a > practical option for my applications. They are very large -- I have one > table that is 17GB in length, and the applications are used frequently. > More importantly, PostgreSQL 6.5.3 works very, very well without > VACUUM'ing. > In order to assist you to diagnosing and correcting this > problem, I have prepared a test database that shows the problems. I > will attach three files; the test script, the log from running it on > version 6.5.3, and the log from running it on version 7.1beta3. In > addition, I have setup an anonymous FTP directory on > ftp.congen.com:/pub/pg_perf which contains all of these files as well > as the compressed table dumps used to build the test database. (When > you have finished copying the files, please let me know.) > The test script creates the database including the necessary > indexing, and then runs EXPLAIN on each of the queries followed by > actually executing the queries with "timex" commands to report elapsed > times. The striking difference in the query plans is that 7.1 uses > only sequential searches for the SELECT's whereas 6.5.3 uses index > scans. As a result, 7.1 is almost two orders of magnitude slower than > 6.5.3 with exactly the same data, schema, and queries. > > I plead with you to revisit this question of performance and > fix PostgreSQL 7.1 to work as well as PostgreSQL 6.5.3 does. I depend > upon PostgreSQL for much of my work, and I do not want to abandon it > because of this performance problem which arose only recently. Thank > you. > > +----------------------------------+------------------------------------+ > | Robert E. Bruccoleri, Ph.D. | Phone: 609 737 6383 | > | President, Congenomics, Inc. | Fax: 609 737 7528 | > | 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org | > | P.O. Box 314 | URL: http://www.congen.com/~bruc | > | Pennington, NJ 08534 | | > +----------------------------------+------------------------------------+ > > ------------------------------------------------------------------------ > #!/bin/csh > > createdb perf_test > > gunzip <proteins.dmp.gz | timex psql -e perf_test > gunzip <comparisons_4.dmp.gz | timex psql -e perf_test > gunzip <concordance_91.dmp.gz | timex psql -e perf_test > > psql -e perf_test <<EOF > explain select * from comparisons_4 where name1 = 'HI0001'; > explain select count(*) from comparisons_4 where code = 80003; > explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; > explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name= p.name; > EOF > > timex psql -e -c "select * from comparisons_4 where name1 = 'HI0001'" perf_test > timex psql -e -c "select count(*) from comparisons_4 where code = 80003" perf_test > timex psql -e -c "select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name"perf_test > timex psql -e -c "select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name= p.name" perf_test > > ------------------------------------------------------------------------ > CREATE TABLE "proteins" ( > "name" character varying(16), > "organism" text, > "start_position" int4, > "last_position" int4, > "seq" text, > "purpose" text, > "alternate_key" character varying(16), > "comment" text, > "compared" bool, > "complement" bool, > "chromosome" character varying(4), > "essentiality" float8); > QUERY: CREATE TABLE "proteins" ( > "name" character varying(16), > "organism" text, > "start_position" int4, > "last_position" int4, > "seq" text, > "purpose" text, > "alternate_key" character varying(16), > "comment" text, > "compared" bool, > "complement" bool, > "chromosome" character varying(4), > "essentiality" float8); > COPY "proteins" FROM stdin; > QUERY: COPY "proteins" FROM stdin; > CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); > QUERY: CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); > CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); > QUERY: CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); > CREATE > CREATE > CREATE > EOF > > real 1:11.42 > user 3.15 > sys 0.53 > > CREATE TABLE "comparisons_4" ( > "name1" character varying(16), > "name2" character varying(16), > "z_score" float8, > "expected" float8, > "local_overlap_ratio" float8, > "local_overlap_count" int4, > "overlap_ratio" float8, > "code" int4); > QUERY: CREATE TABLE "comparisons_4" ( > "name1" character varying(16), > "name2" character varying(16), > "z_score" float8, > "expected" float8, > "local_overlap_ratio" float8, > "local_overlap_count" int4, > "overlap_ratio" float8, > "code" int4); > COPY "comparisons_4" FROM stdin; > QUERY: COPY "comparisons_4" FROM stdin; > CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); > QUERY: CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); > CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); > QUERY: CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); > CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); > QUERY: CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); > CREATE > CREATE > CREATE > CREATE > EOF > > real 16:42.13 > user 5.86 > sys 0.96 > > CREATE TABLE "concordance_91" ( > "target_name" character varying(16), > "matched_name" character varying(16), > "score" text); > QUERY: CREATE TABLE "concordance_91" ( > "target_name" character varying(16), > "matched_name" character varying(16), > "score" text); > REVOKE ALL on "concordance_91" from PUBLIC; > QUERY: REVOKE ALL on "concordance_91" from PUBLIC; > GRANT ALL on "concordance_91" to PUBLIC; > QUERY: GRANT ALL on "concordance_91" to PUBLIC; > COPY "concordance_91" FROM stdin; > QUERY: COPY "concordance_91" FROM stdin; > CREATE > CHANGE > CHANGE > EOF > > real 0.30 > user 0.02 > sys 0.04 > > explain select * from comparisons_4 where name1 = 'HI0001'; > QUERY: explain select * from comparisons_4 where name1 = 'HI0001'; > NOTICE: QUERY PLAN: > > Index Scan using comparisons_4_name1 on comparisons_4 (cost=2.05 rows=1 width=64) > > explain select count(*) from comparisons_4 where code = 80003; > QUERY: explain select count(*) from comparisons_4 where code = 80003; > NOTICE: QUERY PLAN: > > Aggregate (cost=2.05 rows=1 width=12) > -> Index Scan using comparisons_4_code on comparisons_4 (cost=2.05 rows=1 width=12) > > explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; > QUERY: explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; > NOTICE: QUERY PLAN: > > Nested Loop (cost=4.10 rows=1 width=36) > -> Index Scan using comparisons_4_name1 on comparisons_4 c (cost=2.05 rows=1 width=12) > -> Index Scan using protein_names on proteins p (cost=2.05 rows=36840 width=24) > > explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name= p.name; > QUERY: explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name= p.name; > NOTICE: QUERY PLAN: > > Nested Loop (cost=2093.00 rows=36840 width=60) > -> Seq Scan on concordance_91 c (cost=43.00 rows=1000 width=36) > -> Index Scan using protein_names on proteins p (cost=2.05 rows=36840 width=24) > > EXPLAIN > EXPLAIN > EXPLAIN > EXPLAIN > EOF > QUERY: select * from comparisons_4 where name1 = 'HI0001' > name1 |name2 |z_score|expected|local_overlap_ratio|local_overlap_count|overlap_ratio| code > ------+---------------+-------+--------+-------------------+-------------------+-------------+----- > HI0001|PDB2DBV_O | 1217.4| 0| 0.56716| 335| 0.560468|30012 > HI0001|PDB4DBV_O | 1207| 0| 0.56418| 335| 0.557523|30012 > HI0001|PDB2GD1_P | 1226.4| 0| 0.57015| 335| 0.563423|30012 > HI0001|PDB1GAE_O | 1861.8| 0| 0.83133| 332| 0.814164|30012 > HI0001|PDB4GPD_1 | 1357.8| 0| 0.64865| 333| 0.637169|30012 > HI0001|HP1346 | 850.3| 6.9e-41| 0.39222| 334| 0.386435|30005 > HI0001|TP0844 | 780.3| 5.8e-37| 0.46307| 352| 0.465716|30014 > HI0001|PDB1HDG_O | 1020.4| 0| 0.48024| 329| 0.466074|30012 > HI0001|SCPIR-DEBYG1 | 1405.2| 0| 0.6497| 334| 0.640117|30000 > HI0001|Rv1436 | 970.4| 0| 0.49558| 339| 0.49558|30010 > HI0001|PDB1CER_O | 949.7| 0| 0.47734| 331| 0.466075|30012 > HI0001|PDB1NLH_ | 935.1| 0| 0.46847| 333| 0.458825|30012 > HI0001|PDB1GGA_A | 918| 0| 0.52125| 353| 0.51397|30012 > HI0001|PDB1GAD_O | 1869.5| 0| 0.83434| 332| 0.817112|30012 > HI0001|PDB1GYP_A | 900.1| 0| 0.51275| 353| 0.505589|30012 > HI0001|MG301 | 866.7| 0| 0.43155| 336| 0.427731|30004 > HI0001|SCSW-G3P1_YEAST| 1425.3| 0| 0.65868| 334| 0.648965|30000 > HI0001|ScTDH1 | 1424.6| 0| 0.65868| 334| 0.648965|30013 > HI0001|ScTDH2 | 1405.2| 0| 0.6497| 334| 0.640117|30013 > HI0001|SCSW-G3P3_YEAST| 1417.5| 0| 0.65868| 334| 0.648965|30000 > HI0001|ScTDH3 | 1416.8| 0| 0.65868| 334| 0.648965|30013 > HI0001|SCGP-3720 | 1416.8| 0| 0.66168| 334| 0.651921|30000 > HI0001|SCGP-E243731 | 1416.8| 0| 0.65868| 334| 0.648965|30000 > HI0001|SCSW-G3P2_YEAST| 1405.9| 0| 0.6497| 334| 0.640117|30000 > HI0001|SCGP-1008189 | 1424.6| 0| 0.65868| 334| 0.648965|30000 > HI0001|SCGP-3726 | 1398.7| 0| 0.6497| 334| 0.640117|30000 > HI0001|PDB3GPD_R | 1432.2| 0| 0.63772| 334| 0.628314|30012 > HI0001|HP0921 | 762.6| 5.6e-36| 0.40407| 344| 0.41003|30005 > HI0001|MJ1146 | 124.7| 1.9| 0.25094| 267| 0.195338|30007 > HI0001|SCGP-3724 | 1371.5| 0| 0.63772| 334| 0.628314|30000 > (30 rows) > > real 0.18 > user 0.02 > sys 0.03 > > QUERY: select count(*) from comparisons_4 where code = 80003 > count > ----- > 3231 > (1 row) > > real 0.34 > user 0.02 > sys 0.03 > > QUERY: select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name > name |purpose > -------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > MG263 |hypothetical protein > HP0652 |phosphoserine phosphatase > MJ1594 |phosphoserine phosphatase > MG125 |hypothetical protein > TP0290 |conserved hypothetical protein > HI1033 |phosphoserine phosphatase (o-phosphoserine phosphohydrolase) > HI0597 |hypothetical protein > Rv3813c|(MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap > Rv3042c|(MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g. P06862|SERB_ECOLI.N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE (EC3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 > MG265 |hypothetical protein > (10 rows) > > real 0.24 > user 0.02 > sys 0.03 > > QUERY: select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name= p.name > target_name|matched_name| score|purpose > -----------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ECinfA |BSInfA |0.680556|initiation factor IF-1 > ECinfA |HI0548 | 0.80952|initiation factor IF-1 > ECinfA |HP1298 | 0.61111|translation initiation factor EF-1 > ECinfA |Rv3462c |0.684936|(MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results: identicalto IF1_MYCBO P45957 initiation factor if-1 (72 aa) > ECrpmA |BB0780 |0.635297|ribosomal protein L27 > ECrpmA |HI0879 | 0.87059|ribosomal protein L27 > ECrpmA |HP0297 |0.613632|ribosomal protein L27 > ECrpmA |Rv2441c |0.616278|(MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomal proteinl27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831 Ribosomalprotein L27 signature > (8 rows) > > real 0.17 > user 0.02 > sys 0.03 > > ------------------------------------------------------------------------ > CREATE DATABASE > CREATE TABLE "proteins" ( > "name" character varying(16), > "organism" text, > "start_position" int4, > "last_position" int4, > "seq" text, > "purpose" text, > "alternate_key" character varying(16), > "comment" text, > "compared" bool, > "complement" bool, > "chromosome" character varying(4), > "essentiality" float8); > CREATE > COPY "proteins" FROM stdin; > CREATE INDEX "protein_names" on "proteins" using btree ( "name" "varchar_ops" ); > CREATE > CREATE INDEX "protein_organism" on "proteins" using btree ( "organism" "text_ops" ); > CREATE > > real 1:04.49 > user 3.14 > sys 0.57 > > CREATE TABLE "comparisons_4" ( > "name1" character varying(16), > "name2" character varying(16), > "z_score" float8, > "expected" float8, > "local_overlap_ratio" float8, > "local_overlap_count" int4, > "overlap_ratio" float8, > "code" int4); > CREATE > COPY "comparisons_4" FROM stdin; > CREATE INDEX "comparisons_4_name1" on "comparisons_4" using btree ( "name1" "varchar_ops" ); > CREATE > CREATE INDEX "comparisons_4_name2" on "comparisons_4" using btree ( "name2" "varchar_ops" ); > CREATE > CREATE INDEX "comparisons_4_code" on "comparisons_4" using btree ( "code" "int4_ops" ); > CREATE > > real 7:04.43 > user 5.87 > sys 1.03 > > CREATE TABLE "concordance_91" ( > "target_name" character varying(16), > "matched_name" character varying(16), > "score" text); > CREATE > REVOKE ALL on "concordance_91" from PUBLIC; > CHANGE > GRANT ALL on "concordance_91" to PUBLIC; > CHANGE > COPY "concordance_91" FROM stdin; > > real 0.60 > user 0.01 > sys 0.03 > > explain select * from comparisons_4 where name1 = 'HI0001'; > NOTICE: QUERY PLAN: > > Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=5918 width=64) > > EXPLAIN > explain select count(*) from comparisons_4 where code = 80003; > NOTICE: QUERY PLAN: > > Aggregate (cost=15655.61..15655.61 rows=1 width=0) > -> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=5918 width=0) > > EXPLAIN > explain select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name; > NOTICE: QUERY PLAN: > > Merge Join (cost=22495.22..23029.70 rows=2180283 width=36) > -> Sort (cost=16011.62..16011.62 rows=5918 width=12) > -> Seq Scan on comparisons_4 c (cost=0.00..15640.81 rows=5918 width=12) > -> Sort (cost=6483.60..6483.60 rows=36840 width=24) > -> Seq Scan on proteins p (cost=0.00..3247.40 rows=36840 width=24) > > EXPLAIN > explain select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name= p.name; > NOTICE: QUERY PLAN: > > Merge Join (cost=6553.43..7026.43 rows=368400 width=60) > -> Sort (cost=69.83..69.83 rows=1000 width=36) > -> Seq Scan on concordance_91 c (cost=0.00..20.00 rows=1000 width=36) > -> Sort (cost=6483.60..6483.60 rows=36840 width=24) > -> Seq Scan on proteins p (cost=0.00..3247.40 rows=36840 width=24) > > EXPLAIN > select * from comparisons_4 where name1 = 'HI0001' > name1 | name2 | z_score | expected | local_overlap_ratio | local_overlap_count | overlap_ratio | code > --------+-----------------+---------+----------+---------------------+---------------------+---------------+------- > HI0001 | PDB1GAD_O | 1869.5 | 0 | 0.83434 | 332 | 0.817112 | 30012 > HI0001 | PDB1GAE_O | 1861.8 | 0 | 0.83133 | 332 | 0.814164 | 30012 > HI0001 | PDB3GPD_R | 1432.2 | 0 | 0.63772 | 334 | 0.628314 | 30012 > HI0001 | SCSW-G3P1_YEAST | 1425.3 | 0 | 0.65868 | 334 | 0.648965 | 30000 > HI0001 | SCGP-1008189 | 1424.6 | 0 | 0.65868 | 334 | 0.648965 | 30000 > HI0001 | ScTDH1 | 1424.6 | 0 | 0.65868 | 334 | 0.648965 | 30013 > HI0001 | SCSW-G3P3_YEAST | 1417.5 | 0 | 0.65868 | 334 | 0.648965 | 30000 > HI0001 | ScTDH3 | 1416.8 | 0 | 0.65868 | 334 | 0.648965 | 30013 > HI0001 | SCGP-3720 | 1416.8 | 0 | 0.66168 | 334 | 0.651921 | 30000 > HI0001 | SCGP-E243731 | 1416.8 | 0 | 0.65868 | 334 | 0.648965 | 30000 > HI0001 | SCSW-G3P2_YEAST | 1405.9 | 0 | 0.6497 | 334 | 0.640117 | 30000 > HI0001 | ScTDH2 | 1405.2 | 0 | 0.6497 | 334 | 0.640117 | 30013 > HI0001 | SCPIR-DEBYG1 | 1405.2 | 0 | 0.6497 | 334 | 0.640117 | 30000 > HI0001 | SCGP-3726 | 1398.7 | 0 | 0.6497 | 334 | 0.640117 | 30000 > HI0001 | SCGP-3724 | 1371.5 | 0 | 0.63772 | 334 | 0.628314 | 30000 > HI0001 | PDB4GPD_1 | 1357.8 | 0 | 0.64865 | 333 | 0.637169 | 30012 > HI0001 | PDB2GD1_P | 1226.4 | 0 | 0.57015 | 335 | 0.563423 | 30012 > HI0001 | PDB2DBV_O | 1217.4 | 0 | 0.56716 | 335 | 0.560468 | 30012 > HI0001 | PDB4DBV_O | 1207 | 0 | 0.56418 | 335 | 0.557523 | 30012 > HI0001 | PDB1HDG_O | 1020.4 | 0 | 0.48024 | 329 | 0.466074 | 30012 > HI0001 | Rv1436 | 970.4 | 0 | 0.49558 | 339 | 0.49558 | 30010 > HI0001 | PDB1CER_O | 949.7 | 0 | 0.47734 | 331 | 0.466075 | 30012 > HI0001 | PDB1NLH_ | 935.1 | 0 | 0.46847 | 333 | 0.458825 | 30012 > HI0001 | PDB1GGA_A | 918 | 0 | 0.52125 | 353 | 0.51397 | 30012 > HI0001 | PDB1GYP_A | 900.1 | 0 | 0.51275 | 353 | 0.505589 | 30012 > HI0001 | MG301 | 866.7 | 0 | 0.43155 | 336 | 0.427731 | 30004 > HI0001 | HP1346 | 850.3 | 6.9e-41 | 0.39222 | 334 | 0.386435 | 30005 > HI0001 | TP0844 | 780.3 | 5.8e-37 | 0.46307 | 352 | 0.465716 | 30014 > HI0001 | HP0921 | 762.6 | 5.6e-36 | 0.40407 | 344 | 0.41003 | 30005 > HI0001 | MJ1146 | 124.7 | 1.9 | 0.25094 | 267 | 0.195338 | 30007 > (30 rows) > > real 22.68 > user 0.01 > sys 0.03 > > select count(*) from comparisons_4 where code = 80003 > count > ------- > 3231 > (1 row) > > real 21.49 > user 0.01 > sys 0.03 > > select p.name, p.purpose from comparisons_4 c, proteins p where c.name1 = 'HI0003' and c.name2 = p.name > name | purpose > ---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > HI0597 | hypothetical protein > HI1033 | phosphoserine phosphatase (o-phosphoserine phosphohydrolase) > HP0652 | phosphoserine phosphatase > MG125 | hypothetical protein > MG263 | hypothetical protein > MG265 | hypothetical protein > MJ1594 | phosphoserine phosphatase > Rv3042c | (MTV012.57c), len: 409. The C-terminal domain (150-409) is highly similar to several SerB proteins e.g. P06862|SERB_ECOLI.N-terminus (1-150) shows no similarity, FASTA score: sp|P06862|SERB_ECOLI PHOSPHOSERINE PHOSPHATASE (EC3.1 (322 aa) opt: 628 z-score: 753.3 E(): 0; 46.8%identity in 235 aa overlap. TBparse score is 0.884 > Rv3813c | (MTCY409.17), len: 273. Unknown, similar to many hypothetical proteins eg. YXEH_BACSU P54947 hypothetical 30.2kd protein in idh-deor (270 aa), fasta results; opt: 329 z-score: 456.0 E(): 2.2e-18, 32.2% identity in 267 aa overlap > TP0290 | conserved hypothetical protein > (10 rows) > > real 23.13 > user 0.01 > sys 0.03 > > select c.target_name, c.matched_name, c.score, p.purpose from concordance_91 c, proteins p where c.matched_name = p.name > target_name | matched_name | score | purpose > -------------+--------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ECrpmA | BB0780 | 0.635297 | ribosomal protein L27 > ECinfA | BSInfA | 0.680556 | initiation factor IF-1 > ECinfA | HI0548 | 0.80952 | initiation factor IF-1 > ECrpmA | HI0879 | 0.87059 | ribosomal protein L27 > ECrpmA | HP0297 | 0.613632 | ribosomal protein L27 > ECinfA | HP1298 | 0.61111 | translation initiation factor EF-1 > ECrpmA | Rv2441c | 0.616278 | (MTCY428.05), len: 86. Probable rpmA, similar to eg RL27_ECOLI P02427 50s ribosomalprotein l27, (84 aa), fasta scores, opt: 328, E(): 7.1e-17, (64.2% identity in 81 aa overlap); contains PS00831Ribosomal protein L27 signature > ECinfA | Rv3462c | 0.684936 | (MTCY13E12.15c), len: 73 aa. infA. Probable initiation factor IF-1. FASTA results:identical to IF1_MYCBO P45957 initiation factor if-1 (72 aa) > (8 rows) > > real 11.16 > user 0.01 > sys 0.03
Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3
From
bruc@stone.congenomics.com (Robert E. Bruccoleri)
Date:
Dear Tom, > You can't afford to run a VACUUM ANALYZE even once in the lifetime of > the table? Not very often at best, and certainly not routinely. Some of my tables exceed 10GB and have multiple indices. However, to test your suggestion, I modified my performance test script to "VACUUM ANALYZE" all the tables prior to invoking EXPLAIN, and it improves all of the searches except this one (EXPLAIN output also included): explain select count(*) from comparisons_4 where code = 80003; NOTICE: QUERY PLAN: Aggregate (cost=15659.29..15659.29 rows=1 width=0) -> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=7391 width=0) EXPLAIN The choice of sequential scan here takes 30x longer. > > > More importantly, PostgreSQL 6.5.3 works very, very well without > > VACUUM'ing. > > 6.5 effectively assumes that "foo = constant" will select exactly one > row, if it has no statistics to prove otherwise. I don't regard that > as a well-chosen default, even if it does happen to work OK for your > application. Selecting an indexscan when a seqscan is needed is just > as evil as doing the reverse; what's much worse is that 6.5 will > pick incredibly bad join plans (ie, nested loops) because it thinks > that very little data is coming out of the scans. I've tuned my applications to work well with these defaults (and demonstrate to my peers that PostgreSQL is comparable to Oracle in performance for these types of queries). I am willing to make changes to my applications to make them work as well with 7.1, but the performance of the query above worries me. I think the current planner will make the wrong decision more often than the right one. To test this further on this table, I went through the comparisons_4 table and found that code 13 appears the most (73912) out of 591825 rows. In this case, 6.5.3 takes 8.56 seconds to return its answer, whereas 7.1 takes 12.11 seconds. Even in the worst case for this table, the indexed scan is faster, but the optimizer decides on the sequential scan. It appears that the decision point for the switch to sequential scans isn't set properly. To me, this is a bug. > If you want to revert to the 6.5 behavior without doing a VACUUM, you > could probably get pretty close with > update pg_attribute set attdispersion = -1.0; Does VACUUM ANALYZE set this column to its calculated value? What kinds of queries would not give 6.5 behavior if I set this column as you suggest? Alternatively, how hard would it be to add another SET variable like USE_6_5_PLANNING_RULES? Personally, that would be most helpful from an application development viewpoint because I could switch to PostgreSQL 7.1 without destroying the performance of my applications, and then test new versions with the 7.1 planner with less potential for service disruption. > Stats-gathering and planning certainly does need a great deal of > additional work, but I'm afraid that none of that will happen before > 7.1. As I said above, I've put a lot of effort into making my applications work quickly with Postgres, and I'm looking forward to using the new features that are available with version 7.1. However, I'm very concerned that I will not be able to achieve the same performance without detailed knowledge of the internals. Shouldn't I be assured that I will improve the performance of a query by creating a index on the fields used for selecting the row? That is not the case for the query above. Finally, I apologize for being a little strident here. I've been advocating for and using Postgres for four years, and it's frustrating when a new version results in a serious and noticeable performance degradation. Sincerely, Bob +----------------------------------+------------------------------------+ | Robert E. Bruccoleri, Ph.D. | Phone: 609 737 6383 | | President, Congenomics, Inc. | Fax: 609 737 7528 | | 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org | | P.O. Box 314 | URL: http://www.congen.com/~bruc | | Pennington, NJ 08534 | | +----------------------------------+------------------------------------+
"Robert E. Bruccoleri" wrote: > > explain select count(*) from comparisons_4 where code = 80003; > NOTICE: QUERY PLAN: > > Aggregate (cost=15659.29..15659.29 rows=1 width=0) > -> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=7391 width=0) > > EXPLAIN What is the type of field "code" ? --------------- Hannu
Re: Re: Performance degradation in PostgreSQL 7.1beta3 vs
From
bruc@stone.congenomics.com (Robert E. Bruccoleri)
Date:
Dear Hannu, > > "Robert E. Bruccoleri" wrote: > > > > explain select count(*) from comparisons_4 where code = 80003; > > NOTICE: QUERY PLAN: > > > > Aggregate (cost=15659.29..15659.29 rows=1 width=0) > > -> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=7391 width=0) > > > > EXPLAIN > > What is the type of field "code" ? int4 Do you think that should make a difference? +----------------------------------+------------------------------------+ | Robert E. Bruccoleri, Ph.D. | Phone: 609 737 6383 | | President, Congenomics, Inc. | Fax: 609 737 7528 | | 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org | | P.O. Box 314 | URL: http://www.congen.com/~bruc | | Pennington, NJ 08534 | | +----------------------------------+------------------------------------+
"Robert E. Bruccoleri" wrote: > > Dear Hannu, > > > > "Robert E. Bruccoleri" wrote: > > > > > > explain select count(*) from comparisons_4 where code = 80003; > > > NOTICE: QUERY PLAN: > > > > > > Aggregate (cost=15659.29..15659.29 rows=1 width=0) > > > -> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=7391 width=0) > > > > > > EXPLAIN > > > > What is the type of field "code" ? > > int4 > > Do you think that should make a difference? Probably not here. Sometimes it has made difference if the system does not recognize the other side of comparison (80003) as being of the same type as the index. what are the cost estimates when you run explain with seqscan disabled ? do => SET ENABLE_SEQSCAN TO OFF; see: (http://www.postgresql.org/devel-corner/docs/admin/runtime-config.htm#RUNTIME-CONFIG-OPTIMIZER) ----------------- Hannu
Re: Re: Performance degradation in PostgreSQL 7.1beta3 vs
From
bruc@stone.congenomics.com (Robert E. Bruccoleri)
Date:
Dear Hannu, > > "Robert E. Bruccoleri" wrote: > > > > Dear Hannu, > > > > > > "Robert E. Bruccoleri" wrote: > > > > > > > > explain select count(*) from comparisons_4 where code = 80003; > > > > NOTICE: QUERY PLAN: > > > > > > > > Aggregate (cost=15659.29..15659.29 rows=1 width=0) > > > > -> Seq Scan on comparisons_4 (cost=0.00..15640.81 rows=7391 width=0) > > > > > > > > EXPLAIN > > > > > > What is the type of field "code" ? > > > > int4 > > > > Do you think that should make a difference? > > Probably not here. > > Sometimes it has made difference if the system does not recognize > the other side of comparison (80003) as being of the same type as > the index. > > what are the cost estimates when you run explain with seqscan disabled ? > do => SET ENABLE_SEQSCAN TO OFF; > see: > (http://www.postgresql.org/devel-corner/docs/admin/runtime-config.htm#RUNTIME-CONFIG-OPTIMIZER) Here's the result from EXPLAIN: Aggregate (cost=19966.21..19966.21 rows=1 width=0) -> Index Scan using comparisons_4_code on comparisons_4 (cost=0.00..19947.73rows=7391 width=0) The estimates are too high. --Bob +----------------------------------+------------------------------------+ | Robert E. Bruccoleri, Ph.D. | Phone: 609 737 6383 | | President, Congenomics, Inc. | Fax: 609 737 7528 | | 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org | | P.O. Box 314 | URL: http://www.congen.com/~bruc | | Pennington, NJ 08534 | | +----------------------------------+------------------------------------+
"Robert E. Bruccoleri" wrote: > > > > > what are the cost estimates when you run explain with seqscan disabled ? > > do => SET ENABLE_SEQSCAN TO OFF; > > see: > > (http://www.postgresql.org/devel-corner/docs/admin/runtime-config.htm#RUNTIME-CONFIG-OPTIMIZER) > > Here's the result from EXPLAIN: > > Aggregate (cost=19966.21..19966.21 rows=1 width=0) > -> Index Scan using comparisons_4_code on comparisons_4 (cost=0.00..19947.73 rows=7391 width=0) > > The estimates are too high. You could try experimenting with SET RANDOM_PAGE_COST TO x.x; from the page above RANDOM_PAGE_COST (floating point) Sets the query optimizer's estimate of the cost of a nonsequentially fetched disk page. this is measured as a multiple of the cost of a sequential page fetch. Note: Unfortunately, there is no well-defined method of determining ideal values for the family of "COST" variables that were just described. You are encouraged to experiment and share your findings. ------------- Hannu