Thread: Query optimizing - paradox behave
Hallo ! I want to tune a database. There a many redundant datas in the database , because of all the relations were consider as n:m relations. But the most of them are 1:n Relations. So my approach was to cut the redundancies to get more performance. But .. happens! The query with the 3 tables is faster than the query with 2 tables. That is paradox to the Explain output. And: the real database functions like dbPG95GetIndex and all functions defined by me are slower. The whole program is slower than before. I disabled all the index.(since with index the behavior is the same) The database pacs ist only restructured. They have the same data. With database pacs and compare a vacuum was made. I looked at the user time , since system time is faked because my testprogram hands over the control to the postmaster and the postmaster is doing his own work. So I made a lot of tests to get a average usertime. So escapes will be catched and eliminated. Here are the tabledescriptions for the original database "compare": tables i.e. There is a n:m relationship between patient and study realized with relpatient_study000 relationtable. Table "patient" Attribute | Type | Modifier ----------------------+------------------------+---------- chilioid | character varying(80) | name | text | id | character varying(256) | birthdate | date | birthtime | time | sex | character(1) | medicalrecordlocator | character varying(128) | Table "study" Attribute | Type | Modifier ------------------------+------------------------+---------- chilioid | character varying(80) | instanceuid | character varying(64) | id | character varying(64) | studydate | date | studytime | time | modality | character varying(2) | manufacturer | character varying(128) | referingphysician | text | description | character varying(128) | manufacturersmodelname | character varying(128) | importtime | double precision | chilisenderid | character varying(80) | accessionnumber | character varying(64) | institutionname | character varying(128) | workflowstate | character varying(8) | flags | character varying(8) | performingphysician | character varying(128) | reportingphysician | character varying(128) | Table "relpatient_study000" Attribute | Type | Modifier -----------+-----------------------+---------- chilioid | character varying(80) | parentoid | character varying(80) | childoid | character varying(80) | parentoid is here the oid of the patient and childoid is here the oid of the study. Thats the query with the original database "compare": time psql -d compare -c "SELECT patient.*,study.* FROM patient,study,relpatient_study000 r0 WHERE (patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by patient.name using <" > 3tableni 1.671u 0.130s 0:11.14 16.4% 0+0k 0+0io 208pf+0w psql -d compare -c "EXPLAIN SELECT patient.*,study.* FROM patient,study,relpatient_study000 r0 WHERE (patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by patient.name using <" > 3tableni NOTICE: QUERY PLAN: Sort (cost=1135170635.79..1135170635.79 rows=748802386 width=296) -> Merge Join (cost=1025510.64..1057837.48 rows=748802386 width=296) -> Sort (cost=1017989.22..1017989.22 rows=2556861 width=96) -> Merge Join (cost=4287.84..4763.21 rows=2556861 width=96) -> Sort (cost=990.43..990.43 rows=8725 width=72) -> Seq Scan on patient (cost=0.00..212.25 rows=8725 width=72) -> Sort (cost=3297.40..3297.40 rows=29305 width=24) -> Seq Scan on relpatient_study000 r0 (cost=0.00..774.05 rows=29305 width=24) -> Sort (cost=7521.42..7521.42 rows=29286 width=200) -> Seq Scan on study (cost=0.00..1116.86 rows=29286 width=200) ----------------------------- Thats the query with the new restructured database "pacs": time psql -d pacs -c "SELECT patient.*,study.* FROM patient,study WHERE (patient.chiliOID=study.patientOID ) order by patient.name using <" > 2tableni 1.730u 0.120s 0:09.44 19.5% 0+0k 0+0io 208pf+0w psql -d pacs -c "EXPLAIN SELECT patient.*,study.* FROM patient,study WHERE (patient.chiliOID=study.patientOID ) order by patient.name using <" > 2tableni NOTICE: QUERY PLAN: Sort (cost=2194791.19..2194791.19 rows=2555204 width=284) -> Merge Join (cost=8978.44..9453.57 rows=2555204 width=284) -> Sort (cost=990.43..990.43 rows=8725 width=72) -> Seq Scan on patient (cost=0.00..212.25 rows=8725 width=72) -> Sort (cost=7988.00..7988.00 rows=29286 width=212) -> Seq Scan on study (cost=0.00..1236.86 rows=29286 width=212) Restructured tables i.e. PAtient-study relationship is 1:n realized with column patientoid in table study. Table "patient" Attribute | Type | Modifier ----------------------+------------------------+---------- chilioid | character varying(80) | name | text | id | character varying(256) | birthdate | date | birthtime | time | sex | character(1) | medicalrecordlocator | character varying(128) | Table "study" Attribute | Type | Modifier ------------------------+------------------------+---------- chilioid | character varying(80) | instanceuid | character varying(64) | id | character varying(64) | studydate | date | studytime | time | modality | character varying(2) | manufacturer | character varying(128) | referingphysician | text | description | character varying(128) | manufacturersmodelname | character varying(128) | importtime | double precision | chilisenderid | character varying(80) | accessionnumber | character varying(64) | institutionname | character varying(128) | workflowstate | character varying(8) | flags | character varying(8) | performingphysician | character varying(128) | reportingphysician | character varying(128) | patientoid | character varying(80) | The times of the processes are escape-eliminated by statistical methods. I determined that the "compare" database is 8% times faster than the new restructured "pacs" database. How can I understand this? Whats my mistake? Anybody who can make some sugestions on the above will receive my enthusiastic gratitude David M. Richter
Attachment
What version are you using? (dbPG95GetIndex?) On Thu, 19 Jul 2001, David M. Richter wrote: > Hallo ! > > I want to tune a database. There a many redundant datas in the database > , because of all the relations were consider as n:m relations. But the > most of them are 1:n Relations. So my approach was to cut the > redundancies to get more performance. But .. happens! > > The query with the 3 tables is faster than the query with 2 tables. > > That is paradox to the Explain output. > And: the real database functions like dbPG95GetIndex and all functions > defined by me are slower.
David, You will no doubt hear later from the tuning experts on the list. However, let me save everybody some time by verifying some basics: 1. When you restructured the database, you ran VACUUM ANALYZE on the new database (pacs)? 2. You said that you "eliminated the indexes" because they weren't helping performance. Is this right? It seems a little hard to figure from here. 3. General Advice: If you're concerned about query performance, get rid of those VARCHAR(80) primary keys and replace them with INT4 or INT8! The math is easy to do: If you're processing INT8 keys for 1,000,000 table rows that's 8,000,000 (roughly 8mb) data on disk and data being processed. If you're processing VARCHAR(80) keys for 1,000,000 table rows, thats 82,000,000 bytes (82 mb) on disk and in ram to be processed. In theory, you could get a 10-fold increase in JOIN performance by switching to INT8 keys. In practice, its probably more like double or triple but that ain't bad, either. -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Attachment
Hallo Stephan! Are there several versions of dbPG95GetIndex existing, or did you mention postgres version 7.1.2? With a little help I have killed the Problem!! Yeah, But now I trying to improve the C-code. Do You have any experience with optimizing C-Code. Are there some new miracle-like function, wich improves the speed dramatically? Do You know some database options wich I could change for better performance? Thanks a lot for Your suggestions!! David >What version are you using? (dbPG95GetIndex?)