Thread: Query optimizing - paradox behave

Query optimizing - paradox behave

From
"David M. Richter"
Date:
Hallo !

I want to tune a database. There a many redundant datas in the database
, because of all the relations were consider as n:m relations. But the
most of them are 1:n Relations. So my approach was to cut the
redundancies to get more performance. But .. happens!

The query with the 3 tables is faster than the query with 2 tables.

That is paradox to the Explain output.
And: the real database functions like dbPG95GetIndex and all functions
defined by me are slower.
The whole program is slower than before.
I disabled all the index.(since with index the behavior is the same) The
database pacs ist only restructured. They have the same data. With
database pacs and compare a vacuum was made.
I looked at the user time , since system time is faked because my
testprogram hands over the control to the postmaster and the postmaster
is doing his own work. So I made a lot of tests to get a average
usertime. So escapes will be catched and eliminated.

Here are the tabledescriptions for the original database "compare":

tables i.e.
There is a n:m relationship between patient and study realized with
relpatient_study000 relationtable.

                    Table "patient"
      Attribute       |          Type          | Modifier
----------------------+------------------------+----------
 chilioid             | character varying(80)  |
 name                 | text                   |
 id                   | character varying(256) |
 birthdate            | date                   |
 birthtime            | time                   |
 sex                  | character(1)           |
 medicalrecordlocator | character varying(128) |

                       Table "study"
       Attribute        |          Type          | Modifier
------------------------+------------------------+----------
 chilioid               | character varying(80)  |
 instanceuid            | character varying(64)  |
 id                     | character varying(64)  |
 studydate              | date                   |
 studytime              | time                   |
 modality               | character varying(2)   |
 manufacturer           | character varying(128) |
 referingphysician      | text                   |
 description            | character varying(128) |
 manufacturersmodelname | character varying(128) |
 importtime             | double precision       |
 chilisenderid          | character varying(80)  |
 accessionnumber        | character varying(64)  |
 institutionname        | character varying(128) |
 workflowstate          | character varying(8)   |
 flags                  | character varying(8)   |
 performingphysician    | character varying(128) |
 reportingphysician     | character varying(128) |


         Table "relpatient_study000"
 Attribute |         Type          | Modifier
-----------+-----------------------+----------
 chilioid  | character varying(80) |
 parentoid | character varying(80) |
 childoid  | character varying(80) |

 parentoid is here the oid of the patient and childoid is here the oid
of the study.

Thats the query with the original database "compare":

time psql -d compare -c "SELECT patient.*,study.* FROM
patient,study,relpatient_study000 r0 WHERE
(patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by
patient.name using <" > 3tableni
1.671u 0.130s 0:11.14 16.4%     0+0k 0+0io 208pf+0w

psql -d compare -c "EXPLAIN SELECT patient.*,study.* FROM
patient,study,relpatient_study000 r0 WHERE
(patient.chiliOID=r0.parentOID AND study.chiliOID=r0.childOID) order by
patient.name using <" > 3tableni
NOTICE:  QUERY PLAN:

Sort  (cost=1135170635.79..1135170635.79 rows=748802386 width=296)
  ->  Merge Join  (cost=1025510.64..1057837.48 rows=748802386 width=296)
        ->  Sort  (cost=1017989.22..1017989.22 rows=2556861 width=96)
              ->  Merge Join  (cost=4287.84..4763.21 rows=2556861
width=96)
                    ->  Sort  (cost=990.43..990.43 rows=8725 width=72)
                          ->  Seq Scan on patient  (cost=0.00..212.25
rows=8725 width=72)
                    ->  Sort  (cost=3297.40..3297.40 rows=29305
width=24)
                          ->  Seq Scan on relpatient_study000 r0
(cost=0.00..774.05 rows=29305 width=24)
        ->  Sort  (cost=7521.42..7521.42 rows=29286 width=200)
              ->  Seq Scan on study  (cost=0.00..1116.86 rows=29286
width=200)

-----------------------------



Thats the query with the new restructured database "pacs":

time psql -d pacs -c "SELECT patient.*,study.* FROM patient,study WHERE
(patient.chiliOID=study.patientOID ) order by patient.name using <" >
2tableni
1.730u 0.120s 0:09.44 19.5%     0+0k 0+0io 208pf+0w


psql -d pacs -c "EXPLAIN SELECT patient.*,study.* FROM patient,study
WHERE (patient.chiliOID=study.patientOID ) order by patient.name using
<" > 2tableni       NOTICE:  QUERY PLAN:

Sort  (cost=2194791.19..2194791.19 rows=2555204 width=284)
  ->  Merge Join  (cost=8978.44..9453.57 rows=2555204 width=284)
        ->  Sort  (cost=990.43..990.43 rows=8725 width=72)
              ->  Seq Scan on patient  (cost=0.00..212.25 rows=8725
width=72)
        ->  Sort  (cost=7988.00..7988.00 rows=29286 width=212)
              ->  Seq Scan on study  (cost=0.00..1236.86 rows=29286
width=212)


Restructured tables i.e.
PAtient-study relationship is 1:n realized with column patientoid in
table study.

                    Table "patient"
      Attribute       |          Type          | Modifier
----------------------+------------------------+----------
 chilioid             | character varying(80)  |
 name                 | text                   |
 id                   | character varying(256) |
 birthdate            | date                   |
 birthtime            | time                   |
 sex                  | character(1)           |
 medicalrecordlocator | character varying(128) |



Table "study"
       Attribute        |          Type          | Modifier
------------------------+------------------------+----------
 chilioid               | character varying(80)  |
 instanceuid            | character varying(64)  |
 id                     | character varying(64)  |
 studydate              | date                   |
 studytime              | time                   |
 modality               | character varying(2)   |
 manufacturer           | character varying(128) |
 referingphysician      | text                   |
 description            | character varying(128) |
 manufacturersmodelname | character varying(128) |
 importtime             | double precision       |
 chilisenderid          | character varying(80)  |
 accessionnumber        | character varying(64)  |
 institutionname        | character varying(128) |
 workflowstate          | character varying(8)   |
 flags                  | character varying(8)   |
 performingphysician    | character varying(128) |
 reportingphysician     | character varying(128) |
 patientoid             | character varying(80)  |


The times of the processes are escape-eliminated by statistical methods.

I determined that the "compare" database is 8% times faster than the new
restructured "pacs" database.
How can I understand this? Whats my mistake?

Anybody  who can make some sugestions on the above will
receive my enthusiastic gratitude

David M. Richter
Attachment

Re: Query optimizing - paradox behave

From
Stephan Szabo
Date:
What version are you using? (dbPG95GetIndex?)

On Thu, 19 Jul 2001, David M. Richter wrote:

> Hallo !
> 
> I want to tune a database. There a many redundant datas in the database
> , because of all the relations were consider as n:m relations. But the
> most of them are 1:n Relations. So my approach was to cut the
> redundancies to get more performance. But .. happens!
> 
> The query with the 3 tables is faster than the query with 2 tables. 
> 
> That is paradox to the Explain output.
> And: the real database functions like dbPG95GetIndex and all functions
> defined by me are slower.



Re: Query optimizing - paradox behave

From
"Josh Berkus"
Date:
David,

You will no doubt hear later from the tuning experts on the list.
However, let me save everybody some time by verifying some basics:

1. When you restructured the database, you ran VACUUM ANALYZE on the new
database (pacs)?

2. You said that you "eliminated the indexes" because they weren't
helping performance.  Is this right?  It seems a little hard to figure
from here.

3. General Advice:  If you're concerned about query performance, get rid
of those VARCHAR(80) primary keys and replace them with INT4 or INT8!
The math is easy to do:
    If you're processing INT8 keys for 1,000,000 table rows that's
8,000,000 (roughly 8mb) data on disk and data being processed.
    If you're processing VARCHAR(80) keys for 1,000,000 table rows, thats
82,000,000 bytes (82 mb) on disk and in ram to be processed.
    In theory, you could get a 10-fold increase in JOIN performance by
switching to INT8 keys.  In practice, its probably more like double or
triple but that ain't bad, either.

-Josh Berkus



______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

Attachment

Re: Query optimizing - paradox behave

From
"David M. Richter"
Date:
Hallo Stephan!

Are there several versions of dbPG95GetIndex existing, or did you
mention postgres version 7.1.2?
With a little help I have killed the Problem!! Yeah,
But now I trying to improve the C-code. Do You have any experience with
optimizing C-Code. Are there some new miracle-like function, wich
improves the speed dramatically?
Do You know some database options wich I could change for better
performance?
Thanks a lot for Your suggestions!!

David

>What version are you using? (dbPG95GetIndex?)
Attachment