Tuning/performance question. - Mailing list pgsql-general

From David Griffiths
Subject Tuning/performance question.
Date
Msg-id 01d201c38573$84606930$6501a8c0@griffiths2
Whole thread Raw
In response to Re: populate table with large csv file  ("Dave [Hawk-Systems]" <dave@hawk-systems.com>)
Responses Re: Tuning/performance question.
Re: Tuning/performance question.
List pgsql-general
We are doing some performance testing among various databases (Oracle, MySQL
and Postgres).

One of the queries is showing Postgres lagging quite a bit:

SELECT count(*)
FROM commercial_entity, country, user_account, address_list
LEFT JOIN state_province ON address_list.state_province_id =
state_province.state_province_id
LEFT JOIN contact_info ON address_list.contact_info_id =
contact_info.contact_info_id
WHERE address_list.address_type_id = 101
AND commercial_entity.commercial_entity_id =
address_list.commercial_entity_id
AND address_list.country_id = country.country_id
AND commercial_entity.user_account_id = user_account.user_account_id
AND user_account.user_role_id IN (101, 101);

I ran a "vacuum analyze" after realizing that I had loaded all the data into
the database without redoing the statistics; the query jumped from 19
seconds to 41 seconds _after_ the analyze.

I'd also like to make sure my query is performing correctly - I want all the
count of records where the commercial_entity matches user_account,
address_list, country, and a left-outer-join on address_list-province and
address_list-contact_info.

Finally, I read some posts on the shared_buffers; they stated that the
shared_buffers should be set to 1/4 to 1/5 of total memory available. Is
that correct? I give the MySQL/InnoDB buffers about 70% of the 2 gig on the
machine.


Here's the explain (I'm not too familiar with reading a Postgres
explain...):

----------------------------------------------------------------------------
----------------------------------------------------------------
 Aggregate  (cost=52951.09..52951.09 rows=1 width=116)
   ->  Merge Join  (cost=52941.61..52950.83 rows=105 width=116)
         Merge Cond: ("outer".country_id = "inner".country_id)
         ->  Index Scan using country_pkey on country  (cost=0.00..7.54
rows=231 width=11)
         ->  Sort  (cost=52941.61..52941.88 rows=105 width=105)
               Sort Key: address_list.country_id
               ->  Merge Join  (cost=52729.54..52938.07 rows=105 width=105)
                     Merge Cond: ("outer".commercial_entity_id =
"inner".commercial_entity_id)
                     ->  Sort  (cost=8792.01..8792.52 rows=201 width=36)
                           Sort Key: commercial_entity.commercial_entity_id
                           ->  Nested Loop  (cost=0.00..8784.31 rows=201
width=36)
                                 ->  Index Scan using usr_acc_usr_role_id_i
on user_account  (cost=0.00..2403.08 rows=1401 width=12)
                                       Index Cond: (user_role_id =
101::numeric)
                                 ->  Index Scan using comm_ent_usr_acc_id_i
on commercial_entity  (cost=0.00..4.54 rows=1 width=24)
                                       Index Cond:
(commercial_entity.user_account_id = "outer".user_account_id)
                     ->  Sort  (cost=43937.53..44173.84 rows=94526 width=69)
                           Sort Key: address_list.commercial_entity_id
                           ->  Merge Join  (cost=29019.03..32585.73
rows=94526 width=69)
                                 Merge Cond: ("outer".contact_info_id =
"inner".contact_info_id)
                                 ->  Index Scan using contact_info_pkey on
contact_info  (cost=0.00..3366.76 rows=56435 width=12)
                                 ->  Sort  (cost=29019.03..29255.34
rows=94526 width=57)
                                       Sort Key:
address_list.contact_info_id
                                       ->  Merge Join
(cost=16930.18..18354.55 rows=94526 width=57)
                                             Merge Cond:
("outer".state_province_id = "inner".state_province_id)
                                             ->  Index Scan using
state_province_pkey on state_province  (cost=0.00..3.81 rows=67 width=11)
                                             ->  Sort
(cost=16930.18..17166.50 rows=94526 width=46)
                                                   Sort Key:
address_list.state_province_id
                                                   ->  Seq Scan on
address_list  (cost=0.00..6882.52 rows=94526 width=46)
                                                         Filter:
(address_type_id = 101::numeric)

What's the "Sort (cost...)"?

I noticed that joining the address_list to country was slow; there was no
index on just country_id; there were composite indexes on multiple columns,
so I added one and did a vacuum analyze on the table, and got:

 Aggregate  (cost=54115.74..54115.74 rows=1 width=116)
   ->  Merge Join  (cost=54105.91..54115.46 rows=109 width=116)
         Merge Cond: ("outer".country_id = "inner".country_id)
         ->  Index Scan using country_pkey on country  (cost=0.00..7.54
rows=231 width=11)
         ->  Sort  (cost=54105.91..54106.19 rows=110 width=105)
               Sort Key: address_list.country_id
               ->  Merge Join  (cost=53884.34..54102.18 rows=110 width=105)
                     Merge Cond: ("outer".commercial_entity_id =
"inner".commercial_entity_id)
                     ->  Sort  (cost=8792.01..8792.52 rows=201 width=36)
                           Sort Key: commercial_entity.commercial_entity_id
                           ->  Nested Loop  (cost=0.00..8784.31 rows=201
width=36)
                                 ->  Index Scan using usr_acc_usr_role_id_i
on user_account  (cost=0.00..2403.08 rows=1401 width=12)
                                       Index Cond: (user_role_id =
101::numeric)
                                 ->  Index Scan using comm_ent_usr_acc_id_i
on commercial_entity  (cost=0.00..4.54 rows=1 width=24)
                                       Index Cond:
(commercial_entity.user_account_id = "outer".user_account_id)
                     ->  Sort  (cost=45092.32..45335.37 rows=97221 width=69)
                           Sort Key: address_list.commercial_entity_id
                           ->  Merge Join  (cost=29770.81..33338.09
rows=97221 width=69)
                                 Merge Cond: ("outer".contact_info_id =
"inner".contact_info_id)
                                 ->  Index Scan using contact_info_pkey on
contact_info  (cost=0.00..3366.76 rows=56435 width=12)
                                 ->  Sort  (cost=29770.81..30013.86
rows=97221 width=57)
                                       Sort Key:
address_list.contact_info_id
                                       ->  Merge Join
(cost=17271.79..18731.55 rows=97221 width=57)
                                             Merge Cond:
("outer".state_province_id = "inner".state_province_id)
                                             ->  Index Scan using
state_province_pkey on state_province  (cost=0.00..3.81 rows=67 width=11)
                                             ->  Sort
(cost=17271.79..17514.84 rows=97221 width=46)
                                                   Sort Key:
address_list.state_province_id
                                                   ->  Seq Scan on
address_list  (cost=0.00..6882.52 rows=97221 width=46)
                                                         Filter:
(address_type_id = 101::numeric)

No difference. Note that all the keys that are used in the joins are
numeric(10)'s, so there shouldn't be any cast-issues.

When you create a primary key on a table, is an index created (I seem to
remember a message going by stating that an index would be added).

For comparison, our production Oracle database (running on nearly identical
hardware - the Postgres machine has IDE-RAID-5 and the Oracle machine has
RAID mirroring) takes between 1 and 2 seconds.

I've got one last question, and I really hope responses don't get
sidetracked by it; I see alot of negative comments towards MySQL, many of
them stating that it's a database layer overtop of the file system. Can
someone explain why Postgres is better than MySQL 4.0.14 using InnoDB?
MySQL, on the above query, with one less index (on address_list.country)
takes 0.20 seconds.

David.


pgsql-general by date:

Previous
From: Dennis Gearon
Date:
Subject: Re: State of Beta 2
Next
From: Ron Johnson
Date:
Subject: Re: State of Beta 2