Tuning/performance question. - Mailing list pgsql-general
From | David Griffiths |
---|---|
Subject | Tuning/performance question. |
Date | |
Msg-id | 01d201c38573$84606930$6501a8c0@griffiths2 Whole thread Raw |
In response to | Re: populate table with large csv file ("Dave [Hawk-Systems]" <dave@hawk-systems.com>) |
Responses |
Re: Tuning/performance question.
Re: Tuning/performance question. |
List | pgsql-general |
We are doing some performance testing among various databases (Oracle, MySQL and Postgres). One of the queries is showing Postgres lagging quite a bit: SELECT count(*) FROM commercial_entity, country, user_account, address_list LEFT JOIN state_province ON address_list.state_province_id = state_province.state_province_id LEFT JOIN contact_info ON address_list.contact_info_id = contact_info.contact_info_id WHERE address_list.address_type_id = 101 AND commercial_entity.commercial_entity_id = address_list.commercial_entity_id AND address_list.country_id = country.country_id AND commercial_entity.user_account_id = user_account.user_account_id AND user_account.user_role_id IN (101, 101); I ran a "vacuum analyze" after realizing that I had loaded all the data into the database without redoing the statistics; the query jumped from 19 seconds to 41 seconds _after_ the analyze. I'd also like to make sure my query is performing correctly - I want all the count of records where the commercial_entity matches user_account, address_list, country, and a left-outer-join on address_list-province and address_list-contact_info. Finally, I read some posts on the shared_buffers; they stated that the shared_buffers should be set to 1/4 to 1/5 of total memory available. Is that correct? I give the MySQL/InnoDB buffers about 70% of the 2 gig on the machine. Here's the explain (I'm not too familiar with reading a Postgres explain...): ---------------------------------------------------------------------------- ---------------------------------------------------------------- Aggregate (cost=52951.09..52951.09 rows=1 width=116) -> Merge Join (cost=52941.61..52950.83 rows=105 width=116) Merge Cond: ("outer".country_id = "inner".country_id) -> Index Scan using country_pkey on country (cost=0.00..7.54 rows=231 width=11) -> Sort (cost=52941.61..52941.88 rows=105 width=105) Sort Key: address_list.country_id -> Merge Join (cost=52729.54..52938.07 rows=105 width=105) Merge Cond: ("outer".commercial_entity_id = "inner".commercial_entity_id) -> Sort (cost=8792.01..8792.52 rows=201 width=36) Sort Key: commercial_entity.commercial_entity_id -> Nested Loop (cost=0.00..8784.31 rows=201 width=36) -> Index Scan using usr_acc_usr_role_id_i on user_account (cost=0.00..2403.08 rows=1401 width=12) Index Cond: (user_role_id = 101::numeric) -> Index Scan using comm_ent_usr_acc_id_i on commercial_entity (cost=0.00..4.54 rows=1 width=24) Index Cond: (commercial_entity.user_account_id = "outer".user_account_id) -> Sort (cost=43937.53..44173.84 rows=94526 width=69) Sort Key: address_list.commercial_entity_id -> Merge Join (cost=29019.03..32585.73 rows=94526 width=69) Merge Cond: ("outer".contact_info_id = "inner".contact_info_id) -> Index Scan using contact_info_pkey on contact_info (cost=0.00..3366.76 rows=56435 width=12) -> Sort (cost=29019.03..29255.34 rows=94526 width=57) Sort Key: address_list.contact_info_id -> Merge Join (cost=16930.18..18354.55 rows=94526 width=57) Merge Cond: ("outer".state_province_id = "inner".state_province_id) -> Index Scan using state_province_pkey on state_province (cost=0.00..3.81 rows=67 width=11) -> Sort (cost=16930.18..17166.50 rows=94526 width=46) Sort Key: address_list.state_province_id -> Seq Scan on address_list (cost=0.00..6882.52 rows=94526 width=46) Filter: (address_type_id = 101::numeric) What's the "Sort (cost...)"? I noticed that joining the address_list to country was slow; there was no index on just country_id; there were composite indexes on multiple columns, so I added one and did a vacuum analyze on the table, and got: Aggregate (cost=54115.74..54115.74 rows=1 width=116) -> Merge Join (cost=54105.91..54115.46 rows=109 width=116) Merge Cond: ("outer".country_id = "inner".country_id) -> Index Scan using country_pkey on country (cost=0.00..7.54 rows=231 width=11) -> Sort (cost=54105.91..54106.19 rows=110 width=105) Sort Key: address_list.country_id -> Merge Join (cost=53884.34..54102.18 rows=110 width=105) Merge Cond: ("outer".commercial_entity_id = "inner".commercial_entity_id) -> Sort (cost=8792.01..8792.52 rows=201 width=36) Sort Key: commercial_entity.commercial_entity_id -> Nested Loop (cost=0.00..8784.31 rows=201 width=36) -> Index Scan using usr_acc_usr_role_id_i on user_account (cost=0.00..2403.08 rows=1401 width=12) Index Cond: (user_role_id = 101::numeric) -> Index Scan using comm_ent_usr_acc_id_i on commercial_entity (cost=0.00..4.54 rows=1 width=24) Index Cond: (commercial_entity.user_account_id = "outer".user_account_id) -> Sort (cost=45092.32..45335.37 rows=97221 width=69) Sort Key: address_list.commercial_entity_id -> Merge Join (cost=29770.81..33338.09 rows=97221 width=69) Merge Cond: ("outer".contact_info_id = "inner".contact_info_id) -> Index Scan using contact_info_pkey on contact_info (cost=0.00..3366.76 rows=56435 width=12) -> Sort (cost=29770.81..30013.86 rows=97221 width=57) Sort Key: address_list.contact_info_id -> Merge Join (cost=17271.79..18731.55 rows=97221 width=57) Merge Cond: ("outer".state_province_id = "inner".state_province_id) -> Index Scan using state_province_pkey on state_province (cost=0.00..3.81 rows=67 width=11) -> Sort (cost=17271.79..17514.84 rows=97221 width=46) Sort Key: address_list.state_province_id -> Seq Scan on address_list (cost=0.00..6882.52 rows=97221 width=46) Filter: (address_type_id = 101::numeric) No difference. Note that all the keys that are used in the joins are numeric(10)'s, so there shouldn't be any cast-issues. When you create a primary key on a table, is an index created (I seem to remember a message going by stating that an index would be added). For comparison, our production Oracle database (running on nearly identical hardware - the Postgres machine has IDE-RAID-5 and the Oracle machine has RAID mirroring) takes between 1 and 2 seconds. I've got one last question, and I really hope responses don't get sidetracked by it; I see alot of negative comments towards MySQL, many of them stating that it's a database layer overtop of the file system. Can someone explain why Postgres is better than MySQL 4.0.14 using InnoDB? MySQL, on the above query, with one less index (on address_list.country) takes 0.20 seconds. David.
pgsql-general by date: