ported application having performance issues - Mailing list pgsql-performance

From:      John Mendenhall
Subject:   ported application having performance issues
Msg-id:    20050630222451.GA17628@calvin.surfutopia.net
Responses: Re: ported application having performance issues
           Re: ported application having performance issues
List:      pgsql-performance
pgsql performance gurus,

We ported an application from Oracle to PostgreSQL and are experiencing an approximately 50% performance hit. I am in the process of isolating the problem. I have searched the internet (google) and tried various things; only one thing seems to work. I am trying to find out if our solution is the only option, or if I am doing something terribly wrong.

The original application runs on the following:

hw:
  cpu0: SUNW,UltraSPARC-IIi (upaid 0 impl 0x12 ver 0x12 clock 302 MHz)
  mem = 393216K (0x18000000)
sw:
  Solaris 5.6
  Oracle 7.3.2.2.0
  Apache 1.3.27
  Perl 5.004_04
  mod_perl 1.27
  DBI 1.20
  DBD::Oracle 1.12

The ported application runs on the following:

hw:
  unix: [ID 389951 kern.info] mem = 262144K (0x10000000)
  rootnex: [ID 466748 kern.info] root nexus = Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 360MHz)
sw:
  Solaris 5.9
  PostgreSQL 7.4.6
  Apache 1.3.33
  Perl 5.8.6
  mod_perl 1.29
  DBI 1.46
  DBD::Pg 1.40.1

Based on assistance from another list, we have tried the following:

(1) Upgraded DBD::Pg to current version 1.43
(2) Ensured all tables are analyzed regularly
(3) Set some memory options in postgresql.conf
(4) Located a handful of slow queries by setting log_min_duration_statement to 250

Future options we will consider are:

(1) Attempting other option settings, like random_page_cost
(2) Upgrading the db server to the current version, 8.0.3

With our handful of slow queries, we have done several iterations of changes to determine what will address the issues. We have narrowed it down to the direction of a join and to setting enable_seqscan to off. The table definitions are at the bottom of this e-mail. There is one large table (contacts) and one smaller table (lead_requests). The final SQL is as follows:

SELECT
  c.id AS contact_id,
  lr.id AS lead_request_id
FROM
  lead_requests lr
  JOIN contacts c ON (c.id = lr.contact_id)
WHERE
  c.partner_id IS NULL
ORDER BY
  contact_id

I ran this query against freshly vacuum analyzed tables.
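(For reference, the slow-query logging from step (4) above is a one-line postgresql.conf change; a minimal sketch, with the 250 ms threshold we used:)

```ini
# postgresql.conf -- log any statement that runs longer than 250 ms
# (-1 disables the feature, 0 logs every statement)
log_min_duration_statement = 250
```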
The first run is as follows:

db=> explain analyze SELECT
db->   c.id AS contact_id,
db->   lr.id AS lead_request_id
db-> FROM
db->   lead_requests lr
db->   JOIN contacts c ON (c.id = lr.contact_id)
db-> WHERE
db->   c.partner_id IS NULL
db-> ORDER BY
db->   contact_id
db-> ;
LOG:  duration: 4618.133 ms  statement: explain analyze SELECT c.id AS contact_id, lr.id AS lead_request_id FROM lead_requests lr JOIN contacts c ON (c.id = lr.contact_id) WHERE c.partner_id IS NULL ORDER BY contact_id

                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=4272.84..4520.82 rows=1230 width=21) (actual time=3998.771..4603.739 rows=699 loops=1)
   Merge Cond: ("outer".contact_id = "inner".id)
   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..74.09 rows=1435 width=21) (actual time=0.070..22.431 rows=1430 loops=1)
   ->  Sort  (cost=4272.84..4352.28 rows=31775 width=11) (actual time=3998.554..4130.067 rows=32640 loops=1)
         Sort Key: c.id
         ->  Seq Scan on contacts c  (cost=0.00..1896.77 rows=31775 width=11) (actual time=0.040..326.135 rows=32501 loops=1)
               Filter: (partner_id IS NULL)
 Total runtime: 4611.323 ms
(8 rows)

As you can see, run time is over 4 seconds.

Then, I set enable_seqscan = off.
db=> set enable_seqscan=off;
SET

Then I ran the exact same query:

db=> explain analyze SELECT
db->   c.id AS contact_id,
db->   lr.id AS lead_request_id
db-> FROM
db->   lead_requests lr
db->   JOIN contacts c ON (c.id = lr.contact_id)
db-> WHERE
db->   c.partner_id IS NULL
db-> ORDER BY
db->   contact_id
db-> ;
LOG:  duration: 915.304 ms  statement: explain analyze SELECT c.id AS contact_id, lr.id AS lead_request_id FROM lead_requests lr JOIN contacts c ON (c.id = lr.contact_id) WHERE c.partner_id IS NULL ORDER BY contact_id

                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..4749.84 rows=1230 width=21) (actual time=0.213..901.315 rows=699 loops=1)
   Merge Cond: ("outer".contact_id = "inner".id)
   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..74.09 rows=1435 width=21) (actual time=0.073..21.448 rows=1430 loops=1)
   ->  Index Scan using contacts_pkey on contacts c  (cost=0.00..4581.30 rows=31775 width=11) (actual time=0.038..524.217 rows=32640 loops=1)
         Filter: (partner_id IS NULL)
 Total runtime: 903.638 ms
(6 rows)

Under 1 second. Excellent.

The contacts table has 30000+ records. The lead_requests table has just around 1500 records. I want the query to start the join at the lead_requests table, since its row count is so much smaller.

So, the bottom line is this: in order to get the performance to an acceptable level (I can live with under 1 second, though less time would be better), do I have to set enable_seqscan to off every time I run this query? Is there a better or more acceptable way to handle this?

Thank you very much in advance for any pointers you can provide. And, if this is the wrong forum for this question, please let me know and I'll ask it elsewhere.
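(Editor's note: one way to avoid flipping enable_seqscan for every session is to scope the change to a single transaction, or to nudge the planner's cost model instead. A sketch, assuming PostgreSQL 7.4 syntax; the random_page_cost value shown is illustrative and should be tuned per installation:)

```sql
-- Option 1: disable sequential scans for this one query only.
-- SET LOCAL reverts automatically at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT c.id  AS contact_id,
       lr.id AS lead_request_id
  FROM lead_requests lr
  JOIN contacts c ON (c.id = lr.contact_id)
 WHERE c.partner_id IS NULL
 ORDER BY contact_id;
COMMIT;

-- Option 2: make index scans look cheaper to the planner for the session.
SET random_page_cost = 2;  -- default is 4; lower values favor index scans
```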
JohnM

----- table definitions -----

-----
db=> \d contacts
                       Table "db.contacts"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 id                           | numeric(38,0)               | not null
 db_id                        | character varying(32)       |
 firstname                    | character varying(64)       |
 lastname                     | character varying(64)       |
 company                      | character varying(128)      |
 email                        | character varying(256)      |
 phone                        | character varying(64)       |
 address                      | character varying(128)      |
 city                         | character varying(128)      |
 state                        | character varying(32)       |
 postalcode                   | character varying(16)       |
 country                      | character varying(2)        | not null
 contact_info_modified        | character(1)                |
 token_id                     | numeric(38,0)               |
 status_id                    | numeric(38,0)               |
 status_last_modified         | timestamp without time zone |
 notes                        | character varying(2000)     |
 demo_schedule                | timestamp without time zone |
 partner_id                   | numeric(38,0)               |
 prev_partner_id              | numeric(38,0)               |
 prev_prev_partner_id         | numeric(38,0)               |
 site_last_visited            | timestamp without time zone |
 source_id                    | numeric(4,0)                |
 demo_requested               | timestamp without time zone |
 sourcebook_requested         | timestamp without time zone |
 zip                          | numeric(8,0)                |
 suffix                       | numeric(8,0)                |
 feedback_request_sent        | timestamp without time zone |
 products_sold                | character varying(512)      |
 other_brand                  | character varying(512)      |
 printsample_requested        | timestamp without time zone |
 indoor_media_sample          | timestamp without time zone |
 outdoor_media_sample         | timestamp without time zone |
 printers_owned               | character varying(256)      |
 business_type                | character varying(256)      |
 printers_owned2              | character varying(256)      |
 contact_quality_id           | numeric(38,0)               |
 est_annual_value             | numeric(38,2)               |
 likelyhood_of_closing        | numeric(38,0)               |
 priority                     | numeric(38,0)               |
 business_type_id             | numeric(38,0)               |
 lead_last_modified           | timestamp without time zone |
 lead_value                   | numeric(38,2)               |
 channel_contact_flag         | character(1)                |
 request_status_last_modified | timestamp without time zone |
 master_key_number            | numeric(38,0)               |
 master_key_token             | character varying(32)       |
 current_media_cust           | character(1)                |
 kodak_media_id               | numeric(38,0)               |
 printer_sample_id            | numeric(38,0)               |
 quantity_used_id             | numeric(38,0)               |
 rip_used_id                  | numeric(38,0)               |
 language_code                | character varying(3)        |
 region_id                    | numeric(38,0)               | not null
 lead_deleted                 | timestamp without time zone |
 last_request_set_status_id   | numeric(38,0)               |
 address2                     | character varying(128)      |
 media_usage_id               | numeric(38,0)               |
Indexes:
    "contacts_pkey" primary key, btree (id)
    "contacts_partner_id_idx" btree (partner_id)
    "contacts_partner_id_null_idx" btree (partner_id) WHERE (partner_id IS NULL)
    "contacts_token_id_idx" btree (token_id)
Check constraints:
    "sys_c0050644" CHECK (country IS NOT NULL)
    "sys_c0050643" CHECK (id IS NOT NULL)
    "sys_c0050645" CHECK (region_id IS NOT NULL)
Triggers:
    insert_master_key BEFORE INSERT ON contacts FOR EACH ROW EXECUTE PROCEDURE pg_fct_insert_master_key()
-----

-----
db=> \d lead_requests
                Table "db.lead_requests"
        Column         |            Type             | Modifiers
-----------------------+-----------------------------+-----------
 id                    | numeric(38,0)               | not null
 contact_id            | numeric(38,0)               | not null
 request_id            | numeric(38,0)               | not null
 date_requested        | timestamp without time zone | not null
 must_update_by        | timestamp without time zone |
 date_satisfied        | timestamp without time zone |
 status_id             | numeric(38,0)               |
 request_scheduled     | timestamp without time zone |
 session_log_id        | numeric(38,0)               |
 notes                 | character varying(2000)     |
 status_last_modified  | timestamp without time zone |
 reminder_last_sent    | timestamp without time zone |
 data                  | character varying(2000)     |
 fulfillment_status_id | numeric(38,0)               |
Indexes:
    "lead_requests_pkey" primary key, btree (id)
    "lead_requests_contact_id_idx" btree (contact_id)
    "lead_requests_request_id_idx" btree (request_id)
Check constraints:
    "sys_c0049877" CHECK (request_id IS NOT NULL)
    "sys_c0049876" CHECK (contact_id IS NOT NULL)
    "sys_c0049878" CHECK (date_requested IS NOT NULL)
-----

-- 
John Mendenhall
john@surfutopia.net
surf utopia
internet services
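(Editor's note: the fast plan above still walks all of contacts_pkey and filters out the non-NULL partner_id rows. One untested idea is a partial index on the sort key that matches the query's predicate, so the planner can scan only the qualifying rows already in id order. This index name and approach are hypothetical, not from the original thread:)

```sql
-- Hypothetical partial index: contains only the rows WHERE partner_id IS NULL,
-- ordered by id (the join/sort key), so the Filter step could disappear.
CREATE INDEX contacts_id_partner_null_idx
    ON contacts (id)
    WHERE partner_id IS NULL;
ANALYZE contacts;
```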