ported application having performance issues - Mailing list pgsql-performance

From John Mendenhall
Subject ported application having performance issues
Date
Msg-id 20050630222451.GA17628@calvin.surfutopia.net
Whole thread Raw
Responses Re: ported application having performance issues  ("Joshua D. Drake" <jd@commandprompt.com>)
Re: ported application having performance issues  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
pgsql performance gurus,

We ported an application from oracle to postgresql.
We are experiencing an approximately 50% performance
hit.  I am in the process of isolating the problem.
I have searched the internet (google) and tried various
things.  Only one thing seems to work.  I am trying to
find out if our solution is the only option, or if I
am doing something terribly wrong.

The original application runs on the following:

hw:
cpu0: SUNW,UltraSPARC-IIi (upaid 0 impl 0x12 ver 0x12 clock 302 MHz)
mem = 393216K (0x18000000)

sw:
Solaris 5.6
Oracle 7.3.2.2.0
Apache 1.3.27
Perl 5.004_04
mod_perl 1.27
DBI 1.20
DBD::Oracle 1.12

The ported application runs on the following:

hw:
unix: [ID 389951 kern.info] mem = 262144K (0x10000000)
rootnex: [ID 466748 kern.info] root nexus = Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 360MHz)

sw:
Solaris 5.9
PostgreSQL 7.4.6
Apache 1.3.33
Perl 5.8.6
mod_perl 1.29
DBI 1.46
DBD::Pg 1.40.1

Based on assistance from another list, we have
tried the following:

(1) Upgraded DBD::Pg to current version 1.43
(2) Ensured all tables are analyzed regularly
(3) Setting some memory options in postgresql.conf
(4) Located a handful of slow queries by setting
    log_min_duration_statement to 250.

Future options we will consider are:

(1) Attempting other option settings, like
    random_page_cost
(2) Upgrading db server to current version 8.0.3

With our handful of slow queries, we have done
several iterations of changes to determine what
will address the issues.

We have broken this down to the direction of a join
and setting the enable_seqscan to off.  The table
definitions are at the bottom of this e-mail.  There
is one large table (contacts) and one smaller table
(lead_requests).  The final SQL is as follows:

SELECT
 c.id AS contact_id,
 lr.id AS lead_request_id
FROM
 lead_requests lr
  JOIN contacts c ON (c.id = lr.contact_id)
WHERE
 c.partner_id IS NULL
ORDER BY
 contact_id

I ran this query against freshly vacuum analyzed tables.

The first run is as follows:

db=> explain analyze      SELECT
db->       c.id AS contact_id,
db->       lr.id AS lead_request_id
db->      FROM
db->       lead_requests lr
db->        JOIN contacts c ON (c.id = lr.contact_id)
db->      WHERE
db->       c.partner_id IS NULL
db->      ORDER BY
db->       contact_id
db-> ;
LOG:  duration: 4618.133 ms  statement: explain analyze      SELECT
      c.id AS contact_id,
      lr.id AS lead_request_id
     FROM
      lead_requests lr
       JOIN contacts c ON (c.id = lr.contact_id)
     WHERE
      c.partner_id IS NULL
     ORDER BY
      contact_id
                                                                          QUERY PLAN
                                      

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=4272.84..4520.82 rows=1230 width=21) (actual time=3998.771..4603.739 rows=699 loops=1)
   Merge Cond: ("outer".contact_id = "inner".id)
   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..74.09 rows=1435 width=21) (actual
time=0.070..22.431rows=1430 loops=1) 
   ->  Sort  (cost=4272.84..4352.28 rows=31775 width=11) (actual time=3998.554..4130.067 rows=32640 loops=1)
         Sort Key: c.id
         ->  Seq Scan on contacts c  (cost=0.00..1896.77 rows=31775 width=11) (actual time=0.040..326.135 rows=32501
loops=1)
               Filter: (partner_id IS NULL)
 Total runtime: 4611.323 ms
(8 rows)

As you can see, run time over 4 seconds.
Then, I set enable_seqscan = off.

db=> set enable_seqscan=off;
SET

Then I ran the exact same query:

db=> explain analyze      SELECT
db->       c.id AS contact_id,
db->       lr.id AS lead_request_id
db->      FROM
db->       lead_requests lr
db->        JOIN contacts c ON (c.id = lr.contact_id)
db->      WHERE
db->       c.partner_id IS NULL
db->      ORDER BY
db->       contact_id
db-> ;
LOG:  duration: 915.304 ms  statement: explain analyze      SELECT
      c.id AS contact_id,
      lr.id AS lead_request_id
     FROM
      lead_requests lr
       JOIN contacts c ON (c.id = lr.contact_id)
     WHERE
      c.partner_id IS NULL
     ORDER BY
      contact_id
                                                                          QUERY PLAN
                                      

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..4749.84 rows=1230 width=21) (actual time=0.213..901.315 rows=699 loops=1)
   Merge Cond: ("outer".contact_id = "inner".id)
   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..74.09 rows=1435 width=21) (actual
time=0.073..21.448rows=1430 loops=1) 
   ->  Index Scan using contacts_pkey on contacts c  (cost=0.00..4581.30 rows=31775 width=11) (actual
time=0.038..524.217rows=32640 loops=1) 
         Filter: (partner_id IS NULL)
 Total runtime: 903.638 ms
(6 rows)

Under 1 second.  Excellent.

The contacts table has 30000+ records.
The lead_requests table has just around 1500 records.
I want the query to start with the join at the lead_requests
table since the number is so much smaller.

So, bottom line is this:
In order to get the performance to an acceptable level
(I can live with under 1 second, though less time would
be better), do I have to set enable_seqscan to off every
time I run this query?  Is there a better or more acceptable
way to handle this?

Thank you very much in advance for any pointers you can
provide.  And, if this is the wrong forum for this question,
please let me know and I'll ask it elsewhere.

JohnM





-----
table definitions
-----

-----
db=> \d contacts
                       Table "db.contacts"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 id                           | numeric(38,0)               | not null
 db_id                        | character varying(32)       |
 firstname                    | character varying(64)       |
 lastname                     | character varying(64)       |
 company                      | character varying(128)      |
 email                        | character varying(256)      |
 phone                        | character varying(64)       |
 address                      | character varying(128)      |
 city                         | character varying(128)      |
 state                        | character varying(32)       |
 postalcode                   | character varying(16)       |
 country                      | character varying(2)        | not null
 contact_info_modified        | character(1)                |
 token_id                     | numeric(38,0)               |
 status_id                    | numeric(38,0)               |
 status_last_modified         | timestamp without time zone |
 notes                        | character varying(2000)     |
 demo_schedule                | timestamp without time zone |
 partner_id                   | numeric(38,0)               |
 prev_partner_id              | numeric(38,0)               |
 prev_prev_partner_id         | numeric(38,0)               |
 site_last_visited            | timestamp without time zone |
 source_id                    | numeric(4,0)                |
 demo_requested               | timestamp without time zone |
 sourcebook_requested         | timestamp without time zone |
 zip                          | numeric(8,0)                |
 suffix                       | numeric(8,0)                |
 feedback_request_sent        | timestamp without time zone |
 products_sold                | character varying(512)      |
 other_brand                  | character varying(512)      |
 printsample_requested        | timestamp without time zone |
 indoor_media_sample          | timestamp without time zone |
 outdoor_media_sample         | timestamp without time zone |
 printers_owned               | character varying(256)      |
 business_type                | character varying(256)      |
 printers_owned2              | character varying(256)      |
 contact_quality_id           | numeric(38,0)               |
 est_annual_value             | numeric(38,2)               |
 likelyhood_of_closing        | numeric(38,0)               |
 priority                     | numeric(38,0)               |
 business_type_id             | numeric(38,0)               |
 lead_last_modified           | timestamp without time zone |
 lead_value                   | numeric(38,2)               |
 channel_contact_flag         | character(1)                |
 request_status_last_modified | timestamp without time zone |
 master_key_number            | numeric(38,0)               |
 master_key_token             | character varying(32)       |
 current_media_cust           | character(1)                |
 kodak_media_id               | numeric(38,0)               |
 printer_sample_id            | numeric(38,0)               |
 quantity_used_id             | numeric(38,0)               |
 rip_used_id                  | numeric(38,0)               |
 language_code                | character varying(3)        |
 region_id                    | numeric(38,0)               | not null
 lead_deleted                 | timestamp without time zone |
 last_request_set_status_id   | numeric(38,0)               |
 address2                     | character varying(128)      |
 media_usage_id               | numeric(38,0)               |
Indexes:
    "contacts_pkey" primary key, btree (id)
    "contacts_partner_id_idx" btree (partner_id)
    "contacts_partner_id_null_idx" btree (partner_id) WHERE (partner_id IS NULL)
    "contacts_token_id_idx" btree (token_id)
Check constraints:
    "sys_c0050644" CHECK (country IS NOT NULL)
    "sys_c0050643" CHECK (id IS NOT NULL)
    "sys_c0050645" CHECK (region_id IS NOT NULL)
Triggers:
    insert_master_key BEFORE INSERT ON contacts FOR EACH ROW EXECUTE PROCEDURE pg_fct_insert_master_key()
-----

-----
db=> \d lead_requests
                 Table "db.lead_requests"
        Column         |            Type             | Modifiers
-----------------------+-----------------------------+-----------
 id                    | numeric(38,0)               | not null
 contact_id            | numeric(38,0)               | not null
 request_id            | numeric(38,0)               | not null
 date_requested        | timestamp without time zone | not null
 must_update_by        | timestamp without time zone |
 date_satisfied        | timestamp without time zone |
 status_id             | numeric(38,0)               |
 request_scheduled     | timestamp without time zone |
 session_log_id        | numeric(38,0)               |
 notes                 | character varying(2000)     |
 status_last_modified  | timestamp without time zone |
 reminder_last_sent    | timestamp without time zone |
 data                  | character varying(2000)     |
 fulfillment_status_id | numeric(38,0)               |
Indexes:
    "lead_requests_pkey" primary key, btree (id)
    "lead_requests_contact_id_idx" btree (contact_id)
    "lead_requests_request_id_idx" btree (request_id)
Check constraints:
    "sys_c0049877" CHECK (request_id IS NOT NULL)
    "sys_c0049876" CHECK (contact_id IS NOT NULL)
    "sys_c0049878" CHECK (date_requested IS NOT NULL)
-----


--
John Mendenhall
john@surfutopia.net
surf utopia
internet services

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: start time very high
Next
From: "Joshua D. Drake"
Date:
Subject: Re: ported application having performance issues