Thread: Query tuning help

Query tuning help

From: CS DBA
Hi all;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.


I've tried splitting it into filtering queries and joining the results, disabling seq scans (details below), upping work_mem, and half a dozen other approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)
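
For reference, the seq-scan experiment was just the stock planner switch, followed by re-running the query:

SET enable_seqscan = off;   -- discourages, but does not outright forbid, sequential scans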


Thanks in advance for any help, tips, etc...

-- 
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company         www.consistentstate.com
---------------------------------------------

Re: Query tuning help

From: Pavel Stehule
Hello

please, send EXPLAIN ANALYZE output instead.
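
For example:

EXPLAIN ANALYZE
SELECT a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
       a.status, a.customer_location_id, a.added_date,
       o.agent_id, o.order_location_id,
       COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM cust_acct a
     JOIN cust_orders o ON a.order_id = o.order_id;

-- unlike plain EXPLAIN, this actually runs the query and reports real
-- per-node row counts and timings next to the planner's estimates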

Regards

Pavel Stehule

2011/10/11 CS DBA <cs_dba@consistentstate.com>:
> Hi all;
>
> I'm trying to tune a difficult query.
>
> I have 2 tables:
> cust_acct (9 million rows)
> cust_orders (200,000 rows)
>
> Here's the query:
>
> SELECT
>     a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
>     a.status, a.customer_location_id, a.added_date,
>     o.agent_id, o.order_location_id,
>     COALESCE(a.customer_location_id, o.order_location_id) AS
> order_location_id
> FROM
>     cust_acct a JOIN
>     cust_orders o
>         ON a.order_id = o.order_id;
>
> I can't get it to run much faster than about 13 seconds; in most cases it's
> more like 30 seconds.
> We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.
>
>
> I've tried splitting it into filtering queries and joining the
> results, disabling seq scans, upping work_mem, and half a dozen other
> approaches.  Here's the explain plan:
>
>  Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
>    Hash Cond: (a.order_id = o.order_id)
>    ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
>    ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
>          ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)
>
> Thanks in advance for any help, tips, etc...
>
> --
> ---------------------------------------------
> Kevin Kempter       -       Consistent State
> A PostgreSQL Professional Services Company
>           www.consistentstate.com
> ---------------------------------------------

Re: Query tuning help

From: Szymon Guz


On 11 October 2011 19:52, CS DBA <cs_dba@consistentstate.com> wrote:
Hi all;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.


I've tried splitting it into filtering queries and joining the results, disabling seq scans, upping work_mem, and half a dozen other approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)


Thanks in advance for any help, tips, etc...

Hi,
two simple questions:

- do you really need to fetch all 9M rows?
- show us the table structure, together with the index definitions
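
For the structure, psql's \d output for each table is enough, e.g.:

\d cust_acct
\d cust_orders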

regards
Szymon


Re: Query tuning help

From: CS DBA
On 10/11/2011 12:02 PM, Pavel Stehule wrote:
Hello

please, send EXPLAIN ANALYZE output instead.

Regards

Pavel Stehule

2011/10/11 CS DBA <cs_dba@consistentstate.com>:
Hi all;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS
order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's
more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.


I've tried splitting it into filtering queries and joining the
results, disabling seq scans, upping work_mem, and half a dozen other
approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)

Thanks in advance for any help, tips, etc...

Explain Analyze:


 Hash Join  (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
   ->  Hash  (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
         ->  Seq Scan on cust_orders o  (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
 Total runtime: 43639.105 ms
(6 rows)

--
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company         www.consistentstate.com
---------------------------------------------


-- 
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company         www.consistentstate.com
---------------------------------------------

Re: Query tuning help

From: CS DBA
On 10/11/2011 12:03 PM, Szymon Guz wrote:


On 11 October 2011 19:52, CS DBA <cs_dba@consistentstate.com> wrote:
Hi all;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.


I've tried splitting it into filtering queries and joining the results, disabling seq scans, upping work_mem, and half a dozen other approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)


Thanks in advance for any help, tips, etc...

Hi,
two simple questions:

- do you really need to fetch all 9M rows?
Unfortunately, yes.


- show us the table structure, together with the index definitions



cust_acct table

        Column         |            Type             |                       Modifiers                      
-----------------------+-----------------------------+-------------------------------------------------------
 account_id            | bigint                      | not null default nextval('cust_account_id_seq'::regclass)
 customer_id           | character varying(10)       |
 order_id              | integer                     | not null
 primary_contact_id    | bigint                      |
 status                | accounts_status_type        | not null
 customer_location_id  | integer                     |
 added_date            | timestamp with time zone    | not null
Indexes:
    "cust_acct_pkey" PRIMARY KEY, btree (account_id)
    "cust_acct_cust_id_indx" btree (customer_id)
    "cust_acct_order_id_id_indx" btree (order_id)
    "cust_acct_pri_contact_id_indx" btree (primary_contact_id)

cust_orders table


           Column            |            Type             |                       Modifiers
-----------------------------+-----------------------------+-------------------------------------------------------
 order_id                    | integer                     | not null default nextval('order_id_seq'::regclass)
 backorder_tag_id            | character varying(18)       |
 order_location_id           | integer                     | not null
 work_order_name             | character varying(75)       | not null
 status                      | programs_status_type        | not null
 additional_info_tag_shipper | character(16)               | not null
 additional_info_tag_cust    | character(16)               | not null
 additional_info_tag_manuf   | character(16)               | not null
 additional_info_tag_supply  | character(16)               | not null
 acct_active_dt              | timestamp without time zone |
 acct_last_activity_date     | timestamp without time zone |
 acct_backorder_items        | boolean                     | not null default false
 custom_info1                | text                        |
 custom_info2                | text                        |
 custom_info3                | text                        |
 custom_info4                | text                        |
 custom_info5                | text                        |
 custom_info6                | text                        |
 custom_info7                | text                        |
Indexes:
    "cust_orders_pkey" PRIMARY KEY, btree (order_id)
    "cust_orders_order_id_loc_id_key" UNIQUE, btree (order_id, order_location_id)
    "cust_orders_loc_id_idx" btree (order_location_id)

regards
Szymon

-- 
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company         www.consistentstate.com
---------------------------------------------

Re: Query tuning help

From: Pavel Stehule
>  Hash Join  (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
>    Hash Cond: (a.order_id = o.order_id)
>    ->  Seq Scan on cust_acct a  (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
>    ->  Hash  (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
>          ->  Seq Scan on cust_orders o  (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
>  Total runtime: 43639.105 ms
> (6 rows)
>
>

I think this runtime is adequate; producing a 10 million row result is
simply going to be slow.

A few tips:

* Recheck the sequential read speed of the box to confirm it is in the expected range.

* Play with work_mem; it is probably not large enough for the hash to fit in
a single batch. You might cut 10-20 seconds, but watch that the box does not
start swapping. EXPLAIN ANALYZE VERBOSE shows the number of batches; one is
ideal (see the sketch after this list).

* Use a filter if possible.
* Use a LIMIT if possible.
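
A rough sketch of the work_mem experiment (256MB is only a guess; tune it to your box and concurrency):

SET work_mem = '256MB';   -- session-local; memory allowed per sort/hash operation
EXPLAIN ANALYZE VERBOSE
SELECT a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
       a.status, a.customer_location_id, a.added_date,
       o.agent_id, o.order_location_id,
       COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM cust_acct a
     JOIN cust_orders o ON a.order_id = o.order_id;
-- in the Hash node, look for "Batches: 1"; more than one batch means the
-- hash table spilled to disk and work_mem is still too small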

If you really must process all the rows and you need a better response time,
try using a cursor; it is optimized for returning the first rows quickly.
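
A minimal sketch of the cursor approach (cursor name and fetch size are arbitrary):

BEGIN;
DECLARE acct_cur CURSOR FOR
    SELECT a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
           a.status, a.customer_location_id, a.added_date,
           o.agent_id, o.order_location_id,
           COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
    FROM cust_acct a
         JOIN cust_orders o ON a.order_id = o.order_id;

FETCH 1000 FROM acct_cur;   -- the first rows come back almost immediately
-- keep fetching until FETCH returns no rows
CLOSE acct_cur;
COMMIT;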

Regards

Pavel Stehule

> --
> ---------------------------------------------
> Kevin Kempter       -       Consistent State
> A PostgreSQL Professional Services Company
>           www.consistentstate.com
> ---------------------------------------------
>
>
> --
> ---------------------------------------------
> Kevin Kempter       -       Consistent State
> A PostgreSQL Professional Services Company
>           www.consistentstate.com
> ---------------------------------------------