Thread: Query tuning help

Query tuning help

From: CS DBA
Hi all;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.


I've tried splitting it into filtering queries and joining the results, disabling seq scans (details below), upping work_mem, and half a dozen other approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)
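
For reference, the seq-scan experiment was just the stock planner switch, followed by re-running the query:

SET enable_seqscan = off;   -- discourages, but does not outright forbid, sequential scans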


Thanks in advance for any help, tips, etc...

-- 
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company         www.consistentstate.com
---------------------------------------------

Re: Query tuning help

From: Pavel Stehule
Hello

please, send EXPLAIN ANALYZE output instead.
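
For example:

EXPLAIN ANALYZE
SELECT a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
       a.status, a.customer_location_id, a.added_date,
       o.agent_id, o.order_location_id,
       COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM cust_acct a
     JOIN cust_orders o ON a.order_id = o.order_id;

-- unlike plain EXPLAIN, this actually runs the query and reports real
-- per-node row counts and timings next to the planner's estimates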

Regards

Pavel Stehule

2011/10/11 CS DBA <cs_dba@consistentstate.com>:
> Hi all;
>
> I'm trying to tune a difficult query.
>
> I have 2 tables:
> cust_acct (9 million rows)
> cust_orders (200,000 rows)
>
> Here's the query:
>
> SELECT
>     a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
>     a.status, a.customer_location_id, a.added_date,
>     o.agent_id, o.order_location_id,
>     COALESCE(a.customer_location_id, o.order_location_id) AS
> order_location_id
> FROM
>     cust_acct a JOIN
>     cust_orders o
>         ON a.order_id = o.order_id;
>
> I can't get it to run much faster than about 13 seconds; in most cases it's
> more like 30 seconds.
> We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.
>
>
> I've tried splitting it into filtering queries and joining the
> results, disabling seq scans, upping work_mem, and half a dozen other
> approaches.  Here's the explain plan:
>
>  Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
>    Hash Cond: (a.order_id = o.order_id)
>    ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
>    ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
>          ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)
>
> Thanks in advance for any help, tips, etc...
>
> --
> ---------------------------------------------
> Kevin Kempter       -       Consistent State
> A PostgreSQL Professional Services Company
>           www.consistentstate.com
> ---------------------------------------------

Re: Query tuning help

From: Szymon Guz


On 11 October 2011 19:52, CS DBA <cs_dba@consistentstate.com> wrote:
Hi all;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.


I've tried splitting it into filtering queries and joining the results, disabling seq scans, upping work_mem, and half a dozen other approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)


Thanks in advance for any help, tips, etc...

Hi,
two simple questions:

- do you really need to fetch all 9M rows?
- show us the table structure, together with the index definitions
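
For the structure, psql's \d output for each table is enough, e.g.:

\d cust_acct
\d cust_orders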

regards
Szymon


Re: Query tuning help

From: CS DBA
On 10/11/2011 12:02 PM, Pavel Stehule wrote:
Hello

please, send EXPLAIN ANALYZE output instead.

Regards

Pavel Stehule

2011/10/11 CS DBA <cs_dba@consistentstate.com>:
Hi all;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS
order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's
more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.


I've tried splitting it into filtering queries and joining the
results, disabling seq scans, upping work_mem, and half a dozen other
approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)

Thanks in advance for any help, tips, etc...

Explain Analyze:


 Hash Join  (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
   ->  Hash  (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
         ->  Seq Scan on cust_orders o  (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
 Total runtime: 43639.105 ms
(6 rows)

--
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company         www.consistentstate.com
---------------------------------------------


-- 
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company         www.consistentstate.com
---------------------------------------------

Re: Query tuning help

From: CS DBA
On 10/11/2011 12:03 PM, Szymon Guz wrote:


On 11 October 2011 19:52, CS DBA <cs_dba@consistentstate.com> wrote:
Hi all;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is set to 8GB.


I've tried splitting it into filtering queries and joining the results, disabling seq scans, upping work_mem, and half a dozen other approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)


Thanks in advance for any help, tips, etc...

Hi,
two simple questions:

- do you really need to fetch all 9M rows?
Unfortunately, yes.


- show us the table structure, together with the index definitions



cust_acct table

        Column         |            Type             |                       Modifiers                      
-----------------------+-----------------------------+-------------------------------------------------------
 account_id            | bigint                      | not null default nextval('cust_account_id_seq'::regclass)
 customer_id           | character varying(10)       |
 order_id              | integer                     | not null
 primary_contact_id    | bigint                      |
 status                | accounts_status_type        | not null
 customer_location_id  | integer                     |
 added_date            | timestamp with time zone    | not null
Indexes:
    "cust_acct_pkey" PRIMARY KEY, btree (account_id)
    "cust_acct_cust_id_indx" btree (customer_id)
    "cust_acct_order_id_id_indx" btree (order_id)
    "cust_acct_pri_contact_id_indx" btree (primary_contact_id)

cust_orders table


           Column            |            Type             |                       Modifiers
-----------------------------+-----------------------------+-------------------------------------------------------
 order_id                    | integer                     | not null default nextval('order_id_seq'::regclass)
 backorder_tag_id            | character varying(18)       |
 order_location_id           | integer                     | not null
 work_order_name             | character varying(75)       | not null
 status                      | programs_status_type        | not null
 additional_info_tag_shipper | character(16)               | not null
 additional_info_tag_cust    | character(16)               | not null
 additional_info_tag_manuf   | character(16)               | not null
 additional_info_tag_supply  | character(16)               | not null
 acct_active_dt              | timestamp without time zone |
 acct_last_activity_date     | timestamp without time zone |
 acct_backorder_items        | boolean                     | not null default false
 custom_info1                | text                        |
 custom_info2                | text                        |
 custom_info3                | text                        |
 custom_info4                | text                        |
 custom_info5                | text                        |
 custom_info6                | text                        |
 custom_info7                | text                        |
Indexes:
    "cust_orders_pkey" PRIMARY KEY, btree (order_id)
    "cust_orders_order_id_loc_id_key" UNIQUE, btree (order_id, order_location_id)
    "cust_orders_loc_id_idx" btree (order_location_id)

regards
Szymon

-- 
---------------------------------------------
Kevin Kempter       -       Consistent State
A PostgreSQL Professional Services Company         www.consistentstate.com
---------------------------------------------

Re: Query tuning help

From: Pavel Stehule
>  Hash Join  (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
>    Hash Cond: (a.order_id = o.order_id)
>    ->  Seq Scan on cust_acct a  (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
>    ->  Hash  (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
>          ->  Seq Scan on cust_orders o  (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
>  Total runtime: 43639.105 ms
> (6 rows)
>
>

I think this runtime is adequate; producing a 10 million row result is
simply going to be slow.

A few tips:

* Recheck the sequential read speed of the box to confirm it is in the expected range.

* Play with work_mem; it is probably not large enough for the hash to fit in
a single batch. You might cut 10-20 seconds, but watch that the box does not
start swapping. EXPLAIN ANALYZE VERBOSE shows the number of batches; one is
ideal (see the sketch after this list).

* Use a filter if possible.
* Use a LIMIT if possible.
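
A rough sketch of the work_mem experiment (256MB is only a guess; tune it to your box and concurrency):

SET work_mem = '256MB';   -- session-local; memory allowed per sort/hash operation
EXPLAIN ANALYZE VERBOSE
SELECT a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
       a.status, a.customer_location_id, a.added_date,
       o.agent_id, o.order_location_id,
       COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM cust_acct a
     JOIN cust_orders o ON a.order_id = o.order_id;
-- in the Hash node, look for "Batches: 1"; more than one batch means the
-- hash table spilled to disk and work_mem is still too small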

If you really must process all the rows and you need a better response time,
try using a cursor; it is optimized for returning the first rows quickly.
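
A minimal sketch of the cursor approach (cursor name and fetch size are arbitrary):

BEGIN;
DECLARE acct_cur CURSOR FOR
    SELECT a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
           a.status, a.customer_location_id, a.added_date,
           o.agent_id, o.order_location_id,
           COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
    FROM cust_acct a
         JOIN cust_orders o ON a.order_id = o.order_id;

FETCH 1000 FROM acct_cur;   -- the first rows come back almost immediately
-- keep fetching until FETCH returns no rows
CLOSE acct_cur;
COMMIT;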

Regards

Pavel Stehule

> --
> ---------------------------------------------
> Kevin Kempter       -       Consistent State
> A PostgreSQL Professional Services Company
>           www.consistentstate.com
> ---------------------------------------------
>
>
> --
> ---------------------------------------------
> Kevin Kempter       -       Consistent State
> A PostgreSQL Professional Services Company
>           www.consistentstate.com
> ---------------------------------------------