Thread: Query tuning help
Hi all;
I'm trying to tune a difficult query.
I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)
Here's the query:
SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a JOIN
    cust_orders o
        ON a.order_id = o.order_id;
I can't get it to run much faster than about 13 seconds; in most cases it's more like 30 seconds.
We have a box with 8 2.5GHz cores and 28GB of RAM; shared_buffers is at 8GB.
I've tried splitting it into separate filtering queries and joining the results, disabling seq scans, upping work_mem, and half a dozen other approaches. Here's the explain plan:
Hash Join (cost=151.05..684860.30 rows=9783130 width=100)
Hash Cond: (a.order_id = o.order_id)
-> Seq Scan on cust_acct a (cost=0.00..537962.30 rows=9783130 width=92)
-> Hash (cost=122.69..122.69 rows=2269 width=12)
-> Seq Scan on cust_orders o (cost=0.00..122.69 rows=2269 width=12)
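For reference, the session-level settings mentioned above look roughly like this (a minimal sketch; the values shown are illustrative, not the ones actually tried):

SET enable_seqscan = off;   -- discourage the planner from sequential scans
SET work_mem = '256MB';     -- give each sort/hash operation more memory
EXPLAIN ANALYZE
SELECT ...;                 -- re-run the query above and compare plans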
Thanks in advance for any help, tips, etc...
--
---------------------------------------------
Kevin Kempter - Consistent State
A PostgreSQL Professional Services Company
www.consistentstate.com
---------------------------------------------
Hello

Please send the EXPLAIN ANALYZE output instead.

Regards

Pavel Stehule
On 11 October 2011 19:52, CS DBA <cs_dba@consistentstate.com> wrote:
Hi,

two simple questions:

- do you really need to get all 9M rows?
- show us the table structure, together with index definitions

regards
Szymon
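For reference, a minimal way to pull the requested definitions from psql (using the table names from this thread):

\d cust_acct
\d cust_orders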
On 10/11/2011 12:02 PM, Pavel Stehule wrote:
> Please send the EXPLAIN ANALYZE output instead.
Explain Analyze:
Hash Join (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
Hash Cond: (a.order_id = o.order_id)
-> Seq Scan on cust_acct a (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
-> Hash (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
-> Seq Scan on cust_orders o (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
Total runtime: 43639.105 ms
(6 rows)
On 10/11/2011 12:03 PM, Szymon Guz wrote:
> - do you really need to get all 9M rows?

unfortunately yes

> - show us the table structure, together with index definitions

cust_acct table
Column | Type | Modifiers
-----------------------+-----------------------------+-------------------------------------------------------
account_id | bigint | not null default nextval('cust_account_id_seq'::regclass)
customer_id | character varying(10) |
order_id | integer | not null
primary_contact_id | bigint |
status | accounts_status_type | not null
customer_location_id | integer |
added_date | timestamp with time zone | not null
Indexes:
"cust_acct_pkey" PRIMARY KEY, btree (account_id)
"cust_acct_cust_id_indx" btree (customer_id)
"cust_acct_order_id_id_indx" btree (order_id)
"cust_acct_pri_contact_id_indx" btree (primary_contact_id)
cust_orders table
Column | Type | Modifiers
-----------------------------+-----------------------------+-------------------------------------------------------
order_id | integer | not null default nextval('order_id_seq'::regclass)
backorder_tag_id | character varying(18) |
order_location_id | integer | not null
work_order_name | character varying(75) | not null
status | programs_status_type | not null
additional_info_tag_shipper | character(16) | not null
additional_info_tag_cust | character(16) | not null
additional_info_tag_manuf | character(16) | not null
additional_info_tag_supply | character(16) | not null
acct_active_dt | timestamp without time zone |
acct_last_activity_date | timestamp without time zone |
acct_backorder_items | boolean | not null default false
custom_info1 | text |
custom_info2 | text |
custom_info3 | text |
custom_info4 | text |
custom_info5 | text |
custom_info6 | text |
custom_info7 | text |
Indexes:
"cust_orders_pkey" PRIMARY KEY, btree (order_id)
"cust_orders_order_id_loc_id_key" UNIQUE, btree (order_id, order_location_id)
"cust_orders_loc_id_idx" btree (order_location_id)
> Hash Join (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
> ...
> Total runtime: 43639.105 ms

I think this time is adequate - processing a result of 10 million rows simply has to be slow.

A few tips:

* recheck the sequential read speed - verify it is in the expected range
* play with work_mem - it is probably not enough to keep the hash to one batch; you can cut maybe 10-20 seconds, but be careful not to push the box into swap - EXPLAIN ANALYZE VERBOSE shows the number of batches, and the ideal is one
* use some filter if it's possible
* use a limit if it's possible

If you really have to process all the rows and you need a better reaction time, try using a cursor. It is optimized for returning the first rows quickly.

Regards

Pavel Stehule
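A minimal sketch of the cursor approach (the cursor name and fetch size are illustrative; the query is the one from this thread):

BEGIN;

DECLARE acct_cur CURSOR FOR
    SELECT
        a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
        a.status, a.customer_location_id, a.added_date,
        o.agent_id, o.order_location_id,
        COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
    FROM cust_acct a
    JOIN cust_orders o ON a.order_id = o.order_id;

-- fetch and process the result in chunks; the first rows come back almost
-- immediately instead of after the whole 10M-row join has been materialized
FETCH 1000 FROM acct_cur;
-- repeat FETCH 1000 FROM acct_cur; until it returns no rows

CLOSE acct_cur;
COMMIT;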