Thread: CPU Intensive query
I have an interesting problem. I have the following query that ran ok on Monday and Tuesday and it has been running ok since I have been at this job. I have seen it to be IO intensive, but since Wednesday it has become CPU intensive. Database wise fresh data has been put into the tables, vacuumed & analyzed, no other parameter has been modified.
Wednesday it ran over 24 hours and it did not finish and all this time it pegged a CPU between 95-99%. Yesterday the same story. I do not understand what could have caused it to behave like this suddenly. I am hoping somebody can point me to do research in the right direction.
The query is as follows and it's explain plan is also attached:
set enable_nestloop = off;
INSERT INTO linkshare.macys_ls_daily_shipped
SELECT
ddw.intr_xref,
cdm.cdm_get_linkshare_id_safe(ddw.intr_xref, 10),
to_char(cdm.cdm_utc_convert(cdm.cdm_get_linkshare_timestamp(ddw.intr_xref, 10), -5), 'YYYY-MM-DD/HH24:MI:SS'),
to_char(cdm.cdm_utc_convert(to_char(sales.order_date, 'YYYY-MM-DD HH24:MI:SS')::timestamp without time zone, -5), 'YYYY-MM-DD/HH24:MI:SS') ,
ddw.item_upc,
sum(abs(ddw.itm_qty)),
sum((ddw.tran_itm_total * 100::numeric)::integer),
'USD', '', '', '',
ddw.item_desc
FROM
cdm.cdm_ddw_tran_item_grouped ddw
JOIN
cdm.cdm_sitesales sales ON ddw.intr_xref::text = sales.order_number::text
WHERE
ddw.cal_date > (CURRENT_DATE - 7) AND ddw.cal_date < CURRENT_DATE
AND
ddw.intr_xref IS NOT NULL
AND trim(cdm.cdm_get_linkshare_id_safe(ddw.intr_xref, 10)) <> ''
AND cdm.cdm_utc_convert(cdm.cdm_get_linkshare_timestamp(ddw.intr_xref, 10), -5)::text::date >= (CURRENT_DATE - 52)
AND sales.order_date >= (CURRENT_DATE - 52)
AND (tran_typ_id = 'S'::bpchar)
AND btrim(item_group::text) <> 'EGC'::text
AND btrim(item_group::text) <> 'VGC'::text
GROUP BY
ddw.intr_xref,
cdm.cdm_get_linkshare_id_safe(ddw.intr_xref, 10),
to_char(cdm.cdm_utc_convert(cdm.cdm_get_linkshare_timestamp(ddw.intr_xref, 10), -5), 'YYYY-MM-DD/HH24:MI:SS'),
to_char(cdm.cdm_utc_convert(to_char(sales.order_date, 'YYYY-MM-DD HH24:MI:SS')::timestamp without time zone, -5), 'YYYY-MM-DD/HH24:MI:SS'),
ddw.item_upc,
8, 9, 10, 11,
ddw.item_desc;
INSERT INTO linkshare.macys_ls_daily_shipped
SELECT
ddw.intr_xref,
cdm.cdm_get_linkshare_id_safe(ddw.intr_xref, 10),
to_char(cdm.cdm_utc_convert(cdm.cdm_get_linkshare_timestamp(ddw.intr_xref, 10), -5), 'YYYY-MM-DD/HH24:MI:SS'),
to_char(cdm.cdm_utc_convert(to_char(sales.order_date, 'YYYY-MM-DD HH24:MI:SS')::timestamp without time zone, -5), 'YYYY-MM-DD/HH24:MI:SS') ,
ddw.item_upc,
sum(abs(ddw.itm_qty)),
sum((ddw.tran_itm_total * 100::numeric)::integer),
'USD', '', '', '',
ddw.item_desc
FROM
cdm.cdm_ddw_tran_item_grouped ddw
JOIN
cdm.cdm_sitesales sales ON ddw.intr_xref::text = sales.order_number::text
WHERE
ddw.cal_date > (CURRENT_DATE - 7) AND ddw.cal_date < CURRENT_DATE
AND
ddw.intr_xref IS NOT NULL
AND trim(cdm.cdm_get_linkshare_id_safe(ddw.intr_xref, 10)) <> ''
AND cdm.cdm_utc_convert(cdm.cdm_get_linkshare_timestamp(ddw.intr_xref, 10), -5)::text::date >= (CURRENT_DATE - 52)
AND sales.order_date >= (CURRENT_DATE - 52)
AND (tran_typ_id = 'S'::bpchar)
AND btrim(item_group::text) <> 'EGC'::text
AND btrim(item_group::text) <> 'VGC'::text
GROUP BY
ddw.intr_xref,
cdm.cdm_get_linkshare_id_safe(ddw.intr_xref, 10),
to_char(cdm.cdm_utc_convert(cdm.cdm_get_linkshare_timestamp(ddw.intr_xref, 10), -5), 'YYYY-MM-DD/HH24:MI:SS'),
to_char(cdm.cdm_utc_convert(to_char(sales.order_date, 'YYYY-MM-DD HH24:MI:SS')::timestamp without time zone, -5), 'YYYY-MM-DD/HH24:MI:SS'),
ddw.item_upc,
8, 9, 10, 11,
ddw.item_desc;
HashAggregate (cost=152555.97..152567.32 rows=267 width=162)
-> Hash Join (cost=139308.18..152547.96 rows=267 width=162)
Hash Cond: (("outer".intr_xref)::text = ("inner".order_number)::text)
-> GroupAggregate (cost=106793.14..109222.13 rows=4319 width=189)
-> Sort (cost=106793.14..106901.09 rows=43182 width=189)
Sort Key: cdm_ddw_tran_item.appl_xref, cdm_ddw_tran_item.intr_xref, cdm_ddw_tran_item.tran_typ_id, cdm_ddw_tran_item.cal_date, cdm_ddw_tran_item.cal_time, cdm_ddw_tran_item.tran_itm_total, cdm_ddw_tran_item.tran_tot_amt, cdm_ddw_tran_item.fill_store_div, cdm_ddw_tran_item.itm_price, cdm_ddw_tran_item.item_id, cdm_ddw_tran_item.item_upc, cdm_ddw_tran_item.item_pid, cdm_ddw_tran_item.item_desc, cdm_ddw_tran_item.nrf_color_name, cdm_ddw_tran_item.nrf_size_name, cdm_ddw_tran_item.dept_id, c
-> Index Scan using cdm_ddw_tranp_item_cal_date on cdm_ddw_tran_item (cost=0.01..103468.52 rows=43182 width=189)
Index Cond: ((cal_date > (('now'::text)::date - 7)) AND (cal_date < ('now'::text)::date))
Filter: ((intr_xref IS NOT NULL) AND (btrim(cdm.cdm_get_linkshare_id_safe(intr_xref, 10)) <> ''::text) AND (((cdm.cdm_utc_convert(cdm.cdm_get_linkshare_timestamp(intr_xref, 10), -5))::text)::date >= (('now'::text)::date - 52)) AND (tran_typ_id = 'S'::bpchar) AND (btrim((item_group)::text) <> 'EGC'::text) AND (btrim((item_group)::text) <> 'VGC'::text))
-> Hash (cost=31409.92..31409.92 rows=442050 width=20)
-> Index Scan using cdm_sitesales_order_date on cdm_sitesales sales (cost=0.00..31409.92 rows=442050 width=20)
Index Cond: (order_date >= (('now'::text)::date - 52))
-> Hash Join (cost=139308.18..152547.96 rows=267 width=162)
Hash Cond: (("outer".intr_xref)::text = ("inner".order_number)::text)
-> GroupAggregate (cost=106793.14..109222.13 rows=4319 width=189)
-> Sort (cost=106793.14..106901.09 rows=43182 width=189)
Sort Key: cdm_ddw_tran_item.appl_xref, cdm_ddw_tran_item.intr_xref, cdm_ddw_tran_item.tran_typ_id, cdm_ddw_tran_item.cal_date, cdm_ddw_tran_item.cal_time, cdm_ddw_tran_item.tran_itm_total, cdm_ddw_tran_item.tran_tot_amt, cdm_ddw_tran_item.fill_store_div, cdm_ddw_tran_item.itm_price, cdm_ddw_tran_item.item_id, cdm_ddw_tran_item.item_upc, cdm_ddw_tran_item.item_pid, cdm_ddw_tran_item.item_desc, cdm_ddw_tran_item.nrf_color_name, cdm_ddw_tran_item.nrf_size_name, cdm_ddw_tran_item.dept_id, c
-> Index Scan using cdm_ddw_tranp_item_cal_date on cdm_ddw_tran_item (cost=0.01..103468.52 rows=43182 width=189)
Index Cond: ((cal_date > (('now'::text)::date - 7)) AND (cal_date < ('now'::text)::date))
Filter: ((intr_xref IS NOT NULL) AND (btrim(cdm.cdm_get_linkshare_id_safe(intr_xref, 10)) <> ''::text) AND (((cdm.cdm_utc_convert(cdm.cdm_get_linkshare_timestamp(intr_xref, 10), -5))::text)::date >= (('now'::text)::date - 52)) AND (tran_typ_id = 'S'::bpchar) AND (btrim((item_group)::text) <> 'EGC'::text) AND (btrim((item_group)::text) <> 'VGC'::text))
-> Hash (cost=31409.92..31409.92 rows=442050 width=20)
-> Index Scan using cdm_sitesales_order_date on cdm_sitesales sales (cost=0.00..31409.92 rows=442050 width=20)
Index Cond: (order_date >= (('now'::text)::date - 52))
Bored stiff? Loosen up...
Download and play hundreds of games for free on Yahoo! Games.
On Fri, May 18, 2007 at 09:02:52AM -0700, Abu Mushayeed wrote: > I have an interesting problem. I have the following query that ran ok on > Monday and Tuesday and it has been running ok since I have been at this > job. I have seen it to be IO intensive, but since Wednesday it has become > CPU intensive. Database wise fresh data has been put into the tables, > vacuumed & analyzed, no other parameter has been modified. What Postgres version is this? > The query is as follows and it's explain plan is also attached: Normally EXPLAIN ANALYZE data would be much better than EXPLAIN, but if the query indeed does not finish, it's not going to help much. > set enable_nestloop = off; What's the rationale for this? > HashAggregate (cost=152555.97..152567.32 rows=267 width=162) 152000 disk page fetches is a bit, but it shouldn't take 24 hours. There's probably misestimation involved at some point here. Does it really return 267 rows, or many more? /* Steinar */ -- Homepage: http://www.sesse.net/
What Postgres version is this?
8.1.3
> set enable_nestloop = off;
What's the rationale for this?
To eliminate nested loop. It does a nested loop betwwen to very large table(millions of rows).
> HashAggregate (cost=152555.97..152567.32 rows=267 width=162)
152000 disk page fetches is a bit, but it shouldn't take 24 hours. There's
probably misestimation involved at some point here. Does it really return 267
rows, or many more?
152000 disk page fetches is a bit, but it shouldn't take 24 hours. There's
probably misestimation involved at some point here. Does it really return 267
rows, or many more?
It returns finally about 19-20 thousand rows.
"Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:
"Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:
On Fri, May 18, 2007 at 09:02:52AM -0700, Abu Mushayeed wrote:
> I have an interesting problem. I have the following query that ran ok on
> Monday and Tuesday and it has been running ok since I have been at this
> job. I have seen it to be IO intensive, but since Wednesday it has become
> CPU intensive. Database wise fresh data has been put into the tables,
> vacuumed & analyzed, no other parameter has been modified.
What Postgres version is this?
> The query is as follows and it's explain plan is also attached:
Normally EXPLAIN ANALYZE data would be much better than EXPLAIN, but if the
query indeed does not finish, it's not going to help much.
> set enable_nestloop = off;
What's the rationale for this?
> HashAggregate (cost=152555.97..152567.32 rows=267 width=162)
152000 disk page fetches is a bit, but it shouldn't take 24 hours. There's
probably misestimation involved at some point here. Does it really return 267
rows, or many more?
/* Steinar */
--
Homepage: http://www.sesse.net/
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
8:00? 8:25? 8:40? Find a flick in no time
with theYahoo! Search movie showtime shortcut.
Abu Mushayeed <abumushayeed@yahoo.com> writes: > The query is as follows and it's explain plan is also attached: The query itself seems to be a simple join over not too many rows, so I don't see how it could be taking 24 hours. What I suspect is you're incurring lots and lots of invocations of those user-written functions and one of them has suddenly decided to get very slow. regards, tom lane
> one of them has suddenly decided to get very slow
Is there a way to predict when the system will do this? Also, why would it suddenly go from IO intensive to CPU intensive.
Also, this query ran today and it already finished. Today it was IO intensive.
Please provide me some direction to this problem.
Thanks.
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Abu Mushayeed writes:
> The query is as follows and it's explain plan is also attached:
The query itself seems to be a simple join over not too many rows, so
I don't see how it could be taking 24 hours. What I suspect is you're
incurring lots and lots of invocations of those user-written functions
and one of them has suddenly decided to get very slow.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Now that's room service! Choose from over 150,000 hotels
in 45,000 destinations on Yahoo! Travel to find your fit.
On Fri, May 18, 2007 at 02:37:27PM -0700, Abu Mushayeed wrote: >>> set enable_nestloop = off; >> What's the rationale for this? > To eliminate nested loop. It does a nested loop betwwen to very large > table(millions of rows). If the planner chooses a nested loop, it is because it believes it is the most efficient solution. I'd turn it back on and try to figure out why the planner was wrong. Note that a nested loop with an index scan on one or both sides can easily be as efficient as anything. Did you ANALYZE your tables recently? If the joins are really between millions of rows and the planner thinks it's a couple thousands, the stats sound rather off... >>> HashAggregate (cost=152555.97..152567.32 rows=267 width=162) >> 152000 disk page fetches is a bit, but it shouldn't take 24 hours. There's >> probably misestimation involved at some point here. Does it really return 267 >> rows, or many more? > It returns finally about 19-20 thousand rows. So the planner is off by a factor of at least a hundred. That's a good first-level explanation for why it's slow, at least... If you can, please provide EXPLAIN ANALYZE output for your query (after running ANALYZE on all your tables, if you haven't already); even though it will take some time, it usually makes this kind of performance debugging much easier. /* Steinar */ -- Homepage: http://www.sesse.net/
On Sat, May 19, 2007 at 12:32:33AM +0200, Steinar H. Gunderson wrote: > Did you ANALYZE your tables recently? If the joins are really between > millions of rows and the planner thinks it's a couple thousands, the stats > sound rather off... Sorry, I forgot your first e-mail where you said you had both vacuumed and analyzed recently. The estimates are still off, though -- the WHERE query might be difficult to estimate properly. (I'm not sure how Tom arrived on his conclusion of expensive user-defined functions, but given the usual precisions of his guesses, I'd check that too...) /* Steinar */ -- Homepage: http://www.sesse.net/
On Fri, May 18, 2007 at 03:26:08PM -0700, Abu Mushayeed wrote: > Also, this query ran today and it already finished. Today it was > IO intensive. Are you entirely sure that it's not a coincidence, and something _else_ in the system is causing the CPU issues? A -- Andrew Sullivan | ajs@crankycanuck.ca The plural of anecdote is not data. --Roger Brinner