Thread: horrendous query challenge :-)
Hello,

I'm posting this sort of as a learning exercise, to see how others approach a slow query and what steps they take to optimize it. I'm aware that this query is pretty hideous, but unfortunately it's all too typical of the kind of thing our app is asked to do. It's a join of five tables (rather, four tables and a view). There are as yet no indexes defined -- that's the part of the process I most want to learn: where to apply indexes most effectively. The following query takes several minutes to run. current_status_test is a view; its definition is at the end of this email.

My coworker authored this query. He's a SQL novice, and I'm somewhere in the novice-intermediate range, so when his queries run slow, it falls to me to try to optimize them. Unfortunately, I'm not very good at this particular activity. :-)

What steps would you all take to try to speed up this query? I'm interested to see people's various approaches. I'll try them all out and report execution times before and after.

Thanks for your time!
Thanks,
Fran

monitor=# explain select cs.entity_id, e.type, e.name, w.interface_label,
              cs.remote_ts, s.name, r.name
          from current_status_test cs, entity e, site s, region r, watch w
          where cs.entity_id in (select entity_id from entity where active=true)
            and cs.current = false
            and cs.msgtype = w.msg_type
            and e.entity_id = cs.entity_id
            and s.region_id = r.region_id
            and cs.site_id = s.site_id
          order by cs.remote_ts desc, r.name, s.name;
NOTICE:  QUERY PLAN:

Sort  (cost=100018792.67..100018792.67 rows=1 width=108)
  ->  Merge Join  (cost=100018792.56..100018792.66 rows=1 width=108)
        ->  Sort  (cost=100018791.39..100018791.39 rows=1 width=92)
              ->  Nested Loop  (cost=100018729.68..100018791.38 rows=1 width=92)
                    ->  Hash Join  (cost=18729.68..18783.58 rows=1 width=72)
                          ->  Hash Join  (cost=18728.30..18780.56 rows=28 width=56)
                                ->  Seq Scan on entity e  (cost=0.00..16.74 rows=874 width=28)
                                ->  Hash  (cost=18728.19..18728.19 rows=45 width=28)
                                      ->  Hash Join  (cost=3.58..18728.19 rows=45 width=28)
                                            ->  Seq Scan on entity_watch_map e  (cost=0.00..18720.79 rows=89 width=16)
                                                  SubPlan
                                                    ->  Seq Scan on entity  (cost=0.00..18.93 rows=451 width=4)
                                            ->  Hash  (cost=3.53..3.53 rows=22 width=12)
                                                  ->  Hash Join  (cost=1.11..3.53 rows=22 width=12)
                                                        ->  Seq Scan on classifications class  (cost=0.00..1.26 rows=26 width=8)
                                                        ->  Hash  (cost=1.09..1.09 rows=9 width=4)
                                                              ->  Seq Scan on categories cat  (cost=0.00..1.09 rows=9 width=4)
                          ->  Hash  (cost=1.30..1.30 rows=30 width=16)
                                ->  Seq Scan on watch w  (cost=0.00..1.30 rows=30 width=16)
                    ->  Seq Scan on site s  (cost=0.00..5.52 rows=152 width=20)
        ->  Sort  (cost=1.17..1.17 rows=7 width=16)
              ->  Seq Scan on region r  (cost=0.00..1.07 rows=7 width=16)

EXPLAIN
monitor=#

current_status_test definition:

View definition: SELECT findsite(e.entity_id) AS site_id, e.entity_id,
    get_status(e.entity_id, e.watch_id) AS status, e.watch_id,
    e.msg_type AS msgtype, cat.name AS msgcat, 'Ok' AS message,
    now() AS local_ts, e.last_contact AS remote_ts, e.current
    FROM entity_watch_map e, classifications class, categories cat
    WHERE (((e.last_contact > "timestamp"((date(now()) - 180)))
        AND (e.msg_type = class.msg_type))
        AND (class.category_id = cat.category_id));
Fran Fabrizio <ffabrizio@mmrd.com> writes:
> I'm posting this sort of as a learning exercise to see how others
> approach a slow query and what steps they take to optimize it.

You haven't really given us enough information. For starters, do the
row count estimates shown in EXPLAIN have any relationship to reality?
(Posting EXPLAIN ANALYZE results, if you are on 7.2, would help answer
that question.) Possibly even more relevant is what the functions used
in the view definition do --- it's not unlikely that the function
evaluations are where much of the time goes.

> monitor=# explain select
> cs.entity_id,e.type,e.name,w.interface_label,cs.remote_ts,s.name,r.name
> from current_status_test cs, entity e, site s, region r,watch w where
> cs.entity_id in (select entity_id from entity where active=true) and
> cs.current = false and cs.msgtype=w.msg_type and e.entity_id =
> cs.entity_id and s.region_id = r.region_id and cs.site_id = s.site_id
> order by cs.remote_ts desc, r.name, s.name;

IN is almost always bad news. Is entity.entity_id a unique key? If so
I'd think you could rewrite this into a join ... but wait, you're
*already* joining to entity. Isn't the
    cs.entity_id in (select entity_id from entity where active=true)
clause replaceable by just
    e.active = true
given that you have e.entity_id = cs.entity_id in there already?

> View definition: SELECT findsite(e.entity_id) AS site_id, e.entity_id,
> get_status(e.entity_id, e.watch_id) AS status, e.watch_id, e.msg_type AS
> msgtype, cat.name AS msgcat, 'Ok' AS message, now() AS local_ts,
> e.last_contact AS remote_ts, e.current FROM entity_watch_map e,
> classifications class, categories cat WHERE (((e.last_contact >
> "timestamp"((date(now()) - 180))) AND (e.msg_type = class.msg_type)) AND
> (class.category_id = cat.category_id));

The
    e.last_contact > "timestamp"((date(now()) - 180))
clause will not be indexable without some rework (at least not in PG 7.2
and earlier --- this will be a nonissue once 7.3 comes out). I am not
sure how much that matters; the clause may not be selective enough to
justify trying to indexscan on last_contact anyhow. Again it's tough to
say much in advance of seeing EXPLAIN ANALYZE results.

			regards, tom lane
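[Editor's sketch: the 7.2-era rework Tom alludes to would look something like the following. The index can only be considered when the comparison value is a plain constant at plan time, so the cutoff is computed by the application and interpolated as a literal. Table and column names are taken from the thread; the index name and the cutoff date are invented for illustration.]

```sql
-- Sketch only: an index on last_contact, plus a literal cutoff so the
-- pre-7.3 planner can consider an index scan. The date is illustrative;
-- the application would compute "now minus 180 days" itself.
CREATE INDEX ewm_last_contact_idx ON entity_watch_map (last_contact);

-- instead of: WHERE e.last_contact > "timestamp"((date(now()) - 180))
SELECT count(*)
  FROM entity_watch_map e
 WHERE e.last_contact > '2001-12-01 00:00:00';  -- literal computed by the app
```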
On Wed, 29 May 2002, Fran Fabrizio wrote:

> cs.entity_id,e.type,e.name,w.interface_label,cs.remote_ts,s.name,r.name
> from current_status_test cs, entity e, site s, region r,watch w where
> cs.entity_id in (select entity_id from entity where active=true) and
> cs.current = false and cs.msgtype=w.msg_type and e.entity_id =
> cs.entity_id and s.region_id = r.region_id and cs.site_id = s.site_id
> order by cs.remote_ts desc, r.name, s.name;

Um... eww? Let's make that a *little* more readable, shall we?

SELECT cs.entity_id, e.type, e.name, w.interface_label,
       cs.remote_ts, s.name, r.name
  FROM current_status_test cs, entity e, site s, region r, watch w
 WHERE cs.entity_id in (select entity_id from entity where active=true)
   AND cs.current = false
   AND cs.msgtype = w.msg_type
   AND e.entity_id = cs.entity_id
   AND s.region_id = r.region_id
   AND cs.site_id = s.site_id
 ORDER BY cs.remote_ts desc, r.name, s.name;

Ok, now that we have that taken care of, my first suggestion is to ditch
the "IN" clause. Postgres is notorious for not doing those very well.
Looking at the query, you don't need it; it can be part of the WHERE
clause directly. You can replace it with "e.active = true" -- since
you're already including the entity table, why not use it? Now you have
this:

SELECT cs.entity_id, e.type, e.name, w.interface_label,
       cs.remote_ts, s.name, r.name
  FROM current_status_test cs, entity e, site s, region r, watch w
 WHERE cs.current = false
   AND cs.msgtype = w.msg_type
   AND cs.entity_id = e.entity_id
   AND e.active = true
   AND cs.site_id = s.site_id
   AND s.region_id = r.region_id
 ORDER BY cs.remote_ts desc, r.name, s.name;

Not much different, but I bet it'll work better.

As far as indexes go, I'd follow the obvious route and index all of your
"_id" columns, if they're not already. I'd suggest putting one on
msg_type too, as it also appears in your view. But wait, you have a view
in there... let's rewrite that too. Since you're using AND throughout,
all of the WHERE parentheses can go bye-bye. You also don't need
date(now()), as now() is already a date.

SELECT findsite(e.entity_id) AS site_id, e.entity_id,
       get_status(e.entity_id, e.watch_id) AS status, e.watch_id,
       e.msg_type AS msgtype, cat.name AS msgcat, 'Ok' AS message,
       now() AS local_ts, e.last_contact AS remote_ts, e.current
  FROM entity_watch_map e, classifications class, categories cat
 WHERE e.last_contact > "timestamp"(now() - 180)
   AND e.msg_type = class.msg_type
   AND class.category_id = cat.category_id;

Yow. That's some view. You're essentially joining 7 tables overall, not
5. That can't be pretty. ^_^ Anyway, I'd index msg_type and category_id
from this select. Not much you can do with anything else.

But also consider that you're simply adding information from the
classifications and categories tables. You use none of this information
in your query at all. Why even use the view? The view restricts to
things newer than 180 days, but you can put that in your select itself.
You're left with:

SELECT wm.entity_id, e.type, e.name, w.interface_label,
       wm.last_contact AS remote_ts, s.name, r.name
  FROM entity_watch_map wm, entity e, site s, region r, watch w
 WHERE wm.last_contact > "timestamp"(now() - 180)
   AND wm.current = false
   AND wm.msg_type = w.msg_type
   AND wm.entity_id = e.entity_id
   AND e.active = true
   AND findsite(wm.entity_id) = s.site_id
   AND s.region_id = r.region_id
 ORDER BY wm.last_contact desc, r.name, s.name;

And there you go. I just essentially chopped two tables off of your
total query plan -- tables you don't need, and tables you don't use data
from in this particular query, if my guess is right.

So, all in all, you've lost an unnecessary IN clause and chopped off two
unnecessary table joins. Index the *_id columns and msg_type, vacuum
analyze, and try running this again. That's about all I can see, but it
should be more than enough.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator            |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                 |
| Email: sthomas@townnews.com    AIM  : trifthen                       |
| Web  : www.townnews.com                                              |
|                                                                      |
|     "Most of our lives are about proving something, either to        |
|      ourselves or to someone else."                                  |
|                                           -- Anonymous               |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
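[Editor's sketch: Shaun's indexing advice translates into something like the DDL below. The index names are invented, and the column list is inferred from the queries shown in this thread.]

```sql
-- Indexes on the join columns used in this thread's queries.
-- Names are made up; adjust to local conventions.
CREATE INDEX entity_entity_id_idx   ON entity (entity_id);
CREATE INDEX ewm_entity_id_idx      ON entity_watch_map (entity_id);
CREATE INDEX ewm_msg_type_idx       ON entity_watch_map (msg_type);
CREATE INDEX site_site_id_idx       ON site (site_id);
CREATE INDEX site_region_id_idx     ON site (region_id);
CREATE INDEX region_region_id_idx   ON region (region_id);
CREATE INDEX watch_msg_type_idx     ON watch (msg_type);
CREATE INDEX class_msg_type_idx     ON classifications (msg_type);
CREATE INDEX cat_category_id_idx    ON categories (category_id);

-- Then refresh the planner's statistics so the new indexes get used:
VACUUM ANALYZE;
```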
Ok, I've taken Shaun and Tom's advice, and I've also loaded the schema
and data onto a 7.2 devel machine in order to produce EXPLAIN ANALYZE
output. Shaun and Tom pointed out that I could remove the IN, and Shaun
also showed me that including the view in the join was unnecessary, so
we ended up with this final query:

SELECT wm.entity_id, e.type, e.name, w.interface_label,
       wm.last_contact AS remote_ts, s.name, r.name
  FROM entity_watch_map wm, entity e, site s, region r, watch w
 WHERE wm.last_contact > "timestamp"(now() - 180)
   AND wm.current = false
   AND wm.msg_type = w.msg_type
   AND wm.entity_id = e.entity_id
   AND e.active = true
   AND findsite(wm.entity_id) = s.site_id
   AND s.region_id = r.region_id
 ORDER BY wm.last_contact desc, r.name, s.name;

I created indexes on all the *_id columns as well as the msg_type
columns, but as you can see from the output below, they aren't being
used (these tables are small, so that doesn't surprise me). Performance
is about 2-3x better, but the query is still taking around 40s. Now that
we've eliminated the view and its function calls, and the IN, this seems
pretty straightforward.

My next stop was to check on the one remaining function call, the
findsite() in the WHERE clause. But I have spent about a week's time in
the past working on that one, and it's fairly well optimized -- down to
about .003 seconds per call. A small bit of evidence:

monitor=# select now(); select findsite(10950); select now();
              now
-------------------------------
 2002-05-30 13:16:50.519125-04
(1 row)

 findsite
----------
    10132
(1 row)

              now
-------------------------------
 2002-05-30 13:16:50.522812-04
(1 row)

monitor=#

Calling it with several other values produces the same results, so I'm
not sure that's the bottleneck.

Then I tried to read through the EXPLAIN ANALYZE output of the new and
improved query, and it looks like the hit is coming on the nested loop.
I thought maybe setting enable_nestloop = false would show something
interesting, so I edited postgresql.conf, restarted, and ran again --
and got the exact same EXPLAIN ANALYZE output (showing that it's still
doing a nested loop anyhow).

At this point, I'm past my expertise. Are there other areas of this
query I can look to improve upon? Here is the EXPLAIN ANALYZE output.
The row counts are accurate to reality, too (Tom had asked).

monitor=# explain analyze SELECT wm.entity_id, e.type, e.name, w.interface_label,
monitor-# wm.last_contact AS remote_ts, s.name, r.name
monitor-# FROM entity_watch_map wm, entity e, site s,
monitor-# region r, watch w
monitor-# WHERE wm.last_contact > "timestamp"(now() - 180)
monitor-# AND wm.current = false
monitor-# AND wm.msg_type = w.msg_type
monitor-# AND wm.entity_id = e.entity_id
monitor-# AND e.active = true
monitor-# AND findsite(wm.entity_id) = s.site_id
monitor-# AND s.region_id = r.region_id
monitor-# ORDER BY wm.last_contact desc, r.name, s.name;
NOTICE:  QUERY PLAN:

Sort  (cost=469.67..469.67 rows=39 width=102) (actual time=38170.90..38171.06 rows=202 loops=1)
  ->  Hash Join  (cost=72.86..468.65 rows=39 width=102) (actual time=222.59..38163.46 rows=202 loops=1)
        ->  Hash Join  (cost=71.49..466.08 rows=36 width=84) (actual time=222.25..38160.23 rows=202 loops=1)
              ->  Nested Loop  (cost=70.40..464.37 rows=36 width=69) (actual time=222.03..38156.67 rows=202 loops=1)
                    ->  Hash Join  (cost=70.40..100.27 rows=47 width=38) (actual time=33.58..41.06 rows=202 loops=1)
                          ->  Seq Scan on entity e  (cost=0.00..19.02 rows=513 width=22) (actual time=0.13..4.52 rows=513 loops=1)
                          ->  Hash  (cost=70.20..70.20 rows=80 width=16) (actual time=32.98..32.98 rows=0 loops=1)
                                ->  Seq Scan on entity_watch_map wm  (cost=0.00..70.20 rows=80 width=16) (actual time=0.15..32.36 rows=240 loops=1)
                    ->  Seq Scan on site s  (cost=0.00..5.53 rows=153 width=31) (actual time=0.01..1.02 rows=153 loops=202)
              ->  Hash  (cost=1.07..1.07 rows=7 width=15) (actual time=0.14..0.14 rows=0 loops=1)
                    ->  Seq Scan on region r  (cost=0.00..1.07 rows=7 width=15) (actual time=0.09..0.11 rows=7 loops=1)
        ->  Hash  (cost=1.30..1.30 rows=30 width=18) (actual time=0.30..0.30 rows=0 loops=1)
              ->  Seq Scan on watch w  (cost=0.00..1.30 rows=30 width=18) (actual time=0.10..0.20 rows=30 loops=1)
Total runtime: 38171.52 msec

EXPLAIN
monitor=#

Thank you for your continued help; I've learned a lot already.

-Fran
Fran Fabrizio <ffabrizio@mmrd.com> writes:
> Performance is about 2-3x better, but still taking around 40s. Now
> that we've eliminated the view and its function calls, and the IN, this
> seems pretty straightforward. My next stop was to check on the one
> remaining function call, the findsite() in the where clause. But I
> have spent about a week's time in the past working on that one and it's
> fairly well optimized, down to about .003 seconds per call.

Nonetheless, it's being evaluated 202*153, or almost 31000, times per
query. Your .003 is an overestimate since it includes all the overhead
of query parsing/planning, but it still appears that those function
calls are the bulk of the runtime.

What is findsite doing exactly? If it's a table lookup, maybe you could
inline it into this query and get some optimization.

			regards, tom lane
I said:
> Your .003 is an overestimate since it includes all the overhead
> of query parsing/planning, but it still appears that those function
> calls are the bulk of the runtime.

BTW, a more accurate estimate of function execution time could be had
like this:

regression=# select timeofday(), timeofday();
              timeofday              |              timeofday
-------------------------------------+-------------------------------------
 Thu May 30 15:05:32.838542 2002 EDT | Thu May 30 15:05:32.838718 2002 EDT
(1 row)

regression=# select 838718-838542;
 ?column?
----------
      176
(1 row)

regression=# select timeofday(), recursion_test(22,42), timeofday();
              timeofday              |                        recursion_test                        |              timeofday
-------------------------------------+--------------------------------------------------------------+-------------------------------------
 Thu May 30 15:05:57.342504 2002 EDT | 22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,42  | Thu May 30 15:05:57.347398 2002 EDT
(1 row)

regression=# select 347398-342504;
 ?column?
----------
     4894
(1 row)

so I can conclude that recursion_test() took about 4.7 msec. On my
machine this technique seems to yield measurements good to 10 microsec
or so. Note you have to use timeofday() and not now().

			regards, tom lane
> What is findsite doing exactly? If it's a table lookup, maybe you could
> inline it into this query and get some optimization.
>
> 			regards, tom lane

Tom,

You hit the nail on the head. The findsite(entity_id) plpgsql function
queries the 'entity' table recursively. The 'entity' table has
entity_id, parent_id (which is another entity_id in the 'entity' table),
and type (such as S for site, H for host, etc.). My data is organized in
a hierarchy of site-host-app-practice, so in the worst case findsite
recurses three times when called with an entity_id of a practice.

However, to optimize findsite (and its cousins findhost and findregion),
I created a table called findparent_cache, which has entity_id,
parent_id, and type (of the parent). When you call findsite(), it first
checks whether it has computed this particular value before (in which
case the value is in findparent_cache). There are only approx. 800
entity ids in the entity table, so after one loop over the entity table
with findsite(), it should be hitting the cache 100% of the time, and
thus it becomes a simple table lookup on findparent_cache.

To test Tom's hypothesis, I ensured that findparent_cache was fully
populated, and changed the query to...

SELECT wm.entity_id, e.type, e.name, w.interface_label,
       wm.last_contact AS remote_ts, s.name, r.name
  FROM entity_watch_map wm, entity e, site s, region r, watch w,
       findparent_cache fpc
 WHERE wm.last_contact > "timestamp"(now() - 180)
   AND wm.current = false
   AND wm.msg_type = w.msg_type
   AND wm.entity_id = e.entity_id
   AND e.active = true
   AND wm.entity_id = fpc.entity_id
   AND fpc.type = 'S'
   AND fpc.parent_id = s.site_id
   AND s.region_id = r.region_id
 ORDER BY wm.last_contact desc, r.name, s.name;

...at which point the query runs pretty much instantly.

That's an awful lot of overhead for that poor function. findsite() is a
key function that we use all over the place. I thought it was fairly
efficient, but this demonstrates how it can quickly get out of hand. I
suppose if I could always ensure that findparent_cache was completely
populated, we could always just hit it directly. Since "what is the site
id of the site that holds this entity?" is such a common question, we
really should have it in a table as opposed to a function lookup, huh?
Does even the simplest plpgsql function have this kind of overhead? Or
is my function poorly written? Here is the function code...

create function findsite(int4) returns int4 as '
declare
    child alias for $1;
    thesite int4;
begin
    select parent_id into thesite
      from findparent_cache
     where entity_id = child and type = ''S'';
    if not found then
        select findparenttype(child, ''S'') into thesite;
        execute ''insert into findparent_cache values (''
            || child || '','' || thesite || '',''''S'''')'';
    end if;
    return thesite;
end;
' language 'plpgsql';

create function findparenttype(int4, varchar) returns int4 as '
    select parent.entity_id
      from entity parent, entity child
     where child.entity_id = $1
       and child.lft between parent.lft and parent.rgt
       and parent.type = $2;
' language 'sql';

The 'entity' table implements the Celko nested set model, thus the lft's
and rgt's and the self-join. Could these functions be written more
efficiently, or am I just witnessing the inherent overhead of functions,
which I should avoid unless I'm using them in O(1) situations?

Thanks for all the help,
Fran
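[Editor's sketch: one way to guarantee the cache is fully populated, as Fran wishes above, is a single set-based backfill over the nested-set self-join, replacing the row-at-a-time findsite() loop. Table and column names are taken from the thread's description of findparent_cache; the NOT EXISTS guard against duplicates is an assumption about how the cache is keyed.]

```sql
-- Set-based backfill of findparent_cache: compute the site ('S')
-- ancestor of every entity in one pass over the nested-set model,
-- skipping entities that already have a cached entry.
INSERT INTO findparent_cache (entity_id, parent_id, type)
SELECT c.entity_id, p.entity_id, 'S'
  FROM entity c, entity p
 WHERE c.lft BETWEEN p.lft AND p.rgt
   AND p.type = 'S'
   AND NOT EXISTS (SELECT 1 FROM findparent_cache f
                   WHERE f.entity_id = c.entity_id AND f.type = 'S');
```

With the cache guaranteed full, queries can join findparent_cache directly (as in the rewritten query above) and the planner can optimize the lookup as an ordinary join.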
Fran Fabrizio <ffabrizio@mmrd.com> writes:
> ...at which point the query runs pretty much instantly.
> That's an awful lot of overhead for that poor function. findsite() is a
> key function that we use all over the place. I thought it was fairly
> efficient but this demonstrates how it can quickly get out of hand. I
> suppose if I could always ensure that findparent_cache was completely
> populated, we could always just hit that directly. Since "what is the
> site id of the site that holds this entity?" is such a common question,
> we really should have it in a table as opposed to a function lookup,
> huh? Does even the simplest plpgsql function have this kind of
> overhead? Or is my function poorly written?

The problem is not that there's anything wrong with the function on its
own terms. The problem is that the query planner has no idea what the
semantics of findsite(a.foo) = b.bar are, and so it has no alternative
but to execute the query as a nested loop: compare every row of A to
every row of B, computing findsite() again for each such comparison.
And each findsite call performs an independent probe into
findparent_cache, making the thing effectively equivalent to a
three-level nested loop. You did not show the query plan being used
after you converted this to a join against findparent_cache, but
obviously it's a lot better than a 3-level nested loop...

It would have helped if the planner had computed findsite(a.foo) only
once per row of A. In the current scheme of things I believe that would
happen if B were being scanned with an inner indexscan, but
unfortunately B was too small to justify an indexscan. (The extra cost
of re-evaluating findsite isn't weighted sufficiently high in making
that choice, because the planner doesn't know that findsite() is any
more expensive than any other function.) However, the real lesson is
that pushing table lookups down into functions prevents the planner from
optimizing those lookups as joins.

			regards, tom lane
On Thu, 30 May 2002, Fran Fabrizio wrote:

Looking at your function, it seems you're doing something some databases
allow you to do directly: sending parameters to views. Knowing this, and
the fact that your function amounts to a self-join, think about it for a
second: you already know, in this query, the parent id you're looking
for -- s.site_id, right? Drop the function and make it an EXISTS query.
Basically you're now asking, "as a parent, does this site_id have a type
of 'S' and a child in the wm table?" The planner gets more info, and it
just might help. Try this... it's ugly, but it's there:

SELECT wm.entity_id, e.type, e.name, w.interface_label,
       wm.last_contact AS remote_ts, s.name, r.name
  FROM entity_watch_map wm, entity e, site s, region r, watch w
 WHERE wm.last_contact > "timestamp"(now() - 180)
   AND wm.current = false
   AND wm.msg_type = w.msg_type
   AND wm.entity_id = e.entity_id
   AND e.active = true
   AND EXISTS (
        SELECT 1
          FROM entity p, entity c
         WHERE p.entity_id = s.site_id
           AND c.entity_id = wm.entity_id
           AND p.type = 'S'
           AND c.lft BETWEEN p.lft AND p.rgt
       )
   AND s.region_id = r.region_id
 ORDER BY wm.last_contact desc, r.name, s.name;

Man, is that an ugly query. I've seen worse, though.

--
Shaun M. Thomas, INN Database Administrator