Thread: Interpreting query plan
I've just noticed in the regular profiling information from our web application
that a particular query on a fairly small database is taking about 15 seconds.
The query is generated on the fly by software, hence its quirkiness -- if any
of that is the problem, then I'll go ahead and fix it, but I can't imagine a
few repeated WHERE conditions fooling the query optimizer.

Anyway, I don't know how to interpret query plans. Can anyone give me a hand?
To get the plan, I just plugged in various values -- the actual query is run
with various different values, and even a few different lengths for the IN
clause.

miqweb=> explain select distinct t0.* from UserAccount t0, UserMapping t1 where
(t0.companyid = 123) and ((t0.companyid = 123) and (t0.userid = t1.userid) and
(t1.groupid in (123, 234, 345, 456))) and (t0.companyid = 123);

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Unique  (cost=133.78..133.81 rows=1 width=55)
   ->  Sort  (cost=133.78..133.79 rows=1 width=55)
         Sort Key: t0.userid, t0.companyid, t0.username, t0."password",
             t0.isactive, t0.isregistered, t0.lastlogin, t0.firstname, t0.lastname
         ->  Hash Join  (cost=13.44..133.77 rows=1 width=55)
               Hash Cond: ("outer".userid = "inner".userid)
               ->  Seq Scan on usermapping t1  (cost=0.00..120.26 rows=13 width=4)
                     Filter: ((groupid = 123) OR (groupid = 234) OR
                         (groupid = 345) OR (groupid = 456))
               ->  Hash  (cost=13.43..13.43 rows=4 width=55)
                     ->  Index Scan using useraccount_lookup on useraccount t0
                             (cost=0.00..13.43 rows=4 width=55)
                           Index Cond: (companyid = 123)
(10 rows)

And the relevant tables (apparently a little messed up by prior database
version upgrades, so that some of the foreign keys show up directly as
triggers):

miqweb=> \d useraccount
         Table "public.useraccount"
    Column    |  Type   | Modifiers
--------------+---------+-----------
 userid       | integer | not null
 companyid    | integer | not null
 username     | text    | not null
 password     | text    | not null
 isactive     | boolean | not null
 isregistered | boolean | not null
 lastlogin    | date    |
 firstname    | text    |
 lastname     | text    |
Indexes:
    "useraccount_pkey" primary key, btree (userid)
    "useraccount_lookup" unique, btree (companyid, username)
Triggers:
    "RI_ConstraintTrigger_255906" AFTER INSERT OR UPDATE ON useraccount FROM
        company NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE
        "RI_FKey_check_ins"('useraccount_fk1', 'useraccount', 'company',
        'UNSPECIFIED', 'companyid', 'companyid')
    "RI_ConstraintTrigger_255916" AFTER DELETE ON useraccount FROM
        registrationfield NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE
        PROCEDURE "RI_FKey_noaction_del"('registrationfield_fk2',
        'registrationfield', 'useraccount', 'UNSPECIFIED', 'userid', 'userid')
    "RI_ConstraintTrigger_255917" AFTER UPDATE ON useraccount FROM
        registrationfield NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE
        PROCEDURE "RI_FKey_noaction_upd"('registrationfield_fk2',
        'registrationfield', 'useraccount', 'UNSPECIFIED', 'userid', 'userid')
    "RI_ConstraintTrigger_255919" AFTER DELETE ON useraccount FROM userrole
        NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE
        "RI_FKey_noaction_del"('userrole_fk1', 'userrole', 'useraccount',
        'UNSPECIFIED', 'userid', 'userid')
    "RI_ConstraintTrigger_255920" AFTER UPDATE ON useraccount FROM userrole
        NOT DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE
        "RI_FKey_noaction_upd"('userrole_fk1', 'userrole', 'useraccount',
        'UNSPECIFIED', 'userid', 'userid')
"RI_ConstraintTrigger_255928" AFTER DELETE ON useraccount FROM visit NOT DEF ERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_del "('visit_fk1', 'visit', 'useraccount', 'UNSPECIFIED', 'userid', 'userid') "RI_ConstraintTrigger_255929" AFTER UPDATE ON useraccount FROM visit NOT DEF ERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noaction_upd "('visit_fk1', 'visit', 'useraccount', 'UNSPECIFIED', 'userid', 'userid') "RI_ConstraintTrigger_255940" AFTER DELETE ON useraccount FROM adminvisit NO T DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noactio n_del"('adminvisit_fk1', 'adminvisit', 'useraccount', 'UNSPECIFIED', 'userid', ' userid') "RI_ConstraintTrigger_255941" AFTER UPDATE ON useraccount FROM adminvisit NO T DEFERRABLE INITIALLY IMMEDIATE FOR EACH ROW EXECUTE PROCEDURE "RI_FKey_noactio n_upd"('adminvisit_fk1', 'adminvisit', 'useraccount', 'UNSPECIFIED', 'userid', ' userid') miqweb=> \d usermapping Table "public.usermapping" Column | Type | Modifiers ---------+---------+----------- userid | integer | not null groupid | integer | not null Foreign-key constraints: "$1" FOREIGN KEY (userid) REFERENCES useraccount(userid) "$2" FOREIGN KEY (groupid) REFERENCES groups(groupid) -- www.designacourse.com The Easiest Way to Train Anyone... Anywhere. Chris Smith - Lead Software Developer/Technical Trainer MindIQ Corporation
On Fri, 2 Jul 2004, Chris Smith wrote:

> I've just noticed in the regular profiling information from our web
> application that a particular query on a fairly small database is taking
> about 15 seconds. The query is generated on the fly by software, hence its
> quirkiness -- if any of that is the problem, then I'll go ahead and fix it,
> but I can't imagine a few repeated WHERE conditions fooling the query
> optimizer.
>
> Anyway, I don't know how to interpret query plans. Can anyone give me a
> hand? To get the plan, I just plugged in various values -- the actual query
> is run with various different values, and even a few different lengths for
> the IN clause.
>
> miqweb=> explain select distinct t0.* from UserAccount t0, UserMapping t1
> where (t0.companyid = 123) and ((t0.companyid = 123) and (t0.userid =
> t1.userid) and (t1.groupid in (123, 234, 345, 456))) and
> (t0.companyid = 123);

Plain explain output is useful for finding out what the plan is, but not as
useful for determining why a query takes a particular amount of time. You
might want to use "explain analyze" and send that result (which gives the real
time and real number of rows for the different steps).
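For example, you just prefix the query you already ran (the values below are
the placeholders from your original message, with the duplicated companyid
tests dropped for brevity):

miqweb=> explain analyze select distinct t0.* from UserAccount t0,
miqweb-> UserMapping t1 where t0.companyid = 123 and t0.userid = t1.userid
miqweb-> and t1.groupid in (123, 234, 345, 456);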
Stephan,

Thanks for your reply. Here is the output of "explain analyze". I also
replaced my simple placeholder values with a real query that gets run and
(according to our profiling) takes a long time. However, the query is now
taking a much shorter period of time than it was profiled at by the
application. I can only guess the difference has something to do with system
load. I could try to run this again during a high-load period, if that is
necessary. Let me know.

miqweb=> explain analyze select distinct t0.* from UserAccount t0, UserMapping
t1 where (t0.companyid = 628) and ((t0.companyid = 628) and (t0.userid =
t1.userid) and (t1.groupid in (628,948,949,950,951,953,954,1272,1279,1296,
1299,1300,1363,1423,1446,1467,1526,1724,1735,1759,1763,1772,1785,1841,1862,
1975,2721,2800,2801,2802,2803,1264,1394,1525,1662,1843,1844,1845,1396,1528,
1860,1846,1762,2422,1271,1847,1848,1281,1849,1850,1851,1266,1809,1852,1853,
2421,1854,1855,1913,1856,1857,1269,1268,1858,1859,2804))) and
(t0.companyid = 628);
[...]
 Unique  (cost=952.15..959.37 rows=289 width=55)
     (actual time=137.130..143.363 rows=752 loops=1)
   ->  Sort  (cost=952.15..952.87 rows=289 width=55)
         (actual time=137.123..138.004 rows=1328 loops=1)
         Sort Key: t0.userid, t0.companyid, t0.username, t0."password",
             t0.isactive, t0.isregistered, t0.lastlogin, t0.firstname, t0.lastname
         ->  Hash Join  (cost=869.15..940.34 rows=289 width=55)
               (actual time=112.112..130.948 rows=1328 loops=1)
               Hash Cond: ("outer".userid = "inner".userid)
               ->  Seq Scan on useraccount t0  (cost=0.00..55.71 rows=629
                     width=55) (actual time=0.239..8.501 rows=753 loops=1)
                     Filter: (companyid = 628)
               ->  Hash  (cost=866.28..866.28 rows=1151 width=4)
                     (actual time=111.762..111.762 rows=0 loops=1)
                     ->  Seq Scan on usermapping t1  (cost=0.00..866.28
                           rows=1151 width=4)
                           (actual time=4.251..109.563 rows=1328 loops=1)
                           Filter: ((groupid = 628) OR (groupid = 948) OR
                               (groupid = 949) OR (groupid = 950) OR (groupid = 951)
                               OR (groupid = 953) OR (groupid = 954) OR
                               (groupid = 1272) OR (groupid = 1279) OR
                               (groupid = 1296) OR (groupid = 1299) OR
                               (groupid = 1300) OR (groupid = 1363) OR
                               (groupid = 1423) OR (groupid = 1446) OR
                               (groupid = 1467) OR (groupid = 1526) OR
                               (groupid = 1724) OR (groupid = 1735) OR
                               (groupid = 1759) OR (groupid = 1763) OR
                               (groupid = 1772) OR (groupid = 1785) OR
                               (groupid = 1841) OR (groupid = 1862) OR
                               (groupid = 1975) OR (groupid = 2721) OR
                               (groupid = 2800) OR (groupid = 2801) OR
                               (groupid = 2802) OR (groupid = 2803) OR
                               (groupid = 1264) OR (groupid = 1394) OR
                               (groupid = 1525) OR (groupid = 1662) OR
                               (groupid = 1843) OR (groupid = 1844) OR
                               (groupid = 1845) OR (groupid = 1396) OR
                               (groupid = 1528) OR (groupid = 1860) OR
                               (groupid = 1846) OR (groupid = 1762) OR
                               (groupid = 2422) OR (groupid = 1271) OR
                               (groupid = 1847) OR (groupid = 1848) OR
                               (groupid = 1281) OR (groupid = 1849) OR
                               (groupid = 1850) OR (groupid = 1851) OR
                               (groupid = 1266) OR (groupid = 1809) OR
                               (groupid = 1852) OR (groupid = 1853) OR
                               (groupid = 2421) OR (groupid = 1854) OR
                               (groupid = 1855) OR (groupid = 1913) OR
                               (groupid = 1856) OR (groupid = 1857) OR
                               (groupid = 1269) OR (groupid = 1268) OR
                               (groupid = 1858) OR (groupid = 1859) OR
                               (groupid = 2804))
 Total runtime: 144.690 ms
(11 rows)

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
On Tue, 6 Jul 2004, Chris Smith wrote:

> Thanks for your reply. Here is the output of "explain analyze". I also
> replaced my simple placeholder values with a real query that gets run and
> (according to our profiling) takes a long time. However, the query is now
> taking a much shorter period of time than it was profiled at by the
> application. I can only guess the difference has something to do with
> system load. I could try to run this again during a high-load period, if
> that is necessary. Let me know.

It might help. Notes and questions about the query and the plan below inline.

> miqweb=> explain analyze select distinct t0.* from UserAccount t0,
> UserMapping t1 where (t0.companyid = 628) and ((t0.companyid = 628) and
> (t0.userid = t1.userid) and (t1.groupid in (628,948,949,950,951,953,954,
> 1272,1279,1296,1299,1300,1363,1423,1446,1467,1526,1724,1735,1759,1763,1772,
> 1785,1841,1862,1975,2721,2800,2801,2802,2803,1264,1394,1525,1662,1843,1844,
> 1845,1396,1528,1860,1846,1762,2422,1271,1847,1848,1281,1849,1850,1851,1266,
> 1809,1852,1853,2421,1854,1855,1913,1856,1857,1269,1268,1858,1859,2804)))
> and (t0.companyid = 628);

I wonder if in practice this query uses distinct to get around a problem with
subqueries. I'd think that a query of the general form:

select t0.* from UserAccount t0 where t0.companyid = 628 and t0.userid in
(select userid from UserMapping t1 where t1.groupid in (...));

might work better (although it doesn't look like the extra steps are that big
in practice). For systems that don't handle IN well (older PostgreSQL versions
included), perhaps an EXISTS would be better.

> Unique  (cost=952.15..959.37 rows=289 width=55)
>     (actual time=137.130..143.363 rows=752 loops=1)
>   ->  Sort  (cost=952.15..952.87 rows=289 width=55)
>         (actual time=137.123..138.004 rows=1328 loops=1)
>         Sort Key: t0.userid, t0.companyid, t0.username, t0."password",
>             t0.isactive, t0.isregistered, t0.lastlogin, t0.firstname,
>             t0.lastname

These steps are for the distinct. It's not a lot of actual time, but if the
row set returned were large enough to exceed sort_mem, the sort might start
going off to disk and be slower.

>         ->  Hash Join  (cost=869.15..940.34 rows=289 width=55)
>               (actual time=112.112..130.948 rows=1328 loops=1)
>               Hash Cond: ("outer".userid = "inner".userid)
>               ->  Seq Scan on useraccount t0  (cost=0.00..55.71 rows=629
>                     width=55) (actual time=0.239..8.501 rows=753 loops=1)
>                     Filter: (companyid = 628)

The row estimate is pretty reasonable, estimated 629 versus actual 753. How
many rows are in useraccount? I'm wondering whether 629 is a reasonable
percentage of the rows, to see if a seq scan is reasonable here.

>               ->  Hash  (cost=866.28..866.28 rows=1151 width=4)
>                     (actual time=111.762..111.762 rows=0 loops=1)
>                     ->  Seq Scan on usermapping t1  (cost=0.00..866.28
>                           rows=1151 width=4)
>                           (actual time=4.251..109.563 rows=1328 loops=1)
>                           [lots of filter conditions on groupid]

Here the estimate isn't as good, estimated 1151 versus actual 1328. You might
want to consider raising the groupid column's statistics target and
re-analyzing to see if you can get a better estimate. It probably won't
directly affect this plan entry, but it might affect the layers above. Also,
how many rows are in usermapping?

I didn't see any indexes on usermapping. Perhaps an index on (userid, groupid)
would help as well.
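For what it's worth, the EXISTS form I have in mind would be something along
these lines (just a sketch, reusing your table and alias names; the long
groupid list is elided):

select t0.* from UserAccount t0
where t0.companyid = 628
  and exists (select 1 from UserMapping t1
              where t1.userid = t0.userid and t1.groupid in (...));

And the index could be created with something like this (the index name here
is just illustrative):

create index usermapping_userid_groupid on usermapping (userid, groupid);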
Stephan Szabo wrote:
> [running at higher load]
> It might help.

Okay, I'll give it a shot. Unfortunately, load isn't so predictable with our
current users, so I'll have to give it a few shots and see if I come up with
something substantially different.

> I wonder if in practice this query uses distinct to get around a
> problem with subqueries.

Yes, it does exactly that. The database was recently moved over to PostgreSQL
7.4. The thought of switching over to IN-subqueries is a bit scary, since the
query generator is a really over-generalized mess of about 7000 LOC in 17
different source files, but I'll certainly look at that.

> These steps are for the distinct. It's not a lot of actual time, but
> if the row set returned were large enough to exceed sort_mem, the sort
> might start going off to disk and be slower.

Indeed, that looks like it could be a problem for some of our larger
customers, who have up to tens of thousands of users. The IN form would avoid
this sort?

> The row estimate is pretty reasonable, estimated 629 versus actual
> 753. How many rows are in useraccount? I'm wondering whether 629 is a
> reasonable percentage of the rows, to see if a seq scan is reasonable
> here.

Total? On this server, it's currently 2566.

> Here the estimate isn't as good, estimated 1151 versus actual 1328. You
> might want to consider raising the groupid column's statistics target
> and re-analyzing to see if you can get a better estimate.

Alright. So how exactly do I raise the groupid column's statistics target?

> Also, how many rows are in usermapping?

4120

> I didn't see any indexes on usermapping. Perhaps an index on (userid,
> groupid) would help as well.

Done. I'll watch for more profiling results to see if we run into this same
problem again.

Thanks for your help,

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
On Tue, 6 Jul 2004, Chris Smith wrote:

> Stephan Szabo wrote:
>
> > I wonder if in practice this query uses distinct to get around a
> > problem with subqueries.
>
> Yes, it does exactly that. The database was recently moved over to
> PostgreSQL 7.4. The thought of switching over to IN-subqueries is a bit
> scary, since the query generator is a really over-generalized mess of about
> 7000 LOC in 17 different source files, but I'll certainly look at that.

It might be worth just converting some by hand to see what explain analyze
says about them in contrast to the original.

> > These steps are for the distinct. It's not a lot of actual time, but
> > if the row set returned were large enough to exceed sort_mem, the sort
> > might start going off to disk and be slower.
>
> Indeed, that looks like it could be a problem for some of our larger
> customers, who have up to tens of thousands of users. The IN form would
> avoid this sort?

The IN form would potentially use a generally different plan. It still uses
sort_mem for some of its determinations, but I *think* you'd fit more into
memory with it than you would for the sort. If you have the RAM and are doing
queries like this a lot, you might want to raise sort_mem if you haven't
changed it from the default.

> > The row estimate is pretty reasonable, estimated 629 versus actual
> > 753. How many rows are in useraccount? I'm wondering whether 629 is a
> > reasonable percentage of the rows, to see if a seq scan is reasonable
> > here.
>
> Total? On this server, it's currently 2566.

Okay, so a seq scan for 629 of 2566 rows is pretty reasonable if the table
doesn't have a lot of empty space.

> > Here the estimate isn't as good, estimated 1151 versus actual 1328. You
> > might want to consider raising the groupid column's statistics target
> > and re-analyzing to see if you can get a better estimate.
>
> Alright. So how exactly do I raise the groupid column's statistics target?

Oops, I mentioned it in a previous version of that paragraph and apparently
removed it upon editing. ALTER TABLE <table> ALTER COLUMN <column> SET
STATISTICS <integer>. Maybe try 100 to see what it gives you.
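In your case that would be something along these lines (100 is only a starting
point; the default target in 7.4 is 10, and the new target only takes effect
after the next analyze):

miqweb=> alter table usermapping alter column groupid set statistics 100;
miqweb=> analyze usermapping;

And if you do end up experimenting with sort_mem, it can also be set per
session, e.g. "set sort_mem = 8192;" -- the value is in kilobytes, and the
7.4 default is 1024.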