Re: Interpreting query plan - Mailing list pgsql-general
From | Stephan Szabo |
---|---|
Subject | Re: Interpreting query plan |
Date | |
Msg-id | 20040706093945.H13939@megazone.bigpanda.com Whole thread Raw |
In response to | Re: Interpreting query plan ("Chris Smith" <cdsmith@twu.net>) |
List | pgsql-general |
On Tue, 6 Jul 2004, Chris Smith wrote: > Thanks for your reply. Here is the output of "explain analyze". I also > replaced by simple values with a real query that gets run and (according to > our profiling) takes a long time. However, the query is now taking a much > shorter period of time than it was profiled at by the application. I can only > guess the difference has something to do with system load. I could try to run > this again during a high-load period, if that is necessary. Let me know. It might help. Notes and questions about the query and the plan below inline. > miqweb=> explain analyze select distinct t0.* from UserAccount t0, UserMapping > t1 where (t0.companyid = 628) and ((t0.companyid = 628) and (t0.userid = > t1.use > rid) and (t1.groupid in > (628,948,949,950,951,953,954,1272,1279,1296,1299,1300,1 > 363,1423,1446,1467,1526,1724,1735,1759,1763,1772,1785,1841,1862,1975,2721,2800 > , > 2801,2802,2803,1264,1394,1525,1662,1843,1844,1845,1396,1528,1860,1846,1762,242 > 2 > ,1271,1847,1848,1281,1849,1850,1851,1266,1809,1852,1853,2421,1854,1855,1913,18 > 5 > 6,1857,1269,1268,1858,1859,2804))) and (t0.companyid = 628); I wonder if in practice this query uses distinct to get around a problem with subqueries. I'd think that a query of the general form: select t0.* from UserAccount t0 where t0.companyid=62 and t0.userid in (select userid from UserMapping t1 where t1.groupid in (...)); might work better (although it doesn't look like the extra steps are that big in practice). For systems that don't do in well (older PostgreSQL versions included), perhaps an exists would be better. > Unique (cost=952.15..959.37 rows=289 width=55) (actual time=137.130..143.363 > r > ows=752 loops=1) > -> Sort (cost=952.15..952.87 rows=289 width=55) (actual > time=137.123..138.0 > 04 rows=1328 loops=1) > Sort Key: t0.userid, t0.companyid, t0.username, t0."password", > t0.isact > ive, t0.isregistered, t0.lastlogin, t0.firstname, t0.lastname These steps are for the distinct. It's not alot of actual time, but if the row set returned was large enough to exceed sort_mem the sort might start going off to disk and be slower. > -> Hash Join (cost=869.15..940.34 rows=289 width=55) (actual > time=112 > .112..130.948 rows=1328 loops=1) > Hash Cond: ("outer".userid = "inner".userid) > -> Seq Scan on useraccount t0 (cost=0.00..55.71 rows=629 > width= > 55) (actual time=0.239..8.501 rows=753 loops=1) > Filter: (companyid = 628) The row estimate is pretty reasonable, estimated 629 versus actual 753. How many rows are in useraccount? I'm wondering if 629 is a reasonable percentage of the rows to see if seq scan is reasonable here. > -> Hash (cost=866.28..866.28 rows=1151 width=4) (actual > time=11 > 1.762..111.762 rows=0 loops=1) > -> Seq Scan on usermapping t1 (cost=0.00..866.28 > rows=115 > 1 width=4) (actual time=4.251..109.563 rows=1328 loops=1) > [lots of filter conditions on groupid] Here the estimate isn't so good, estimated 115 vs actual 1328. You might want to consider raising the groupid column's statistics target and re-analyzing to see if you can get a better estimate. It won't probably directly affect this plan entry, but it might affect the layers above. Also, how many rows are in usermapping? I didn't see any indexes on usermapping. Perhaps an index on (userid, groupid) would help as well.
pgsql-general by date: