Re: Interpreting query plan - Mailing list pgsql-general

From Stephan Szabo
Subject Re: Interpreting query plan
Date
Msg-id 20040706093945.H13939@megazone.bigpanda.com
Whole thread Raw
In response to Re: Interpreting query plan  ("Chris Smith" <cdsmith@twu.net>)
List pgsql-general
On Tue, 6 Jul 2004, Chris Smith wrote:

> Thanks for your reply.  Here is the output of "explain analyze".  I also
> replaced by simple values with a real query that gets run and (according to
> our profiling) takes a long time.  However, the query is now taking a much
> shorter period of time than it was profiled at by the application.  I can only
> guess the difference has something to do with system load.  I could try to run
> this again during a high-load period, if that is necessary.  Let me know.

It might help.

Notes and questions about the query and the plan below inline.

> miqweb=> explain analyze select distinct t0.* from UserAccount t0, UserMapping
> t1 where (t0.companyid = 628) and ((t0.companyid = 628) and (t0.userid =
> t1.use
> rid) and (t1.groupid in
> (628,948,949,950,951,953,954,1272,1279,1296,1299,1300,1
> 363,1423,1446,1467,1526,1724,1735,1759,1763,1772,1785,1841,1862,1975,2721,2800
> ,
> 2801,2802,2803,1264,1394,1525,1662,1843,1844,1845,1396,1528,1860,1846,1762,242
> 2
> ,1271,1847,1848,1281,1849,1850,1851,1266,1809,1852,1853,2421,1854,1855,1913,18
> 5
> 6,1857,1269,1268,1858,1859,2804))) and (t0.companyid = 628);

I wonder if in practice this query uses distinct to get around a problem
with subqueries.  I'd think that a query of the general form:
select t0.* from UserAccount t0 where t0.companyid=62 and t0.userid in
 (select userid from UserMapping t1 where t1.groupid in (...));
might work better (although it doesn't look like the extra steps are that
big in practice).  For systems that don't do in well (older PostgreSQL
versions included), perhaps an exists would be better.

>  Unique  (cost=952.15..959.37 rows=289 width=55) (actual time=137.130..143.363
> r
> ows=752 loops=1)
>    ->  Sort  (cost=952.15..952.87 rows=289 width=55) (actual
> time=137.123..138.0
> 04 rows=1328 loops=1)
>          Sort Key: t0.userid, t0.companyid, t0.username, t0."password",
> t0.isact
> ive, t0.isregistered, t0.lastlogin, t0.firstname, t0.lastname

These steps are for the distinct.  It's not alot of actual time, but if
the row set returned was large enough to exceed sort_mem the sort might
start going off to disk and be slower.

>          ->  Hash Join  (cost=869.15..940.34 rows=289 width=55) (actual
> time=112
> .112..130.948 rows=1328 loops=1)
>                Hash Cond: ("outer".userid = "inner".userid)
>                ->  Seq Scan on useraccount t0  (cost=0.00..55.71 rows=629
> width=
> 55) (actual time=0.239..8.501 rows=753 loops=1)
>                      Filter: (companyid = 628)

The row estimate is pretty reasonable, estimated 629 versus actual 753.
How many rows are in useraccount?  I'm wondering if 629 is a reasonable
percentage of the rows to see if seq scan is reasonable here.

>                ->  Hash  (cost=866.28..866.28 rows=1151 width=4) (actual
> time=11
> 1.762..111.762 rows=0 loops=1)
>                      ->  Seq Scan on usermapping t1  (cost=0.00..866.28
> rows=115
> 1 width=4) (actual time=4.251..109.563 rows=1328 loops=1)
> [lots of filter conditions on groupid]

Here the estimate isn't so good, estimated 115 vs actual 1328.  You might
want to consider raising the groupid column's statistics target and
re-analyzing to see if you can get a better estimate. It won't probably
directly affect this plan entry, but it might affect the layers above.
Also, how many rows are in usermapping?

I didn't see any indexes on usermapping. Perhaps an index on (userid,
groupid) would help as well.

pgsql-general by date:

Previous
From: ruben
Date:
Subject: Re: server closed the connection unexpectedly
Next
From: Bruce Momjian
Date:
Subject: Re: alter table cascade does not give notice about dropped