Re: Slow-ish Query Needs Some Love - Mailing list pgsql-performance

From Robert Haas
Subject Re: Slow-ish Query Needs Some Love
Date
Msg-id 603c8f071002041749x14ff4414g1ed2bb26be3a0bce@mail.gmail.com
Whole thread Raw
In response to Slow-ish Query Needs Some Love  (Matt White <mattw922@gmail.com>)
List pgsql-performance
On Mon, Feb 1, 2010 at 7:53 PM, Matt White <mattw922@gmail.com> wrote:
> I have a relatively straightforward query that by itself isn't that
> slow, but we have to run it up to 40 times on one webpage load, so it
> needs to run much faster than it does. Here it is:
>
> SELECT COUNT(*) FROM users, user_groups
>  WHERE users.user_group_id = user_groups.id AND NOT users.deleted AND
> user_groups.partner_id IN
>  (partner_id_1, partner_id_2);
>
> The structure is partners have user groups which have users. In the
> test data there are over 200,000 user groups and users but only ~3000
> partners. Anyone have any bright ideas on how to speed this query up?
>
> Here's the query plan:
>
>  Aggregate  (cost=12574.53..12574.54 rows=1 width=0) (actual
> time=2909.298..2909.299 rows=1 loops=1)
>   ->  Hash Join  (cost=217.79..12566.08 rows=3378 width=0) (actual
> time=2909.284..2909.284 rows=0 loops=1)
>         Hash Cond: (users.user_group_id = user_groups.id)
>         ->  Seq Scan on users  (cost=0.00..11026.11 rows=206144
> width=4) (actual time=0.054..517.811 rows=205350 loops=1)
>               Filter: (NOT deleted)
>         ->  Hash  (cost=175.97..175.97 rows=3346 width=4) (actual
> time=655.054..655.054 rows=200002 loops=1)
>               ->  Nested Loop  (cost=0.27..175.97 rows=3346 width=4)
> (actual time=1.327..428.406 rows=200002 loops=1)
>                     ->  HashAggregate  (cost=0.27..0.28 rows=1
> width=4) (actual time=1.259..1.264 rows=2 loops=1)
>                           ->  Result  (cost=0.00..0.26 rows=1
> width=0) (actual time=1.181..1.240 rows=2 loops=1)
>                     ->  Index Scan using user_groups_partner_id_idx
> on user_groups  (cost=0.00..133.86 rows=3346 width=8) (actual
> time=0.049..96.992 rows=100001 loops=2)
>                           Index Cond: (user_groups.partner_id =
> (partner_all_subpartners(3494)))
>
>
> The one obvious thing that everyone will point out is the sequential
> scan on users, but there actually is an index on users.deleted. When I
> forced sequential scanning off, it ran slower, so the planner wins
> again.

Yeah, I don't think the sequential scan is hurting you.  What is
bugging me is that it doesn't look like the plan you've posted is for
the query you've posted.  The plan shows an index condition that
references partner_all_subpartners(3494), which doesn't appear in your
original query, and also has two aggregates in it, where your posted
query only has one.

...Robert

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Slow query: table iteration (8.3)
Next
From: Robert Haas
Date:
Subject: Re: foreign key constraint lock behavour in postgresql