Thread: Problems with non use of indexes

Problems with non use of indexes

From
Tyler Durden
Date:
Hi,
I can't figure out why query planner doesn't use the proper index, anyone can help me?

This query properly uses indexes:

mydb=# EXPLAIN SELECT U0."object_id" FROM "activity_follow" U0 WHERE (U0."content_type_id" = 3 AND U0."user_id" = 1);

QUERY PLAN                                              

--------------------------------------------------------------------------------------------------------
 Index Scan using activity_follow_user_id on activity_follow u0  (cost=0.00..4875.15 rows=4898 width=4)
   Index Cond: (user_id = 1)
   Filter: (content_type_id = 3)
(3 rows)

But the same query on a "IN" statement doesn't. The query planner uses Seq Scan on U0."user_id" = 1

mydb=# EXPLAIN SELECT "activity_action"."id", "activity_action"."actor_id", "activity_action"."verb", "activity_action"."action_content_type_id", "activity_action"."action_object_id", "activity_action"."target_content_type_id", "activity_action"."target_object_id", "activity_action"."public", "activity_action"."created", "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON ("activity_action"."actor_id" = "auth_user"."id") WHERE "activity_action"."actor_id" IN (SELECT U0."object_id" FROM "activity_follow" U0 WHERE (U0."content_type_id" = 3 AND U0."user_id" = 1 )) ORDER BY "activity_action"."created" DESC LIMIT 100;

QUERY PLAN                                                             

--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9206.97..9207.22 rows=100 width=155)
   ->  Sort  (cost=9206.97..9320.34 rows=45347 width=155)
         Sort Key: activity_action.created
         ->  Hash Join  (cost=5447.39..7473.84 rows=45347 width=155)
               Hash Cond: (activity_action.actor_id = auth_user.id)
               ->  Nested Loop  (cost=4887.39..5020.58 rows=45347 width=55)
                     ->  HashAggregate  (cost=4887.39..4887.41 rows=2 width=4)
                           ->  Index Scan using activity_follow_user_id on activity_follow u0  (cost=0.00..4875.15 rows=4898 width=4)
                                 Index Cond: (user_id = 1)
                                 Filter: (content_type_id = 3)
                     ->  Index Scan using activity_action_actor_id on activity_action  (cost=0.00..65.20 rows=111 width=51)
                           Index Cond: (activity_action.actor_id = u0.object_id)
               ->  Hash  (cost=278.00..278.00 rows=10000 width=104)
                     ->  Seq Scan on auth_user  (cost=0.00..278.00 rows=10000 width=104)


If I do a SET enable_seqscan TO 'off'; It uses the index but is also slow.

---

TABLES:

CREATE TABLE "auth_user" (
    "id" serial NOT NULL PRIMARY KEY,
    "username" varchar(30) NOT NULL UNIQUE,
    "first_name" varchar(30) NOT NULL,
    "last_name" varchar(30) NOT NULL,
    "email" varchar(75) NOT NULL,
    "password" varchar(128) NOT NULL,
    "is_staff" boolean NOT NULL,
    "is_active" boolean NOT NULL,
    "is_superuser" boolean NOT NULL,
    "last_login" timestamp with time zone NOT NULL,
    "date_joined" timestamp with time zone NOT NULL
);

CREATE TABLE "activity_follow" (
    "id" serial NOT NULL PRIMARY KEY,
    "user_id" integer NOT NULL REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED,
    "content_type_id" integer NOT NULL REFERENCES "django_content_type" ("id") DEFERRABLE INITIALLY DEFERRED,
    "object_id" integer CHECK ("object_id" >= 0),
    UNIQUE ("user_id", "content_type_id", "object_id")
);

CREATE TABLE "activity_action" (
    "id" serial NOT NULL PRIMARY KEY,
    "actor_id" integer NOT NULL REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED,
    "verb" varchar(50) NOT NULL,
    "action_content_type_id" integer REFERENCES "django_content_type" ("id") DEFERRABLE INITIALLY DEFERRED,
    "action_object_id" integer CHECK ("action_object_id" >= 0),
    "target_content_type_id" integer REFERENCES "django_content_type" ("id") DEFERRABLE INITIALLY DEFERRED,
    "target_object_id" integer CHECK ("target_object_id" >= 0),
    "public" boolean NOT NULL,
    "created" timestamp with time zone NOT NULL
);

CREATE INDEX "activity_follow_user_id" ON "activity_follow" ("user_id");
CREATE INDEX "activity_follow_content_type_id" ON "activity_follow" ("content_type_id");
CREATE INDEX "activity_follow_object_id" ON "activity_follow" ("object_id");
CREATE INDEX "activity_action_actor_id" ON "activity_action" ("actor_id");
CREATE INDEX "activity_action_action_content_type_id" ON "activity_action" ("action_content_type_id");
CREATE INDEX "activity_action_action_object_id" ON "activity_action" ("action_object_id");
CREATE INDEX "activity_action_target_content_type_id" ON "activity_action" ("target_content_type_id");
CREATE INDEX "activity_action_target_object_id" ON "activity_action" ("target_object_id");

---

mydb=# SELECT COUNT(1) FROM activity_action;
  count 
---------
 1104800
(1 row)

mydb=# SELECT COUNT(1) FROM activity_follow;
  count 
---------
 1104800
(1 row)

mydb=# SELECT COUNT(1) FROM auth_user;
 count
-------
 10000
(1 row)

Re: Problems with non use of indexes

From
"Tomas Vondra"
Date:
Hi,

On 2 Březen 2012, 13:12, Tyler Durden wrote:
> Hi,
> I can't figure out why query planner doesn't use the proper index, anyone
> can help me?
>
> This query properly uses indexes:
>
> mydb=# EXPLAIN SELECT U0."object_id" FROM "activity_follow" U0 WHERE
> (U0."content_type_id" = 3 AND U0."user_id" = 1);
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------
>  Index Scan using activity_follow_user_id on activity_follow u0
> (cost=0.00..4875.15 rows=4898 width=4)
>    Index Cond: (user_id = 1)
>    Filter: (content_type_id = 3)
> (3 rows)
>
> But the same query on a "IN" statement doesn't. The query planner uses Seq
> Scan on *U0."user_id" = 1*
>
> mydb=# EXPLAIN SELECT "activity_action"."id",
> "activity_action"."actor_id",
> "activity_action"."verb", "activity_action"."action_content_type_id",
> "activity_action"."action_object_id",
> "activity_action"."target_content_type_id",
> "activity_action"."target_object_id", "activity_action"."public",
> "activity_action"."created", "auth_user"."id", "auth_user"."username",
> "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
> "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
> "auth_user"."is_superuser", "auth_user"."last_login",
> "auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
> ("activity_action"."actor_id" = "auth_user"."id") WHERE
> "activity_action"."actor_id" IN (SELECT U0."object_id" FROM
> "activity_follow" U0 WHERE (U0."content_type_id" = 3 AND *U0."user_id"
> = 1*)) ORDER BY "activity_action"."created" DESC LIMIT 100;
>
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=9206.97..9207.22 rows=100 width=155)
>    ->  Sort  (cost=9206.97..9320.34 rows=45347 width=155)
>          Sort Key: activity_action.created
>          ->  Hash Join  (cost=5447.39..7473.84 rows=45347 width=155)
>                Hash Cond: (activity_action.actor_id = auth_user.id)
>                ->  Nested Loop  (cost=4887.39..5020.58 rows=45347
> width=55)
>                      ->  HashAggregate  (cost=4887.39..4887.41 rows=2
> width=4)
>                            ->  Index Scan using activity_follow_user_id on
> activity_follow u0  (cost=0.00..4875.15 rows=4898 width=4)
>                                  Index Cond: (user_id = 1)
>                                  Filter: (content_type_id = 3)
>                      ->  Index Scan using activity_action_actor_id on
> activity_action  (cost=0.00..65.20 rows=111 width=51)
>                            Index Cond: (activity_action.actor_id =
> u0.object_id)
>                ->  Hash  (cost=278.00..278.00 rows=10000 width=104)
>                      ->  Seq Scan on auth_user  (cost=0.00..278.00
> rows=10000 width=104)
>
>
> If I do a SET enable_seqscan TO 'off'; It uses the index but is also slow.

Errr, what? The only sequential scan in that explain output is on
auth_user, not activity_follow which is the table referenced in the
original query. It actually uses index scan to read activity_follow

                     ->  Index Scan using activity_follow_user_id on
activity_follow u0  (cost=0.00..4875.15 rows=4898
width=4)
                           Index Cond: (user_id = 1)
                           Filter: (content_type_id = 3)


kind regards
Tomas


Re: Problems with non use of indexes

From
Tyler Durden
Date:
Hi,

On Fri, Mar 2, 2012 at 12:23 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
Hi,

On 2 Březen 2012, 13:12, Tyler Durden wrote:
> Hi,
> I can't figure out why query planner doesn't use the proper index, anyone
> can help me?
>
> This query properly uses indexes:
>
> mydb=# EXPLAIN SELECT U0."object_id" FROM "activity_follow" U0 WHERE
> (U0."content_type_id" = 3 AND U0."user_id" = 1);
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------
>  Index Scan using activity_follow_user_id on activity_follow u0
> (cost=0.00..4875.15 rows=4898 width=4)
>    Index Cond: (user_id = 1)
>    Filter: (content_type_id = 3)
> (3 rows)
>
> But the same query on a "IN" statement doesn't. The query planner uses Seq
> Scan on *U0."user_id" = 1*
>
> mydb=# EXPLAIN SELECT "activity_action"."id",
> "activity_action"."actor_id",
> "activity_action"."verb", "activity_action"."action_content_type_id",
> "activity_action"."action_object_id",
> "activity_action"."target_content_type_id",
> "activity_action"."target_object_id", "activity_action"."public",
> "activity_action"."created", "auth_user"."id", "auth_user"."username",
> "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
> "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
> "auth_user"."is_superuser", "auth_user"."last_login",
> "auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
> ("activity_action"."actor_id" = "auth_user"."id") WHERE
> "activity_action"."actor_id" IN (SELECT U0."object_id" FROM
> "activity_follow" U0 WHERE (U0."content_type_id" = 3 AND *U0."user_id"
> = 1*)) ORDER BY "activity_action"."created" DESC LIMIT 100;
>
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=9206.97..9207.22 rows=100 width=155)
>    ->  Sort  (cost=9206.97..9320.34 rows=45347 width=155)
>          Sort Key: activity_action.created
>          ->  Hash Join  (cost=5447.39..7473.84 rows=45347 width=155)
>                Hash Cond: (activity_action.actor_id = auth_user.id)
>                ->  Nested Loop  (cost=4887.39..5020.58 rows=45347
> width=55)
>                      ->  HashAggregate  (cost=4887.39..4887.41 rows=2
> width=4)
>                            ->  Index Scan using activity_follow_user_id on
> activity_follow u0  (cost=0.00..4875.15 rows=4898 width=4)
>                                  Index Cond: (user_id = 1)
>                                  Filter: (content_type_id = 3)
>                      ->  Index Scan using activity_action_actor_id on
> activity_action  (cost=0.00..65.20 rows=111 width=51)
>                            Index Cond: (activity_action.actor_id =
> u0.object_id)
>                ->  Hash  (cost=278.00..278.00 rows=10000 width=104)
>                      ->  Seq Scan on auth_user  (cost=0.00..278.00
> rows=10000 width=104)
>
>
> If I do a SET enable_seqscan TO 'off'; It uses the index but is also slow.

Errr, what? The only sequential scan in that explain output is on
auth_user, not activity_follow which is the table referenced in the
original query. It actually uses index scan to read activity_follow

                    ->  Index Scan using activity_follow_user_id on
activity_follow u0  (cost=0.00..4875.15 rows=4898
width=4)
                          Index Cond: (user_id = 1)
                          Filter: (content_type_id = 3)


kind regards
Tomas


Yes, but if I remove U0."user_id" = 1 will use the index:

EXPLAIN SELECT "activity_action"."id", "activity_action"."actor_id", "activity_action"."verb", "activity_action"."action_content_type_id", "activity_action"."action_object_id", "activity_action"."target_content_type_id", "activity_action"."target_object_id", "activity_action"."public", "activity_action"."created", "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON ("activity_action"."actor_id" = "auth_user"."id") WHERE "activity_action"."actor_id" IN (SELECT U0."object_id" FROM "activity_follow" U0 WHERE ( U0."content_type_id" = 3 )) ORDER BY "activity_action"."created" DESC LIMIT 100;
                                                         QUERY PLAN                                                        
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4502.18 rows=100 width=155)
   ->  Nested Loop  (cost=0.00..2041605.23 rows=45347 width=155)
         ->  Nested Loop Semi Join  (cost=0.00..1907985.65 rows=45347 width=55)
               ->  Index Scan using activity_action_created on activity_action  (cost=0.00..40093.37 rows=1104800 width=51)
               ->  Index Scan using activity_follow_object_id on activity_follow u0  (cost=0.00..5519.13 rows=3328 width=4)
                     Index Cond: (u0.object_id = activity_action.actor_id)
                     Filter: (u0.content_type_id = 3)
         ->  Index Scan using auth_user_pkey on auth_user  (cost=0.00..2.93 rows=1 width=104)
               Index Cond: (auth_user.id = activity_action.actor_id)

Re: Problems with non use of indexes

From
"Tomas Vondra"
Date:
On 2 Březen 2012, 13:45, Tyler Durden wrote:
>>
> Yes, but if I remove *U0."user_id" = 1 *will use the index:

Which PostgreSQL version is that? Post EXPLAIN ANALYZE output for all
three queries, not just EXPLAIN. And use explain.depesz.com if possible,
it's much more readable.

kind regards
Tomas



Re: Problems with non use of indexes

From
Tyler Durden
Date:
On Fri, Mar 2, 2012 at 12:55 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 2 Březen 2012, 13:45, Tyler Durden wrote:
>>
> Yes, but if I remove *U0."user_id" = 1 *will use the index:

Which PostgreSQL version is that? Post EXPLAIN ANALYZE output for all
three queries, not just EXPLAIN. And use explain.depesz.com if possible,
it's much more readable.

kind regards
Tomas

I'm using PostgreSQL 8.4.2 and you can find the EXPLAIN ANALYSE VERBOSE in http://explain.depesz.com/s/hk2

Thanks!

Re: Problems with non use of indexes

From
"Tomas Vondra"
Date:
On 2 Březen 2012, 14:34, Tyler Durden wrote:
> On Fri, Mar 2, 2012 at 12:55 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>> On 2 Březen 2012, 13:45, Tyler Durden wrote:
>> >>
>> > Yes, but if I remove *U0."user_id" = 1 *will use the index:
>>
>> Which PostgreSQL version is that? Post EXPLAIN ANALYZE output for all
>> three queries, not just EXPLAIN. And use explain.depesz.com if possible,
>> it's much more readable.
>>
>> kind regards
>> Tomas
>>
>
> I'm using PostgreSQL 8.4.2 and you can find the EXPLAIN ANALYSE VERBOSE in
> http://explain.depesz.com/s/hk2

For all three queries, please.

Tomas


Re: Problems with non use of indexes

From
Tyler Durden
Date:
http://explain.depesz.com/s/f92O

EXPLAIN ANALYSE VERBOSE SELECT "activity_action"."id", "activity_action"."actor_id", "activity_action"."verb", "activity_action"."action_content_type_id", "activity_action"."action_object_id", "activity_action"."target_content_type_id", "activity_action"."target_object_id", "activity_action"."public", "activity_action"."created", "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON ("activity_action"."actor_id" = "auth_user"."id") WHERE "activity_action"."actor_id" IN (SELECT U0."object_id" FROM "activity_follow" U0 WHERE (U0."content_type_id" = 3 AND U0."user_id" = 1)) ORDER BY "activity_action"."created" DESC LIMIT 100;

---

http://explain.depesz.com/s/o3w

EXPLAIN ANALYSE VERBOSE SELECT "activity_action"."id", "activity_action"."actor_id", "activity_action"."verb", "activity_action"."action_content_type_id", "activity_action"."action_object_id", "activity_action"."target_content_type_id", "activity_action"."target_object_id", "activity_action"."public", "activity_action"."created", "auth_user"."id", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON ("activity_action"."actor_id" = "auth_user"."id") WHERE "activity_action"."actor_id" IN (SELECT U0."object_id" FROM "activity_follow" U0 WHERE (U0."content_type_id" = 3)) ORDER BY "activity_action"."created" DESC LIMIT 100;

---

http://explain.depesz.com/s/ccJ

EXPLAIN ANALYSE VERBOSE SELECT U0."object_id" FROM "activity_follow" U0 WHERE (U0."content_type_id" = 3 AND U0."user_id" = 1);


On Fri, Mar 2, 2012 at 2:23 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
On 2 Březen 2012, 14:34, Tyler Durden wrote:
> On Fri, Mar 2, 2012 at 12:55 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>> On 2 Březen 2012, 13:45, Tyler Durden wrote:
>> >>
>> > Yes, but if I remove *U0."user_id" = 1 *will use the index:
>>
>> Which PostgreSQL version is that? Post EXPLAIN ANALYZE output for all
>> three queries, not just EXPLAIN. And use explain.depesz.com if possible,
>> it's much more readable.
>>
>> kind regards
>> Tomas
>>
>
> I'm using PostgreSQL 8.4.2 and you can find the EXPLAIN ANALYSE VERBOSE in
> http://explain.depesz.com/s/hk2

For all three queries, please.

Tomas


Re: Problems with non use of indexes

From
Scott Marlowe
Date:
On Fri, Mar 2, 2012 at 5:12 AM, Tyler Durden <tylersticky@gmail.com> wrote:
> Hi,
> I can't figure out why query planner doesn't use the proper index, anyone
> can help me?
>
> This query properly uses indexes:
>
> mydb=# EXPLAIN SELECT U0."object_id" FROM "activity_follow" U0 WHERE
> (U0."content_type_id" = 3 AND U0."user_id" = 1);

Query plan: http://explain.depesz.com/s/ccJ
No order by in the above.  Order by in the below:

> mydb=# EXPLAIN SELECT "activity_action"."id", "activity_action"."actor_id",
> "activity_action"."verb", "activity_action"."action_content_type_id",
> "activity_action"."action_object_id",
> "activity_action"."target_content_type_id",
> "activity_action"."target_object_id", "activity_action"."public",
> "activity_action"."created", "auth_user"."id", "auth_user"."username",
> "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
> "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
> "auth_user"."is_superuser", "auth_user"."last_login",
> "auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
> ("activity_action"."actor_id" = "auth_user"."id") WHERE
> "activity_action"."actor_id" IN (SELECT U0."object_id" FROM
> "activity_follow" U0 WHERE (U0."content_type_id" = 3 AND U0."user_id" = 1 ))
> ORDER BY "activity_action"."created" DESC LIMIT 100;

query plan: http://explain.depesz.com/s/f92O

What happens if you drop the order by on it?  Just for comparison.
I'm guessing that needing to sort is where the cost is coming from.