Redundant sub query triggers slow nested loop left join

From: henk de wit
Subject: Redundant sub query triggers slow nested loop left join
Date: ,
Msg-id: BAY106-F1923F1C1D05D94B1BAF671F5550@phx.gbl
(view: Whole thread, Raw)
Responses: Re: Redundant sub query triggers slow nested loop left join  (Tom Lane)
List: pgsql-performance


In investigating a slow query, I distiled the code below from a larger
query:

SELECT
    *
FROM
    /* SUBQUERY banners */ (
        SELECT
            *
        FROM
            /* SUBQUERY banners_links */ (
                SELECT
                    *
                FROM
                    banners_links
                WHERE
                    merchant_id = 5631
            ) as banners_links
        WHERE
            merchant_id = 5631 AND
            banners_links.status = 0
    ) AS banners

        LEFT OUTER JOIN

    /* SUBQUERY types */ (
        SELECT
            banner_types.id         AS type_id,
            banner_types.type         AS type,
            banners_banner_types.banner_id     AS id
        FROM
            banner_types,banners_banner_types
        WHERE
            banners_banner_types.banner_id IN /* SUBQUERY */ (
                SELECT
                    id
                FROM
                    banners_links
                WHERE
                    merchant_id = 5631
            ) AND
            banners_banner_types.type_id = banner_types.id
    ) AS types

    USING (id)

Obviously, the subquery "banners_links" is redundant. The query however is a
generated one, and this redundancy probably wasn't noted before. Logically
you would say it shouldn't hurt, but in fact it does. The above query
executes painfully slow. The left outer join is killing the performance, as
witnessed by the plan:

"Nested Loop Left Join  (cost=964.12..1480.67 rows=1 width=714) (actual
time=20.801..8233.410 rows=553 loops=1)"
"  Join Filter: (public.banners_links.id = banners_banner_types.banner_id)"
"  ->  Bitmap Heap Scan on banners_links  (cost=4.35..42.12 rows=1
width=671) (actual time=0.127..0.690 rows=359 loops=1)"
"        Recheck Cond: ((merchant_id = 5631) AND (merchant_id = 5631))"
"        Filter: ((status)::text = '0'::text)"
"        ->  Bitmap Index Scan on banners_links_merchant_id_idx
(cost=0.00..4.35 rows=10 width=0) (actual time=0.092..0.092 rows=424
loops=1)"
"              Index Cond: ((merchant_id = 5631) AND (merchant_id = 5631))"
"  ->  Hash Join  (cost=959.77..1432.13 rows=514 width=51) (actual
time=0.896..22.611 rows=658 loops=359)"
"        Hash Cond: (banners_banner_types.type_id = banner_types.id)"
"        ->  Hash IN Join  (cost=957.32..1422.52 rows=540 width=16) (actual
time=0.894..21.878 rows=658 loops=359)"
"              Hash Cond: (banners_banner_types.banner_id =
public.banners_links.id)"
"              ->  Seq Scan on banners_banner_types  (cost=0.00..376.40
rows=22240 width=16) (actual time=0.003..10.149 rows=22240 loops=359)"
"              ->  Hash  (cost=952.02..952.02 rows=424 width=8) (actual
time=0.779..0.779 rows=424 loops=1)"
"                    ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.108..0.513 rows=424
loops=1)"
"                          Recheck Cond: (merchant_id = 5631)"
"                          ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.078..0.078 rows=424 loops=1)"
"                                Index Cond: (merchant_id = 5631)"
"        ->  Hash  (cost=2.20..2.20 rows=20 width=43) (actual
time=0.033..0.033 rows=20 loops=1)"
"              ->  Seq Scan on banner_types  (cost=0.00..2.20 rows=20
width=43) (actual time=0.004..0.017 rows=20 loops=1)"
"Total runtime: 8233.710 ms"

I noticed that the recheck condition looks a bit weird:

Recheck Cond: ((merchant_id = 5631) AND (merchant_id = 5631))

You would think that PG (8.2.3) would be smart enough to optimize this away.
Also the estimate of the nested loop left join and the actual results are
way off. I tried increasing the statistics of both public.banners_links.id
and banners_banner_types.banner_id (to the highest value 1000), analyzed,
vacuum analyzed and did a vacuum full, but without any improvements.

Anyway, when I remove the redundant sub query the code becomes:

SELECT
    *
FROM
    /* SUBQUERY banners */ (
        SELECT
            *
        FROM
            banners_links
        WHERE
            merchant_id = 5631 AND
            banners_links.status = 0
    ) AS banners

        LEFT OUTER JOIN

    /* SUBQUERY types */ (
        SELECT
            banner_types.id         AS type_id,
            banner_types.type         AS type,
            banners_banner_types.banner_id     AS id
        FROM
            banner_types,banners_banner_types
        WHERE
            banners_banner_types.banner_id IN /* SUBQUERY */ (
                SELECT
                    id
                FROM
                    banners_links
                WHERE
                    merchant_id = 5631
            ) AND
            banners_banner_types.type_id = banner_types.id
    ) AS types

    USING (id)

With this query, the execution time drops from 8 seconds to a mere 297 ms!
The plan now looks as follows:

"Hash Left Join  (cost=1449.99..2392.68 rows=2 width=714) (actual
time=24.257..25.292 rows=553 loops=1)"
"  Hash Cond: (public.banners_links.id = banners_banner_types.banner_id)"
"  ->  Bitmap Heap Scan on banners_links  (cost=11.43..954.03 rows=2
width=671) (actual time=0.122..0.563 rows=359 loops=1)"
"        Recheck Cond: (merchant_id = 5631)"
"        Filter: ((status)::text = '0'::text)"
"        ->  Bitmap Index Scan on banners_links_merchant_id_idx
(cost=0.00..11.43 rows=424 width=0) (actual time=0.086..0.086 rows=424
loops=1)"
"              Index Cond: (merchant_id = 5631)"
"  ->  Hash  (cost=1432.13..1432.13 rows=514 width=51) (actual
time=24.128..24.128 rows=658 loops=1)"
"        ->  Hash Join  (cost=959.77..1432.13 rows=514 width=51) (actual
time=1.714..23.606 rows=658 loops=1)"
"              Hash Cond: (banners_banner_types.type_id = banner_types.id)"
"              ->  Hash IN Join  (cost=957.32..1422.52 rows=540 width=16)
(actual time=1.677..22.811 rows=658 loops=1)"
"                    Hash Cond: (banners_banner_types.banner_id =
public.banners_links.id)"
"                    ->  Seq Scan on banners_banner_types
(cost=0.00..376.40 rows=22240 width=16) (actual time=0.005..10.306
rows=22240 loops=1)"
"                    ->  Hash  (cost=952.02..952.02 rows=424 width=8)
(actual time=0.772..0.772 rows=424 loops=1)"
"                          ->  Bitmap Heap Scan on banners_links
(cost=11.54..952.02 rows=424 width=8) (actual time=0.105..0.510 rows=424
loops=1)"
"                                Recheck Cond: (merchant_id = 5631)"
"                                ->  Bitmap Index Scan on
banners_links_merchant_id_idx  (cost=0.00..11.43 rows=424 width=0) (actual
time=0.077..0.077 rows=424 loops=1)"
"                                      Index Cond: (merchant_id = 5631)"
"              ->  Hash  (cost=2.20..2.20 rows=20 width=43) (actual
time=0.032..0.032 rows=20 loops=1)"
"                    ->  Seq Scan on banner_types  (cost=0.00..2.20 rows=20
width=43) (actual time=0.004..0.018 rows=20 loops=1)"
"Total runtime: 25.602 ms"

We see that instead of a nested loop left join the planner now chooses a
Hash Left Join. I'm not really an expert in this matter and would like some
more insight into what's happening  here. Naively I would say that a planner
would have to be smart enough to see this by itself?

Thanks in advance for all hints.

_________________________________________________________________
Talk with your online friends with Messenger
http://www.join.msn.com/messenger/overview



pgsql-performance by date:

From: Tom Lane
Date:
Subject: Re: Odd problem with planner choosing seq scan
From: Ulrich Cech
Date:
Subject: Re: Large objetcs performance