Thread: Slow inner join, but left join is fast

Slow inner join, but left join is fast

From
"Jeremy Haile"
Date:
I have a query made by joining two subqueries where the outer query
performing the join takes significantly longer to run than the two
subqueries.

The first subquery runs in 600ms.  The seconds subquery runs in 700ms.
But the outer query takes 240 seconds to run!  Both of the two
subqueries only return 8728 rows.

Changing the inner join to a left join makes the outer query run in
about 1000ms (which is great), but I don't understand why the inner join
is so slow!

I'm using PostgreSQL 8.2.1.  Any ideas?

QUERY PLAN (Inner Join) - takes 240 seconds
-------------------
Nested Loop  (cost=17.46..17.56 rows=1 width=120)
  Join Filter: ((a.merchant_dim_id = b.merchant_dim_id) AND
  (a.dcms_dim_id = b.dcms_dim_id))
  ->  HashAggregate  (cost=8.71..8.74 rows=1 width=16)
        ->  Index Scan using transaction_facts_transaction_date_idx on
        transaction_facts  (cost=0.00..8.69 rows=1 width=16)
              Index Cond: ((transaction_date >= '2007-01-09
              00:00:00'::timestamp without time zone) AND
              (transaction_date < '2007-01-09 09:30:00'::timestamp
              without time zone))
  ->  HashAggregate  (cost=8.75..8.78 rows=1 width=16)
        ->  HashAggregate  (cost=8.71..8.72 rows=1 width=55)
              ->  Index Scan using
              transaction_facts_transaction_date_idx on
              transaction_facts  (cost=0.00..8.69 rows=1 width=55)
                    Index Cond: ((transaction_date >= '2007-01-09
                    00:00:00'::timestamp without time zone) AND
                    (transaction_date < '2007-01-09 09:30:00'::timestamp
                    without time zone))


QUERY PLAN (Left Join) - takes one second
-------------------
Merge Left Join  (cost=304037.63..304064.11 rows=2509 width=120)
  Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id =
  b.merchant_dim_id))
  ->  Sort  (cost=152019.45..152025.72 rows=2509 width=64)
        Sort Key: a.dcms_dim_id, a.merchant_dim_id
        ->  HashAggregate  (cost=151771.15..151852.69 rows=2509
        width=16)
              ->  Bitmap Heap Scan on transaction_facts
              (cost=5015.12..150419.90 rows=77214 width=16)
                    Recheck Cond: ((transaction_date >= '2007-01-09
                    00:00:00'::timestamp without time zone) AND
                    (transaction_date < '2007-01-09 09:30:00'::timestamp
                    without time zone))
                    ->  Bitmap Index Scan on
                    transaction_facts_transaction_date_idx
                    (cost=0.00..4995.81 rows=77214 width=0)
                          Index Cond: ((transaction_date >= '2007-01-09
                          00:00:00'::timestamp without time zone) AND
                          (transaction_date < '2007-01-09
                          09:30:00'::timestamp without time zone))
  ->  Sort  (cost=152018.18..152020.54 rows=943 width=64)
        Sort Key: b.dcms_dim_id, b.merchant_dim_id
        ->  Subquery Scan b  (cost=151931.51..151971.59 rows=943
        width=64)
              ->  HashAggregate  (cost=151931.51..151962.16 rows=943
              width=16)
                    ->  HashAggregate  (cost=151578.11..151672.35
                    rows=9424 width=55)
                          ->  Bitmap Heap Scan on transaction_facts
                          (cost=5015.12..150419.90 rows=77214 width=55)
                                Recheck Cond: ((transaction_date >=
                                '2007-01-09 00:00:00'::timestamp without
                                time zone) AND (transaction_date <
                                '2007-01-09 09:30:00'::timestamp without
                                time zone))
                                ->  Bitmap Index Scan on
                                transaction_facts_transaction_date_idx
                                (cost=0.00..4995.81 rows=77214 width=0)
                                      Index Cond: ((transaction_date >=
                                      '2007-01-09 00:00:00'::timestamp
                                      without time zone) AND
                                      (transaction_date < '2007-01-09
                                      09:30:00'::timestamp without time
                                      zone))


QUERY
-------------------
select a.merchant_dim_id, a.dcms_dim_id,
       a.num_success, a.num_failed, a.total_transactions,
       a.success_rate,
       b.distinct_num_success, b.distinct_num_failed,
       b.distinct_total_transactions, b.distinct_success_rate
from (

-- SUBQUERY 1
select merchant_dim_id,
       dcms_dim_id,
       sum(success) as num_success,
       sum(failed) as num_failed,
       count(*) as total_transactions,
       (sum(success) * 1.0 / count(*)) as success_rate
from transaction_facts
where transaction_date >= '2007-1-9'
and transaction_date < '2007-1-9 9:30'
group by merchant_dim_id, dcms_dim_id

) as a inner join (

-- SUBQUERY 2
select merchant_dim_id,
       dcms_dim_id,
       sum(success) as distinct_num_success,
       sum(failed) as distinct_num_failed,
       count(*) as distinct_total_transactions,
       (sum(success) * 1.0 / count(*)) as distinct_success_rate
from (

  select merchant_dim_id,
         dcms_dim_id,
         serial_number,
         success,
         failed
  from transaction_facts
  where transaction_date >= '2007-1-9'
  and transaction_date < '2007-1-9 9:30'
  group by merchant_dim_id, dcms_dim_id, serial_number, success, failed

  ) as distinct_summary
group by merchant_dim_id, dcms_dim_id

) as b using(merchant_dim_id, dcms_dim_id)

Re: Slow inner join, but left join is fast

From
Tom Lane
Date:
"Jeremy Haile" <jhaile@fastmail.fm> writes:
> I have a query made by joining two subqueries where the outer query
> performing the join takes significantly longer to run than the two
> subqueries.

Please show EXPLAIN ANALYZE results, not just EXPLAIN.
Also, have you analyzed your tables recently?

            regards, tom lane

Re: Slow inner join, but left join is fast

From
"Jeremy Haile"
Date:
The table should have been analyzed, but to make sure I ran analyze on
the table before executing the explain analyze queries.  Well - problem
solved.  This time the inner join query runs quickly.

I still don't understand why the inner join would be so different from
the left join prior to the analyze.  It looks like the amount of rows
expected in the original query plan for inner join was 1 (not correct
since it was really 8728)  The left join query had the exact same
subqueries but expected 77214 rows to be returned from them, which was
still not correct but resulted in a better query plan.

After the recent analyze, here's the new inner join query plan.  I won't
bother pasting the left join plan, since it is almost identical now
(including row counts)  FYI -the result of the queries is (and always
was) identical for inner join and left join.


QUERY PLAN (inner join)
Merge Join  (cost=279457.86..279479.83 rows=43 width=120) (actual
time=626.771..670.275 rows=8728 loops=1)
  Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id =
  b.merchant_dim_id))
  ->  Sort  (cost=139717.30..139722.38 rows=2029 width=64) (actual
  time=265.669..269.878 rows=8728 loops=1)
        Sort Key: a.dcms_dim_id, a.merchant_dim_id
        ->  HashAggregate  (cost=139519.61..139585.56 rows=2029
        width=16) (actual time=211.368..247.429 rows=8728 loops=1)
              ->  Bitmap Heap Scan on transaction_facts
              (cost=4427.62..138316.05 rows=68775 width=16) (actual
              time=21.858..100.998 rows=65789 loops=1)
                    Recheck Cond: ((transaction_date >= '2007-01-09
                    00:00:00'::timestamp without time zone) AND
                    (transaction_date < '2007-01-09 09:30:00'::timestamp
                    without time zone))
                    ->  Bitmap Index Scan on
                    transaction_facts_transaction_date_idx
                    (cost=0.00..4410.42 rows=68775 width=0) (actual
                    time=21.430..21.430 rows=65789 loops=1)
                          Index Cond: ((transaction_date >= '2007-01-09
                          00:00:00'::timestamp without time zone) AND
                          (transaction_date < '2007-01-09
                          09:30:00'::timestamp without time zone))
  ->  Sort  (cost=139740.56..139742.67 rows=843 width=64) (actual
  time=361.083..365.418 rows=8728 loops=1)
        Sort Key: b.dcms_dim_id, b.merchant_dim_id
        ->  Subquery Scan b  (cost=139663.76..139699.59 rows=843
        width=64) (actual time=308.567..346.135 rows=8728 loops=1)
              ->  HashAggregate  (cost=139663.76..139691.16 rows=843
              width=16) (actual time=308.563..337.677 rows=8728 loops=1)
                    ->  HashAggregate  (cost=139347.68..139431.97
                    rows=8429 width=55) (actual time=198.093..246.591
                    rows=48942 loops=1)
                          ->  Bitmap Heap Scan on transaction_facts
                          (cost=4427.62..138316.05 rows=68775 width=55)
                          (actual time=24.080..83.988 rows=65789
                          loops=1)
                                Recheck Cond: ((transaction_date >=
                                '2007-01-09 00:00:00'::timestamp without
                                time zone) AND (transaction_date <
                                '2007-01-09 09:30:00'::timestamp without
                                time zone))
                                ->  Bitmap Index Scan on
                                transaction_facts_transaction_date_idx
                                (cost=0.00..4410.42 rows=68775 width=0)
                                (actual time=23.596..23.596 rows=65789
                                loops=1)
                                      Index Cond: ((transaction_date >=
                                      '2007-01-09 00:00:00'::timestamp
                                      without time zone) AND
                                      (transaction_date < '2007-01-09
                                      09:30:00'::timestamp without time
                                      zone))
Total runtime: 675.638 ms



On Wed, 10 Jan 2007 12:15:44 -0500, "Tom Lane" <tgl@sss.pgh.pa.us> said:
> "Jeremy Haile" <jhaile@fastmail.fm> writes:
> > I have a query made by joining two subqueries where the outer query
> > performing the join takes significantly longer to run than the two
> > subqueries.
>
> Please show EXPLAIN ANALYZE results, not just EXPLAIN.
> Also, have you analyzed your tables recently?
>
>             regards, tom lane

Re: Slow inner join, but left join is fast

From
Tom Lane
Date:
"Jeremy Haile" <jhaile@fastmail.fm> writes:
> I still don't understand why the inner join would be so different from
> the left join prior to the analyze.

Are you sure you hadn't analyzed in between?  Or maybe autovac did it
for you?  The reason for the plan change is the change from estimating
1 row matching the transaction_date range constraint, to estimating lots
of them, and the join type away up at the top would surely not have
affected that.

            regards, tom lane

Re: Slow inner join, but left join is fast

From
"Jeremy Haile"
Date:
I'm pretty sure it didn't analyze in between  - autovac is turned off
and I ran the test multiple times before posting.

But since I can't reproduce it anymore, I can't be 100% sure.  And it
certainly doesn't make sense that the estimate for the index scan would
change based on an unrelated join condition.

If I ever get it to happen again, I'll be more careful and repost if it
is a real issue.  Thanks for pointing me in the right direction!


On Wed, 10 Jan 2007 13:38:15 -0500, "Tom Lane" <tgl@sss.pgh.pa.us> said:
> "Jeremy Haile" <jhaile@fastmail.fm> writes:
> > I still don't understand why the inner join would be so different from
> > the left join prior to the analyze.
>
> Are you sure you hadn't analyzed in between?  Or maybe autovac did it
> for you?  The reason for the plan change is the change from estimating
> 1 row matching the transaction_date range constraint, to estimating lots
> of them, and the join type away up at the top would surely not have
> affected that.
>
>             regards, tom lane

Re: Slow inner join, but left join is fast

From
"Jeremy Haile"
Date:
Another random idea - does PostgreSQL do any caching of query plans?
even on the session level?

I ran these queries from the same Query window, so my idea is that maybe
the inner join plan was cached prior to an automatic analyze being run.

But I'm doubting PostgreSQL would do something like that.  And of
course, if PostgreSQL doesn't cache query plans - this idea is bogus =)


On Wed, 10 Jan 2007 13:38:24 -0500, "Jeremy Haile" <jhaile@fastmail.fm>
said:
> I'm pretty sure it didn't analyze in between  - autovac is turned off
> and I ran the test multiple times before posting.
>
> But since I can't reproduce it anymore, I can't be 100% sure.  And it
> certainly doesn't make sense that the estimate for the index scan would
> change based on an unrelated join condition.
>
> If I ever get it to happen again, I'll be more careful and repost if it
> is a real issue.  Thanks for pointing me in the right direction!
>
>
> On Wed, 10 Jan 2007 13:38:15 -0500, "Tom Lane" <tgl@sss.pgh.pa.us> said:
> > "Jeremy Haile" <jhaile@fastmail.fm> writes:
> > > I still don't understand why the inner join would be so different from
> > > the left join prior to the analyze.
> >
> > Are you sure you hadn't analyzed in between?  Or maybe autovac did it
> > for you?  The reason for the plan change is the change from estimating
> > 1 row matching the transaction_date range constraint, to estimating lots
> > of them, and the join type away up at the top would surely not have
> > affected that.
> >
> >             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

Re: Slow inner join, but left join is fast

From
Tom Lane
Date:
"Jeremy Haile" <jhaile@fastmail.fm> writes:
> Another random idea - does PostgreSQL do any caching of query plans?

Only if the client specifies it, either by PREPARE or the equivalent
protocol-level message.  I dunno what client software you were using,
but I think few if any would PREPARE behind your back.  Might be worth
checking into though, if you've eliminated autovacuum.

Actually there's another possibility --- did you create any indexes on
the table in between?  CREATE INDEX doesn't do a full stats update, but
it does count the rows and update pg_class.reltuples.  But it's hard to
believe that'd have caused as big a rowcount shift as we see here ...

            regards, tom lane

Re: Slow inner join, but left join is fast

From
"Jeremy Haile"
Date:
I did create and drop an index at some point while looking at this
issue.  But I definitely reran both of the queries (and explains) after
the index was dropped, so I don't understand why there would be a
difference between the inner and left query plans.  (which were run
back-to-back more than once)  Anyways - I'll let you know if something
similar happens again.

Thanks,
Jeremy Haile


On Wed, 10 Jan 2007 14:22:35 -0500, "Tom Lane" <tgl@sss.pgh.pa.us> said:
> "Jeremy Haile" <jhaile@fastmail.fm> writes:
> > Another random idea - does PostgreSQL do any caching of query plans?
>
> Only if the client specifies it, either by PREPARE or the equivalent
> protocol-level message.  I dunno what client software you were using,
> but I think few if any would PREPARE behind your back.  Might be worth
> checking into though, if you've eliminated autovacuum.
>
> Actually there's another possibility --- did you create any indexes on
> the table in between?  CREATE INDEX doesn't do a full stats update, but
> it does count the rows and update pg_class.reltuples.  But it's hard to
> believe that'd have caused as big a rowcount shift as we see here ...
>
>             regards, tom lane