Re: demystifying nested loop vs. merge join query plan choice - Mailing list pgsql-general

From Pavel Stehule
Subject Re: demystifying nested loop vs. merge join query plan choice
Date
Msg-id CAFj8pRDZ0Frm-S1-qMJW76zyxNDj1K6U8cSeYWJ9ZUykoR2buQ@mail.gmail.com
Whole thread Raw
In response to demystifying nested loop vs. merge join query plan choice  (Sandeep Gupta <gupta.sandeep@gmail.com>)
Responses Re: demystifying nested loop vs. merge join query plan choice
List pgsql-general
Hello

do you have same configuration?

Regards

Pavel

2013/7/31 Sandeep Gupta <gupta.sandeep@gmail.com>:
> I have two  postgres instances each with a database of same schema. The
> dataset in both is ''same'' but for randomness i.e.  both contain two tables
> pc(did) and tc(pid, did) that have almost
> same number of rows and have been generate from same distribution.
>
> However the query plan for the join turns out to be completely different: on
> one join takes 2.3 secs while on the other it takes 7 secs.
>
>
> Here are the statistics:
>
> for database 1:
>      size of tc table:  49987585
>      size of pc table: 499616
>
> join plan:
>
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=1534125.08..1534125.09 rows=1 width=0) (actual
> time=8473.296..8473.296 rows=1 loops=1)
>    ->  Merge Join  (cost=2.48..1514765.90 rows=7743672 width=0) (actual
> time=0.084..8409.065 rows=998038 loops=1)
>          Merge Cond: (pc.did = tc.did)
>          ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..12987.04
> rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1)
>                Heap Fetches: 0
>          ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..1298125.32
> rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291 loops=1)
>                Heap Fetches: 0
>  Total runtime: 8473.337 ms
> '
>
> Query Running time:  5135
>
>
> for database 2:
>   size of tc table: 50012415
>   size of pc table: 500384
>
>                                                                    QUERY
> PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=35279895.52..35279895.53 rows=1 width=0) (actual
> time=2501.970..2501.970 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..35276697.82 rows=1279080 width=0) (actual
> time=0.038..2418.766 rows=1000834 loops=1)
>          ->  Index Only Scan using pc_did_idx on pc  (cost=0.00..15224.56
> rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1)
>                Heap Fetches: 500384
>          ->  Index Only Scan using tc_did_idx on tc  (cost=0.00..70.44
> rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384)
>                Index Cond: (did = pc.did)
>                Heap Fetches: 1000834
>  Total runtime: 2502.017 ms
>
> Query running time: 2090.388 ms
>
> My question is why is the query plan so different for two datasets that are
> really exactly the same. And how can i force the plan to be nested index
> scan on
> database 1 .
>
>
> -Sandeep
>


pgsql-general by date:

Previous
From: Sandeep Gupta
Date:
Subject: demystifying nested loop vs. merge join query plan choice
Next
From: Sandeep Gupta
Date:
Subject: Re: demystifying nested loop vs. merge join query plan choice