Thread: Question on explain

Question on explain

From
Enrico Pirozzi
Date:
Hi all,

I have 2 tables:

table1 (field1,field2,.....) 
table2 (field1,field2,field3,.....)

field1 is an uuid type
field2 is a timestamp with time zone type.

If I execute:

explain (analyze on, timing off)
select B.field3,A.field1,A.field2
FROM table1  A INNER JOIN table2 B
using (field1,field2)

the query plan is


                                                QUERY PLAN                                                 
   
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=137324.20..247200.77 rows=1 width=31) (actual rows=1136175 loops=1)
   Hash Cond: ((b.field1 = a.field1) AND (b.field2 = a.field2))
   ->  Seq Scan on table2 b  (cost=0.00..49694.75 rows=1136175 width=31) (actual rows=1136175 loops
=1)
   ->  Hash  (cost=89610.68..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1)
         Buckets: 65536  Batches: 4  Memory Usage: 15699kB
         ->  Seq Scan on table1 a  (cost=0.00..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1
)
 Total runtime: 5055.118 ms
(7 rows)

My question is
Why Have I a rows=1 in the explain and rows=1136175 in the explain analyze?

I tried to tune Planner Cost Constants as

default_statistics_target (integer)

and / or

from_collapse_limit (integer)
join_collapse_limit (integer)

but the query plan has been the same.

Can anyone help me ?
Thanks,

Enrico

--
That's one small step for man; one giant leap for mankind

Enrico Pirozzi
Tel.  +39 0861 1855771 - Mob.+39 328 4164437 - Fax  +39 0861 1850310
http://www.pgtraining.com <http://www.pgtraining.com-> -
info@pgtraining.com
www.enricopirozzi.info - info@enricopirozzi.info
Skype sscotty71 - Gtalk sscotty71@gmail.com

Re: Question on explain

From
Jeff Janes
Date:
On Sat, Jun 8, 2013 at 8:06 AM, Enrico Pirozzi <sscotty71@gmail.com> wrote:
Hi all,

I have 2 tables:

table1 (field1,field2,.....) 
table2 (field1,field2,field3,.....)

field1 is an uuid type

Is it a primary key?  Is there a foreign key constraint between them?
 
field2 is a timestamp with time zone type.

If I execute:

explain (analyze on, timing off)
select B.field3,A.field1,A.field2
FROM table1  A INNER JOIN table2 B
using (field1,field2)

the query plan is


                                                QUERY PLAN                                                 
   
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=137324.20..247200.77 rows=1 width=31) (actual rows=1136175 loops=1)
   Hash Cond: ((b.field1 = a.field1) AND (b.field2 = a.field2))
   ->  Seq Scan on table2 b  (cost=0.00..49694.75 rows=1136175 width=31) (actual rows=1136175 loops
=1)
   ->  Hash  (cost=89610.68..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1)
         Buckets: 65536  Batches: 4  Memory Usage: 15699kB
         ->  Seq Scan on table1 a  (cost=0.00..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1
)
 Total runtime: 5055.118 ms
(7 rows)

My question is
Why Have I a rows=1 in the explain and rows=1136175 in the explain analyze?

Yes, that seems quite strange.  Did you analyze the tables (not explain analyze, but analyze itself)?
 

I tried to tune Planner Cost Constants as

default_statistics_target (integer)

Did you re-analyze the tables after changing that?


and / or 

from_collapse_limit (integer)
join_collapse_limit (integer)

I would not expect those to matter much for your query.

Cheers,

Jeff

Re: Question on explain

From
Jeff Janes
Date:
On Saturday, June 8, 2013, Jeff Janes wrote:
On Sat, Jun 8, 2013 at 8:06 AM, Enrico Pirozzi <sscotty71@gmail.com> wrote:
Hi all,

I have 2 tables:

table1 (field1,field2,.....) 
table2 (field1,field2,field3,.....)

field1 is an uuid type

Is it a primary key?  Is there a foreign key constraint between them?
 
field2 is a timestamp with time zone type.

If I execute:

explain (analyze on, timing off)
select B.field3,A.field1,A.field2
FROM table1  A INNER JOIN table2 B
using (field1,field2)

the query plan is


                                                QUERY PLAN                                                 
   
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=137324.20..247200.77 rows=1 width=31) (actual rows=1136175 loops=1)
   Hash Cond: ((b.field1 = a.field1) AND (b.field2 = a.field2))
   ->  Seq Scan on table2 b  (cost=0.00..49694.75 rows=1136175 width=31) (actual rows=1136175 loops
=1)
   ->  Hash  (cost=89610.68..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1)
         Buckets: 65536  Batches: 4  Memory Usage: 15699kB
         ->  Seq Scan on table1 a  (cost=0.00..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1
)
 Total runtime: 5055.118 ms
(7 rows)

My question is
Why Have I a rows=1 in the explain and rows=1136175 in the explain analyze?

Yes, that seems quite strange.

On further thought, that is not strange at all.  You have two very selective join conditions, and the planner assumes they are independent, so that it can multiply the selectivities.  But in reality they are completely (or almost completely) dependent.   If the planner knew about cross column correlations, that might not even help as you can have complete statistical dependence without having correlation.

It seems unlikely to me that the timestamp belongs in both tables, since it's value seems to be completely dependent on the value of the UUID.

 
In any event, it is unlikely the planner would pick a different plan were it to correctly understand the selectivities, so no harm is done.  Although it is easy to imagine similar queries where that would not be the case.

Cheers,

Jeff

Re: Question on explain

From
Enrico Pirozzi
Date:
Il 10/06/2013 04:19, Jeff Janes ha scritto:
> On further thought, that is not strange at all.  You have two very
> selective join conditions, and the planner assumes they are independent, so
> that it can multiply the selectivities.  But in reality they are completely
> (or almost completely) dependent.   If the planner knew about cross column
> correlations, that might not even help as you can have complete statistical
> dependence without having correlation.
>
> It seems unlikely to me that the timestamp belongs in both tables, since
> it's value seems to be completely dependent on the value of the UUID.
>
>
> In any event, it is unlikely the planner would pick a different plan were
> it to correctly understand the selectivities, so no harm is done.  Although
> it is easy to imagine similar queries where that would not be the case.
>
> Cheers,
>
> Jeff


Thank you very much ;)

Enrico

--
That's one small step for man; one giant leap for mankind

Enrico Pirozzi
Tel.  +39 0861 1855771 - Mob.+39 328 4164437 - Fax  +39 0861 1850310
http://www.pgtraining.com  - info@pgtraining.com
www.enricopirozzi.info - info@enricopirozzi.info
Skype sscotty71 - Gtalk sscotty71@gmail.com


Attachment