Re: Question on explain - Mailing list pgsql-general

From Jeff Janes
Subject Re: Question on explain
Date
Msg-id CAMkU=1wPOHktPNTz-k9bR7OGCBuM9ieyu=qd6p3n+LA7rj9mkQ@mail.gmail.com
Whole thread Raw
In response to Re: Question on explain  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Question on explain  (Enrico Pirozzi <sscotty71@gmail.com>)
List pgsql-general
On Saturday, June 8, 2013, Jeff Janes wrote:
On Sat, Jun 8, 2013 at 8:06 AM, Enrico Pirozzi <sscotty71@gmail.com> wrote:
Hi all,

I have 2 tables:

table1 (field1,field2,.....) 
table2 (field1,field2,field3,.....)

field1 is an uuid type

Is it a primary key?  Is there a foreign key constraint between them?
 
field2 is a timestamp with time zone type.

If I execute:

explain (analyze on, timing off)
select B.field3,A.field1,A.field2
FROM table1  A INNER JOIN table2 B
using (field1,field2)

the query plan is


                                                QUERY PLAN                                                 
   
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=137324.20..247200.77 rows=1 width=31) (actual rows=1136175 loops=1)
   Hash Cond: ((b.field1 = a.field1) AND (b.field2 = a.field2))
   ->  Seq Scan on table2 b  (cost=0.00..49694.75 rows=1136175 width=31) (actual rows=1136175 loops
=1)
   ->  Hash  (cost=89610.68..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1)
         Buckets: 65536  Batches: 4  Memory Usage: 15699kB
         ->  Seq Scan on table1 a  (cost=0.00..89610.68 rows=2287368 width=24) (actual rows=1143684 loops=1
)
 Total runtime: 5055.118 ms
(7 rows)

My question is
Why Have I a rows=1 in the explain and rows=1136175 in the explain analyze?

Yes, that seems quite strange.

On further thought, that is not strange at all.  You have two very selective join conditions, and the planner assumes they are independent, so that it can multiply the selectivities.  But in reality they are completely (or almost completely) dependent.   If the planner knew about cross column correlations, that might not even help as you can have complete statistical dependence without having correlation.

It seems unlikely to me that the timestamp belongs in both tables, since it's value seems to be completely dependent on the value of the UUID.

 
In any event, it is unlikely the planner would pick a different plan were it to correctly understand the selectivities, so no harm is done.  Although it is easy to imagine similar queries where that would not be the case.

Cheers,

Jeff

pgsql-general by date:

Previous
From: Rad Cirskis
Date:
Subject: pgxs question - linking c-functions to external libraries
Next
From: Craig Ringer
Date:
Subject: Re: Sample database with difficult SQL questions