[ADMIN] question on hash joins - Mailing list pgsql-admin

From Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]
Subject [ADMIN] question on hash joins
Date
Msg-id 63F14C4C-1518-4EAA-9EBE-87222988F073@nasa.gov
Whole thread Raw
Responses Re: [ADMIN] question on hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin

Hi All,

 

I have a question about hash joins and the meaning of the width in the explain output for a query.

 

I have two large tables table1 has 1 million rows and table2 has 600 million rows

 

When I try to join these two tables based using one constraint on table1

(which reduces the candidate rows down to 660,888) the optimizer seems to correctly

choose to hash the values from table1.

 

 

explain select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;

                                                          QUERY PLAN                                                          

-------------------------------------------------------------------------------------------------------------------------------

Aggregate  (cost=18200480.80..18200480.81 rows=1 width=8)

   ->  Hash Join  (cost=103206.82..18190602.43 rows=3951347 width=0)

         Hash Cond: (x.granuleid = g.granuleid)

         ->  Seq Scan on table2 x  (cost=0.00..10596253.01 rows=644241901 width=8)

         ->  Hash  (cost=92363.72..92363.72 rows=660888 width=8)

               ->  Index Only Scan using idx_table1 on table1 g  (cost=0.57..92363.72 rows=660888 width=8)

                     Index Cond: (collectionid = '22467'::bigint)

(7 rows)

 

 

 

My question is, what gets put into the Hash?

I assume the with "width=8" must refer to the size of the key.

Does the entire row get copied to the hash as the corresponding value?

 

 

The reason I ask is because, when I try to run the query it fails due to

temp file use over 10GB.  How do I accurately determine the amount of memory

that will be used.

 

select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;

ERROR:  temporary file size exceeds temp_file_limit (10485760 kB)

 

 

Thanks in advance,

Bob

pgsql-admin by date:

Previous
From: JaeWon Lee
Date:
Subject: Re: [ADMIN] .pgpass not working ( centos7, pgagent_96 )
Next
From: Tom Lane
Date:
Subject: Re: [ADMIN] question on hash joins