Re: [ADMIN] question on hash joins - Mailing list pgsql-admin

From Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]
Subject Re: [ADMIN] question on hash joins
Date
Msg-id E46ADDFD-C631-441C-9859-7A956CFBCACE@nasa.gov
Whole thread Raw
In response to Re: [ADMIN] question on hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [ADMIN] question on hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin
Thanks Tom,

Given that the hash would only contain keys and values needed for supporting
the query I am having a hard time understanding why I am exceeding the 
10 GB temp_file_limit.


In my case the hash estimates 660888 rows from table1
and I verified that is accurate.

In theory, the hash only contains a bigint key, bigint value, and some overhead.
But even if I assume 50 bytes per row that gives a total size of
33,044,400 bytes for the hash which is way below the 10 GB limit.

What could be causing the temp_file growth?


select granuleid
from table2
where granuleid in (select granuleid from table1 where collectionid = 22467);
ERROR:  temporary file size exceeds temp_file_limit (10485760kB)

explain select granuleid
from table2
where granuleid in (select granuleid from table1 where collectionid = 22467);
          QUERY PLAN                                                       
 

------------------------------------------------------------------------------------------------------------------------Hash
Join (cost=597264.56..18687640.08 rows=644348762 width=8)  Hash Cond: (table2.granuleid = table1.granuleid)  ->  Seq
Scanon table2  (cost=0.00..10598010.62 rows=644348762 width=8)  ->  Hash  (cost=586509.04..586509.04 rows=655562
width=8)       ->  Index Only Scan using idx_tabl1 on table1  (cost=0.57..586509.04 rows=655562 width=8)
IndexCond: (collectionid = '22467'::bigint)
 
(6 rows)


Bob


On 10/19/17, 10:14 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
   "Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@nasa.gov> writes:   > Sorry if I am being
dense,but I still have a question…   > Is it possible for me to estimate the size of the hash and a value for    > the
temp_file_limitsetting using information in the explain plan?      Well, it'd be (row_overhead + data_width) *
number_of_rows.     Poking around in the source code, it looks like the row_overhead in   a tuplestore temp file is 10
bytes(can be more if you have nulls in   the data).  Your example seemed to be storing one bigint column,   so
data_widthis 8 bytes.  data_width can be a fairly squishy thing   to estimate if the data being passed through the join
involvesvariable-   width columns, but the planner's number is usually an OK place to start.      > For example, one
possibilityis that the hash contains the entire tuple for each   > matching row.      No, it's just the columns that
needto be used in or passed through the   join.  If you want to be clear about this you can use EXPLAIN VERBOSE   and
checkwhat columns are emitted by the plan node just below the hash.                  regards, tom lane   
 


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

pgsql-admin by date:

Previous
From: Tom Lane
Date:
Subject: Re: [ADMIN] question on hash joins
Next
From: Tom Lane
Date:
Subject: Re: [ADMIN] question on hash joins