Hash Join performance - Mailing list pgsql-performance

From Vamsidhar Thummala
Subject Hash Join performance
Date
Msg-id e0e3da5e0903131415k38dc870bo15c6ad2625cd922b@mail.gmail.com
Whole thread Raw
Responses Re: Hash Join performance  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance

From the documentation, I understand that range of actual time represents the time taken for retrieving the first result and the last result respectively. However, the following output of explain analyze confuses me:
GroupAggregate  (cost=632185.58..632525.55 rows=122884 width=57) (actual time=187382.499..187383.241 rows=57 loops=1)
-> Sort (cost=632185.58..632201.78 rows=122884 width=57) (actual time=187167.792..187167.905 rows=399 loops=1)
Sort Key: orders.o_totalprice, orders.o_orderdate, customer.c_name, customer.c_custkey, orders.o_orderkey
-> Hash Join (cost=399316.78..629388.21 rows=122884 width=57) (actual time=122805.133..186107.210 rows=399 loops=1)
Hash Cond: (public.lineitem.l_orderkey = orders.o_orderkey)
-> Seq Scan on lineitem (cost=0.00..163912.71 rows=6000742 width=14) (actual time=0.022..53597.555 rows=6001215 loops=1)
-> Hash (cost=398960.15..398960.15 rows=30713 width=51) (actual time=112439.592..112439.592 rows=57 loops=1)
-> Hash Join (cost=369865.37..398960.15 rows=30713 width=51) (actual time=80638.283..111855.510 rows=57 loops=1)
Hash Cond: (orders.o_custkey = customer.c_custkey)
-> Nested Loop (cost=364201.67..391753.70 rows=30713 width=29) (actual time=75598.246..107760.054 rows=57 loops=1)
-> GroupAggregate (cost=364201.67..366634.97 rows=30713 width=14) (actual time=75427.115..96167.167 rows=57 loops=1)
Filter: (sum(l_quantity) > 300::numeric)
-> Sort (cost=364201.67..364992.54 rows=6000742 width=14) (actual time=74987.112..86289.063 rows=6001215 loops=1)
Sort Key: public.lineitem.l_orderkey
-> Seq Scan on lineitem (cost=0.00..163912.71 rows=6000742 width=14) (actual time=0.006..51153.880 rows=6001215 loops=1)
-> Index Scan using orders_pkey on orders (cost=0.00..0.81 rows=1 width=25) (actual time=169.485..173.006 rows=1 loops=57)
Index Cond: (orders.o_orderkey = "IN_subquery".l_orderkey)
-> Hash (cost=4360.96..4360.96 rows=150072 width=26) (actual time=998.378..998.378 rows=150000 loops=1)
-> Seq Scan on customer (cost=0.00..4360.96 rows=150072 width=26) (actual time=8.188..883.778 rows=150000 loops=1)
Total runtime: 187644.927 ms
(20 rows)

My settings: Memory - 1GB, Data size - 1GB, Lineitem ~ 650MB, shared_buffers: 200MB, work_mem: 1MB.
PostgreSQL version: 8.2, OS: Sun Solaris 10u4

Query: TPC-H 18, Large Volume Customer Query
Questions:

1) The actual time on Seq Scan on Lineitem shows that the first record is fetched at time 0.022ms and the last record is fetched at 53.5s. Does it mean the sequential scan is completed with-in first 53.4s (absolute time)? Or does it mean that sequential scan started at 112.43s (after build phase of Hash Join) and finished at 165.93s (112.43 + 53.5s)? My understanding is that former is true. If so, the sequential scan has to fetched around 6M records (~650MB) ahead of build phase of Hash Join, which seems surprising. Is this called prefetching at DB level? Where does the DB hold all these records? Definitely, it can't hold in shared_buffers since it's only 200MB.

2) Why is the Hash Join (top most) so slow? The hash is build over the output of subplan which produces 57 records (~20kb). We can assume that these 57 records fit into work_mem. Now, the Hash Join is producing first record at 122.8s where as the Hash build is completed at 112.4s (10.4s difference. I have seen in some cases, this gap is even worse). Also, the total time for Hash Join is 63.3s, which seems too high given that Lineitem is already in the buffer. What is happening over here?

Appreciate your help!

Regards,
~Vamsi

pgsql-performance by date:

Previous
From: "Jignesh K. Shah"
Date:
Subject: Re: Proposal of tunable fix for scalability of 8.4
Next
From: Tom Lane
Date:
Subject: Re: Hash Join performance