Thread: Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)

Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)

From

"Mario Weilguni"

Date:

28 November 2005, 20:28:35

<p><font size="2">If the query runs slow it will be not such a problem, but I was very concerned about other queries
havingthis problem too - without knowing it. I've already rewritten the query to use IN instead of exists.<br /><br />
I'llcompile again and try it again.<br /><br /> Thanks alot!<br /><br /> Best regards,<br /> Mario Weilguni<br /><br
/><br/><br /> -----Ursprüngliche Nachricht-----<br /> Von: Tom Lane [<a
href="mailto:tgl@sss.pgh.pa.us">mailto:tgl@sss.pgh.pa.us</a>]<br/> Gesendet: Mo 28.11.2005 19:39<br /> An: Mario
Weilguni<br/> Cc: pgsql-hackers@postgresql.org<br /> Betreff: Hashjoin startup strategy (was Re: [HACKERS] Getting
differentnumber of results when using hashjoin on/off)<br /><br /> "Mario Weilguni"
<mario.weilguni@icomedias.com>writes:<br /> > Thanks for the quick response, I've tried the patch, but it did
notwork<br /> > as expected. When I set enable_hashjoin to off, everything works as<br /> > expected, but with
hashjoinon I do not even get results anymore, CPU is<br /> > going up to 100% and after 3 minutes I cancelled the
query(it normale<br /> > would take ~100-500 milliseconds).<br /><br /> Try letting it run longer.  I think your
expectationis tuned for the<br /> broken implementation (which runs the subqueries only once instead of<br /> 26k
times...)<br/><br /> The test case I developed for this failure in the regression database is<br /><br /> select
count(*)from tenk1 a<br /> where exists (select 1 from tenk1 b, tenk1 c<br />               where b.unique1=c.unique2
and<br/>               b.hundred in (4,5) and c.hundred=a.hundred+99);<br /><br /> 8.0 prefers a nestloop for the
subquery,and that plan runs in about<br /> 600 ms on my machine.  If forced to a hash join, it takes about 2450 ms.<br
/>8.1 prefers the hash join to start with, but takes 11300 ms to run it :-(<br /> (after the patch that is).<br /><br
/>The reason for the differential is that 8.1 guesses wrong about which<br /> subplan to cycle first: most of the time,
theinner plan is empty and<br /> so there's no need to pull any rows from the outer plan, but 8.1 pulls<br /> the first
rowfrom the outer plan anyway, and doing that 10000 times is<br /> what's eating the extra runtime.  It looks from your
previousmessage<br /> that similar things are happening with your data distribution, allowing<br /> 8.0 to run faster
foryou than 8.1 does.<br /><br /> Not sure if there's much we can do about this.  The presence of the<br /> upper-query
parameterin the subplan makes it difficult to derive any<br /> stats at all, let alone guess how often the subplan will
becompletely<br /> empty, so I'm not sure the planner can help.<br /><br /> For a query like this, where the hash join
isbeing done repeatedly,<br /> it might be useful for the executor itself to track how often each<br /> subplan has
beenseen to be empty.  In particular, the executor knows<br /> that the outer subplan is parameterless and therefore
shoulddeliver<br /> the same results each time (modulo volatile functions of course), so<br /> after the first cycle it
couldknow that there's no point in trying<br /> the early fetch on that side.  Dunno if this will be of wide enough<br
/>use to be worth implementing though --- in simple cases the join<br /> won't be rescanned and so the executor can't
help.<br/><br /> Anyone have any other ideas?<br /><br />                         regards, tom lane<br /><br /></font>

Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)

From

"Mario Weilguni"

Date:

29 November 2005, 08:05:19

Hello Tom,

I tried both patches on a different machine (but had to take the patches from cvs diff, cut'n paste from the
mail-programdid not work). Up until now, they work like a charm, correct results and fast. I will try on the other
machinethat failed yesterday in the afternoon, maybe it was just a problem with patching (or with the machine setup
itself,it's a hardened gentoo, but I doubt that). 

BTW: the difference of the second patch is really noticable, from 990ms down to 226ms.

Thanks for your quick response! I wish commercial vendors would act that fast :-)

Best regards,Mario Weilguni



icomedias - Digitale Kommunikation
------------------------------------------------------------------------
Mario Weilguni, Forschung und Entwicklung
mario.weilguni@icomedias.com, http://www.icomedias.com/

icomedias Österreich Systemhaus GmbH: 8020 Graz, Entenplatz 1  Tel: +43 (316) 721.671-272, Fax: -103

icomedias Deutschland Systemhaus GmbH: 10969 Berlin, Alexandrinenstraße 2-3 Tel: +49 (30) 695.399-272, Fax: -103

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, November 29, 2005 12:50 AM
To: Mario Weilguni; pgsql-hackers@postgresql.org
Subject: Re: Hashjoin startup strategy (was Re: [HACKERS] Getting different number of results when using hashjoin
on/off) 

I wrote:
> For a query like this, where the hash join is being done repeatedly,
> it might be useful for the executor itself to track how often each
> subplan has been seen to be empty.

I implemented a simple form of this, and it made 8.1 faster than 8.0
on the test case I was using.  Give it a try ...
        regards, tom lane


Index: src/backend/executor/nodeHashjoin.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeHashjoin.c,v
retrieving revision 1.75.2.2
diff -c -r1.75.2.2 nodeHashjoin.c
*** src/backend/executor/nodeHashjoin.c    28 Nov 2005 17:14:47 -0000    1.75.2.2
--- src/backend/executor/nodeHashjoin.c    28 Nov 2005 23:41:28 -0000
***************
*** 120,135 ****          * since we aren't going to be able to skip the join on the strength          * of an empty
innerrelation anyway.)          *          * The only way to make the check is to try to fetch a tuple from the
* outer plan node.  If we succeed, we have to stash it away for later          * consumption by
ExecHashJoinOuterGetTuple.         */ 
!         if (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost ||
!             node->js.jointype == JOIN_LEFT)         {             node->hj_FirstOuterTupleSlot =
ExecProcNode(outerNode);            if (TupIsNull(node->hj_FirstOuterTupleSlot))                 return NULL;         }
       else             node->hj_FirstOuterTupleSlot = NULL; 
--- 120,147 ----          * since we aren't going to be able to skip the join on the strength          * of an empty
innerrelation anyway.)          * 
+          * If we are rescanning the join, we make use of information gained
+          * on the previous scan: don't bother to try the prefetch if the
+          * previous scan found the outer relation nonempty.  This is not
+          * 100% reliable since with new parameters the outer relation might
+          * yield different results, but it's a good heuristic.
+          *          * The only way to make the check is to try to fetch a tuple from the          * outer plan node.
Ifwe succeed, we have to stash it away for later          * consumption by ExecHashJoinOuterGetTuple.          */ 
!         if (node->js.jointype == JOIN_LEFT ||
!             (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
!              !node->hj_OuterNotEmpty))         {             node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
          if (TupIsNull(node->hj_FirstOuterTupleSlot)) 
+             {
+                 node->hj_OuterNotEmpty = false;                 return NULL;
+             }
+             else
+                 node->hj_OuterNotEmpty = true;         }         else             node->hj_FirstOuterTupleSlot =
NULL;
***************
*** 159,164 ****
--- 171,183 ----          * scanning the outer relation          */         hashtable->nbatch_outstart =
hashtable->nbatch;
+
+         /*
+          * Reset OuterNotEmpty for scan.  (It's OK if we fetched a tuple
+          * above, because ExecHashJoinOuterGetTuple will immediately
+          * set it again.)
+          */
+         node->hj_OuterNotEmpty = false;     }      /*
***************
*** 454,459 ****
--- 473,479 ----     hjstate->js.ps.ps_TupFromTlist = false;     hjstate->hj_NeedNewOuter = true;
hjstate->hj_MatchedOuter= false; 
+     hjstate->hj_OuterNotEmpty = false;      return hjstate; }
***************
*** 546,551 ****
--- 566,574 ----             *hashvalue = ExecHashGetHashValue(hashtable, econtext,
         hjstate->hj_OuterHashKeys);  
+             /* remember outer relation is not empty for possible rescan */
+             hjstate->hj_OuterNotEmpty = true;
+              return slot;         }
***************
*** 810,816 ****         if (node->hj_HashTable->nbatch == 1 &&             ((PlanState *) node)->righttree->chgParam
==NULL)         { 
!             /* okay to reuse the hash table; needn't rescan inner, either */         }         else         {
--- 833,851 ----         if (node->hj_HashTable->nbatch == 1 &&             ((PlanState *) node)->righttree->chgParam
==NULL)         { 
!             /*
!              * okay to reuse the hash table; needn't rescan inner, either.
!              *
!              * What we do need to do is reset our state about the emptiness
!              * of the outer relation, so that the new scan of the outer will
!              * update it correctly if it turns out to be empty this time.
!              * (There's no harm in clearing it now because ExecHashJoin won't
!              * need the info.  In the other cases, where the hash table
!              * doesn't exist or we are destroying it, we leave this state
!              * alone because ExecHashJoin will need it the first time
!              * through.)
!              */
!             node->hj_OuterNotEmpty = false;         }         else         {
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.139.2.2
diff -c -r1.139.2.2 execnodes.h
*** src/include/nodes/execnodes.h    22 Nov 2005 18:23:28 -0000    1.139.2.2
--- src/include/nodes/execnodes.h    28 Nov 2005 23:41:28 -0000
***************
*** 1101,1106 ****
--- 1101,1107 ----  *        hj_FirstOuterTupleSlot    first tuple retrieved from outer plan  *        hj_NeedNewOuter
         true if need new outer tuple on next call  *        hj_MatchedOuter            true if found a join match for
currentouter 
+  *        hj_OuterNotEmpty        true if outer relation known not empty  * ----------------  */
***************
*** 1125,1130 ****
--- 1126,1132 ----     TupleTableSlot *hj_FirstOuterTupleSlot;     bool        hj_NeedNewOuter;     bool
hj_MatchedOuter;
+     bool        hj_OuterNotEmpty; } HashJoinState;

Re: Hashjoin startup strategy (was Re: Getting different number of results when using hashjoin on/off)

From

Mario Weilguni

Date:

29 November 2005, 13:56:30

Am Dienstag, 29. November 2005 10:05 schrieb Mario Weilguni:
> Hello Tom,
>
> I tried both patches on a different machine (but had to take the patches
> from cvs diff, cut'n paste from the mail-program did not work). Up until
> now, they work like a charm, correct results and fast. I will try on the
> other machine that failed yesterday in the afternoon, maybe it was just a
> problem with patching (or with the machine setup itself, it's a hardened
> gentoo, but I doubt that).

I've taken diff's from cvs for both files, and now it works flawlessly on both 
machines, with good response times. 

I guess I must have done something wrong when copying the patch from your 
yesterday's mail.

Thanks!

Best regardsMario