Re: [HACKERS] Parallel Hash take II - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: [HACKERS] Parallel Hash take II
Date:
Msg-id: 20171221084949.k36qsg4t4xpkr3rh@alap3.anarazel.de
In response to: Re: [HACKERS] Parallel Hash take II (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
Hi,

> > Not really happy with the name. ExecParallelHashTableInsert() calling
> > ExecParallelHashLoadTuple() to insert a tuple into the hashtable doesn't
> > quite strike me as right; the naming similarity to
> > ExecParallelHashTableLoad seems problematic too.
> > ExecParallelHashAllocTuple() or such?
> >
> > One could argue it'd not be a bad idea to keep a similar split as
> > dense_alloc() and memcpy() have, but I'm not really convinced by
> > myself. Hm.
>
> Yeah, the names are confusing. So:

Cool.

> > Just to make sure I understand: The "empty" phase is to protect the
> > process of the electee doing the sizing calculations etc? And the
> > reason that's not possible to do just by waiting for
> > PHJ_GROW_BATCHES_REPARTITIONING is that somebody could dynamically
> > arrive, i.e. it'd not be needed in a statically sized barrier?
>
> Yeah, it's where you wait for the serial phase above to be finished.
> [ Explanation ] Does that make sense?

Yes.

> > +	/* reset temp memory each time to avoid leaks from qual expr */
> > +	ResetExprContext(econtext);
> > +
> > +	if (ExecQual(hjclauses, econtext))
> >
> > I personally think it's better to avoid this pattern and store the
> > result of the ExecQual() in a variable, ResetExprContext() and then act
> > on the result. No need to keep memory around for longer, and for bigger
> > contexts you're more likely to have all the metadata in cache.
> >
> > I'd previously thought about introducing ExecQualAndReset() or such...
>
> Makes sense, but this is code that is identical in
> ExecScanHashBucket() so I think we should keep it this way for now,
> and explore expression context lifetime improvements in a separate
> patch? Looks like the same change could be made in other nodes too.

Ok.

> > +
> > +	/*
> > +	 * If we started up so late that the shared batches have been freed
> > +	 * already by ExecHashTableDetach(), then we are finished.
> > +	 */
> > +	if (!DsaPointerIsValid(hashtable->parallel_state->batches))
> > +		return false;
> >
> > This is really the only place that weird condition is detected? And why
> > is that possible in the first place? And if possible, shouldn't we have
> > detected earlier? Also, if possible, what prevents this to occur in a
> > way that test fails, because pstate->batches is freed, but not yet
> > reset?
>
> ExecParallelHashJoinNewBatch(), where this code appears, is generally
> the place that we discover that we're finished. Normally we discover
> that we're finished by seeing that there are no batches left to chew
> on, by inspecting the per-batch state in shmem. This weird condition
> arises when a worker starts up so late that the join is finished and
> the shmem space used to track batches has already been freed. I
> agree that that was badly explained and there was in fact something a
> bit kooky about that coding. I have now changed it so that
> ExecParallelHashEnsureBatchAccessors() detects this case and has a
> better comment to explain it, and ExecParallelHashJoinNewBatch() now
> just looks out for hashtable->batches == NULL with a comment referring
> to the other place.

Thanks for updating.

My compiler complained that ExecHashJoinImpl() might not be inlinable.
That's just because you declared it always_inline without actually
making it an inline function...

Pushed. Yeah! Congrats, this has been quite the project. I suspect
we'll find a bunch of problems, both on the planning and execution
side, but I think at this point we're much more likely to find and
resolve these in-tree vs. out of tree.

Btw, I see dsa_get_address() show up pretty prominently in profiles. I
kinda wonder if there's some cases where we could ameliorate the cost
by recognizing that a bunch of lookups are all going to reside in the
same segment.

- Andres