Re: optimize hashjoin - Mailing list pgsql-hackers

From Kirill Reshke
Subject Re: optimize hashjoin
Date
Msg-id CALdSSPiyBaLyn-Xt7JuwO_Ph0XeE1p6tRqbQmGGe6eZw6Rgj_g@mail.gmail.com
Whole thread Raw
List pgsql-hackers


On Thu, 22 Aug 2024 at 17:08, bucoo <bucoo@sohu.com> wrote:


> 0) The patch does not apply anymore, thanks to David committing a patch

> yesterday. Attached is a patch rebased on top of current master.

That patch is based on PG17. I have now rewritten it based on the master branch and added some comments.


> 1) Wouldn't it be easier (and just as efficient) to use slots with

> TTSOpsMinimalTuple, i.e. backed by a minimal tuple?

Use diffent kind of slot, the ExecEvalExpr function will report an error.


> 2) I find the naming of the new functions a bit confusing. We now have
> the "original" functions working with slots, and then also functions
> with "Tuple" working with tuples. Seems asymmetric.

In net patch function name renamed to ExecHashTableInsertSlot and ExecHashTableInsertTuple,

also ExecParallelHashTableInsertSlotCurrentBatch and ExecParallelHashTableInsertTupleCurrentBatch.


> 3) The code in ExecHashJoinOuterGetTuple is hard to understand, it'd
> very much benefit from some comments. I'm a bit unsure if the likely()
> and unlikely() hints really help.

In new patch added some comments.

"Likely" and "unlikely" might offer only marginal help on some CPUs and might not be beneficial at all on other platforms (I think).


> 4) Is the hj_outerTupleBuffer buffer really needed / effective? I'd bet
> just using palloc() will work just as well, thanks to the AllocSet
> caching etc.

The hj_outerTupleBuffer avoid reform(materialize) tuple in non-TTSOpsMinimalTuple scenarios,

see ExecForceStoreMinimalTuple. I added some comments in new patch.


> Can you provide more information about the benchmark you did? What
> hardware, what scale, PostgreSQL configuration, which of the 22 queries
> are improved, etc.
>
> I ran TPC-H with 1GB and 10GB scales on two machines, and I see pretty
> much no difference compared to master. However, it occurred to me the
> patch only ever helps if we increase the number of batches during
> execution, in which case we need to move tuples to the right batch.

Only parallel HashJoin speed up to ~2x(all data cached in memory),

not full query, include non-parallel HashJoin.

non-parallel HashJoin only when batchs large then one will speed up,

because this patch only optimize for read batchs tuples to memory.

Hi
 
likely/unlikely usage can be justified via benchmark. Parallel HashJoin speed up still also can be verified  via benchmark. Either benchmark script or benchmark result, and it is better to provide both.


--
Best regards,
Kirill Reshke

pgsql-hackers by date:

Previous
From: Andreas Karlsson
Date:
Subject: Re: Some questions about PostgreSQL’s design.
Next
From: Robert Haas
Date:
Subject: Re: On disable_cost