Re: Hybrid Hash/Nested Loop joins and caching results from subplans - Mailing list pgsql-hackers

From David Rowley
Subject Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date
Msg-id CAApHDvqu70VQFEz28GKgHWFXvXLQBdujZ5RuEzfroPh6+eH2QA@mail.gmail.com
Whole thread Raw
In response to Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Zhihong Yu <zyu@yugabyte.com>)
Responses Re: Hybrid Hash/Nested Loop joins and caching results from subplans
List pgsql-hackers
On Sat, 5 Dec 2020 at 16:51, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> There are two blocks with almost identical code (second occurrence in cache_store_tuple):
>
> +   if (rcstate->mem_used > rcstate->mem_upperlimit)
> +   {
> It would be nice if the code can be extracted to a method and shared.

It's true, they're *almost* identical.  I quite like the fact that one
of the cases can have an unlikely() macro in there. It's pretty
unlikely that we'd go into cache overflow just when adding a new cache
entry. work_mem would likely have to be set to a couple of dozen bytes
for that to happen. 64k is the lowest it can be set.  However, I
didn't really check to see if having that unlikely() macro increases
performance.  I've changed things locally here to add a new function
named cache_check_mem(). I'll keep that for now, but I'm not sure if
there's enough code there to warrant a function. The majority of the
additional lines are from the comment being duplicated.

>                     node->rc_status = RC_END_OF_SCAN;
>                     return NULL;
>                 }
>                 else
>
> There are several places where the else keyword for else block can be omitted because the if block ends with return.
> This would allow the code in else block to move leftward (for easier reading).

OK, I've removed the "else" in places where it can be removed.

>        if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
>
> I noticed that right_hashfn isn't used. Would this cause some warning from the compiler (for some compiler the
warningwould be treated as error) ?
 
> Maybe NULL can be passed as the last parameter. The return value of get_op_hash_functions would keep the current
meaning(find both hash fn's).
 

It's fine not to use output parameters.  It's not the only one in the
code base ignoring one from that very function. See
execTuplesHashPrepare().

>     rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
>
> Maybe (in subsequent patch) GUC variable can be introduced for tuning the constant 0.98.

I don't think exposing such knobs for users to adjust is a good idea.
Can you think of a case where you'd want to change it? Or do you think
98% is not a good number?

>
> For +paraminfo_get_equal_hashops :
>
> +       else
> +           Assert(false);

I'm keen to leave it like it is.  I don't see any need to bloat the
compiled code with an elog(ERROR).

There's a comment in RelOptInfo.lateral_vars that mentions:

/* LATERAL Vars and PHVs referenced by rel */

So, if anyone, in the future, wants to add some other node type to
that list then they'll have a bit more work to do. Plus, I'm only
doing the same as what's done in create_lateral_join_info().

I'll run the updated patch which includes the cache_check_mem()
function for a bit and post an updated patch to the main thread a bit
later.

Thanks for having a look at this patch.

David



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Single transaction in the tablesync worker?
Next
From: Zhihong Yu
Date:
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans