Re: Hybrid Hash/Nested Loop joins and caching results from subplans - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Hybrid Hash/Nested Loop joins and caching results from subplans |
Date | |
Msg-id | CAApHDvqu70VQFEz28GKgHWFXvXLQBdujZ5RuEzfroPh6+eH2QA@mail.gmail.com Whole thread Raw |
In response to | Re: Hybrid Hash/Nested Loop joins and caching results from subplans (Zhihong Yu <zyu@yugabyte.com>) |
Responses |
Re: Hybrid Hash/Nested Loop joins and caching results from subplans
|
List | pgsql-hackers |
On Sat, 5 Dec 2020 at 16:51, Zhihong Yu <zyu@yugabyte.com> wrote: > > There are two blocks with almost identical code (second occurrence in cache_store_tuple): > > + if (rcstate->mem_used > rcstate->mem_upperlimit) > + { > It would be nice if the code can be extracted to a method and shared. It's true, they're *almost* identical. I quite like the fact that one of the cases can have an unlikely() macro in there. It's pretty unlikely that we'd go into cache overflow just when adding a new cache entry. work_mem would likely have to be set to a couple of dozen bytes for that to happen. 64k is the lowest it can be set. However, I didn't really check to see if having that unlikely() macro increases performance. I've changed things locally here to add a new function named cache_check_mem(). I'll keep that for now, but I'm not sure if there's enough code there to warrant a function. The majority of the additional lines are from the comment being duplicated. > node->rc_status = RC_END_OF_SCAN; > return NULL; > } > else > > There are several places where the else keyword for else block can be omitted because the if block ends with return. > This would allow the code in else block to move leftward (for easier reading). OK, I've removed the "else" in places where it can be removed. > if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn)) > > I noticed that right_hashfn isn't used. Would this cause some warning from the compiler (for some compiler the warningwould be treated as error) ? > Maybe NULL can be passed as the last parameter. The return value of get_op_hash_functions would keep the current meaning(find both hash fn's). It's fine not to use output parameters. It's not the only one in the code base ignoring one from that very function. See execTuplesHashPrepare(). > rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98; > > Maybe (in subsequent patch) GUC variable can be introduced for tuning the constant 0.98. I don't think exposing such knobs for users to adjust is a good idea. Can you think of a case where you'd want to change it? Or do you think 98% is not a good number? > > For +paraminfo_get_equal_hashops : > > + else > + Assert(false); I'm keen to leave it like it is. I don't see any need to bloat the compiled code with an elog(ERROR). There's a comment in RelOptInfo.lateral_vars that mentions: /* LATERAL Vars and PHVs referenced by rel */ So, if anyone, in the future, wants to add some other node type to that list then they'll have a bit more work to do. Plus, I'm only doing the same as what's done in create_lateral_join_info(). I'll run the updated patch which includes the cache_check_mem() function for a bit and post an updated patch to the main thread a bit later. Thanks for having a look at this patch. David
pgsql-hackers by date: