Thread: Hybrid Hash/Nested Loop joins and caching results from subplans
Hi, David:
For nodeResultCache.c :
+#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
I think it would be safer if the comparison were enclosed in parentheses (in case the macro appears in a composite condition).
+ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
+ const ResultCacheKey *key2)
Since key2 is not used, maybe name it unused_key ?
+ /* Make a guess at a good size when we're not given a valid size. */
+ if (size == 0)
+ size = 1024;
Should the default size be logged ?
+ /* Update the memory accounting */
+ rcstate->mem_used -= freed_mem;
Maybe add an assertion that mem_used is >= 0 after the decrement (there is an assertion in remove_cache_entry; however, that assertion is after another decrement).
+ * 'specialkey', if not NULL, causes the function to return false if the entry
+ * entry which the key belongs to is removed from the cache.
duplicate entry (one at the end of first line and one at the beginning of second line).
For cache_lookup(), the new key is allocated before checking whether rcstate->mem_used > rcstate->mem_upperlimit. It seems new entries should probably have the same size.
Can we check whether the upper limit would be crossed (assuming the addition of the new entry) before allocating the new entry ?
+ if (unlikely(!cache_reduce_memory(rcstate, key)))
+ return NULL;
Does the new entry need to be released in the above case?
Cheers
There are two blocks with almost identical code (second occurrence in cache_store_tuple):
+ if (rcstate->mem_used > rcstate->mem_upperlimit)
+ {
It would be nice if the code can be extracted to a method and shared.
node->rc_status = RC_END_OF_SCAN;
return NULL;
}
else
There are several places where the else keyword for the else block can be omitted because the if block ends with return.
This would allow the code in the else block to move leftward (for easier reading).
if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
I noticed that right_hashfn isn't used. Would this cause some warning from the compiler (for some compilers the warning would be treated as an error) ?
Maybe NULL can be passed as the last parameter. The return value of get_op_hash_functions would keep its current meaning (find both hash fn's).
rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
Maybe (in a subsequent patch) a GUC variable can be introduced for tuning the constant 0.98.
For +paraminfo_get_equal_hashops :
+ else
+ Assert(false);
Adding an elog would be good for debugging.
Cheers
On Fri, Dec 4, 2020 at 5:09 PM Zhihong Yu <zyu@yugabyte.com> wrote:
Thanks for having a look at this.

On Sat, 5 Dec 2020 at 14:08, Zhihong Yu <zyu@yugabyte.com> wrote:
> +#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
>
> I think it would be safer if the comparison is enclosed in parentheses (in case the macro appears in composite condition).

That seems fair. Likely it might be nicer if simplehash.h played it safe and put usages of the macro in additional parentheses. I see a bit of a mix of other places where we #define SH_EQUAL. It looks like the ones in execGrouping.c and tidbitmap.c are also missing the additional parentheses.

> +ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
> + const ResultCacheKey *key2)
>
> Since key2 is not used, maybe name it unused_key ?

I'm not so sure it's a great change. The only place where people see that is in the same area that mentions "'key2' is never used".

> + /* Make a guess at a good size when we're not given a valid size. */
> + if (size == 0)
> + size = 1024;
>
> Should the default size be logged ?

I'm not too sure if I know what you mean here. Should it be a power of 2? It is. Or do you mean I shouldn't use a magic number?

> + /* Update the memory accounting */
> + rcstate->mem_used -= freed_mem;
>
> Maybe add an assertion that mem_used is >= 0 after the decrement (there is an assertion in remove_cache_entry; however, that assertion is after another decrement).

Good idea.

> + * 'specialkey', if not NULL, causes the function to return false if the entry
> + * entry which the key belongs to is removed from the cache.
>
> duplicate entry (one at the end of first line and one at the beginning of second line).

Well spotted.

> For cache_lookup(), new key is allocated before checking whether rcstate->mem_used > rcstate->mem_upperlimit. It seems new entries should probably have the same size.
> Can we check whether upper limit is crossed (assuming the addition of new entry) before allocating new entry ?

I'd like to leave this as is. If we were to check if we've gone over memory budget before the resultcache_insert() then we're doing a memory check even for cache hits. That's additional effort. I'd prefer only to check if we've gone over the memory budget in cases where we've actually allocated more memory. In each case we can go one allocation over budget, so I don't see what doing the check beforehand gives us.

> + if (unlikely(!cache_reduce_memory(rcstate, key)))
> + return NULL;
>
> Does the new entry need to be released in the above case?

No. cache_reduce_memory returning false will have removed "key" from the cache.

I'll post an updated patch on the main thread once I've looked at your followup review.

David
> + /* Make a guess at a good size when we're not given a valid size. */
> + if (size == 0)
> + size = 1024;
>
> Should the default size be logged ?
> I'm not too sure if I know what you mean here. Should it be a power of
> 2? It is. Or do you mean I shouldn't use a magic number?
Using 1024 seems to be fine. I meant logging the default value of 1024 so that a user / dev can have a better idea of the actual value used (without looking at the code).
>> Or do you think 98% is not a good number?
Since you have played with Result Cache, I would trust your judgment in choosing the percentage.
It is fine not to expose this constant until the need arises.
Cheers
On Sun, Dec 6, 2020 at 5:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Sat, 5 Dec 2020 at 16:51, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> There are two blocks with almost identical code (second occurrence in cache_store_tuple):
>
> + if (rcstate->mem_used > rcstate->mem_upperlimit)
> + {
> It would be nice if the code can be extracted to a method and shared.
It's true, they're *almost* identical. I quite like the fact that one
of the cases can have an unlikely() macro in there. It's pretty
unlikely that we'd go into cache overflow just when adding a new cache
entry. work_mem would likely have to be set to a couple of dozen bytes
for that to happen. 64k is the lowest it can be set. However, I
didn't really check to see if having that unlikely() macro increases
performance. I've changed things locally here to add a new function
named cache_check_mem(). I'll keep that for now, but I'm not sure if
there's enough code there to warrant a function. The majority of the
additional lines are from the comment being duplicated.
> node->rc_status = RC_END_OF_SCAN;
> return NULL;
> }
> else
>
> There are several places where the else keyword for the else block can be omitted because the if block ends with return.
> This would allow the code in the else block to move leftward (for easier reading).
OK, I've removed the "else" in places where it can be removed.
> if (!get_op_hash_functions(hashop, &left_hashfn, &right_hashfn))
>
> I noticed that right_hashfn isn't used. Would this cause some warning from the compiler (for some compilers the warning would be treated as an error) ?
> Maybe NULL can be passed as the last parameter. The return value of get_op_hash_functions would keep the current meaning (find both hash fn's).
It's fine not to use output parameters. It's not the only one in the
code base ignoring one from that very function. See
execTuplesHashPrepare().
> rcstate->mem_lowerlimit = rcstate->mem_upperlimit * 0.98;
>
> Maybe (in a subsequent patch) a GUC variable can be introduced for tuning the constant 0.98.
I don't think exposing such knobs for users to adjust is a good idea.
Can you think of a case where you'd want to change it? Or do you think
98% is not a good number?
>
> For +paraminfo_get_equal_hashops :
>
> + else
> + Assert(false);
I'm keen to leave it like it is. I don't see any need to bloat the
compiled code with an elog(ERROR).
There's a comment in RelOptInfo.lateral_vars that mentions:
/* LATERAL Vars and PHVs referenced by rel */
So, if anyone, in the future, wants to add some other node type to
that list then they'll have a bit more work to do. Plus, I'm only
doing the same as what's done in create_lateral_join_info().
I'll run the updated patch which includes the cache_check_mem()
function for a bit and post an updated patch to the main thread a bit
later.
Thanks for having a look at this patch.
David
>> just removing the logic that has the
>> soft upper limit and just have it do cache evictions after each
>> allocation after the cache first fills
Yeah - having one fewer limit would simplify the code.
Cheers
On Mon, Dec 7, 2020 at 5:27 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 7 Dec 2020 at 14:25, Zhihong Yu <zyu@yugabyte.com> wrote:
>
> > + /* Make a guess at a good size when we're not given a valid size. */
> > + if (size == 0)
> > + size = 1024;
> >
> > Should the default size be logged ?
>
> > I'm not too sure if I know what you mean here. Should it be a power of
> > 2? It is. Or do you mean I shouldn't use a magic number?
>
> Using 1024 seems to be fine. I meant logging the default value of 1024 so that a user / dev can have a better idea of the actual value used (without looking at the code).
Oh, right. In EXPLAIN ANALYZE. Good point. I wonder if that's going
to be interesting enough to show.
> >> Or do you think 98% is not a good number?
>
> Since you have played with Result Cache, I would trust your judgment in choosing the percentage.
> It is fine not to expose this constant until the need arises.
I did some experimentation with different values on a workload that
never gets a cache hit and just always evicts the oldest entry.
There's only very slight changes in performance between 90%, 98% and
100% with 1MB work_mem.
Times in milliseconds, measured over 60 seconds on each run:

          90%    98%    100%
run1      2318   2339   2344
run2      2339   2333   2309
run3      2357   2339   2346
avg (ms)  2338   2337   2333
Perhaps this is an argument for just removing the logic that has the
soft upper limit and just have it do cache evictions after each
allocation after the cache first fills.
Setup: same tables as [1]
alter table hundredk alter column hundredk set (n_distinct = 10);
analyze hundredk;
alter system set work_mem = '1MB';
select pg_reload_conf();
Query
select count(*) from hundredk hk inner join lookup l on hk.hundredk = l.a;
David
[1] https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com