Re: Hybrid Hash/Nested Loop joins and caching results from subplans - Mailing list pgsql-hackers

From David Rowley
Subject Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date
Msg-id CAApHDvpeTHnD=zQ41HKL829R7JCvxZRVyyRfNR5bexL-x=s6Nw@mail.gmail.com
Whole thread Raw
In response to Hybrid Hash/Nested Loop joins and caching results from subplans  (Zhihong Yu <zyu@yugabyte.com>)
List pgsql-hackers
Thanks for having a look at this.

On Sat, 5 Dec 2020 at 14:08, Zhihong Yu <zyu@yugabyte.com> wrote:
> +#define SH_EQUAL(tb, a, b) ResultCacheHash_equal(tb, a, b) == 0
>
> I think it would be safer if the comparison is enclosed in parentheses (in case the macro appears in composite
condition).

That seems fair.  Likely it might be nicer if simplehash.h played it
safe and put usages of the macro in additional parenthesis.  I see a
bit of a mix of other places where we #define SH_EQUAL. It looks like
the one in execGrouping.c and tidbitmap.c are also missing the
additional parenthesis.

> +ResultCacheHash_equal(struct resultcache_hash *tb, const ResultCacheKey *key1,
> +                     const ResultCacheKey *key2)
>
> Since key2 is not used, maybe name it unused_key ?

I'm not so sure it's a great change.  The only place where people see
that is in the same area that mentions " 'key2' is never used"

> +   /* Make a guess at a good size when we're not given a valid size. */
> +   if (size == 0)
> +       size = 1024;
>
> Should the default size be logged ?

I'm not too sure if I know what you mean here. Should it be a power of
2? It is.  Or do you mean I shouldn't use a magic number?

> +   /* Update the memory accounting */
> +   rcstate->mem_used -= freed_mem;
>
> Maybe add an assertion that mem_used is >= 0 after the decrement (there is an assertion in remove_cache_entry
however,that assertion is after another decrement).
 

Good idea.

> + * 'specialkey', if not NULL, causes the function to return false if the entry
> + * entry which the key belongs to is removed from the cache.
>
> duplicate entry (one at the end of first line and one at the beginning of second line).

Well spotted.

> For cache_lookup(), new key is allocated before checking whether rcstate->mem_used > rcstate->mem_upperlimit. It
seemsnew entries should probably have the same size.
 
> Can we check whether upper limit is crossed (assuming the addition of new entry) before allocating new entry ?

I'd like to leave this as is. If we were to check if we've gone over
memory budget before the resultcache_insert() then we're doing a
memory check even for cache hits. That's additional effort. I'd prefer
only to check if we've gone over the memory budget in cases where
we've actually allocated more memory.

In each case we can go one allocation over budget, so I don't see what
doing the check beforehand gives us.

> +       if (unlikely(!cache_reduce_memory(rcstate, key)))
> +           return NULL;
>
> Does the new entry need to be released in the above case?

No. cache_reduce_memory returning false will have removed "key" from the cache.

I'll post an updated patch on the main thread once I've looked at your
followup review.

David



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Proposed patch for key managment
Next
From: "Brian Davis"
Date:
Subject: Re: Consider parallel for lateral subqueries with limit