Hou-san,
On Mon, Jun 7, 2021 at 8:38 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
> Thanks for the patch and explanation, I think you are right that it’s better add
> the rechecking-cached-offset code directly in get_partiiton_for_tuple().
>
> And now, I think maybe it's time to try to optimize the performance.
> Currently, if every row to be inserted in a statement belongs to different
> partition, then the cache check code will bring a slight performance
> degradation(AFAICS: 2% ~ 4%).
>
> So, If we want to solve this, then we may need 1) a reloption to let user control whether use the cache.
> Or, 2) introduce some simple strategy to control whether use cache automatically.
>
> I have not write a patch about 1) reloption, because I think it will be nice if we can
> enable this cache feature by default. So, I attached a WIP patch about approach 2).
>
> The rough idea is to check the average batch number every 1000 rows.
> If the average batch num is greater than 1, then we enable the cache check,
> if not, disable cache check. This is similar to what 0d5f05cde0 did.
Thanks for sharing the idea and writing a patch for it.
I considered a simpler heuristic where we enable/disable caching of a
given offset if it is found by the binary search algorithm at least N
consecutive times. But your idea to check the ratio of the number of
tuples inserted over partition/bound offset changes every N tuples
inserted may be more adaptive.
Please find attached a revised version of your patch, where I tried to
make it a bit easier to follow, hopefully. While doing so, I realized
that caching the bound offset across queries makes little sense now,
so I decided to keep the changes confined to execPartition.c. Do you
have a counter-argument to that?
--
Amit Langote
EDB: http://www.enterprisedb.com