On Fri, 31 Mar 2023 14:06:11 +0200
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> > [...]
> > >> Hmmm, not sure is WARNING is a good approach, but I don't have a better
> > >> idea at the moment.
> > >
> > > I stepped it down to NOTICE and added some more infos.
> > >
> > > [...]
> > > NOTICE: Growing number of hash batch to 32768 is exhausting allowed
> > > memory (137494656 > 2097152)
> > [...]
> >
> > OK, although NOTICE that may actually make it less useful - the default
> > level is WARNING, and regular users are unable to change the level. So
> > very few people will actually see these messages.
[...]
> Anyway, maybe this should be added in the light of next patch, balancing
> between increasing batches and allowed memory. The WARNING/LOG/NOTICE message
> could appears when we actually break memory rules because of some bad HJ
> situation.
So I did some more minor editions to the memory context patch and start working
on the balancing memory patch. Please, find in attachment the v4 patch set:
* 0001-v4-Describe-hybrid-hash-join-implementation.patch:
Adds documentation written by Melanie few years ago
* 0002-v4-Allocate-hash-batches-related-BufFile-in-a-dedicated.patch:
The batches' BufFile dedicated memory context patch
* 0003-v4-Add-some-debug-and-metrics.patch:
A pure debug patch I use to track memory in my various tests
* 0004-v4-Limit-BufFile-memory-explosion-with-bad-HashJoin.patch
A new and somewhat different version of the balancing memory patch, inspired
from Tomas work.
After rebasing Tomas' memory balancing patch, I did some memory measures
to answer some of my questions. Please, find in attachment the resulting charts
"HJ-HEAD.png" and "balancing-v3.png" to compare memory consumption between HEAD
and Tomas' patch. They shows an alternance of numbers before/after calling
ExecHashIncreaseNumBatches (see the debug patch). I didn't try to find the
exact last total peak of memory consumption during the join phase and before
all the BufFiles are destroyed. So the last number might be underestimated.
Looking at Tomas' patch, I was quite surprised to find that data+bufFile
actually didn't fill memory up to spaceAllowed before splitting the batches and
rising the memory limit. This is because the patch assume the building phase
consume inner and outer BufFiles equally, where only the inner side is really
allocated. That's why the peakMB value is wrong compared to actual bufFileMB
measured.
So I worked on the v4 patch were BufFile are accounted in spaceUsed. Moreover,
instead of rising the limit and splitting the batches in the same step, the
patch first rise the memory limit if needed, then split in a later call if we
have enough room. The "balancing-v4.png" chart shows the resulting memory
activity. We might need to discuss the proper balancing between memory
consumption and batches.
Note that the patch now log a message when breaking the work_mem. Eg.:
WARNING: Hash Join node must grow outside of work_mem
DETAIL: Rising memory limit from 4194304 to 6291456
HINT: You might need to ANALYZE your table or tune its statistics collection.
Regards,