Re: accounting for memory used for BufFile during hash joins - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: accounting for memory used for BufFile during hash joins
Date
Msg-id CAAKRu_aUirDu2gud_jgqdV73KKycjNXY2GZ=2+FDMeorO6iTDQ@mail.gmail.com
In response to Re: accounting for memory used for BufFile during hash joins  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: accounting for memory used for BufFile during hash joins  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers


On Mon, May 6, 2019 at 8:15 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Nope, that's not how it works. It's the array of batches that gets
> sliced, not the batches themselves.
>
> It does slightly increase the amount of data we need to shuffle between
> the temp files, because we can't write the data directly to batches in
> "future" slices. But that amplification is capped to ~2.2x (compared to
> the ~1.4x in master) - I've shared some measurements in [1].
>
> [1] https://www.postgresql.org/message-id/20190428141901.5dsbge2ka3rxmpk6%40development


Cool, I misunderstood. I looked at the code again today, and at the email
thread where you measured the "amplification".

In terms of how many times you write each tuple, is it accurate to say that a
tuple can now be spilled three times (in the worst case), whereas before it
could be spilled only twice?

1 - when building the inner-side hashtable, the tuple is spilled to a "slice" file
2 - (assuming the number of batches was increased) during execution, when a
tuple belonging to a later slice's spill file is found, it is re-spilled to that
slice's spill file
3 - during execution, when the tuple is read from its slice file, it is re-spilled
(again) to its batch's spill file
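If I have the worst case right, the write counts could be sketched like this (a
toy illustration with my own naming, not code from the patch):

```c
#include <stdbool.h>

/* Toy sketch (my own naming, not from the patch): how many times a
 * single tuple hits a temp file with slicing in place.  Without a
 * batch increase, a tuple is written to its slice file and then to
 * its batch file (two writes); a tuple that lands in a later slice
 * after a batch increase picks up one extra re-spill (three writes). */
static int
worst_case_spills(bool nbatch_increased, bool later_slice)
{
    int writes = 1;                 /* 1: build phase, spill to slice file */

    if (nbatch_increased && later_slice)
        writes++;                   /* 2: re-spill to the later slice's file */

    writes++;                       /* 3: re-spill into the batch's own file */
    return writes;
}
```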

Is it correct that the max number of BufFile structs you will have is equal to
the number of slices plus the number of batches in a slice, because that is the
max number of open BufFiles you would have at a time?
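For concreteness, my mental model of that bound (again just a sketch, with
made-up names):

```c
/* Sketch (my naming, not the patch's): the bound on simultaneously-open
 * BufFiles as I understand it: one per slice, plus one per batch within
 * the slice currently being processed. */
static int
max_open_buffiles(int nslices, int nbatches_per_slice)
{
    return nslices + nbatches_per_slice;
}
```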

By the way, applying the v4 patch on master in an assert build, I am tripping
some asserts -- starting with
Assert(!file->readOnly);
in BufFileWrite().

One thing I was a little confused by was the nbatch_inmemory member of the
hashtable.  The comment in ExecChooseHashTableSize says that it is determining
the number of batches we can fit in memory.  I thought that the problem was the
amount of space taken up by the BufFile data structure itself--which is related
to the number of open BufFiles you need at a time. This comment in
ExecChooseHashTableSize makes it sound like you are talking about fitting more
than one batch of tuples into memory at a time. I was under the impression that
you could only fit one batch of tuples in memory at a time.
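For what it's worth, here is the arithmetic that made me think the issue was
the structs themselves (assuming roughly one BLCKSZ-sized buffer per open
BufFile; 8192 bytes is the default BLCKSZ):

```c
/* Back-of-the-envelope (my assumption: each open BufFile carries
 * roughly one BLCKSZ-sized buffer, 8192 bytes in a default build,
 * ignoring the struct header): with enough batches, the buffers
 * alone dwarf a small work_mem. */
static long
buffile_buffers_bytes(long nbuffiles)
{
    return nbuffiles * 8192L;
}
```

So 32768 open BufFiles would mean roughly 256 MB of buffers, independent of
any tuples held in memory.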

So, I was stepping through the code with work_mem set to the lower bound, and in
ExecHashIncreaseNumBatches, I got confused.
hashtable->nbatch_inmemory was 2 for me; thus, nbatch_tmp was 2,
so I didn't meet this condition:
if (nbatch_tmp > hashtable->nbatch_inmemory)
since I had just set nbatch_tmp using hashtable->nbatch_inmemory.
So I didn't increase the number of slices, which is what I was expecting.
What happens when hashtable->nbatch_inmemory is equal to nbatch_tmp?
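To spell out what I saw in the debugger (identifiers approximate, simplified
from what I stepped through; not the patch's actual code):

```c
#include <stdbool.h>

/* Simplified restatement of what I observed (identifiers approximate):
 * nbatch_tmp was derived from nbatch_inmemory itself, so the
 * slice-growth branch can never be taken when the two start equal. */
static bool
slice_growth_fires(int nbatch_inmemory)
{
    int nbatch_tmp = nbatch_inmemory;   /* observed: set from the same value */

    return nbatch_tmp > nbatch_inmemory;   /* e.g. 2 > 2: always false */
}
```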

--
Melanie Plageman
