Thread: small temp files

From: Scott Ribe
What would be the cause of small temp files? For example:

temporary file: path "base/pgsql_tmp/pgsql_tmp1261242.83979", size 7452

work_mem is set to 128MB
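
For context, lines like the one above are logged because log_temp_files is enabled; a minimal sketch of that setting (requires superuser or equivalent):

    -- -1 disables the logging, 0 logs every temp file,
    -- and a positive N logs only files of at least N kB
    SET log_temp_files = 0;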


--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/

Re: small temp files

From: Holger Jakobs
Am 22.07.24 um 14:50 schrieb Scott Ribe:
> What would be the cause of small temp files? For example:
>
> temporary file: path "base/pgsql_tmp/pgsql_tmp1261242.83979", size 7452
>
> work_mem is set to 128MB
>
>
> --
> Scott Ribe
> scott_ribe@elevated-dev.com
> https://www.linkedin.com/in/scottribe/
>
>
Typically, queries that need a lot of memory (RAM) create temp files if work_mem isn't sufficient for some sorting or hash algorithms.

Increasing work_mem will help, but small temp files don't cause any trouble.

You can set work_mem within each session; don't set it high globally.
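
For example, a minimal sketch (the 256MB value is purely illustrative):

    -- raise work_mem for the current session only
    SET work_mem = '256MB';

    -- or scope it to a single transaction
    BEGIN;
    SET LOCAL work_mem = '256MB';
    -- ... run the memory-hungry query here ...
    COMMIT;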

Regards,

Holger


-- 

Holger Jakobs, Bergisch Gladbach



Re: small temp files

From: Scott Ribe
> On Jul 22, 2024, at 7:00 AM, Holger Jakobs <holger@jakobs.com> wrote:
>
> Typically, queries that need a lot of memory (RAM) create temp files if work_mem isn't sufficient for some sorting or hash algorithms.
>
> Increasing work_mem will help, but small temp files don't cause any trouble.
>
> You can set work_mem within each session; don't set it high globally.

I understand those things--my question is why, with work_mem set to 128MB, I would see tiny temp files (7452 bytes is common, as is 102, and I've seen them as small as 51).




Re: small temp files

From: Paul Smith
On 22/07/2024 14:28, Scott Ribe wrote:
> I understand those things--my question is why, with work_mem set to 128MB, I would see tiny temp files (7452 bytes is common, as is 102, and I've seen them as small as 51).

From the manual:

"Note that a complex query might perform several sort and hash operations at the same time, with each operation generally being allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, memoize nodes and hash-based processing of IN subqueries."

So, if it's doing lots of joins, there may be lots of bits of temporary data which together add up to more than work_mem. AIUI, PostgreSQL doesn't necessarily shove all the temporary data for one query into one file; it may have multiple smaller files for the data of a single query.
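
One way to see where a particular query spills (the table names here are hypothetical, just for illustration):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
    -- plan nodes that spilled show a line like
    --   Buffers: ... temp read=... written=...
    -- so you can see which operations wrote temp data, and how much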

Re: small temp files

From: "David G. Johnston"
On Monday, July 22, 2024, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> > On Jul 22, 2024, at 7:00 AM, Holger Jakobs <holger@jakobs.com> wrote:
> >
> > Typically, queries that need a lot of memory (RAM) create temp files if work_mem isn't sufficient for some sorting or hash algorithms.
> >
> > Increasing work_mem will help, but small temp files don't cause any trouble.
> >
> > You can set work_mem within each session; don't set it high globally.
>
> I understand those things--my question is why, with work_mem set to 128MB, I would see tiny temp files (7452 bytes is common, as is 102, and I've seen them as small as 51).


You expect the smallest temporary file to be 128MB?  I.e., if the memory used exceeds work_mem, all of it gets put into the temp file at that point?  Versus only the amount of data that exceeds work_mem getting pushed out to the temporary file.  The overflow-only design seems much more reasonable - why write to disk that which fits, and already exists, in memory?

David J.

Re: small temp files

From: Scott Ribe
> ...with each operation generally being allowed to use as much memory as this value specifies before it starts to write data into temporary files.

So, that doesn't explain the 7452-byte files--unless an operation can use a temporary file as an addendum to work_mem, rather than spilling the RAM contents to disk, which is my understanding.

> So, if it's doing lots of joins, there may be lots of bits of temporary data which together add up to more than work_mem.

If it's doing lots of joins, each will get work_mem--there is no "adding up" among operations using work_mem.

> You expect the smallest temporary file to be 128MB?  I.e., if the memory used exceeds work_mem, all of it gets put into the temp file at that point?  Versus only the amount of data that exceeds work_mem getting pushed out to the temporary file.  The overflow-only design seems much more reasonable - why write to disk that which fits, and already exists, in memory?

Well, I don't know of an algorithm which can effectively sort 128MB + 7KB of data using 128MB of RAM and a 7KB file. Same for many of the other operations which use work_mem, so yes, I expected spill-over to start with a 128MB file and grow it as needed. If I'm wrong and there are operations which can effectively use temp files as an adjunct, then that would be the answer to my question. Does anybody know for sure whether this is the case?


Re: small temp files

From: Tom Lane
Scott Ribe <scott_ribe@elevated-dev.com> writes:
>> You expect the smallest temporary file to be 128MB?  I.e., if the memory used exceeds work_mem, all of it gets put into the temp file at that point?  Versus only the amount of data that exceeds work_mem getting pushed out to the temporary file.  The overflow-only design seems much more reasonable - why write to disk that which fits, and already exists, in memory?

> Well, I don't know of an algorithm which can effectively sort 128MB + 7KB of data using 128MB of RAM and a 7KB file. Same for many of the other operations which use work_mem, so yes, I expected spill-over to start with a 128MB file and grow it as needed. If I'm wrong and there are operations which can effectively use temp files as an adjunct, then that would be the answer to my question. Does anybody know for sure whether this is the case?

You would get more specific answers if you provided an example of the
queries that cause this, with EXPLAIN ANALYZE output.  But I think a
likely bet is that it's doing a hash join that overruns work_mem.
What will happen is that the join gets divided into batches based on
hash codes, and each batch gets dumped into its own temp files (one
per batch for each side of the join).  It would not be too surprising
if some of the batches are small, thanks to the vagaries of hash
values.  Certainly they could be less than work_mem, since the only
thing we can say for sure is that the sum of the temp file sizes for
the inner side of the join should exceed work_mem.
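
You can watch this happen with a deliberately tiny work_mem (table names here are hypothetical; any pair of largish tables will do):

    SET work_mem = '1MB';
    EXPLAIN (ANALYZE)
    SELECT * FROM big_a a JOIN big_b b USING (id);
    -- the Hash node reports something like
    --   Buckets: 4096  Batches: 16  Memory Usage: 1024kB
    -- with Batches > 1, each batch spills to its own temp files,
    -- and batch sizes vary with the hash-value distribution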

            regards, tom lane



Re: small temp files

From: Scott Ribe
> On Jul 22, 2024, at 8:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> You would get more specific answers if you provided an example of the
> queries that cause this, with EXPLAIN ANALYZE output.  But I think a
> likely bet is that it's doing a hash join that overruns work_mem.
> What will happen is that the join gets divided into batches based on
> hash codes, and each batch gets dumped into its own temp files (one
> per batch for each side of the join).  It would not be too surprising
> if some of the batches are small, thanks to the vagaries of hash
> values.  Certainly they could be less than work_mem, since the only
> thing we can say for sure is that the sum of the temp file sizes for
> the inner side of the join should exceed work_mem.

OK, that makes total sense, and fits our usage patterns. (Lots of complex queries, lots of hash joins.)

thanks