Thread: [PERFORMANCE] work_mem vs temp files issue
Hi,

AFAIUI, work_mem is used for some operations (sort, hash, etc.) to avoid
the use of temp files on disk...

On a client's server I'm monitoring (pg 8.3.7, 32GB of RAM), work_mem is
set to 8MB, yet I'm seeing a lot of temp files (>30000 in 4 hours) with
small sizes (e.g. 2021520 bytes, obviously lower than 8MB). So, why?
Maybe we use work_mem until we find it isn't enough, and then send just
the difference to a temp file?

I'm not going to raise work_mem until I understand this well; what's the
point if we still create temp files that could fit in work_mem...

PS: I have max_connections set to 1024. I know I need a pool, but the app
is still opening persistent connections to the db, so it's not like I can
simply raise work_mem until the app gets fixed.

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157
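[Editor's note: one way to see which statements are creating those files is the log_temp_files setting, available from 8.3 on. A sketch; the threshold value here is just an example:]

```sql
-- In postgresql.conf, or per session: log every temp file at creation,
-- with its size, alongside the statement that caused it.
SET log_temp_files = 0;    -- 0 = log all; e.g. 1024 logs only files > 1MB

-- Each spill then shows up in the server log as:
-- LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp12345.0", size 2021520
-- STATEMENT:  <the query that caused it>
```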
On Aug 19, 2009, at 7:45 PM, Jaime Casanova wrote:
> AFAIUI, work_mem is used for some operations (sort, hash, etc) for
> avoiding the use of temp files on disk...
>
> In a client server i'm monitoring (pg 8.3.7, 32GB of ram) work_mem is
> set to 8MB, however i'm seeing a lot of temp files (>30000 in 4 hours)
> with small sizes (ie: 2021520 obviously lower than 8MB). so, why?
> maybe we use work_mem until we find isn't enough and we send just the
> difference to a temp file?
>
> i'm not thinking in raising work_mem until i understand this well,
> what's the point if we still create temp files that could fit in
> work_mem...

Are you using temp tables? Those end up in pgsql_tmp as well.

--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
decibel <decibel@decibel.org> writes:
> On Aug 19, 2009, at 7:45 PM, Jaime Casanova wrote:
>> AFAIUI, work_mem is used for some operations (sort, hash, etc) for
>> avoiding the use of temp files on disk...
>> [...]
> Are you using temp tables? Those end up in pgsql_tmp as well.

Uh, no, they don't.

It might be useful to turn on trace_sort to see if the small files
are coming from sorts. If they're from hashes I'm afraid there's
no handy instrumentation ...

			regards, tom lane
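[Editor's note: trace_sort can be turned on per session. A sketch; depending on build and version it may require superuser, and the log lines shown are only indicative:]

```sql
-- Enable tuplesort tracing for the current session.
SET trace_sort = on;
SET client_min_messages = log;   -- so the LOG lines are visible in psql

-- Run the suspect query; each sort then reports lines such as:
-- LOG:  begin tuple sort: nkeys = 1, workMem = 8192, randomAccess = f
-- LOG:  internal sort ended, 118 KB used: ...          (stayed in memory)
-- LOG:  external sort ended, 25 disk blocks used: ...  (spilled to disk)
```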
On Sun, Sep 13, 2009 at 5:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It might be useful to turn on trace_sort to see if the small files
> are coming from sorts. If they're from hashes I'm afraid there's
> no handy instrumentation ...

Yes, they are. This is the log (I deleted the STATEMENT lines because
they were redundant); it seems like all the temp files are used to
execute the same statement...

BTW, this is my laptop, not the server.

--
Jaime Casanova
On Mon, Jan 11, 2010 at 1:15 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
> yes they are, this is the log (i deleted the STATEMENT lines because
> they were redundant), seems like all the temp files are used to
> execute the same sentence...
>
> BTW, this is my laptop no the server.

I think maybe there was supposed to be an attachment here?

...Robert
On Mon, Jan 11, 2010 at 2:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think maybe there was supposed to be an attachment here?

I knew I was forgetting something ;)
Ah, and this is on 8.5dev, but it's the same in 8.3.

--
Jaime Casanova
On Mon, Jan 11, 2010 at 2:14 PM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
> i knew i was forgotting something ;)
> ah! and this is in 8.5dev but it's the same in 8.3

Oh boy, this can't be happening! Attaching again.

--
Jaime Casanova
Jaime Casanova <jcasanov@systemguards.com.ec> writes:
> LOG: begin tuple sort: nkeys = 1, workMem = 1024, randomAccess = f
> LOG: switching to bounded heapsort at 641 tuples: CPU 0.08s/0.13u sec elapsed 0.25 sec
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.5", size 471010
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.10", size 81096
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.0", size 467373
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.9", size 110200
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.3", size 470011
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.8", size 157192
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.4", size 468681
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.12", size 101624
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.1", size 472285
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.11", size 100744
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.6", size 467173
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.7", size 141888
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.2", size 476227
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.13", size 89072
> LOG: performsort starting: CPU 0.10s/0.19u sec elapsed 0.33 sec
> LOG: performsort done: CPU 0.10s/0.19u sec elapsed 0.33 sec
> LOG: internal sort ended, 118 KB used: CPU 0.10s/0.19u sec elapsed 0.33 sec

Hmm. Not clear where the temp files are coming from, but it's *not* the
sort --- the "internal sort ended" line shows that that sort never went
to disk. What kind of plan is feeding the sort node?

			regards, tom lane
On Mon, Jan 11, 2010 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jaime Casanova <jcasanov@systemguards.com.ec> writes:
>> LOG: begin tuple sort: nkeys = 1, workMem = 1024, randomAccess = f
>> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8507.5", size 471010
[... some more temp files logged ...]
>> LOG: internal sort ended, 118 KB used: CPU 0.10s/0.19u sec elapsed 0.33 sec
>
> Hmm. Not clear where the temp files are coming from, but it's *not* the
> sort --- the "internal sort ended" line shows that that sort never went
> to disk. What kind of plan is feeding the sort node?

I'm sure I have seen on-disk sorts even when the files are small, but I
still see a problem here... the temp files should be coming from hash
operations, but AFAICS the files are small, and every hash operation
should be able to use up to work_mem of memory, right?

--
Jaime Casanova
Jaime Casanova <jcasanov@systemguards.com.ec> writes:
> the temp files shoul be coming from hash operations but AFAICS the
> files are small and every hash operation should be using until
> work_mem memory, right?

No, when a hash spills to disk the code has to guess the partition sizes
(number of buckets per partition) in advance. So it wouldn't be at all
surprising if the actual sizes come out substantially different from
work_mem.

			regards, tom lane
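[Editor's note: a rough back-of-envelope illustration of why spill files can be much smaller than work_mem. A sketch only, with hypothetical numbers; the real logic lives in nodeHash.c and also accounts for per-tuple overhead, skew, and bucket sizing:]

```sql
-- Suppose a 50MB inner relation is hashed with work_mem = 8MB. The
-- executor must split the build side into enough batches that each
-- batch's hash table fits in work_mem, rounded up to a power of two:
SELECT ceil(50.0 / 8)                           AS batches_needed,   -- 7
       power(2, ceil(log(2, ceil(50.0 / 8))))   AS batches_rounded,  -- 8
       50.0 / 8                                 AS avg_mb_per_batch; -- ~6.25
-- Since the split is guessed in advance and data rarely hashes evenly
-- across batches, individual temp files can easily come out far smaller
-- (or larger) than work_mem.
```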
On Mon, Jan 11, 2010 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm. Not clear where the temp files are coming from, but it's *not* the
> sort --- the "internal sort ended" line shows that that sort never went
> to disk. What kind of plan is feeding the sort node?

Some time ago, you said:
"""
It might be useful to turn on trace_sort to see if the small files
are coming from sorts. If they're from hashes I'm afraid there's
no handy instrumentation ...
"""

And that is clearly what was bothering me... because most of the temp
files are coming from hashes...

Why don't we show some of that info in EXPLAIN? For example: we could
show memory used, no? Or whether the hash went to disk... If I remove
#ifdef HJDEBUG, it seems we even know how many batches the hash used...

The reason I say "most of the temp files" is that when I removed
#ifdef HJDEBUG it said that in total I was using 10 batches, but there
were 14 temp files created (I guess we use 1 file per batch, no?)
"""
nbatch = 1, nbuckets = 1024
nbatch = 1, nbuckets = 1024
nbatch = 8, nbuckets = 2048
"""

--
Jaime Casanova
Jaime Casanova <jcasanov@systemguards.com.ec> writes:
> why we don't show some of that info in explain?

Lack of round tuits; plus concern about breaking programs that read
EXPLAIN output, which I guess will be alleviated in 8.5.

> the reason i say "most of the temp files" is that when i removed
> #ifdef HJDEBUG it says that in total i was using 10 batchs but there
> were 14 temp files created (i guess we use 1 file per batch, no?)

Two files per batch, in general --- I suppose some of the batches were
empty.

			regards, tom lane
On Wed, Jan 13, 2010 at 1:31 AM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
> why we don't show some of that info in explain? for example: we can
> show memory used, no? or if the hash goes to disk... if i remove
> #ifdef HJDEBUG seems like we even know how many batchs the hash
> used...

I had an idea at one point of making explain show the planned and
actual # of batches for each hash join. I believe that "actual # of
batches > 1" is isomorphic to "hash join went to disk". The code is
actually pretty easy; the hard part is figuring out what to do about
the UI. The choices seem to be:

1. Create a new EXPLAIN option just for this - what would we call it?
2. Think of some more, similar things and come up with a new EXPLAIN
option covering all of them - what else would go along with it?
3. Sandwich it into an existing EXPLAIN option, most likely VERBOSE.
4. Display it by default.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> I had an idea at one point of making explain show the planned and
> actual # of batches for each hash join. I believe that "actual # of
> batches > 1" is isomorphic to "hash join went to disk". The code is
> actually pretty easy; the hard part is figuring out what to do about
> the UI. [...]

Treat it the same as the Sort-node actual usage information. We did not
add a special option when we added that.

			regards, tom lane
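[Editor's note: following the Sort-node precedent, hash instrumentation could simply appear as an extra line under the Hash node in EXPLAIN ANALYZE. The table names below are hypothetical and the Hash detail line is a sketch of the proposal, not existing output of this release:]

```sql
EXPLAIN ANALYZE
SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id;

--  Hash Join  (cost=... rows=...) (actual time=... rows=... loops=1)
--    Hash Cond: (o.customer_id = c.id)
--    ->  Seq Scan on orders o  (...)
--    ->  Hash  (...)
--          Buckets: 1024  Batches: 8  Memory Usage: 7936kB   <- proposed line
--          ->  Seq Scan on customers c  (...)
```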
On Wed, Jan 13, 2010 at 10:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Treat it the same as the Sort-node actual usage information. We did not
> add a special option when we added that.

Well, what about when we're just doing EXPLAIN, not EXPLAIN ANALYZE?
It'll add another line to the output for the expected number of
batches.

...Robert
On Wed, Jan 13, 2010 at 11:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, what about when we're just doing EXPLAIN, not EXPLAIN ANALYZE?
> It'll add another line to the output for the expected number of
> batches.

And when we are in EXPLAIN ANALYZE, the real number as well?

--
Jaime Casanova
On Wed, Jan 13, 2010 at 11:14 AM, Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
> and when we are in EXPLAIN ANALYZE the real number as well?

Yeah. My question is whether it's acceptable to add an extra line to
the EXPLAIN output for every hash join, even w/o ANALYZE.

...Robert
Robert Haas <robertmhaas@gmail.com> writes:
> Yeah. My question is whether it's acceptable to add an extra line to
> the EXPLAIN output for every hash join, even w/o ANALYZE.

We could add it if either VERBOSE or ANALYZE appears. Not sure if
that's just too much concern for backwards compatibility, though.

			regards, tom lane
On Wed, Jan 13, 2010 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We could add it if either VERBOSE or ANALYZE appears. Not sure if
> that's just too much concern for backwards compatibility, though.

I think having it controlled by either of two options is too weird.
I'm not worried so much about backward compatibility as I am about
cluttering the output. Maybe making it controlled by VERBOSE is the
right thing to do, although I'm sort of tempted to figure out if there
is more useful instrumentation that could be done and put it all under
a new option called, say, HASH_DETAILS. Not sure what else we could
show, though.

...Robert