Thread: optimizing CleanupTempFiles
Hi, We've been profiling a large system (8 CPUs, 64 GB of memory, some dozens of disks) which seems rather more swamped than it should. Part of the problem seems to come from CleanupTempFiles, the second entry in oprofile output. This is the top 3 symbol report for a 2 minute oprofile run: $ opreport '*postgres' -l CPU: AMD64 processors, speed 2600 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 7000 samples % symbol name 1963666 5.4862 AllocSetAlloc 1629477 4.5525 CleanupTempFiles 1560543 4.3599 hash_search_with_hash_value Annotating the function results in this: :static void :CleanupTempFiles(bool isProcExit) 1334 0.0037 :{ /* CleanupTempFiles total: 1629477 4.5525 */ : Index i; : 90 2.5e-04 : if (SizeVfdCache > 0) : { : Assert(FileIsNotOpen(0)); /* Make sure ring not corrupted */ 470706 1.3151 : for (i = 1; i < SizeVfdCache; i++) : { 281998 0.7879 : unsigned short fdstate = VfdCache[i].fdstate; : 872073 2.4365 : if ((fdstate & FD_TEMPORARY) && VfdCache[i].fileName != NULL) : { : /* : * Ifwe're in the process of exiting a backend process, close : * all temporaryfiles. Otherwise, only close temporary files : * local to the currenttransaction. : */ : if (isProcExit|| (fdstate & FD_XACT_TEMPORARY)) : FileClose(i); : } : } : } : 3198 0.0089 : while (numAllocatedDescs > 0) : FreeDesc(&allocatedDescs[0]); 78 2.2e-04 :} So we're scanning the array lots of times (there are about 1300 transactions per second), and do no useful work on each scan. There are about 8900 files in the database, and since this is using a connection pooler, it's quite likely that each session opens a large subset of them at least once at some point, and so the VfdCache gets very large. I can see two simple ways to solve this problem. One is to create a second array that keeps pointers to the temp files in VfdCache. Then, on CleanupTempFiles we scan that array instead of VfdCache directly. The second one is to use a separate VfdCache for temp files, but this seems much more involved, requiring touch almost all of fd.c. Of course, perhaps there's another solution which involves rethinking fd.c in a more thorough fashion, but I'm not sure how right offhand. Thoughts? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote: > We've been profiling a large system (8 CPUs, 64 GB of memory, some > dozens of disks) which seems rather more swamped than it should. Part > of the problem seems to come from CleanupTempFiles, the second entry in > oprofile output. I'm glad you've observed this also. I saw it about two years ago but wasn't able to convince anyone else it existed at the time. > I can see two simple ways to solve this problem. One is to create a > second array that keeps pointers to the temp files in VfdCache. Then, > on CleanupTempFiles we scan that array instead of VfdCache directly. > > The second one is to use a separate VfdCache for temp files, but this > seems much more involved, requiring touch almost all of fd.c. > > Of course, perhaps there's another solution which involves rethinking > fd.c in a more thorough fashion, but I'm not sure how right offhand. Simple solution is to have a state variable so you can see whether a backend has created an temp files in this transaction. Most don't, so I think the above two solutions are overkill. If we created any, scan for them, if not, don't. Just a simple boolean state, just as we have for AtEOXact_RelationCache(). -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > > On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote: > > > We've been profiling a large system (8 CPUs, 64 GB of memory, some > > dozens of disks) which seems rather more swamped than it should. Part > > of the problem seems to come from CleanupTempFiles, the second entry in > > oprofile output. > > I'm glad you've observed this also. I saw it about two years ago but > wasn't able to convince anyone else it existed at the time. I couldn't find it in the archives. > Simple solution is to have a state variable so you can see whether a > backend has created an temp files in this transaction. Most don't, so I > think the above two solutions are overkill. If we created any, scan for > them, if not, don't. Just a simple boolean state, just as we have for > AtEOXact_RelationCache(). Ah -- like this? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Attachment
Alvaro Herrera wrote: > Simon Riggs wrote: > > Simple solution is to have a state variable so you can see whether a > > backend has created an temp files in this transaction. Most don't, so I > > think the above two solutions are overkill. If we created any, scan for > > them, if not, don't. Just a simple boolean state, just as we have for > > AtEOXact_RelationCache(). > > Ah -- like this? BTW I wonder if there's any point in checking SizeVfdCache > 0 in the places where checking the new flag suffices. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Ah -- like this? +1, but there are two kinds of temp files in that module, and only one of them is relevant here. Call it something like have_xact_temporary_files to make things clearer. I concur that the explicit test on SizeVfdCache > 0 is a waste of effort, too. It'll nearly always be true anyway. regards, tom lane
On Wed, 2008-09-17 at 17:34 -0400, Alvaro Herrera wrote: > Simon Riggs wrote: > > > > On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote: > > > > > We've been profiling a large system (8 CPUs, 64 GB of memory, some > > > dozens of disks) which seems rather more swamped than it should. Part > > > of the problem seems to come from CleanupTempFiles, the second entry in > > > oprofile output. > > > > I'm glad you've observed this also. I saw it about two years ago but > > wasn't able to convince anyone else it existed at the time. > > I couldn't find it in the archives. Perhaps it was a private conversation then, but the main point is I've seen it too and believe it is a real world problem. > > Simple solution is to have a state variable so you can see whether a > > backend has created an temp files in this transaction. Most don't, so I > > think the above two solutions are overkill. If we created any, scan for > > them, if not, don't. Just a simple boolean state, just as we have for > > AtEOXact_RelationCache(). > > Ah -- like this? Yeh, nice and simple. Might be better to call it "need_eoxact_work" to mirror relcache.c? any_temporary_files is fairly vague and could be misinterpreted. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Wed, 2008-09-17 at 17:34 -0400, Alvaro Herrera wrote: > Ah -- like this? if test should include || isProcExit so you don't skip non-transactional temp files at proc exit when there weren't any created in the last transaction. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Ah -- like this? > > +1, but there are two kinds of temp files in that module, and only > one of them is relevant here. Call it something like > have_xact_temporary_files to make things clearer. Ok, so that's what I called it. Simon wrote: > if test should include > || isProcExit > > so you don't skip non-transactional temp files at proc exit when there > weren't any created in the last transaction. Yep, I noticed that too. Thanks. Should I backpatch this to 8.3? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
BTW in testing this patch I was surprised by the fact that temp tables files are removed at checkpoint time, rather than when the transaction ends (at first I thought I had broken the removal of temp files). Is this a recent feature? (I verified that this continues to work fine for WITH HOLD cursors too.) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > BTW in testing this patch I was surprised by the fact that temp tables > files are removed at checkpoint time, [ blink... ] Doesn't look like that should happen. What is your test case? regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> BTW in testing this patch I was surprised by the fact that temp tables >> files are removed at checkpoint time, > > [ blink... ] Doesn't look like that should happen. What is your > test case? Hmph, must be because of the patch from last winter to prevent relfilenode reuse until next checkpoint. Looks like we didn't make an exception for temporary tables. Although it's harmless, we could put an isTempOrToastNamespace() test in there: *** md.c 11 Aug 2008 11:05:11 -0000 1.139 --- md.c 18 Sep 2008 06:22:13 -0000 *************** *** 19,24 **** --- 19,25 ---- #include <sys/file.h> #include "catalog/catalog.h" + #include "catalog/namespace.h" #include "miscadmin.h" #include "postmaster/bgwriter.h" #include "storage/fd.h" *************** *** 324,330 **** /* * Delete or truncate the first segment. */ ! if (isRedo || forkNum != MAIN_FORKNUM) ret = unlink(path); else { --- 325,331 ---- /* * Delete or truncate the first segment. */ ! if (isRedo || forkNum != MAIN_FORKNUM || isTempOrToastNamespace(rnode.spcNode)) ret = unlink(path); else { -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2008-09-18 at 09:23 +0300, Heikki Linnakangas wrote: > Tom Lane wrote: > > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> BTW in testing this patch I was surprised by the fact that temp tables > >> files are removed at checkpoint time, > > > > [ blink... ] Doesn't look like that should happen. What is your > > test case? > > Hmph, must be because of the patch from last winter to prevent > relfilenode reuse until next checkpoint. Looks like we didn't make an > exception for temporary tables. Although it's harmless... An unfortunate choice of words! Harmless is not how your average DBA would describe it when their disk fills and they are apparently unable to reduce space consumption. So there is still a problem there even if we fix the temp files portion of it. Seems like a complete fix must have some kind of pressure relief valve for when things get full. Disk overflow is a much more likely problem than a relfilenode recycling problem. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > On Thu, 2008-09-18 at 09:23 +0300, Heikki Linnakangas wrote: >> Tom Lane wrote: >>> Alvaro Herrera <alvherre@commandprompt.com> writes: >>>> BTW in testing this patch I was surprised by the fact that temp tables >>>> files are removed at checkpoint time, >>> [ blink... ] Doesn't look like that should happen. What is your >>> test case? >> Hmph, must be because of the patch from last winter to prevent >> relfilenode reuse until next checkpoint. Looks like we didn't make an >> exception for temporary tables. Although it's harmless... > > An unfortunate choice of words! Harmless is not how your average DBA > would describe it when their disk fills and they are apparently unable > to reduce space consumption. So there is still a problem there even if > we fix the temp files portion of it. The files *are* truncated to zero bytes immediately. They're left hanging as empty files until next checkpoint. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, 2008-09-18 at 10:19 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > > > An unfortunate choice of words! Harmless is not how your average DBA > > would describe it when their disk fills and they are apparently unable > > to reduce space consumption. So there is still a problem there even if > > we fix the temp files portion of it. > > The files *are* truncated to zero bytes immediately. They're left > hanging as empty files until next checkpoint. Ah, cool. So won't actually unlinking temp files be slower than just leaving them for checkpointer to clear up offline? i.e. do we really need the patch you just posted? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > On Thu, 2008-09-18 at 10:19 +0300, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> An unfortunate choice of words! Harmless is not how your average DBA >>> would describe it when their disk fills and they are apparently unable >>> to reduce space consumption. So there is still a problem there even if >>> we fix the temp files portion of it. >> The files *are* truncated to zero bytes immediately. They're left >> hanging as empty files until next checkpoint. > > Ah, cool. So won't actually unlinking temp files be slower than just > leaving them for checkpointer to clear up offline? i.e. do we really > need the patch you just posted? Dunno, maybe. I doubt it's significant either way. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Tom Lane wrote: >> [ blink... ] Doesn't look like that should happen. What is your >> test case? > Hmph, must be because of the patch from last winter to prevent > relfilenode reuse until next checkpoint. Ah. I had misunderstood Alvaro to say that temp files (the kind under discussion up to now) were not unlinked immediately; which would be pretty strange given that fd.c is underneath md.c. > Looks like we didn't make an > exception for temporary tables. Although it's harmless, we could put an > isTempOrToastNamespace() test in there: Bad, bad idea to have md.c doing any catalog access. As already noted downthread, it wouldn't buy much anyway. regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Looks like we didn't make an >> exception for temporary tables. Although it's harmless, we could put an >> isTempOrToastNamespace() test in there: > > Bad, bad idea to have md.c doing any catalog access. isTempOrToastNamespace() doesn't access catalogs, it's just checking the argument against two global variables. I guess you could argue that it's a modularity violation. > As already noted downthread, it wouldn't buy much anyway. Yeah. *shrug* -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > Tom Lane wrote: > >> [ blink... ] Doesn't look like that should happen. What is your > >> test case? This was simply a CREATE TEMP TABLE ... ON COMMIT DROP. The file stays in place until checkpoint (either a manually invoked one, or a shutdown's) Don't temp tables use this kind of temp file? I admit I didn't check; I just assumed they did. > > Hmph, must be because of the patch from last winter to prevent > > relfilenode reuse until next checkpoint. > > Ah. I had misunderstood Alvaro to say that temp files (the kind under > discussion up to now) were not unlinked immediately; which would be > pretty strange given that fd.c is underneath md.c. The test case where I actually verify that fd.c was used was a WITH SCROLL cursor. The file does go away like I expected in that case, as soon as the cursor is destroyed (or as soon as the backend is closed). -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Don't temp tables use this kind of temp file? I admit I didn't check; I > just assumed they did. No, temp tables go through localbuf.c, which sits atop regular smgr. I don't think fd.c knows any difference from regular tables. regards, tom lane
I committed this to HEAD. If anything breaks, I request that someone else does the cleanup, as it looks like we're headed for the hospital first thing tomorrow morning to see what newborns look like. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.