Thread: optimizing CleanupTempFiles

optimizing CleanupTempFiles

From: Alvaro Herrera
Hi,

We've been profiling a large system (8 CPUs, 64 GB of memory, some
dozens of disks) which seems rather more swamped than it should.  Part
of the problem seems to come from CleanupTempFiles, the second entry in
oprofile output.

This is the top 3 symbol report for a 2 minute oprofile run:

$ opreport '*postgres' -l
CPU: AMD64 processors, speed 2600 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 7000
samples  %        symbol name
1963666   5.4862  AllocSetAlloc
1629477   4.5525  CleanupTempFiles
1560543   4.3599  hash_search_with_hash_value


Annotating the function results in this:
               :static void
               :CleanupTempFiles(bool isProcExit)
  1334  0.0037 :{ /* CleanupTempFiles total: 1629477  4.5525 */
               :        Index      i;
               :
    90 2.5e-04 :        if (SizeVfdCache > 0)
               :        {
               :                Assert(FileIsNotOpen(0));       /* Make sure ring not corrupted */
470706  1.3151 :                for (i = 1; i < SizeVfdCache; i++)
               :                {
281998  0.7879 :                        unsigned short fdstate = VfdCache[i].fdstate;
               :
872073  2.4365 :                        if ((fdstate & FD_TEMPORARY) && VfdCache[i].fileName != NULL)
               :                        {
               :                                /*
               :                                 * If we're in the process of exiting a backend process, close
               :                                 * all temporary files. Otherwise, only close temporary files
               :                                 * local to the current transaction.
               :                                 */
               :                                if (isProcExit || (fdstate & FD_XACT_TEMPORARY))
               :                                        FileClose(i);
               :                        }
               :                }
               :        }
               :
  3198  0.0089 :        while (numAllocatedDescs > 0)
               :                FreeDesc(&allocatedDescs[0]);
    78 2.2e-04 :}


So we're scanning the array lots of times (there are about 1300
transactions per second), doing no useful work on each scan.

There are about 8900 files in the database, and since this is using a
connection pooler, it's quite likely that each session opens a large
subset of them at least once at some point, and so the VfdCache gets
very large.


I can see two simple ways to solve this problem.  One is to create a
second array that keeps pointers to the temp files in VfdCache.  Then,
on CleanupTempFiles we scan that array instead of VfdCache directly.
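
For concreteness, here's a rough sketch of what that side structure might
look like -- the names and allocation details are invented for illustration
only, not taken from any patch:

    /*
     * Hypothetical sketch of the "second array" idea: each entry is an
     * index into VfdCache for a temp file, so CleanupTempFiles() could
     * walk this short list instead of the whole cache.
     */
    static Index   *tempFileIndexes = NULL;
    static int      numTempFileIndexes = 0;
    static int      maxTempFileIndexes = 0;

    static void
    RememberTempFile(Index i)
    {
        if (numTempFileIndexes >= maxTempFileIndexes)
        {
            int     newmax = (maxTempFileIndexes > 0) ? maxTempFileIndexes * 2 : 32;
            Index  *newarray;

            /* fd.c manages VfdCache with malloc/realloc, so do the same here */
            newarray = (Index *) realloc(tempFileIndexes, newmax * sizeof(Index));
            if (newarray == NULL)
                elog(ERROR, "out of memory");
            tempFileIndexes = newarray;
            maxTempFileIndexes = newmax;
        }
        tempFileIndexes[numTempFileIndexes++] = i;
    }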

The second one is to use a separate VfdCache for temp files, but this
seems much more involved, requiring touching almost all of fd.c.

Of course, perhaps there's another solution which involves rethinking
fd.c in a more thorough fashion, but I'm not sure how, offhand.

Thoughts?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: optimizing CleanupTempFiles

From: Simon Riggs
On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:

> We've been profiling a large system (8 CPUs, 64 GB of memory, some
> dozens of disks) which seems rather more swamped than it should.  Part
> of the problem seems to come from CleanupTempFiles, the second entry in
> oprofile output.

I'm glad you've observed this also. I saw it about two years ago but
wasn't able to convince anyone else it existed at the time.

> I can see two simple ways to solve this problem.  One is to create a
> second array that keeps pointers to the temp files in VfdCache.  Then,
> on CleanupTempFiles we scan that array instead of VfdCache directly.
> 
> The second one is to use a separate VfdCache for temp files, but this
> seems much more involved, requiring touch almost all of fd.c.
> 
> Of course, perhaps there's another solution which involves rethinking
> fd.c in a more thorough fashion, but I'm not sure how right offhand.

Simple solution is to have a state variable so you can see whether a
backend has created any temp files in this transaction. Most don't, so I
think the above two solutions are overkill. If we created any, scan for
them, if not, don't. Just a simple boolean state, just as we have for
AtEOXact_RelationCache().
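
For illustration, a minimal sketch of that idea -- the flag name and exact
placement are placeholders here, not necessarily what an actual patch would
use. The flag gets set where fd.c already marks a temp file in
OpenTemporaryFile(), and CleanupTempFiles() can then skip the VfdCache scan
entirely when it was never set:

    /* Placeholder flag name; a real patch may call this something else. */
    static bool have_temporary_files = false;

    File
    OpenTemporaryFile(bool interXact)
    {
        File        file;

        /* ... unchanged: create the file and set up VfdCache[file] ... */

        /* Mark it temporary, and transaction-local unless interXact */
        VfdCache[file].fdstate |= FD_TEMPORARY;
        if (!interXact)
            VfdCache[file].fdstate |= FD_XACT_TEMPORARY;

        /* Remember that CleanupTempFiles() now has real work to do */
        have_temporary_files = true;

        return file;
    }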

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: optimizing CleanupTempFiles

From: Alvaro Herrera
Simon Riggs wrote:
>
> On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:
>
> > We've been profiling a large system (8 CPUs, 64 GB of memory, some
> > dozens of disks) which seems rather more swamped than it should.  Part
> > of the problem seems to come from CleanupTempFiles, the second entry in
> > oprofile output.
>
> I'm glad you've observed this also. I saw it about two years ago but
> wasn't able to convince anyone else it existed at the time.

I couldn't find it in the archives.

> Simple solution is to have a state variable so you can see whether a
> backend has created any temp files in this transaction. Most don't, so I
> think the above two solutions are overkill. If we created any, scan for
> them, if not, don't. Just a simple boolean state, just as we have for
> AtEOXact_RelationCache().

Ah -- like this?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment

Re: optimizing CleanupTempFiles

From: Alvaro Herrera
Alvaro Herrera wrote:
> Simon Riggs wrote:

> > Simple solution is to have a state variable so you can see whether a
> > backend has created any temp files in this transaction. Most don't, so I
> > think the above two solutions are overkill. If we created any, scan for
> > them, if not, don't. Just a simple boolean state, just as we have for
> > AtEOXact_RelationCache().
> 
> Ah -- like this?

BTW I wonder if there's any point in checking SizeVfdCache > 0 in the
places where checking the new flag suffices.


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: optimizing CleanupTempFiles

From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Ah -- like this?

+1, but there are two kinds of temp files in that module, and only
one of them is relevant here.  Call it something like
have_xact_temporary_files to make things clearer.

I concur that the explicit test on SizeVfdCache > 0 is a waste of
effort, too.  It'll nearly always be true anyway.
        regards, tom lane


Re: optimizing CleanupTempFiles

From: Simon Riggs
On Wed, 2008-09-17 at 17:34 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > 
> > On Wed, 2008-09-17 at 16:25 -0400, Alvaro Herrera wrote:
> > 
> > > We've been profiling a large system (8 CPUs, 64 GB of memory, some
> > > dozens of disks) which seems rather more swamped than it should.  Part
> > > of the problem seems to come from CleanupTempFiles, the second entry in
> > > oprofile output.
> > 
> > I'm glad you've observed this also. I saw it about two years ago but
> > wasn't able to convince anyone else it existed at the time.
> 
> I couldn't find it in the archives.

Perhaps it was a private conversation then, but the main point is I've
seen it too and believe it is a real world problem.

> > Simple solution is to have a state variable so you can see whether a
> > backend has created any temp files in this transaction. Most don't, so I
> > think the above two solutions are overkill. If we created any, scan for
> > them, if not, don't. Just a simple boolean state, just as we have for
> > AtEOXact_RelationCache().
> 
> Ah -- like this?

Yeh, nice and simple.

Might be better to call it "need_eoxact_work" to mirror relcache.c?
any_temporary_files is fairly vague and could be misinterpreted.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: optimizing CleanupTempFiles

From: Simon Riggs
On Wed, 2008-09-17 at 17:34 -0400, Alvaro Herrera wrote:

> Ah -- like this?

The if test should include || isProcExit

so you don't skip non-transactional temp files at proc exit when there
weren't any created in the last transaction.
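
In other words, the guard would look roughly like this (a sketch only,
using the have_xact_temporary_files name suggested elsewhere in this
thread, not the actual committed code):

    static void
    CleanupTempFiles(bool isProcExit)
    {
        Index       i;

        /*
         * At process exit, scan unconditionally: non-transactional temp
         * files may still be open even if the last transaction created none.
         */
        if (isProcExit || have_xact_temporary_files)
        {
            Assert(FileIsNotOpen(0));   /* Make sure ring not corrupted */
            for (i = 1; i < SizeVfdCache; i++)
            {
                unsigned short fdstate = VfdCache[i].fdstate;

                if ((fdstate & FD_TEMPORARY) && VfdCache[i].fileName != NULL)
                {
                    /* close all temp files at exit, else only xact-local ones */
                    if (isProcExit || (fdstate & FD_XACT_TEMPORARY))
                        FileClose(i);
                }
            }

            have_xact_temporary_files = false;
        }

        while (numAllocatedDescs > 0)
            FreeDesc(&allocatedDescs[0]);
    }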

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: optimizing CleanupTempFiles

From: Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Ah -- like this?
>
> +1, but there are two kinds of temp files in that module, and only
> one of them is relevant here.  Call it something like
> have_xact_temporary_files to make things clearer.

Ok, so that's what I called it.

Simon wrote:

> if test should include
>         || isProcExit
>
> so you don't skip non-transactional temp files at proc exit when there
> weren't any created in the last transaction.

Yep, I noticed that too.  Thanks.

Should I backpatch this to 8.3?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: optimizing CleanupTempFiles

From: Alvaro Herrera
BTW in testing this patch I was surprised by the fact that temp table
files are removed at checkpoint time, rather than when the transaction
ends (at first I thought I had broken the removal of temp files).  Is
this a recent feature?

(I verified that this continues to work fine for WITH HOLD cursors too.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: optimizing CleanupTempFiles

From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
> BTW in testing this patch I was surprised by the fact that temp tables
> files are removed at checkpoint time,

[ blink... ]  Doesn't look like that should happen.  What is your
test case?
        regards, tom lane


Re: optimizing CleanupTempFiles

From: Heikki Linnakangas
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> BTW in testing this patch I was surprised by the fact that temp tables
>> files are removed at checkpoint time,
> 
> [ blink... ]  Doesn't look like that should happen.  What is your
> test case?

Hmph, must be because of the patch from last winter to prevent 
relfilenode reuse until next checkpoint. Looks like we didn't make an 
exception for temporary tables. Although it's harmless, we could put an 
isTempOrToastNamespace() test in there:

*** md.c    11 Aug 2008 11:05:11 -0000    1.139
--- md.c    18 Sep 2008 06:22:13 -0000
***************
*** 19,24 ****
--- 19,25 ----
  #include <sys/file.h>
  
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "miscadmin.h"
  #include "postmaster/bgwriter.h"
  #include "storage/fd.h"
***************
*** 324,330 ****
      /*
       * Delete or truncate the first segment.
       */
!     if (isRedo || forkNum != MAIN_FORKNUM)
          ret = unlink(path);
      else
      {
--- 325,331 ----
      /*
       * Delete or truncate the first segment.
       */
!     if (isRedo || forkNum != MAIN_FORKNUM || isTempOrToastNamespace(rnode.spcNode))
          ret = unlink(path);
      else
      {

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: optimizing CleanupTempFiles

From: Simon Riggs
On Thu, 2008-09-18 at 09:23 +0300, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> BTW in testing this patch I was surprised by the fact that temp tables
> >> files are removed at checkpoint time,
> > 
> > [ blink... ]  Doesn't look like that should happen.  What is your
> > test case?
> 
> Hmph, must be because of the patch from last winter to prevent 
> relfilenode reuse until next checkpoint. Looks like we didn't make an 
> exception for temporary tables. Although it's harmless...

An unfortunate choice of words! Harmless is not how your average DBA
would describe it when their disk fills and they are apparently unable
to reduce space consumption. So there is still a problem there even if
we fix the temp files portion of it.

Seems like a complete fix must have some kind of pressure relief valve
for when things get full. Disk overflow is a much more likely problem
than a relfilenode recycling problem.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: optimizing CleanupTempFiles

From: Heikki Linnakangas
Simon Riggs wrote:
> On Thu, 2008-09-18 at 09:23 +0300, Heikki Linnakangas wrote:
>> Tom Lane wrote:
>>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>>> BTW in testing this patch I was surprised by the fact that temp tables
>>>> files are removed at checkpoint time,
>>> [ blink... ]  Doesn't look like that should happen.  What is your
>>> test case?
>> Hmph, must be because of the patch from last winter to prevent 
>> relfilenode reuse until next checkpoint. Looks like we didn't make an 
>> exception for temporary tables. Although it's harmless...
> 
> An unfortunate choice of words! Harmless is not how your average DBA
> would describe it when their disk fills and they are apparently unable
> to reduce space consumption. So there is still a problem there even if
> we fix the temp files portion of it.

The files *are* truncated to zero bytes immediately. They're left 
hanging as empty files until next checkpoint.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: optimizing CleanupTempFiles

From: Simon Riggs
On Thu, 2008-09-18 at 10:19 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > 
> > An unfortunate choice of words! Harmless is not how your average DBA
> > would describe it when their disk fills and they are apparently unable
> > to reduce space consumption. So there is still a problem there even if
> > we fix the temp files portion of it.
> 
> The files *are* truncated to zero bytes immediately. They're left 
> hanging as empty files until next checkpoint.

Ah, cool. So won't actually unlinking temp files be slower than just
leaving them for the checkpointer to clear up offline? I.e. do we really
need the patch you just posted?

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: optimizing CleanupTempFiles

From: Heikki Linnakangas
Simon Riggs wrote:
> On Thu, 2008-09-18 at 10:19 +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> An unfortunate choice of words! Harmless is not how your average DBA
>>> would describe it when their disk fills and they are apparently unable
>>> to reduce space consumption. So there is still a problem there even if
>>> we fix the temp files portion of it.
>> The files *are* truncated to zero bytes immediately. They're left 
>> hanging as empty files until next checkpoint.
> 
> Ah, cool. So won't actually unlinking temp files be slower than just
> leaving them for checkpointer to clear up offline? i.e. do we really
> need the patch you just posted?

Dunno, maybe. I doubt it's significant either way.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: optimizing CleanupTempFiles

From: Tom Lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> [ blink... ]  Doesn't look like that should happen.  What is your
>> test case?

> Hmph, must be because of the patch from last winter to prevent 
> relfilenode reuse until next checkpoint.

Ah.  I had misunderstood Alvaro to say that temp files (the kind under
discussion up to now) were not unlinked immediately; which would be
pretty strange given that fd.c is underneath md.c.

> Looks like we didn't make an 
> exception for temporary tables. Although it's harmless, we could put an 
> isTempOrToastNamespace() test in there:

Bad, bad idea to have md.c doing any catalog access.  As already noted
downthread, it wouldn't buy much anyway.
        regards, tom lane


Re: optimizing CleanupTempFiles

From: Heikki Linnakangas
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Looks like we didn't make an 
>> exception for temporary tables. Although it's harmless, we could put an 
>> isTempOrToastNamespace() test in there:
> 
> Bad, bad idea to have md.c doing any catalog access.

isTempOrToastNamespace() doesn't access catalogs; it's just checking the 
argument against two global variables. I guess you could argue that it's 
a modularity violation.
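
Roughly, from memory, the check amounts to something like this (a sketch,
not necessarily the exact code in namespace.c):

    bool
    isTempOrToastNamespace(Oid namespaceId)
    {
        /* Compare against the backend-local temp schema OIDs; no catalog access */
        if (OidIsValid(myTempNamespace) &&
            (myTempNamespace == namespaceId ||
             myTempToastNamespace == namespaceId))
            return true;
        return false;
    }
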
>  As already noted downthread, it wouldn't buy much anyway.

Yeah. *shrug*

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: optimizing CleanupTempFiles

From: Alvaro Herrera
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > Tom Lane wrote:
> >> [ blink... ]  Doesn't look like that should happen.  What is your
> >> test case?

This was simply a CREATE TEMP TABLE ... ON COMMIT DROP.  The file stays
in place until checkpoint (either a manually invoked one, or a
shutdown's).

Don't temp tables use this kind of temp file?  I admit I didn't check; I
just assumed they did.

> > Hmph, must be because of the patch from last winter to prevent 
> > relfilenode reuse until next checkpoint.
> 
> Ah.  I had misunderstood Alvaro to say that temp files (the kind under
> discussion up to now) were not unlinked immediately; which would be
> pretty strange given that fd.c is underneath md.c.

The test case where I actually verified that fd.c was used was a WITH
SCROLL cursor.  The file does go away as I expected in that case, as
soon as the cursor is destroyed (or as soon as the backend is closed).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: optimizing CleanupTempFiles

From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Don't temp tables use this kind of temp file?  I admit I didn't check; I
> just assumed they did.

No, temp tables go through localbuf.c, which sits atop regular smgr.
I don't think fd.c knows any difference from regular tables.
        regards, tom lane


Re: optimizing CleanupTempFiles

From: Alvaro Herrera
I committed this to HEAD.  If anything breaks, I request that someone
else do the cleanup, as it looks like we're headed for the hospital
first thing tomorrow morning to see what newborns look like.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.