Thread: Nasty resource-leak problem in sort code

Nasty resource-leak problem in sort code

From
Tom Lane
Date:
If a CREATE INDEX fails, the backend returns to the main loop without
having closed the temporary files that are created for sorting.
An easy example that provokes this is
create table titles (adate date);insert into titles values ('today');create index titles_f_ind on titles (date(adate)
date_ops);ERROR: SQL-language function not supported in this context.
 

after which the backend has about a dozen more open files than it had
before.

If you then try to create another index, you will crash for
lack of free file descriptors (unless your kernel has a
higher-than-usual open-files-per-process limit).  In any case, the
sort temp files will never get deleted from your database directory.

Offhand I'm not sure how to fix this.  The system knows about releasing
memory after an elog(ERROR), but does it have any provisions for
releasing other kinds of resources?  I suspect we need something
analogous to the on_shmem_exit() callback-registration list, but I
don't know whether it already exists.  Comments?
        regards, tom lane


Re: Nasty resource-leak problem in sort code

From
Tom Lane
Date:
I wrote:
> If a CREATE INDEX fails, the backend returns to the main loop without
> having closed the temporary files that are created for sorting.
> ...
> If you then try to create another index, you will crash for
> lack of free file descriptors (unless your kernel has a
> higher-than-usual open-files-per-process limit).  In any case, the
> sort temp files will never get deleted from your database directory.
>
> Offhand I'm not sure how to fix this.  The system knows about releasing
> memory after an elog(ERROR), but does it have any provisions for
> releasing other kinds of resources?

After further thought it seems that adding code to release temporary
files at transaction abort is the best solution.  I propose the
following fix:

1. In fd.c, invent the notion of a "temporary file", ie one that fd.c
knows (a) should be discarded when it is closed and (b) should not
outlive the current transaction.  Add a function TemporaryOpenFile()
that selects an appropriate temp file name, does the open, and marks
the resulting VFD as a temp file.  FileClose will implicitly act like
FileUnlink on this file.

2. Add a hook to xact.c that calls fd.c to close and delete all
remaining temporary files at transaction commit or abort.

3. Change psort.c and nodeHash.c (anyplace else?) to use this facility
instead of their existing ad-hoc temp file code.

This will fix several problems, including (a) failure to release file
descriptors after an error; (b) failure to delete temp files after
an error; (c) risk of running out of FDs if multiple sorts are invoked
concurrently (not sure if this can actually happen in the current state
of the code).  Centralizing the generation of temp file names in one
place seems like a good idea too.

Unless I hear objections, I'll work on this this weekend.
        regards, tom lane


Re: [HACKERS] Re: Nasty resource-leak problem in sort code

From
geek+@cmu.edu
Date:
Then <tgl@sss.pgh.pa.us> spoke up and said:
> After further thought it seems that adding code to release temporary
> files at transaction abort is the best solution.  I propose the
> following fix:
[ explanation snipped ]

Uhm, this all seems unnecessarily complicated.  Shouldn't the process
look more like this:
fp = open('tempfile');
unlink('tempfile');

This way, when the file is closed, the space is freed.  The only
complication I can see is if backends need to share the file handle,
or it needs to be re-opened.  This works with all sorts of temp-file
situations.

Of course, it's not NT safe, since I don't believe that NT provides
for deleting open files (NT file libs sucks.

-- 
=====================================================================
| JAVA must have been developed in the wilds of West Virginia.      |
| After all, why else would it support only single inheritance??    |
=====================================================================
| Finger geek@cmu.edu for my public key.                            |
=====================================================================

Re: [HACKERS] Re: Nasty resource-leak problem in sort code

From
Tom Lane
Date:
geek+@cmu.edu writes:
> Uhm, this all seems unnecessarily complicated.  Shouldn't the process
> look more like this:
> fp = open('tempfile');
> unlink('tempfile');

I did think of that, but it only solves the problem of making sure
the temp file goes away when you close it; it doesn't solve the problem
of making sure that you close it.   It's failure to release the file
descriptors that is causing backend crashes --- waste of diskspace is
bad also, but it's not the critical issue IMHO.  We *must* add cleanup
code to ensure that the FDs are closed after an abort; once we do that
it's essentially no extra code to unlink the files at the same time.

Doing the unlink right away would ensure that the temp file disappears
if the backend crashes before it gets to transaction commit/abort.
However, I regard that as a bad thing not a good thing, because it
would complicate debugging --- you might want to be able to see what
had been in the temp files.  You normally have to clean up a core file
to recover diskspace after a backend crash, so having to delete temp
files too doesn't seem like a big shortcoming.

> Of course, it's not NT safe, since I don't believe that NT provides
> for deleting open files (NT file libs sucks.

Don't think it works over NFS mounts, either.
        regards, tom lane


Re: [HACKERS] Re: Nasty resource-leak problem in sort code

From
Bruce Momjian
Date:
> This will fix several problems, including (a) failure to release file
> descriptors after an error; (b) failure to delete temp files after
> an error; (c) risk of running out of FDs if multiple sorts are invoked
> concurrently (not sure if this can actually happen in the current state
> of the code).  Centralizing the generation of temp file names in one
> place seems like a good idea too.
> 
> Unless I hear objections, I'll work on this this weekend.

This sounds like a good plan.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: Nasty resource-leak problem in sort code]

From
Bruce Momjian
Date:
> Uhm, this all seems unnecessarily complicated.  Shouldn't the process
> look more like this:
> fp = open('tempfile');
> unlink('tempfile');
> 
> This way, when the file is closed, the space is freed.  The only
> complication I can see is if backends need to share the file handle,
> or it needs to be re-opened.  This works with all sorts of temp-file
> situations.
> 
> Of course, it's not NT safe, since I don't believe that NT provides
> for deleting open files (NT file libs sucks.

Two problems.  First, we support NT, so we have to behave a little bit
to keep it happy.  Second, Tom is concerned about leaking file handles,
not just the files themselves.  He needs to call close() to release them
on transaction aborts.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026