Thread: File leak?

File leak?

From
Heikki Linnakangas
Date:
template1=# BEGIN;
BEGIN
template1=# CREATE TABLE foobar (foo char(10));
CREATE TABLE
template1=# select relname, relfilenode from pg_class where
relname='foobar';relname | relfilenode
---------+-------------foobar  |       66372
(1 row)

> killall -9 postmaster
> ll data/base/1/
-rw-------  1 hlinnaka hlinnaka 0 Jun 12 15:33 data/base/1/66372

Unless I'm missing something, that file is left there forever. Not such a
big deal with an empty file, but if you create a new table and load it
with data in the same transaction...

Can you guys confirm?

The problem seems to be in storage/smgr.c. The code keeps track of files
created in the transaction in backend memory. That information is lost on
crash, and there is nothing to clean up the mess later.

I wonder if we could clean up those lost files on database recovery or
vacuum. Do we have a list of valid relfilenodes somewhere? Is pg_class
enough?

- Heikki



Re: File leak?

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> I wonder if we could clean up those lost files on database recovery or
> vacuum.

There is a TODO for this, but it seems exceedingly low priority to me.

In any case I'd not recommend troubling to work on the problem until
the tablespaces merry-go-round comes to a complete stop, since that
restructuring will likely change what you'd need to do to identify
unreferenced files.
        regards, tom lane


Re: File leak?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > I wonder if we could clean up those lost files on database recovery or
> > vacuum.
> 
> There is a TODO for this, but it seems exceedingly low priority to me.
> 
> In any case I'd not recommend troubling to work on the problem until
> the tablespaces merry-go-round comes to a complete stop, since that
> restructuring will likely change what you'd need to do to identify
> unreferenced files.

Agreed on the need to wait until after tablespaces. I have an old patch
that does lookups on postmaster start to check for unreferenced files,
but it wasn't perfect.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: File leak?

From
Heikki Linnakangas
Date:
On Sat, 12 Jun 2004, Tom Lane wrote:

> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > I wonder if we could clean up those lost files on database recovery or
> > vacuum.
>
> There is a TODO for this, but it seems exceedingly low priority to me.

Are you sure? I read through the TODO list but couldn't find it.

> In any case I'd not recommend troubling to work on the problem until
> the tablespaces merry-go-round comes to a complete stop, since that
> restructuring will likely change what you'd need to do to identify
> unreferenced files.

Ok. I'll to take a look at this later.

- Heikki



Re: File leak?

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On Sat, 12 Jun 2004, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>>> I wonder if we could clean up those lost files on database recovery or
>>> vacuum.
>> 
>> There is a TODO for this, but it seems exceedingly low priority to me.

> Are you sure? I read through the TODO list but couldn't find it.

Well, there used to be: 7.4 TODO has

* Remove unreferenced table files and temp tables during database vacuum or postmaster startup (Bruce)

Now that I think about it, I believe Bruce recently removed this on my
advice; I was thinking that the problem shouldn't occur anymore now that
we WAL-log file creation and deletion.  But actually the present form of
the WAL entries doesn't ensure that a file created by a transaction that
crashes before committing will go away, because file deletion actions
are only logged (and replayed) at transaction commit/abort.  So it
probably should go back in.  Or else we could add more WAL logging
(viz, log at the instant of file creation, and the replayer would have
to keep track of whether it sees the creating transaction commit and
delete the file if not).
        regards, tom lane


Re: File leak?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > On Sat, 12 Jun 2004, Tom Lane wrote:
> >> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> >>> I wonder if we could clean up those lost files on database recovery or
> >>> vacuum.
> >> 
> >> There is a TODO for this, but it seems exceedingly low priority to me.
> 
> > Are you sure? I read through the TODO list but couldn't find it.
> 
> Well, there used to be: 7.4 TODO has
> 
> * Remove unreferenced table files and temp tables during database vacuum
>   or postmaster startup (Bruce)
> 
> Now that I think about it, I believe Bruce recently removed this on my
> advice; I was thinking that the problem shouldn't occur anymore now that

True.

> we WAL-log file creation and deletion.  But actually the present form of
> the WAL entries doesn't ensure that a file created by a transaction that
> crashes before committing will go away, because file deletion actions
> are only logged (and replayed) at transaction commit/abort.  So it
> probably should go back in.  Or else we could add more WAL logging

Wording updated to:

* Remove unreferenced table files created by a transactions that were in-progress when the server crashed

> (viz, log at the instant of file creation, and the replayer would have
> to keep track of whether it sees the creating transaction commit and
> delete the file if not).

I don't see how we could WAL log it because we don't fsync the WAL until
our transaction completes, right, or are you thinking we would do a
special fsync when we add the record?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: File leak?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> (viz, log at the instant of file creation, and the replayer would have
>> to keep track of whether it sees the creating transaction commit and
>> delete the file if not).

> I don't see how we could WAL log it because we don't fsync the WAL until
> our transaction completes, right, or are you thinking we would do a
> special fsync when we add the record?

Right, we would have to XLogFlush the file-creation WAL record before we
could actually create the file.  This is in line with the standard WAL
rule: the WAL record must hit disk before the data file change it
describes does.  Assuming that the filesystem fsync's the created inode
immediately, that means we have to flush first.

I'm not sure what the performance implications of this would be; it's
likely that pushing the cost somewhere else would be better.
        regards, tom lane


Re: File leak?

From
Heikki Linnakangas
Date:
On Sun, 13 Jun 2004, Tom Lane wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> (viz, log at the instant of file creation, and the replayer would have
> >> to keep track of whether it sees the creating transaction commit and
> >> delete the file if not).
>
> > I don't see how we could WAL log it because we don't fsync the WAL until
> > our transaction completes, right, or are you thinking we would do a
> > special fsync when we add the record?
>
> Right, we would have to XLogFlush the file-creation WAL record before we
> could actually create the file.  This is in line with the standard WAL
> rule: the WAL record must hit disk before the data file change it
> describes does.  Assuming that the filesystem fsync's the created inode
> immediately, that means we have to flush first.

I'm afraid that's not enough. Checkpoints spoil it, think:

1. CREATE TABLE foobar ...
2. INSERT ....
3. <checkpoint>
4. <crash>

The replay would not see the file-creation WAL record.

We need some additional stash for the pending file-creations to make them
survive checkpoints.

> I'm not sure what the performance implications of this would be; it's
> likely that pushing the cost somewhere else would be better.

I don't think that file creation is that common for it to matter..

- Heikki



Re: File leak?

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> I'm afraid that's not enough. Checkpoints spoil it, think:

> 1. CREATE TABLE foobar ...
> 2. INSERT ....
> 3. <checkpoint>
> 4. <crash>

> The replay would not see the file-creation WAL record.

Good point.  That makes it messy enough that we probably don't want to
do it that way.  Scan-for-unreferenced-files is looking a lot more
robust (although it has its own interesting race-condition issues if
you try to do it in a live system).

>> I'm not sure what the performance implications of this would be; it's
>> likely that pushing the cost somewhere else would be better.

> I don't think that file creation is that common for it to matter..

Maybe not for regular tables, but for temp tables I'm less convinced.
If we could do the unreferenced-file scan only at completion of a crash
recovery then it'd be zero cost in all normal paths ...
        regards, tom lane


Re: File leak?

From
Gaetano Mendola
Date:
Bruce Momjian wrote:

> Tom Lane wrote:
>>Now that I think about it, I believe Bruce recently removed this on my
>>advice; I was thinking that the problem shouldn't occur anymore now that
> 
> 
> True.
> 
> 
>>we WAL-log file creation and deletion.  But actually the present form of
>>the WAL entries doesn't ensure that a file created by a transaction that
>>crashes before committing will go away, because file deletion actions
>>are only logged (and replayed) at transaction commit/abort.  So it
>>probably should go back in.  Or else we could add more WAL logging
> 
> 
> Wording updated to:
> 
> * Remove unreferenced table files created by a transactions that were
>   in-progress when the server crashed

I don't think is a good idea put the words: "when the server crashed" in a TODO
list, may be is better write: "when the server is killed abruptly".

My 2 cents.


Regards
Gaetano Mendola