Thread: File leak?
template1=# BEGIN; BEGIN template1=# CREATE TABLE foobar (foo char(10)); CREATE TABLE template1=# select relname, relfilenode from pg_class where relname='foobar';relname | relfilenode ---------+-------------foobar | 66372 (1 row) > killall -9 postmaster > ll data/base/1/ -rw------- 1 hlinnaka hlinnaka 0 Jun 12 15:33 data/base/1/66372 Unless I'm missing something, that file is left there forever. Not such a big deal with an empty file, but if you create a new table and load it with data in the same transaction... Can you guys confirm? The problem seems to be in storage/smgr.c. The code keeps track of files created in the transaction in backend memory. That information is lost on crash, and there is nothing to clean up the mess later. I wonder if we could clean up those lost files on database recovery or vacuum. Do we have a list of valid relfilenodes somewhere? Is pg_class enough? - Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes: > I wonder if we could clean up those lost files on database recovery or > vacuum. There is a TODO for this, but it seems exceedingly low priority to me. In any case I'd not recommend troubling to work on the problem until the tablespaces merry-go-round comes to a complete stop, since that restructuring will likely change what you'd need to do to identify unreferenced files. regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: > > I wonder if we could clean up those lost files on database recovery or > > vacuum. > > There is a TODO for this, but it seems exceedingly low priority to me. > > In any case I'd not recommend troubling to work on the problem until > the tablespaces merry-go-round comes to a complete stop, since that > restructuring will likely change what you'd need to do to identify > unreferenced files. Agreed on the need to wait until after tablespaces. I have an old patch that does lookups on postmaster start to check for unreferenced files, but it wasn't perfect. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Sat, 12 Jun 2004, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: > > I wonder if we could clean up those lost files on database recovery or > > vacuum. > > There is a TODO for this, but it seems exceedingly low priority to me. Are you sure? I read through the TODO list but couldn't find it. > In any case I'd not recommend troubling to work on the problem until > the tablespaces merry-go-round comes to a complete stop, since that > restructuring will likely change what you'd need to do to identify > unreferenced files. Ok. I'll to take a look at this later. - Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes: > On Sat, 12 Jun 2004, Tom Lane wrote: >> Heikki Linnakangas <hlinnaka@iki.fi> writes: >>> I wonder if we could clean up those lost files on database recovery or >>> vacuum. >> >> There is a TODO for this, but it seems exceedingly low priority to me. > Are you sure? I read through the TODO list but couldn't find it. Well, there used to be: 7.4 TODO has * Remove unreferenced table files and temp tables during database vacuum or postmaster startup (Bruce) Now that I think about it, I believe Bruce recently removed this on my advice; I was thinking that the problem shouldn't occur anymore now that we WAL-log file creation and deletion. But actually the present form of the WAL entries doesn't ensure that a file created by a transaction that crashes before committing will go away, because file deletion actions are only logged (and replayed) at transaction commit/abort. So it probably should go back in. Or else we could add more WAL logging (viz, log at the instant of file creation, and the replayer would have to keep track of whether it sees the creating transaction commit and delete the file if not). regards, tom lane
Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: > > On Sat, 12 Jun 2004, Tom Lane wrote: > >> Heikki Linnakangas <hlinnaka@iki.fi> writes: > >>> I wonder if we could clean up those lost files on database recovery or > >>> vacuum. > >> > >> There is a TODO for this, but it seems exceedingly low priority to me. > > > Are you sure? I read through the TODO list but couldn't find it. > > Well, there used to be: 7.4 TODO has > > * Remove unreferenced table files and temp tables during database vacuum > or postmaster startup (Bruce) > > Now that I think about it, I believe Bruce recently removed this on my > advice; I was thinking that the problem shouldn't occur anymore now that True. > we WAL-log file creation and deletion. But actually the present form of > the WAL entries doesn't ensure that a file created by a transaction that > crashes before committing will go away, because file deletion actions > are only logged (and replayed) at transaction commit/abort. So it > probably should go back in. Or else we could add more WAL logging Wording updated to: * Remove unreferenced table files created by a transactions that were in-progress when the server crashed > (viz, log at the instant of file creation, and the replayer would have > to keep track of whether it sees the creating transaction commit and > delete the file if not). I don't see how we could WAL log it because we don't fsync the WAL until our transaction completes, right, or are you thinking we would do a special fsync when we add the record? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> (viz, log at the instant of file creation, and the replayer would have >> to keep track of whether it sees the creating transaction commit and >> delete the file if not). > I don't see how we could WAL log it because we don't fsync the WAL until > our transaction completes, right, or are you thinking we would do a > special fsync when we add the record? Right, we would have to XLogFlush the file-creation WAL record before we could actually create the file. This is in line with the standard WAL rule: the WAL record must hit disk before the data file change it describes does. Assuming that the filesystem fsync's the created inode immediately, that means we have to flush first. I'm not sure what the performance implications of this would be; it's likely that pushing the cost somewhere else would be better. regards, tom lane
On Sun, 13 Jun 2004, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> (viz, log at the instant of file creation, and the replayer would have > >> to keep track of whether it sees the creating transaction commit and > >> delete the file if not). > > > I don't see how we could WAL log it because we don't fsync the WAL until > > our transaction completes, right, or are you thinking we would do a > > special fsync when we add the record? > > Right, we would have to XLogFlush the file-creation WAL record before we > could actually create the file. This is in line with the standard WAL > rule: the WAL record must hit disk before the data file change it > describes does. Assuming that the filesystem fsync's the created inode > immediately, that means we have to flush first. I'm afraid that's not enough. Checkpoints spoil it, think: 1. CREATE TABLE foobar ... 2. INSERT .... 3. <checkpoint> 4. <crash> The replay would not see the file-creation WAL record. We need some additional stash for the pending file-creations to make them survive checkpoints. > I'm not sure what the performance implications of this would be; it's > likely that pushing the cost somewhere else would be better. I don't think that file creation is that common for it to matter.. - Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes: > I'm afraid that's not enough. Checkpoints spoil it, think: > 1. CREATE TABLE foobar ... > 2. INSERT .... > 3. <checkpoint> > 4. <crash> > The replay would not see the file-creation WAL record. Good point. That makes it messy enough that we probably don't want to do it that way. Scan-for-unreferenced-files is looking a lot more robust (although it has its own interesting race-condition issues if you try to do it in a live system). >> I'm not sure what the performance implications of this would be; it's >> likely that pushing the cost somewhere else would be better. > I don't think that file creation is that common for it to matter.. Maybe not for regular tables, but for temp tables I'm less convinced. If we could do the unreferenced-file scan only at completion of a crash recovery then it'd be zero cost in all normal paths ... regards, tom lane
Bruce Momjian wrote: > Tom Lane wrote: >>Now that I think about it, I believe Bruce recently removed this on my >>advice; I was thinking that the problem shouldn't occur anymore now that > > > True. > > >>we WAL-log file creation and deletion. But actually the present form of >>the WAL entries doesn't ensure that a file created by a transaction that >>crashes before committing will go away, because file deletion actions >>are only logged (and replayed) at transaction commit/abort. So it >>probably should go back in. Or else we could add more WAL logging > > > Wording updated to: > > * Remove unreferenced table files created by a transactions that were > in-progress when the server crashed I don't think is a good idea put the words: "when the server crashed" in a TODO list, may be is better write: "when the server is killed abruptly". My 2 cents. Regards Gaetano Mendola