Thread: Remove sort files
We have the TODO item: * Remove unused files during database vacuum or postmaster startup The following patch places sort files in a separate directory (created when first needed), and removes old sort files on postmaster startup. The only unusual part is the use of system("find...rm...") to remove the files in pg_sorttemp. Seemed cleaner than doing fancy directory walking code in C. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 Index: src/backend/postmaster/postmaster.c =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v retrieving revision 1.212 diff -c -r1.212 postmaster.c *** src/backend/postmaster/postmaster.c 2001/04/19 19:09:23 1.212 --- src/backend/postmaster/postmaster.c 2001/05/23 00:31:42 *************** *** 243,248 **** --- 243,249 ---- static void SignalChildren(int signal); static int CountChildren(void); static bool CreateOptsFile(int argc, char *argv[]); + static void RemovePgSorttemp(void); static pid_t SSDataBase(int xlop); *************** *** 595,600 **** --- 596,604 ---- if (!CreateDataDirLockFile(DataDir, true)) ExitPostmaster(1); + /* Remove old sort files */ + RemovePgSorttemp(); + /* * Establish input sockets. */ *************** *** 2449,2452 **** --- 2453,2474 ---- fclose(fp); return true; + } + + + /* + * Remove old sort files + */ + static void + RemovePgSorttemp(void) + { + char clear_pg_sorttemp[1024]; + + /* Don't remove directory in case it is a symlink */ + snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), + "find \"%s\"/base -name pg_sorttemp -type d -exec sh -c \"rm -f {}/*\" \\;", + DataDir); + /* Make sure we have the full 'find' command */ + if (strlen(clear_pg_sorttemp)+1 < 1024) + system(clear_pg_sorttemp); } Index: src/backend/storage/file/fd.c =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/storage/file/fd.c,v retrieving revision 1.76 diff -c -r1.76 fd.c *** src/backend/storage/file/fd.c 2001/04/03 04:07:02 1.76 --- src/backend/storage/file/fd.c 2001/05/23 00:31:42 *************** *** 90,95 **** --- 90,97 ---- #define VFD_CLOSED (-1) + #define TEMP_SORT_DIR "pg_sorttemp" + #define FileIsValid(file) \ ((file) > 0 && (file) < (int) SizeVfdCache && VfdCache[file].fileName != NULL) *************** *** 742,762 **** File OpenTemporaryFile(void) { ! char tempfilename[64]; File file; /* * Generate a tempfile name that's unique within the current * transaction */ ! snprintf(tempfilename, sizeof(tempfilename), ! "pg_sorttemp%d.%ld", MyProcPid, tempFileCounter++); /* Open the file */ ! file = FileNameOpenFile(tempfilename, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); if (file <= 0) ! elog(ERROR, "Failed to create temporary file %s", tempfilename); /* Mark it for deletion at close or EOXact */ VfdCache[file].fdstate |= FD_TEMPORARY; --- 744,772 ---- File OpenTemporaryFile(void) { ! char tempfilepath[128]; File file; /* * Generate a tempfile name that's unique within the current * transaction */ ! snprintf(tempfilepath, sizeof(tempfilepath), ! "%s%c%d.%ld", TEMP_SORT_DIR, SEP_CHAR, MyProcPid, ! tempFileCounter++); /* Open the file */ ! file = FileNameOpenFile(tempfilepath, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); if (file <= 0) ! { ! /* mkdir could fail if some one else already created it */ ! mkdir(TEMP_SORT_DIR, S_IRWXU); ! file = FileNameOpenFile(tempfilepath, ! O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); ! if (file <= 0) ! elog(ERROR, "Failed to create temporary file %s", tempfilepath); ! } /* Mark it for deletion at close or EOXact */ VfdCache[file].fdstate |= FD_TEMPORARY;
> The only unusual part is the use of system("find...rm...") to > remove the files in pg_sorttemp. Seemed cleaner than doing fancy > directory walking code in C. Should this figure out and use a path to the find command? Unless postgres sets CWD, is it possible as-is to coerce it into rming some files you did not intend to have rm'd? I didn't have/spend time looking at it with a fine toothcomb, but usually calling find and rm without an explicit path is a Bad Thing(tm). -Michael _________________________________________________________________ http://fastmail.ca/ - Fast Free Web Email for Canadians
> > The only unusual part is the use of system("find...rm...") to > > remove the files in pg_sorttemp. Seemed cleaner than doing fancy > > directory walking code in C. > > Should this figure out and use a path to the find command? Unless > postgres sets CWD, is it possible as-is to coerce it into rming some > files you did not intend to have rm'd? I didn't have/spend time > looking at it with a fine toothcomb, but usually calling find and rm > without an explicit path is a Bad Thing(tm). I believe it is in /usr/bin on every OS, right? My guess is I have to have configure find it. Let me get on that for 'find' and 'rm'. If they are not found, I will skip it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> > The only unusual part is the use of system("find...rm...") to > > remove the files in pg_sorttemp. Seemed cleaner than doing fancy > > directory walking code in C. > > Should this figure out and use a path to the find command? Unless > postgres sets CWD, is it possible as-is to coerce it into rming some > files you did not intend to have rm'd? I didn't have/spend time > looking at it with a fine toothcomb, but usually calling find and rm > without an explicit path is a Bad Thing(tm). I do use a full path for find: find DataDir -... Here is an updated patch which forces find and rm to come from /bin or /usr/bin. Seems safer that way. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 Index: configure.in =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/configure.in,v retrieving revision 1.127 diff -c -r1.127 configure.in *** configure.in 2001/05/16 17:24:10 1.127 --- configure.in 2001/05/23 01:09:21 *************** *** 622,627 **** --- 622,629 ---- PGAC_PATH_FLEX AC_PROG_LN_S AC_PROG_LD + AC_PROG_FIND + AC_PROG_RM AC_SUBST(LD) AC_SUBST(with_gnu_ld) case $host_os in sysv5uw*) Index: src/backend/postmaster/postmaster.c =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v retrieving revision 1.212 diff -c -r1.212 postmaster.c *** src/backend/postmaster/postmaster.c 2001/04/19 19:09:23 1.212 --- src/backend/postmaster/postmaster.c 2001/05/23 01:09:36 *************** *** 243,248 **** --- 243,249 ---- static void SignalChildren(int signal); static int CountChildren(void); static bool CreateOptsFile(int argc, char *argv[]); + static void RemovePgSorttemp(void); static pid_t SSDataBase(int xlop); *************** *** 595,600 **** --- 596,604 ---- if (!CreateDataDirLockFile(DataDir, true)) ExitPostmaster(1); + /* Remove old sort files */ + RemovePgSorttemp(); + /* * Establish input sockets. */ *************** *** 2449,2452 **** --- 2453,2477 ---- fclose(fp); return true; + } + + + /* + * Remove old sort files + */ + static void + RemovePgSorttemp(void) + { + char clear_pg_sorttemp[1024]; + + /* Don't remove directory in case it is a symlink */ + snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), + "/bin/sh -c \ + 'PATH=/bin:/usr/bin; \ + export PATH; \ + find \"%s\"/base -name pg_sorttemp -type d -exec sh -c \"rm -f {}/*\" \\;'", + DataDir); + /* Make sure we have the full 'find' command */ + if (strlen(clear_pg_sorttemp)+1 < 1024) + system(clear_pg_sorttemp); } Index: src/backend/storage/file/fd.c =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/storage/file/fd.c,v retrieving revision 1.76 diff -c -r1.76 fd.c *** src/backend/storage/file/fd.c 2001/04/03 04:07:02 1.76 --- src/backend/storage/file/fd.c 2001/05/23 01:09:37 *************** *** 90,95 **** --- 90,97 ---- #define VFD_CLOSED (-1) + #define TEMP_SORT_DIR "pg_sorttemp" + #define FileIsValid(file) \ ((file) > 0 && (file) < (int) SizeVfdCache && VfdCache[file].fileName != NULL) *************** *** 742,762 **** File OpenTemporaryFile(void) { ! char tempfilename[64]; File file; /* * Generate a tempfile name that's unique within the current * transaction */ ! snprintf(tempfilename, sizeof(tempfilename), ! "pg_sorttemp%d.%ld", MyProcPid, tempFileCounter++); /* Open the file */ ! file = FileNameOpenFile(tempfilename, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); if (file <= 0) ! elog(ERROR, "Failed to create temporary file %s", tempfilename); /* Mark it for deletion at close or EOXact */ VfdCache[file].fdstate |= FD_TEMPORARY; --- 744,772 ---- File OpenTemporaryFile(void) { ! char tempfilepath[128]; File file; /* * Generate a tempfile name that's unique within the current * transaction */ ! snprintf(tempfilepath, sizeof(tempfilepath), ! "%s%c%d.%ld", TEMP_SORT_DIR, SEP_CHAR, MyProcPid, ! tempFileCounter++); /* Open the file */ ! file = FileNameOpenFile(tempfilepath, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); if (file <= 0) ! { ! /* mkdir could fail if some one else already created it */ ! mkdir(TEMP_SORT_DIR, S_IRWXU); ! file = FileNameOpenFile(tempfilepath, ! O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); ! if (file <= 0) ! elog(ERROR, "Failed to create temporary file %s", tempfilepath); ! } /* Mark it for deletion at close or EOXact */ VfdCache[file].fdstate |= FD_TEMPORARY;
Bruce Momjian writes: > The only unusual part is the use of system("find...rm...") to remove the > files in pg_sorttemp. Seemed cleaner than doing fancy directory > walking code in C. 'find' is not in the portable subset of shell utilities, AFAIK. If you do put them into a separate directory you could use "rm -rf %s" on that directory. Why don't we put the sort files into /tmp and let the system figure out how to get rid of them? -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
> Bruce Momjian writes: > > > The only unusual part is the use of system("find...rm...") to remove the > > files in pg_sorttemp. Seemed cleaner than doing fancy directory > > walking code in C. > > 'find' is not in the portable subset of shell utilities, AFAIK. If you do > put them into a separate directory you could use "rm -rf %s" on that > directory. I did it that way so if pg_sorttemp is a symlink, I don't remove the symlink, just the files inside. The 'find' needs to be updated though to '! -file f' rather than '-type d' because it will not see symlinks, and I am not sure the find symlink find flag is portable. Anyway, I am still thinking about this. I haven't addressed tables that are orphaned by a rename aborting. I am no where near ready to apply this yet. > > Why don't we put the sort files into /tmp and let the system figure out > how to get rid of them? I am concerned that /tmp has size limitations that /data doesn't. I know on my BSD/OS, it limits to 75MB because it is a memory-based file system. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Here is an updated patch which forces find and rm to come from /bin or > /usr/bin. Seems safer that way. createdb and dropdb assume that they should use cp and rm as found in the postmaster's PATH. I see no good reason why this code shouldn't behave the same. regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Anyway, I am still thinking about this. I haven't addressed tables that > > are orphaned by a rename aborting. > > What makes you think that needs to be addressed? We don't have that > problem anymore AFAIK. If one backend creates a new file for alter table, but crashes before modifying pg_class, doesn't that file just hang around? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Here is an updated patch which forces find and rm to come from /bin or > > /usr/bin. Seems safer that way. > > createdb and dropdb assume that they should use cp and rm as found in > the postmaster's PATH. I see no good reason why this code shouldn't > behave the same. OK. I thought people would think my use of find and rm in the postmaster code to be terrible, but I guess not. PATH stuff removed. Again, this is a work in progress. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Peter Eisentraut <peter_e@gmx.net> writes: > > Why don't we put the sort files into /tmp and let the system figure out > > how to get rid of them? > > Hard-wiring the assumption that they should be in /tmp would be VASTLY > worse than our current situation. Lots of people run with /tmp its own, > not very large partition. There are also serious security issues on > machines that don't have directory stickybits. > > I do not like the way Bruce is doing the find, either, but moving stuff > to /tmp isn't the answer. > > One reasonable idea is > > cd pg_sorttemp ; rm -f * > > which could potentially fail if there are LOTS of dead temp files > (enough to make the expanded rm command overrun your kernel limit on > command length) but if there are that many then you've got serious > problems anyway. OK, here is the new 'find' line: "find \"%s\"/base -name pg_sorttemp ! -type f -exec /bin/sh -c \"cd {}; rm -f *\" \\;", -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Anyway, I am still thinking about this. I haven't addressed tables that > are orphaned by a rename aborting. What makes you think that needs to be addressed? We don't have that problem anymore AFAIK. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > Why don't we put the sort files into /tmp and let the system figure out > how to get rid of them? Hard-wiring the assumption that they should be in /tmp would be VASTLY worse than our current situation. Lots of people run with /tmp its own, not very large partition. There are also serious security issues on machines that don't have directory stickybits. I do not like the way Bruce is doing the find, either, but moving stuff to /tmp isn't the answer. One reasonable idea is cd pg_sorttemp ; rm -f * which could potentially fail if there are LOTS of dead temp files (enough to make the expanded rm command overrun your kernel limit on command length) but if there are that many then you've got serious problems anyway. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> One reasonable idea is >> cd pg_sorttemp ; rm -f * > OK, here is the new 'find' line: No, the point is not to use find at all. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Bruce Momjian <pgman@candle.pha.pa.us> writes: > Anyway, I am still thinking about this. I haven't addressed tables that > are orphaned by a rename aborting. >> >> What makes you think that needs to be addressed? We don't have that >> problem anymore AFAIK. > If one backend creates a new file for alter table, but crashes before > modifying pg_class, doesn't that file just hang around? ALTER TABLE doesn't create any new files. regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: >>> One reasonable idea is >>> cd pg_sorttemp ; rm -f * > >> OK, here is the new 'find' line: > > No, the point is not to use find at all. Suppose I'm creating temporary files to sort, on the supported platforms (anything non-windows AFAIK) can't you just create and unlink the file so long as you don't close it? -Michael _________________________________________________________________ http://fastmail.ca/ - Fast Free Web Email for Canadians
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> One reasonable idea is > >> cd pg_sorttemp ; rm -f * > > > OK, here is the new 'find' line: > > No, the point is not to use find at all. Actually, I can do: rm -f */pg_sorttemp/* New command is: snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), "rm -f \"%s\"/base/*/pg_sorttemp/*", DataDir); -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Bruce Momjian <pgman@candle.pha.pa.us> writes: > One reasonable idea is > cd pg_sorttemp ; rm -f * >> > OK, here is the new 'find' line: >> >> No, the point is not to use find at all. > Actually, I can do: > rm -f */pg_sorttemp/* > New command is: > snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), > "rm -f \"%s\"/base/*/pg_sorttemp/*", > DataDir); I said cd for a reason: if you do it that way, you've cut the maximum number of files you can delete by at least an order of magnitude, depending on how long DataDir is. Why do we need per-database sorttemp areas anyway? Seems better to have an installation-wide one. regards, tom lane
Bruce Momjian writes: > New command is: > > snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), > "rm -f \"%s\"/base/*/pg_sorttemp/*", > DataDir); Why not put all sort files into a common directory and then do a simple readdir() loop in C? -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes: > Bruce Momjian writes: >> New command is: >> >> snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), >> "rm -f \"%s\"/base/*/pg_sorttemp/*", >> DataDir); > Why not put all sort files into a common directory and then do a simple > readdir() loop in C? I was about to say that the potential portability headaches would be more trouble than the "feature" is worth. OTOH, I see that xlog.c is depending on readdir already, so maybe it's OK to use it. regards, tom lane
"Michael Richards" <michael@fastmail.ca> writes: > Suppose I'm creating temporary files to sort, on the supported > platforms (anything non-windows AFAIK) can't you just create and > unlink the file so long as you don't close it? No. (A) cygwin is considered a supported platform. (B) we use virtual-file references to these files; there is no guarantee that the file will be held open continuously for as long as we need it. regards, tom lane
> > New command is: > > > snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), > > "rm -f \"%s\"/base/*/pg_sorttemp/*", > > DataDir); > > I said cd for a reason: if you do it that way, you've cut the maximum > number of files you can delete by at least an order of magnitude, > depending on how long DataDir is. > > Why do we need per-database sorttemp areas anyway? Seems better to have > an installation-wide one. OK, here is the new code: snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), "sh -c '\ cd \"%s\"/base && \ ls | while read DIR; \ do \ export DIR; \ (cd \"$DIR\"/pg_sorttemp/ 2>/dev/null && rm -f *); \ done'", DataDir); -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Anyway, I am still thinking about this. I haven't addressed tables that > > are orphaned by a rename aborting. > >> > >> What makes you think that needs to be addressed? We don't have that > >> problem anymore AFAIK. > > > If one backend creates a new file for alter table, but crashes before > > modifying pg_class, doesn't that file just hang around? > > ALTER TABLE doesn't create any new files. What about CLUSTER? If we do DROP COLUMN by creating a new heap, we will use it then too, right? Are those the only two that create new files that could be orphaned? Now that we have file numbers, seems we can do DROP COLUMN now reliably, right? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > OK, here is the new code: > snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), > "sh -c '\ > cd \"%s\"/base && \ > ls | while read DIR; \ > do \ > export DIR; \ > (cd \"$DIR\"/pg_sorttemp/ 2>/dev/null && rm -f *); \ > done'", > DataDir); A readdir() loop would be hardly any longer, and it'd be faster and more secure. Among other problems with the above code, we do not prohibit double-quote in database paths anymore ... regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > What about CLUSTER? If we do DROP COLUMN by creating a new heap, we > will use it then too, right? Are those the only two that create new > files that could be orphaned? It's not practical to detect files that are orphaned in that sense: you couldn't do it without scanning pg_class, which the postmaster is unprepared to do. Far more important, trying to clean up such files automatically is a HORRIBLY bad idea. If anything goes wrong, the thing will probably delete your entire database (for example: because pg_log lossage causes it to believe all pg_class tuples are uncommitted). That's a loose cannon I do not wish to have on our decks. Auto-deletion of sorttemp files is a safe endeavor because (a) you can reliably tell which ones those are by name, and (b) there's no possibility that you wanted the data in them. Neither condition holds for data files. regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > > OK, here is the new code: > > > snprintf(clear_pg_sorttemp, sizeof(clear_pg_sorttemp), > > "sh -c '\ > > cd \"%s\"/base && \ > > ls | while read DIR; \ > > do \ > > export DIR; \ > > (cd \"$DIR\"/pg_sorttemp/ 2>/dev/null && rm -f *); \ > > done'", > > DataDir); > > A readdir() loop would be hardly any longer, and it'd be faster and more > secure. Among other problems with the above code, we do not prohibit > double-quote in database paths anymore ... OK, I did the readdir() thing. I hope it is safe to unlink a file while in a readdir() loop. Patch attached. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 Index: src/backend/postmaster/postmaster.c =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v retrieving revision 1.212 diff -c -r1.212 postmaster.c *** src/backend/postmaster/postmaster.c 2001/04/19 19:09:23 1.212 --- src/backend/postmaster/postmaster.c 2001/05/24 00:11:55 *************** *** 58,63 **** --- 58,64 ---- #include <ctype.h> #include <sys/types.h> #include <sys/stat.h> + #include <dirent.h> #include <sys/time.h> #include <sys/socket.h> #include <errno.h> *************** *** 243,248 **** --- 244,250 ---- static void SignalChildren(int signal); static int CountChildren(void); static bool CreateOptsFile(int argc, char *argv[]); + static void RemovePgSorttemp(void); static pid_t SSDataBase(int xlop); *************** *** 595,600 **** --- 597,605 ---- if (!CreateDataDirLockFile(DataDir, true)) ExitPostmaster(1); + /* Remove old sort files */ + RemovePgSorttemp(); + /* * Establish input sockets. */ *************** *** 2449,2452 **** --- 2454,2498 ---- fclose(fp); return true; + } + + + /* + * Remove old sort files + */ + static void + RemovePgSorttemp(void) + { + char db_path[MAXPGPATH]; + char temp_path[MAXPGPATH]; + char rm_path[MAXPGPATH]; + DIR *db_dir; + DIR *temp_dir; + struct dirent *db_de; + struct dirent *temp_de; + + /* + * Cycle through pg_tempsort for all databases and + * and remove old sort files. + */ + /* trailing slash forces symlink following */ + snprintf(db_path, sizeof(db_path), "%s/base/", DataDir); + if ((db_dir = opendir(db_path)) != NULL) + while ((db_de = readdir(db_dir)) != NULL) + { + snprintf(temp_path, sizeof(temp_path), + "%s/%s/%s/", db_path, db_de->d_name, SORT_TEMP_DIR); + if ((temp_dir = opendir(temp_path)) != NULL) + while ((temp_de = readdir(temp_dir)) != NULL) + { + if (temp_de->d_type == DT_REG) + { + snprintf(rm_path, sizeof(temp_path), + "%s/%s/%s/%s", + db_path, db_de->d_name, + SORT_TEMP_DIR, temp_de->d_name); + unlink(rm_path); + } + } + } } Index: src/backend/storage/file/fd.c =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/storage/file/fd.c,v retrieving revision 1.76 diff -c -r1.76 fd.c *** src/backend/storage/file/fd.c 2001/04/03 04:07:02 1.76 --- src/backend/storage/file/fd.c 2001/05/24 00:11:55 *************** *** 742,762 **** File OpenTemporaryFile(void) { ! char tempfilename[64]; File file; /* * Generate a tempfile name that's unique within the current * transaction */ ! snprintf(tempfilename, sizeof(tempfilename), ! "pg_sorttemp%d.%ld", MyProcPid, tempFileCounter++); /* Open the file */ ! file = FileNameOpenFile(tempfilename, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); if (file <= 0) ! elog(ERROR, "Failed to create temporary file %s", tempfilename); /* Mark it for deletion at close or EOXact */ VfdCache[file].fdstate |= FD_TEMPORARY; --- 742,770 ---- File OpenTemporaryFile(void) { ! char tempfilepath[128]; File file; /* * Generate a tempfile name that's unique within the current * transaction */ ! snprintf(tempfilepath, sizeof(tempfilepath), ! "%s%c%d.%ld", SORT_TEMP_DIR, SEP_CHAR, MyProcPid, ! tempFileCounter++); /* Open the file */ ! file = FileNameOpenFile(tempfilepath, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); if (file <= 0) ! { ! /* mkdir could fail if some one else already created it */ ! mkdir(SORT_TEMP_DIR, S_IRWXU); ! file = FileNameOpenFile(tempfilepath, ! O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600); ! if (file <= 0) ! elog(ERROR, "Failed to create temporary file %s", tempfilepath); ! } /* Mark it for deletion at close or EOXact */ VfdCache[file].fdstate |= FD_TEMPORARY; Index: src/include/storage/fd.h =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/src/include/storage/fd.h,v retrieving revision 1.27 diff -c -r1.27 fd.h *** src/include/storage/fd.h 2001/02/18 04:39:42 1.27 --- src/include/storage/fd.h 2001/05/24 00:12:01 *************** *** 39,44 **** --- 39,46 ---- * FileSeek uses the standard UNIX lseek(2) flags. */ + #define SORT_TEMP_DIR "pg_sorttemp" + typedef char *FileName; typedef int File;
Bruce Momjian <pgman@candle.pha.pa.us> writes: > + if (temp_de->d_type == DT_REG) The d_type field, and the corresponding macros such as DT_REG, are not portable. The only portable field in the dirent structure is d_name. If you want to be really really super portable, you have to think about supporting direct.h and other header files. See AC_DIR_HEADER and AC_HEADER_DIRENT in the autoconf documentation. But these days probably every OS of interest supports dirent.h, which is defined by POSIX. Ian
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > + if (temp_de->d_type == DT_REG) > > The d_type field, and the corresponding macros such as DT_REG, are not > portable. > > The only portable field in the dirent structure is d_name. > > If you want to be really really super portable, you have to think > about supporting direct.h and other header files. See AC_DIR_HEADER > and AC_HEADER_DIRENT in the autoconf documentation. But these days > probably every OS of interest supports dirent.h, which is defined by > POSIX. Seems Vadim already added readdir() that does similar work for WAL files in xlog.c: while ((xlde = readdir(xldir)) != NULL) { if (strlen(xlde->d_name) == 16 && strspn(xlde->d_name, "0123456789ABCDEF") == 16 && strcmp(xlde->d_name, lastoff) <= 0) { elog(LOG, "MoveOfflineLogs: %s %s", (XLOG_archive_dir[0]) ? "archive" : "remove", xlde->d_name); sprintf(path, "%s%c%s", XLogDir, SEP_CHAR, xlde->d_name); if (XLOG_archive_dir[0] == 0) unlink(path); } errno = 0; } I will remove the _REG test and add the strspn test he has: if (strspn(temp_de->d_name, "0123456789.") == strlen(temp_de->dname)) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian writes: > OK, I did the readdir() thing. I hope it is safe to unlink a file while > in a readdir() loop. The only portable member of struct dirent is d_name. (Do you really expect non-regular files in the sort directory?) Also, lots of error checks (after opendir, readdir, mkdir, unlink) are missing. -- Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
> Bruce Momjian writes: > > > OK, I did the readdir() thing. I hope it is safe to unlink a file while > > in a readdir() loop. > > The only portable member of struct dirent is d_name. (Do you really > expect non-regular files in the sort directory?) Also, lots of error > checks (after opendir, readdir, mkdir, unlink) are missing. Error checks missing in my code or elsewhere? Also, what should I do on an unlink failure? I thought of printing a message but wasn't sure: if ((db_dir = opendir(db_path)) != NULL) while ((db_de = readdir(db_dir)) != NULL) { snprintf(temp_path, sizeof(temp_path), "%s/%s/%s/", db_path, db_de->d_name, SORT_TEMP_DIR); if ((temp_dir = opendir(temp_path)) != NULL) while ((temp_de = readdir(temp_dir)) != NULL) { if (strspn(temp_de->d_name, "0123456789.") == strlen(temp_de->d_name)) { snprintf(rm_path, sizeof(temp_path), "%s/%s/%s/%s", db_path, db_de->d_name, SORT_TEMP_DIR, temp_de->d_name); unlink(rm_path); } } } -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026