Thread: What's left?
Hello, I think it's safe to say there is a working implementation of a signal handler. The one tricky part left is to identify some smart places for the backend to check the awaiting signal queue. The first one is easy: switch recv() with select() with a reasonable timeout and a poll. If and when the signal handler gets patched into CVS, is there anything else left that prevents the cvs version from compiling and linking? From what I understand, Claudio's fork/exec implementation is either complete or nearly complete. I would like very much to help any way possible in solving any last remaining issues. Once the CVS sources are compliable, it will be easier to make meaningful contributions. I'm really looking forward to testing and benchmarking the win32 port. A big thanks to all who continue to work so hard on this project. Merlin
> I would like very much to help any way possible in solving any last > remaining issues. Once the CVS sources are compliable, it will be > easier to make meaningful contributions. I'm really looking > forward to testing and benchmarking the win32 port. A big thanks > to all who continue to work so hard on this project. To answer the question, "what's left?" There is one patch I've submitted waiting for application (12/01/04), and I've got another patch almost ready (adds some missing functions, and fixes busted Makefiles etc) that, with Magnus's signal code, will give us a compilable, runnable, and somewhat buggy Win32 port. It will then be a matter of fixing things like: * installation directory issues (/usr/local/pgsql/bin won't work too well outside of the MingW environment :-) * general directory handling (ie. whitespaces in directory names; forward/backslash path canonicalization) * sync issues * any missing structs/items in shared memory * generally, running the test suite, and fixing whatever is busted (I'm at 41 tests passing now :-) I'm imagining we'll be in a position to be able to start on the above in as little as 2 weeks... Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Claudio Natoli wrote: > * installation directory issues (/usr/local/pgsql/bin won't work too well >outside of the MingW environment :-) > Clearly we will need an installer for a binary distribution. But for now I suggest that the default prefix on Windows is C:\Program Files\PostgreSQL cheers andrew
Some fool wrote: > It will then be a matter of fixing things like: > * installation directory issues (/usr/local/pgsql/bin won't work too > well outside of the MingW environment :-) > * general directory handling (ie. whitespaces in directory names; > forward/backslash path canonicalization) > * sync issues > * any missing structs/items in shared memory > * generally, running the test suite, and fixing whatever is busted (I'm > at 41 tests passing now :-) One important thing I forgot, that someone could start looking at now: * backends keeping files open when other backends are trying to delete/rename them The port I wrote for here at work simply modified the functions in dirmod.c, to attempt the delete (or rename), and, on a failure identifiable as being presumably due to another process holding the file open, simply schedules the file for deletion at system start time using the Win32 API for doing so (hey, it is Windows, it is going to reboot sooner or later :-). In the case of rename, just copies the existing file and schedules the original for deletion. Ugly, and sometimes slow where we'd rather not be, but it gets us by. We must do better for the official port, and whilst better solutions are obviously conceivable, AFAICS they will require some amount of backend changes and therefore consent from main list. Someone might want to start looking at a nice, clean solution to this. Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Claudio Natoli <claudio.natoli@memetrics.com> writes: > One important thing I forgot, that someone could start looking at now: > * backends keeping files open when other backends are trying to > delete/rename them > We must do better for the official port, Why? The procedure you mentioned seems perfectly adequate to me, seeing that it's a bit of a corner case to start with. I cannot think of any way of "doing better" that wouldn't be far too invasive to be acceptable. regards, tom lane
Tom Lane wrote: > Claudio Natoli <claudio.natoli@memetrics.com> writes: > > One important thing I forgot, that someone could start looking at now: > > * backends keeping files open when other backends are trying to > > delete/rename them > > > We must do better for the official port, > > Why? The procedure you mentioned seems perfectly adequate to me, > seeing that it's a bit of a corner case to start with. Because, on occasion, I end up with GBs of log files hanging around. You can wrack up disk space real fast that way. > I cannot think of any way of "doing better" that wouldn't be far too > invasive to be acceptable. In that case, I'm more than happy to contribute the code for this solution along with the remaining changes I've got ahead of me, but, still, if someone thinks they can do better without a great deal of drama (which, like you, I think is unlikely) then I'm all for it. Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
I have added these and your later mention of delete/rename of open files to the Win32 project page: http://momjian.postgresql.org/main/writings/pgsql/project --------------------------------------------------------------------------- Claudio Natoli wrote: > > > I would like very much to help any way possible in solving any last > > remaining issues. Once the CVS sources are compliable, it will be > > easier to make meaningful contributions. I'm really looking > > forward to testing and benchmarking the win32 port. A big thanks > > to all who continue to work so hard on this project. > > To answer the question, "what's left?" > > There is one patch I've submitted waiting for application (12/01/04), and > I've got another patch almost ready (adds some missing functions, and fixes > busted Makefiles etc) that, with Magnus's signal code, will give us a > compilable, runnable, and somewhat buggy Win32 port. > > It will then be a matter of fixing things like: > * installation directory issues (/usr/local/pgsql/bin won't work too well > outside of the MingW environment :-) > * general directory handling (ie. whitespaces in directory names; > forward/backslash path canonicalization) > * sync issues > * any missing structs/items in shared memory > * generally, running the test suite, and fixing whatever is busted (I'm at > 41 tests passing now :-) > > I'm imagining we'll be in a position to be able to start on the above in as > little as 2 weeks... > > Cheers, > Claudio > > --- > Certain disclaimers and policies apply to all email sent from Memetrics. > For the full text of these disclaimers and policies see > <a > href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em > ailpolicy.html</a> > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane wrote: > Claudio Natoli <claudio.natoli@memetrics.com> writes: > > One important thing I forgot, that someone could start looking at now: > > * backends keeping files open when other backends are trying to > > delete/rename them > > > We must do better for the official port, > > Why? The procedure you mentioned seems perfectly adequate to me, > seeing that it's a bit of a corner case to start with. I don't see this as a corner case, except it being a corner case operating system. :-) I think it will very likely rename/unlink will fail because of the file descriptor cache kept by each backend. I am attaching dir.c from the PeerDirect port. It handles unlink failures by appending the failed file name to a file that is later read and another unlink attempted. Perhaps this is something we can do, and have try unlinks after each checkpoint. PeerDirect handles rename by just looping. We really can't delay a rename. There is discussion in the Win32 TODO detail that goes over some options, I think. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 #include "port/win.h" #define WIN32_LEAN_AND_MEAN /* Exclude rarely-used stuff from Windows headers */ #include "windows.h" #include <stdlib.h> #include <stdio.h> #ifdef _LIB #include "postgres.h" #include "miscadmin.h" #endif int win32_rmdir(const char *dirname) { BOOL ans = RemoveDirectory(dirname); if (!ans && GetLastError() == ERROR_DIR_NOT_EMPTY) { /* Blow away the contents recursively and try again */ WIN32_FIND_DATA dir_find_data; char buffer[MAX_PATH*2 + 1]; HANDLE find_handle; strcpy(buffer, dirname); strcat(buffer, "\\*"); ans = TRUE; find_handle = FindFirstFile(buffer, &dir_find_data); while (find_handle && find_handle != INVALID_HANDLE_VALUE) { if (strcmp(dir_find_data.cFileName, ".") != 0 && strcmp(dir_find_data.cFileName, "..") != 0) { strcpy(buffer + strlen(dirname) + 1, dir_find_data.cFileName); if (dir_find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) ans = (win32_rmdir(buffer) == 0); else ans = DeleteFile(buffer); } if (!ans || !FindNextFile(find_handle, &dir_find_data)) break; } FindClose(find_handle); ans = ans && RemoveDirectory(dirname); } return (ans ? 0 : -1); } /* Recursively copy srcdir to destdir. Destdir is created * if needed. If destdir exists already but is not a * directory, it is an error. Returns 0 for success, * -1 otherwise. */ int copydir(const char* srcdir, const char* destdir) { WIN32_FIND_DATA dir_find_data; char src_buffer[MAX_PATH*2 + 1]; char dest_buffer[MAX_PATH*2 + 1]; HANDLE find_handle = FindFirstFile(destdir, &dir_find_data); BOOL ans; if (find_handle && (find_handle != INVALID_HANDLE_VALUE)) { /* destdir exists, now make sure it is a directory */ if (dir_find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) ans = 1; else ans = 0; FindClose(find_handle); } else { /* Create the destdir */ ans = CreateDirectory(destdir, NULL); } FindClose(find_handle); if (!ans) return -1; /* Get ready to do some copying */ strcpy(src_buffer, srcdir); strcat(src_buffer, "\\*"); strcpy(dest_buffer, destdir); strcat(dest_buffer, "\\"); find_handle = FindFirstFile(src_buffer, &dir_find_data); if (find_handle && find_handle != INVALID_HANDLE_VALUE) { do { if (strcmp(dir_find_data.cFileName, ".") == 0) continue; if (strcmp(dir_find_data.cFileName, "..") == 0) continue; strcpy(src_buffer + strlen(srcdir) + 1, dir_find_data.cFileName); strcpy(dest_buffer + strlen(destdir) + 1, dir_find_data.cFileName); if (dir_find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) ans = (copydir(src_buffer, dest_buffer) == 0); else ans = CopyFile(src_buffer, dest_buffer, FALSE); if (!ans) break; } while (FindNextFile(find_handle, &dir_find_data)); FindClose(find_handle); } return (ans ? 0 : -1); } #define VALID_FLAG 0xcade typedef struct { DWORD valid_flag; char dir_name[MAX_PATH + 1]; dirent *dir_info; HANDLE search_handle; } pg_win32_dir_struct; DIR* opendir(const char* name) { WIN32_FIND_DATA dir_find_data; HANDLE file_handle = FindFirstFile(name, &dir_find_data); pg_win32_dir_struct* the_dir = NULL; if (file_handle && (file_handle != INVALID_HANDLE_VALUE)) { FindClose(file_handle); if (dir_find_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) { /* Found a decent directory -- return it */ the_dir = (pg_win32_dir_struct*)malloc(sizeof(pg_win32_dir_struct)); the_dir->valid_flag = VALID_FLAG; strcpy(the_dir->dir_name, name); strcat(the_dir->dir_name, "/"); the_dir->dir_info = (dirent*)malloc(sizeof(dirent)); the_dir->search_handle = NULL; } } return the_dir; } dirent* readdir(DIR* dir) { WIN32_FIND_DATA find_data; static dirent s_dirent; pg_win32_dir_struct* the_dir = (pg_win32_dir_struct*)dir; if (!the_dir || the_dir->valid_flag != VALID_FLAG) /* Bad user -- gave invalid DIR */ return NULL; if (the_dir->search_handle == NULL) { /* This is the first call */ char buffer[MAX_PATH+1]; HANDLE file_handle; strcpy(buffer, the_dir->dir_name); strcat(buffer, "*"); file_handle = FindFirstFile(buffer, &find_data); if (file_handle && (file_handle != INVALID_HANDLE_VALUE)) { the_dir->search_handle = file_handle; } } else if (!FindNextFile(the_dir->search_handle, &find_data)) { FindClose(the_dir->search_handle); the_dir->search_handle = NULL; } if (the_dir->search_handle) { strcpy(the_dir->dir_info->d_name, find_data.cFileName); the_dir->dir_info->d_type = find_data.dwFileAttributes; return the_dir->dir_info; } else return NULL; } int closedir(DIR* dir) { pg_win32_dir_struct* the_dir = (pg_win32_dir_struct*)dir; if (the_dir && the_dir->valid_flag == VALID_FLAG) { the_dir->valid_flag = -1; if (the_dir->search_handle) FindClose(the_dir->search_handle); free(the_dir->dir_info); the_dir->dir_info = ((dirent*)1); free(the_dir); } return 0; } #ifdef _LIB #define UNLINK_LATER_FILE "pg_win32_unlink_later.data" BOOL win32_save_unlink_for_later(char* filename) { FILE* later_file; char buffer[1024]; int count = 0; sprintf(buffer, "%s/global/%s", DataDir, UNLINK_LATER_FILE); later_file = fopen(buffer, "a+"); while (!later_file && count++ < 5) { Sleep(100); later_file = fopen(buffer, "a+"); } if (later_file) { SYSTEMTIME sys_time; GetLocalTime(&sys_time); fprintf(later_file, "%02hu/%02hu/%04hu %02hu:%02hu:%02hu\t" "%s\n", sys_time.wMonth, sys_time.wDay, sys_time.wYear, sys_time.wHour, sys_time.wMinute, sys_time.wSecond, filename); fclose(later_file); elog(NOTICE, "Could not delete %s -- will try again later", filename); return true; } else return false; } void win32_unlink_leftover_files() { FILE* later_file; char later_file_path[1024]; sprintf(later_file_path, "%s/global/%s", DataDir, UNLINK_LATER_FILE); later_file = fopen(later_file_path, "r"); if (later_file) { char temp_file_path[1024], filename[1024]; FILE* temp_file = NULL; SYSTEMTIME orig_time; memset(&orig_time, 0, sizeof(SYSTEMTIME)); while (fscanf(later_file, "%02hu/%02hu/%04hu %02hu:%02hu:%02hu\t%s\n", &orig_time.wMonth, &orig_time.wDay, &orig_time.wYear, &orig_time.wHour, &orig_time.wMinute, &orig_time.wSecond, filename) != EOF) { if (DeleteFile(filename)) elog(DEBUG, "Successfully deleted file %s, which had been postponed from an earlier attempt", filename); else if (GetLastError() != ERROR_FILE_NOT_FOUND) { if (!temp_file) { sprintf(temp_file_path, "%s/global/%s.%d", DataDir, UNLINK_LATER_FILE, getpid()); temp_file = fopen(temp_file_path, "w"); } if (temp_file) { fprintf(temp_file, "%02hu/%02hu/%04hu %02hu:%02hu:%02hu\t" "%s\n", orig_time.wMonth, orig_time.wDay, orig_time.wYear, orig_time.wHour, orig_time.wMinute, orig_time.wSecond, filename); } else { elog(DEBUG, "Could not delete file %s. Please delete by hand.", filename); } } } fclose(later_file); DeleteFile(later_file_path); if (temp_file) { fclose(temp_file); rename(temp_file_path, later_file_path); } } } #endif /* _LIB */
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I think it will very likely rename/unlink will fail because of the file > descriptor cache kept by each backend. Hmm ... you're probably right. Okay, it's a more significant issue than I thought. > I am attaching dir.c from the PeerDirect port. It handles unlink > failures by appending the failed file name to a file that is later read > and another unlink attempted. Perhaps this is something we can do, and > have try unlinks after each checkpoint. That seems like a possibility. The open files will get closed very soon after the delete occurs (as a byproduct of relcache flush), so we don't need very much of a delay. Next checkpoint sounds reasonable. > PeerDirect handles rename by just looping. We really can't delay a > rename. There is discussion in the Win32 TODO detail that goes over > some options, I think. Do we really have any problem with rename? We don't rename table files. The renames I can think of are renaming temp files into place as permanent ones, and there would be no open references to such a file. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I think it will very likely rename/unlink will fail because of the file > > descriptor cache kept by each backend. > > Hmm ... you're probably right. Okay, it's a more significant issue than > I thought. > > > I am attaching dir.c from the PeerDirect port. It handles unlink > > failures by appending the failed file name to a file that is later read > > and another unlink attempted. Perhaps this is something we can do, and > > have try unlinks after each checkpoint. > > That seems like a possibility. The open files will get closed very soon > after the delete occurs (as a byproduct of relcache flush), so we don't > need very much of a delay. Next checkpoint sounds reasonable. Good. I am glad for the recache closing because we were going to need something like that. > > PeerDirect handles rename by just looping. We really can't delay a > > rename. There is discussion in the Win32 TODO detail that goes over > > some options, I think. > > Do we really have any problem with rename? We don't rename table files. > The renames I can think of are renaming temp files into place as > permanent ones, and there would be no open references to such a file. We do have a problem. It is with cache files read on startup, like pg_pwd. We can generate the file as temp, but we have to slide it in while a backend is not reading it. On a busy system, I am not sure how large a window we will get for the rename. The rename is all centralized in port/dirmod.c, so we can deal with it there, whatever the solution. We also have to do the rename during xact close because we need to hold locks so we are sure the files are written in the same order that they modify pg_shadow, waiting a long time for the rename is a serious problem. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
pgman wrote: > > > PeerDirect handles rename by just looping. We really can't delay a > > > rename. There is discussion in the Win32 TODO detail that goes over > > > some options, I think. > > > > Do we really have any problem with rename? We don't rename table files. > > The renames I can think of are renaming temp files into place as > > permanent ones, and there would be no open references to such a file. > > We do have a problem. It is with cache files read on startup, like > pg_pwd. We can generate the file as temp, but we have to slide it in > while a backend is not reading it. On a busy system, I am not sure how > large a window we will get for the rename. The rename is all > centralized in port/dirmod.c, so we can deal with it there, whatever the > solution. > > We also have to do the rename during xact close because we need to hold > locks so we are sure the files are written in the same order that they > modify pg_shadow, waiting a long time for the rename is a serious > problem. I think I have a solution to this. The problem with rename is that if the file is open under win32, we can't rename, so we loop, but we are still holding locks. My idea is to do this: grab lock(e.g. pg_shadow) write temp file rename temp file to realfile.new release lock rename realfile.new to realfile In this way, no one ever has the rename file open while we are holding the locks, and we can loop without holding an exclusive lock on pg_shadow, and file writes remain in order. It's so easy, I think I could code it myself. :-) This, along with the idea of having the checkpoint delete files that are pending should solve both problems for us. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > In this way, no one ever has the rename file open while we are holding > the locks, and we can loop without holding an exclusive lock on > pg_shadow, and file writes remain in order. You're doing this where exactly, and are certain that you are holding no locks why exactly? And if you aren't holding a lock, what prevents concurrency bugs? regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > In this way, no one ever has the rename file open while we are holding > > the locks, and we can loop without holding an exclusive lock on > > pg_shadow, and file writes remain in order. > > You're doing this where exactly, and are certain that you are holding no > locks why exactly? And if you aren't holding a lock, what prevents > concurrency bugs? I am looking now at the relcache file, pg_pwd and pg_group. I am sure I am holding some locks, but not an exclusive lock on e.g. pg_shadow. I am working on a patch now. I don't expect to eliminate the looping for rename, but to eliminate holding exclusive locks while doing the rename to a file actively being read. By using realfile.new, the first rename is only being done on a file that is never opened, just renamed, which should be quick. I can't think of a cleaner solution. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > In this way, no one ever has the rename file open while we are holding > > the locks, and we can loop without holding an exclusive lock on > > pg_shadow, and file writes remain in order. > > You're doing this where exactly, and are certain that you are holding no > locks why exactly? And if you aren't holding a lock, what prevents > concurrency bugs? Oh, for concurrency bugs, you make realfile.new while holding the exclusive lock, so someone could come in later and replace realfile.new while I am in the rename loop, but then I just install theirs instead. I could install someone who has just done the rename to realfile.new but not tried the rename from realfile.new to realfile, but that seems OK. They will just fine the file missing and fail on the rename, which is OK. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073