Thread: Long paths for tablespace leads to uninterruptible hang in Windows
One of the users of PostgreSQL has reported that if the tablespace path is long, it leads to a hang, and the hang cannot be interrupted. A simple testcase to reproduce the hang is:
a. initdb -D E:\WorkSpace\PostgreSQL\master\RM30253_Data\aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\db
b. Create tablespace tbs location 'E:\WorkSpace\PostgreSQL\master\Data\idb';
c. Drop tablespace tbs;
In this test the path length used is 174, but I observed that the hang occurs if the length is greater than 130 (approx.). I have tested this on a few different Windows platforms (Windows XP 32-bit, Windows 7 64-bit). The hang occurs on Windows 7 64-bit. The user has reported it on Windows 2008 64-bit.
On further analysis, I found that the hang occurs in some Windows APIs (FindFirstFile, RemoveDirectory) when the symlink path (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is passed to them. For the above testcase, it hangs in the path destroy_tablespace_directories->ReadDir->readdir->FindFirstFile. I have tried using mklink /J (a utility in Windows 7 and above) to create a junction point instead of the current way in pgsymlink; it still hangs in a similar way.
Some of the ways to resolve the problem are described below:
1. I found that if the link path is accessed as a full path during readdir or stat, it works fine. For example, in destroy_tablespace_directories(), the path used to access the tablespace directory is of the form "pg_tblspc/16235/PG_9.4_201309051", built by the following sprintf:
    sprintf(linkloc_with_version_dir, "pg_tblspc/%u/%s", tablespaceoid, TABLESPACE_VERSION_DIRECTORY);
When this path is accessed, the code assumes that the corresponding OS API will resolve it relative to the current working directory, which is correct per the specs. However, the OS API (FindFirstFile) hangs for a symlink if the path length is > 130, whereas if I use the full path instead of one starting with pg_tblspc, it works fine. So one way to resolve this issue is to use the full path when accessing the symbolic link, instead of relying on the OS to build the full path.
2. Resolve the symbolic link to the actual path in code, using pgreadlink, whenever we try to access it. It is already used in pg_basebackup.
3. Another way is to add a check in code (initdb and create tablespace) that disallows paths longer than 100 or 120 characters.
Kindly let me know your suggestions regarding the above approaches, or if you think there is a better way to address this problem. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
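Option 1 above amounts to building the path from an explicit absolute prefix instead of letting the OS resolve "pg_tblspc/..." against the current working directory. The following is a minimal C sketch of that idea, not a patch: build_abs_linkloc() is a hypothetical helper, the data directory string is assumed to be available to the caller (the backend normally runs with the data directory as its working directory), and the TABLESPACE_VERSION_DIRECTORY value is copied from the example path above rather than from the real headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed for illustration: value taken from the example path in this thread. */
#define TABLESPACE_VERSION_DIRECTORY "PG_9.4_201309051"

/*
 * Hypothetical helper: build "<datadir>/pg_tblspc/<oid>/<version dir>"
 * so the OS never has to resolve a relative "pg_tblspc/..." path against
 * the current working directory.  Returns 0 on success, -1 if the result
 * does not fit in buf.
 */
static int
build_abs_linkloc(char *buf, size_t buflen, const char *datadir,
                  unsigned int tablespaceoid)
{
    int         n;

    assert(buf != NULL && datadir != NULL);
    n = snprintf(buf, buflen, "%s/pg_tblspc/%u/%s",
                 datadir, tablespaceoid, TABLESPACE_VERSION_DIRECTORY);

    /* snprintf reports the length it needed; >= buflen means truncation */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

A call site such as destroy_tablespace_directories() would then pass this buffer where it currently passes the relative linkloc_with_version_dir. snprintf() is used instead of sprintf() so that an over-long path is reported rather than overflowing the buffer.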
On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On further analysis, I found that hang occurs in some of Windows > API(FindFirstFile, RemoveDirectroy) when symlink path > (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these > API's. For above testcase, it will hang in path > destroy_tablespace_directories->ReadDir->readdir->FindFirstFile Well, that sucks. So it's a Windows bug. > Some of the ways to resolve the problem are described as below: > > 1. I found that if the link path is accessed as a full path during > readdir or stat, it works fine. > > For example in function destroy_tablespace_directories(), the path > used to access tablespace directory is of form > "pg_tblspc/16235/PG_9.4_201309051" by using below sprintf > sprintf(linkloc_with_version_dir, > "pg_tblspc/%u/%s",tablespaceoid,TABLESPACE_VERSION_DIRECTORY); > Now when it tries to access this path it is assumed in code that > corresponding OS API will take care of considering this path w.r.t > current working directory, which is right as per specs, > however as it hangs in OS API (FindFirstFile) if path length > 130 for > symlink and if try to use full path instead of starting with > pg_tblspc, it works fine. > So one way to resolve this issue is to use full path for symbolic link > path access instead of relying on OS to use full path. I'm not sure how we'd implement this, except by doing #2. > 2. Resolve symbolic link to actual path in code whenever we tries to > access it using pgreadlink. It is already used in pg_basebackup. This seems reasonable. > 3. One another way is to check in code (initdb and create tablespace) > to not allow path of length more than 100 or 120 I don't think we could consider back-patching this, because it'd break installations that might be working fine now with longer pathnames. 
And I'd be a little reluctant to change the behavior in master, either, because it would create a dump-and-reload hazard, when users of older versions try to upgrade. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 14, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On further analysis, I found that hang occurs in some of Windows >> API(FindFirstFile, RemoveDirectroy) when symlink path >> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these >> API's. For above testcase, it will hang in path >> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile > > Well, that sucks. So it's a Windows bug. > >> Some of the ways to resolve the problem are described as below: >> >> 1. I found that if the link path is accessed as a full path during >> readdir or stat, it works fine. >> >> For example in function destroy_tablespace_directories(), the path >> used to access tablespace directory is of form >> "pg_tblspc/16235/PG_9.4_201309051" by using below sprintf >> sprintf(linkloc_with_version_dir, >> "pg_tblspc/%u/%s",tablespaceoid,TABLESPACE_VERSION_DIRECTORY); >> Now when it tries to access this path it is assumed in code that >> corresponding OS API will take care of considering this path w.r.t >> current working directory, which is right as per specs, >> however as it hangs in OS API (FindFirstFile) if path length > 130 for >> symlink and if try to use full path instead of starting with >> pg_tblspc, it works fine. >> So one way to resolve this issue is to use full path for symbolic link >> path access instead of relying on OS to use full path. > > I'm not sure how we'd implement this, except by doing #2. If we believe it's a Windows bug, perhaps a good start would be to report it to Microsoft? There might be an "official workaround" for it, or in fact, there might already exist a fix for it.. We're *probably* going to have to end up deploying a workaround, but it would be a good idea to check first if they have a suggestion for how... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Mon, Oct 14, 2013 at 8:40 PM, Magnus Hagander <magnus@hagander.net> wrote: > On Mon, Oct 14, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On further analysis, I found that hang occurs in some of Windows >>> API(FindFirstFile, RemoveDirectroy) when symlink path >>> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these >>> API's. For above testcase, it will hang in path >>> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile >> >> Well, that sucks. So it's a Windows bug. >> >>> Some of the ways to resolve the problem are described as below: >>> >>> 1. I found that if the link path is accessed as a full path during >>> readdir or stat, it works fine. >>> >>> For example in function destroy_tablespace_directories(), the path >>> used to access tablespace directory is of form >>> "pg_tblspc/16235/PG_9.4_201309051" by using below sprintf >>> sprintf(linkloc_with_version_dir, >>> "pg_tblspc/%u/%s",tablespaceoid,TABLESPACE_VERSION_DIRECTORY); >>> Now when it tries to access this path it is assumed in code that >>> corresponding OS API will take care of considering this path w.r.t >>> current working directory, which is right as per specs, >>> however as it hangs in OS API (FindFirstFile) if path length > 130 for >>> symlink and if try to use full path instead of starting with >>> pg_tblspc, it works fine. >>> So one way to resolve this issue is to use full path for symbolic link >>> path access instead of relying on OS to use full path. >> >> I'm not sure how we'd implement this, except by doing #2. > > If we believe it's a Windows bug, perhaps a good start would be to > report it to Microsoft? I had asked about it on Windows forums, but haven't received any answer from them so far.
The links where I posted this are below: http://answers.microsoft.com/en-us/windows/forum/windows_7-performance/stat-hangs-on-windows-7-when-used-for-symbolic/f7c4573e-be28-4bbf-ac9f-de990a3f5564 http://social.technet.microsoft.com/Forums/windows/en-US/73af1516-baaf-4d3d-914c-9b22c465e527/stat-hangs-on-windows-7-when-used-for-symbolic-link?forum=TechnetSandboxForum > There might be an "official workaround" for > it, or in fact, there might already exist a fix for it.. The only workaround I could find is to use an absolute path. One way to fix it is to call make_absolute_path() before using the path in functions like pgwin32_safestat(). The other way is to change the path to an absolute one wherever the code uses a path starting with "pg_tblspc/", but that is done in quite a few places, and changing all of them might make the code messy. > We're *probably* going to have to end up deploying a workaround, but > it would be a good idea to check first if they have a suggestion for > how... With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
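The wrapper variant mentioned above keeps the call sites unchanged and makes the stat wrapper absolutize relative paths itself. A rough C sketch follows; join_with_cwd() and stat_absolute() are hypothetical names standing in for a change to pgwin32_safestat() combined with make_absolute_path(), and the sketch does not attempt the path cleanup a real implementation would need. It assumes POSIX getcwd()/stat(), mapped to _getcwd() on Windows.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#ifdef _WIN32
#include <direct.h>
#define getcwd _getcwd
#else
#include <unistd.h>
#endif

#define SKETCH_MAXPATH 1024     /* assumed buffer size for this sketch */

/* Absolute if it starts with a separator or with a drive spec like "E:/". */
static int
path_is_absolute(const char *path)
{
    if (path[0] == '/' || path[0] == '\\')
        return 1;
    return ((path[0] >= 'A' && path[0] <= 'Z') ||
            (path[0] >= 'a' && path[0] <= 'z')) &&
        path[1] == ':' && (path[2] == '/' || path[2] == '\\');
}

/*
 * Hypothetical helper in the spirit of make_absolute_path(): prefix a
 * relative path with cwd; absolute paths are copied unchanged.
 * Returns 0 on success, -1 on overflow.
 */
static int
join_with_cwd(char *out, size_t outlen, const char *cwd, const char *path)
{
    int         n;

    assert(out != NULL && cwd != NULL && path != NULL);
    if (path_is_absolute(path))
        n = snprintf(out, outlen, "%s", path);
    else
        n = snprintf(out, outlen, "%s/%s", cwd, path);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}

/* Hypothetical stat() wrapper: never hands a relative path to the OS. */
static int
stat_absolute(const char *path, struct stat *st)
{
    char        cwd[SKETCH_MAXPATH];
    char        abspath[SKETCH_MAXPATH];

    if (getcwd(cwd, sizeof(cwd)) == NULL)
        return -1;
    if (join_with_cwd(abspath, sizeof(abspath), cwd, path) != 0)
        return -1;
    return stat(abspath, st);
}
```

The trade-off is the one described in the mail: one wrapper change covers every caller, at the cost of a getcwd() call on each stat.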
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On further analysis, I found that hang occurs in some of Windows >> API(FindFirstFile, RemoveDirectroy) when symlink path >> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these >> API's. For above testcase, it will hang in path >> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile > Well, that sucks. So it's a Windows bug. It's not clear to me that we should do anything about this at all, except perhaps document that people should avoid long tablespace path names on an unknown set of Windows versions. We should not be in the business of working around any and every bug coming out of Redmond. regards, tom lane
On Mon, Oct 14, 2013 at 11:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On further analysis, I found that hang occurs in some of Windows >>> API(FindFirstFile, RemoveDirectroy) when symlink path >>> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these >>> API's. For above testcase, it will hang in path >>> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile > >> Well, that sucks. So it's a Windows bug. > > It's not clear to me that we should do anything about this at all, > except perhaps document that people should avoid long tablespace > path names on an unknown set of Windows versions. There are a few more, relatively minor, issues with long paths on Windows. For example, in CreateTableSpace(), the check below is meant to prevent creating a tablespace on a path that is too long:
    if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 + OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS > MAXPGPATH)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                 errmsg("tablespace location \"%s\" is too long", location)));
MAXPGPATH is defined to be 1024, whereas the Windows APIs used in PG have a limit of 260 (MAX_PATH), so the error comes directly from the API calls rather than from the above check. So one change I am considering is to define MAXPGPATH separately for Windows. > We should not > be in the business of working around any and every bug coming out > of Redmond. This bug leads to an uninterruptible hang (I am not able to kill the process via Task Manager or any other way), and the corresponding backend starts consuming ~100% CPU, so the user doesn't have much option but to restart the machine. Any form of PG shutdown is also unsuccessful.
I had proposed fixing this issue because of its severity, but if you feel that we should put the onus of such usage on the user, then I can try to fix the other, relatively minor problems with the usage of long paths. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
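The CreateTableSpace() check quoted above can be parameterized on the limit, which is roughly what defining a separate MAXPGPATH for Windows would achieve. A sketch under assumed constants: OIDCHARS = 10 and the version-directory string are assumptions matching the example paths in this thread, 260 is the Win32 MAX_PATH value, and tablespace_location_too_long() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Assumed values for illustration, matching the examples in this thread. */
#define OIDCHARS 10                         /* digits in a printed OID */
#define TABLESPACE_VERSION_DIRECTORY "PG_9.4_201309051"
#define WIN32_MAX_PATH 260                  /* Win32 MAX_PATH */

/*
 * Same arithmetic as the CreateTableSpace() check, but with the limit
 * passed in, so a Windows build could use WIN32_MAX_PATH (or a lower
 * value) instead of MAXPGPATH.  Returns nonzero if the location is too long.
 */
static int
tablespace_location_too_long(const char *location, size_t limit)
{
    size_t      needed;

    assert(location != NULL);
    needed = strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
        OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS;
    return needed > limit;
}
```

With a 16-character version directory the appended suffix budgets 50 characters, so with a limit of 260 any location longer than 210 characters would be rejected. The hang itself was observed well below MAX_PATH (around 130 characters), so a limit chosen to avoid the hang would have to be lower still.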
On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Well, that sucks. So it's a Windows bug. > > It's not clear to me that we should do anything about this at all, > except perhaps document that people should avoid long tablespace > path names on an unknown set of Windows versions. We should not > be in the business of working around any and every bug coming out > of Redmond. It's sort of incomprehensible to me that Microsoft has a bug like this and apparently hasn't fixed it. But I think I still favor trying to work around it. When people try to use a long data directory name and it freezes the system, some of them will blame us rather than Microsoft. We've certainly gone to considerable lengths to work around extremely strange bugs in various compiler toolchains, even relatively obscure ones. I don't particularly see why we shouldn't do the same here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Well, that sucks. So it's a Windows bug. >> >> It's not clear to me that we should do anything about this at all, >> except perhaps document that people should avoid long tablespace >> path names on an unknown set of Windows versions. We should not >> be in the business of working around any and every bug coming out >> of Redmond. > > It's sort of incomprehensible to me that Microsoft has a bug like this > and apparently hasn't fixed it. But I think I still favor trying to > work around it. When people try to use a long data directory name and > it freezes the system, some of them will blame us rather than > Microsoft. We've certainly gone to considerable lengths to work > around extremely strange bugs in various compiler toolchains, even > relatively obscure ones. I don't particularly see why we shouldn't do > the same here. I agree we'll probably want to work around it in the end, but I still think it should be put to Microsoft PSS if we can. The usual - have we actually produced a self-contained example that does just this (and doesn't include the full postgres support) and submitted it to *microsoft* for comments? Not talking about their end user forums, but the actual Microsoft support services? (AFAIK at least EDB, and probably other pg companies as well, have agreements with MS that let you get access to their "real" support. I know I used to have it at my last job, and used it a number of times during the initial porting work. The people backing that one are generally pretty good.) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote: > On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Well, that sucks. So it's a Windows bug. >>> >>> It's not clear to me that we should do anything about this at all, >>> except perhaps document that people should avoid long tablespace >>> path names on an unknown set of Windows versions. We should not >>> be in the business of working around any and every bug coming out >>> of Redmond. >> >> It's sort of incomprehensible to me that Microsoft has a bug like this >> and apparently hasn't fixed it. But I think I still favor trying to >> work around it. When people try to use a long data directory name and >> it freezes the system, some of them will blame us rather than >> Microsoft. We've certainly gone to considerable lengths to work >> around extremely strange bugs in various compiler toolchains, even >> relatively obscure ones. I don't particularly see why we shouldn't do >> the same here. > > I agree we'll probably want to work around it in the end, but I still > think it should be put to Microsoft PSS if we can. The usual - have we > actually produced a self-contained example that does just this (and > doesn't include the full postgres support) and submitted it to > *microsoft* for comments? I have written a self-contained Win32 console application with which the issue can be reproduced. The application project is attached to this mail.
Here is a brief description of the project: it was created using MSVC 2010, but even if somebody doesn't have this version of VC, the functions in the file long_path.cpp can be copied and used in a new project. In the project settings, I have changed Character Set to "Use Multi-Byte Character Set", which is what Postgres uses.
It takes 3 parameters as input:
existingpath - the path for which the link will be created. This should be an already existing path, one level above the actual path. For example, if we want to create a link for the path "E:/PG_Patch/Long_Path/path_dir/version_dir", then this should be "E:/PG_Patch/Long_Path/path_dir".
newpath - the path where the link needs to be created. It should be a relative (non-absolute) path of the form "linked_path_dir/test_version".
curpath - the path to set as the current working directory; it should be the location to prepend to newpath.
Currently I have used these input parameters:
E:/PG_Patch/Long_Path/path_dir
linked_path_dir/test_version
E:/PG_Patch/Long_Path/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
The long path is much shorter than the 260-character limit on Windows; I have observed this problem with path lengths > 130 (approx.). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote: >> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>> Well, that sucks. So it's a Windows bug. >>>> >>>> It's not clear to me that we should do anything about this at all, >>>> except perhaps document that people should avoid long tablespace >>>> path names on an unknown set of Windows versions. We should not >>>> be in the business of working around any and every bug coming out >>>> of Redmond. >>> >>> It's sort of incomprehensible to me that Microsoft has a bug like this >>> and apparently hasn't fixed it. But I think I still favor trying to >>> work around it. When people try to use a long data directory name and >>> it freezes the system, some of them will blame us rather than >>> Microsoft. We've certainly gone to considerable lengths to work >>> around extremely strange bugs in various compiler toolchains, even >>> relatively obscure ones. I don't particularly see why we shouldn't do >>> the same here. >> >> I agree we'll probably want to work around it in the end, but I still >> think it should be put to Microsoft PSS if we can. The usual - have we >> actually produced a self-contained example that does just this (and >> doesn't include the full postgres support) and submitted it to >> *microsoft* for comments? > > I have written a self contained win32 console application with which > the issue can be reproduced. > The application project is attached with this mail. > > Here is brief description of the project: > This project is created using MSVC 2010, but even if somebody > doesn't have this version of VC, functions in file long_path.cpp can > be copied and > used in new project. > In project settings, I have changed Character Set to "Use Multi-Byte > Character Set" which is what Postgres uses. 
> > It takes 3 parameters as input: > existingpath - path for which link will be created. this path should > be an already > existing path with one level less than actual > path. For example, > if we want to create a link for path > "E:/PG_Patch/Long_Path/path_dir/version_dir", > then this should be "E:/PG_Patch/Long_Path/path_dir". > newpath - path where link needs to be created. it should be > non-absolute path > of format "linked_path_dir/test_version" > curpath - path to set as current working directory path, it > should be the > location to prepend to newpath > > Currently I have used input parameters as > E:/PG_Patch/Long_Path/path_dir > linked_path_dir/test_version > E:/PG_Patch/Long_Path/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa > > Long path is much less than 260 char limit on windows, I have > observed this problem with path length > 130 (approx.) And this reliably reproduces the hang? On which Windows version(s)? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 16, 2013 at 2:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote: >>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>>> Well, that sucks. So it's a Windows bug. >>>>> >>>>> It's not clear to me that we should do anything about this at all, >>>>> except perhaps document that people should avoid long tablespace >>>>> path names on an unknown set of Windows versions. We should not >>>>> be in the business of working around any and every bug coming out >>>>> of Redmond. >>>> >>>> It's sort of incomprehensible to me that Microsoft has a bug like this >>>> and apparently hasn't fixed it. But I think I still favor trying to >>>> work around it. When people try to use a long data directory name and >>>> it freezes the system, some of them will blame us rather than >>>> Microsoft. We've certainly gone to considerable lengths to work >>>> around extremely strange bugs in various compiler toolchains, even >>>> relatively obscure ones. I don't particularly see why we shouldn't do >>>> the same here. >>> >>> I agree we'll probably want to work around it in the end, but I still >>> think it should be put to Microsoft PSS if we can. The usual - have we >>> actually produced a self-contained example that does just this (and >>> doesn't include the full postgres support) and submitted it to >>> *microsoft* for comments? >> >> I have written a self contained win32 console application with which >> the issue can be reproduced. >> The application project is attached with this mail. >> >> Here is brief description of the project: >> This project is created using MSVC 2010, but even if somebody >> doesn't have this version of VC, functions in file long_path.cpp can >> be copied and >> used in new project. 
>> In project settings, I have changed Character Set to "Use Multi-Byte >> Character Set" which is what Postgres uses. >> >> It takes 3 parameters as input: >> existingpath - path for which link will be created. this path should >> be an already >> existing path with one level less than actual >> path. For example, >> if we want to create a link for path >> "E:/PG_Patch/Long_Path/path_dir/version_dir", >> then this should be "E:/PG_Patch/Long_Path/path_dir". >> newpath - path where link needs to be created. it should be >> non-absolute path >> of format "linked_path_dir/test_version" >> curpath - path to set as current working directory path, it >> should be the >> location to prepend to newpath >> >> Currently I have used input parameters as >> E:/PG_Patch/Long_Path/path_dir >> linked_path_dir/test_version >> E:/PG_Patch/Long_Path/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa >> >> Long path is much less than 260 char limit on windows, I have >> observed this problem with path length > 130 (approx.) > > And this reliably reproduces the hang? Yes, it produces the hang whenever the length of the 'curpath' parameter is greater than 130 (approx.). In the above example, I used a curpath of length 159. > On which Windows version(s)? I used Windows 7 64-bit to reproduce it. However, the original user reported this issue on Windows 2008 64-bit, so this application should hang on Windows 2008 64-bit as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote: >> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>> Well, that sucks. So it's a Windows bug. >> >> I agree we'll probably want to work around it in the end, but I still >> think it should be put to Microsoft PSS if we can. The usual - have we >> actually produced a self-contained example that does just this (and >> doesn't include the full postgres support) and submitted it to >> *microsoft* for comments? > > I have written a self contained win32 console application with which > the issue can be reproduced. > The application project is attached with this mail. I logged a support ticket with Microsoft; they could reproduce the issue with the sample application (the same one I posted to hackers in this thread) and are working on it. Progress on the ticket can be checked at the link below: https://support.microsoft.com/oas/default.aspx?st=1&as=1&iid=113&iguid=42d48223-e81d-4693-a7b2-2e70186f06b2_1_1&c=SMC&ln=en-in&incno=113102310885322 I could view the above link using my Microsoft account. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote: >>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>>> Well, that sucks. So it's a Windows bug. >>> >>> I agree we'll probably want to work around it in the end, but I still >>> think it should be put to Microsoft PSS if we can. The usual - have we >>> actually produced a self-contained example that does just this (and >>> doesn't include the full postgres support) and submitted it to >>> *microsoft* for comments? >> >> I have written a self contained win32 console application with which >> the issue can be reproduced. >> The application project is attached with this mail. > > Logged a support ticket with Microsoft, they could reproduce the issue > with the sample application (it is same what I had posted on hackers > in this thread) and working on it. Further update on this issue: Microsoft has suggested a workaround for the stat API. Their suggestion is to use 'GetFileAttributesEx' instead of stat; when I tried their suggestion, it gave me the same problem as stat. They still haven't said anything about the other APIs (rmdir, RemoveDirectory) that have the same problem. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 7, 2014 at 12:33:33PM +0530, Amit Kapila wrote: > On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote: > >>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>>>>> Well, that sucks. So it's a Windows bug. > >>> > >>> I agree we'll probably want to work around it in the end, but I still > >>> think it should be put to Microsoft PSS if we can. The usual - have we > >>> actually produced a self-contained example that does just this (and > >>> doesn't include the full postgres support) and submitted it to > >>> *microsoft* for comments? > >> > >> I have written a self contained win32 console application with which > >> the issue can be reproduced. > >> The application project is attached with this mail. > > > > Logged a support ticket with Microsoft, they could reproduce the issue > > with the sample application (it is same what I had posted on hackers > > in this thread) and working on it. > > Further update on this issue: > > Microsoft has suggested a workaround for stat API. Their suggestion > is to use 'GetFileAttributesEx' instead of stat, when I tried their > suggestion, it also gives me same problem as stat. > > Still they have not told anything about other API's > (rmdir, RemoveDirectory) which has same problem. Where are we on this? Is there a check we should add in our code? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Fri, Feb 14, 2014 at 8:27 AM, Bruce Momjian <bruce@momjian.us> wrote: > On Tue, Jan 7, 2014 at 12:33:33PM +0530, Amit Kapila wrote: >> On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote: >> >>> I agree we'll probably want to work around it in the end, but I still >> >>> think it should be put to Microsoft PSS if we can. The usual - have we >> >>> actually produced a self-contained example that does just this (and >> >>> doesn't include the full postgres support) and submitted it to >> >>> *microsoft* for comments? >> >> Further update on this issue: >> >> Microsoft has suggested a workaround for stat API. Their suggestion >> is to use 'GetFileAttributesEx' instead of stat, when I tried their >> suggestion, it also gives me same problem as stat. >> >> Still they have not told anything about other API's >> (rmdir, RemoveDirectory) which has same problem. > > Where are we on this? So far we haven't received any workaround from Microsoft that fixes this problem. From the discussion on the support ticket with them, it seems this problem is in their kernel and changing the code for it might not be straightforward for them, nor do they have any clear alternative. > Is there a check we should add in our code? We can possibly solve this problem in one of the following ways: 1. Resolve the symbolic link to the actual path in code whenever we try to access it. 2. Add a check in code (initdb and create tablespace) that disallows paths longer than ~120 characters on Windows. Approach-1 has the benefit that it can support the actual MAX_PATH, and even if MS doesn't resolve the problem, PostgreSQL will not be affected by it. Approach-2 is straightforward to implement. If we go with Approach-2, we could raise the path-length limit for Windows in the future whenever there is a fix for it.
With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
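Approach-1 (resolve the link before use) could look roughly like the following on a POSIX system, where readlink() plays the role that PostgreSQL's pgreadlink() plays for junctions on Windows. This is a sketch under that substitution: resolve_tablespace_dir() is a hypothetical helper that reads the target of a pg_tblspc entry and appends the version directory, so later calls see the real location rather than a path that traverses the link.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the target of a pg_tblspc entry such as
 * "pg_tblspc/16235" and append the version directory, so later calls
 * use the real location instead of a path that traverses the link.
 * Returns 0 on success, -1 on error, truncation, or overflow.
 */
static int
resolve_tablespace_dir(char *out, size_t outlen, const char *linkpath,
                       const char *versiondir)
{
    char        target[1024];
    ssize_t     len;
    int         n;

    assert(out != NULL && linkpath != NULL && versiondir != NULL);
    len = readlink(linkpath, target, sizeof(target) - 1);
    if (len < 0 || (size_t) len >= sizeof(target) - 1)
        return -1;              /* not a link, unreadable, or possibly truncated */
    target[len] = '\0';         /* readlink() does not NUL-terminate */

    n = snprintf(out, outlen, "%s/%s", target, versiondir);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

If the link target is itself relative, the result is relative to the directory containing the link, so a real implementation would have to handle that case as well.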
On 02/14/2014 10:57 AM, Bruce Momjian wrote: > On Tue, Jan 7, 2014 at 12:33:33PM +0530, Amit Kapila wrote: >> Further update on this issue: >> >> Microsoft has suggested a workaround for stat API. Their suggestion >> is to use 'GetFileAttributesEx' instead of stat, when I tried their >> suggestion, it also gives me same problem as stat. >> >> Still they have not told anything about other API's >> (rmdir, RemoveDirectory) which has same problem. > > Where are we on this? Is there a check we should add in our code? This is fascinating - I spent some time chasing the same symptoms in my Jenkins build slave, and eventually tracked it down to path lengths. gcc was just hanging uninterruptibly in a win32 syscall, and nothing short of a reboot would deal with it. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Feb 15, 2014 at 1:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Feb 14, 2014 at 8:27 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Tue, Jan 7, 2014 at 12:33:33PM +0530, Amit Kapila wrote:
> >> Still they have not told anything about other API's
> >> (rmdir, RemoveDirectory) which has same problem.
> >
> > Where are we on this?
>
> Till now we didn't received any workaround which can fix this problem
> from Microsoft. From the discussion over support ticket with them,
> it seems this problem is in their kernel and changing the code for
> it might not be straight forward for them, neither they have any clear
> alternative.
The reply from Microsoft is as follows:
"This is regarding a long pending case where stat() was failing on
hard links and causing an infinite loop. We have discussed this
multiple times internally and unfortunately do not have a commercially
viable solution to this issue. Currently there are no workarounds
available for this issue, but this has been marked for triage in future
OSes. Since we have out run the maximum time that can be spent
on this Professional Level Service request, I have been asked to
move ahead and mark this as a won’t fix. We would need to close
this case out as a won’t fix, and you would not be charged for this
incident."
>
> We can possibly solve this problem in one of the below ways:
>
> 1. Resolve symbolic link to actual path in code whenever we tries to
> access it.
>
> 2. Another way is to check in code (initdb and create tablespace)
> to not allow path of length more than ~120 for Windows.
>
> Approach-1 has benefit that it can support the actual MAX_PATH and
> even if MS doesn't resolve the problem, PostgreSQL will not face it.
>
> Approach-2 is straightforward to fix. If we want to go with Approach-2,
> then we might change the limit of MaxPath for windows in future
> whenever there is a fix for it.
From the reply above, it is clear that there is neither a workaround
nor a fix for this issue in Windows. I think we now need to decide
which solution we want to pursue for PostgreSQL. Does either of
the above approaches seem sensible? Otherwise, let me know if you
have any other idea to solve this problem.