Thread: Long paths for tablespace leads to uninterruptible hang in Windows

Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila

A PostgreSQL user has reported that if the tablespace path is long,
it leads to a hang, and the hang cannot be interrupted.

A simple test case to reproduce the hang:
a. initdb -D
E:\WorkSpace\PostgreSQL\master\RM30253_Data\aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\db
b. Create tablespace tbs location 'E:\WorkSpace\PostgreSQL\master\Data\idb';
c. Drop tablespace tbs;

In this test the path length used is 174, but I observed that the hang
occurs if the length is greater than 130 (approx.).

I have run this test on a few different Windows platforms (Windows XP
32-bit, Windows 7 64-bit). The hang occurs on Windows 7 64-bit. The user
reported it on Windows 2008 64-bit.

On further analysis, I found that the hang occurs in some Windows
APIs (FindFirstFile, RemoveDirectory) when the symlink path
(pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is passed to these
APIs. For the above test case, it hangs in the call path
destroy_tablespace_directories->ReadDir->readdir->FindFirstFile

I have tried using mklink /J (a utility in Windows 7 and above) to
create a junction point instead of the current approach in pgsymlink;
it still hangs in a similar way.

Some ways to resolve the problem are described below:

1. I found that if the link path is accessed as a full path during
readdir or stat, it works fine.

For example, in function destroy_tablespace_directories(), the path
used to access the tablespace directory is of the form
"pg_tblspc/16235/PG_9.4_201309051", built by the sprintf below:
    sprintf(linkloc_with_version_dir,
            "pg_tblspc/%u/%s", tablespaceoid, TABLESPACE_VERSION_DIRECTORY);
When the code accesses this path, it assumes that the corresponding OS
API will resolve it relative to the current working directory, which
is right as per the specs. However, the OS API (FindFirstFile) hangs if
the path length is > 130 for a symlink, whereas if we use the full path
instead of one starting with pg_tblspc, it works fine.
So one way to resolve this issue is to use the full path for symbolic
link access instead of relying on the OS to construct the full path.

2. Resolve the symbolic link to the actual path in code whenever we try
to access it, using pgreadlink. It is already used in pg_basebackup.

3. Another way is to add a check in code (initdb and create tablespace)
to disallow paths longer than 100 or 120 characters.
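
A minimal sketch of approach 1 above: build the tablespace link path as a
full path rooted at the data directory, rather than the relative
"pg_tblspc/..." form. This is an illustration only, not PostgreSQL code;
the helper name sketch_tablespace_abs_path, the SKETCH_MAXPATH constant
and the datadir parameter are hypothetical.

```c
#include <stdio.h>

#define SKETCH_MAXPATH 1024		/* hypothetical buffer size for callers */

/*
 * Build "<datadir>/pg_tblspc/<oid>/<version_dir>" into buf.
 * Returns 0 on success, or -1 if the result does not fit in buflen bytes.
 */
static int
sketch_tablespace_abs_path(char *buf, size_t buflen,
						   const char *datadir,
						   unsigned int tablespaceoid,
						   const char *version_dir)
{
	int			n;

	n = snprintf(buf, buflen, "%s/pg_tblspc/%u/%s",
				 datadir, tablespaceoid, version_dir);

	/* snprintf reports the length it wanted; reject truncated results */
	if (n < 0 || (size_t) n >= buflen)
		return -1;
	return 0;
}
```

A caller would pass the resulting buffer to readdir/stat in place of the
relative path. Whether every call site can cheaply obtain the data
directory is a separate question that this sketch does not address.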

Kindly let me know your suggestions regarding above approaches to
resolve the problem or if you think there can be any other better way
to address this problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Robert Haas

On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On further analysis, I found that hang occurs in some of Windows
> API(FindFirstFile, RemoveDirectroy) when symlink path
> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these
> API's. For above testcase, it will hang in path
> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile

Well, that sucks.  So it's a Windows bug.

> Some of the ways to resolve the problem are described as below:
>
> 1. I found that if the link path is accessed as a full path during
> readdir or stat, it works fine.
>
> For example in function destroy_tablespace_directories(), the path
> used to access tablespace directory is of form
> "pg_tblspc/16235/PG_9.4_201309051" by using below sprintf
> sprintf(linkloc_with_version_dir,
> "pg_tblspc/%u/%s",tablespaceoid,TABLESPACE_VERSION_DIRECTORY);
> Now when it tries to access this path it is assumed in code that
> corresponding OS API will take care of considering this path w.r.t
> current working directory, which is right as per specs,
> however as it hangs in OS API (FindFirstFile) if path length > 130 for
> symlink and if try to use full path instead of starting with
> pg_tblspc, it works fine.
> So one way to resolve this issue is to use full path for symbolic link
> path access instead of relying on OS to use full path.

I'm not sure how we'd implement this, except by doing #2.

> 2. Resolve symbolic link to actual path in code whenever we tries to
> access it using pgreadlink. It is already used in pg_basebackup.

This seems reasonable.

> 3. One another way is to check in code (initdb and create tablespace)
> to not allow path of length more than 100 or 120

I don't think we could consider back-patching this, because it'd break
installations that might be working fine now with longer pathnames.
And I'd be a little reluctant to change the behavior in master,
either, because it would create a dump-and-reload hazard, when users
of older versions try to upgrade.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Magnus Hagander

On Mon, Oct 14, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On further analysis, I found that hang occurs in some of Windows
>> API(FindFirstFile, RemoveDirectroy) when symlink path
>> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these
>> API's. For above testcase, it will hang in path
>> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile
>
> Well, that sucks.  So it's a Windows bug.
>
>> Some of the ways to resolve the problem are described as below:
>>
>> 1. I found that if the link path is accessed as a full path during
>> readdir or stat, it works fine.
>>
>> For example in function destroy_tablespace_directories(), the path
>> used to access tablespace directory is of form
>> "pg_tblspc/16235/PG_9.4_201309051" by using below sprintf
>> sprintf(linkloc_with_version_dir,
>> "pg_tblspc/%u/%s",tablespaceoid,TABLESPACE_VERSION_DIRECTORY);
>> Now when it tries to access this path it is assumed in code that
>> corresponding OS API will take care of considering this path w.r.t
>> current working directory, which is right as per specs,
>> however as it hangs in OS API (FindFirstFile) if path length > 130 for
>> symlink and if try to use full path instead of starting with
>> pg_tblspc, it works fine.
>> So one way to resolve this issue is to use full path for symbolic link
>> path access instead of relying on OS to use full path.
>
> I'm not sure how we'd implement this, except by doing #2.

If we believe it's a Windows bug, perhaps a good start would be to
report it to Microsoft? There might be an "official workaround" for
it, or in fact, there might already exist a fix for it..

We're *probably* going to have to end up deploying a workaround, but
it would be a good idea to check first if they have a suggestion for
how...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila

On Mon, Oct 14, 2013 at 8:40 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Oct 14, 2013 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On further analysis, I found that hang occurs in some of Windows
>>> API(FindFirstFile, RemoveDirectroy) when symlink path
>>> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these
>>> API's. For above testcase, it will hang in path
>>> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile
>>
>> Well, that sucks.  So it's a Windows bug.
>>
>>> Some of the ways to resolve the problem are described as below:
>>>
>>> 1. I found that if the link path is accessed as a full path during
>>> readdir or stat, it works fine.
>>>
>>> For example in function destroy_tablespace_directories(), the path
>>> used to access tablespace directory is of form
>>> "pg_tblspc/16235/PG_9.4_201309051" by using below sprintf
>>> sprintf(linkloc_with_version_dir,
>>> "pg_tblspc/%u/%s",tablespaceoid,TABLESPACE_VERSION_DIRECTORY);
>>> Now when it tries to access this path it is assumed in code that
>>> corresponding OS API will take care of considering this path w.r.t
>>> current working directory, which is right as per specs,
>>> however as it hangs in OS API (FindFirstFile) if path length > 130 for
>>> symlink and if try to use full path instead of starting with
>>> pg_tblspc, it works fine.
>>> So one way to resolve this issue is to use full path for symbolic link
>>> path access instead of relying on OS to use full path.
>>
>> I'm not sure how we'd implement this, except by doing #2.
>
> If we believe it's a Windows bug, perhaps a good start would be to
> report it to Microsoft?

I posted it on the Windows forums, but have not received any answer
so far. The links to my posts are below:

http://answers.microsoft.com/en-us/windows/forum/windows_7-performance/stat-hangs-on-windows-7-when-used-for-symbolic/f7c4573e-be28-4bbf-ac9f-de990a3f5564

http://social.technet.microsoft.com/Forums/windows/en-US/73af1516-baaf-4d3d-914c-9b22c465e527/stat-hangs-on-windows-7-when-used-for-symbolic-link?forum=TechnetSandboxForum

> There might be an "official workaround" for
> it, or in fact, there might already exist a fix for it..

The only workaround I could find is to use an absolute path. One way
to fix it is to call make_absolute_path() before using the path in
functions like pgwin32_safestat().

The other way is to change every place in the code that uses a path
starting with "pg_tblspc/" to use an absolute path, but it is used in
quite a few places and changing all of them might make the code messy.
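
One possible shape for the first workaround, as a sketch only: a stat()
wrapper that converts a relative path to an absolute one before calling
the OS. It does not use PostgreSQL's make_absolute_path() or
pgwin32_safestat(); the names sketch_is_absolute and sketch_safestat are
hypothetical, and whether this avoids the hang on affected Windows
versions is exactly what would need testing.

```c
#include <ctype.h>
#include <stdio.h>
#include <sys/stat.h>

#ifdef _WIN32
#include <direct.h>
#define sketch_getcwd _getcwd
#else
#include <unistd.h>
#define sketch_getcwd getcwd
#endif

#define SKETCH_MAXPATH 1024		/* hypothetical buffer size */

/* Return 1 if path is already absolute ("/x", "\x", "E:/x" or "E:\x"). */
static int
sketch_is_absolute(const char *path)
{
	if (path[0] == '/' || path[0] == '\\')
		return 1;
	if (isalpha((unsigned char) path[0]) && path[1] == ':' &&
		(path[2] == '/' || path[2] == '\\'))
		return 1;
	return 0;
}

/*
 * stat() wrapper: prepend the current working directory to relative
 * paths, so the OS never sees the relative form.
 * Returns stat()'s result, or -1 if the absolute path cannot be built.
 */
static int
sketch_safestat(const char *path, struct stat *buf)
{
	char		cwd[SKETCH_MAXPATH];
	char		abspath[SKETCH_MAXPATH];
	int			n;

	if (sketch_is_absolute(path))
		return stat(path, buf);

	if (sketch_getcwd(cwd, sizeof(cwd)) == NULL)
		return -1;

	n = snprintf(abspath, sizeof(abspath), "%s/%s", cwd, path);
	if (n < 0 || (size_t) n >= sizeof(abspath))
		return -1;

	return stat(abspath, buf);
}
```

Doing the conversion inside one wrapper keeps the "pg_tblspc/" call sites
unchanged, at the cost of an extra getcwd call on each lookup.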


> We're *probably* going to have to end up deploying a workaround, but
> it would be a good idea to check first if they have a suggestion for
> how...


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On further analysis, I found that hang occurs in some of Windows
>> API(FindFirstFile, RemoveDirectroy) when symlink path
>> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these
>> API's. For above testcase, it will hang in path
>> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile

> Well, that sucks.  So it's a Windows bug.

It's not clear to me that we should do anything about this at all,
except perhaps document that people should avoid long tablespace
path names on an unknown set of Windows versions.  We should not
be in the business of working around any and every bug coming out
of Redmond.
        regards, tom lane



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila

On Mon, Oct 14, 2013 at 11:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> On further analysis, I found that hang occurs in some of Windows
>>> API(FindFirstFile, RemoveDirectroy) when symlink path
>>> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these
>>> API's. For above testcase, it will hang in path
>>> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile
>
>> Well, that sucks.  So it's a Windows bug.
>
> It's not clear to me that we should do anything about this at all,
> except perhaps document that people should avoid long tablespace
> path names on an unknown set of Windows versions.

There are a few more, relatively minor, issues with long paths on Windows.
For example:
In function CreateTableSpace(), the check below is meant to prevent
creating a tablespace on an overly long path.

    if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
        OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS > MAXPGPATH)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                 errmsg("tablespace location \"%s\" is too long",
                        location)));


MAXPGPATH is defined to be 1024, whereas the Windows APIs used in PG
have a limit of 260, so the error comes directly from the API call
rather than from the above check.
So one change I am considering is to define MAXPGPATH separately for
Windows.
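
To illustrate the idea of a platform-specific limit, here is a rough
sketch of the same length arithmetic with the limit passed in, so that a
Windows build could use 260 (MAX_PATH) while other platforms keep 1024.
The function name and the SKETCH_OIDCHARS value of 10 are assumptions
for illustration, not taken from the PostgreSQL source quoted above.

```c
#include <string.h>

#define SKETCH_OIDCHARS 10		/* assumed max digits printed for an OID */

/*
 * Return 1 if a tablespace location, plus the version directory and
 * three OID-sized path components, would exceed the given limit.
 */
static int
sketch_location_too_long(const char *location,
						 const char *version_dir,
						 size_t limit)
{
	size_t		needed;

	needed = strlen(location) + 1 + strlen(version_dir) + 1 +
		SKETCH_OIDCHARS + 1 + SKETCH_OIDCHARS + 1 + SKETCH_OIDCHARS;

	return needed > limit;
}
```

With a 230-character location and the version directory
"PG_9.4_201309051", the total needed is 280, which passes a 1024 limit
but is rejected with a limit of 260.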

> We should not
> be in the business of working around any and every bug coming out
> of Redmond.

This bug leads to an uninterruptible hang (I am not able to kill the
process from Task Manager or in any other way), and the corresponding
backend starts consuming ~100% of CPU, so the user has little option
but to restart the machine. No form of PG shutdown succeeds either.
I had proposed fixing this issue because of its severity, but if you
feel that we should leave the onus of such usage on the user, then I
can try to fix the other, relatively minor, problems with long
paths.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Robert Haas

On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, that sucks.  So it's a Windows bug.
>
> It's not clear to me that we should do anything about this at all,
> except perhaps document that people should avoid long tablespace
> path names on an unknown set of Windows versions.  We should not
> be in the business of working around any and every bug coming out
> of Redmond.

It's sort of incomprehensible to me that Microsoft has a bug like this
and apparently hasn't fixed it.  But I think I still favor trying to
work around it.  When people try to use a long data directory name and
it freezes the system, some of them will blame us rather than
Microsoft.  We've certainly gone to considerable lengths to work
around extremely strange bugs in various compiler toolchains, even
relatively obscure ones.  I don't particularly see why we shouldn't do
the same here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Magnus Hagander

On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Well, that sucks.  So it's a Windows bug.
>>
>> It's not clear to me that we should do anything about this at all,
>> except perhaps document that people should avoid long tablespace
>> path names on an unknown set of Windows versions.  We should not
>> be in the business of working around any and every bug coming out
>> of Redmond.
>
> It's sort of incomprehensible to me that Microsoft has a bug like this
> and apparently hasn't fixed it.  But I think I still favor trying to
> work around it.  When people try to use a long data directory name and
> it freezes the system, some of them will blame us rather than
> Microsoft.  We've certainly gone to considerable lengths to work
> around extremely strange bugs in various compiler toolchains, even
> relatively obscure ones.  I don't particularly see why we shouldn't do
> the same here.

I agree we'll probably want to work around it in the end, but I still
think it should be put to Microsoft PSS if we can. The usual - have we
actually produced a self-contained example that does just this (and
doesn't include the full postgres support) and submitted it to
*microsoft* for comments? Not talking about their end user forums, but
the actual microsoft support services? (AFAIK at least EDB, and
probably other pg companies as well, have agreements with MS that lets
you get access to their "real" support. I know I used to have it at my
last job, and used it a number of times during the initial porting
work. The people backing that one are generally pretty good)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila

On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Well, that sucks.  So it's a Windows bug.
>>>
>>> It's not clear to me that we should do anything about this at all,
>>> except perhaps document that people should avoid long tablespace
>>> path names on an unknown set of Windows versions.  We should not
>>> be in the business of working around any and every bug coming out
>>> of Redmond.
>>
>> It's sort of incomprehensible to me that Microsoft has a bug like this
>> and apparently hasn't fixed it.  But I think I still favor trying to
>> work around it.  When people try to use a long data directory name and
>> it freezes the system, some of them will blame us rather than
>> Microsoft.  We've certainly gone to considerable lengths to work
>> around extremely strange bugs in various compiler toolchains, even
>> relatively obscure ones.  I don't particularly see why we shouldn't do
>> the same here.
>
> I agree we'll probably want to work around it in the end, but I still
> think it should be put to Microsoft PSS if we can. The usual - have we
> actually produced a self-contained example that does just this (and
> doesn't include the full postgres support) and submitted it to
> *microsoft* for comments?

  I have written a self-contained win32 console application with which
the issue can be reproduced.
  The application project is attached to this mail.

  Here is a brief description of the project:
  This project was created using MSVC 2010, but even if somebody
doesn't have this version of VC, the functions in file long_path.cpp
can be copied and used in a new project.
  In the project settings, I have changed Character Set to "Use
Multi-Byte Character Set", which is what Postgres uses.

  It takes 3 parameters as input:
  existingpath - path for which the link will be created. This should be
                 an already existing path one level above the actual
                 path. For example, if we want to create a link for path
                 "E:/PG_Patch/Long_Path/path_dir/version_dir", then this
                 should be "E:/PG_Patch/Long_Path/path_dir".
  newpath      - path where the link needs to be created. It should be a
                 non-absolute path of the format
                 "linked_path_dir/test_version".
  curpath      - path to set as the current working directory; it should
                 be the location to prepend to newpath.

 Currently I have used input parameters as
 E:/PG_Patch/Long_Path/path_dir
 linked_path_dir/test_version

E:/PG_Patch/Long_Path/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

 The long path is well under the 260-character limit on Windows; I have
observed this problem with path length > 130 (approx.).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Robert Haas

On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> Well, that sucks.  So it's a Windows bug.
>>>>
>>>> It's not clear to me that we should do anything about this at all,
>>>> except perhaps document that people should avoid long tablespace
>>>> path names on an unknown set of Windows versions.  We should not
>>>> be in the business of working around any and every bug coming out
>>>> of Redmond.
>>>
>>> It's sort of incomprehensible to me that Microsoft has a bug like this
>>> and apparently hasn't fixed it.  But I think I still favor trying to
>>> work around it.  When people try to use a long data directory name and
>>> it freezes the system, some of them will blame us rather than
>>> Microsoft.  We've certainly gone to considerable lengths to work
>>> around extremely strange bugs in various compiler toolchains, even
>>> relatively obscure ones.  I don't particularly see why we shouldn't do
>>> the same here.
>>
>> I agree we'll probably want to work around it in the end, but I still
>> think it should be put to Microsoft PSS if we can. The usual - have we
>> actually produced a self-contained example that does just this (and
>> doesn't include the full postgres support) and submitted it to
>> *microsoft* for comments?
>
>   I have written a self contained win32 console application with which
> the issue can be reproduced.
>   The application project is attached with this mail.
>
>   Here is brief description of the project:
>   This project is created using MSVC 2010, but even if somebody
> doesn't have this version of VC, functions in file long_path.cpp can
> be copied and
>   used in new project.
>   In project settings, I have changed Character Set to "Use Multi-Byte
> Character Set" which is what Postgres uses.
>
>   It takes 3 parameters as input:
>   existingpath - path for which link will be created. this path should
> be an already
>                       existing path with one level less than actual
> path. For example,
>                       if we want to create a link for path
> "E:/PG_Patch/Long_Path/path_dir/version_dir",
>               then this should be "E:/PG_Patch/Long_Path/path_dir".
>   newpath      - path where link needs to be created. it should be
> non-absolute path
>                       of format "linked_path_dir/test_version"
>   curpath       - path to set as current working directory path, it
> should be the
>                       location to prepend to newpath
>
>  Currently I have used input parameters as
>  E:/PG_Patch/Long_Path/path_dir
>  linked_path_dir/test_version
>
E:/PG_Patch/Long_Path/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>
>  Long path is much less than 260 char limit on windows, I have
> observed this problem with path length > 130 (approx.)

And this reliably reproduces the hang?  On which Windows version(s)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila

On Wed, Oct 16, 2013 at 2:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>> Well, that sucks.  So it's a Windows bug.
>>>>>
>>>>> It's not clear to me that we should do anything about this at all,
>>>>> except perhaps document that people should avoid long tablespace
>>>>> path names on an unknown set of Windows versions.  We should not
>>>>> be in the business of working around any and every bug coming out
>>>>> of Redmond.
>>>>
>>>> It's sort of incomprehensible to me that Microsoft has a bug like this
>>>> and apparently hasn't fixed it.  But I think I still favor trying to
>>>> work around it.  When people try to use a long data directory name and
>>>> it freezes the system, some of them will blame us rather than
>>>> Microsoft.  We've certainly gone to considerable lengths to work
>>>> around extremely strange bugs in various compiler toolchains, even
>>>> relatively obscure ones.  I don't particularly see why we shouldn't do
>>>> the same here.
>>>
>>> I agree we'll probably want to work around it in the end, but I still
>>> think it should be put to Microsoft PSS if we can. The usual - have we
>>> actually produced a self-contained example that does just this (and
>>> doesn't include the full postgres support) and submitted it to
>>> *microsoft* for comments?
>>
>>   I have written a self contained win32 console application with which
>> the issue can be reproduced.
>>   The application project is attached with this mail.
>>
>>   Here is brief description of the project:
>>   This project is created using MSVC 2010, but even if somebody
>> doesn't have this version of VC, functions in file long_path.cpp can
>> be copied and
>>   used in new project.
>>   In project settings, I have changed Character Set to "Use Multi-Byte
>> Character Set" which is what Postgres uses.
>>
>>   It takes 3 parameters as input:
>>   existingpath - path for which link will be created. this path should
>> be an already
>>                       existing path with one level less than actual
>> path. For example,
>>                       if we want to create a link for path
>> "E:/PG_Patch/Long_Path/path_dir/version_dir",
>>               then this should be "E:/PG_Patch/Long_Path/path_dir".
>>   newpath      - path where link needs to be created. it should be
>> non-absolute path
>>                       of format "linked_path_dir/test_version"
>>   curpath       - path to set as current working directory path, it
>> should be the
>>                       location to prepend to newpath
>>
>>  Currently I have used input parameters as
>>  E:/PG_Patch/Long_Path/path_dir
>>  linked_path_dir/test_version
>>
E:/PG_Patch/Long_Path/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>>
>>  Long path is much less than 260 char limit on windows, I have
>> observed this problem with path length > 130 (approx.)
>
> And this reliably reproduces the hang?
  Yes, it produces the hang whenever the length of the 'curpath' parameter
is greater than 130 (approx.). In the above example, I used a curpath of
length 159.

> On which Windows version(s)?
  I used Windows 7 64-bit to reproduce it. However, the original user
reported this issue on Windows 2008 64-bit, so this application
should hang on Windows 2008 64-bit as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila

On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> Well, that sucks.  So it's a Windows bug.
>>
>> I agree we'll probably want to work around it in the end, but I still
>> think it should be put to Microsoft PSS if we can. The usual - have we
>> actually produced a self-contained example that does just this (and
>> doesn't include the full postgres support) and submitted it to
>> *microsoft* for comments?
>
>   I have written a self contained win32 console application with which
> the issue can be reproduced.
>   The application project is attached with this mail.

I logged a support ticket with Microsoft. They could reproduce the issue
with the sample application (the same one I posted on hackers
in this thread) and are working on it.
Progress on the ticket can be checked at the link below:

https://support.microsoft.com/oas/default.aspx?st=1&as=1&iid=113&iguid=42d48223-e81d-4693-a7b2-2e70186f06b2_1_1&c=SMC&ln=en-in&incno=113102310885322

I could view the above link using my Microsoft account.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila

On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>> Well, that sucks.  So it's a Windows bug.
>>>
>>> I agree we'll probably want to work around it in the end, but I still
>>> think it should be put to Microsoft PSS if we can. The usual - have we
>>> actually produced a self-contained example that does just this (and
>>> doesn't include the full postgres support) and submitted it to
>>> *microsoft* for comments?
>>
>>   I have written a self contained win32 console application with which
>> the issue can be reproduced.
>>   The application project is attached with this mail.
>
> Logged a support ticket with Microsoft, they could reproduce the issue
> with the sample application (it is same what I had posted on hackers
> in this thread) and working on it.

Further update on this issue:

Microsoft has suggested a workaround for the stat API. Their suggestion
is to use 'GetFileAttributesEx' instead of stat, but when I tried it,
it gave me the same problem as stat.

They still have not said anything about the other APIs
(rmdir, RemoveDirectory), which have the same problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Bruce Momjian

On Tue, Jan  7, 2014 at 12:33:33PM +0530, Amit Kapila wrote:
> On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
> >>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>>>> Well, that sucks.  So it's a Windows bug.
> >>>
> >>> I agree we'll probably want to work around it in the end, but I still
> >>> think it should be put to Microsoft PSS if we can. The usual - have we
> >>> actually produced a self-contained example that does just this (and
> >>> doesn't include the full postgres support) and submitted it to
> >>> *microsoft* for comments?
> >>
> >>   I have written a self contained win32 console application with which
> >> the issue can be reproduced.
> >>   The application project is attached with this mail.
> >
> > Logged a support ticket with Microsoft, they could reproduce the issue
> > with the sample application (it is same what I had posted on hackers
> > in this thread) and working on it.
> 
> Further update on this issue:
> 
> Microsoft has suggested a workaround for stat API. Their suggestion
> is to use 'GetFileAttributesEx' instead of stat, when I tried their
> suggestion, it also gives me same problem as stat.
> 
> Still they have not told anything about other API's
> (rmdir, RemoveDirectory) which has same problem.

Where are we on this?  Is there a check we should add in our code?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From: Amit Kapila

On Fri, Feb 14, 2014 at 8:27 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Jan  7, 2014 at 12:33:33PM +0530, Amit Kapila wrote:
>> On Thu, Oct 31, 2013 at 8:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > On Wed, Oct 16, 2013 at 1:44 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> >>> I agree we'll probably want to work around it in the end, but I still
>> >>> think it should be put to Microsoft PSS if we can. The usual - have we
>> >>> actually produced a self-contained example that does just this (and
>> >>> doesn't include the full postgres support) and submitted it to
>> >>> *microsoft* for comments?
>>
>> Further update on this issue:
>>
>> Microsoft has suggested a workaround for stat API. Their suggestion
>> is to use 'GetFileAttributesEx' instead of stat, when I tried their
>> suggestion, it also gives me same problem as stat.
>>
>> Still they have not told anything about other API's
>> (rmdir, RemoveDirectory) which has same problem.
>
> Where are we on this?

So far we have not received any workaround from Microsoft that fixes
this problem. From the discussion on the support ticket with them,
it seems the problem is in their kernel; changing that code might not
be straightforward for them, and they do not have any clear
alternative either.

> Is there a check we should add in our code?

We can possibly solve this problem in one of the below ways:

1. Resolve the symbolic link to the actual path in code whenever we try
to access it.

2. Another way is to add a check in code (initdb and create tablespace)
that disallows paths longer than ~120 characters on Windows.

Approach-1 has the benefit that it can support the actual MAX_PATH, and
even if MS doesn't resolve the problem, PostgreSQL will not face it.

Approach-2 is straightforward to implement. If we want to go with
Approach-2, then we might change the limit of MaxPath for Windows in
the future, whenever there is a fix for it.
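
A minimal sketch of what the check in Approach-2 could look like. This is
only an illustration, not a patch: the function name and the macro are
hypothetical, and the limit of 120 is just the approximate threshold
observed in this thread, not a documented Windows limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical limit: the hang was observed for paths longer than
 * roughly 120-130 characters, so this sketch uses 120 to stay on the
 * safe side.
 */
#define WIN32_SAFE_TABLESPACE_PATH_LEN 120

/*
 * Returns true if the given data directory or tablespace location is
 * short enough to be accepted, false if it should be rejected.
 */
bool
tablespace_path_length_ok(const char *location)
{
    assert(location != NULL);
    return strlen(location) <= WIN32_SAFE_TABLESPACE_PATH_LEN;
}
```

initdb and CREATE TABLESPACE would call such a check only in Windows
builds and report an error when it returns false, instead of going on
to create the link.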


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From
Craig Ringer
Date:
On 02/14/2014 10:57 AM, Bruce Momjian wrote:
> On Tue, Jan  7, 2014 at 12:33:33PM +0530, Amit Kapila wrote:

>> Further update on this issue:
>>
>> Microsoft has suggested a workaround for stat API. Their suggestion
>> is to use 'GetFileAttributesEx' instead of stat, when I tried their
>> suggestion, it also gives me same problem as stat.
>>
>> Still they have not told anything about other API's
>> (rmdir, RemoveDirectory) which has same problem.
> 
> Where are we on this?  Is there a check we should add in our code?

This is fascinating - I spent some time chasing the same symptoms in my
Jenkins build slave, and eventually tracked it down to path lengths. gcc
was just hanging uninterruptibly in a win32 syscall, and nothing short
of a reboot would deal with it.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Long paths for tablespace leads to uninterruptible hang in Windows

From
Amit Kapila
Date:
On Sat, Feb 15, 2014 at 1:26 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Feb 14, 2014 at 8:27 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Tue, Jan  7, 2014 at 12:33:33PM +0530, Amit Kapila wrote:

> >> Still they have not told anything about other API's
> >> (rmdir, RemoveDirectory) which has same problem.
> >
> > Where are we on this?
>
> Till now we didn't received any workaround which can fix this problem
> from Microsoft. From the discussion over support ticket with them,
> it seems this problem is in their kernel and changing the code for
> it might not be straight forward for them, neither they have any clear
> alternative.

The reply from Microsoft is as below.

"This is regarding a long pending case where stat() was failing on
hard links and causing an infinite loop. We have discussed this 
multiple times internally and unfortunately do not have a commercially
viable solution to this issue. Currently there are no workarounds
available for this issue, but this has been marked for triage in future
OSes. Since we have out run the maximum time that can be spent
on this Professional Level Service request, I have been asked to
move ahead and mark this as a won’t fix. We would need to close
this case out as a won’t fix, and you would not be charged for this
incident."

> > Is there a check we should add in our code?
>
> We can possibly solve this problem in one of the below ways:
>
> 1. Resolve symbolic link to actual path in code whenever we tries to
> access it.
>
> 2. Another way is to check in code (initdb and create tablespace)
> to not allow path of length more than ~120 for Windows.
>
> Approach-1 has benefit that it can support the actual MAX_PATH and
> even if MS doesn't resolve the problem, PostgreSQL will not face it.
>
> Approach-2 is straightforward to fix. If we want to go with Approach-2,
> then we might change the limit of MaxPath for windows in future
> whenever there is a fix for it.

From the reply above, it is clear that there is neither a workaround
nor a fix for this issue in Windows.  I think we now need to decide
which solution we want to pursue for PostgreSQL.  Does either of the
above approaches seem sensible?  Let me know if you have any other
idea to solve this problem.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com