Thread: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

Hello,

I've found and fixed a bug that causes recovery (crash recovery, PITR) to
fail.  Please find attached the patch against HEAD.


[Bug]
To reproduce the problem, do the following on Windows:

1. pg_ctl start
2. CREATE TABLESPACE tbs LOCATION 'some_tblspc_path';
3. pg_ctl stop -mi
4. pg_ctl start

The database server fails to start, leaving the below messages:

LOG:  database system was interrupted; last known up at 2013-10-31 20:24:07
JST
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 0/1788938
FATAL:  could not remove symbolic link "pg_tblspc/16385": Permission denied
CONTEXT:  xlog redo create tablespace: 16385 "d:/tbs"
LOG:  startup process (PID 2724) exited with exit code 1
LOG:  aborting startup due to startup process failure


[Cause]
In redo, create_tablespace_directories() tries to remove the symbolic link
for the tablespace using unlink().  However, unlink() on Windows fails with
errno=13 (Permission denied).  This is because junction points are
directories on Windows.


[Fix]
Follow destroy_tablespace_directories() and use rmdir() to remove the
junction point.


I've tested the patch.  Could you review it and commit?  I wish it to be
backported to all major releases.


Regards
MauMau

Attachment
On 10/31/2013 08:40 AM, MauMau wrote:
> Hello,
>
> I've found and fixed a bug that causes recovery (crash recovery, PITR)
> to fail. Please find attached the patch against HEAD.
>
>
> [Bug]
> To reproduce the problem, do the following on Windows:
>
> 1. pg_ctl start
> 2. CREATE TABLESPACE tbs LOCATION 'some_tblspc_path';
> 3. pg_ctl stop -mi
> 4. pg_ctl start
>
> The database server fails to start, leaving the below messages:
>
> LOG: database system was interrupted; last known up at 2013-10-31
> 20:24:07 JST
> LOG: database system was not properly shut down; automatic recovery in
> progress
> LOG: redo starts at 0/1788938
> FATAL: could not remove symbolic link "pg_tblspc/16385": Permission
> denied
> CONTEXT: xlog redo create tablespace: 16385 "d:/tbs"
> LOG: startup process (PID 2724) exited with exit code 1
> LOG: aborting startup due to startup process failure
>
>
> [Cause]
> In redo, create_tablespace_directories() tries to remove the symbolic
> link for the tablespace using unlink(). However, unlink() on Windows
> fails with errno=13 (Permission denied). This is because junction
> points are directories on Windows.
>
>
> [Fix]
> Follow destroy_tablespace_directories() and use rmdir() to remove the
> junction point.
>
>
> I've tested the patch. Could you review it and commit? I wish it to be
> backported to all major releases.
>
>
> Regards
> MauMau
>


Why are you making this a runtime check instead of a compile time check?

cheers

andrew




From: "Andrew Dunstan" <andrew@dunslane.net>
> Why are you making this a runtime check instead of a compile time check?

I thought the same situation could happen as in
destroy_tablespace_directories(), but it doesn't seem to apply on second
thought.  Revised patch attached, which is very simple based on compile time
check.

Regards
MauMau


Attachment
From: "MauMau" <maumau307@gmail.com>
> I thought the same situation could happen as in
> destroy_tablespace_directories(), but it doesn't seem to apply on second
> thought.  Revised patch attached, which is very simple based on compile
> time
> check.

Sorry, I was careless to leave an old comment.  Please use this version.

Regards
MauMau

Attachment
I did worked on testing the patch on Windows, test scenario mentioned above thread is reproducible and the provided patch resolve the issue. In case of junction or directory unlink function (deprecated) returns -1 with errno “EACCES” (i.e. Permission denied). As you have followed destroy_tablespace_directories() function, Is there any specific reason not to use same logic to detect type of the file/link i.e. “(lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))”, It also seems have more appropriate error message i.e.

src/backend/commands/tablespace.c
….
    745         /*
    746          * Try to remove the symlink.  We must however deal with the possibility
    747          * that it's a directory instead of a symlink --- this could happen during
    748          * WAL replay (see TablespaceCreateDbspace), and it is also the case on
    749          * Windows where junction points lstat() as directories.
    750          *
    751          * Note: in the redo case, we'll return true if this final step fails;
    752          * there's no point in retrying it.  Also, ENOENT should provoke no more
    753          * than a warning.
    754          */
    755 remove_symlink:
    756         linkloc = pstrdup(linkloc_with_version_dir);
    757         get_parent_directory(linkloc);
    758         if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
    759         {
    760                 if (rmdir(linkloc) < 0)
    761                         ereport(redo ? LOG : ERROR,
    762                                         (errcode_for_file_access(),
    763                                          errmsg("could not remove directory \"%s\": %m",
    764                                                         linkloc)));
    765         }
    766         else
    767         {
    768                 if (unlink(linkloc) < 0)
    769                         ereport(redo ? LOG : (errno == ENOENT ? WARNING : ERROR),
    770                                         (errcode_for_file_access(),
    771                                          errmsg("could not remove symbolic link \"%s\": %m",
    772                                                         linkloc)));
    773         }
….

Other than this patch looks good to me.

Regards,
Muhammad Asif Naeem



On Thu, Oct 31, 2013 at 8:03 PM, MauMau <maumau307@gmail.com> wrote:
From: "MauMau" <maumau307@gmail.com>

I thought the same situation could happen as in
destroy_tablespace_directories(), but it doesn't seem to apply on second
thought.  Revised patch attached, which is very simple based on compile time
check.

Sorry, I was careless to leave an old comment.  Please use this version.

Regards
MauMau


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


From: "Asif Naeem" <anaeem.it@gmail.com>
> As you have
followed destroy_tablespace_directories() function, Is there any specific
reason not to use same logic to detect type of the file/link i.e.
“(lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))”, It also seems have
more appropriate error message i.e.

Thanks for reviewing and testing the patch.  Yes, at first I did what you 
mentioned, but modified the patch according to some advice in the mail 
thread.  During redo, create_tablespace_directories() needs to handle the 
case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory even 
on UNIX/Linux.  Please see TablespaceCreateDbspace is(). 
destroy_tablespace_directories() doesn't have to handle such situation.

Regards
MauMau




Hi MauMau,

Ah. Sorry, I missed that part. As NTFS junctions and symbolic links are different (although they behave similarly), there seems only a minor inconvenience related to misleading error message i.e.

+ #ifdef WIN32
if (rmdir(linkloc) < 0 && errno != ENOENT)
+ #else
   if (unlink(linkloc) < 0 && errno != ENOENT)
+ #endif
   ereport(ERROR,
   (errcode_for_file_access(),
    errmsg("could not remove symbolic link \"%s\": %m",

What is your opinion about it, Is it not worth changing ? . Thanks.



On Wed, Jan 15, 2014 at 7:42 PM, MauMau <maumau307@gmail.com> wrote:
From: "Asif Naeem" <anaeem.it@gmail.com>

As you have
followed destroy_tablespace_directories() function, Is there any specific
reason not to use same logic to detect type of the file/link i.e.
“(lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))”, It also seems have
more appropriate error message i.e.

Thanks for reviewing and testing the patch.  Yes, at first I did what you mentioned, but modified the patch according to some advice in the mail thread.  During redo, create_tablespace_directories() needs to handle the case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory even on UNIX/Linux.  Please see TablespaceCreateDbspace is(). destroy_tablespace_directories() doesn't have to handle such situation.

Regards
MauMau


I'm sorry, I'm replying to an older mail, because I lost your latest mail by
mistake.

> Ah. Sorry, I missed that part. As NTFS junctions and symbolic links are
> different (although they behave similarly), there seems only a minor
> inconvenience related to misleading error message i.e.

You are right.  Fixed.

Regards
MauMau

Attachment
On Wed, Jan 15, 2014 at 8:12 PM, MauMau <maumau307@gmail.com> wrote:
> From: "Asif Naeem" <anaeem.it@gmail.com>
>
>> As you have
>
> followed destroy_tablespace_directories() function, Is there any specific
> reason not to use same logic to detect type of the file/link i.e.
> "(lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))", It also seems have
> more appropriate error message i.e.
>
> Thanks for reviewing and testing the patch.  Yes, at first I did what you
> mentioned, but modified the patch according to some advice in the mail
> thread.  During redo, create_tablespace_directories() needs to handle the
> case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory even
> on UNIX/Linux.  Please see TablespaceCreateDbspace is().
> destroy_tablespace_directories() doesn't have to handle such situation.

If create_tablespace_directories() needs to handle with directory both on
Windows/Linux, then shouldn't it be a runtime check as in your first
version rather than compile time check?
Also isn't that the reason why destroy_tablespace_directories() have similar
check?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



From: "Amit Kapila" <amit.kapila16@gmail.com>
> On Wed, Jan 15, 2014 at 8:12 PM, MauMau <maumau307@gmail.com> wrote:
>> Thanks for reviewing and testing the patch.  Yes, at first I did what you
>> mentioned, but modified the patch according to some advice in the mail
>> thread.  During redo, create_tablespace_directories() needs to handle the
>> case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory 
>> even
>> on UNIX/Linux.  Please see TablespaceCreateDbspace is().
>> destroy_tablespace_directories() doesn't have to handle such situation.
>
> If create_tablespace_directories() needs to handle with directory both on
> Windows/Linux, then shouldn't it be a runtime check as in your first
> version rather than compile time check?
> Also isn't that the reason why destroy_tablespace_directories() have 
> similar
> check?

I see..., and you are correct.  The first version of my patch should be the 
right fix.  It seems that my head went somewhere when I submitted the second 
revision.

What should I do?  Should I re-submit the first revision as the latest fifth 
revision and link the email from the CommitFest newest entry?

Regards
MauMau







On Fri, Mar 21, 2014 at 12:24 PM, MauMau <maumau307@gmail.com> wrote:
> From: "Amit Kapila" <amit.kapila16@gmail.com>
>> If create_tablespace_directories() needs to handle with directory both on
>> Windows/Linux, then shouldn't it be a runtime check as in your first
>> version rather than compile time check?
>> Also isn't that the reason why destroy_tablespace_directories() have
>> similar
>> check?
>
>
> I see..., and you are correct.  The first version of my patch should be the
> right fix.  It seems that my head went somewhere when I submitted the second
> revision.
>
> What should I do?  Should I re-submit the first revision as the latest fifth
> revision and link the email from the CommitFest newest entry?

The comments in your first version needs to be improved, as there
you just mentioned a Windows specific comment:
+ /* On Windows, lstat()

I think you can change comments (make it somewhat similar to
destroy_tablespace_directories) and then submit it as a new version.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



From: "Amit Kapila" <amit.kapila16@gmail.com>
> The comments in your first version needs to be improved, as there
> you just mentioned a Windows specific comment:
> + /* On Windows, lstat()
>
> I think you can change comments (make it somewhat similar to
> destroy_tablespace_directories) and then submit it as a new version.

OK, done.  Please find the attached patch.  I also rebased the patch to
HEAD.

I'll update the CommitFest entry soon.
Regards
MauMau


Attachment
On Fri, Mar 21, 2014 at 12:24 PM, MauMau <maumau307@gmail.com> wrote:
> From: "Amit Kapila" <amit.kapila16@gmail.com>
>> On Wed, Jan 15, 2014 at 8:12 PM, MauMau <maumau307@gmail.com> wrote:
>>>
>>> Thanks for reviewing and testing the patch.  Yes, at first I did what you
>>> mentioned, but modified the patch according to some advice in the mail
>>> thread.  During redo, create_tablespace_directories() needs to handle the
>>> case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory
>>> even
>>> on UNIX/Linux.  Please see TablespaceCreateDbspace is().
>>> destroy_tablespace_directories() doesn't have to handle such situation.
>>
>>
>> If create_tablespace_directories() needs to handle with directory both on
>> Windows/Linux, then shouldn't it be a runtime check as in your first
>> version rather than compile time check?
>> Also isn't that the reason why destroy_tablespace_directories() have
>> similar
>> check?
>
>
> I see..., and you are correct.

I have thought about the above point again and it seems to me that
the above claim (create_tablespace_directories() needs to handle a
path which is not a symlink but directory) might not be true.
For Example, I could easily think of case where it is required for
destroy_tablespace_directories().

1. Assume a tablespace tbs already exists.
2. Create table t1(c1 int) tablespace tbs;
3. drop table t1;
4. Drop tablespace tbs;
5. Do immediate shutdown (pg_ctl stop -mi);
6. During recovery it will create a table in directory (in function   TablespaceCreateDbspace) which needs to be
removedby   destroy_tablespace_directories().
 

I am neither aware of, nor could think of such a case for
create_tablespace_directories(). Do you have any such case in mind
which I might be missing?

By saying above, I don't mean that your current patch has any
problem; even if there is no such scenario, I think your code is
right as stat/isdir check seems to be okay to identify junction
points and it avoids ifdef WIN32 check (which I personally think
is bit annoying and we should try to avoid such code unless it
is must or provides any significant advantage).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



From: "Amit Kapila" <amit.kapila16@gmail.com>
> 1. Assume a tablespace tbs already exists.
> 2. Create table t1(c1 int) tablespace tbs;
> 3. drop table t1;
> 4. Drop tablespace tbs;
> 5. Do immediate shutdown (pg_ctl stop -mi);
> 6. During recovery it will create a table in directory (in function
>    TablespaceCreateDbspace) which needs to be removed by
>    destroy_tablespace_directories().
>
> I am neither aware of, nor could think of such a case for
> create_tablespace_directories(). Do you have any such case in mind
> which I might be missing?

A bit contrived example is:

1. After the directory is created by TablespaceCreateDbspace(), recovery is 
stopped (e.g. due to power outage).  The directory remains.
2. Restart the server, redoing CREATE TABLESPACE during recovery, which 
executes create_tablespace_directories().

> By saying above, I don't mean that your current patch has any
> problem; even if there is no such scenario, I think your code is
> right as stat/isdir check seems to be okay to identify junction
> points and it avoids ifdef WIN32 check (which I personally think
> is bit annoying and we should try to avoid such code unless it
> is must or provides any significant advantage).

I think so, too.

Regards
MauMau




On Mon, Mar 24, 2014 at 7:49 PM, MauMau <maumau307@gmail.com> wrote:
> A bit contrived example is:
>
> 1. After the directory is created by TablespaceCreateDbspace(), recovery is
> stopped (e.g. due to power outage).  The directory remains.
> 2. Restart the server, redoing CREATE TABLESPACE during recovery, which
> executes create_tablespace_directories().

I don't understand how after step-1, step-2 can occur in recovery for same
tablespace path.

Function TablespaceCreateDbspace() would have called for CREATE TABLE.
Now Step-2 can only occur if there is a Drop Tablespace command in-between
step-1 and step-2, else CREATE TABLESPACE can't be successful during
command execution so will not get recorded in WAL.  Basically Create Table
cannot happen on a particular directory without having some
CREATE TABLESPACE before it, so in the above example taken by you,
there must be some Create TableSpace before step-1 or it's on default
tablespace location which means you cannot perform step-2 for same
tablespace path as step-1 without having DROP TABLESPACE in-between
step-1 and step-2.

If you think that above scenario is not possible, then you just need to
modify comment:
"!  * Remove old symlink in recovery...."

One more minor point about patch:
+ struct stat st;

if (InRecovery)
{
struct stat st;

Defining stat struct two times in same function in different ways doesn't
seem to be good, we can do the same way for new usage as is already
done in code or may be declare it once.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



From: "Amit Kapila" <amit.kapila16@gmail.com>
> If you think that above scenario is not possible, then you just need to
> modify comment:
> "!  * Remove old symlink in recovery...."

Hm, my scenario seems impossible.  I reverted the comment.


> One more minor point about patch:
> + struct stat st;
>
> if (InRecovery)
> {
> struct stat st;
>
> Defining stat struct two times in same function in different ways doesn't
> seem to be good, we can do the same way for new usage as is already
> done in code or may be declare it once.

OK, I removed the second (existing) definition of st.

Regards
MauMau

Attachment
On Tue, Mar 25, 2014 at 5:39 PM, MauMau <maumau307@gmail.com> wrote:
> OK, I removed the second (existing) definition of st.

This patch version looks fine, I have verified the issue, regression tests
passed.

I had attached the latest version of patch in CF app and marked it as
"Ready For Committer".


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On 2014-03-25 21:09:13 +0900, MauMau wrote:
> !     /*
> !      * Remove old symlink in recovery, in case it points to the wrong place.
> !      * On Windows, lstat() reports junction points as directories.
> !      */
>       if (InRecovery)
>       {
> !         if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
> !         {
> !             if (rmdir(linkloc) < 0)
> !                 ereport(ERROR,
> !                         (errcode_for_file_access(),
> !                          errmsg("could not remove directory \"%s\": %m",
> !                                 linkloc)));
> !         }
> !         else
> !         {
> !             if (unlink(linkloc) < 0 && errno != ENOENT)
> !                 ereport(ERROR,
> !                         (errcode_for_file_access(),
> !                          errmsg("could not remove symbolic link \"%s\": %m",
> !                                 linkloc)));
> !         }
>       }

if (..)   ...
else
{   if (...)      ...
}

is pretty odd code layout.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



"MauMau" <maumau307@gmail.com> writes:
> [ remove_tblspc_symlink_v6.patch ]

Committed, thanks.
        regards, tom lane