Thread: Possible race condition in pg_mkdir_p()?

Possible race condition in pg_mkdir_p()?

From
Ning Yu
Date:
Hi Hackers,

We found a race condition in pg_mkdir_p(), here is a simple reproducer:

    #!/bin/sh
   
    basedir=pgdata
    datadir=$basedir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
    logdir=$basedir/logs
    n=2
   
    rm -rf $basedir
    mkdir -p $logdir
   
    # init databases concurrently, they will all try to create the parent dirs
    for i in `seq $n`; do
          initdb -D $datadir/$i >$logdir/$i.log 2>&1 &
    done
   
    wait
   
    # there is a chance one of the initdb commands failed to create the datadir
    grep 'could not create directory' $logdir/*

The logic in pg_mkdir_p() is as below:

        /* check for pre-existing directory */
        if (stat(path, &sb) == 0)
        {
            if (!S_ISDIR(sb.st_mode))
            {
                if (last)
                    errno = EEXIST;
                else
                    errno = ENOTDIR;
                retval = -1;
                break;
            }
        }
        else if (mkdir(path, last ? omode : S_IRWXU | S_IRWXG | S_IRWXO) < 0)
        {
            retval = -1;
            break;
        }

This seems buggy as it first checks the existence of the dir and makes the dir if it does not exist yet, however when executing concurrently a possible race condition can be as below:

A: does a/ exists? no
B: does a/ exists? no
A: try to create a/, succeed
B: try to create a/, failed as it already exists

To prevent the race condition we could mkdir() directly, if it returns -1 and errno is EEXIST then check whether it's really a dir with stat().  In fact this is what is done in the `mkdir -p` command: https://github.com/coreutils/gnulib/blob/b5a9fa677847081c9b4f26908272f122b15df8b9/lib/mkdir-p.c#L130-L164

By the way, some callers of pg_mkdir_p() checks for EEXIST explicitly, such as in pg_basebackup.c:

        if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno != EEXIST)
        {
            pg_log_error("could not create directory \"%s\": %m", statusdir);
            exit(1);
        }

This is still wrong with current code logic, because when the statusdir is a file the errno is also EEXIST, but it can pass the check here.  Even if we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is still wrong.

Best Regards
Ning

Re: Possible race condition in pg_mkdir_p()?

From
Michael Paquier
Date:
On Thu, Jul 18, 2019 at 04:17:22PM +0800, Ning Yu wrote:
> This is still wrong with current code logic, because when the statusdir is
> a file the errno is also EEXIST, but it can pass the check here.  Even if
> we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is
> still wrong.

Would you like to send a patch?
--
Michael

Attachment

Re: Possible race condition in pg_mkdir_p()?

From
Paul Guo
Date:


On Thu, Jul 18, 2019 at 4:57 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jul 18, 2019 at 04:17:22PM +0800, Ning Yu wrote:
> This is still wrong with current code logic, because when the statusdir is
> a file the errno is also EEXIST, but it can pass the check here.  Even if
> we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is
> still wrong.

Would you like to send a patch?

Michael, we'll send out the patch later. Checked code, it seems that there is another related mkdir() issue.

MakePGDirectory() is actually a syscall mkdir(), and manpage says the errno meaning of EEXIST,

       EEXIST pathname already exists (not necessarily as a directory).  This includes the case where pathname is a symbolic link, dangling or not.

However it looks like some callers do not use that correctly, e.g.

      if (MakePGDirectory(directory) < 0)
      {
          if (errno == EEXIST)
              return;

OR

      if (MakePGDirectory(parentdir) < 0 && errno != EEXIST)

i.e. we should better use stat(path) && S_ISDIR(buf) && errno == EEXIST to replace errno == EEXIST.

One possible fix is to add an argument like ignore_created (in case some callers want to fail if the path has been created) in MakePGDirectory() and then add that code logic into it.


Re: Possible race condition in pg_mkdir_p()?

From
Ning Yu
Date:
Hi Michael,

The patches are attached.  To make reviewing easier we spilt them into small
pieces:

- v1-0001-Fix-race-condition-in-pg_mkdir_p.patch: the fix to pg_mkdir_p()
  itself, basically we are following the `mkdir -p` logic;
- v1-0002-Test-concurrent-call-to-pg_mkdir_p.patch: the tests for pg_mkdir_p(),
  we could see how it fails by reverting the first patch, and a reproducer with
  initdb is also provided in the README; as we do not know how to create a unit
  test in postgresql we have to employ a test module to do the job, not sure if
  this is a proper solution;
- v1-0003-Fix-callers-of-pg_mkdir_p.patch &
  v1-0004-Fix-callers-of-MakePGDirectory.patch: fix callers of pg_mkdir_p() and
  MakePGDirectory(), tests are not provided for these changes;

MakePGDirectory() is also called in TablespaceCreateDbspace(), EEXIST is
considered as non-error for parent directories, however as it considers EEXIST
as a failure for the last level of the path so the logic is still correct, we
do not add a final stat() check for it.

Best Regards
Ning


On Thu, Jul 18, 2019 at 4:57 PM Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jul 18, 2019 at 04:17:22PM +0800, Ning Yu wrote:
> This is still wrong with current code logic, because when the statusdir is
> a file the errno is also EEXIST, but it can pass the check here.  Even if
> we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is
> still wrong.

Would you like to send a patch?
--
Michael
Attachment

Re: Possible race condition in pg_mkdir_p()?

From
Michael Paquier
Date:
On Tue, Jul 23, 2019 at 02:54:20PM +0800, Ning Yu wrote:
> MakePGDirectory() is also called in TablespaceCreateDbspace(), EEXIST is
> considered as non-error for parent directories, however as it considers
> EEXIST as a failure for the last level of the path so the logic is
> still correct,

So the complains here are about two things:
- In some code paths calling mkdir, we don't care about the fact that
EEXIST can happen for something else than a folder.  This could be a
problem if we have conflicts in the backend related to the naming of
the files/folders created.  I find a bit surprising to not perform
the sanity checks in MakePGDirectory() in your patch.  What of all the
existing callers of this routine?
- pg_mkdir_p is pretty bad at detecting problems with concurrent
creation of parent directories, leading to random failures where these
should not happen.

I may be missing something, but your patch does not actually fix
problem 2, no?  Trying to do an initdb with a set of N folders using
the same parent folders not created still results in random failures.
--
Michael

Attachment

Re: Possible race condition in pg_mkdir_p()?

From
Ning Yu
Date:
On Tue, Jul 30, 2019 at 4:04 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jul 23, 2019 at 02:54:20PM +0800, Ning Yu wrote:
> > MakePGDirectory() is also called in TablespaceCreateDbspace(), EEXIST is
> > considered as non-error for parent directories, however as it considers
> > EEXIST as a failure for the last level of the path so the logic is
> > still correct,
>
> So the complains here are about two things:
> - In some code paths calling mkdir, we don't care about the fact that
> EEXIST can happen for something else than a folder.  This could be a
> problem if we have conflicts in the backend related to the naming of
> the files/folders created.  I find a bit surprising to not perform
> the sanity checks in MakePGDirectory() in your patch.  What of all the
> existing callers of this routine?

Thanks for the reply.  There are several callers of MakePGDirectory(), most of
them already treats EEXIST as an error; TablespaceCreateDbspace() already has
a proper checking for the target dir, it has the chance to fail on a
concurrently created dir, but at least it will not be confused by a file with
the same name; PathNameCreateTemporaryDir() is fixed by our patch 0004, we
will call stat() after mkdir() to double check the result.

In fact personally I'm thinking that whether could we replace all uses of
MakePGDirectory() with pg_mkdir_p(), so we could simplify
TablespaceCreateDbspace() and PathNameCreateTemporaryDir() and other callers
significantly.

> - pg_mkdir_p is pretty bad at detecting problems with concurrent
> creation of parent directories, leading to random failures where these
> should not happen.
>
> I may be missing something, but your patch does not actually fix
> problem 2, no?  Trying to do an initdb with a set of N folders using
> the same parent folders not created still results in random failures.

Well, we should have fixed problem 2, this is our major purpose of the patch
0001, it performs sanity check with stat() after mkdir() at each part of the
path.

The initdb test was just the one used by us to verify our fix, here is our
script:

        n=4
        testdir=testdir
        datadir=$testdir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
        logdir=$testdir/logs

        rm -rf $testdir
        mkdir -p $logdir

        for i in `seq $n`; do
            initdb -D $datadir/$i >$logdir/$i.log 2>&1 &
        done

        wait

        # check for failures
        grep 'could not create directory' $logdir/*

We have provided a test module in patch 0002 to perform a similar test, it
calls pg_mkdir_p() concurrently to trigger the issue, which has higher fail
rate than initdb.  With the patch 0001 both the initdb test and the test
module will always succeed in our local environment.

Thanks
Ning

> --
> Michael



Re: Possible race condition in pg_mkdir_p()?

From
Michael Paquier
Date:
On Tue, Jul 30, 2019 at 06:22:59PM +0800, Ning Yu wrote:
> In fact personally I'm thinking that whether could we replace all uses of
> MakePGDirectory() with pg_mkdir_p(), so we could simplify
> TablespaceCreateDbspace() and PathNameCreateTemporaryDir() and other callers
> significantly.

I would still keep the wrapper, but I think that as a final result we
should be able to get the code in PathNameCreateTemporaryDir() shaped
in such a way that there are no multiple attempts at calling
MakePGDirectory() on EEXIST.  This has been introduced by dc6c4c9d to
allow sharing temporary files between backends, which is rather recent
but a fixed set of two retries is not a deterministic method of
resolution.

> Well, we should have fixed problem 2, this is our major purpose of the patch
> 0001, it performs sanity check with stat() after mkdir() at each part of the
> path.

I just reuse the script presented at the top of the thread with n=2,
and I get that:
pgdata/logs/1.log:creating directory
pgdata/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z/1
... initdb: error: could not create directory "pgdata/a": File exists

But the result expected is that all the paths should be created with
no complains about the parents existing, no?  This reproduces on my
Debian box 100% of the time, for different sub-paths.  So something
looks wrong in your solution.  The code comes originally from FreeBSD,
how do things happen there.  Do we get failures if doing something
like that?  I would expect this sequence to not fail:
for i in `seq 1 100`; do mkdir -p b/c/d/f/g/h/j/$i; done
--
Michael

Attachment

Re: Possible race condition in pg_mkdir_p()?

From
Andres Freund
Date:
Hi,

On 2019-07-18 16:17:22 +0800, Ning Yu wrote:
> This seems buggy as it first checks the existence of the dir and makes the
> dir if it does not exist yet, however when executing concurrently a
> possible race condition can be as below:
> 
> A: does a/ exists? no
> B: does a/ exists? no
> A: try to create a/, succeed
> B: try to create a/, failed as it already exists

Hm. I'm not really seing much of a point in making mkdir_p safe against
all of this. What's the scenario for pg where this matters? I assume
you're using it for somewhat different purposes, and that's why it is
problematic for you?

Greetings,

Andres Freund



Re: Possible race condition in pg_mkdir_p()?

From
Ning Yu
Date:
On Wed, Jul 31, 2019 at 12:04 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jul 30, 2019 at 06:22:59PM +0800, Ning Yu wrote:
> > In fact personally I'm thinking that whether could we replace all uses of
> > MakePGDirectory() with pg_mkdir_p(), so we could simplify
> > TablespaceCreateDbspace() and PathNameCreateTemporaryDir() and other callers
> > significantly.
>
> I would still keep the wrapper, but I think that as a final result we
> should be able to get the code in PathNameCreateTemporaryDir() shaped
> in such a way that there are no multiple attempts at calling
> MakePGDirectory() on EEXIST.  This has been introduced by dc6c4c9d to
> allow sharing temporary files between backends, which is rather recent
> but a fixed set of two retries is not a deterministic method of
> resolution.
>
> > Well, we should have fixed problem 2, this is our major purpose of the patch
> > 0001, it performs sanity check with stat() after mkdir() at each part of the
> > path.
>
> I just reuse the script presented at the top of the thread with n=2,
> and I get that:
> pgdata/logs/1.log:creating directory
> pgdata/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z/1
> ... initdb: error: could not create directory "pgdata/a": File exists

Could I double confirm with you that you made a clean rebuild after
applying the patches?  pg_mkdir_p() is compiled as part of libpgport.a,
and the postgres makefile will not relink the initdb binary
automatically, for myself I must 'make clean' and 'make' to ensure
initdb gets relinked.

>
> But the result expected is that all the paths should be created with
> no complains about the parents existing, no?  This reproduces on my
> Debian box 100% of the time, for different sub-paths.  So something
> looks wrong in your solution.  The code comes originally from FreeBSD,
> how do things happen there.  Do we get failures if doing something
> like that?  I would expect this sequence to not fail:
> for i in `seq 1 100`; do mkdir -p b/c/d/f/g/h/j/$i; done
> --
> Michael



Re: Possible race condition in pg_mkdir_p()?

From
Ning Yu
Date:
On Wed, Jul 31, 2019 at 12:11 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-07-18 16:17:22 +0800, Ning Yu wrote:
> > This seems buggy as it first checks the existence of the dir and makes the
> > dir if it does not exist yet, however when executing concurrently a
> > possible race condition can be as below:
> >
> > A: does a/ exists? no
> > B: does a/ exists? no
> > A: try to create a/, succeed
> > B: try to create a/, failed as it already exists
>
> Hm. I'm not really seing much of a point in making mkdir_p safe against
> all of this. What's the scenario for pg where this matters? I assume
> you're using it for somewhat different purposes, and that's why it is
> problematic for you?

Yes, you are right, postgres itself might not run into such kind of race
condition issue.  The problem we encountered was on a downstream product
of postgres, where multiple postgres clusters are deployed on the same
machine with common parent dirs.

Best Regards
Ning

>
> Greetings,
>
> Andres Freund



Re: Possible race condition in pg_mkdir_p()?

From
Michael Paquier
Date:
On Wed, Jul 31, 2019 at 12:26:30PM +0800, Ning Yu wrote:
> Could I double confirm with you that you made a clean rebuild after
> applying the patches?  pg_mkdir_p() is compiled as part of libpgport.a,
> and the postgres makefile will not relink the initdb binary
> automatically, for myself I must 'make clean' and 'make' to ensure
> initdb gets relinked.

For any patch I test, I just do a "git clean -d -x -f" before building
as I switch a lot across stable branches as well.  It looks that you
are right on this one though, I have just rebuilt from scratch and I
don't see the failures anymore.
--
Michael

Attachment

Re: Possible race condition in pg_mkdir_p()?

From
Michael Paquier
Date:
On Tue, Jul 30, 2019 at 09:11:44PM -0700, Andres Freund wrote:
> Hm. I'm not really seing much of a point in making mkdir_p safe against
> all of this. What's the scenario for pg where this matters? I assume
> you're using it for somewhat different purposes, and that's why it is
> problematic for you?

I don't see why it is a problem to make our APIs more stable if we
have ways to do so.  I actually fixed one recently as of 754b90f for a
problem that involved a tool linking to our version of readdir() that
we ship.  Even with that, the retries for mkdir() on the base
directory in PathNameCreateTemporaryDir() are basically caused by that
same limitation with the parent paths from this report, no?  So we
could actually remove the dependency to the base directory in this
code path and just rely on pg_mkdir_p() to do the right thing for all
the parent paths.  That's also a point raised by Ning upthread.
--
Michael

Attachment

Re: Possible race condition in pg_mkdir_p()?

From
Ning Yu
Date:
On Wed, Jul 31, 2019 at 12:41 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jul 31, 2019 at 12:26:30PM +0800, Ning Yu wrote:
> > Could I double confirm with you that you made a clean rebuild after
> > applying the patches?  pg_mkdir_p() is compiled as part of libpgport.a,
> > and the postgres makefile will not relink the initdb binary
> > automatically, for myself I must 'make clean' and 'make' to ensure
> > initdb gets relinked.
>
> For any patch I test, I just do a "git clean -d -x -f" before building
> as I switch a lot across stable branches as well.  It looks that you
> are right on this one though, I have just rebuilt from scratch and I
> don't see the failures anymore.

Cool, glad to know that it works.

Best Regards
Ning

> --
> Michael



Re: Possible race condition in pg_mkdir_p()?

From
Andres Freund
Date:
Hi,

On 2019-07-31 13:48:23 +0900, Michael Paquier wrote:
> On Tue, Jul 30, 2019 at 09:11:44PM -0700, Andres Freund wrote:
> > Hm. I'm not really seing much of a point in making mkdir_p safe against
> > all of this. What's the scenario for pg where this matters? I assume
> > you're using it for somewhat different purposes, and that's why it is
> > problematic for you?
> 
> I don't see why it is a problem to make our APIs more stable if we
> have ways to do so.

Well, wanting to support additional use-cases often isn't free. There's
additional code for the new usecase, there's review & commit time,
there's additional test time, there's bug fixes for the new code etc.

We're not developing a general application support library...

I don't really have a problem fixing this case if we think it's
useful. But I'm a bit bothered by all the "fixes" being submitted that
don't matter for PG itself. They do eat up resources.


And sorry, adding in-backend threading to test testing mkdir_p doesn't
make me inclined to believe that this is all well considered.  There's
minor issues like us not supporting threads in the backend, pthread not
being portable, and also it being entirely out of proportion to the
issue.

Greetings,

Andres Freund



Re: Possible race condition in pg_mkdir_p()?

From
Michael Paquier
Date:
On Tue, Jul 30, 2019 at 10:19:45PM -0700, Andres Freund wrote:
> I don't really have a problem fixing this case if we think it's
> useful. But I'm a bit bothered by all the "fixes" being submitted that
> don't matter for PG itself. They do eat up resources.

Sure.  In this particular case, we can simplify at least one code path
in the backend though for temporary path creation.  Such cleanup rings
like a sufficient argument to me.

> And sorry, adding in-backend threading to test testing mkdir_p doesn't
> make me inclined to believe that this is all well considered.  There's
> minor issues like us not supporting threads in the backend, pthread not
> being portable, and also it being entirely out of proportion to the
> issue.

Agreed on this one.  The test case may be useful for the purpose of
testing the presented patches, but it does not have enough value to be
merged.
--
Michael

Attachment

Re: Possible race condition in pg_mkdir_p()?

From
Ning Yu
Date:
On Wed, Jul 31, 2019 at 1:31 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jul 30, 2019 at 10:19:45PM -0700, Andres Freund wrote:
> > I don't really have a problem fixing this case if we think it's
> > useful. But I'm a bit bothered by all the "fixes" being submitted that
> > don't matter for PG itself. They do eat up resources.
>
> Sure.  In this particular case, we can simplify at least one code path
> in the backend though for temporary path creation.  Such cleanup rings
> like a sufficient argument to me.

Yes, in current postgres source code there are several wrappers of
mkdir() that do similar jobs.  If we could have a safe mkdir_p()
implementation then we could use it directly in all these wrappers, that
could save a lot of maintenance effort in the long run.  I'm not saying
that our patches are enough to make it safe and reliable, and I agree
that any patches may introduce new bugs, but I think that having a safe
and unified mkdir_p() is a good direction to go.

>
> > And sorry, adding in-backend threading to test testing mkdir_p doesn't
> > make me inclined to believe that this is all well considered.  There's
> > minor issues like us not supporting threads in the backend, pthread not
> > being portable, and also it being entirely out of proportion to the
> > issue.
>
> Agreed on this one.  The test case may be useful for the purpose of
> testing the presented patches, but it does not have enough value to be
> merged.

Yes, that's why we put the testing module in a separate patch from the
fix, feel free to ignore it.  In fact ourselves have concerns about it ;)

Best Regards
Ning

> --
> Michael



Re: Possible race condition in pg_mkdir_p()?

From
Michael Paquier
Date:
On Wed, Jul 31, 2019 at 02:02:03PM +0800, Ning Yu wrote:
> Yes, in current postgres source code there are several wrappers of
> mkdir() that do similar jobs.  If we could have a safe mkdir_p()
> implementation then we could use it directly in all these wrappers, that
> could save a lot of maintenance effort in the long run.  I'm not saying
> that our patches are enough to make it safe and reliable, and I agree
> that any patches may introduce new bugs, but I think that having a safe
> and unified mkdir_p() is a good direction to go.

That's my impression as well.  Please note that this does not involve
an actual bug in Postgres and that this is rather invasive, so this
does not really qualify for a back-patch.  No objections to simplify
the backend on HEAD though.  It would be good if you could actually
register a patch to the commit fest app [1] and also rework the patch
set so as at least PathNameCreateTemporaryDir wins its simplifications
for the first problem (double-checking the other code paths would be
nice as well).  The EEXIST handling, and the confusion about EEXIST
showing for both a path and a file need some separate handling (not
sure what to do on these parts yet).

> Yes, that's why we put the testing module in a separate patch from the
> fix, feel free to ignore it.  In fact ourselves have concerns about it ;)

I think that it is nice that you took the time to do so as you get
yourself more familiar with the TAP infrastructure in the tree and
prove your point.  For this case, I would not have gone to do this
much though ;p

[1]: https://commitfest.postgresql.org/24/
--
Michael

Attachment

Re: Possible race condition in pg_mkdir_p()?

From
Ning Yu
Date:
On Wed, Jul 31, 2019 at 2:21 PM Michael Paquier <michael@paquier.xyz> wrote:
> That's my impression as well.  Please note that this does not involve
> an actual bug in Postgres and that this is rather invasive, so this
> does not really qualify for a back-patch.  No objections to simplify
> the backend on HEAD though.  It would be good if you could actually
> register a patch to the commit fest app [1] and also rework the patch
> set so as at least PathNameCreateTemporaryDir wins its simplifications
> for the first problem (double-checking the other code paths would be
> nice as well).  The EEXIST handling, and the confusion about EEXIST
> showing for both a path and a file need some separate handling (not
> sure what to do on these parts yet).

Thanks for the suggestion and information, we will rework the patch and
register it.  The planned changes are: 1) remove the tests (cool!), 2)
simplify PathNameCreateTemporaryDir() and other callers.  The EEXIST
handling will be put in a separate patch so depends on the reviews we
could accept or drop it easily.

Best Regards
Ning