Thread: Race condition in pg_database_size()

Race condition in pg_database_size()

From
Michael Fuhr
Date:
I'm occasionally seeing calls to pg_database_size() fail with

ERROR:  could not stat file "/var/lib/pgsql/data/base/16404/1738343": No such file or directory

So far I haven't noticed any other problems that might be related
to this error.  This database frequently uses temporary tables so
I'm wondering if the error might be due to a race condition in
db_dir_size(), which does the following:
    while ((direntry = ReadDir(dirdesc, path)) != NULL)
    {
        struct stat fst;

        if (strcmp(direntry->d_name, ".") == 0 ||
            strcmp(direntry->d_name, "..") == 0)
            continue;

        snprintf(filename, MAXPGPATH, "%s/%s", path, direntry->d_name);

        if (stat(filename, &fst) < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not stat file \"%s\": %m", filename)));

        dirsize += fst.st_size;
    }

I'm wondering if the code should check for ENOENT if stat() fails
and either skip this entry silently under the assumption that the
file had been deleted since the call to ReadDir(), or issue a warning
without failing.
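
A minimal standalone sketch of the proposed fix, using plain POSIX opendir()/readdir()/stat() in place of the backend's ReadDir() and ereport(). The function name dir_size_skip_enoent() and the SKETCH_MAXPATH constant are hypothetical, and returning -1 with errno set stands in for raising an ERROR. Only ENOENT is skipped; any other stat() failure is still reported.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations */

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

#define SKETCH_MAXPATH 1024     /* hypothetical stand-in for MAXPGPATH */

/*
 * Sum st_size over the entries of one directory, like db_dir_size(), but
 * treat ENOENT from stat() as "unlinked since readdir()" and skip the entry.
 * Returns -1 with errno set on any other failure (where the backend would
 * ereport(ERROR)).  readdir() errors are not checked in this sketch.
 */
static long long dir_size_skip_enoent(const char *path)
{
    char filename[SKETCH_MAXPATH];
    long long dirsize = 0;
    struct dirent *direntry;
    DIR *dirdesc;

    assert(path != NULL);
    if ((dirdesc = opendir(path)) == NULL)
        return -1;

    while ((direntry = readdir(dirdesc)) != NULL)
    {
        struct stat fst;

        if (strcmp(direntry->d_name, ".") == 0 ||
            strcmp(direntry->d_name, "..") == 0)
            continue;

        snprintf(filename, sizeof(filename), "%s/%s", path, direntry->d_name);

        if (stat(filename, &fst) < 0)
        {
            int save_errno = errno;

            if (save_errno == ENOENT)
                continue;       /* dropped after readdir(): ignore it */

            closedir(dirdesc);  /* real error: clean up and report it */
            errno = save_errno;
            return -1;
        }

        dirsize += fst.st_size;
    }

    closedir(dirdesc);
    return dirsize;
}
```

With this shape, a temporary table whose file is unlinked mid-scan simply drops out of the total, while a permissions problem or similar still surfaces as an error.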

-- 
Michael Fuhr


Re: Race condition in pg_database_size()

From
Tom Lane
Date: Sat, Mar 10, 2007 12:32:04 PM -0500
Michael Fuhr <mike@fuhr.org> writes:
> I'm wondering if the code should check for ENOENT if stat() fails
> and either skip this entry silently under the assumption that the
> file had been deleted since the call to ReadDir(),

Probably.  Want to look through the rest of that module for similar
problems?
        regards, tom lane


Re: Race condition in pg_database_size()

From
Michael Fuhr
Date:
On Sat, Mar 10, 2007 at 12:32:04PM -0500, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > I'm wondering if the code should check for ENOENT if stat() fails
> > and either skip this entry silently under the assumption that the
> > file had been deleted since the call to ReadDir(),
> 
> Probably.  Want to look through the rest of that module for similar
> problems?

I think only db_dir_size() and calculate_tablespace_size() are
affected by this particular failure (ReadDir followed by stat).
I'll submit a patch -- any preferences for silent continuation vs.
continuation with a notice or warning?
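
The same ENOENT check applies wherever a ReadDir() loop is followed by stat(). The sketch below assumes a tablespace directory whose subdirectories are per-database directories; the helper stat_if_present() and the function sum_entries() are hypothetical names, and this is not the actual calculate_tablespace_size() code. In particular it does not try to match details such as whether a directory's own st_size is counted.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX.1-2008 declarations */

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

#define SKETCH_MAXPATH 1024     /* hypothetical stand-in for MAXPGPATH */

/* Returns 1 if `name` exists, 0 if it vanished (ENOENT), -1 on other errors. */
static int stat_if_present(const char *name, struct stat *st)
{
    if (stat(name, st) == 0)
        return 1;
    return (errno == ENOENT) ? 0 : -1;
}

/*
 * Sum entry sizes under `path`.  With descend != 0, each subdirectory is
 * replaced by the sum of its own entries (one level only), roughly the shape
 * of a tablespace loop calling a per-database loop.  Returns -1 with errno
 * set on any error other than a vanished entry.
 */
static long long sum_entries(const char *path, int descend)
{
    char name[SKETCH_MAXPATH];
    long long total = 0;
    int failed = 0;
    struct dirent *de;
    DIR *dir;

    assert(path != NULL);
    if ((dir = opendir(path)) == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        struct stat st;
        int rc;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(name, sizeof(name), "%s/%s", path, de->d_name);

        rc = stat_if_present(name, &st);
        if (rc == 0)
            continue;               /* unlinked after readdir(): skip */
        if (rc < 0)
        {
            failed = 1;             /* e.g. EACCES: still an error */
            break;
        }

        if (descend && S_ISDIR(st.st_mode))
        {
            long long sub = sum_entries(name, 0);

            if (sub < 0)
            {
                failed = 1;
                break;
            }
            total += sub;
        }
        else
            total += st.st_size;
    }

    if (failed)
    {
        int save_errno = errno;     /* closedir() may clobber errno */

        closedir(dir);
        errno = save_errno;
        return -1;
    }
    closedir(dir);
    return total;
}
```

Routing both loops through one helper keeps their ENOENT handling from drifting apart if either loop is changed later.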

-- 
Michael Fuhr


Re: Race condition in pg_database_size()

From
Tom Lane
Date: Sat, Mar 10, 2007 05:39:37 PM -0500
Michael Fuhr <mike@fuhr.org> writes:
> I'll submit a patch -- any preferences for silent continuation vs.
> continuation with a notice or warning?

I think silent is fine for ENOENT cases.  We know the file had been
there at ReadDir time, so the only possible conclusion is that it was
just unlinked, and I see no reason to complain about that.
        regards, tom lane


Re: Race condition in pg_database_size()

From
Michael Fuhr
Date:
On Sat, Mar 10, 2007 at 05:39:37PM -0500, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > I'll submit a patch -- any preferences for silent continuation vs.
> > continuation with a notice or warning?
> 
> I think silent is fine for ENOENT cases.  We know the file had been
> there at ReadDir time, so the only possible conclusion is that it was
> just unlinked, and I see no reason to complain about that.

Patch submitted.

-- 
Michael Fuhr


Re: Race condition in pg_database_size()

From
Alvaro Herrera
Date:
Michael Fuhr wrote:
> On Sat, Mar 10, 2007 at 05:39:37PM -0500, Tom Lane wrote:
> > Michael Fuhr <mike@fuhr.org> writes:
> > > I'll submit a patch -- any preferences for silent continuation vs.
> > > continuation with a notice or warning?
> > 
> > I think silent is fine for ENOENT cases.  We know the file had been
> > there at ReadDir time, so the only possible conclusion is that it was
> > just unlinked, and I see no reason to complain about that.
> 
> Patch submitted.

Applied, thanks.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.