Thread: merge file_exists_in_directory and _fileExistsInDirectory functions and move into common file dumputils.c

Hi,
We have file_exists_in_directory function in pg_restore.c and same
code we are using in _fileExistsInDirectory function in pg_backup_archiver.c
also.
Here, I am attaching a patch to move these duplicate functions into
dumputils.c file

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Thu, Apr 10, 2025 at 10:41:33PM +0530, Mahendra Singh Thalor wrote:
> We have file_exists_in_directory function in pg_restore.c and same
> code we are using in _fileExistsInDirectory function in pg_backup_archiver.c
> also.
> Here, I am attaching a patch to move these duplicate functions into
> dumputils.c file

Indeed.  I don't quite see a reason not to remove this duplication,
and both routines in pg_restore.c and the pg_dump code are the same.

dumputils.h is only used by pg_dump and pg_dumpall, and its top
comment documents exactly that, so using this location for a routine
that would be used by a pg_restore path is a bit strange to me for
something that is qualified as a "dump" routine in your patch.

Perhaps we should just use a more centralized place, like file_utils.c
so as all frontends could benefit of it?

Please make sure to add it to the next commit fest that will begin in
July, this refactoring proposal is too late to be considered for v18.
--
Michael

Attachment
On 2025-Apr-11, Michael Paquier wrote:

> Perhaps we should just use a more centralized place, like file_utils.c
> so as all frontends could benefit of it?

I'm not sure about that.  This code looks to be making too many
assumptions that aren't acceptable for a general routine, such as
complaining only that the directory name is long without the possibility
that the culprit is the file name.  It's more or less okay in current
uses because they're all using harcoded short names, but that would not
hold in general.  At the same time, isn't every call of this routine a
potential TOCTTOU bug?  Again it's probably fine for the current code,
but I wouldn't be too sure about making this generally available as-is.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)



On Fri, 11 Apr 2025 at 10:21, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Apr 10, 2025 at 10:41:33PM +0530, Mahendra Singh Thalor wrote:
> > We have file_exists_in_directory function in pg_restore.c and same
> > code we are using in _fileExistsInDirectory function in pg_backup_archiver.c
> > also.
> > Here, I am attaching a patch to move these duplicate functions into
> > dumputils.c file
>
> Indeed.  I don't quite see a reason not to remove this duplication,
> and both routines in pg_restore.c and the pg_dump code are the same.

Thanks Michael for the feedback.

>
> dumputils.h is only used by pg_dump and pg_dumpall, and its top
> comment documents exactly that, so using this location for a routine
> that would be used by a pg_restore path is a bit strange to me for
> something that is qualified as a "dump" routine in your patch.
>
> Perhaps we should just use a more centralized place, like file_utils.c
> so as all frontends could benefit of it?

I tried to add it into file_utils.c but I was getting many "symbols not found errors"  so I moved this common function into dumputils.h as we have another common function in that file. (Ex; create_or_open_dir)

If we want to move this function into file_utils.c, then I can try to rewrite the patch.

>
> Please make sure to add it to the next commit fest that will begin in
> July, this refactoring proposal is too late to be considered for v18.
> --
> Michael

Okay. Thank you. I will add it.

On Fri, 11 Apr 2025 at 15:08, Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Apr-11, Michael Paquier wrote:
>
> > Perhaps we should just use a more centralized place, like file_utils.c
> > so as all frontends could benefit of it?
>
> I'm not sure about that.  This code looks to be making too many
> assumptions that aren't acceptable for a general routine, such as
> complaining only that the directory name is long without the possibility
> that the culprit is the file name.  It's more or less okay in current
> uses because they're all using harcoded short names, but that would not
> hold in general.  At the same time, isn't every call of this routine a
> potential TOCTTOU bug?  Again it's probably fine for the current code,
> but I wouldn't be too sure about making this generally available as-is.
>
> --
> Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
> "Oh, great altar of passive entertainment, bestow upon me thy discordant images
> at such speed as to render linear thought impossible" (Calvin a la TV)

Thanks Álvaro for the feedback.

/*
 * file_exists_in_directory
 *
 * Returns true if the file exists in the given directory.
 */
static bool
file_exists_in_directory(const char *dir, const char *filename)
{
    struct stat st;
    char        buf[MAXPGPATH];

    if (strlen(dir) >= MAXPGPATH)
        pg_fatal("directory name too long: \"%s\"", dir);

    if (strlen(filename) >= MAXPGPATH)
        pg_fatal("file name too long: \"%s\"", filename);

    /* Now check path length of dir/filename */
    if (snprintf(buf, MAXPGPATH, "%s/%s", dir, filename) >= MAXPGPATH)
        pg_fatal("combined name of directory:\"%s\" and file:\"%s\" is too long", filename, dir);

    return (stat(buf, &st) == 0 && S_ISREG(st.st_mode));
}

I did changes as per above code and added some checks in code to give an error for "too long" name. I started a new thread[1] for "too long names" check and later I will post an updated patch here to move duplicate functions into one common file.


--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com