Thread: Re: [PATCH v1] pg_ls_tmpdir to show directories

Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Justin Pryzby
Date:
Re-added -hackers.

Thanks for reviewing.

On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote:
> The implementation simply extends an existing functions with a boolean to
> allow for sub-directories. However, the function does not seem to show
> subdir contents recursively. Should it be the case?

> STM that "//"-comments are not project policy.

Sure, but the patch is less important than the design, which needs to be
addressed first.  The goal is to somehow show tmpfiles (or at least dirs) used
by parallel workers.  I mentioned a few possible ways, of which this was the
simplest to implement.  Showing files beneath the dir is probably good, but
need to decide how to present it.  Should there be a column for the dir (null
if not a shared filesets)?  Or some other presentation, like a boolean column
"is_shared_fileset".

> I'm unconvinced by the skipping condition:
> 
>   +  if (!S_ISREG(attrib.st_mode) &&
>   +      (!dir_ok && S_ISDIR(attrib.st_mode)))
>             continue;
> 
> which is pretty hard to read. ISTM you meant "not A and not (B and C)"
> instead?

I can write it as two ifs.  And, it's probably better to say:
    if (!ISDIR() || !dir_ok)

..which is same as: !(B && C), as you said.

> Catalog update should be a date + number? Maybe this is best left to the
> committer?

Yes, but I preferred to include it, causing a deliberate conflict, to ensure
it's not forgotten.

Thanks,
Justin



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Fabien COELHO
Date:
> Re-added -hackers.

Indeed, I left it out by accident. I tried to bounce the original mail but 
it did not work.

> Thanks for reviewing.
>
> On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote:
>> The implementation simply extends an existing functions with a boolean to
>> allow for sub-directories. However, the function does not seem to show
>> subdir contents recursively. Should it be the case?
>
>> STM that "//"-comments are not project policy.
>
> Sure, but the patch is less important than the design, which needs to be
> addressed first.  The goal is to somehow show tmpfiles (or at least dirs) used
> by parallel workers.  I mentioned a few possible ways, of which this was the
> simplest to implement.  Showing files beneath the dir is probably good, but
> need to decide how to present it.  Should there be a column for the dir (null
> if not a shared filesets)?  Or some other presentation, like a boolean column
> "is_shared_fileset".

Why not simply showing the files underneath their directories?

   /path/to/tmp/file1
   /path/to/tmp/subdir1/file2

In which case probably showing the directory itself is not useful,
and the is_dir column could be dropped?

>> I'm unconvinced by the skipping condition:
>>
>>   +  if (!S_ISREG(attrib.st_mode) &&
>>   +      (!dir_ok && S_ISDIR(attrib.st_mode)))
>>             continue;
>>
>> which is pretty hard to read. ISTM you meant "not A and not (B and C)"
>> instead?
>
> I can write it as two ifs.

Hmmm. Not sure it would help that much. At least the condition must be 
right. Also, the comment should be updated.

>  And, it's probably better to say:
>  if (!ISDIR() || !dir_ok)

I cannot say I like it. dir_ok is cheaper to test so could come first.

> ..which is same as: !(B && C), as you said.

Ok, so you confirm that the condition was wrong.

>> Catalog update should be a date + number? Maybe this is best left to 
>> the committer?
>
> Yes, but I preferred to include it, causing a deliberate conflict, to ensure
> it's not forgotten.

Ok.

-- 
Fabien.



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Justin Pryzby
Date:
On Fri, Dec 27, 2019 at 06:50:24PM +0100, Fabien COELHO wrote:
> >On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote:
> >>The implementation simply extends an existing functions with a boolean to
> >>allow for sub-directories. However, the function does not seem to show
> >>subdir contents recursively. Should it be the case?
> >
> >>STM that "//"-comments are not project policy.
> >
> >Sure, but the patch is less important than the design, which needs to be
> >addressed first.  The goal is to somehow show tmpfiles (or at least dirs) used
> >by parallel workers.  I mentioned a few possible ways, of which this was the
> >simplest to implement.  Showing files beneath the dir is probably good, but
> >need to decide how to present it.  Should there be a column for the dir (null
> >if not a shared filesets)?  Or some other presentation, like a boolean column
> >"is_shared_fileset".
> 
> Why not simply showing the files underneath their directories?
> 
>   /path/to/tmp/file1
>   /path/to/tmp/subdir1/file2
> 
> In which case probably showing the directory itself is not useful,
> and the is_dir column could be dropped?

The names are expected to look like this:

$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
142977    4 drwxr-x---   3 postgres postgres     4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp
169868    4 drwxr-x---   2 postgres postgres     4096 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
169347 5492 -rw-r-----   1 postgres postgres  5619712 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
169346 5380 -rw-r-----   1 postgres postgres  5505024 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0

I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2.
It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0.
Actually the results should be unique, either on filename or (dir,file).

"ls" wouldn't list same name twice, unless you list multiple dirs, like:
|ls a/b c/d.

It's worth thinking if subdir should be a separate column.

I'm interested to hear back from others.

Justin



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Fabien COELHO
Date:
Hello Justin,

>> Why not simply showing the files underneath their directories?
>>
>>   /path/to/tmp/file1
>>   /path/to/tmp/subdir1/file2
>>
>> In which case probably showing the directory itself is not useful,
>> and the is_dir column could be dropped?
>
> The names are expected to look like this:
>
> $ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
> 142977    4 drwxr-x---   3 postgres postgres     4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp
> 169868    4 drwxr-x---   2 postgres postgres     4096 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
> 169347 5492 -rw-r-----   1 postgres postgres  5619712 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
> 169346 5380 -rw-r-----   1 postgres postgres  5505024 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0
>
> I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2.
> It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0.
> Actually the results should be unique, either on filename or (dir,file).

Ok, so this suggests recursing into subdirs, which requires to make a 
separate function of the inner loop.

> It's worth thinking if subdir should be a separate column.

My 0.02 €: I would rather simply keep the full path and just add subdir 
contents, so that the function output does not change at all.

-- 
Fabien.

Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Justin Pryzby
Date:
On Sat, Dec 28, 2019 at 07:52:55AM +0100, Fabien COELHO wrote:
> >>Why not simply showing the files underneath their directories?
> >>
> >>  /path/to/tmp/file1
> >>  /path/to/tmp/subdir1/file2
> >>
> >>In which case probably showing the directory itself is not useful,
> >>and the is_dir column could be dropped?
> >
> >The names are expected to look like this:
> >
> >$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls
> >142977    4 drwxr-x---   3 postgres postgres     4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp
> >169868    4 drwxr-x---   2 postgres postgres     4096 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset
> >169347 5492 -rw-r-----   1 postgres postgres  5619712 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0
> >169346 5380 -rw-r-----   1 postgres postgres  5505024 Dec  7 01:35
/var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0
> >
> >I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2.
> >It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0.
> >Actually the results should be unique, either on filename or (dir,file).
> 
> Ok, so this suggests recursing into subdirs, which requires to make a
> separate function of the inner loop.

Yea, it suggests that; but, SRF_RETURN_NEXT doesn't make that very easy.
It'd need to accept the fcinfo argument, and pg_ls_dir_files would call it once
for every tuple to be returned.  So it's recursive and saves its state...

The attached is pretty ugly, but I can't see how to do better.
The alternative seems to be to build up a full list of pathnames in the SRF
initial branch, and stat them all later.  Or stat them all in the INITIAL case,
and keep a list of stat structures to be emited during future calls.

BTW, it seems to me this error message should be changed:

                snprintf(path, sizeof(path), "%s/%s", fctx->location, de->d_name);
                if (stat(path, &attrib) < 0)
                        ereport(ERROR,
                                        (errcode_for_file_access(),
-                                        errmsg("could not stat directory \"%s\": %m", dir)));
+                                        errmsg("could not stat file \"%s\": %m", path)));


Attachment

Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Fabien COELHO
Date:
Hello Justin,

>> Ok, so this suggests recursing into subdirs, which requires to make a
>> separate function of the inner loop.
>
> Yea, it suggests that; but, SRF_RETURN_NEXT doesn't make that very easy.
> It'd need to accept the fcinfo argument, and pg_ls_dir_files would call it once
> for every tuple to be returned.  So it's recursive and saves its state...
>
> The attached is pretty ugly, but I can't see how to do better.

Hmmm… I do agree with pretty ugly:-)

If it really neads to be in the structure, why not save the open directory 
field in the caller and restore it after the recursive call, and pass the 
rest of the structure as a pointer? Something like:

   ... root_function(...)
   {
      struct status_t status = initialization();
      ...
      call rec_function(&status, path)
      ...
      cleanup();
   }

   ... rec_function(struct *status, path)
   {
      status->dir = opendir(path);
      for (dir contents)
      {
         if (it_is_a_dir)
         {
              /* some comment about why we do that… */
              dir_t saved = status->dir;
              status->dir = NULL;
              rec_function(status, subpath);
              status->dir = saved;
         }
      }
     closedir(status->dir), status->dir = NULL;
   }

> The alternative seems to be to build up a full list of pathnames in the SRF
> initial branch, and stat them all later.  Or stat them all in the INITIAL case,
> and keep a list of stat structures to be emited during future calls.

-- 
Fabien.

Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Justin Pryzby
Date:
Here's a version which uses an array of directory_fctx, rather than of DIR and
location.  That avoids changing the data structure and collatoral implications
to pg_ls_dir().

Currently, this *shows* subdirs of subdirs, but doesn't decend into them.
So I think maybe toplevel subdirs should be shown, too.
And maybe the is_dir flag should be re-introduced (although someone could call
pg_stat_file if needed).
I'm interested to hear feedback on that, although this patch still isn't great.

Attachment

Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Fabien COELHO
Date:
Hello Justin,

About this v4.

It applies cleanly.

I'm trying to think about how to get rid of the strange structure and 
hacks, and the arbitrary looking size 2 array.

Also the recursion is one step, but I'm not sure why, ISTM it could/should 
go on always?

Maybe the recursive implementation was not such a good idea, if I 
suggested it is because I did not noticed the "return next" re-entrant 
stuff, shame on me.

Looking at the code, ISTM that relying on a stack/list would be much 
cleaner and easier to understand. The code could look like:

   ls_func()
     if (first_time)
       initialize description
       set next directory to visit
     while (1)
        if next directory
          init/push next directory to visit as current
        read current directory
          if emty
            pop/close current directory
          elif is_a_dir and recursion allowed
            set next directory to visit
          else ...
            return next tuple
     cleanup

The point is to avoid a hack around the directory_fctx array, to have only 
one place to push/init a new directory (i.e. call AllocateDir and play 
around with the memory context) instead of two, and to allow deeper 
recursion if useful.

Details : snprintf return is not checked. Maybe it should say why an 
overflow cannot occur.

"bool nulls[3] = { 0,};" -> "bool nulls[3} = { false, false, false };"?

-- 
Fabien.



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Justin Pryzby
Date:
On Wed, Jan 15, 2020 at 11:21:36AM +0100, Fabien COELHO wrote:
> I'm trying to think about how to get rid of the strange structure and hacks,
> and the arbitrary looking size 2 array.
> 
> Also the recursion is one step, but I'm not sure why, ISTM it could/should
> go on always?

Because tmpfiles only go one level deep.

> Looking at the code, ISTM that relying on a stack/list would be much cleaner
> and easier to understand. The code could look like:

I'm willing to change the implementation, but only after there's an agreement
about the desired behavior (extra column, one level, etc).

Justin



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Fabien COELHO
Date:
Hello Justin,

>> I'm trying to think about how to get rid of the strange structure and hacks,
>> and the arbitrary looking size 2 array.
>>
>> Also the recursion is one step, but I'm not sure why, ISTM it could/should
>> go on always?
>
> Because tmpfiles only go one level deep.

I'm not sure it is a general rule. ISTM that extensions can use tmp files, 
and we would have no control about what they would do there.

>> Looking at the code, ISTM that relying on a stack/list would be much cleaner
>> and easier to understand. The code could look like:
>
> I'm willing to change the implementation, but only after there's an agreement
> about the desired behavior (extra column, one level, etc).

For the level, ISTM that the implementation should not make this 
assumption. If in practice there is just one level, then the function will 
not recurse deep, no problem.

For the column, I'm not sure that "isdir" is necessary.

You could put it implicitely in the file name by ending it with "/", 
and/or showing the directory contents is enough a hint that there is a 
directory?

Also, I'm not fully sure why ".*" files should be skipped, maybe it should 
be an option? Or the user can filter it with SQL if it does not want them?

-- 
Fabien.



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Justin Pryzby
Date:
On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote:
> Also, I'm not fully sure why ".*" files should be skipped, maybe it should
> be an option? Or the user can filter it with SQL if it does not want them?

I think if someone wants the full generality, they can do this:

postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name,
pg_stat_file(p||'/'||name)s;
 name | size |      modification      | isdir 
------+------+------------------------+-------
 .foo | 4096 | 2020-01-16 08:57:04-05 | t

In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffix
FROMpg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT
COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix,'base') a FROM pg_tablespace,x)a) SELECT a,
pg_ls_dir(a||'/pgsql_tmp')FROM y WHERE d='pgsql_tmp';
 

I think changing dotfiles is topic for another patch.
That would also affect pg_ls_dir, and everything else that uses the backing
function pg_ls_dir_files_recurse.  I'd have to ask why not also show . and .. ?

(In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir()
to files matching PG_TEMP_FILE_PREFIX).

Justin



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
David Steele
Date:
Hi Fabien,

On 1/16/20 9:38 AM, Justin Pryzby wrote:
> On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote:
>> Also, I'm not fully sure why ".*" files should be skipped, maybe it should
>> be an option? Or the user can filter it with SQL if it does not want them?
> 
> I think if someone wants the full generality, they can do this:
> 
> postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name,
pg_stat_file(p||'/'||name)s;
>   name | size |      modification      | isdir
> ------+------+------------------------+-------
>   .foo | 4096 | 2020-01-16 08:57:04-05 | t
> 
> In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
> SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
> WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no)
suffixFROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT
COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix,'base') a FROM pg_tablespace,x)a) SELECT a,
pg_ls_dir(a||'/pgsql_tmp')FROM y WHERE d='pgsql_tmp';
 
> 
> I think changing dotfiles is topic for another patch.
> That would also affect pg_ls_dir, and everything else that uses the backing
> function pg_ls_dir_files_recurse.  I'd have to ask why not also show . and .. ?
> 
> (In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir()
> to files matching PG_TEMP_FILE_PREFIX).

We seem to be at an impasse on this patch.  What do you think of 
Justin's comments here?

Do you still believe a different implementation is required?

Regards,
-- 
-David
david@pgmasters.net



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Justin Pryzby
Date:
On Tue, Mar 03, 2020 at 02:51:54PM -0500, David Steele wrote:
> Hi Fabien,
> 
> On 1/16/20 9:38 AM, Justin Pryzby wrote:
> >On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote:
> >>Also, I'm not fully sure why ".*" files should be skipped, maybe it should
> >>be an option? Or the user can filter it with SQL if it does not want them?
> >
> >I think if someone wants the full generality, they can do this:
> >
> >postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name,
pg_stat_file(p||'/'||name)s;
> >  name | size |      modification      | isdir
> >------+------+------------------------+-------
> >  .foo | 4096 | 2020-01-16 08:57:04-05 | t
> >
> >In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
> >SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
> >WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no)
suffixFROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT
COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix,'base') a FROM pg_tablespace,x)a) SELECT a,
pg_ls_dir(a||'/pgsql_tmp')FROM y WHERE d='pgsql_tmp';
 
> >
> >I think changing dotfiles is topic for another patch.
> >That would also affect pg_ls_dir, and everything else that uses the backing
> >function pg_ls_dir_files_recurse.  I'd have to ask why not also show . and .. ?
> >
> >(In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir()
> >to files matching PG_TEMP_FILE_PREFIX).
> 
> We seem to be at an impasse on this patch.  What do you think of Justin's
> comments here?

Actually, I found Fabien's comment regarding extensions use of tmp dir to be
convincing.  And I'm willing to update the patch to use a stack to show
arbitrarily-deep files/dirs rather than just one level deep (as used for shared
filesets in core postgres).

But I don't think it makes sense to go through more implementation/review
cycles without some agreement from a larger group regarding the
desired/intended interface.  Should there be a column for "parent dir" ?  Or a
column for "is_dir" ?  Should dirs be shown at all, or only files ?

-- 
Justin



Re: [PATCH v1] pg_ls_tmpdir to show directories

From
Alvaro Herrera
Date:
On 2020-Mar-03, Justin Pryzby wrote:

> But I don't think it makes sense to go through more implementation/review
> cycles without some agreement from a larger group regarding the
> desired/intended interface.  Should there be a column for "parent dir" ?  Or a
> column for "is_dir" ?  Should dirs be shown at all, or only files ?

IMO: is_dir should be there (and subdirs should be listed), but
parent_dir should not appear.  Also, the "path" should show the complete
pathname, including containing dirs, starting from whatever the "root"
is for the operation.

So for the example in the initial email, it would look like

path                    isdir
pgsql_tmp11025.0.sharedfileset/        t
pgsql_tmp11025.0.sharedfileset/0.0    f
pgsql_tmp11025.0.sharedfileset/1.0    f

plus additional columns, same as pg_ls_waldir et al.

I'd rather not have the code assume that there's a single level of
subdirs, or assuming that an entry in the subdir cannot itself be a dir;
that might end up hiding files for no good reason.

I don't understand what purpose is served by having pg_ls_waldir() hide
directories.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_ls_tmpdir to show directories and shared filesets

From
Justin Pryzby
Date:
On Tue, Mar 03, 2020 at 05:23:13PM -0300, Alvaro Herrera wrote:
> On 2020-Mar-03, Justin Pryzby wrote:
> 
> > But I don't think it makes sense to go through more implementation/review
> > cycles without some agreement from a larger group regarding the
> > desired/intended interface.  Should there be a column for "parent dir" ?  Or a
> > column for "is_dir" ?  Should dirs be shown at all, or only files ?
> 
> IMO: is_dir should be there (and subdirs should be listed), but
> parent_dir should not appear.  Also, the "path" should show the complete
> pathname, including containing dirs, starting from whatever the "root"
> is for the operation.
> 
> So for the example in the initial email, it would look like
> 
> path                    isdir
> pgsql_tmp11025.0.sharedfileset/        t
> pgsql_tmp11025.0.sharedfileset/0.0    f
> pgsql_tmp11025.0.sharedfileset/1.0    f
> 
> plus additional columns, same as pg_ls_waldir et al.
> 
> I'd rather not have the code assume that there's a single level of
> subdirs, or assuming that an entry in the subdir cannot itself be a dir;
> that might end up hiding files for no good reason.
> 

Thanks for your input, see attached.

I'm not sure if prefer the 0002 patch alone (which recurses into dirs all at
once during the initial call), or 0002+3+4, which incrementally reads the dirs
on each call (but requires keeping dirs opened).

> I don't understand what purpose is served by having pg_ls_waldir() hide
> directories.

We could talk about whether the other functions should show dirs, if it's worth
breaking their return type.  Or if they should show hidden or special files,
which doesn't require breaking the return.  But until then I am to leave the
behavior alone.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets

From
Justin Pryzby
Date:
On Thu, Mar 05, 2020 at 10:18:38AM -0600, Justin Pryzby wrote:
> I'm not sure if prefer the 0002 patch alone (which recurses into dirs all at
> once during the initial call), or 0002+3+4, which incrementally reads the dirs
> on each call (but requires keeping dirs opened).

I fixed an issue that leading dirs were being shown which should not have been,
which was easier in the 0004 patch, so squished.  And fixed a bug that
"special" files weren't excluded, and "missing_ok" wasn't effective.

> > I don't understand what purpose is served by having pg_ls_waldir() hide
> > directories.
> 
> We could talk about whether the other functions should show dirs, if it's worth
> breaking their return type.  Or if they should show hidden or special files,
> which doesn't require breaking the return.  But until then I am to leave the
> behavior alone.

I don't see why any of the functions would exclude dirs, but ls_tmpdir deserves
to be fixed since core postgres dynamically creates dirs there.

Also ... I accidentally changed the behavior: master not only doesn't decend
into dirs, it hides them - that was my original complaint.  I propose to *also*
change at least tmpdir and logdir to show dirs, but don't decend.  I left
waldir alone for now.

Since v12 ls_tmpdir and since v10 logdir and waldir exclude dirs, I think we
should backpatch documentation to say so.

ISTM pg_ls_tmpdir and ls_logdir should be called with missing_ok=true, since
they're not created until they're used.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets

From
Fabien COELHO
Date:
Hello Justin,

Some feedback about the v7 patch set.

About v7.1, seems ok.

About v7.2 & v7.3 seems ok, altought the two could be merged.

About v7.4:

The documentation sentences could probably be improved "for for", "used 
... used". Maybe:

   For the temporary directory for <parameter>tablespace</parameter>, ...
->
   For <parameter>tablespace</parameter> temporary directory, ...

   Directories are used for temporary files used by parallel
   processes, and are shown recursively.
->
   Directories holding temporary files used by parallel
   processes are shown recursively.

It seems that lists are used as FIFO structures by appending, fetching & 
deleting last, all of which are O(n). ISTM it would be better to use the 
head of the list by inserting, getting and deleting first, which are O(1).

ISTM that several instances of: "pg_ls_dir_files(..., true, false);" 
should be "pg_ls_dir_files(..., true, DIR_HIDE);".

About v7.5 looks like a doc update which should be merged with v7.4.

Alas, ISTM that there are no tests on any of these functions:-(

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets

From
Justin Pryzby
Date:
On Sat, Mar 07, 2020 at 03:14:37PM +0100, Fabien COELHO wrote:
> Some feedback about the v7 patch set.

Thanks for looking again

> About v7.1, seems ok.
> 
> About v7.2 & v7.3 seems ok, altought the two could be merged.

These are separate since I proprose that one should be backpatched to v12 and
the other to v10.

> About v7.4:
...
> It seems that lists are used as FIFO structures by appending, fetching &
> deleting last, all of which are O(n). ISTM it would be better to use the
> head of the list by inserting, getting and deleting first, which are O(1).

I think you're referring to linked lists, but pglists are now arrays, for which
that's backwards.  See 1cff1b95a and d97b714a2.  For example, list_delete_last
says:
 * This is the opposite of list_delete_first(), but is noticeably cheaper
 * with a long list, since no data need be moved.

> ISTM that several instances of: "pg_ls_dir_files(..., true, false);" should
> be "pg_ls_dir_files(..., true, DIR_HIDE);".

Oops, that affects an intermediate commit and maybe due to merge conflict.
Thanks.

> About v7.5 looks like a doc update which should be merged with v7.4.

No, v7.5 updates pg_proc.dat and changes the return type of two functions.
It's a short commit since all the infrastructure is implemented to make the
functions do whatever we want.  But it's deliberately separate since I'm
proposing a breaking change, and one that hasn't been discussed until now.

> Alas, ISTM that there are no tests on any of these functions:-(

Yeah.  Everything that includes any output is going to include timestamps;
those could be filtered out.  waldir is going to have random filenames, and a
differing number of rows.  But we should exercize pg_ls_dir_files at least
once..

My previous version had a bug with ignore_missing with pg_ls_tmpdir, which
would've been caught by a test like:
SELECT FROM pg_ls_tmpdir() WHERE name='Does not exist'; -- Never true, so the function runs to completion but returns
zerorows.
 

The 0006 commit changes that for logdir, too.  Without 0006, that will ERROR if
the dir doesn't exist (which I think would be the default during regression
tests).

It'd be nice to run pg_ls_tmpdir before the tmpdir exists, and again
afterwards.  But I'm having trouble finding a single place to put it.  The
closest I can find is dbsize.sql.  Any ideas ?

-- 
Justin



Re: pg_ls_tmpdir to show directories and shared filesets

From
Fabien COELHO
Date:
>> It seems that lists are used as FIFO structures by appending, fetching &
>> deleting last, all of which are O(n). ISTM it would be better to use the
>> head of the list by inserting, getting and deleting first, which are O(1).
>
> I think you're referring to linked lists, but pglists are now arrays,

Ok… I forgot about this change, so my point is void, you took the right 
one.

-- 
Fabien.

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Sat, Mar 07, 2020 at 03:14:37PM +0100, Fabien COELHO wrote:
> The documentation sentences could probably be improved "for for", "used ...
> used". Maybe:

> ISTM that several instances of: "pg_ls_dir_files(..., true, false);" should
> be "pg_ls_dir_files(..., true, DIR_HIDE);".

> Alas, ISTM that there are no tests on any of these functions:-(

Addressed these.

And reordered the last two commits to demonstrate and exercize the behavior
change in regress test.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

Patch series applies cleanly. The last status compiles and passes "make 
check". A few more comments:

* v8.[123] ok.

* v8.4

Avoid using the type name as a field name? "enum dir_action dir_action;" 
-> "enum dir_action action", or maybe rename "dir_action" enum 
"dir_action_t".

About pg_ls_dir:

"if (!fctx->dirdesc)" I do not think that is true even if AllocateDir 
failed, the list exists anyway. ISTM it should be linitial which is NULL 
in that case.

Given the overlap between pg_ls_dir and pg_ls_dir_files, ISTM that the 
former should call the slightly extended later with appropriate flags.

About populate_paths:

function is a little bit strange to me, ISTM it would deserve more 
comments.

I'm not sure the name reflect what it does. For instance, ISTM that it 
does one thing, but the name is plural. Maybe "move_to_next_path" or 
"update_current_path" or something?

It returns an int which can only be 0 or 1, which smells like an bool. 
What is this int/bool is not told in the function head comment. I guess it 
is whether the path was updated. When it returns false, the list length is 
down to one.

Shouldn't AllocateDir be tested for bad result? Maybe it is a dir but you 
do not have perms to open it? Or give a comment about why it cannot 
happen?

later, good, at least the function is called, even if it is only for an 
error case. Maybe some non empty coverage tests could be added with a 
"count(*) > 0" on not is_dir or maybe "count(*) = 0" on is_dir, for 
instance?

   (SELECT oid FROM pg_tablespace b WHERE b.spcname='regress_tblspace'
   UNION SELECT 0 ORDER BY 1 DESC LIMIT 1)b

The 'b' glued to the ')' looks pretty strange. I'd suggest ") AS b". 
Reusing the same alias twice could be avoided for clarity, maybe.

* v8.[56]

I'd say that a behavior change which adds a column and a possibly a few 
rows is ok, especially as the tmpdir contains subdirs now.

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
I took a step back, and I wondered whether we should add a generic function for
listing a dir with metadata, possibly instead of changing the existing
functions.  Then one could do pg_ls_dir_metadata('pg_wal',false,false);

Since pg8.1, we have pg_ls_dir() to show a list of files.  Since pg10, we've
had pg_ls_logdir and pg_ls_waldir, which show not only file names but also
(some) metadata (size, mtime).  And since pg12, we've had pg_ls_tmpfile and
pg_ls_archive_statusdir, which also show metadata.

...but there's no a function which lists the metadata of an directory other
than tmp, wal, log.

One can do this:
|SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c;
..but that's not as helpful as allowing:
|SELECT * FROM pg_ls_dir_metadata('.',true,true);

There's also no function which recurses into an arbitrary directory, so it
seems shortsighted to provide a function to recursively list a tmpdir.

Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can
write a SQL function to show the dir recursively.  It'd be trivial to plug in
wal/log/tmp (it seems like tmpdirs of other tablespace's are not entirely
trivial).
|SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp');

Also, on a neighboring thread[1], Tom indicated that the pg_ls_* functions
should enumerate all files during the initial call, which sounds like a bad
idea when recursively showing directories.  If we add a function recursing into
a directory, we'd need to discuss all the flags to expose to it, like recurse,
ignore_errors, one_filesystem?, show_dotfiles (and eventually bikeshed all the
rest of the flags in find(1)).

My initial patch [2] changed ls_tmpdir to show metadata columns including
is_dir, but not decend.  It's pretty unfortunate if a function called
pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that
(it's new in v12).

I'm interested to in feedback on the alternative approach, as attached.  The
final patch to include all the rest of columns shown by pg_stat_file() is more
of an idea/proposal and not sure if it'll be desirable.  But pg_ls_tmpdir() is
essentially the same as my v1 patch.

This is intended to be mostly independent of any fix to the WARNING I reported
[1].  Since my patch collapses pg_ls_dir into pg_ls_dir_files, we'd only need
to fix one place.  I'm planning to eventually look into Tom's suggestion of
returning tuplestore to fix that, and maybe rebase this patchset on top of
that.

-- 
Justin

[1] https://www.postgresql.org/message-id/flat/20200308173103.GC1357%40telsasoft.com
[2] https://www.postgresql.org/message-id/20191214224735.GA28433%40telsasoft.com

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
@cfbot: rebased onto 085b6b6679e73b9b386f209b4d625c7bc60597c0

The merge conflict presents another opportunity to solicit comments on the new
approach.  Rather than making "recurse into tmpdir" the end goal:

  - add a function to show metadata of an arbitrary dir;
  - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not
    pg_ls_dir).
  - maybe add pg_ls_dir_recurse, which satisfies the original need;
  - retire pg_ls_dir (does this work with tuplestore?)
  - profit

The alternative seems to be to go back to Alvaro's earlier proposal:
 - not only add "isdir", but also recurse;

I think I would insist on adding a general function to recurse into any dir.
And *optionally* change ps_ls_* to recurse (either by accepting an argument, or
by making that a separate patch to debate).  tuplestore is certainly better
than keeping a stack/List of DIRs for this.

On Tue, Mar 10, 2020 at 01:30:37PM -0500, Justin Pryzby wrote:
> I took a step back, and I wondered whether we should add a generic function for
> listing a dir with metadata, possibly instead of changing the existing
> functions.  Then one could do pg_ls_dir_metadata('pg_wal',false,false);
> 
> Since pg8.1, we have pg_ls_dir() to show a list of files.  Since pg10, we've
> had pg_ls_logdir and pg_ls_waldir, which show not only file names but also
> (some) metadata (size, mtime).  And since pg12, we've had pg_ls_tmpfile and
> pg_ls_archive_statusdir, which also show metadata.
> 
> ...but there's no a function which lists the metadata of an directory other
> than tmp, wal, log.
> 
> One can do this:
> |SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c;
> ..but that's not as helpful as allowing:
> |SELECT * FROM pg_ls_dir_metadata('.',true,true);
> 
> There's also no function which recurses into an arbitrary directory, so it
> seems shortsighted to provide a function to recursively list a tmpdir.
> 
> Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can
> write a SQL function to show the dir recursively.  It'd be trivial to plug in
> wal/log/tmp (it seems like tmpdirs of other tablespace's are not entirely
> trivial).
> |SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp');
> 
> Also, on a neighboring thread[1], Tom indicated that the pg_ls_* functions
> should enumerate all files during the initial call, which sounds like a bad
> idea when recursively showing directories.  If we add a function recursing into
> a directory, we'd need to discuss all the flags to expose to it, like recurse,
> ignore_errors, one_filesystem?, show_dotfiles (and eventually bikeshed all the
> rest of the flags in find(1)).
> 
> My initial patch [2] changed ls_tmpdir to show metadata columns including
> is_dir, but not decend.  It's pretty unfortunate if a function called
> pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that
> (it's new in v12).
> 
> I'm interested to in feedback on the alternative approach, as attached.  The
> final patch to include all the rest of columns shown by pg_stat_file() is more
> of an idea/proposal and not sure if it'll be desirable.  But pg_ls_tmpdir() is
> essentially the same as my v1 patch.
> 
> This is intended to be mostly independent of any fix to the WARNING I reported
> [1].  Since my patch collapses pg_ls_dir into pg_ls_dir_files, we'd only need
> to fix one place.  I'm planning to eventually look into Tom's suggestion of
> returning tuplestore to fix that, and maybe rebase this patchset on top of
> that.
> 
> -- 
> Justin
> 
> [1] https://www.postgresql.org/message-id/flat/20200308173103.GC1357%40telsasoft.com
> [2] https://www.postgresql.org/message-id/20191214224735.GA28433%40telsasoft.com

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

Some feedback on v10:

All patches apply cleanly, one on top of the previous. I really wish there 
would be less than 9 patches…

* v10.1 doc change: ok

* v10.2 doc change: ok, not sure why it is not merged with previous

* v10.3 test add: could be merge with both previous

Tests seem a little contrived. I'm wondering whether something more 
straightforward could be proposed. For instance, once the tablespace is 
just created but not used yet, probably we do know that the tmp file 
exists and is empty?

* v10.4 at least, some code!

Compiles, make check ok.

pg_ls_dir_files: I'm fine with the flag approach given the number of 
switches and the internal nature of the function.

I'm not sure of the "FLAG_" prefix which seems too generic, even if it is 
local. I'd suggest "LS_DIR_*", maybe, as a more specific prefix.

ISTM that Pg style requires spaces around operators. I'd suggest some 
parenthesis would help as well, eg: "flags&FLAG_MISSING_OK" -> "(flags & 
FLAG_MISSING_OK)" and other instances.

About:

  if (S_ISDIR(attrib.st_mode)) {
    if (flags&FLAG_SKIP_DIRS)
      continue;
   }

and similars, why not the simpler:

  if (S_ISDIR(attrib.st_mode) && (flags & FLAG_SKIP_DIRS))
     continue;

Especially that it is done like that in previous cases.

Maybe I'd create defines for long and common flag specs, eg:

  #define ..._LS_SIMPLE (FLAG_SKIP_DIRS|FLAG_SKIP_HIDDEN|FLAG_SKIP_SPECIAL|FLAG_METADATA)

No attempt at recursing.

I'm not sure about these asserts:

        /* isdir depends on metadata */
        Assert(!(flags&FLAG_ISDIR) || (flags&FLAG_METADATA));

Hmmm. Why?

        /* Unreasonable to show isdir and skip dirs */
        Assert(!(flags&FLAG_ISDIR) || !(flags&FLAG_SKIP_DIRS));

Hmmm. Why would I prevent that, even if it has little sense, it should 
work. I do not see having false on the isdir column as an actual issue.

* v10.5 add is_dir column, a few tests & doc.

Ok.

* v10.6 behavior change for existing functions, always show isdir column,
and removal of IS_DIR flag.

I'm unsure why the features are removed, some use case may benefit from 
the more complete function?

Maybe flags defs should not be changed anyway?

I do not like much the "if (...) /* empty */;" code. Maybe it could be 
caught more cleanly later in the conditional structure.

* v10.7 adds "pg_ls_dir_recurse" function

Using sql recurse to possibly to implement the feature is pretty elegant
and limits open directories to one at a time, which is pretty neat.

Doc looks incomplete and the example is very contrived and badly indented.

The function definition does not follow the style around: uppercase 
whereas all others are lowercase, "" instead of '', no "as"…

I do not understand why oid 8511 is given to the new function.

I do not understand why UNION ALL and not UNION.

I would have put the definition after "pg_ls_dir_metadata" definition.

pg_ls_dir_metadata seems defined as (text,bool,bool) but called as 
(text,bool,bool,bool).

Maybe a better alias could be given instead of x?

There are no tests for the new function. I'm not sure it would work.

* v10.8 change flags & add test on pg_ls_logdir().

I'm unsure why it is done at this stage.

* v10.9 change some ls functions and fix patch 10.7 issue

I'm unsure why it is done at this stage. "make check" ok.

-- 
Fabien.

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Sun, Mar 15, 2020 at 06:15:02PM +0100, Fabien COELHO wrote:
> Some feedback on v10:

Thanks for looking.  I'm hoping to hear from Alvaro what he thinks of this
approach (all functions to show isdir, rather than one function which lists
recursively).

> All patches apply cleanly, one on top of the previous. I really wish there
> would be less than 9 patches…

I kept them separate to allow the earlier patches to be applied.
And intended to make easier to review, even if it's more work for me..

If you mean that it's a pain to apply 9 patches, I will suggest to use:
|git am -3 ./mailbox
where ./mailbox is either a copy of the mail you received, or retrieved from
the web interface.

To test that each one works (compiles, passes tests, etc), I use git rebase -i
HEAD~11 and "e"edit the target (set of) patches.

> * v10.1 doc change: ok
> 
> * v10.2 doc change: ok, not sure why it is not merged with previous

As I mentioned, separate since I'm proposing that they're backpatched to
different releases.  Those could be applied now (and Tom already applied a
patch identical to 0001 in a prior patchset).

> * v10.3 test add: could be merge with both previous

> Tests seem a little contrived. I'm wondering whether something more
> straightforward could be proposed. For instance, once the tablespace is just
> created but not used yet, probably we do know that the tmp file exists and
> is empty?

The tmpdir *doesn't* exist until someone creates tmpfiles there.
As it mentions:
+-- This tests the missing_ok parameter, which causes pg_ls_tmpdir to succeed even if the tmpdir doesn't exist yet

> * v10.4 at least, some code!
> I'm not sure of the "FLAG_" prefix which seems too generic, even if it is
> local. I'd suggest "LS_DIR_*", maybe, as a more specific prefix.

Done.

> ISTM that Pg style requires spaces around operators. I'd suggest some
> parenthesis would help as well, eg: "flags&FLAG_MISSING_OK" -> "(flags &
> FLAG_MISSING_OK)" and other instances.

Partially took your suggestion.

> About:
> 
>  if (S_ISDIR(attrib.st_mode)) {
>    if (flags&FLAG_SKIP_DIRS)
>      continue;
>   }
> 
> and similars, why not the simpler:
> 
>  if (S_ISDIR(attrib.st_mode) && (flags & FLAG_SKIP_DIRS))
>     continue;

That's not the same - if SKIP_DIRS isn't set, your way would fail that test for
dirs, and then hit the !ISREG test, and skip them anyway.
|else if (!S_ISREG(attrib.st_mode))
|    continue

> Maybe I'd create defines for long and common flag specs, eg:
> 
>  #define ..._LS_SIMPLE (FLAG_SKIP_DIRS|FLAG_SKIP_HIDDEN|FLAG_SKIP_SPECIAL|FLAG_METADATA)

Done

> I'm not sure about these asserts:
> 
>        /* isdir depends on metadata */
>        Assert(!(flags&FLAG_ISDIR) || (flags&FLAG_METADATA));
> 
> Hmmm. Why?

It's not supported to show isdir without showing metadata (because that case
isn't needed to support the old and the new behaviors).

+               if (flags & FLAG_METADATA)
+               {
+                       values[1] = Int64GetDatum((int64) attrib.st_size);
+                       values[2] = TimestampTzGetDatum(time_t_to_timestamptz(attrib.st_mtime));
+                       if (flags & FLAG_ISDIR)
+                               values[3] = BoolGetDatum(S_ISDIR(attrib.st_mode));
+               }

>        /* Unreasonable to show isdir and skip dirs */
>        Assert(!(flags&FLAG_ISDIR) || !(flags&FLAG_SKIP_DIRS));
> 
> Hmmm. Why would I prevent that, even if it has little sense, it should work.
> I do not see having false on the isdir column as an actual issue.

It's unimportant, but testing for intended use of flags during development.

> * v10.6 behavior change for existing functions, always show isdir column,
> and removal of IS_DIR flag.
> 
> I'm unsure why the features are removed, some use case may benefit from the
> more complete function?
> Maybe flags defs should not be changed anyway?

Maybe.  I put them back...but it means they're not being exercized by any
*used* case.

> I do not like much the "if (...) /* empty */;" code. Maybe it could be
> caught more cleanly later in the conditional structure.

This went away when I put back the SKIP_DIRS flag.

> * v10.7 adds "pg_ls_dir_recurse" function

> Doc looks incomplete and the example is very contrived and badly indented.

Why you think it's contrived?  Listing a tmpdir recursively is the initial
motivation of this patch.  Maybe you think I should list just the tmpdir for
one tablespace ?  Note that for temp_tablespaces parameter:

|When there is more than one name in the list, PostgreSQL chooses a random member of the list each time a temporary
objectis to be created; except that within a transaction, successively created temporary objects are placed in
successivetablespaces from the list.
 

> The function definition does not follow the style around: uppercase whereas
> all others are lowercase, "" instead of '', no "as"…

I used "" because of this:
| x.name||'/'||a.name
I don't know if there's a better way to join paths in SQL, or if that suggests
this is a bad way to do it.

> I do not understand why oid 8511 is given to the new function.

I used: ./src/include/catalog/unused_oids (maybe not correctly).

> I do not understand why UNION ALL and not UNION.

In general, union ALL can avoid a "distinct" plan node, but it doesn't seem to
have any effect here.

> I would have put the definition after "pg_ls_dir_metadata" definition.

Done

> pg_ls_dir_metadata seems defined as (text,bool,bool) but called as
> (text,bool,bool,bool).

fixed, thanks.

> Maybe a better alias could be given instead of x?
> 
> There are no tests for the new function. I'm not sure it would work.

I added something which would've caught the issue with number of arguments.

> * v10.8 change flags & add test on pg_ls_logdir().
> 
> I'm unsure why it is done at this stage.

I think it makes sense to allow ls_logdir to succeed even if ./log doesn't
exist, since it isn't created by initdb or during postmaster start, and since
we already using MISSING_OK for tmpdir.

But a separate patch since we didn't previous discuss changing logdir.  

> * v10.9 change some ls functions and fix patch 10.7 issue
> I'm unsure why it is done at this stage. "make check" ok.

This is the last patch in the series, since I think it's least likely to be
agreed on.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Fabien COELHO
Date:
About v11, ISTM that the recursive function should check for symbolic 
links and possibly avoid them:

  sh> cd data/base
  sh> ln -s .. foo

  psql> SELECT * FROM pg_ls_dir_recurse('.');
  ERROR:  could not stat file
"./base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo":
Toomany levels of symbolic links
 
  CONTEXT:  SQL function "pg_ls_dir_recurse" statement 1

This probably means using lstat instead of (in supplement to?) stat, and 
probably tell if something is a link, and if so not recurse in them.

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Mon, Mar 16, 2020 at 04:20:21PM +0100, Fabien COELHO wrote:
> 
> About v11, ISTM that the recursive function should check for symbolic links
> and possibly avoid them:
> 
>  sh> cd data/base
>  sh> ln -s .. foo
> 
>  psql> SELECT * FROM pg_ls_dir_recurse('.');
>  ERROR:  could not stat file
"./base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo":
Toomany levels of symbolic links
 
>  CONTEXT:  SQL function "pg_ls_dir_recurse" statement 1
> 
> This probably means using lstat instead of (in supplement to?) stat, and
> probably tell if something is a link, and if so not recurse in them.

Thanks for looking.

I think that opens up a can of worms.  I don't want to go into the business of
re-implementing all of find(1) - I count ~128 flags (most of which take
arguments).  You're referring to find -L vs find -P, and some people would want
one and some would want another.  And don't forget about find -H...

pg_stat_file doesn't expose the file type (I guess because it's not portable?),
and I think it's outside the scope of this patch to change that.  Maybe it
suggests that the pg_ls_dir_recurse patch should be excluded.

ISTM if someone wants to recursively list a directory, they should avoid
putting cycles there, or permission errors, or similar.  Or they should write
their own C extension that borrows from pg_ls_dir_files but handles more
arguments.

-- 
Justin



Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

>>  psql> SELECT * FROM pg_ls_dir_recurse('.');
>>  ERROR:  could not stat file
"./base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo":
Toomany levels of symbolic links
 
>>  CONTEXT:  SQL function "pg_ls_dir_recurse" statement 1
>>
>> This probably means using lstat instead of (in supplement to?) stat, and
>> probably tell if something is a link, and if so not recurse in them.
>
> Thanks for looking.
>
> I think that opens up a can of worms.  I don't want to go into the business of
> re-implementing all of find(1) - I count ~128 flags (most of which take
> arguments).  You're referring to find -L vs find -P, and some people would want
> one and some would want another.  And don't forget about find -H...

This is not the point. The point is that a link can change a finite tree 
into cyclic graph, and you do not want to delve into that, ever.

The "find" command, by default, does not recurse into a link because of 
said problem, and the user *must* ask for it and assume the infinite loop 
if any.

So if you implement one behavior, it should be not recursing into links. 
Franckly, I would not provide the recurse into link alternative, but it 
could be implemented if someone wants it, and the problem that come with 
it.

> pg_stat_file doesn't expose the file type (I guess because it's not portable?),

You are right that Un*x and Windows are not the same wrt link. It seems 
that there is already something about that in port:

   "./src/port/dirmod.c:pgwin32_is_junction(const char *path)"

So most of the details are already hidden.

> and I think it's outside the scope of this patch to change that.  Maybe it
> suggests that the pg_ls_dir_recurse patch should be excluded.

IMHO, I really think that it should be included. Dealing with links is no 
big deal, but you need an additional column in _metadata to tell it is a 
link, and there is a ifdef because testing is a little different between 
unix and windows. I'd guess around 10-20 lines of code added.

> ISTM if someone wants to recursively list a directory, they should avoid
> putting cycles there, or permission errors, or similar.

Hmmm. I'd say the user should like to be able to call the function and 
never have a bad experience with it such as a failure on an infinite loop.

> Or they should write their own C extension that borrows from 
> pg_ls_dir_files but handles more arguments.

ISTM that the point of your patch is to provide the basic tool needed to 
list directories contents, and handling links somehow is a necessary part 
of that.

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Mon, Mar 16, 2020 at 04:20:21PM +0100, Fabien COELHO wrote:
> This probably means using lstat instead of (in supplement to?) stat, and
> probably tell if something is a link, and if so not recurse in them.

On Mon, Mar 16, 2020 at 07:21:06PM +0100, Fabien COELHO wrote:
> IMHO, I really think that it should be included. Dealing with links is no
> big deal, but you need an additional column in _metadata to tell it is a
> link

Instead of showing another column, I changed to show links with isdir=false.
At the cost of two more patches, to allow backpatching docs and maybe separate
commit to make the subtle change obvious in commit history, at least.

I see a few places in the backend and a few more in the fronted using the same
logic that I used for islink(), but I'm not sure if there's a good place to put
that to allow factoring out at least the other backend ones.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Alvaro Herrera
Date:
I pushed 0001 and 0003 (as a single commit).  archive_statusdir didn't
get here until 12, so your commit message was mistaken.  Also, pg10 is
slightly different so it didn't apply there, so I left it alone.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Mon, Mar 16, 2020 at 07:17:36PM -0300, Alvaro Herrera wrote:
> I pushed 0001 and 0003 (as a single commit).  archive_statusdir didn't
> get here until 12, so your commit message was mistaken.  Also, pg10 is
> slightly different so it didn't apply there, so I left it alone.

Thanks, I appreciate it (and I'm sure Fabien will appreciate having two fewer
patches...).

@cfbot: rebased onto b4570d33aa045df330bb325ba8a2cbf02266a555

I realized that if I lstat() a file to make sure links to dirs show as
isdir=false, it's odd to then show size and timestamps of the dir.  So changed
to use lstat ... and squished.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Fabien COELHO
Date:
About v13, seens as one patch:

Function "pg_ls_dir_metadata" documentation suggests a variable number of 
arguments with brackets, but parameters are really mandatory.

  postgres=# SELECT pg_ls_dir_metadata('.');
  ERROR:  function pg_ls_dir_metadata(unknown) does not exist
  LINE 1: SELECT pg_ls_dir_metadata('.');
                ^
  HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
  postgres=# SELECT pg_ls_dir_metadata('.', true, true);
  …

The example in the documentation could be better indented. Also, ISTM that 
there are two implicit laterals (format & pg_ls_dir_recurse) that I would 
make explicit. I'd use the pcs alias explicitely. I'd use meaningful 
aliases (eg ts instead of b, …).

On reflection, I think that a boolean "isdir" column is a bad idea because 
it is not extensible. I'd propose to switch to the standard "ls" approach 
of providing the type as one character: '-' for regular, 'd' for 
directory, 'l' for link, 's' for socket, 'c' for character special…

ISTM that "lstat" is not available on windows, which suggests to call 
"stat" always, and then "lstat" on un*x and pg ports stuff on win.

I'm wondering about the restriction on directories only. Why should it not 
work on a file? Can it be easily extended to work on a simple file? If so, 
it could be just "pg_ls".

-- 
Fabien.

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Tue, Mar 17, 2020 at 10:21:48AM +0100, Fabien COELHO wrote:
> 
> About v13, seens as one patch:
> 
> Function "pg_ls_dir_metadata" documentation suggests a variable number of
> arguments with brackets, but parameters are really mandatory.

Fixed, and added tests on 1 and 3 arg versions of both pg_ls_dir() and
pg_ls_dir_metadata().

It seems like the only way to make variable number of arguments is is with
multiple entries in pg_proc.dat, one for each "number of" arguments.  Is that
right ?

> The example in the documentation could be better indented. Also, ISTM that
> there are two implicit laterals (format & pg_ls_dir_recurse) that I would
> make explicit. I'd use the pcs alias explicitely. I'd use meaningful aliases
> (eg ts instead of b, …).

> On reflection, I think that a boolean "isdir" column is a bad idea because
> it is not extensible. I'd propose to switch to the standard "ls" approach of
> providing the type as one character: '-' for regular, 'd' for directory, 'l'
> for link, 's' for socket, 'c' for character special…

I think that's outside the scope of the patch, since I'd want to change
pg_stat_file; that's where I borrowed "isdir" from, for consistency.

Note that both LS_DIR_HISTORIC and LS_DIR_MODERN include LS_DIR_SKIP_SPECIAL,
so only pg_ls_dir itself show specials, so they way to do it would be to 1)
change pg_stat_file to expose the file's "type", 2) use pg_ls_dir() AS a,
lateral pg_stat_file(a) AS b, 3) then consider also changing LS_DIR_MODERN and
all the existing pg_ls_*.

> ISTM that "lstat" is not available on windows, which suggests to call "stat"
> always, and then "lstat" on un*x and pg ports stuff on win.

I believe that's handled here.
src/include/port/win32_port.h:#define lstat(path, sb) stat(path, sb)

> I'm wondering about the restriction on directories only. Why should it not
> work on a file? Can it be easily extended to work on a simple file? If so,
> it could be just "pg_ls".

I think that's a good idea, except it doesn't fit with what the code does:
AllocDir() and ReadDir().  Instead, use pg_stat_file() for that.

Hm, I realized that the existing pg_ls_dir_metadata was skipping links to dirs,
since !ISREG().  So changed to use both stat() and lstat().

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> It seems like the only way to make variable number of arguments is is with
> multiple entries in pg_proc.dat, one for each "number of" arguments.  Is that
> right ?

Another way to do it is to have one entry, put the full set of arguments
into the initial pg_proc.dat data, and then use CREATE OR REPLACE FUNCTION
later during initdb to install some defaults.  See existing cases in
system_views.sql, starting about line 1180.  Neither way is especially
pretty, so take your choice.

            regards, tom lane



Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Tue, Mar 17, 2020 at 02:04:01PM -0500, Justin Pryzby wrote:
> > The example in the documentation could be better indented. Also, ISTM that
> > there are two implicit laterals (format & pg_ls_dir_recurse) that I would
> > make explicit. I'd use the pcs alias explicitely. I'd use meaningful aliases
> > (eg ts instead of b, …).
> 
> > On reflection, I think that a boolean "isdir" column is a bad idea because
> > it is not extensible. I'd propose to switch to the standard "ls" approach of
> > providing the type as one character: '-' for regular, 'd' for directory, 'l'
> > for link, 's' for socket, 'c' for character special…
> 
> I think that's outside the scope of the patch, since I'd want to change
> pg_stat_file; that's where I borrowed "isdir" from, for consistency.
> 
> Note that both LS_DIR_HISTORIC and LS_DIR_MODERN include LS_DIR_SKIP_SPECIAL,
> so only pg_ls_dir itself show specials, so they way to do it would be to 1)
> change pg_stat_file to expose the file's "type", 2) use pg_ls_dir() AS a,
> lateral pg_stat_file(a) AS b, 3) then consider also changing LS_DIR_MODERN and
> all the existing pg_ls_*.

The patch intends to fix the issue of "failing to show failed filesets"
(because dirs are skipped) while also generalizing existing functions (to show
directories and "isdir" column) and providing some more flexible ones (to list
file and metadata of a dir, which is currently possible [only] for "special"
directories, or by recursively calling pg_stat_file).

I'm still of the opinion that supporting arbitrary file types is out of scope,
but I changed the "isdir" to show "type".  I'm only supporting '[-dl]'.  I
don't want to have to check #ifdef S_ISDOOR or whatever other vendors have.  I
insist that it is a separate patch, since it depends on everything else, and I
have no feedback from anybody else as to whether any of that is desired.

template1=# SELECT * FROM pg_ls_waldir();
           name           |   size   |         access         |      modification      |         change         |
creation| type 
 

--------------------------+----------+------------------------+------------------------+------------------------+----------+------
 barr                     |        0 | 2020-03-31 14:43:11-05 | 2020-03-31 14:43:11-05 | 2020-03-31 14:43:11-05 |
  | ?
 
 baz                      |     4096 | 2020-03-31 14:39:18-05 | 2020-03-31 14:39:18-05 | 2020-03-31 14:39:18-05 |
  | d
 
 foo                      |        0 | 2020-03-31 14:39:37-05 | 2020-03-31 14:39:37-05 | 2020-03-31 14:39:37-05 |
  | -
 
 archive_status           |     4096 | 2020-03-31 14:38:20-05 | 2020-03-31 14:38:18-05 | 2020-03-31 14:38:18-05 |
  | d
 
 000000010000000000000001 | 16777216 | 2020-03-31 14:42:53-05 | 2020-03-31 14:43:08-05 | 2020-03-31 14:43:08-05 |
  | -
 
 bar                      |        3 | 2020-03-31 14:39:16-05 | 2020-03-31 14:39:01-05 | 2020-03-31 14:39:01-05 |
  | l
 


Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

About v15, seen as one patch.

Patches serie applies cleanly, compiles, "make check" ok.

Documentation:
  - indent documentation text around 80 cols, as done around?
  - indent SQL example for readability and capitalize keywords
    (pg_ls_dir_metadata)
  - "For each file in a directory, list the file and its metadata."
    maybe: "List files and their metadata in a directory"?

Code:
  - Most pg_ls_*dir* functions call pg_ls_dir_files(), which looks like
    reasonable refactoring, ISTM that the code is actually smaller.
  - please follow pg style, eg not "} else {"
  - there is a "XXX" (meaning fixme?) tag remaining in a comment.
  - file types: why not do block & character devices, fifo and socket
    as well, before the unkown case?
  - I'm wondering whether could pg_stat_file call pg_ls_dir_files without
    too much effort? ISTM that the output structure nearly the same. I do
    not like much having one function specialized for files and one for
    directories.

Tests:
  - good, there are some!
  - indent SQL code, eg by starting a new line on new clauses?
  - put comments on separate lines (I'm not against it on principle, I do
    that, but I do not think that it is done much in test files).

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Sun, Apr 12, 2020 at 01:53:40PM +0200, Fabien COELHO wrote:
> About v15, seen as one patch.

Thanks for looking.

> - I'm wondering whether could pg_stat_file call pg_ls_dir_files without
>   too much effort? ISTM that the output structure nearly the same. I do
>   not like much having one function specialized for files and one for
>   directories.

I refactored but not like that.  As I mentioned in the commit message, I don't
see a good way to make a function operate on a file when the function's primary
data structure is a DIR*.  Do you ?  I don't think it should call stat() and
then conditionally branch off to pg_stat_file().

There are two functions because they wrap two separate syscalls, which see as
good, transparent goal.  If we want a function that does what "ls -al" does,
that would also be a good example to follow, except that we already didn't
follow it.

/bin/ls first stat()s the path, and then either outputs its metadata (if it's a
file or if -d was specified) or lists a dir.  It's essentially a wrapper around
*two* system calls (stat and readdir/getdents).

Maybe we could invent a new pg_ls() which does that, and then refactor existing
code.  Or, maybe it would be a SQL function which calls stat() and then
conditionally calls pg_ls_dir if isdir=True (or type='d').  That would be easy
if we merge the commit which outputs all stat fields.

I'm still hoping for confirmation from a committer if this approach is worth
pursuing:

https://www.postgresql.org/message-id/20200310183037.GA29065%40telsasoft.com
https://www.postgresql.org/message-id/20200313131232.GO29065%40telsasoft.com
|Rather than making "recurse into tmpdir" the end goal:
|
|  - add a function to show metadata of an arbitrary dir;
|  - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not
|    pg_ls_dir).
|  - maybe add pg_ls_dir_recurse, which satisfies the original need;
|  - retire pg_ls_dir (does this work with tuplestore?)
|  - profit

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

> Rebased onto 7b48f1b490978a8abca61e9a9380f8de2a56f266 and renumbered OIDs.

Some feedback about v18, seen as one patch.

Patch applies cleanly & compiles. "make check" is okay.

pg_stat_file() and pg_stat_dir_files() now return a char type, as well as 
the function which call them, but the documentation does not seem to say 
that it is the case.

I must admit that I'm not a fan on the argument management of 
pg_ls_dir_metadata and pg_ls_dir_metadata_1arg and others. I understand 
that it saves a few lines though, so maybe let it be.

There is a comment in pg_ls_dir_files which talks about pg_ls_dir.

Could pg_ls_*dir functions C implementations be dropped in favor of a pure 
SQL implementation, like you did with recurse?

If so, ISTM that pg_ls_dir_files() could be significantly simplified by 
moving its filtering flag to SQL conditions on "type" and others. That 
could allow not to change the existing function output a keep the "isdir" 
column defined as "type = 'd'" where it was used previously, if someone 
complains, but still have the full capability of "ls". That would also 
allow to drop the "*_1arg" hacks. Basically I'm advocating having 1 or 2 
actual C functions, and all other variants managed at the SQL level.

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets (andpg_ls_*)

From
Justin Pryzby
Date:
On Sun, Jun 07, 2020 at 10:07:19AM +0200, Fabien COELHO wrote:
> Hello Justin,
> > Rebased onto 7b48f1b490978a8abca61e9a9380f8de2a56f266 and renumbered OIDs.

Rebased again on whatever broke func.sgml.

> pg_stat_file() and pg_stat_dir_files() now return a char type, as well as
> the function which call them, but the documentation does not seem to say
> that it is the case.

Fixed, thanks

> I must admit that I'm not a fan on the argument management of
> pg_ls_dir_metadata and pg_ls_dir_metadata_1arg and others. I understand that
> it saves a few lines though, so maybe let it be.

I think you're saying that you don't like the _1arg functions, but they're
needed to allow the regression tests to pass:

| * note: this wrapper is necessary to pass the sanity check in opr_sanity,
| * which checks that all built-in functions that share the implementing C
| * function take the same number of arguments

> There is a comment in pg_ls_dir_files which talks about pg_ls_dir.
> 
> Could pg_ls_*dir functions C implementations be dropped in favor of a pure
> SQL implementation, like you did with recurse?

I'm still waiting to hear feedback from a commiter if this is a good idea to
put this into the system catalog.  Right now, ts_debug is the only nontrivial
function.

> If so, ISTM that pg_ls_dir_files() could be significantly simplified by
> moving its filtering flag to SQL conditions on "type" and others. That could
> allow not to change the existing function output a keep the "isdir" column
> defined as "type = 'd'" where it was used previously, if someone complains,
> but still have the full capability of "ls". That would also allow to drop
> the "*_1arg" hacks. Basically I'm advocating having 1 or 2 actual C
> functions, and all other variants managed at the SQL level.

You want to get rid of the 1arg stuff and just have one function.

I see your point, but I guess the C function would still need to accept a
"missing_ok" argument, so we need two functions, so there's not much utility in
getting rid of the "include_dot_dirs" argument, which is there for consistency
with pg_ls_dir.

Conceivably we could 1) get rid of pg_ls_dir, and 2) get rid of the
include_dot_dirs argument and 3) maybe make "missing_ok" a required argument;
and, 4) get rid of the C wrapper functions, and replace with a bunch of stuff
like this:

SELECT name, size, access, modification, change, creation, type='d' AS isdir
FROM pg_ls_dir_metadata('pg_wal') WHERE substring(name,1,1)!='.' AND type!='d';

Where the defaults I changed in this patchset still remain to be discussed:
with or without metadata, hidden files, dotdirs.

As I'm still waiting for committer feedback on the first 10 patches, so not
intending to add more.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Sun, Jun 21, 2020 at 08:53:25PM -0500, Justin Pryzby wrote:
> I'm still waiting to hear feedback from a commiter if this is a good idea to
> put this into the system catalog.  Right now, ts_debug is the only nontrivial
> function.

I'm still missing feedback from committers about the foundation of this
approach.

But I finally looked into the pg_rewind test failure 

That led met to keep the "dir" as a separate column, since that's what's needed
there, and it's more convenient to have a separate column than to provide a
column needing to be parsed.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Tue, Jul 14, 2020 at 10:08:39PM -0500, Justin Pryzby wrote:
> I'm still missing feedback from committers about the foundation of this
> approach.

Now rebased on top of fix for my own bug report (1d09fb1f).

I also changed argument handling for pg_ls_dir_recurse().

Passing '.' gave an initial path of . (of course) but then every other path
begins with './' which I didn't like, since it's ambiguous with empty path, or
.// or ././ ...  And one could pass './' which gives different output (like
././).  

So I specially handled the input of '.'.  Maybe the special value should be
NULL instead of ''.  But it looks no other system functions are currently
non-strict.

For pg_rewind testcase, getting the output path+filename uses a coalesce, since
the rest of the test does stuff like strcmp("pg_wal").

Still waiting for feedback from a committer.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Sat, Jul 18, 2020 at 03:15:32PM -0500, Justin Pryzby wrote:
> Still waiting for feedback from a committer.

This patch has been waiting for input from a committer on the approach I've
taken with the patches since March 10, so I'm planning to set to "Ready" - at
least ready for senior review.

-- 
Justin



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Tue, Sep 08, 2020 at 02:51:26PM -0500, Justin Pryzby wrote:
> On Sat, Jul 18, 2020 at 03:15:32PM -0500, Justin Pryzby wrote:
> > Still waiting for feedback from a committer.
> 
> This patch has been waiting for input from a committer on the approach I've
> taken with the patches since March 10, so I'm planning to set to "Ready" - at
> least ready for senior review.

@cfbot: rebased

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Wed, Oct 28, 2020 at 02:34:02PM -0500, Justin Pryzby wrote:
> On Tue, Sep 08, 2020 at 02:51:26PM -0500, Justin Pryzby wrote:
> > On Sat, Jul 18, 2020 at 03:15:32PM -0500, Justin Pryzby wrote:
> > > Still waiting for feedback from a committer.
> > 
> > This patch has been waiting for input from a committer on the approach I've
> > taken with the patches since March 10, so I'm planning to set to "Ready" - at
> > least ready for senior review.
> 
> @cfbot: rebased

Rebased on e152506adef4bc503ea7b8ebb4fedc0b8eebda81

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
>> This patch has been waiting for input from a committer on the approach I've
>> taken with the patches since March 10, so I'm planning to set to "Ready" - at
>> least ready for senior review.

I took a quick look through this.  This is just MHO, of course:

* I don't think it's okay to change the existing signatures of
pg_ls_logdir() et al.  Even if you can make an argument that it's
not too harmful to add more output columns, replacing pg_stat_file's
isdir output with something of a different name and datatype is most
surely not OK --- there is no possible way that doesn't break existing
user queries.

I think possibly a more acceptable approach is to leave these functions
alone but add documentation explaining how to get the additional info.
You could say things along the lines of "pg_ls_waldir() is the same as
pg_ls_dir_metadata('pg_wal') except for showing fewer columns."

* I'm not very much on board with implementing pg_ls_dir_recurse()
as a SQL function that depends on a WITH RECURSIVE construct.
I do not think that's okay from either performance or security
standpoints.  Surely it can't be hard to build a recursion capability
into the C code?  We could then also debate whether this ought to be
a separate function at all, instead of something you invoke via an
additional boolean flag parameter to pg_ls_dir_metadata().

* I'm fairly unimpressed with the testing approach, because it doesn't
seem like you're getting very much coverage.  It's hard to do better while
still having the totally-fixed output expected by our regular regression
test framework, but to me that just says not to test these functions in
that framework.  I'd consider ripping all of that out in favor of a
TAP test.

While I didn't read the C code in any detail, a couple of things stood
out to me:

* I noticed that you did s/stat/lstat/.  That's fine on Unix systems,
but it won't have any effect on Windows systems (cf bed90759f),
which means that we'll have to document a platform-specific behavioral
difference.  Do we want to go there?  Maybe this patch needs to wait
on somebody fixing our lack of real lstat() on Windows.  (I assume BTW
that this means the WIN32 code in get_file_type() is unreachable.)

* This bit:

+        /* Skip dot dirs? */
+        if (flags & LS_DIR_SKIP_DOT_DIRS &&
+            (strcmp(de->d_name, ".") == 0 ||
+             strcmp(de->d_name, "..") == 0))
+            continue;
+
+        /* Skip hidden files? */
+        if (flags & LS_DIR_SKIP_HIDDEN &&
+            de->d_name[0] == '.')
             continue;

doesn't seem to have thought very carefully about the interaction
of those two flags, ie it seems like LS_DIR_SKIP_HIDDEN effectively
implies LS_DIR_SKIP_DOT_DIRS.  Do we want that?

            regards, tom lane



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> >> This patch has been waiting for input from a committer on the approach I've
> >> taken with the patches since March 10, so I'm planning to set to "Ready" - at
> >> least ready for senior review.
>
> I took a quick look through this.  This is just MHO, of course:
>
> * I don't think it's okay to change the existing signatures of
> pg_ls_logdir() et al.  Even if you can make an argument that it's
> not too harmful to add more output columns, replacing pg_stat_file's
> isdir output with something of a different name and datatype is most
> surely not OK --- there is no possible way that doesn't break existing
> user queries.

I disagree that we need to stress over this- we pretty routinely change
the signature of various catalogs and functions and anyone using these
is already of the understanding that we are free to make such changes
between major versions.  If anything, we should be strongly discouraging
the notion of "don't break user queries" when it comes to administrative
and monitoring functions like these because, otherwise, we end up with
things like the mess that is pg_start/stop_backup() (and just contrast
that to what we did to recovery.conf when thinking about "well, do we
need to 'deprecate' or keep around the old stuff so we don't break
things for users who use these functions?" or the changes made in v10,
neither of which have produced much in the way of complaints).

Let's focus on working towards cleaner APIs and functions, accepting a
break when it makes sense to, which seems to be the case with this patch
(though I agree about using a TAP test suite and about performing the
directory recursion in C instead), and not pull forward cruft that we
then are telling ourselves we have to maintain compatibility of
indefinitely and at the expense of sensible APIs.

Thanks,

Stephen

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I took a quick look through this.  This is just MHO, of course:
>> 
>> * I don't think it's okay to change the existing signatures of
>> pg_ls_logdir() et al.

> I disagree that we need to stress over this- we pretty routinely change
> the signature of various catalogs and functions and anyone using these
> is already of the understanding that we are free to make such changes
> between major versions.

Well, like I said, just MHO.  Anybody else want to weigh in?

I'm mostly concerned about removing the isdir output of pg_stat_file().
Maybe we could compromise to the extent of keeping that, allowing it
to be partially duplicative of a file-type-code output column.

            regards, tom lane



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> I took a quick look through this.  This is just MHO, of course:
> >>
> >> * I don't think it's okay to change the existing signatures of
> >> pg_ls_logdir() et al.
>
> > I disagree that we need to stress over this- we pretty routinely change
> > the signature of various catalogs and functions and anyone using these
> > is already of the understanding that we are free to make such changes
> > between major versions.
>
> Well, like I said, just MHO.  Anybody else want to weigh in?
>
> I'm mostly concerned about removing the isdir output of pg_stat_file().
> Maybe we could compromise to the extent of keeping that, allowing it
> to be partially duplicative of a file-type-code output column.

I don't have any particular issue with keeping isdir as a convenience
column.  I agree it'll now be a bit duplicative but that seems alright.

Thanks,

Stephen

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> >> This patch has been waiting for input from a committer on the approach I've
> >> taken with the patches since March 10, so I'm planning to set to "Ready" - at
> >> least ready for senior review.
> 
> I took a quick look through this.  This is just MHO, of course:
> 
> * I don't think it's okay to change the existing signatures of
> pg_ls_logdir() et al.  Even if you can make an argument that it's
> not too harmful to add more output columns, replacing pg_stat_file's
> isdir output with something of a different name and datatype is most
> surely not OK --- there is no possible way that doesn't break existing
> user queries.
> 
> I think possibly a more acceptable approach is to leave these functions
> alone but add documentation explaining how to get the additional info.
> You could say things along the lines of "pg_ls_waldir() is the same as
> pg_ls_dir_metadata('pg_wal') except for showing fewer columns."
> 
> * I'm not very much on board with implementing pg_ls_dir_recurse()
> as a SQL function that depends on a WITH RECURSIVE construct.
> I do not think that's okay from either performance or security
> standpoints.  Surely it can't be hard to build a recursion capability

Thanks.  WITH RECURSIVE was the "new approach" I took early this year.  Of
course we can recurse in C, now that I know (how) to use the tuplestore.
Working on that patch was how I ran into the "LIMIT 1" SRF bug.

I don't see how security is relevant though, though, since someone can run a
the WITH query directly.  The function just needs to be restricted to
superusers, same as pg_ls_dir().

Anyway, I've re-ordered commits so this the last patch, since earlier commits
don't need to depend on it.  I don't think it's even essential to provide a
recursive function (anyone could write the CTE), so long as we don't hide dirs
and show isdir or type.

I implemented it first as a separate function and then as an optional argument
to pg_ls_dir_files().  If it's implemented as an optional "mode" of an existing
function, there's the constraint that returning a "path" argument has to be
after all other arguments (the ones that are useful without recursion) or else
it messes up other functions (like pg_ls_waldir()) that also call
pg_ls_dir_files().

> doesn't seem to have thought very carefully about the interaction
> of those two flags, ie it seems like LS_DIR_SKIP_HIDDEN effectively
> implies LS_DIR_SKIP_DOT_DIRS.  Do we want that?

Yes it's implied.  Those options exist to support the pre-existing behavior.
pg_ls_dir can optionaly show dotdirs, but pg_ls_*dir skip all hidden files
(which is documented since 8b6d94cf6).  I'm happy to implement something else
if a different behavior is desirable.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
[ v24-0001-Document-historic-behavior-of-links-to-directori.patch ]

The cfbot is unhappy with one of the test cases you added:

6245@@ -259,9 +259,11 @@
6246 select path, filename, type from pg_ls_dir_metadata('.', true, false, true) where
path!~'[0-9]|pg_internal.init|global.tmp'order by 1; 
6247                path               |       filename        | type
6248 ----------------------------------+-----------------------+------
6249+ PG_VERSION                       | PG_VERSION            | -
6250  base                             | base                  | d
6251  base/pgsql_tmp                   | pgsql_tmp             | d
6252  global                           | global                | d
6253+ global/config_exec_params        | config_exec_params    | -
6254  global/pg_control                | pg_control            | -
6255  global/pg_filenode.map           | pg_filenode.map       | -
6256  pg_commit_ts                     | pg_commit_ts          | d
6257@@ -285,7 +287,6 @@
6258  pg_subtrans                      | pg_subtrans           | d
6259  pg_tblspc                        | pg_tblspc             | d
6260  pg_twophase                      | pg_twophase           | d
6261- PG_VERSION                       | PG_VERSION            | -
6262  pg_wal                           | pg_wal                | d
6263  pg_wal/archive_status            | archive_status        | d
6264  pg_xact                          | pg_xact               | d
6265@@ -293,7 +294,7 @@
6266  postgresql.conf                  | postgresql.conf       | -
6267  postmaster.opts                  | postmaster.opts       | -
6268  postmaster.pid                   | postmaster.pid        | -
6269-(34 rows)
6270+(35 rows)

This shows that (a) the test is sensitive to prevailing collation and
(b) it's not filtering out enough temporary files.  Even if those things
were fixed, though, the test would break every time we added/removed
some PGDATA substructure.  Worse, it'd also break if say somebody had
edited postgresql.conf and left an editor backup file behind, or when
running in an installation where the configuration files are someplace
else.  I think this is way too fragile to be acceptable.

Maybe it could be salvaged by reversing the sense of the WHERE condition
so that instead of trying to blacklist stuff, you whitelist just a small
number of files that should certainly be there.

            regards, tom lane



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Fri, Dec 04, 2020 at 12:23:23PM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> [ v24-0001-Document-historic-behavior-of-links-to-directori.patch ]
> 
> The cfbot is unhappy with one of the test cases you added:

> Maybe it could be salvaged by reversing the sense of the WHERE condition
> so that instead of trying to blacklist stuff, you whitelist just a small
> number of files that should certainly be there.

Yes, I had noticed this one.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote:
> * I don't think it's okay to change the existing signatures of
> pg_ls_logdir() et al.  Even if you can make an argument that it's
> not too harmful to add more output columns, replacing pg_stat_file's
> isdir output with something of a different name and datatype is most
> surely not OK --- there is no possible way that doesn't break existing
> user queries.
> 
> I think possibly a more acceptable approach is to leave these functions
> alone but add documentation explaining how to get the additional info.
> You could say things along the lines of "pg_ls_waldir() is the same as
> pg_ls_dir_metadata('pg_wal') except for showing fewer columns."

On Mon, Nov 23, 2020 at 06:06:19PM -0500, Tom Lane wrote:
> I'm mostly concerned about removing the isdir output of pg_stat_file().
> Maybe we could compromise to the extent of keeping that, allowing it
> to be partially duplicative of a file-type-code output column.

On Tue, Nov 24, 2020 at 11:53:22AM -0500, Stephen Frost wrote:
> I don't have any particular issue with keeping isdir as a convenience
> column.  I agree it'll now be a bit duplicative but that seems alright.

Maybe we should do what Tom said, and leave pg_ls_* unchanged, but also mark
them as deprecated in favour of:
| pg_ls_dir_metadata(dir), dir={'pg_wal/archive_status', 'log', 'pg_wal', 'base/pgsql_tmp'}

However, pg_ls_tmpdir is special since it handles tablespace tmpdirs, which it
seems is not trivial to get from sql:

+SELECT * FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(b.oid),'')||suffix, 'base/pgsql_tmp') AS dir
+FROM pg_tablespace b, pg_control_system() pcs,
+LATERAL format('/PG_%s_%s', left(current_setting('server_version_num'), 2), pcs.catalog_version_no) AS suffix) AS
dir,
+LATERAL pg_ls_dir_recurse(dir) AS a;

For context, the line of reasoning that led me to this patch series was
something like this:

0) Why can't I list shared tempfiles (dirs) using pg_ls_tmpdir() ?
1) Implement recursion for pg_ls_tmpdir();
2) Eventually realize that it's silly to implement a function to recurse into
   one particular directory when no general feature exists;
3) Implement generic facility;

> * I noticed that you did s/stat/lstat/.  That's fine on Unix systems,
> but it won't have any effect on Windows systems (cf bed90759f),
> which means that we'll have to document a platform-specific behavioral
> difference.  Do we want to go there?
>
> Maybe this patch needs to wait on somebody fixing our lack of real lstat() on Windows.

I think only the "top" patches depend on lstat (for the "type" column and
recursion, to avoid loops).  The initial patches are independently useful, and
resolve the original issue of hiding tmpdirs.  I've rebased and re-arranged the
patches to reflect this.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Stephen Frost
Date:
Greetings,

* Justin Pryzby (pryzby@telsasoft.com) wrote:
> On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote:
> > * I don't think it's okay to change the existing signatures of
> > pg_ls_logdir() et al.  Even if you can make an argument that it's
> > not too harmful to add more output columns, replacing pg_stat_file's
> > isdir output with something of a different name and datatype is most
> > surely not OK --- there is no possible way that doesn't break existing
> > user queries.
> >
> > I think possibly a more acceptable approach is to leave these functions
> > alone but add documentation explaining how to get the additional info.
> > You could say things along the lines of "pg_ls_waldir() is the same as
> > pg_ls_dir_metadata('pg_wal') except for showing fewer columns."
>
> On Mon, Nov 23, 2020 at 06:06:19PM -0500, Tom Lane wrote:
> > I'm mostly concerned about removing the isdir output of pg_stat_file().
> > Maybe we could compromise to the extent of keeping that, allowing it
> > to be partially duplicative of a file-type-code output column.
>
> On Tue, Nov 24, 2020 at 11:53:22AM -0500, Stephen Frost wrote:
> > I don't have any particular issue with keeping isdir as a convenience
> > column.  I agree it'll now be a bit duplicative but that seems alright.
>
> Maybe we should do what Tom said, and leave pg_ls_* unchanged, but also mark
> them as deprecated in favour of:
> | pg_ls_dir_metadata(dir), dir={'pg_wal/archive_status', 'log', 'pg_wal', 'base/pgsql_tmp'}

Haven't really time to review the patches here in detail right now
(maybe next month), but in general, I dislike marking things as
deprecated.  If we don't want to change them and we're happy to continue
supporting them as-is (which is what 'deprecated' really means), then we
can just do so- nothing stops us from that.  If we don't think the
current API makes sense, for whatever reason, we can just change that-
there's no need for a 'deprecation period', as we already have major
versions and support each major version for 5 years.

I haven't particularly strong feelings one way or the other regarding
these particular functions.  If you asked which way I leaned, I'd say
that I'd rather redefine the functions to make more sense and to be easy
to use for people who would like to use them.  I wouldn't object to new
functions to provide that either though.  I don't think there's all that
much code or that it's changed often enough to be a big burden to keep
both, but that's more feeling than anything based in actual research at
this point.

Thanks,

Stephen

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Wed, Dec 23, 2020 at 01:17:10PM -0600, Justin Pryzby wrote:
> On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote:
> > * I noticed that you did s/stat/lstat/.  That's fine on Unix systems,
> > but it won't have any effect on Windows systems (cf bed90759f),
> > which means that we'll have to document a platform-specific behavioral
> > difference.  Do we want to go there?
> >
> > Maybe this patch needs to wait on somebody fixing our lack of real lstat() on Windows.
> 
> I think only the "top" patches depend on lstat (for the "type" column and
> recursion, to avoid loops).  The initial patches are independently useful, and
> resolve the original issue of hiding tmpdirs.  I've rebased and re-arranged the
> patches to reflect this.

I said that, but then failed to attach the re-arranged patches.
Now I also renumbered OIDs following best practice.

The first handful of patches address the original issue, and I think could be
"ready":

$ git log --oneline origin..pg-ls-dir-new |tac
... Document historic behavior of links to directories..
... Add tests on pg_ls_dir before changing it
... Add pg_ls_dir_metadata to list a dir with file metadata..
... pg_ls_tmpdir to show directories and "isdir" argument..
... pg_ls_*dir to show directories and "isdir" column..

These others are optional:
... pg_ls_logdir to ignore error if initial/top dir is missing..
... pg_ls_*dir to return all the metadata from pg_stat_file..

..and these maybe requires more work for lstat on windows:
... pg_stat_file and pg_ls_dir_* to use lstat()..
... pg_ls_*/pg_stat_file to show file *type*..
... Preserve pg_stat_file() isdir..
... Add recursion option in pg_ls_dir_files..

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Tue, Apr 06, 2021 at 11:01:31AM -0500, Justin Pryzby wrote:
> On Wed, Dec 23, 2020 at 01:17:10PM -0600, Justin Pryzby wrote:
> > On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote:
> > > * I noticed that you did s/stat/lstat/.  That's fine on Unix systems,
> > > but it won't have any effect on Windows systems (cf bed90759f),
> > > which means that we'll have to document a platform-specific behavioral
> > > difference.  Do we want to go there?
> > >
> > > Maybe this patch needs to wait on somebody fixing our lack of real lstat() on Windows.
> > 
> > I think only the "top" patches depend on lstat (for the "type" column and
> > recursion, to avoid loops).  The initial patches are independently useful, and
> > resolve the original issue of hiding tmpdirs.  I've rebased and re-arranged the
> > patches to reflect this.
> 
> I said that, but then failed to attach the re-arranged patches.
> Now I also renumbered OIDs following best practice.
> 
> The first handful of patches address the original issue, and I think could be
> "ready":
> 
> $ git log --oneline origin..pg-ls-dir-new |tac
> ... Document historic behavior of links to directories..
> ... Add tests on pg_ls_dir before changing it
> ... Add pg_ls_dir_metadata to list a dir with file metadata..
> ... pg_ls_tmpdir to show directories and "isdir" argument..
> ... pg_ls_*dir to show directories and "isdir" column..
> 
> These others are optional:
> ... pg_ls_logdir to ignore error if initial/top dir is missing..
> ... pg_ls_*dir to return all the metadata from pg_stat_file..
> 
> ..and these maybe requires more work for lstat on windows:
> ... pg_stat_file and pg_ls_dir_* to use lstat()..
> ... pg_ls_*/pg_stat_file to show file *type*..
> ... Preserve pg_stat_file() isdir..
> ... Add recursion option in pg_ls_dir_files..

@cfbot: rebased

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
"Bossart, Nathan"
Date:
In an attempt to get this patch set off the ground again, I took a
look at the first 5 patches.

0001: This one is a very small documentation update for pg_stat_file
to point out that isdir will be true for symbolic links to
directories.  Given this is true, I think the patch looks good.

0002: This patch adds some very basic testing for pg_ls_dir().  The
only comment that I have for this one is that I might also check
whether '..' is included in the results of the include_dot_dirs tests.
The docs specifically note that include_dot_dirs indicates whether
both '.' and '..' are included, so IMO we might as well verify that.

0003: This one didn't apply cleanly until I used 'git apply -3', so it
likely needs a rebase.  This patch introduces the pg_ls_dir_metadata()
function, which appears to just be pg_ls_dir() with some additional
columns for the size and modification time.  My initial reaction to
this one is that we should just add those columns to pg_ls_dir() to
match all the other pg_ls_* functions (and not bother attempting to
maintain historic behavior for things like hidden and special files).
I believe there is some existing discussion on this point upthread, so
perhaps there is a good reason to make a new function.  In any case, I
like the idea of having pg_ls_dir() use pg_ls_dir_files() internally
like the rest of the pg_ls_* functions.

0004: This one changes pg_ls_tmpdir to show directories as well.  I
think this is a reasonable change.  On it's own, the patch looks
alright, although it might look different if my suggestions for 0003
were followed.

0005: This one adjusts the rest of the pg_ls_* functions to show
directories.  Again, I think this is a reasonable change.  As noted in
0003, I think it'd be alright just to have all the pg_ls_* functions
show special and hidden files as well.  It's simple enough already to
filter our files that start with '.' if necessary, and I'm not sure
there's any strong reason for leaving out special files.  If special
files are included, perhaps isdir should be changed to indicate the
file type instead of just whether it is a directory.  (Reading ahead,
it looks like this is what 0009 might do.)

I haven't looked at the following patches too much, but I'm getting
the idea that they might address a lot of the feedback above and that
the first bunch of patches are more like staging patches that add the
abilities without changing the behavior.  I wonder if just going
straight to the end goal behavior might simplify the patch set a bit.
I can't say I feel too strongly about this, but I figure I'd at least
share my thoughts.

Nathan


Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Mon, Nov 22, 2021 at 07:17:01PM +0000, Bossart, Nathan wrote:
> In an attempt to get this patch set off the ground again, I took a
> look at the first 5 patches.

> I haven't looked at the following patches too much, but I'm getting
> the idea that they might address a lot of the feedback above and that
> the first bunch of patches are more like staging patches that add the
> abilities without changing the behavior.  I wonder if just going
> straight to the end goal behavior might simplify the patch set a bit.
> I can't say I feel too strongly about this, but I figure I'd at least
> share my thoughts.

Thanks for looking.

The patches are separate since the early patches are the most necessary, least
disputable parts, to allow the possibility of (say) chaging pg_ls_tmpdir() without
changing other functions, since pg_ls_tmpdir was was original motivation behind
this whole thread.

In a recent thread, Bharath Rupireddy added pg_ls functions for the logical
dirs, but expressed a preference not to add the metadata columns.  I still
think that at least "isdir" should be added to all the "ls" functions, since
it's easy to SELECT the columns you want, and a bit of a pain to write the
corresponding LATERAL query: 'dir' AS dir, pg_ls_dir(dir) AS ls,
pg_stat_file(ls) AS st.  I think it would be strange if pg_ls_tmpdir() were to
return a different set of columns than the other functions, even though admins
or extensions might have created dirs or other files in those directories.

Tom pointed out that we don't have a working lstat() for windows, so then it
seems like we're not yet ready to show file "types" (we'd show the type of the
link target, which is sometimes what's wanted, but not usually what "ls" would
show), nor ready to implement recurse.

As before:

On Tue, Apr 06, 2021 at 11:01:31AM -0500, Justin Pryzby wrote:
> The first handful of patches address the original issue, and I think could be
> "ready":
> 
> $ git log --oneline origin..pg-ls-dir-new |tac
> ... Document historic behavior of links to directories..
> ... Add tests on pg_ls_dir before changing it
> ... Add pg_ls_dir_metadata to list a dir with file metadata..
> ... pg_ls_tmpdir to show directories and "isdir" argument..
> ... pg_ls_*dir to show directories and "isdir" column..
> 
> These others are optional:
> ... pg_ls_logdir to ignore error if initial/top dir is missing..
> ... pg_ls_*dir to return all the metadata from pg_stat_file..
> 
> ..and these maybe requires more work for lstat on windows:
> ... pg_stat_file and pg_ls_dir_* to use lstat()..
> ... pg_ls_*/pg_stat_file to show file *type*..
> ... Preserve pg_stat_file() isdir..
> ... Add recursion option in pg_ls_dir_files..

rebased on 1922d7c6e1a74178bd2f1d5aa5a6ab921b3fcd34

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

It seems that the v31 patch does not apply anymore:

   postgresql> git apply ~/v31-0001-Document-historic-behavior-of-links-to-directori.patch
   error: patch failed: doc/src/sgml/func.sgml:27410
   error: doc/src/sgml/func.sgml: patch does not apply

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Thu, Dec 23, 2021 at 09:14:18AM -0400, Fabien COELHO wrote:
> It seems that the v31 patch does not apply anymore:
> 
>   postgresql> git apply ~/v31-0001-Document-historic-behavior-of-links-to-directori.patch
>   error: patch failed: doc/src/sgml/func.sgml:27410
>   error: doc/src/sgml/func.sgml: patch does not apply

Thanks for continuing to follow this patch ;)

I fixed a conflict with output/tablespace from d1029bb5a et seq.
I'm not sure why you got a conflict with 0001, though.

I think the 2nd half of the patches are still waiting for fixes to lstat() on
windows.

You complained before that there were too many patches, and I can see how it
might be a pain depending on your workflow.  But the division is deliberate.
Dealing with patchsets is easy for me: I use "mutt" and for each patch
attachment, I type "|git am" (or |git am -3).

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

Happy new year!

> I think the 2nd half of the patches are still waiting for fixes to lstat() on
> windows.

Not your problem?

Here is my review about v32:

All patches apply cleanly.

# part 01

One liner doc improvement to tell that creation time is only available on windows.
It is indeed not available on Linux.

OK.

# part 02

Add tests for various options on pg_ls_dir, and for pg_stat_file, which were not
exercised before.  "make check" is ok.

OK.

# part 03

This patch adds a new pg_ls_dir_metadata.  Internally, this is an extension of
pg_ls_dir_files function which is used by other pg_ls functions.  Doc ok.

About the code:

ISTM that the "if (1) { if (2) continue; } else if(3) { if (4) continue; }" structure"
may be simplified to "if (1 && 2) continue; if (3 && 4) continue;", at least if
IS_DIR and IS_REG are incompatible? Otherwise, at least "else if (3 & 4) continue"?

The ifdef WIN32 (which probably detects windows 64 bits…) overwrites values[3]. ISTM
it could be reordered so that there is no overwrite, and simpler single assignements.

   #ifndef WIN32
     v = ...;
   #else
     v = ... ? ... : ...;
   #endif

New tests are added which check that the result columns are configured as required,
including error cases.

"make check" is ok.

OK.

# part 04

Add a new "isdir" column to "pg_ls_tmpdir" output.  This is a small behavioral
change.  I'm ok with it, however I'm unsure why we would not jump directly to
the "type" char column done later in the patch series.  ISTM all such functions
should be extended the same way for better homogeneity? That would also impact
"waldir", "archive_status", "logical_*", "replslot" variants. "make check" ok.

OK.

# part 05

This patch applies my previous advice:-) ISTM that parts 4 and 5 should be one
single patch. The test changes show that only waldir has a test. Would it be
possible to add minimal tests to other variants as well?  "make check" ok.

I'd consider add such tests with part 02.

OK.

# part 06

This part extends and adds a test for pg_ls_logdir. ISTM that it should
be merged with the previous patches.  "make check" is ok.

OK.

# part 07

This part extends pg_stat_file with more date informations.

ISTM that the documentation should be clear about windows vs unix/cygwin specific
data provided (change/creation).

The code adds a new value_from_stat function to avoid code duplication.
Fine with me.

All pg_ls_*dir functions are impacted. Fine with me.

"make check" is ok.

OK.

# part 08

This part substitutes lstat to stat. Fine with me.  "make check" is ok.
I guess that lstat does something under windows even if the concept of
link is somehow different there. Maybe the doc should say so somewhere?

OK.

# part 09

This part switches the added "isdir" to a "type" column.  "make check" is ok.
This is a definite improvement.

OK.

# part 10

This part adds a redundant "isdir" column. I do not see the point.
"make check" is ok.

NOT OK.

# part 11

This part adds a recurse option. Why not. However, the true value does not
seem to be tested? "make check" is ok.

My opinion is unclear.

Overall, ignoring part 10, this makes a definite improvement to postgres ls
capabilities. I do not seen any reason why not to add those.

-- 
Fabien.

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Fabien COELHO
Date:
> Here is my review about v32:

I forgot to tell that doc generation for the cumulated changes also works.

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Julien Rouhaud
Date:
Hi,

On Sun, Jan 02, 2022 at 02:55:04PM +0100, Fabien COELHO wrote:
> 
> > Here is my review about v32:
> 
> I forgot to tell that doc generation for the cumulated changes also works.

Unfortunately the patchset doesn't apply anymore:

http://cfbot.cputube.org/patch_36_2377.log
=== Applying patches on top of PostgreSQL commit ID 4483b2cf29bfe8091b721756928ccbe31c5c8e14 ===
=== applying patch ./v32-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
[...]
patching file src/test/regress/expected/misc_functions.out
Hunk #1 succeeded at 274 (offset 7 lines).
patching file src/test/regress/expected/tablespace.out
Hunk #1 FAILED at 16.
1 out of 1 hunk FAILED -- saving rejects to file src/test/regress/expected/tablespace.out.rej

Justin, could you send a rebased version?



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Sun, Jan 02, 2022 at 01:07:29PM +0100, Fabien COELHO wrote:
> One liner doc improvement to tell that creation time is only available on windows.
> It is indeed not available on Linux.

The change is about the "isflag" flag, not creation time.

         Returns a record containing the file's size, last access time stamp,
         last modification time stamp, last file status change time stamp (Unix
         platforms only), file creation time stamp (Windows only), and a flag
-        indicating if it is a directory.
+        indicating if it is a directory (or a symbolic link to a directory).

> # part 03
> ISTM that the "if (1) { if (2) continue; } else if(3) { if (4) continue; }" structure"
> may be simplified to "if (1 && 2) continue; if (3 && 4) continue;", at least if
> IS_DIR and IS_REG are incompatible?

No, what you suggested is not the same;

We talked about this before:
https://www.postgresql.org/message-id/20200315212729.GC26184@telsasoft.com

> Otherwise, at least "else if (3 & 4) continue"?

I could write the *final* "else if" like that, but then it would be different
from the previous case.  Which would be confusing and prone to mistakes.

If I wrote it like this, I think it'd just provoke suggestions from someone
else to change it differently:

        /* Skip dirs or special files? */
        if (S_ISDIR(attrib.st_mode) && !(flags & LS_DIR_SKIP_DIRS))
                continue;
        if (!S_ISDIR(attrib.st_mode) && !S_ISREG(attrib.st_mode) && !(flags & LS_DIR_SKIP_SPECIAL)
                continue;

...
<< Why don't you use "else if" instead of "if (a){} if (!a && b){}" >>

I'm going to leave it up to a committer.

> The ifdef WIN32 (which probably detects windows 64 bits…) overwrites values[3]. ISTM
> it could be reordered so that there is no overwrite, and simpler single assignements.
> 
>   #ifndef WIN32
>     v = ...;
>   #else
>     v = ... ? ... : ...;
>   #endif

I changed this but without using nested conditionals.

> Add a new "isdir" column to "pg_ls_tmpdir" output.  This is a small behavioral
> change.  I'm ok with it, however I'm unsure why we would not jump directly to
> the "type" char column done later in the patch series.

Because that depends on lstat().

> ISTM all such functions
> should be extended the same way for better homogeneity? That would also impact
> "waldir", "archive_status", "logical_*", "replslot" variants. "make check" ok.

I agree that makes sense, however others have expressed the opposite opinion.
https://www.postgresql.org/message-id/CALj2ACWtrt5EkHrY4WAZ4Cv42SidXAwpeQJU021bxaKpjmbGfA%40mail.gmail.com

The original motive for the patch was that pg_ls_tmpdir doesn't show shared
filesets.  This fixes that essential problem without immediately dragging
everything else along.  I think it's more likely that a committer would merge
them both.  But I don't know, and it's easy to combine patches if desired.

> This patch applies my previous advice:-) ISTM that parts 4 and 5 should be one
> single patch. The test changes show that only waldir has a test. Would it be
> possible to add minimal tests to other variants as well?  "make check" ok.

I have added tests, although some are duplicative.

> This part extends and adds a test for pg_ls_logdir. ISTM that it should
> be merged with the previous patches.  "make check" is ok.

It's seperate to allow writing a separate commit message since it does
something unrelated to the other patches.  What other patch would it would be
merged with ?
| v32-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch 

> ISTM that the documentation should be clear about windows vs unix/cygwin specific
> data provided (change/creation).

I preferred to refer to pg_stat_file rather than repeat it for all 7 functions
currently in v15, (and future functions added for new, toplevel dirs).

> # part 11
> 
> This part adds a recurse option. Why not. However, the true value does not
> seem to be tested? "make check" is ok.

WDYM the true value?  It's tested like:

+-- Exercise recursion
+select path, filename, type from pg_ls_dir_metadata('.', true, false, true) where
+path in ('base', 'base/pgsql_tmp', 'global', 'global/pg_control', 'global/pg_filenode.map', 'PG_VERSION',
'pg_multixact','pg_multixact/members', 'pg_multixact/offsets', 'pg_wal', 'pg_wal/archive_status')
 
+-- (type='d' or path~'^(global/.*|PG_VERSION|postmaster\.opts|postmaster\.pid|pg_logical/replorigin_checkpoint)$') and
filename!~'[0-9]'
+order by path collate "C", filename collate "C";
+          path          |    filename     | type 
+------------------------+-----------------+------
+ PG_VERSION             | PG_VERSION      | -
+ base                   | base            | d
+ base/pgsql_tmp         | pgsql_tmp       | d
...

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
Rebased over 9e9858389 (Michael may want to look at the tuplestore part?).

Fixing a comment typo.

I also changed pg_ls_dir_recurse() to handle concurrent removal of a dir, which
I noticed caused an infrequent failure on CI.  However I'm not including that
here, since the 2nd half of the patch set seems isn't ready due to lstat() on
windows.

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

I hope to look at it over the week-end.

-- 
Fabien Coelho - CRI, MINES ParisTech



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Fabien COELHO
Date:
Hello Justin,

Review about v34, up from v32 which I reviewed in January. v34 is an 
updated version of v32, without the part about lstat at the end of the 
series.

All 7 patches apply cleanly.

# part 01

One liner doc improvement about the directory flag.

OK.

# part 02

Add tests for various options on pg_ls_dir, and for pg_stat_file, which were not
exercised before.  "make check" is ok.

OK.

# part 03

This patch adds a new pg_ls_dir_metadata.  Internally, this is an extension of
pg_ls_dir_files function which is used by other pg_ls functions.  Doc ok.

New tests are added which check that the result columns are configured as required,
including error cases.

"make check" is ok.

OK.

# part 04

Add a new "isdir" column to "pg_ls_tmpdir" output.  This is a small behavioral
change.

I'm ok with that, however I must say that I'm still unsure why we would 
not jump directly to a "type" char column.  What is wrong with outputing 
'd' or '-' instead of true or false? This way, the interface needs not 
change if "lstat" is used later? ISTM that the interface issue should be 
somehow independent of the implementation issue, and we should choose 
directly the right/best interface?

Independently, the documentation may be clearer about what happens to 
"isdir" when the file is a link? It may say that the behavior may change 
in the future?

About homogeneity, I note that some people may be against adding "isdir" 
to other ls functions. I must say that I cannot see a good argument not to 
do it, and that I hate dealing with systems which are not homogeneous 
because it creates surprises and some loss of time.

"make check" ok.

OK.

# part 05

Add isdir to other ls functions. Doc is updated.

Same as above, I'd prefer a char instead of a bool, as it is more extendable and
future-proof.

OK.

# part 06

This part extends and adds a test for pg_ls_logdir.
"make check" is ok.

OK.

# part 07

This part extends pg_stat_file with more date informations.

"make check" is ok.

OK.

# doc

Overall doc generation is OK.

-- 
Fabien.



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Michael Paquier
Date:
On Wed, Mar 09, 2022 at 10:50:45AM -0600, Justin Pryzby wrote:
> Rebased over 9e9858389 (Michael may want to look at the tuplestore part?).

Are you referring to the contents of 0003 here that changes the
semantics of pg_ls_dir_files() regarding its setup call?
--
Michael

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Sun, Mar 13, 2022 at 09:45:35AM +0900, Michael Paquier wrote:
> On Wed, Mar 09, 2022 at 10:50:45AM -0600, Justin Pryzby wrote:
> > Rebased over 9e9858389 (Michael may want to look at the tuplestore part?).
> 
> Are you referring to the contents of 0003 here that changes the
> semantics of pg_ls_dir_files() regarding its setup call?

Yes, as it has this:

-       SetSingleFuncCall(fcinfo, SRF_SINGLE_USE_EXPECTED);
                                                                                                         
 
...
-       SetSingleFuncCall(fcinfo, 0);
                                                                                                         
 
...
+       if (flags & LS_DIR_METADATA)
                                                                                                         
 
+               SetSingleFuncCall(fcinfo, 0);
                                                                                                         
 
+       else
                                                                                                         
 
+               SetSingleFuncCall(fcinfo, SRF_SINGLE_USE_EXPECTED);
                                                                                                         
 

-- 
Justin



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Michael Paquier
Date:
On Wed, Mar 09, 2022 at 10:50:45AM -0600, Justin Pryzby wrote:
> I also changed pg_ls_dir_recurse() to handle concurrent removal of a dir, which
> I noticed caused an infrequent failure on CI.  However I'm not including that
> here, since the 2nd half of the patch set seems isn't ready due to lstat() on
> windows.

lstat() has been a subject of many issues over the years with our
internal emulation and issues related to its concurrency, but we use
it in various areas of the in-core code, so that does not sound like
an issue to me.  It depends on what you want to do with it in
genfile.c and which data you'd expect, in addition to the detection of
junction points for WIN32, I guess.  v34 has no references to
pg_ls_dir_recurse(), but that's a WITH RECURSIVE, so we would not
really need it, do we?

@@ -27618,7 +27618,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
         Returns a record containing the file's size, last access time stamp,
         last modification time stamp, last file status change time stamp (Unix
         platforms only), file creation time stamp (Windows only), and a flag
-        indicating if it is a directory.
+        indicating if it is a directory (or a symbolic link to a directory).
        </para>
        <para>
         This function is restricted to superusers by default, but other users

This is from 0001, and this addition in the documentation is not
completely right.  As pg_stat_file() uses stat() to get back the
information of a file/directory, we'd just follow the link if
specifying one in the input argument.  We could say instead, if we
were to improve the docs, that "If filename is a link, this function
returns information about the file or directory the link refers to."
I would put that as a different paragraph.

+select * from pg_ls_archive_statusdir() limit 0;
+ name | size | modification
+------+------+--------------
+(0 rows)

FWIW, this one is fine as of ValidateXLOGDirectoryStructure() that
would make sure archive_status exists before any connection is
attempted to the cluster.

> +select * from pg_ls_logdir() limit 0;

This test on pg_ls_logdir() would fail if running installcheck on a
cluster that has logging_collector disabled.  So this cannot be
included.

+select * from pg_ls_logicalmapdir() limit 0;
+select * from pg_ls_logicalsnapdir() limit 0;
+select * from pg_ls_replslotdir('') limit 0;
+select * from pg_ls_tmpdir() limit 0;
+select * from pg_ls_waldir() limit 0;
+select * from pg_stat_file('.') limit 0;

The rest of the patch set should be stable AFAIK, there are various
steps when running a checkpoint that makes sure that any of these
exist, without caring about the value of wal_level.

+       <para>
+        For each file in the specified directory, list the file and its
+        metadata.
+        Restricted to superusers by default, but other users can be granted
+        EXECUTE to run the function.
+       </para></entry>

What is metadata in this case?  (I have read the code and know what
you mean, but folks only looking at the documentation may be puzzled
by that).  It could be cleaner to use the same tupledesc for any
callers of this function, and return NULL in cases these are not
adapted.

+   /* check the optional arguments */
+   if (PG_NARGS() == 3)
+   {
+       if (!PG_ARGISNULL(1))
+       {
+           if (PG_GETARG_BOOL(1))
+               flags |= LS_DIR_MISSING_OK;
+           else
+               flags &= ~LS_DIR_MISSING_OK;
+       }
+
+       if (!PG_ARGISNULL(2))
+       {
+           if (PG_GETARG_BOOL(2))
+               flags &= ~LS_DIR_SKIP_DOT_DIRS;
+           else
+               flags |= LS_DIR_SKIP_DOT_DIRS;
+       }
+   }

The subtle different between the false and true code paths of those
arguments 1 and 2 had better be explained?  The bit-wise operations
are slightly different in both cases, so it is not clear which part
does what, and what's the purpose of this logic.

-   SetSingleFuncCall(fcinfo, 0);
+   /* isdir depends on metadata */
+   Assert(!(flags&LS_DIR_ISDIR) || (flags&LS_DIR_METADATA));
+   /* Unreasonable to show isdir and skip dirs */
+   Assert(!(flags&LS_DIR_ISDIR) || !(flags&LS_DIR_SKIP_DIRS));

Incorrect code format.  Spaces required.

+-- This tests the missing_ok parameter, which causes pg_ls_tmpdir to
succeed even if the tmpdir doesn't exist yet
+-- The name='' condition is never true, so the function runs to
completion but returns zero rows.
+-- The query is written to ERROR if the tablespace doesn't exist,
rather than silently failing to call pg_ls_tmpdir()
+SELECT c.* FROM (SELECT oid FROM pg_tablespace b WHERE
b.spcname='regress_tblspace' UNION SELECT 0 ORDER BY 1 DESC LIMIT 1)
AS b , pg_ls_tmpdir(oid) AS c WHERE c.name='Does not exist';

So, here, we have a test that may not actually test what we want to
test.

Hmm.  I am not convinced that we need a new set of SQL functions as
presented in 0003 (addition of meta-data in pg_ls_dir()), and
extensions of 0004 (do the same but for pg_ls_tmpdir) and 0005 (same
for the other pg_ls* functions).  The changes of pg_ls_dir_files()
make in my opinion the code harder to follow as the resulting tuple
size depends on the type of flag used, and you can already retrieve
the rest of the information with a join, probably LATERAL, on
pg_stat_file(), no?  The same can be said about 0007, actually.

The addition of isdir for any of the paths related to pg_logical/ and
the replication slots has also a limited interest, because we know
already those contents, and these are not directories as far as I
recall.

0006 invokes a behavior change for pg_ls_logdir(), where it makes
sense to me to fail if the directory does not exist, so I am not in
favor of that.

In the whole set, improving the docs as of 0001 makes sense, but the
change is incomplete.  Most of 0002 also makes sense and should be
stable enough.  I am less enthusiastic about any of the other changes
proposed and what we can gain from these parts.
--
Michael

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Michael Paquier
Date:
On Mon, Mar 14, 2022 at 01:53:54PM +0900, Michael Paquier wrote:
> +select * from pg_ls_logicalmapdir() limit 0;
> +select * from pg_ls_logicalsnapdir() limit 0;
> +select * from pg_ls_replslotdir('') limit 0;
> +select * from pg_ls_tmpdir() limit 0;
> +select * from pg_ls_waldir() limit 0;
> +select * from pg_stat_file('.') limit 0;
>
> The rest of the patch set should be stable AFAIK, there are various
> steps when running a checkpoint that makes sure that any of these
> exist, without caring about the value of wal_level.

I was contemplating at 0002 this morning, so see which parts would be
independently useful, and got reminded that we already check the
execution of all those functions in other regression tests, like
test_decoding for the replication slot ones and misc_functions.sql for
the more critical ones, so those extra queries would be just
interesting to check the shape of their SRFs, which is related to the
other patches of the set and limited based on my arguments from
yesterday.

There was one thing that stood out though: there was nothing for the
two options of pg_ls_dir(), called missing_ok and include_dot_dirs.
missing_ok is embedded in one query of pg_rewind, but this is a
counter-measure against concurrent file removals so we cannot rely on
pg_rewind to check that.  And the second option was not run at all.

I have extracted both test cases after rewriting them a bit, and
applied that separately.
--
Michael

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Mon, Mar 14, 2022 at 01:53:54PM +0900, Michael Paquier wrote:
> On Wed, Mar 09, 2022 at 10:50:45AM -0600, Justin Pryzby wrote:
> > I also changed pg_ls_dir_recurse() to handle concurrent removal of a dir, which
> > I noticed caused an infrequent failure on CI.  However I'm not including that
> > here, since the 2nd half of the patch set seems isn't ready due to lstat() on
> > windows.
> 
> lstat() has been a subject of many issues over the years with our
> internal emulation and issues related to its concurrency, but we use
> it in various areas of the in-core code, so that does not sound like
> an issue to me.  It depends on what you want to do with it in
> genfile.c and which data you'd expect, in addition to the detection of
> junction points for WIN32, I guess.

To avoid cycles, a recursion function would need to know whether to recurse
into a directory or to output that something is isdir=false or type=link, and
avoid recursing into it. 

> pg_ls_dir_recurse(), but that's a WITH RECURSIVE, so we would not
> really need it, do we?

Tom disliked it when I had written it as a recursive CTE, so I rewrote it in C.
129225.1606166058@sss.pgh.pa.us

> Hmm.  I am not convinced that we need a new set of SQL functions as
> presented in 0003 (addition of meta-data in pg_ls_dir()), and
> extensions of 0004 (do the same but for pg_ls_tmpdir) and 0005 (same
> for the other pg_ls* functions).  The changes of pg_ls_dir_files()
> make in my opinion the code harder to follow as the resulting tuple 
> size depends on the type of flag used, and you can already retrieve
> the rest of the information with a join, probably LATERAL, on
> pg_stat_file(), no?  The same can be said about 0007, actually.

Yes, one can get the same information with a lateral join (as I said 2 years
ago).  But it's more helpful to provide a function, rather than leave that to
people to figure out - possibly incorrectly, or badly, like by parsing the
output of COPY FROM PROGRAM 'ls -l'.  The query to handle tablespaces is
particularly obscure:
20200310183037.GA29065@telsasoft.com
20201223191710.GR30237@telsasoft.com

One could argue that most of the pg_ls_* functions aren't needed (including
1922d7c6e), since the same things are possible with pg_ls_dir() and
pg_stat_file().
|1922d7c6e Add SQL functions to monitor the directory contents of replication slots

The original, minimal goal of this patch was to show shared tempdirs in
pg_ls_tmpfile() - rather than hiding them misleadingly as currently happens.
20200310183037.GA29065@telsasoft.com
20200313131232.GO29065@telsasoft.com

I added the metadata function 2 years ago since it's silly to show metadata for
tmpdir but not other, arbitrary directories.
20200310183037.GA29065@telsasoft.com
20200313131232.GO29065@telsasoft.com
20201223191710.GR30237@telsasoft.com

> The addition of isdir for any of the paths related to pg_logical/ and
> the replication slots has also a limited interest, because we know
> already those contents, and these are not directories as far as I
> recall.

Except when we don't, since extensions can do things that core doesn't, as
Fabien pointed out.
alpine.DEB.2.21.2001160927390.30419@pseudo

> In the whole set, improving the docs as of 0001 makes sense, but the
> change is incomplete.  Most of 0002 also makes sense and should be
> stable enough.  I am less enthusiastic about any of the other changes
> proposed and what we can gain from these parts.

It is frustrating to hear this feedback now, after the patch has gone through
multiple rewrites over 2 years - based on other positive feedback and review.
I went to the effort to ask, numerous times, whether to write the patch and how
its interfaces should look.  Now, I'm hearing that not only the implementation
but its goals are wrong.  What should I have done to avoid that ?

20200503024215.GJ28974@telsasoft.com
20191227195918.GF12890@telsasoft.com
20200116003924.GJ26045@telsasoft.com
20200908195126.GB18552@telsasoft.com



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Michael Paquier
Date:
On Mon, Mar 14, 2022 at 09:37:25PM -0500, Justin Pryzby wrote:
> One could argue that most of the pg_ls_* functions aren't needed (including
> 1922d7c6e), since the same things are possible with pg_ls_dir() and
> pg_stat_file().
> |1922d7c6e Add SQL functions to monitor the directory contents of replication slots

This main argument behind this one is monitoring, as the execution to
those functions can be granted at a granular level depending on the
roles doing the disk space lookups.
--
Michael

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Andres Freund
Date:
Hi,

On 2022-03-09 10:50:45 -0600, Justin Pryzby wrote:
> Rebased over 9e9858389 (Michael may want to look at the tuplestore part?).

Doesn't apply cleanly anymore: http://cfbot.cputube.org/patch_37_2377.log

Marked as waiting-on-author.

Greetings,

Andres Freund



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Michael Paquier
Date:
On Mon, Mar 21, 2022 at 06:28:28PM -0700, Andres Freund wrote:
> Doesn't apply cleanly anymore: http://cfbot.cputube.org/patch_37_2377.log
>
> Marked as waiting-on-author.

FWIW, per my review the bit of the patch set that I found the most
relevant is the addition of a note in the docs of pg_stat_file() about
the case where "filename" is a link, because the code internally uses
stat().   The function name makes that obvious, but that's not
commonly known, I guess.  Please see the attached, that I would be
fine to apply.
--
Michael

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Michael Paquier
Date:
On Wed, Mar 23, 2022 at 03:17:35PM +0900, Michael Paquier wrote:
> FWIW, per my review the bit of the patch set that I found the most
> relevant is the addition of a note in the docs of pg_stat_file() about
> the case where "filename" is a link, because the code internally uses
> stat().   The function name makes that obvious, but that's not
> commonly known, I guess.  Please see the attached, that I would be
> fine to apply.

Hmm.  I am having second thoughts on this one, as on Windows we rely
on GetFileInformationByHandle() for the emulation of stat() in
win32stat.c, and it looks like this returns some information about the
junction point and not the directory or file this is pointing to, it
seems.  At the end, it looks better to keep things simple here, so
let's drop it.
--
Michael

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Sat, Mar 26, 2022 at 08:23:54PM +0900, Michael Paquier wrote:
> On Wed, Mar 23, 2022 at 03:17:35PM +0900, Michael Paquier wrote:
> > FWIW, per my review the bit of the patch set that I found the most
> > relevant is the addition of a note in the docs of pg_stat_file() about
> > the case where "filename" is a link, because the code internally uses
> > stat().   The function name makes that obvious, but that's not
> > commonly known, I guess.  Please see the attached, that I would be
> > fine to apply.
> 
> Hmm.  I am having second thoughts on this one, as on Windows we rely
> on GetFileInformationByHandle() for the emulation of stat() in
> win32stat.c, and it looks like this returns some information about the
> junction point and not the directory or file this is pointing to, it
> seems.

Where did you find that ?  What metadata does it return about the junction
point ?  We only care about a handful of fields.



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Mon, Mar 14, 2022 at 09:37:25PM -0500, Justin Pryzby wrote:
> The original, minimal goal of this patch was to show shared tempdirs in
> pg_ls_tmpfile() - rather than hiding them misleadingly as currently happens.
> 20200310183037.GA29065@telsasoft.com
> 20200313131232.GO29065@telsasoft.com
> 
> I added the metadata function 2 years ago since it's silly to show metadata for
> tmpdir but not other, arbitrary directories.
> 20200310183037.GA29065@telsasoft.com
> 20200313131232.GO29065@telsasoft.com
> 20201223191710.GR30237@telsasoft.com

I renamed the CF entry to make even more clear the original motive for the
patches (I'm not maintaining the patch to add the metadata function just to
avoid writing a lateral join).

> > In the whole set, improving the docs as of 0001 makes sense, but the
> > change is incomplete.  Most of 0002 also makes sense and should be
> > stable enough.  I am less enthusiastic about any of the other changes
> > proposed and what we can gain from these parts.
> 
> It is frustrating to hear this feedback now, after the patch has gone through
> multiple rewrites over 2 years - based on other positive feedback and review.
> I went to the effort to ask, numerous times, whether to write the patch and how
> its interfaces should look.  Now, I'm hearing that not only the implementation
> but its goals are wrong.  What should I have done to avoid that ?
> 
> 20200503024215.GJ28974@telsasoft.com
> 20191227195918.GF12890@telsasoft.com
> 20200116003924.GJ26045@telsasoft.com
> 20200908195126.GB18552@telsasoft.com

Michael said he's not enthusiastic about the patch.  But I haven't heard a
suggestion about how else to address the issue that pg_ls_tmpdir() hides shared
filesets.

On Mon, Mar 28, 2022 at 09:13:52PM -0500, Justin Pryzby wrote:
> On Sat, Mar 26, 2022 at 08:23:54PM +0900, Michael Paquier wrote:
> > On Wed, Mar 23, 2022 at 03:17:35PM +0900, Michael Paquier wrote:
> > > FWIW, per my review the bit of the patch set that I found the most
> > > relevant is the addition of a note in the docs of pg_stat_file() about
> > > the case where "filename" is a link, because the code internally uses
> > > stat().   The function name makes that obvious, but that's not
> > > commonly known, I guess.  Please see the attached, that I would be
> > > fine to apply.
> > 
> > Hmm.  I am having second thoughts on this one, as on Windows we rely
> > on GetFileInformationByHandle() for the emulation of stat() in
> > win32stat.c, and it looks like this returns some information about the
> > junction point and not the directory or file this is pointing to, it
> > seems.
> 
> Where did you find that ?  What metadata does it return about the junction
> point ?  We only care about a handful of fields.

Pending your feedback, I didn't modify this beyond your original suggestion -
which seemed like a good one.

This also adds some comments you requested and fixes your coding style
complaints, and causes cfbot to test my proposed patch rather than your doc
patch.

-- 
Justin

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Greg Stark
Date:
The cfbot is failing testing this patch. It seems... unlikely given
the nature of the patch modifying an admin function that doesn't even
modify the database that it should be breaking a streaming test.
Perhaps the streaming test is using this function in the testing
scaffolding?

[03:19:29.564] # Failed test 'regression tests pass'
[03:19:29.564] # at t/027_stream_regress.pl line 76.
[03:19:29.564] # got: '256'
[03:19:29.564] # expected: '0'
[03:19:29.564] # Looks like you failed 1 test of 5.
[03:19:29.565] [03:19:27] t/027_stream_regress.pl ..............
[03:19:29.565] Dubious, test returned 1 (wstat 256, 0x100)
[03:19:29.565] Failed 1/5 subtests



Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
This thread has been going for 2.5 years, so here's a(nother) recap.

This omits the patches for recursion, since they're optional and evidently a
distraction from the main patches.

On Fri, Dec 27, 2019 at 11:02:20AM -0600, Justin Pryzby wrote:
> The goal is to somehow show tmpfiles (or at least dirs) used by parallel
> workers.

On Thu, Jan 16, 2020 at 08:38:46AM -0600, Justin Pryzby wrote:
> I think if someone wants the full generality, they can do this:
> 
> postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name,
pg_stat_file(p||'/'||name)s;
>  name | size |      modification      | isdir 
> ------+------+------------------------+-------
>  .foo | 4096 | 2020-01-16 08:57:04-05 | t
> 
> In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
> SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
> WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no)
suffixFROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT
COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix,'base') a FROM pg_tablespace,x)a) SELECT a,
pg_ls_dir(a||'/pgsql_tmp')FROM y WHERE d='pgsql_tmp';
 

On Tue, Mar 10, 2020 at 01:30:37PM -0500, Justin Pryzby wrote:
> I took a step back, and I wondered whether we should add a generic function for
> listing a dir with metadata, possibly instead of changing the existing
> functions.  Then one could do pg_ls_dir_metadata('pg_wal',false,false);
> 
> Since pg8.1, we have pg_ls_dir() to show a list of files.  Since pg10, we've
> had pg_ls_logdir and pg_ls_waldir, which show not only file names but also
> (some) metadata (size, mtime).  And since pg12, we've had pg_ls_tmpfile and
> pg_ls_archive_statusdir, which also show metadata.
> 
> ...but there's no a function which lists the metadata of an directory other
> than tmp, wal, log.
> 
> One can do this:
> |SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c;
> ..but that's not as helpful as allowing:
> |SELECT * FROM pg_ls_dir_metadata('.',true,true);
> 
> There's also no function which recurses into an arbitrary directory, so it
> seems shortsighted to provide a function to recursively list a tmpdir.
> 
> Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can
> write a SQL function to show the dir recursively.  It'd be trivial to plug in
> wal/log/tmp (it seems like tmpdirs of other tablespace's are not entirely
> trivial).
> |SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp');

> It's pretty unfortunate if a function called
> pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that
> (it's new in v12).

On Fri, Mar 13, 2020 at 08:12:32AM -0500, Justin Pryzby wrote:
> The merge conflict presents another opportunity to solicit comments on the new
> approach.  Rather than making "recurse into tmpdir" the end goal:
> 
>   - add a function to show metadata of an arbitrary dir;
>   - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not
>     pg_ls_dir).
>   - maybe add pg_ls_dir_recurse, which satisfies the original need;
>   - retire pg_ls_dir (does this work with tuplestore?)
>   - profit
> 
> The alternative seems to be to go back to Alvaro's earlier proposal:
>  - not only add "isdir", but also recurse;
> 
> I think I would insist on adding a general function to recurse into any dir.
> And *optionally* change ps_ls_* to recurse (either by accepting an argument, or
> by making that a separate patch to debate).

On Tue, Mar 31, 2020 at 03:08:12PM -0500, Justin Pryzby wrote:
> The patch intends to fix the issue of "failing to show failed filesets"
> (because dirs are skipped) while also generalizing existing functions (to show
> directories and "isdir" column) and providing some more flexible ones (to list
> file and metadata of a dir, which is currently possible [only] for "special"
> directories, or by recursively calling pg_stat_file).

On Wed, Dec 23, 2020 at 01:17:10PM -0600, Justin Pryzby wrote:
> However, pg_ls_tmpdir is special since it handles tablespace tmpdirs, which it
> seems is not trivial to get from sql:
> 
> +SELECT * FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(b.oid),'')||suffix, 'base/pgsql_tmp') AS dir
> +FROM pg_tablespace b, pg_control_system() pcs,
> +LATERAL format('/PG_%s_%s', left(current_setting('server_version_num'), 2), pcs.catalog_version_no) AS suffix) AS
dir,
> +LATERAL pg_ls_dir_recurse(dir) AS a;
> 
> For context, the line of reasoning that led me to this patch series was
> something like this:
> 
> 0) Why can't I list shared tempfiles (dirs) using pg_ls_tmpdir() ?
> 1) Implement recursion for pg_ls_tmpdir();
> 2) Eventually realize that it's silly to implement a function to recurse into
>    one particular directory when no general feature exists;
> 3) Implement generic facility;

On Tue, Apr 06, 2021 at 11:01:31AM -0500, Justin Pryzby wrote:
> The first handful of patches address the original issue, and I think could be
> "ready":
> 
> $ git log --oneline origin..pg-ls-dir-new |tac
> ... Document historic behavior of links to directories..
> ... Add tests on pg_ls_dir before changing it
> ... Add pg_ls_dir_metadata to list a dir with file metadata..
> ... pg_ls_tmpdir to show directories and "isdir" argument..
> ... pg_ls_*dir to show directories and "isdir" column..
> 
> These others are optional:
> ... pg_ls_logdir to ignore error if initial/top dir is missing..
> ... pg_ls_*dir to return all the metadata from pg_stat_file..
> 
> ..and these maybe requires more work for lstat on windows:
> ... pg_stat_file and pg_ls_dir_* to use lstat()..
> ... pg_ls_*/pg_stat_file to show file *type*..
> ... Preserve pg_stat_file() isdir..
> ... Add recursion option in pg_ls_dir_files..

On Tue, Jan 25, 2022 at 01:27:55PM -0600, Justin Pryzby wrote:
> The original motive for the patch was that pg_ls_tmpdir doesn't show shared
> filesets.

Attachment

Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

From
Justin Pryzby
Date:
On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote:
> > Actually, I tried using pg_ls_tmpdir(), but it unconditionally masks
> > non-regular files and thus shared filesets.  Maybe that's worth
> > discussion on a new thread ?
> >
> > src/backend/utils/adt/genfile.c
> >                 /* Ignore anything but regular files */
> >                 if (!S_ISREG(attrib.st_mode))
> >                         continue;
> 
> +1, that's worth fixing.

@cfbot: rebased on eddc128be.

Attachment
2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review", but there has been
no activity on this thread for 14+ months.

Since there seems not much interest, I have changed the status to
"Returned with Feedback" [1]. Feel free to propose a stronger use case
for the patch and add an entry for the same.

======
[1] https://commitfest.postgresql.org/46/2377/

Kind Regards,
Peter Smith.