Thread: Re: [PATCH v1] pg_ls_tmpdir to show directories
Re-added -hackers. Thanks for reviewing. On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote: > The implementation simply extends an existing functions with a boolean to > allow for sub-directories. However, the function does not seem to show > subdir contents recursively. Should it be the case? > STM that "//"-comments are not project policy. Sure, but the patch is less important than the design, which needs to be addressed first. The goal is to somehow show tmpfiles (or at least dirs) used by parallel workers. I mentioned a few possible ways, of which this was the simplest to implement. Showing files beneath the dir is probably good, but need to decide how to present it. Should there be a column for the dir (null if not a shared filesets)? Or some other presentation, like a boolean column "is_shared_fileset". > I'm unconvinced by the skipping condition: > > + if (!S_ISREG(attrib.st_mode) && > + (!dir_ok && S_ISDIR(attrib.st_mode))) > continue; > > which is pretty hard to read. ISTM you meant "not A and not (B and C)" > instead? I can write it as two ifs. And, it's probably better to say: if (!ISDIR() || !dir_ok) ..which is same as: !(B && C), as you said. > Catalog update should be a date + number? Maybe this is best left to the > committer? Yes, but I preferred to include it, causing a deliberate conflict, to ensure it's not forgotten. Thanks, Justin
> Re-added -hackers. Indeed, I left it out by accident. I tried to bounce the original mail but it did not work. > Thanks for reviewing. > > On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote: >> The implementation simply extends an existing functions with a boolean to >> allow for sub-directories. However, the function does not seem to show >> subdir contents recursively. Should it be the case? > >> STM that "//"-comments are not project policy. > > Sure, but the patch is less important than the design, which needs to be > addressed first. The goal is to somehow show tmpfiles (or at least dirs) used > by parallel workers. I mentioned a few possible ways, of which this was the > simplest to implement. Showing files beneath the dir is probably good, but > need to decide how to present it. Should there be a column for the dir (null > if not a shared filesets)? Or some other presentation, like a boolean column > "is_shared_fileset". Why not simply showing the files underneath their directories? /path/to/tmp/file1 /path/to/tmp/subdir1/file2 In which case probably showing the directory itself is not useful, and the is_dir column could be dropped? >> I'm unconvinced by the skipping condition: >> >> + if (!S_ISREG(attrib.st_mode) && >> + (!dir_ok && S_ISDIR(attrib.st_mode))) >> continue; >> >> which is pretty hard to read. ISTM you meant "not A and not (B and C)" >> instead? > > I can write it as two ifs. Hmmm. Not sure it would help that much. At least the condition must be right. Also, the comment should be updated. > And, it's probably better to say: > if (!ISDIR() || !dir_ok) I cannot say I like it. dir_ok is cheaper to test so could come first. > ..which is same as: !(B && C), as you said. Ok, so you confirm that the condition was wrong. >> Catalog update should be a date + number? Maybe this is best left to >> the committer? > > Yes, but I preferred to include it, causing a deliberate conflict, to ensure > it's not forgotten. Ok. -- Fabien.
On Fri, Dec 27, 2019 at 06:50:24PM +0100, Fabien COELHO wrote: > >On Fri, Dec 27, 2019 at 05:22:47PM +0100, Fabien COELHO wrote: > >>The implementation simply extends an existing functions with a boolean to > >>allow for sub-directories. However, the function does not seem to show > >>subdir contents recursively. Should it be the case? > > > >>STM that "//"-comments are not project policy. > > > >Sure, but the patch is less important than the design, which needs to be > >addressed first. The goal is to somehow show tmpfiles (or at least dirs) used > >by parallel workers. I mentioned a few possible ways, of which this was the > >simplest to implement. Showing files beneath the dir is probably good, but > >need to decide how to present it. Should there be a column for the dir (null > >if not a shared filesets)? Or some other presentation, like a boolean column > >"is_shared_fileset". > > Why not simply showing the files underneath their directories? > > /path/to/tmp/file1 > /path/to/tmp/subdir1/file2 > > In which case probably showing the directory itself is not useful, > and the is_dir column could be dropped? The names are expected to look like this: $ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls 142977 4 drwxr-x--- 3 postgres postgres 4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp 169868 4 drwxr-x--- 2 postgres postgres 4096 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset 169347 5492 -rw-r----- 1 postgres postgres 5619712 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0 169346 5380 -rw-r----- 1 postgres postgres 5505024 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0 I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2. It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0. Actually the results should be unique, either on filename or (dir,file). "ls" wouldn't list same name twice, unless you list multiple dirs, like: |ls a/b c/d. It's worth thinking if subdir should be a separate column. I'm interested to hear back from others. Justin
Hello Justin, >> Why not simply showing the files underneath their directories? >> >> /path/to/tmp/file1 >> /path/to/tmp/subdir1/file2 >> >> In which case probably showing the directory itself is not useful, >> and the is_dir column could be dropped? > > The names are expected to look like this: > > $ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls > 142977 4 drwxr-x--- 3 postgres postgres 4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp > 169868 4 drwxr-x--- 2 postgres postgres 4096 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset > 169347 5492 -rw-r----- 1 postgres postgres 5619712 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0 > 169346 5380 -rw-r----- 1 postgres postgres 5505024 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0 > > I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2. > It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0. > Actually the results should be unique, either on filename or (dir,file). Ok, so this suggests recursing into subdirs, which requires to make a separate function of the inner loop. > It's worth thinking if subdir should be a separate column. My 0.02 €: I would rather simply keep the full path and just add subdir contents, so that the function output does not change at all. -- Fabien.
On Sat, Dec 28, 2019 at 07:52:55AM +0100, Fabien COELHO wrote: > >>Why not simply showing the files underneath their directories? > >> > >> /path/to/tmp/file1 > >> /path/to/tmp/subdir1/file2 > >> > >>In which case probably showing the directory itself is not useful, > >>and the is_dir column could be dropped? > > > >The names are expected to look like this: > > > >$ sudo find /var/lib/pgsql/12/data/base/pgsql_tmp -ls > >142977 4 drwxr-x--- 3 postgres postgres 4096 Dec 27 13:51 /var/lib/pgsql/12/data/base/pgsql_tmp > >169868 4 drwxr-x--- 2 postgres postgres 4096 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset > >169347 5492 -rw-r----- 1 postgres postgres 5619712 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/0.0 > >169346 5380 -rw-r----- 1 postgres postgres 5505024 Dec 7 01:35 /var/lib/pgsql/12/data/base/pgsql_tmp/pgsql_tmp11025.0.sharedfileset/1.0 > > > >I think we'd have to show sudbdir/file1, subdir/file2, not just file1, file2. > >It doesn't seem useful or nice to show a bunch of files called 0.0 or 1.0. > >Actually the results should be unique, either on filename or (dir,file). > > Ok, so this suggests recursing into subdirs, which requires to make a > separate function of the inner loop. Yea, it suggests that; but, SRF_RETURN_NEXT doesn't make that very easy. It'd need to accept the fcinfo argument, and pg_ls_dir_files would call it once for every tuple to be returned. So it's recursive and saves its state... The attached is pretty ugly, but I can't see how to do better. The alternative seems to be to build up a full list of pathnames in the SRF initial branch, and stat them all later. Or stat them all in the INITIAL case, and keep a list of stat structures to be emited during future calls. BTW, it seems to me this error message should be changed: snprintf(path, sizeof(path), "%s/%s", fctx->location, de->d_name); if (stat(path, &attrib) < 0) ereport(ERROR, (errcode_for_file_access(), - errmsg("could not stat directory \"%s\": %m", dir))); + errmsg("could not stat file \"%s\": %m", path)));
Attachment
Hello Justin, >> Ok, so this suggests recursing into subdirs, which requires to make a >> separate function of the inner loop. > > Yea, it suggests that; but, SRF_RETURN_NEXT doesn't make that very easy. > It'd need to accept the fcinfo argument, and pg_ls_dir_files would call it once > for every tuple to be returned. So it's recursive and saves its state... > > The attached is pretty ugly, but I can't see how to do better. Hmmm… I do agree with pretty ugly:-) If it really neads to be in the structure, why not save the open directory field in the caller and restore it after the recursive call, and pass the rest of the structure as a pointer? Something like: ... root_function(...) { struct status_t status = initialization(); ... call rec_function(&status, path) ... cleanup(); } ... rec_function(struct *status, path) { status->dir = opendir(path); for (dir contents) { if (it_is_a_dir) { /* some comment about why we do that… */ dir_t saved = status->dir; status->dir = NULL; rec_function(status, subpath); status->dir = saved; } } closedir(status->dir), status->dir = NULL; } > The alternative seems to be to build up a full list of pathnames in the SRF > initial branch, and stat them all later. Or stat them all in the INITIAL case, > and keep a list of stat structures to be emited during future calls. -- Fabien.
Here's a version which uses an array of directory_fctx, rather than of DIR and location. That avoids changing the data structure and collatoral implications to pg_ls_dir(). Currently, this *shows* subdirs of subdirs, but doesn't decend into them. So I think maybe toplevel subdirs should be shown, too. And maybe the is_dir flag should be re-introduced (although someone could call pg_stat_file if needed). I'm interested to hear feedback on that, although this patch still isn't great.
Attachment
Hello Justin, About this v4. It applies cleanly. I'm trying to think about how to get rid of the strange structure and hacks, and the arbitrary looking size 2 array. Also the recursion is one step, but I'm not sure why, ISTM it could/should go on always? Maybe the recursive implementation was not such a good idea, if I suggested it is because I did not noticed the "return next" re-entrant stuff, shame on me. Looking at the code, ISTM that relying on a stack/list would be much cleaner and easier to understand. The code could look like: ls_func() if (first_time) initialize description set next directory to visit while (1) if next directory init/push next directory to visit as current read current directory if emty pop/close current directory elif is_a_dir and recursion allowed set next directory to visit else ... return next tuple cleanup The point is to avoid a hack around the directory_fctx array, to have only one place to push/init a new directory (i.e. call AllocateDir and play around with the memory context) instead of two, and to allow deeper recursion if useful. Details : snprintf return is not checked. Maybe it should say why an overflow cannot occur. "bool nulls[3] = { 0,};" -> "bool nulls[3} = { false, false, false };"? -- Fabien.
On Wed, Jan 15, 2020 at 11:21:36AM +0100, Fabien COELHO wrote: > I'm trying to think about how to get rid of the strange structure and hacks, > and the arbitrary looking size 2 array. > > Also the recursion is one step, but I'm not sure why, ISTM it could/should > go on always? Because tmpfiles only go one level deep. > Looking at the code, ISTM that relying on a stack/list would be much cleaner > and easier to understand. The code could look like: I'm willing to change the implementation, but only after there's an agreement about the desired behavior (extra column, one level, etc). Justin
Hello Justin, >> I'm trying to think about how to get rid of the strange structure and hacks, >> and the arbitrary looking size 2 array. >> >> Also the recursion is one step, but I'm not sure why, ISTM it could/should >> go on always? > > Because tmpfiles only go one level deep. I'm not sure it is a general rule. ISTM that extensions can use tmp files, and we would have no control about what they would do there. >> Looking at the code, ISTM that relying on a stack/list would be much cleaner >> and easier to understand. The code could look like: > > I'm willing to change the implementation, but only after there's an agreement > about the desired behavior (extra column, one level, etc). For the level, ISTM that the implementation should not make this assumption. If in practice there is just one level, then the function will not recurse deep, no problem. For the column, I'm not sure that "isdir" is necessary. You could put it implicitely in the file name by ending it with "/", and/or showing the directory contents is enough a hint that there is a directory? Also, I'm not fully sure why ".*" files should be skipped, maybe it should be an option? Or the user can filter it with SQL if it does not want them? -- Fabien.
On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote: > Also, I'm not fully sure why ".*" files should be skipped, maybe it should > be an option? Or the user can filter it with SQL if it does not want them? I think if someone wants the full generality, they can do this: postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s; name | size | modification | isdir ------+------+------------------------+------- .foo | 4096 | 2020-01-16 08:57:04-05 | t In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces: WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffix FROMpg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix,'base') a FROM pg_tablespace,x)a) SELECT a, pg_ls_dir(a||'/pgsql_tmp')FROM y WHERE d='pgsql_tmp'; I think changing dotfiles is topic for another patch. That would also affect pg_ls_dir, and everything else that uses the backing function pg_ls_dir_files_recurse. I'd have to ask why not also show . and .. ? (In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir() to files matching PG_TEMP_FILE_PREFIX). Justin
Hi Fabien, On 1/16/20 9:38 AM, Justin Pryzby wrote: > On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote: >> Also, I'm not fully sure why ".*" files should be skipped, maybe it should >> be an option? Or the user can filter it with SQL if it does not want them? > > I think if someone wants the full generality, they can do this: > > postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s; > name | size | modification | isdir > ------+------+------------------------+------- > .foo | 4096 | 2020-01-16 08:57:04-05 | t > > In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to > SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces: > WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffixFROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix,'base') a FROM pg_tablespace,x)a) SELECT a, pg_ls_dir(a||'/pgsql_tmp')FROM y WHERE d='pgsql_tmp'; > > I think changing dotfiles is topic for another patch. > That would also affect pg_ls_dir, and everything else that uses the backing > function pg_ls_dir_files_recurse. I'd have to ask why not also show . and .. ? > > (In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir() > to files matching PG_TEMP_FILE_PREFIX). We seem to be at an impasse on this patch. What do you think of Justin's comments here? Do you still believe a different implementation is required? Regards, -- -David david@pgmasters.net
On Tue, Mar 03, 2020 at 02:51:54PM -0500, David Steele wrote: > Hi Fabien, > > On 1/16/20 9:38 AM, Justin Pryzby wrote: > >On Thu, Jan 16, 2020 at 09:34:32AM +0100, Fabien COELHO wrote: > >>Also, I'm not fully sure why ".*" files should be skipped, maybe it should > >>be an option? Or the user can filter it with SQL if it does not want them? > > > >I think if someone wants the full generality, they can do this: > > > >postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s; > > name | size | modification | isdir > >------+------+------------------------+------- > > .foo | 4096 | 2020-01-16 08:57:04-05 | t > > > >In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to > >SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces: > >WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffixFROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix,'base') a FROM pg_tablespace,x)a) SELECT a, pg_ls_dir(a||'/pgsql_tmp')FROM y WHERE d='pgsql_tmp'; > > > >I think changing dotfiles is topic for another patch. > >That would also affect pg_ls_dir, and everything else that uses the backing > >function pg_ls_dir_files_recurse. I'd have to ask why not also show . and .. ? > > > >(In fact, if I were to change anything, I would propose to limit pg_ls_tmpdir() > >to files matching PG_TEMP_FILE_PREFIX). > > We seem to be at an impasse on this patch. What do you think of Justin's > comments here? Actually, I found Fabien's comment regarding extensions use of tmp dir to be convincing. And I'm willing to update the patch to use a stack to show arbitrarily-deep files/dirs rather than just one level deep (as used for shared filesets in core postgres). But I don't think it makes sense to go through more implementation/review cycles without some agreement from a larger group regarding the desired/intended interface. Should there be a column for "parent dir" ? Or a column for "is_dir" ? Should dirs be shown at all, or only files ? -- Justin
On 2020-Mar-03, Justin Pryzby wrote: > But I don't think it makes sense to go through more implementation/review > cycles without some agreement from a larger group regarding the > desired/intended interface. Should there be a column for "parent dir" ? Or a > column for "is_dir" ? Should dirs be shown at all, or only files ? IMO: is_dir should be there (and subdirs should be listed), but parent_dir should not appear. Also, the "path" should show the complete pathname, including containing dirs, starting from whatever the "root" is for the operation. So for the example in the initial email, it would look like path isdir pgsql_tmp11025.0.sharedfileset/ t pgsql_tmp11025.0.sharedfileset/0.0 f pgsql_tmp11025.0.sharedfileset/1.0 f plus additional columns, same as pg_ls_waldir et al. I'd rather not have the code assume that there's a single level of subdirs, or assuming that an entry in the subdir cannot itself be a dir; that might end up hiding files for no good reason. I don't understand what purpose is served by having pg_ls_waldir() hide directories. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 03, 2020 at 05:23:13PM -0300, Alvaro Herrera wrote: > On 2020-Mar-03, Justin Pryzby wrote: > > > But I don't think it makes sense to go through more implementation/review > > cycles without some agreement from a larger group regarding the > > desired/intended interface. Should there be a column for "parent dir" ? Or a > > column for "is_dir" ? Should dirs be shown at all, or only files ? > > IMO: is_dir should be there (and subdirs should be listed), but > parent_dir should not appear. Also, the "path" should show the complete > pathname, including containing dirs, starting from whatever the "root" > is for the operation. > > So for the example in the initial email, it would look like > > path isdir > pgsql_tmp11025.0.sharedfileset/ t > pgsql_tmp11025.0.sharedfileset/0.0 f > pgsql_tmp11025.0.sharedfileset/1.0 f > > plus additional columns, same as pg_ls_waldir et al. > > I'd rather not have the code assume that there's a single level of > subdirs, or assuming that an entry in the subdir cannot itself be a dir; > that might end up hiding files for no good reason. > Thanks for your input, see attached. I'm not sure if prefer the 0002 patch alone (which recurses into dirs all at once during the initial call), or 0002+3+4, which incrementally reads the dirs on each call (but requires keeping dirs opened). > I don't understand what purpose is served by having pg_ls_waldir() hide > directories. We could talk about whether the other functions should show dirs, if it's worth breaking their return type. Or if they should show hidden or special files, which doesn't require breaking the return. But until then I am to leave the behavior alone. -- Justin
Attachment
On Thu, Mar 05, 2020 at 10:18:38AM -0600, Justin Pryzby wrote: > I'm not sure if prefer the 0002 patch alone (which recurses into dirs all at > once during the initial call), or 0002+3+4, which incrementally reads the dirs > on each call (but requires keeping dirs opened). I fixed an issue that leading dirs were being shown which should not have been, which was easier in the 0004 patch, so squished. And fixed a bug that "special" files weren't excluded, and "missing_ok" wasn't effective. > > I don't understand what purpose is served by having pg_ls_waldir() hide > > directories. > > We could talk about whether the other functions should show dirs, if it's worth > breaking their return type. Or if they should show hidden or special files, > which doesn't require breaking the return. But until then I am to leave the > behavior alone. I don't see why any of the functions would exclude dirs, but ls_tmpdir deserves to be fixed since core postgres dynamically creates dirs there. Also ... I accidentally changed the behavior: master not only doesn't decend into dirs, it hides them - that was my original complaint. I propose to *also* change at least tmpdir and logdir to show dirs, but don't decend. I left waldir alone for now. Since v12 ls_tmpdir and since v10 logdir and waldir exclude dirs, I think we should backpatch documentation to say so. ISTM pg_ls_tmpdir and ls_logdir should be called with missing_ok=true, since they're not created until they're used. -- Justin
Attachment
- v7-0001-BUG-in-errmsg.patch
- v7-0002-Document-historic-behavior-about-hiding-directori.patch
- v7-0003-Document-historic-behavior-about-hiding-directori.patch
- v7-0004-pg_ls_tmpdir-to-show-directories.patch
- v7-0005-Change-logdir-and-archive_statusdir-to-include-di.patch
- v7-0006-Change-pg_ls_logdir-to-ignore-error-if-initial-to.patch
Hello Justin, Some feedback about the v7 patch set. About v7.1, seems ok. About v7.2 & v7.3 seems ok, altought the two could be merged. About v7.4: The documentation sentences could probably be improved "for for", "used ... used". Maybe: For the temporary directory for <parameter>tablespace</parameter>, ... -> For <parameter>tablespace</parameter> temporary directory, ... Directories are used for temporary files used by parallel processes, and are shown recursively. -> Directories holding temporary files used by parallel processes are shown recursively. It seems that lists are used as FIFO structures by appending, fetching & deleting last, all of which are O(n). ISTM it would be better to use the head of the list by inserting, getting and deleting first, which are O(1). ISTM that several instances of: "pg_ls_dir_files(..., true, false);" should be "pg_ls_dir_files(..., true, DIR_HIDE);". About v7.5 looks like a doc update which should be merged with v7.4. Alas, ISTM that there are no tests on any of these functions:-( -- Fabien.
On Sat, Mar 07, 2020 at 03:14:37PM +0100, Fabien COELHO wrote: > Some feedback about the v7 patch set. Thanks for looking again > About v7.1, seems ok. > > About v7.2 & v7.3 seems ok, altought the two could be merged. These are separate since I proprose that one should be backpatched to v12 and the other to v10. > About v7.4: ... > It seems that lists are used as FIFO structures by appending, fetching & > deleting last, all of which are O(n). ISTM it would be better to use the > head of the list by inserting, getting and deleting first, which are O(1). I think you're referring to linked lists, but pglists are now arrays, for which that's backwards. See 1cff1b95a and d97b714a2. For example, list_delete_last says: * This is the opposite of list_delete_first(), but is noticeably cheaper * with a long list, since no data need be moved. > ISTM that several instances of: "pg_ls_dir_files(..., true, false);" should > be "pg_ls_dir_files(..., true, DIR_HIDE);". Oops, that affects an intermediate commit and maybe due to merge conflict. Thanks. > About v7.5 looks like a doc update which should be merged with v7.4. No, v7.5 updates pg_proc.dat and changes the return type of two functions. It's a short commit since all the infrastructure is implemented to make the functions do whatever we want. But it's deliberately separate since I'm proposing a breaking change, and one that hasn't been discussed until now. > Alas, ISTM that there are no tests on any of these functions:-( Yeah. Everything that includes any output is going to include timestamps; those could be filtered out. waldir is going to have random filenames, and a differing number of rows. But we should exercize pg_ls_dir_files at least once.. My previous version had a bug with ignore_missing with pg_ls_tmpdir, which would've been caught by a test like: SELECT FROM pg_ls_tmpdir() WHERE name='Does not exist'; -- Never true, so the function runs to completion but returns zerorows. The 0006 commit changes that for logdir, too. Without 0006, that will ERROR if the dir doesn't exist (which I think would be the default during regression tests). It'd be nice to run pg_ls_tmpdir before the tmpdir exists, and again afterwards. But I'm having trouble finding a single place to put it. The closest I can find is dbsize.sql. Any ideas ? -- Justin
>> It seems that lists are used as FIFO structures by appending, fetching & >> deleting last, all of which are O(n). ISTM it would be better to use the >> head of the list by inserting, getting and deleting first, which are O(1). > > I think you're referring to linked lists, but pglists are now arrays, Ok… I forgot about this change, so my point is void, you took the right one. -- Fabien.
On Sat, Mar 07, 2020 at 03:14:37PM +0100, Fabien COELHO wrote: > The documentation sentences could probably be improved "for for", "used ... > used". Maybe: > ISTM that several instances of: "pg_ls_dir_files(..., true, false);" should > be "pg_ls_dir_files(..., true, DIR_HIDE);". > Alas, ISTM that there are no tests on any of these functions:-( Addressed these. And reordered the last two commits to demonstrate and exercize the behavior change in regress test. -- Justin
Attachment
- v8-0001-BUG-in-errmsg.patch
- v8-0002-Document-historic-behavior-about-hiding-directori.patch
- v8-0003-Document-historic-behavior-about-hiding-directori.patch
- v8-0004-pg_ls_tmpdir-to-show-directories-recursively.patch
- v8-0005-Change-pg_ls_logdir-to-ignore-error-if-initial-to.patch
- v8-0006-Change-logdir-and-archive_statusdir-to-include-di.patch
Hello Justin, Patch series applies cleanly. The last status compiles and passes "make check". A few more comments: * v8.[123] ok. * v8.4 Avoid using the type name as a field name? "enum dir_action dir_action;" -> "enum dir_action action", or maybe rename "dir_action" enum "dir_action_t". About pg_ls_dir: "if (!fctx->dirdesc)" I do not think that is true even if AllocateDir failed, the list exists anyway. ISTM it should be linitial which is NULL in that case. Given the overlap between pg_ls_dir and pg_ls_dir_files, ISTM that the former should call the slightly extended later with appropriate flags. About populate_paths: function is a little bit strange to me, ISTM it would deserve more comments. I'm not sure the name reflect what it does. For instance, ISTM that it does one thing, but the name is plural. Maybe "move_to_next_path" or "update_current_path" or something? It returns an int which can only be 0 or 1, which smells like an bool. What is this int/bool is not told in the function head comment. I guess it is whether the path was updated. When it returns false, the list length is down to one. Shouldn't AllocateDir be tested for bad result? Maybe it is a dir but you do not have perms to open it? Or give a comment about why it cannot happen? later, good, at least the function is called, even if it is only for an error case. Maybe some non empty coverage tests could be added with a "count(*) > 0" on not is_dir or maybe "count(*) = 0" on is_dir, for instance? (SELECT oid FROM pg_tablespace b WHERE b.spcname='regress_tblspace' UNION SELECT 0 ORDER BY 1 DESC LIMIT 1)b The 'b' glued to the ')' looks pretty strange. I'd suggest ") AS b". Reusing the same alias twice could be avoided for clarity, maybe. * v8.[56] I'd say that a behavior change which adds a column and a possibly a few rows is ok, especially as the tmpdir contains subdirs now. -- Fabien.
I took a step back, and I wondered whether we should add a generic function for listing a dir with metadata, possibly instead of changing the existing functions. Then one could do pg_ls_dir_metadata('pg_wal',false,false); Since pg8.1, we have pg_ls_dir() to show a list of files. Since pg10, we've had pg_ls_logdir and pg_ls_waldir, which show not only file names but also (some) metadata (size, mtime). And since pg12, we've had pg_ls_tmpfile and pg_ls_archive_statusdir, which also show metadata. ...but there's no a function which lists the metadata of an directory other than tmp, wal, log. One can do this: |SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c; ..but that's not as helpful as allowing: |SELECT * FROM pg_ls_dir_metadata('.',true,true); There's also no function which recurses into an arbitrary directory, so it seems shortsighted to provide a function to recursively list a tmpdir. Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can write a SQL function to show the dir recursively. It'd be trivial to plug in wal/log/tmp (it seems like tmpdirs of other tablespace's are not entirely trivial). |SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp'); Also, on a neighboring thread[1], Tom indicated that the pg_ls_* functions should enumerate all files during the initial call, which sounds like a bad idea when recursively showing directories. If we add a function recursing into a directory, we'd need to discuss all the flags to expose to it, like recurse, ignore_errors, one_filesystem?, show_dotfiles (and eventually bikeshed all the rest of the flags in find(1)). My initial patch [2] changed ls_tmpdir to show metadata columns including is_dir, but not decend. It's pretty unfortunate if a function called pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that (it's new in v12). I'm interested to in feedback on the alternative approach, as attached. The final patch to include all the rest of columns shown by pg_stat_file() is more of an idea/proposal and not sure if it'll be desirable. But pg_ls_tmpdir() is essentially the same as my v1 patch. This is intended to be mostly independent of any fix to the WARNING I reported [1]. Since my patch collapses pg_ls_dir into pg_ls_dir_files, we'd only need to fix one place. I'm planning to eventually look into Tom's suggestion of returning tuplestore to fix that, and maybe rebase this patchset on top of that. -- Justin [1] https://www.postgresql.org/message-id/flat/20200308173103.GC1357%40telsasoft.com [2] https://www.postgresql.org/message-id/20191214224735.GA28433%40telsasoft.com
Attachment
- v9-0001-BUG-in-errmsg.patch
- v9-0002-Document-historic-behavior-about-hiding-directori.patch
- v9-0003-Document-historic-behavior-about-hiding-directori.patch
- v9-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-me.patch
- v9-0005-Add-tests-exercizing-pg_ls_tmpdir.patch
- v9-0006-pg_ls_tmpdir-to-show-isdir-argument.patch
- v9-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v9-0008-generalize-pg_ls_dir_files-and-retire-pg_ls_dir.patch
- v9-0009-pg_ls_logdir-to-ignore-error-if-initial-top-dir-i.patch
- v9-0010-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v9-0011-pg_ls_-dir-to-return-all-the-metadata-from-pg_sta.patch
@cfbot: rebased onto 085b6b6679e73b9b386f209b4d625c7bc60597c0 The merge conflict presents another opportunity to solicit comments on the new approach. Rather than making "recurse into tmpdir" the end goal: - add a function to show metadata of an arbitrary dir; - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not pg_ls_dir). - maybe add pg_ls_dir_recurse, which satisfies the original need; - retire pg_ls_dir (does this work with tuplestore?) - profit The alternative seems to be to go back to Alvaro's earlier proposal: - not only add "isdir", but also recurse; I think I would insist on adding a general function to recurse into any dir. And *optionally* change ps_ls_* to recurse (either by accepting an argument, or by making that a separate patch to debate). tuplestore is certainly better than keeping a stack/List of DIRs for this. On Tue, Mar 10, 2020 at 01:30:37PM -0500, Justin Pryzby wrote: > I took a step back, and I wondered whether we should add a generic function for > listing a dir with metadata, possibly instead of changing the existing > functions. Then one could do pg_ls_dir_metadata('pg_wal',false,false); > > Since pg8.1, we have pg_ls_dir() to show a list of files. Since pg10, we've > had pg_ls_logdir and pg_ls_waldir, which show not only file names but also > (some) metadata (size, mtime). And since pg12, we've had pg_ls_tmpfile and > pg_ls_archive_statusdir, which also show metadata. > > ...but there's no a function which lists the metadata of an directory other > than tmp, wal, log. > > One can do this: > |SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c; > ..but that's not as helpful as allowing: > |SELECT * FROM pg_ls_dir_metadata('.',true,true); > > There's also no function which recurses into an arbitrary directory, so it > seems shortsighted to provide a function to recursively list a tmpdir. > > Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can > write a SQL function to show the dir recursively. It'd be trivial to plug in > wal/log/tmp (it seems like tmpdirs of other tablespace's are not entirely > trivial). > |SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp'); > > Also, on a neighboring thread[1], Tom indicated that the pg_ls_* functions > should enumerate all files during the initial call, which sounds like a bad > idea when recursively showing directories. If we add a function recursing into > a directory, we'd need to discuss all the flags to expose to it, like recurse, > ignore_errors, one_filesystem?, show_dotfiles (and eventually bikeshed all the > rest of the flags in find(1)). > > My initial patch [2] changed ls_tmpdir to show metadata columns including > is_dir, but not decend. It's pretty unfortunate if a function called > pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that > (it's new in v12). > > I'm interested to in feedback on the alternative approach, as attached. The > final patch to include all the rest of columns shown by pg_stat_file() is more > of an idea/proposal and not sure if it'll be desirable. But pg_ls_tmpdir() is > essentially the same as my v1 patch. > > This is intended to be mostly independent of any fix to the WARNING I reported > [1]. Since my patch collapses pg_ls_dir into pg_ls_dir_files, we'd only need > to fix one place. I'm planning to eventually look into Tom's suggestion of > returning tuplestore to fix that, and maybe rebase this patchset on top of > that. > > -- > Justin > > [1] https://www.postgresql.org/message-id/flat/20200308173103.GC1357%40telsasoft.com > [2] https://www.postgresql.org/message-id/20191214224735.GA28433%40telsasoft.com
Attachment
- v10-0001-Document-historic-behavior-about-hiding-director.patch
- v10-0002-Document-historic-behavior-about-hiding-director.patch
- v10-0003-Add-tests-exercizing-pg_ls_tmpdir.patch
- v10-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v10-0005-pg_ls_tmpdir-to-show-isdir-argument.patch
- v10-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v10-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v10-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v10-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
Hello Justin, Some feedback on v10: All patches apply cleanly, one on top of the previous. I really wish there would be less than 9 patches… * v10.1 doc change: ok * v10.2 doc change: ok, not sure why it is not merged with previous * v10.3 test add: could be merge with both previous Tests seem a little contrived. I'm wondering whether something more straightforward could be proposed. For instance, once the tablespace is just created but not used yet, probably we do know that the tmp file exists and is empty? * v10.4 at least, some code! Compiles, make check ok. pg_ls_dir_files: I'm fine with the flag approach given the number of switches and the internal nature of the function. I'm not sure of the "FLAG_" prefix which seems too generic, even if it is local. I'd suggest "LS_DIR_*", maybe, as a more specific prefix. ISTM that Pg style requires spaces around operators. I'd suggest some parenthesis would help as well, eg: "flags&FLAG_MISSING_OK" -> "(flags & FLAG_MISSING_OK)" and other instances. About: if (S_ISDIR(attrib.st_mode)) { if (flags&FLAG_SKIP_DIRS) continue; } and similars, why not the simpler: if (S_ISDIR(attrib.st_mode) && (flags & FLAG_SKIP_DIRS)) continue; Especially that it is done like that in previous cases. Maybe I'd create defines for long and common flag specs, eg: #define ..._LS_SIMPLE (FLAG_SKIP_DIRS|FLAG_SKIP_HIDDEN|FLAG_SKIP_SPECIAL|FLAG_METADATA) No attempt at recursing. I'm not sure about these asserts: /* isdir depends on metadata */ Assert(!(flags&FLAG_ISDIR) || (flags&FLAG_METADATA)); Hmmm. Why? /* Unreasonable to show isdir and skip dirs */ Assert(!(flags&FLAG_ISDIR) || !(flags&FLAG_SKIP_DIRS)); Hmmm. Why would I prevent that, even if it has little sense, it should work. I do not see having false on the isdir column as an actual issue. * v10.5 add is_dir column, a few tests & doc. Ok. * v10.6 behavior change for existing functions, always show isdir column, and removal of IS_DIR flag. I'm unsure why the features are removed, some use case may benefit from the more complete function? Maybe flags defs should not be changed anyway? I do not like much the "if (...) /* empty */;" code. Maybe it could be caught more cleanly later in the conditional structure. * v10.7 adds "pg_ls_dir_recurse" function Using sql recurse to possibly to implement the feature is pretty elegant and limits open directories to one at a time, which is pretty neat. Doc looks incomplete and the example is very contrived and badly indented. The function definition does not follow the style around: uppercase whereas all others are lowercase, "" instead of '', no "as"… I do not understand why oid 8511 is given to the new function. I do not understand why UNION ALL and not UNION. I would have put the definition after "pg_ls_dir_metadata" definition. pg_ls_dir_metadata seems defined as (text,bool,bool) but called as (text,bool,bool,bool). Maybe a better alias could be given instead of x? There are no tests for the new function. I'm not sure it would work. * v10.8 change flags & add test on pg_ls_logdir(). I'm unsure why it is done at this stage. * v10.9 change some ls functions and fix patch 10.7 issue I'm unsure why it is done at this stage. "make check" ok. -- Fabien.
On Sun, Mar 15, 2020 at 06:15:02PM +0100, Fabien COELHO wrote: > Some feedback on v10: Thanks for looking. I'm hoping to hear from Alvaro what he thinks of this approach (all functions to show isdir, rather than one function which lists recursively). > All patches apply cleanly, one on top of the previous. I really wish there > would be less than 9 patches… I kept them separate to allow the earlier patches to be applied. And intended to make easier to review, even if it's more work for me.. If you mean that it's a pain to apply 9 patches, I will suggest to use: |git am -3 ./mailbox where ./mailbox is either a copy of the mail you received, or retrieved from the web interface. To test that each one works (compiles, passes tests, etc), I use git rebase -i HEAD~11 and "e"edit the target (set of) patches. > * v10.1 doc change: ok > > * v10.2 doc change: ok, not sure why it is not merged with previous As I mentioned, separate since I'm proposing that they're backpatched to different releases. Those could be applied now (and Tom already applied a patch identical to 0001 in a prior patchset). > * v10.3 test add: could be merge with both previous > Tests seem a little contrived. I'm wondering whether something more > straightforward could be proposed. For instance, once the tablespace is just > created but not used yet, probably we do know that the tmp file exists and > is empty? The tmpdir *doesn't* exist until someone creates tmpfiles there. As it mentions: +-- This tests the missing_ok parameter, which causes pg_ls_tmpdir to succeed even if the tmpdir doesn't exist yet > * v10.4 at least, some code! > I'm not sure of the "FLAG_" prefix which seems too generic, even if it is > local. I'd suggest "LS_DIR_*", maybe, as a more specific prefix. Done. > ISTM that Pg style requires spaces around operators. I'd suggest some > parenthesis would help as well, eg: "flags&FLAG_MISSING_OK" -> "(flags & > FLAG_MISSING_OK)" and other instances. Partially took your suggestion. > About: > > if (S_ISDIR(attrib.st_mode)) { > if (flags&FLAG_SKIP_DIRS) > continue; > } > > and similars, why not the simpler: > > if (S_ISDIR(attrib.st_mode) && (flags & FLAG_SKIP_DIRS)) > continue; That's not the same - if SKIP_DIRS isn't set, your way would fail that test for dirs, and then hit the !ISREG test, and skip them anyway. |else if (!S_ISREG(attrib.st_mode)) | continue > Maybe I'd create defines for long and common flag specs, eg: > > #define ..._LS_SIMPLE (FLAG_SKIP_DIRS|FLAG_SKIP_HIDDEN|FLAG_SKIP_SPECIAL|FLAG_METADATA) Done > I'm not sure about these asserts: > > /* isdir depends on metadata */ > Assert(!(flags&FLAG_ISDIR) || (flags&FLAG_METADATA)); > > Hmmm. Why? It's not supported to show isdir without showing metadata (because that case isn't needed to support the old and the new behaviors). + if (flags & FLAG_METADATA) + { + values[1] = Int64GetDatum((int64) attrib.st_size); + values[2] = TimestampTzGetDatum(time_t_to_timestamptz(attrib.st_mtime)); + if (flags & FLAG_ISDIR) + values[3] = BoolGetDatum(S_ISDIR(attrib.st_mode)); + } > /* Unreasonable to show isdir and skip dirs */ > Assert(!(flags&FLAG_ISDIR) || !(flags&FLAG_SKIP_DIRS)); > > Hmmm. Why would I prevent that, even if it has little sense, it should work. > I do not see having false on the isdir column as an actual issue. It's unimportant, but testing for intended use of flags during development. > * v10.6 behavior change for existing functions, always show isdir column, > and removal of IS_DIR flag. > > I'm unsure why the features are removed, some use case may benefit from the > more complete function? > Maybe flags defs should not be changed anyway? Maybe. I put them back...but it means they're not being exercized by any *used* case. > I do not like much the "if (...) /* empty */;" code. Maybe it could be > caught more cleanly later in the conditional structure. This went away when I put back the SKIP_DIRS flag. > * v10.7 adds "pg_ls_dir_recurse" function > Doc looks incomplete and the example is very contrived and badly indented. Why you think it's contrived? Listing a tmpdir recursively is the initial motivation of this patch. Maybe you think I should list just the tmpdir for one tablespace ? Note that for temp_tablespaces parameter: |When there is more than one name in the list, PostgreSQL chooses a random member of the list each time a temporary objectis to be created; except that within a transaction, successively created temporary objects are placed in successivetablespaces from the list. > The function definition does not follow the style around: uppercase whereas > all others are lowercase, "" instead of '', no "as"… I used "" because of this: | x.name||'/'||a.name I don't know if there's a better way to join paths in SQL, or if that suggests this is a bad way to do it. > I do not understand why oid 8511 is given to the new function. I used: ./src/include/catalog/unused_oids (maybe not correctly). > I do not understand why UNION ALL and not UNION. In general, union ALL can avoid a "distinct" plan node, but it doesn't seem to have any effect here. > I would have put the definition after "pg_ls_dir_metadata" definition. Done > pg_ls_dir_metadata seems defined as (text,bool,bool) but called as > (text,bool,bool,bool). fixed, thanks. > Maybe a better alias could be given instead of x? > > There are no tests for the new function. I'm not sure it would work. I added something which would've caught the issue with number of arguments. > * v10.8 change flags & add test on pg_ls_logdir(). > > I'm unsure why it is done at this stage. I think it makes sense to allow ls_logdir to succeed even if ./log doesn't exist, since it isn't created by initdb or during postmaster start, and since we already using MISSING_OK for tmpdir. But a separate patch since we didn't previous discuss changing logdir. > * v10.9 change some ls functions and fix patch 10.7 issue > I'm unsure why it is done at this stage. "make check" ok. This is the last patch in the series, since I think it's least likely to be agreed on. -- Justin
Attachment
- v11-0001-Document-historic-behavior-about-hiding-director.patch
- v11-0002-Document-historic-behavior-about-hiding-director.patch
- v11-0003-Add-tests-exercizing-pg_ls_tmpdir.patch
- v11-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v11-0005-pg_ls_tmpdir-to-show-isdir-argument.patch
- v11-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v11-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v11-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v11-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
About v11, ISTM that the recursive function should check for symbolic links and possibly avoid them: sh> cd data/base sh> ln -s .. foo psql> SELECT * FROM pg_ls_dir_recurse('.'); ERROR: could not stat file "./base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo": Toomany levels of symbolic links CONTEXT: SQL function "pg_ls_dir_recurse" statement 1 This probably means using lstat instead of (in supplement to?) stat, and probably tell if something is a link, and if so not recurse in them. -- Fabien.
On Mon, Mar 16, 2020 at 04:20:21PM +0100, Fabien COELHO wrote: > > About v11, ISTM that the recursive function should check for symbolic links > and possibly avoid them: > > sh> cd data/base > sh> ln -s .. foo > > psql> SELECT * FROM pg_ls_dir_recurse('.'); > ERROR: could not stat file "./base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo": Toomany levels of symbolic links > CONTEXT: SQL function "pg_ls_dir_recurse" statement 1 > > This probably means using lstat instead of (in supplement to?) stat, and > probably tell if something is a link, and if so not recurse in them. Thanks for looking. I think that opens up a can of worms. I don't want to go into the business of re-implementing all of find(1) - I count ~128 flags (most of which take arguments). You're referring to find -L vs find -P, and some people would want one and some would want another. And don't forget about find -H... pg_stat_file doesn't expose the file type (I guess because it's not portable?), and I think it's outside the scope of this patch to change that. Maybe it suggests that the pg_ls_dir_recurse patch should be excluded. ISTM if someone wants to recursively list a directory, they should avoid putting cycles there, or permission errors, or similar. Or they should write their own C extension that borrows from pg_ls_dir_files but handles more arguments. -- Justin
Hello Justin, >> psql> SELECT * FROM pg_ls_dir_recurse('.'); >> ERROR: could not stat file "./base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo/base/foo": Toomany levels of symbolic links >> CONTEXT: SQL function "pg_ls_dir_recurse" statement 1 >> >> This probably means using lstat instead of (in supplement to?) stat, and >> probably tell if something is a link, and if so not recurse in them. > > Thanks for looking. > > I think that opens up a can of worms. I don't want to go into the business of > re-implementing all of find(1) - I count ~128 flags (most of which take > arguments). You're referring to find -L vs find -P, and some people would want > one and some would want another. And don't forget about find -H... This is not the point. The point is that a link can change a finite tree into cyclic graph, and you do not want to delve into that, ever. The "find" command, by default, does not recurse into a link because of said problem, and the user *must* ask for it and assume the infinite loop if any. So if you implement one behavior, it should be not recursing into links. Franckly, I would not provide the recurse into link alternative, but it could be implemented if someone wants it, and the problem that come with it. > pg_stat_file doesn't expose the file type (I guess because it's not portable?), You are right that Un*x and Windows are not the same wrt link. It seems that there is already something about that in port: "./src/port/dirmod.c:pgwin32_is_junction(const char *path)" So most of the details are already hidden. > and I think it's outside the scope of this patch to change that. Maybe it > suggests that the pg_ls_dir_recurse patch should be excluded. IMHO, I really think that it should be included. Dealing with links is no big deal, but you need an additional column in _metadata to tell it is a link, and there is a ifdef because testing is a little different between unix and windows. I'd guess around 10-20 lines of code added. > ISTM if someone wants to recursively list a directory, they should avoid > putting cycles there, or permission errors, or similar. Hmmm. I'd say the user should like to be able to call the function and never have a bad experience with it such as a failure on an infinite loop. > Or they should write their own C extension that borrows from > pg_ls_dir_files but handles more arguments. ISTM that the point of your patch is to provide the basic tool needed to list directories contents, and handling links somehow is a necessary part of that. -- Fabien.
On Mon, Mar 16, 2020 at 04:20:21PM +0100, Fabien COELHO wrote: > This probably means using lstat instead of (in supplement to?) stat, and > probably tell if something is a link, and if so not recurse in them. On Mon, Mar 16, 2020 at 07:21:06PM +0100, Fabien COELHO wrote: > IMHO, I really think that it should be included. Dealing with links is no > big deal, but you need an additional column in _metadata to tell it is a > link Instead of showing another column, I changed to show links with isdir=false. At the cost of two more patches, to allow backpatching docs and maybe separate commit to make the subtle change obvious in commit history, at least. I see a few places in the backend and a few more in the fronted using the same logic that I used for islink(), but I'm not sure if there's a good place to put that to allow factoring out at least the other backend ones. -- Justin
Attachment
- v12-0001-Document-historic-behavior-about-hiding-director.patch
- v12-0002-Document-historic-behavior-of-links-to-directori.patch
- v12-0003-Document-historic-behavior-about-hiding-director.patch
- v12-0004-Add-tests-exercizing-pg_ls_tmpdir.patch
- v12-0005-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v12-0006-Show-links-to-dirs-with-isdir-false.patch
- v12-0007-pg_ls_tmpdir-to-show-isdir-argument.patch
- v12-0008-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v12-0009-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v12-0010-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v12-0011-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
I pushed 0001 and 0003 (as a single commit). archive_statusdir didn't get here until 12, so your commit message was mistaken. Also, pg10 is slightly different so it didn't apply there, so I left it alone. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 16, 2020 at 07:17:36PM -0300, Alvaro Herrera wrote: > I pushed 0001 and 0003 (as a single commit). archive_statusdir didn't > get here until 12, so your commit message was mistaken. Also, pg10 is > slightly different so it didn't apply there, so I left it alone. Thanks, I appreciate it (and I'm sure Fabien will appreciate having two fewer patches...). @cfbot: rebased onto b4570d33aa045df330bb325ba8a2cbf02266a555 I realized that if I lstat() a file to make sure links to dirs show as isdir=false, it's odd to then show size and timestamps of the dir. So changed to use lstat ... and squished. -- Justin
Attachment
- v13-0001-Document-historic-behavior-of-links-to-directori.patch
- v13-0002-Add-tests-exercising-pg_ls_tmpdir.patch
- v13-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v13-0004-pg_ls_tmpdir-to-show-isdir-argument.patch
- v13-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v13-0006-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v13-0007-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v13-0008-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
About v13, seens as one patch: Function "pg_ls_dir_metadata" documentation suggests a variable number of arguments with brackets, but parameters are really mandatory. postgres=# SELECT pg_ls_dir_metadata('.'); ERROR: function pg_ls_dir_metadata(unknown) does not exist LINE 1: SELECT pg_ls_dir_metadata('.'); ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. postgres=# SELECT pg_ls_dir_metadata('.', true, true); … The example in the documentation could be better indented. Also, ISTM that there are two implicit laterals (format & pg_ls_dir_recurse) that I would make explicit. I'd use the pcs alias explicitely. I'd use meaningful aliases (eg ts instead of b, …). On reflection, I think that a boolean "isdir" column is a bad idea because it is not extensible. I'd propose to switch to the standard "ls" approach of providing the type as one character: '-' for regular, 'd' for directory, 'l' for link, 's' for socket, 'c' for character special… ISTM that "lstat" is not available on windows, which suggests to call "stat" always, and then "lstat" on un*x and pg ports stuff on win. I'm wondering about the restriction on directories only. Why should it not work on a file? Can it be easily extended to work on a simple file? If so, it could be just "pg_ls". -- Fabien.
On Tue, Mar 17, 2020 at 10:21:48AM +0100, Fabien COELHO wrote: > > About v13, seens as one patch: > > Function "pg_ls_dir_metadata" documentation suggests a variable number of > arguments with brackets, but parameters are really mandatory. Fixed, and added tests on 1 and 3 arg versions of both pg_ls_dir() and pg_ls_dir_metadata(). It seems like the only way to make variable number of arguments is is with multiple entries in pg_proc.dat, one for each "number of" arguments. Is that right ? > The example in the documentation could be better indented. Also, ISTM that > there are two implicit laterals (format & pg_ls_dir_recurse) that I would > make explicit. I'd use the pcs alias explicitely. I'd use meaningful aliases > (eg ts instead of b, …). > On reflection, I think that a boolean "isdir" column is a bad idea because > it is not extensible. I'd propose to switch to the standard "ls" approach of > providing the type as one character: '-' for regular, 'd' for directory, 'l' > for link, 's' for socket, 'c' for character special… I think that's outside the scope of the patch, since I'd want to change pg_stat_file; that's where I borrowed "isdir" from, for consistency. Note that both LS_DIR_HISTORIC and LS_DIR_MODERN include LS_DIR_SKIP_SPECIAL, so only pg_ls_dir itself show specials, so they way to do it would be to 1) change pg_stat_file to expose the file's "type", 2) use pg_ls_dir() AS a, lateral pg_stat_file(a) AS b, 3) then consider also changing LS_DIR_MODERN and all the existing pg_ls_*. > ISTM that "lstat" is not available on windows, which suggests to call "stat" > always, and then "lstat" on un*x and pg ports stuff on win. I believe that's handled here. src/include/port/win32_port.h:#define lstat(path, sb) stat(path, sb) > I'm wondering about the restriction on directories only. Why should it not > work on a file? Can it be easily extended to work on a simple file? If so, > it could be just "pg_ls". I think that's a good idea, except it doesn't fit with what the code does: AllocDir() and ReadDir(). Instead, use pg_stat_file() for that. Hm, I realized that the existing pg_ls_dir_metadata was skipping links to dirs, since !ISREG(). So changed to use both stat() and lstat(). -- Justin
Attachment
- v14-0001-Document-historic-behavior-of-links-to-directori.patch
- v14-0002-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v14-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v14-0004-pg_ls_tmpdir-to-show-isdir-argument.patch
- v14-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v14-0006-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v14-0007-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v14-0008-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
Justin Pryzby <pryzby@telsasoft.com> writes: > It seems like the only way to make variable number of arguments is is with > multiple entries in pg_proc.dat, one for each "number of" arguments. Is that > right ? Another way to do it is to have one entry, put the full set of arguments into the initial pg_proc.dat data, and then use CREATE OR REPLACE FUNCTION later during initdb to install some defaults. See existing cases in system_views.sql, starting about line 1180. Neither way is especially pretty, so take your choice. regards, tom lane
On Tue, Mar 17, 2020 at 02:04:01PM -0500, Justin Pryzby wrote: > > The example in the documentation could be better indented. Also, ISTM that > > there are two implicit laterals (format & pg_ls_dir_recurse) that I would > > make explicit. I'd use the pcs alias explicitely. I'd use meaningful aliases > > (eg ts instead of b, …). > > > On reflection, I think that a boolean "isdir" column is a bad idea because > > it is not extensible. I'd propose to switch to the standard "ls" approach of > > providing the type as one character: '-' for regular, 'd' for directory, 'l' > > for link, 's' for socket, 'c' for character special… > > I think that's outside the scope of the patch, since I'd want to change > pg_stat_file; that's where I borrowed "isdir" from, for consistency. > > Note that both LS_DIR_HISTORIC and LS_DIR_MODERN include LS_DIR_SKIP_SPECIAL, > so only pg_ls_dir itself show specials, so they way to do it would be to 1) > change pg_stat_file to expose the file's "type", 2) use pg_ls_dir() AS a, > lateral pg_stat_file(a) AS b, 3) then consider also changing LS_DIR_MODERN and > all the existing pg_ls_*. The patch intends to fix the issue of "failing to show failed filesets" (because dirs are skipped) while also generalizing existing functions (to show directories and "isdir" column) and providing some more flexible ones (to list file and metadata of a dir, which is currently possible [only] for "special" directories, or by recursively calling pg_stat_file). I'm still of the opinion that supporting arbitrary file types is out of scope, but I changed the "isdir" to show "type". I'm only supporting '[-dl]'. I don't want to have to check #ifdef S_ISDOOR or whatever other vendors have. I insist that it is a separate patch, since it depends on everything else, and I have no feedback from anybody else as to whether any of that is desired. template1=# SELECT * FROM pg_ls_waldir(); name | size | access | modification | change | creation| type --------------------------+----------+------------------------+------------------------+------------------------+----------+------ barr | 0 | 2020-03-31 14:43:11-05 | 2020-03-31 14:43:11-05 | 2020-03-31 14:43:11-05 | | ? baz | 4096 | 2020-03-31 14:39:18-05 | 2020-03-31 14:39:18-05 | 2020-03-31 14:39:18-05 | | d foo | 0 | 2020-03-31 14:39:37-05 | 2020-03-31 14:39:37-05 | 2020-03-31 14:39:37-05 | | - archive_status | 4096 | 2020-03-31 14:38:20-05 | 2020-03-31 14:38:18-05 | 2020-03-31 14:38:18-05 | | d 000000010000000000000001 | 16777216 | 2020-03-31 14:42:53-05 | 2020-03-31 14:43:08-05 | 2020-03-31 14:43:08-05 | | - bar | 3 | 2020-03-31 14:39:16-05 | 2020-03-31 14:39:01-05 | 2020-03-31 14:39:01-05 | | l
Attachment
- v15-0001-Document-historic-behavior-of-links-to-directori.patch
- v15-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v15-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v15-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v15-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v15-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v15-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v15-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v15-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v15-0010-pg_ls_-to-show-file-type-and-show-special-files.patch
Hello Justin, About v15, seen as one patch. Patches serie applies cleanly, compiles, "make check" ok. Documentation: - indent documentation text around 80 cols, as done around? - indent SQL example for readability and capitalize keywords (pg_ls_dir_metadata) - "For each file in a directory, list the file and its metadata." maybe: "List files and their metadata in a directory"? Code: - Most pg_ls_*dir* functions call pg_ls_dir_files(), which looks like reasonable refactoring, ISTM that the code is actually smaller. - please follow pg style, eg not "} else {" - there is a "XXX" (meaning fixme?) tag remaining in a comment. - file types: why not do block & character devices, fifo and socket as well, before the unkown case? - I'm wondering whether could pg_stat_file call pg_ls_dir_files without too much effort? ISTM that the output structure nearly the same. I do not like much having one function specialized for files and one for directories. Tests: - good, there are some! - indent SQL code, eg by starting a new line on new clauses? - put comments on separate lines (I'm not against it on principle, I do that, but I do not think that it is done much in test files). -- Fabien.
On Sun, Apr 12, 2020 at 01:53:40PM +0200, Fabien COELHO wrote: > About v15, seen as one patch. Thanks for looking. > - I'm wondering whether could pg_stat_file call pg_ls_dir_files without > too much effort? ISTM that the output structure nearly the same. I do > not like much having one function specialized for files and one for > directories. I refactored but not like that. As I mentioned in the commit message, I don't see a good way to make a function operate on a file when the function's primary data structure is a DIR*. Do you ? I don't think it should call stat() and then conditionally branch off to pg_stat_file(). There are two functions because they wrap two separate syscalls, which see as good, transparent goal. If we want a function that does what "ls -al" does, that would also be a good example to follow, except that we already didn't follow it. /bin/ls first stat()s the path, and then either outputs its metadata (if it's a file or if -d was specified) or lists a dir. It's essentially a wrapper around *two* system calls (stat and readdir/getdents). Maybe we could invent a new pg_ls() which does that, and then refactor existing code. Or, maybe it would be a SQL function which calls stat() and then conditionally calls pg_ls_dir if isdir=True (or type='d'). That would be easy if we merge the commit which outputs all stat fields. I'm still hoping for confirmation from a committer if this approach is worth pursuing: https://www.postgresql.org/message-id/20200310183037.GA29065%40telsasoft.com https://www.postgresql.org/message-id/20200313131232.GO29065%40telsasoft.com |Rather than making "recurse into tmpdir" the end goal: | | - add a function to show metadata of an arbitrary dir; | - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not | pg_ls_dir). | - maybe add pg_ls_dir_recurse, which satisfies the original need; | - retire pg_ls_dir (does this work with tuplestore?) | - profit -- Justin
Attachment
- v16-0001-Document-historic-behavior-of-links-to-directori.patch
- v16-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v16-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v16-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v16-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v16-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v16-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v16-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v16-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v16-0010-pg_ls_-to-show-file-type-and-show-special-files.patch
Rebased onto 1ad23335f36b07f4574906a8dc66a3d62af7c40c
Attachment
- v17-0001-Document-historic-behavior-of-links-to-directori.patch
- v17-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v17-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v17-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v17-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v17-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v17-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v17-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v17-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v17-0010-pg_ls_-to-show-file-type-and-show-special-files.patch
Rebased onto 7b48f1b490978a8abca61e9a9380f8de2a56f266 and renumbered OIDs.
Attachment
- v18-0001-Document-historic-behavior-of-links-to-directori.patch
- v18-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v18-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v18-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v18-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v18-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v18-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v18-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v18-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v18-0010-pg_ls_-to-show-file-type-and-show-special-files.patch
Hello Justin, > Rebased onto 7b48f1b490978a8abca61e9a9380f8de2a56f266 and renumbered OIDs. Some feedback about v18, seen as one patch. Patch applies cleanly & compiles. "make check" is okay. pg_stat_file() and pg_stat_dir_files() now return a char type, as well as the function which call them, but the documentation does not seem to say that it is the case. I must admit that I'm not a fan on the argument management of pg_ls_dir_metadata and pg_ls_dir_metadata_1arg and others. I understand that it saves a few lines though, so maybe let it be. There is a comment in pg_ls_dir_files which talks about pg_ls_dir. Could pg_ls_*dir functions C implementations be dropped in favor of a pure SQL implementation, like you did with recurse? If so, ISTM that pg_ls_dir_files() could be significantly simplified by moving its filtering flag to SQL conditions on "type" and others. That could allow not to change the existing function output a keep the "isdir" column defined as "type = 'd'" where it was used previously, if someone complains, but still have the full capability of "ls". That would also allow to drop the "*_1arg" hacks. Basically I'm advocating having 1 or 2 actual C functions, and all other variants managed at the SQL level. -- Fabien.
On Sun, Jun 07, 2020 at 10:07:19AM +0200, Fabien COELHO wrote: > Hello Justin, > > Rebased onto 7b48f1b490978a8abca61e9a9380f8de2a56f266 and renumbered OIDs. Rebased again on whatever broke func.sgml. > pg_stat_file() and pg_stat_dir_files() now return a char type, as well as > the function which call them, but the documentation does not seem to say > that it is the case. Fixed, thanks > I must admit that I'm not a fan on the argument management of > pg_ls_dir_metadata and pg_ls_dir_metadata_1arg and others. I understand that > it saves a few lines though, so maybe let it be. I think you're saying that you don't like the _1arg functions, but they're needed to allow the regression tests to pass: | * note: this wrapper is necessary to pass the sanity check in opr_sanity, | * which checks that all built-in functions that share the implementing C | * function take the same number of arguments > There is a comment in pg_ls_dir_files which talks about pg_ls_dir. > > Could pg_ls_*dir functions C implementations be dropped in favor of a pure > SQL implementation, like you did with recurse? I'm still waiting to hear feedback from a commiter if this is a good idea to put this into the system catalog. Right now, ts_debug is the only nontrivial function. > If so, ISTM that pg_ls_dir_files() could be significantly simplified by > moving its filtering flag to SQL conditions on "type" and others. That could > allow not to change the existing function output a keep the "isdir" column > defined as "type = 'd'" where it was used previously, if someone complains, > but still have the full capability of "ls". That would also allow to drop > the "*_1arg" hacks. Basically I'm advocating having 1 or 2 actual C > functions, and all other variants managed at the SQL level. You want to get rid of the 1arg stuff and just have one function. I see your point, but I guess the C function would still need to accept a "missing_ok" argument, so we need two functions, so there's not much utility in getting rid of the "include_dot_dirs" argument, which is there for consistency with pg_ls_dir. Conceivably we could 1) get rid of pg_ls_dir, and 2) get rid of the include_dot_dirs argument and 3) maybe make "missing_ok" a required argument; and, 4) get rid of the C wrapper functions, and replace with a bunch of stuff like this: SELECT name, size, access, modification, change, creation, type='d' AS isdir FROM pg_ls_dir_metadata('pg_wal') WHERE substring(name,1,1)!='.' AND type!='d'; Where the defaults I changed in this patchset still remain to be discussed: with or without metadata, hidden files, dotdirs. As I'm still waiting for committer feedback on the first 10 patches, so not intending to add more. -- Justin
Attachment
- v19-0001-Document-historic-behavior-of-links-to-directori.patch
- v19-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v19-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v19-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v19-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v19-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v19-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v19-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v19-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v19-0010-pg_ls_-to-show-file-type-and-show-special-files.patch
On Sun, Jun 21, 2020 at 08:53:25PM -0500, Justin Pryzby wrote: > I'm still waiting to hear feedback from a commiter if this is a good idea to > put this into the system catalog. Right now, ts_debug is the only nontrivial > function. I'm still missing feedback from committers about the foundation of this approach. But I finally looked into the pg_rewind test failure That led met to keep the "dir" as a separate column, since that's what's needed there, and it's more convenient to have a separate column than to provide a column needing to be parsed. -- Justin
Attachment
- v20-0001-Document-historic-behavior-of-links-to-directori.patch
- v20-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v20-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v20-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v20-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v20-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v20-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v20-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v20-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v20-0010-pg_ls_-to-show-file-type-and-show-special-files.patch
On Tue, Jul 14, 2020 at 10:08:39PM -0500, Justin Pryzby wrote: > I'm still missing feedback from committers about the foundation of this > approach. Now rebased on top of fix for my own bug report (1d09fb1f). I also changed argument handling for pg_ls_dir_recurse(). Passing '.' gave an initial path of . (of course) but then every other path begins with './' which I didn't like, since it's ambiguous with empty path, or .// or ././ ... And one could pass './' which gives different output (like ././). So I specially handled the input of '.'. Maybe the special value should be NULL instead of ''. But it looks no other system functions are currently non-strict. For pg_rewind testcase, getting the output path+filename uses a coalesce, since the rest of the test does stuff like strcmp("pg_wal"). Still waiting for feedback from a committer. -- Justin
Attachment
- v21-0001-Document-historic-behavior-of-links-to-directori.patch
- v21-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v21-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v21-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v21-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v21-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v21-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v21-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v21-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v21-0010-pg_ls_-to-show-file-type-and-show-special-files.patch
On Sat, Jul 18, 2020 at 03:15:32PM -0500, Justin Pryzby wrote: > Still waiting for feedback from a committer. This patch has been waiting for input from a committer on the approach I've taken with the patches since March 10, so I'm planning to set to "Ready" - at least ready for senior review. -- Justin
On Tue, Sep 08, 2020 at 02:51:26PM -0500, Justin Pryzby wrote: > On Sat, Jul 18, 2020 at 03:15:32PM -0500, Justin Pryzby wrote: > > Still waiting for feedback from a committer. > > This patch has been waiting for input from a committer on the approach I've > taken with the patches since March 10, so I'm planning to set to "Ready" - at > least ready for senior review. @cfbot: rebased
Attachment
- v22-0001-Document-historic-behavior-of-links-to-directori.patch
- v22-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v22-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v22-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v22-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v22-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v22-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v22-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v22-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v22-0010-pg_ls_-pg_stat_file-to-show-file-type.patch
On Wed, Oct 28, 2020 at 02:34:02PM -0500, Justin Pryzby wrote: > On Tue, Sep 08, 2020 at 02:51:26PM -0500, Justin Pryzby wrote: > > On Sat, Jul 18, 2020 at 03:15:32PM -0500, Justin Pryzby wrote: > > > Still waiting for feedback from a committer. > > > > This patch has been waiting for input from a committer on the approach I've > > taken with the patches since March 10, so I'm planning to set to "Ready" - at > > least ready for senior review. > > @cfbot: rebased Rebased on e152506adef4bc503ea7b8ebb4fedc0b8eebda81 -- Justin
Attachment
- v23-0001-Document-historic-behavior-of-links-to-directori.patch
- v23-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v23-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v23-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v23-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v23-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v23-0007-Add-pg_ls_dir_recurse-to-show-dir-recursively.patch
- v23-0008-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v23-0009-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v23-0010-pg_ls_-pg_stat_file-to-show-file-type.patch
Justin Pryzby <pryzby@telsasoft.com> writes: >> This patch has been waiting for input from a committer on the approach I've >> taken with the patches since March 10, so I'm planning to set to "Ready" - at >> least ready for senior review. I took a quick look through this. This is just MHO, of course: * I don't think it's okay to change the existing signatures of pg_ls_logdir() et al. Even if you can make an argument that it's not too harmful to add more output columns, replacing pg_stat_file's isdir output with something of a different name and datatype is most surely not OK --- there is no possible way that doesn't break existing user queries. I think possibly a more acceptable approach is to leave these functions alone but add documentation explaining how to get the additional info. You could say things along the lines of "pg_ls_waldir() is the same as pg_ls_dir_metadata('pg_wal') except for showing fewer columns." * I'm not very much on board with implementing pg_ls_dir_recurse() as a SQL function that depends on a WITH RECURSIVE construct. I do not think that's okay from either performance or security standpoints. Surely it can't be hard to build a recursion capability into the C code? We could then also debate whether this ought to be a separate function at all, instead of something you invoke via an additional boolean flag parameter to pg_ls_dir_metadata(). * I'm fairly unimpressed with the testing approach, because it doesn't seem like you're getting very much coverage. It's hard to do better while still having the totally-fixed output expected by our regular regression test framework, but to me that just says not to test these functions in that framework. I'd consider ripping all of that out in favor of a TAP test. While I didn't read the C code in any detail, a couple of things stood out to me: * I noticed that you did s/stat/lstat/. That's fine on Unix systems, but it won't have any effect on Windows systems (cf bed90759f), which means that we'll have to document a platform-specific behavioral difference. Do we want to go there? Maybe this patch needs to wait on somebody fixing our lack of real lstat() on Windows. (I assume BTW that this means the WIN32 code in get_file_type() is unreachable.) * This bit: + /* Skip dot dirs? */ + if (flags & LS_DIR_SKIP_DOT_DIRS && + (strcmp(de->d_name, ".") == 0 || + strcmp(de->d_name, "..") == 0)) + continue; + + /* Skip hidden files? */ + if (flags & LS_DIR_SKIP_HIDDEN && + de->d_name[0] == '.') continue; doesn't seem to have thought very carefully about the interaction of those two flags, ie it seems like LS_DIR_SKIP_HIDDEN effectively implies LS_DIR_SKIP_DOT_DIRS. Do we want that? regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: > >> This patch has been waiting for input from a committer on the approach I've > >> taken with the patches since March 10, so I'm planning to set to "Ready" - at > >> least ready for senior review. > > I took a quick look through this. This is just MHO, of course: > > * I don't think it's okay to change the existing signatures of > pg_ls_logdir() et al. Even if you can make an argument that it's > not too harmful to add more output columns, replacing pg_stat_file's > isdir output with something of a different name and datatype is most > surely not OK --- there is no possible way that doesn't break existing > user queries. I disagree that we need to stress over this- we pretty routinely change the signature of various catalogs and functions and anyone using these is already of the understanding that we are free to make such changes between major versions. If anything, we should be strongly discouraging the notion of "don't break user queries" when it comes to administrative and monitoring functions like these because, otherwise, we end up with things like the mess that is pg_start/stop_backup() (and just contrast that to what we did to recovery.conf when thinking about "well, do we need to 'deprecate' or keep around the old stuff so we don't break things for users who use these functions?" or the changes made in v10, neither of which have produced much in the way of complaints). Let's focus on working towards cleaner APIs and functions, accepting a break when it makes sense to, which seems to be the case with this patch (though I agree about using a TAP test suite and about performing the directory recursion in C instead), and not pull forward cruft that we then are telling ourselves we have to maintain compatibility of indefinitely and at the expense of sensible APIs. Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> I took a quick look through this. This is just MHO, of course: >> >> * I don't think it's okay to change the existing signatures of >> pg_ls_logdir() et al. > I disagree that we need to stress over this- we pretty routinely change > the signature of various catalogs and functions and anyone using these > is already of the understanding that we are free to make such changes > between major versions. Well, like I said, just MHO. Anybody else want to weigh in? I'm mostly concerned about removing the isdir output of pg_stat_file(). Maybe we could compromise to the extent of keeping that, allowing it to be partially duplicative of a file-type-code output column. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> I took a quick look through this. This is just MHO, of course: > >> > >> * I don't think it's okay to change the existing signatures of > >> pg_ls_logdir() et al. > > > I disagree that we need to stress over this- we pretty routinely change > > the signature of various catalogs and functions and anyone using these > > is already of the understanding that we are free to make such changes > > between major versions. > > Well, like I said, just MHO. Anybody else want to weigh in? > > I'm mostly concerned about removing the isdir output of pg_stat_file(). > Maybe we could compromise to the extent of keeping that, allowing it > to be partially duplicative of a file-type-code output column. I don't have any particular issue with keeping isdir as a convenience column. I agree it'll now be a bit duplicative but that seems alright. Thanks, Stephen
Attachment
On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: > >> This patch has been waiting for input from a committer on the approach I've > >> taken with the patches since March 10, so I'm planning to set to "Ready" - at > >> least ready for senior review. > > I took a quick look through this. This is just MHO, of course: > > * I don't think it's okay to change the existing signatures of > pg_ls_logdir() et al. Even if you can make an argument that it's > not too harmful to add more output columns, replacing pg_stat_file's > isdir output with something of a different name and datatype is most > surely not OK --- there is no possible way that doesn't break existing > user queries. > > I think possibly a more acceptable approach is to leave these functions > alone but add documentation explaining how to get the additional info. > You could say things along the lines of "pg_ls_waldir() is the same as > pg_ls_dir_metadata('pg_wal') except for showing fewer columns." > > * I'm not very much on board with implementing pg_ls_dir_recurse() > as a SQL function that depends on a WITH RECURSIVE construct. > I do not think that's okay from either performance or security > standpoints. Surely it can't be hard to build a recursion capability Thanks. WITH RECURSIVE was the "new approach" I took early this year. Of course we can recurse in C, now that I know (how) to use the tuplestore. Working on that patch was how I ran into the "LIMIT 1" SRF bug. I don't see how security is relevant though, though, since someone can run a the WITH query directly. The function just needs to be restricted to superusers, same as pg_ls_dir(). Anyway, I've re-ordered commits so this the last patch, since earlier commits don't need to depend on it. I don't think it's even essential to provide a recursive function (anyone could write the CTE), so long as we don't hide dirs and show isdir or type. I implemented it first as a separate function and then as an optional argument to pg_ls_dir_files(). If it's implemented as an optional "mode" of an existing function, there's the constraint that returning a "path" argument has to be after all other arguments (the ones that are useful without recursion) or else it messes up other functions (like pg_ls_waldir()) that also call pg_ls_dir_files(). > doesn't seem to have thought very carefully about the interaction > of those two flags, ie it seems like LS_DIR_SKIP_HIDDEN effectively > implies LS_DIR_SKIP_DOT_DIRS. Do we want that? Yes it's implied. Those options exist to support the pre-existing behavior. pg_ls_dir can optionaly show dotdirs, but pg_ls_*dir skip all hidden files (which is documented since 8b6d94cf6). I'm happy to implement something else if a different behavior is desirable. -- Justin
Attachment
- v24-0001-Document-historic-behavior-of-links-to-directori.patch
- v24-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v24-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v24-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v24-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v24-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v24-0007-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v24-0008-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v24-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v24-0010-Preserve-pg_stat_file-isdir.patch
- v24-0011-Add-recursion-option-in-pg_ls_dir_files.patch
Justin Pryzby <pryzby@telsasoft.com> writes: [ v24-0001-Document-historic-behavior-of-links-to-directori.patch ] The cfbot is unhappy with one of the test cases you added: 6245@@ -259,9 +259,11 @@ 6246 select path, filename, type from pg_ls_dir_metadata('.', true, false, true) where path!~'[0-9]|pg_internal.init|global.tmp'order by 1; 6247 path | filename | type 6248 ----------------------------------+-----------------------+------ 6249+ PG_VERSION | PG_VERSION | - 6250 base | base | d 6251 base/pgsql_tmp | pgsql_tmp | d 6252 global | global | d 6253+ global/config_exec_params | config_exec_params | - 6254 global/pg_control | pg_control | - 6255 global/pg_filenode.map | pg_filenode.map | - 6256 pg_commit_ts | pg_commit_ts | d 6257@@ -285,7 +287,6 @@ 6258 pg_subtrans | pg_subtrans | d 6259 pg_tblspc | pg_tblspc | d 6260 pg_twophase | pg_twophase | d 6261- PG_VERSION | PG_VERSION | - 6262 pg_wal | pg_wal | d 6263 pg_wal/archive_status | archive_status | d 6264 pg_xact | pg_xact | d 6265@@ -293,7 +294,7 @@ 6266 postgresql.conf | postgresql.conf | - 6267 postmaster.opts | postmaster.opts | - 6268 postmaster.pid | postmaster.pid | - 6269-(34 rows) 6270+(35 rows) This shows that (a) the test is sensitive to prevailing collation and (b) it's not filtering out enough temporary files. Even if those things were fixed, though, the test would break every time we added/removed some PGDATA substructure. Worse, it'd also break if say somebody had edited postgresql.conf and left an editor backup file behind, or when running in an installation where the configuration files are someplace else. I think this is way too fragile to be acceptable. Maybe it could be salvaged by reversing the sense of the WHERE condition so that instead of trying to blacklist stuff, you whitelist just a small number of files that should certainly be there. regards, tom lane
On Fri, Dec 04, 2020 at 12:23:23PM -0500, Tom Lane wrote: > Justin Pryzby <pryzby@telsasoft.com> writes: > [ v24-0001-Document-historic-behavior-of-links-to-directori.patch ] > > The cfbot is unhappy with one of the test cases you added: > Maybe it could be salvaged by reversing the sense of the WHERE condition > so that instead of trying to blacklist stuff, you whitelist just a small > number of files that should certainly be there. Yes, I had noticed this one. -- Justin
Attachment
- v25-0001-Document-historic-behavior-of-links-to-directori.patch
- v25-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v25-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v25-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v25-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v25-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v25-0007-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v25-0008-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v25-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v25-0010-Preserve-pg_stat_file-isdir.patch
- v25-0011-Add-recursion-option-in-pg_ls_dir_files.patch
On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote: > * I don't think it's okay to change the existing signatures of > pg_ls_logdir() et al. Even if you can make an argument that it's > not too harmful to add more output columns, replacing pg_stat_file's > isdir output with something of a different name and datatype is most > surely not OK --- there is no possible way that doesn't break existing > user queries. > > I think possibly a more acceptable approach is to leave these functions > alone but add documentation explaining how to get the additional info. > You could say things along the lines of "pg_ls_waldir() is the same as > pg_ls_dir_metadata('pg_wal') except for showing fewer columns." On Mon, Nov 23, 2020 at 06:06:19PM -0500, Tom Lane wrote: > I'm mostly concerned about removing the isdir output of pg_stat_file(). > Maybe we could compromise to the extent of keeping that, allowing it > to be partially duplicative of a file-type-code output column. On Tue, Nov 24, 2020 at 11:53:22AM -0500, Stephen Frost wrote: > I don't have any particular issue with keeping isdir as a convenience > column. I agree it'll now be a bit duplicative but that seems alright. Maybe we should do what Tom said, and leave pg_ls_* unchanged, but also mark them as deprecated in favour of: | pg_ls_dir_metadata(dir), dir={'pg_wal/archive_status', 'log', 'pg_wal', 'base/pgsql_tmp'} However, pg_ls_tmpdir is special since it handles tablespace tmpdirs, which it seems is not trivial to get from sql: +SELECT * FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(b.oid),'')||suffix, 'base/pgsql_tmp') AS dir +FROM pg_tablespace b, pg_control_system() pcs, +LATERAL format('/PG_%s_%s', left(current_setting('server_version_num'), 2), pcs.catalog_version_no) AS suffix) AS dir, +LATERAL pg_ls_dir_recurse(dir) AS a; For context, the line of reasoning that led me to this patch series was something like this: 0) Why can't I list shared tempfiles (dirs) using pg_ls_tmpdir() ? 1) Implement recursion for pg_ls_tmpdir(); 2) Eventually realize that it's silly to implement a function to recurse into one particular directory when no general feature exists; 3) Implement generic facility; > * I noticed that you did s/stat/lstat/. That's fine on Unix systems, > but it won't have any effect on Windows systems (cf bed90759f), > which means that we'll have to document a platform-specific behavioral > difference. Do we want to go there? > > Maybe this patch needs to wait on somebody fixing our lack of real lstat() on Windows. I think only the "top" patches depend on lstat (for the "type" column and recursion, to avoid loops). The initial patches are independently useful, and resolve the original issue of hiding tmpdirs. I've rebased and re-arranged the patches to reflect this. -- Justin
Attachment
- v26-0001-Document-historic-behavior-of-links-to-directori.patch
- v26-0002-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v26-0003-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v26-0004-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v26-0005-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v26-0006-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v26-0007-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v26-0008-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v26-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v26-0010-Preserve-pg_stat_file-isdir.patch
- v26-0011-Add-recursion-option-in-pg_ls_dir_files.patch
Greetings, * Justin Pryzby (pryzby@telsasoft.com) wrote: > On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote: > > * I don't think it's okay to change the existing signatures of > > pg_ls_logdir() et al. Even if you can make an argument that it's > > not too harmful to add more output columns, replacing pg_stat_file's > > isdir output with something of a different name and datatype is most > > surely not OK --- there is no possible way that doesn't break existing > > user queries. > > > > I think possibly a more acceptable approach is to leave these functions > > alone but add documentation explaining how to get the additional info. > > You could say things along the lines of "pg_ls_waldir() is the same as > > pg_ls_dir_metadata('pg_wal') except for showing fewer columns." > > On Mon, Nov 23, 2020 at 06:06:19PM -0500, Tom Lane wrote: > > I'm mostly concerned about removing the isdir output of pg_stat_file(). > > Maybe we could compromise to the extent of keeping that, allowing it > > to be partially duplicative of a file-type-code output column. > > On Tue, Nov 24, 2020 at 11:53:22AM -0500, Stephen Frost wrote: > > I don't have any particular issue with keeping isdir as a convenience > > column. I agree it'll now be a bit duplicative but that seems alright. > > Maybe we should do what Tom said, and leave pg_ls_* unchanged, but also mark > them as deprecated in favour of: > | pg_ls_dir_metadata(dir), dir={'pg_wal/archive_status', 'log', 'pg_wal', 'base/pgsql_tmp'} Haven't really time to review the patches here in detail right now (maybe next month), but in general, I dislike marking things as deprecated. If we don't want to change them and we're happy to continue supporting them as-is (which is what 'deprecated' really means), then we can just do so- nothing stops us from that. If we don't think the current API makes sense, for whatever reason, we can just change that- there's no need for a 'deprecation period', as we already have major versions and support each major version for 5 years. I haven't particularly strong feelings one way or the other regarding these particular functions. If you asked which way I leaned, I'd say that I'd rather redefine the functions to make more sense and to be easy to use for people who would like to use them. I wouldn't object to new functions to provide that either though. I don't think there's all that much code or that it's changed often enough to be a big burden to keep both, but that's more feeling than anything based in actual research at this point. Thanks, Stephen
Attachment
On Wed, Dec 23, 2020 at 01:17:10PM -0600, Justin Pryzby wrote: > On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote: > > * I noticed that you did s/stat/lstat/. That's fine on Unix systems, > > but it won't have any effect on Windows systems (cf bed90759f), > > which means that we'll have to document a platform-specific behavioral > > difference. Do we want to go there? > > > > Maybe this patch needs to wait on somebody fixing our lack of real lstat() on Windows. > > I think only the "top" patches depend on lstat (for the "type" column and > recursion, to avoid loops). The initial patches are independently useful, and > resolve the original issue of hiding tmpdirs. I've rebased and re-arranged the > patches to reflect this. I said that, but then failed to attach the re-arranged patches. Now I also renumbered OIDs following best practice. The first handful of patches address the original issue, and I think could be "ready": $ git log --oneline origin..pg-ls-dir-new |tac ... Document historic behavior of links to directories.. ... Add tests on pg_ls_dir before changing it ... Add pg_ls_dir_metadata to list a dir with file metadata.. ... pg_ls_tmpdir to show directories and "isdir" argument.. ... pg_ls_*dir to show directories and "isdir" column.. These others are optional: ... pg_ls_logdir to ignore error if initial/top dir is missing.. ... pg_ls_*dir to return all the metadata from pg_stat_file.. ..and these maybe requires more work for lstat on windows: ... pg_stat_file and pg_ls_dir_* to use lstat().. ... pg_ls_*/pg_stat_file to show file *type*.. ... Preserve pg_stat_file() isdir.. ... Add recursion option in pg_ls_dir_files.. -- Justin
Attachment
- v27-0001-Document-historic-behavior-of-links-to-directori.patch
- v27-0002-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v27-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v27-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v27-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v27-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v27-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v27-0008-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v27-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v27-0010-Preserve-pg_stat_file-isdir.patch
- v27-0011-Add-recursion-option-in-pg_ls_dir_files.patch
Breaking with tradition, the previous patch included one too *few* changes, and failed to resolve the OID collisions.
Attachment
- v28-0001-Document-historic-behavior-of-links-to-directori.patch
- v28-0002-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v28-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v28-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v28-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v28-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v28-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v28-0008-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v28-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v28-0010-Preserve-pg_stat_file-isdir.patch
- v28-0011-Add-recursion-option-in-pg_ls_dir_files.patch
On Tue, Apr 06, 2021 at 11:01:31AM -0500, Justin Pryzby wrote: > On Wed, Dec 23, 2020 at 01:17:10PM -0600, Justin Pryzby wrote: > > On Mon, Nov 23, 2020 at 04:14:18PM -0500, Tom Lane wrote: > > > * I noticed that you did s/stat/lstat/. That's fine on Unix systems, > > > but it won't have any effect on Windows systems (cf bed90759f), > > > which means that we'll have to document a platform-specific behavioral > > > difference. Do we want to go there? > > > > > > Maybe this patch needs to wait on somebody fixing our lack of real lstat() on Windows. > > > > I think only the "top" patches depend on lstat (for the "type" column and > > recursion, to avoid loops). The initial patches are independently useful, and > > resolve the original issue of hiding tmpdirs. I've rebased and re-arranged the > > patches to reflect this. > > I said that, but then failed to attach the re-arranged patches. > Now I also renumbered OIDs following best practice. > > The first handful of patches address the original issue, and I think could be > "ready": > > $ git log --oneline origin..pg-ls-dir-new |tac > ... Document historic behavior of links to directories.. > ... Add tests on pg_ls_dir before changing it > ... Add pg_ls_dir_metadata to list a dir with file metadata.. > ... pg_ls_tmpdir to show directories and "isdir" argument.. > ... pg_ls_*dir to show directories and "isdir" column.. > > These others are optional: > ... pg_ls_logdir to ignore error if initial/top dir is missing.. > ... pg_ls_*dir to return all the metadata from pg_stat_file.. > > ..and these maybe requires more work for lstat on windows: > ... pg_stat_file and pg_ls_dir_* to use lstat().. > ... pg_ls_*/pg_stat_file to show file *type*.. > ... Preserve pg_stat_file() isdir.. > ... Add recursion option in pg_ls_dir_files.. @cfbot: rebased
Attachment
- v30-0001-Document-historic-behavior-of-links-to-directori.patch
- v30-0002-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v30-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v30-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v30-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v30-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v30-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v30-0008-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v30-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v30-0010-Preserve-pg_stat_file-isdir.patch
- v30-0011-Add-recursion-option-in-pg_ls_dir_files.patch
In an attempt to get this patch set off the ground again, I took a look at the first 5 patches. 0001: This one is a very small documentation update for pg_stat_file to point out that isdir will be true for symbolic links to directories. Given this is true, I think the patch looks good. 0002: This patch adds some very basic testing for pg_ls_dir(). The only comment that I have for this one is that I might also check whether '..' is included in the results of the include_dot_dirs tests. The docs specifically note that include_dot_dirs indicates whether both '.' and '..' are included, so IMO we might as well verify that. 0003: This one didn't apply cleanly until I used 'git apply -3', so it likely needs a rebase. This patch introduces the pg_ls_dir_metadata() function, which appears to just be pg_ls_dir() with some additional columns for the size and modification time. My initial reaction to this one is that we should just add those columns to pg_ls_dir() to match all the other pg_ls_* functions (and not bother attempting to maintain historic behavior for things like hidden and special files). I believe there is some existing discussion on this point upthread, so perhaps there is a good reason to make a new function. In any case, I like the idea of having pg_ls_dir() use pg_ls_dir_files() internally like the rest of the pg_ls_* functions. 0004: This one changes pg_ls_tmpdir to show directories as well. I think this is a reasonable change. On it's own, the patch looks alright, although it might look different if my suggestions for 0003 were followed. 0005: This one adjusts the rest of the pg_ls_* functions to show directories. Again, I think this is a reasonable change. As noted in 0003, I think it'd be alright just to have all the pg_ls_* functions show special and hidden files as well. It's simple enough already to filter our files that start with '.' if necessary, and I'm not sure there's any strong reason for leaving out special files. If special files are included, perhaps isdir should be changed to indicate the file type instead of just whether it is a directory. (Reading ahead, it looks like this is what 0009 might do.) I haven't looked at the following patches too much, but I'm getting the idea that they might address a lot of the feedback above and that the first bunch of patches are more like staging patches that add the abilities without changing the behavior. I wonder if just going straight to the end goal behavior might simplify the patch set a bit. I can't say I feel too strongly about this, but I figure I'd at least share my thoughts. Nathan
On Mon, Nov 22, 2021 at 07:17:01PM +0000, Bossart, Nathan wrote: > In an attempt to get this patch set off the ground again, I took a > look at the first 5 patches. > I haven't looked at the following patches too much, but I'm getting > the idea that they might address a lot of the feedback above and that > the first bunch of patches are more like staging patches that add the > abilities without changing the behavior. I wonder if just going > straight to the end goal behavior might simplify the patch set a bit. > I can't say I feel too strongly about this, but I figure I'd at least > share my thoughts. Thanks for looking. The patches are separate since the early patches are the most necessary, least disputable parts, to allow the possibility of (say) chaging pg_ls_tmpdir() without changing other functions, since pg_ls_tmpdir was was original motivation behind this whole thread. In a recent thread, Bharath Rupireddy added pg_ls functions for the logical dirs, but expressed a preference not to add the metadata columns. I still think that at least "isdir" should be added to all the "ls" functions, since it's easy to SELECT the columns you want, and a bit of a pain to write the corresponding LATERAL query: 'dir' AS dir, pg_ls_dir(dir) AS ls, pg_stat_file(ls) AS st. I think it would be strange if pg_ls_tmpdir() were to return a different set of columns than the other functions, even though admins or extensions might have created dirs or other files in those directories. Tom pointed out that we don't have a working lstat() for windows, so then it seems like we're not yet ready to show file "types" (we'd show the type of the link target, which is sometimes what's wanted, but not usually what "ls" would show), nor ready to implement recurse. As before: On Tue, Apr 06, 2021 at 11:01:31AM -0500, Justin Pryzby wrote: > The first handful of patches address the original issue, and I think could be > "ready": > > $ git log --oneline origin..pg-ls-dir-new |tac > ... Document historic behavior of links to directories.. > ... Add tests on pg_ls_dir before changing it > ... Add pg_ls_dir_metadata to list a dir with file metadata.. > ... pg_ls_tmpdir to show directories and "isdir" argument.. > ... pg_ls_*dir to show directories and "isdir" column.. > > These others are optional: > ... pg_ls_logdir to ignore error if initial/top dir is missing.. > ... pg_ls_*dir to return all the metadata from pg_stat_file.. > > ..and these maybe requires more work for lstat on windows: > ... pg_stat_file and pg_ls_dir_* to use lstat().. > ... pg_ls_*/pg_stat_file to show file *type*.. > ... Preserve pg_stat_file() isdir.. > ... Add recursion option in pg_ls_dir_files.. rebased on 1922d7c6e1a74178bd2f1d5aa5a6ab921b3fcd34 -- Justin
Attachment
- v31-0001-Document-historic-behavior-of-links-to-directori.patch
- v31-0002-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v31-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v31-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v31-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v31-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v31-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v31-0008-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v31-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v31-0010-Preserve-pg_stat_file-isdir.patch
- v31-0011-Add-recursion-option-in-pg_ls_dir_files.patch
Hello Justin, It seems that the v31 patch does not apply anymore: postgresql> git apply ~/v31-0001-Document-historic-behavior-of-links-to-directori.patch error: patch failed: doc/src/sgml/func.sgml:27410 error: doc/src/sgml/func.sgml: patch does not apply -- Fabien.
On Thu, Dec 23, 2021 at 09:14:18AM -0400, Fabien COELHO wrote: > It seems that the v31 patch does not apply anymore: > > postgresql> git apply ~/v31-0001-Document-historic-behavior-of-links-to-directori.patch > error: patch failed: doc/src/sgml/func.sgml:27410 > error: doc/src/sgml/func.sgml: patch does not apply Thanks for continuing to follow this patch ;) I fixed a conflict with output/tablespace from d1029bb5a et seq. I'm not sure why you got a conflict with 0001, though. I think the 2nd half of the patches are still waiting for fixes to lstat() on windows. You complained before that there were too many patches, and I can see how it might be a pain depending on your workflow. But the division is deliberate. Dealing with patchsets is easy for me: I use "mutt" and for each patch attachment, I type "|git am" (or |git am -3).
Attachment
- v32-0001-Document-historic-behavior-of-links-to-directori.patch
- v32-0002-Add-tests-on-pg_ls_dir-before-changing-it.patch
- v32-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v32-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v32-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v32-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v32-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v32-0008-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v32-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v32-0010-Preserve-pg_stat_file-isdir.patch
- v32-0011-Add-recursion-option-in-pg_ls_dir_files.patch
Hello Justin, Happy new year! > I think the 2nd half of the patches are still waiting for fixes to lstat() on > windows. Not your problem? Here is my review about v32: All patches apply cleanly. # part 01 One liner doc improvement to tell that creation time is only available on windows. It is indeed not available on Linux. OK. # part 02 Add tests for various options on pg_ls_dir, and for pg_stat_file, which were not exercised before. "make check" is ok. OK. # part 03 This patch adds a new pg_ls_dir_metadata. Internally, this is an extension of pg_ls_dir_files function which is used by other pg_ls functions. Doc ok. About the code: ISTM that the "if (1) { if (2) continue; } else if(3) { if (4) continue; }" structure" may be simplified to "if (1 && 2) continue; if (3 && 4) continue;", at least if IS_DIR and IS_REG are incompatible? Otherwise, at least "else if (3 & 4) continue"? The ifdef WIN32 (which probably detects windows 64 bits…) overwrites values[3]. ISTM it could be reordered so that there is no overwrite, and simpler single assignements. #ifndef WIN32 v = ...; #else v = ... ? ... : ...; #endif New tests are added which check that the result columns are configured as required, including error cases. "make check" is ok. OK. # part 04 Add a new "isdir" column to "pg_ls_tmpdir" output. This is a small behavioral change. I'm ok with it, however I'm unsure why we would not jump directly to the "type" char column done later in the patch series. ISTM all such functions should be extended the same way for better homogeneity? That would also impact "waldir", "archive_status", "logical_*", "replslot" variants. "make check" ok. OK. # part 05 This patch applies my previous advice:-) ISTM that parts 4 and 5 should be one single patch. The test changes show that only waldir has a test. Would it be possible to add minimal tests to other variants as well? "make check" ok. I'd consider add such tests with part 02. OK. # part 06 This part extends and adds a test for pg_ls_logdir. ISTM that it should be merged with the previous patches. "make check" is ok. OK. # part 07 This part extends pg_stat_file with more date informations. ISTM that the documentation should be clear about windows vs unix/cygwin specific data provided (change/creation). The code adds a new value_from_stat function to avoid code duplication. Fine with me. All pg_ls_*dir functions are impacted. Fine with me. "make check" is ok. OK. # part 08 This part substitutes lstat to stat. Fine with me. "make check" is ok. I guess that lstat does something under windows even if the concept of link is somehow different there. Maybe the doc should say so somewhere? OK. # part 09 This part switches the added "isdir" to a "type" column. "make check" is ok. This is a definite improvement. OK. # part 10 This part adds a redundant "isdir" column. I do not see the point. "make check" is ok. NOT OK. # part 11 This part adds a recurse option. Why not. However, the true value does not seem to be tested? "make check" is ok. My opinion is unclear. Overall, ignoring part 10, this makes a definite improvement to postgres ls capabilities. I do not seen any reason why not to add those. -- Fabien.
> Here is my review about v32: I forgot to tell that doc generation for the cumulated changes also works. -- Fabien.
Hi, On Sun, Jan 02, 2022 at 02:55:04PM +0100, Fabien COELHO wrote: > > > Here is my review about v32: > > I forgot to tell that doc generation for the cumulated changes also works. Unfortunately the patchset doesn't apply anymore: http://cfbot.cputube.org/patch_36_2377.log === Applying patches on top of PostgreSQL commit ID 4483b2cf29bfe8091b721756928ccbe31c5c8e14 === === applying patch ./v32-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch [...] patching file src/test/regress/expected/misc_functions.out Hunk #1 succeeded at 274 (offset 7 lines). patching file src/test/regress/expected/tablespace.out Hunk #1 FAILED at 16. 1 out of 1 hunk FAILED -- saving rejects to file src/test/regress/expected/tablespace.out.rej Justin, could you send a rebased version?
On Sun, Jan 02, 2022 at 01:07:29PM +0100, Fabien COELHO wrote: > One liner doc improvement to tell that creation time is only available on windows. > It is indeed not available on Linux. The change is about the "isflag" flag, not creation time. Returns a record containing the file's size, last access time stamp, last modification time stamp, last file status change time stamp (Unix platforms only), file creation time stamp (Windows only), and a flag - indicating if it is a directory. + indicating if it is a directory (or a symbolic link to a directory). > # part 03 > ISTM that the "if (1) { if (2) continue; } else if(3) { if (4) continue; }" structure" > may be simplified to "if (1 && 2) continue; if (3 && 4) continue;", at least if > IS_DIR and IS_REG are incompatible? No, what you suggested is not the same; We talked about this before: https://www.postgresql.org/message-id/20200315212729.GC26184@telsasoft.com > Otherwise, at least "else if (3 & 4) continue"? I could write the *final* "else if" like that, but then it would be different from the previous case. Which would be confusing and prone to mistakes. If I wrote it like this, I think it'd just provoke suggestions from someone else to change it differently: /* Skip dirs or special files? */ if (S_ISDIR(attrib.st_mode) && !(flags & LS_DIR_SKIP_DIRS)) continue; if (!S_ISDIR(attrib.st_mode) && !S_ISREG(attrib.st_mode) && !(flags & LS_DIR_SKIP_SPECIAL) continue; ... << Why don't you use "else if" instead of "if (a){} if (!a && b){}" >> I'm going to leave it up to a committer. > The ifdef WIN32 (which probably detects windows 64 bits…) overwrites values[3]. ISTM > it could be reordered so that there is no overwrite, and simpler single assignements. > > #ifndef WIN32 > v = ...; > #else > v = ... ? ... : ...; > #endif I changed this but without using nested conditionals. > Add a new "isdir" column to "pg_ls_tmpdir" output. This is a small behavioral > change. I'm ok with it, however I'm unsure why we would not jump directly to > the "type" char column done later in the patch series. Because that depends on lstat(). > ISTM all such functions > should be extended the same way for better homogeneity? That would also impact > "waldir", "archive_status", "logical_*", "replslot" variants. "make check" ok. I agree that makes sense, however others have expressed the opposite opinion. https://www.postgresql.org/message-id/CALj2ACWtrt5EkHrY4WAZ4Cv42SidXAwpeQJU021bxaKpjmbGfA%40mail.gmail.com The original motive for the patch was that pg_ls_tmpdir doesn't show shared filesets. This fixes that essential problem without immediately dragging everything else along. I think it's more likely that a committer would merge them both. But I don't know, and it's easy to combine patches if desired. > This patch applies my previous advice:-) ISTM that parts 4 and 5 should be one > single patch. The test changes show that only waldir has a test. Would it be > possible to add minimal tests to other variants as well? "make check" ok. I have added tests, although some are duplicative. > This part extends and adds a test for pg_ls_logdir. ISTM that it should > be merged with the previous patches. "make check" is ok. It's seperate to allow writing a separate commit message since it does something unrelated to the other patches. What other patch would it would be merged with ? | v32-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch > ISTM that the documentation should be clear about windows vs unix/cygwin specific > data provided (change/creation). I preferred to refer to pg_stat_file rather than repeat it for all 7 functions currently in v15, (and future functions added for new, toplevel dirs). > # part 11 > > This part adds a recurse option. Why not. However, the true value does not > seem to be tested? "make check" is ok. WDYM the true value? It's tested like: +-- Exercise recursion +select path, filename, type from pg_ls_dir_metadata('.', true, false, true) where +path in ('base', 'base/pgsql_tmp', 'global', 'global/pg_control', 'global/pg_filenode.map', 'PG_VERSION', 'pg_multixact','pg_multixact/members', 'pg_multixact/offsets', 'pg_wal', 'pg_wal/archive_status') +-- (type='d' or path~'^(global/.*|PG_VERSION|postmaster\.opts|postmaster\.pid|pg_logical/replorigin_checkpoint)$') and filename!~'[0-9]' +order by path collate "C", filename collate "C"; + path | filename | type +------------------------+-----------------+------ + PG_VERSION | PG_VERSION | - + base | base | d + base/pgsql_tmp | pgsql_tmp | d ... -- Justin
Attachment
- v33-0001-Document-historic-behavior-of-links-to-directori.patch
- v33-0002-Add-tests-before-changing-pg_ls_.patch
- v33-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v33-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v33-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v33-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v33-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
- v33-0008-pg_stat_file-and-pg_ls_dir_-to-use-lstat.patch
- v33-0009-pg_ls_-pg_stat_file-to-show-file-type.patch
- v33-0010-Preserve-pg_stat_file-isdir.patch
- v33-0011-Add-recursion-option-in-pg_ls_dir_files.patch
Rebased over 9e9858389 (Michael may want to look at the tuplestore part?). Fixing a comment typo. I also changed pg_ls_dir_recurse() to handle concurrent removal of a dir, which I noticed caused an infrequent failure on CI. However I'm not including that here, since the 2nd half of the patch set seems isn't ready due to lstat() on windows.
Attachment
- v34-0001-Document-historic-behavior-of-links-to-directori.patch
- v34-0002-Add-tests-before-changing-pg_ls_.patch
- v34-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v34-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v34-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v34-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v34-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
Hello Justin, I hope to look at it over the week-end. -- Fabien Coelho - CRI, MINES ParisTech
Hello Justin, Review about v34, up from v32 which I reviewed in January. v34 is an updated version of v32, without the part about lstat at the end of the series. All 7 patches apply cleanly. # part 01 One liner doc improvement about the directory flag. OK. # part 02 Add tests for various options on pg_ls_dir, and for pg_stat_file, which were not exercised before. "make check" is ok. OK. # part 03 This patch adds a new pg_ls_dir_metadata. Internally, this is an extension of pg_ls_dir_files function which is used by other pg_ls functions. Doc ok. New tests are added which check that the result columns are configured as required, including error cases. "make check" is ok. OK. # part 04 Add a new "isdir" column to "pg_ls_tmpdir" output. This is a small behavioral change. I'm ok with that, however I must say that I'm still unsure why we would not jump directly to a "type" char column. What is wrong with outputing 'd' or '-' instead of true or false? This way, the interface needs not change if "lstat" is used later? ISTM that the interface issue should be somehow independent of the implementation issue, and we should choose directly the right/best interface? Independently, the documentation may be clearer about what happens to "isdir" when the file is a link? It may say that the behavior may change in the future? About homogeneity, I note that some people may be against adding "isdir" to other ls functions. I must say that I cannot see a good argument not to do it, and that I hate dealing with systems which are not homogeneous because it creates surprises and some loss of time. "make check" ok. OK. # part 05 Add isdir to other ls functions. Doc is updated. Same as above, I'd prefer a char instead of a bool, as it is more extendable and future-proof. OK. # part 06 This part extends and adds a test for pg_ls_logdir. "make check" is ok. OK. # part 07 This part extends pg_stat_file with more date informations. "make check" is ok. OK. # doc Overall doc generation is OK. -- Fabien.
On Wed, Mar 09, 2022 at 10:50:45AM -0600, Justin Pryzby wrote: > Rebased over 9e9858389 (Michael may want to look at the tuplestore part?). Are you referring to the contents of 0003 here that changes the semantics of pg_ls_dir_files() regarding its setup call? -- Michael
Attachment
On Sun, Mar 13, 2022 at 09:45:35AM +0900, Michael Paquier wrote: > On Wed, Mar 09, 2022 at 10:50:45AM -0600, Justin Pryzby wrote: > > Rebased over 9e9858389 (Michael may want to look at the tuplestore part?). > > Are you referring to the contents of 0003 here that changes the > semantics of pg_ls_dir_files() regarding its setup call? Yes, as it has this: - SetSingleFuncCall(fcinfo, SRF_SINGLE_USE_EXPECTED); ... - SetSingleFuncCall(fcinfo, 0); ... + if (flags & LS_DIR_METADATA) + SetSingleFuncCall(fcinfo, 0); + else + SetSingleFuncCall(fcinfo, SRF_SINGLE_USE_EXPECTED); -- Justin
On Wed, Mar 09, 2022 at 10:50:45AM -0600, Justin Pryzby wrote: > I also changed pg_ls_dir_recurse() to handle concurrent removal of a dir, which > I noticed caused an infrequent failure on CI. However I'm not including that > here, since the 2nd half of the patch set seems isn't ready due to lstat() on > windows. lstat() has been a subject of many issues over the years with our internal emulation and issues related to its concurrency, but we use it in various areas of the in-core code, so that does not sound like an issue to me. It depends on what you want to do with it in genfile.c and which data you'd expect, in addition to the detection of junction points for WIN32, I guess. v34 has no references to pg_ls_dir_recurse(), but that's a WITH RECURSIVE, so we would not really need it, do we? @@ -27618,7 +27618,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8'); Returns a record containing the file's size, last access time stamp, last modification time stamp, last file status change time stamp (Unix platforms only), file creation time stamp (Windows only), and a flag - indicating if it is a directory. + indicating if it is a directory (or a symbolic link to a directory). </para> <para> This function is restricted to superusers by default, but other users This is from 0001, and this addition in the documentation is not completely right. As pg_stat_file() uses stat() to get back the information of a file/directory, we'd just follow the link if specifying one in the input argument. We could say instead, if we were to improve the docs, that "If filename is a link, this function returns information about the file or directory the link refers to." I would put that as a different paragraph. +select * from pg_ls_archive_statusdir() limit 0; + name | size | modification +------+------+-------------- +(0 rows) FWIW, this one is fine as of ValidateXLOGDirectoryStructure() that would make sure archive_status exists before any connection is attempted to the cluster. > +select * from pg_ls_logdir() limit 0; This test on pg_ls_logdir() would fail if running installcheck on a cluster that has logging_collector disabled. So this cannot be included. +select * from pg_ls_logicalmapdir() limit 0; +select * from pg_ls_logicalsnapdir() limit 0; +select * from pg_ls_replslotdir('') limit 0; +select * from pg_ls_tmpdir() limit 0; +select * from pg_ls_waldir() limit 0; +select * from pg_stat_file('.') limit 0; The rest of the patch set should be stable AFAIK, there are various steps when running a checkpoint that makes sure that any of these exist, without caring about the value of wal_level. + <para> + For each file in the specified directory, list the file and its + metadata. + Restricted to superusers by default, but other users can be granted + EXECUTE to run the function. + </para></entry> What is metadata in this case? (I have read the code and know what you mean, but folks only looking at the documentation may be puzzled by that). It could be cleaner to use the same tupledesc for any callers of this function, and return NULL in cases these are not adapted. + /* check the optional arguments */ + if (PG_NARGS() == 3) + { + if (!PG_ARGISNULL(1)) + { + if (PG_GETARG_BOOL(1)) + flags |= LS_DIR_MISSING_OK; + else + flags &= ~LS_DIR_MISSING_OK; + } + + if (!PG_ARGISNULL(2)) + { + if (PG_GETARG_BOOL(2)) + flags &= ~LS_DIR_SKIP_DOT_DIRS; + else + flags |= LS_DIR_SKIP_DOT_DIRS; + } + } The subtle different between the false and true code paths of those arguments 1 and 2 had better be explained? The bit-wise operations are slightly different in both cases, so it is not clear which part does what, and what's the purpose of this logic. - SetSingleFuncCall(fcinfo, 0); + /* isdir depends on metadata */ + Assert(!(flags&LS_DIR_ISDIR) || (flags&LS_DIR_METADATA)); + /* Unreasonable to show isdir and skip dirs */ + Assert(!(flags&LS_DIR_ISDIR) || !(flags&LS_DIR_SKIP_DIRS)); Incorrect code format. Spaces required. +-- This tests the missing_ok parameter, which causes pg_ls_tmpdir to succeed even if the tmpdir doesn't exist yet +-- The name='' condition is never true, so the function runs to completion but returns zero rows. +-- The query is written to ERROR if the tablespace doesn't exist, rather than silently failing to call pg_ls_tmpdir() +SELECT c.* FROM (SELECT oid FROM pg_tablespace b WHERE b.spcname='regress_tblspace' UNION SELECT 0 ORDER BY 1 DESC LIMIT 1) AS b , pg_ls_tmpdir(oid) AS c WHERE c.name='Does not exist'; So, here, we have a test that may not actually test what we want to test. Hmm. I am not convinced that we need a new set of SQL functions as presented in 0003 (addition of meta-data in pg_ls_dir()), and extensions of 0004 (do the same but for pg_ls_tmpdir) and 0005 (same for the other pg_ls* functions). The changes of pg_ls_dir_files() make in my opinion the code harder to follow as the resulting tuple size depends on the type of flag used, and you can already retrieve the rest of the information with a join, probably LATERAL, on pg_stat_file(), no? The same can be said about 0007, actually. The addition of isdir for any of the paths related to pg_logical/ and the replication slots has also a limited interest, because we know already those contents, and these are not directories as far as I recall. 0006 invokes a behavior change for pg_ls_logdir(), where it makes sense to me to fail if the directory does not exist, so I am not in favor of that. In the whole set, improving the docs as of 0001 makes sense, but the change is incomplete. Most of 0002 also makes sense and should be stable enough. I am less enthusiastic about any of the other changes proposed and what we can gain from these parts. -- Michael
Attachment
On Mon, Mar 14, 2022 at 01:53:54PM +0900, Michael Paquier wrote: > +select * from pg_ls_logicalmapdir() limit 0; > +select * from pg_ls_logicalsnapdir() limit 0; > +select * from pg_ls_replslotdir('') limit 0; > +select * from pg_ls_tmpdir() limit 0; > +select * from pg_ls_waldir() limit 0; > +select * from pg_stat_file('.') limit 0; > > The rest of the patch set should be stable AFAIK, there are various > steps when running a checkpoint that makes sure that any of these > exist, without caring about the value of wal_level. I was contemplating at 0002 this morning, so see which parts would be independently useful, and got reminded that we already check the execution of all those functions in other regression tests, like test_decoding for the replication slot ones and misc_functions.sql for the more critical ones, so those extra queries would be just interesting to check the shape of their SRFs, which is related to the other patches of the set and limited based on my arguments from yesterday. There was one thing that stood out though: there was nothing for the two options of pg_ls_dir(), called missing_ok and include_dot_dirs. missing_ok is embedded in one query of pg_rewind, but this is a counter-measure against concurrent file removals so we cannot rely on pg_rewind to check that. And the second option was not run at all. I have extracted both test cases after rewriting them a bit, and applied that separately. -- Michael
Attachment
On Mon, Mar 14, 2022 at 01:53:54PM +0900, Michael Paquier wrote: > On Wed, Mar 09, 2022 at 10:50:45AM -0600, Justin Pryzby wrote: > > I also changed pg_ls_dir_recurse() to handle concurrent removal of a dir, which > > I noticed caused an infrequent failure on CI. However I'm not including that > > here, since the 2nd half of the patch set seems isn't ready due to lstat() on > > windows. > > lstat() has been a subject of many issues over the years with our > internal emulation and issues related to its concurrency, but we use > it in various areas of the in-core code, so that does not sound like > an issue to me. It depends on what you want to do with it in > genfile.c and which data you'd expect, in addition to the detection of > junction points for WIN32, I guess. To avoid cycles, a recursion function would need to know whether to recurse into a directory or to output that something is isdir=false or type=link, and avoid recursing into it. > pg_ls_dir_recurse(), but that's a WITH RECURSIVE, so we would not > really need it, do we? Tom disliked it when I had written it as a recursive CTE, so I rewrote it in C. 129225.1606166058@sss.pgh.pa.us > Hmm. I am not convinced that we need a new set of SQL functions as > presented in 0003 (addition of meta-data in pg_ls_dir()), and > extensions of 0004 (do the same but for pg_ls_tmpdir) and 0005 (same > for the other pg_ls* functions). The changes of pg_ls_dir_files() > make in my opinion the code harder to follow as the resulting tuple > size depends on the type of flag used, and you can already retrieve > the rest of the information with a join, probably LATERAL, on > pg_stat_file(), no? The same can be said about 0007, actually. Yes, one can get the same information with a lateral join (as I said 2 years ago). But it's more helpful to provide a function, rather than leave that to people to figure out - possibly incorrectly, or badly, like by parsing the output of COPY FROM PROGRAM 'ls -l'. The query to handle tablespaces is particularly obscure: 20200310183037.GA29065@telsasoft.com 20201223191710.GR30237@telsasoft.com One could argue that most of the pg_ls_* functions aren't needed (including 1922d7c6e), since the same things are possible with pg_ls_dir() and pg_stat_file(). |1922d7c6e Add SQL functions to monitor the directory contents of replication slots The original, minimal goal of this patch was to show shared tempdirs in pg_ls_tmpfile() - rather than hiding them misleadingly as currently happens. 20200310183037.GA29065@telsasoft.com 20200313131232.GO29065@telsasoft.com I added the metadata function 2 years ago since it's silly to show metadata for tmpdir but not other, arbitrary directories. 20200310183037.GA29065@telsasoft.com 20200313131232.GO29065@telsasoft.com 20201223191710.GR30237@telsasoft.com > The addition of isdir for any of the paths related to pg_logical/ and > the replication slots has also a limited interest, because we know > already those contents, and these are not directories as far as I > recall. Except when we don't, since extensions can do things that core doesn't, as Fabien pointed out. alpine.DEB.2.21.2001160927390.30419@pseudo > In the whole set, improving the docs as of 0001 makes sense, but the > change is incomplete. Most of 0002 also makes sense and should be > stable enough. I am less enthusiastic about any of the other changes > proposed and what we can gain from these parts. It is frustrating to hear this feedback now, after the patch has gone through multiple rewrites over 2 years - based on other positive feedback and review. I went to the effort to ask, numerous times, whether to write the patch and how its interfaces should look. Now, I'm hearing that not only the implementation but its goals are wrong. What should I have done to avoid that ? 20200503024215.GJ28974@telsasoft.com 20191227195918.GF12890@telsasoft.com 20200116003924.GJ26045@telsasoft.com 20200908195126.GB18552@telsasoft.com
On Mon, Mar 14, 2022 at 09:37:25PM -0500, Justin Pryzby wrote: > One could argue that most of the pg_ls_* functions aren't needed (including > 1922d7c6e), since the same things are possible with pg_ls_dir() and > pg_stat_file(). > |1922d7c6e Add SQL functions to monitor the directory contents of replication slots This main argument behind this one is monitoring, as the execution to those functions can be granted at a granular level depending on the roles doing the disk space lookups. -- Michael
Attachment
Hi, On 2022-03-09 10:50:45 -0600, Justin Pryzby wrote: > Rebased over 9e9858389 (Michael may want to look at the tuplestore part?). Doesn't apply cleanly anymore: http://cfbot.cputube.org/patch_37_2377.log Marked as waiting-on-author. Greetings, Andres Freund
On Mon, Mar 21, 2022 at 06:28:28PM -0700, Andres Freund wrote: > Doesn't apply cleanly anymore: http://cfbot.cputube.org/patch_37_2377.log > > Marked as waiting-on-author. FWIW, per my review the bit of the patch set that I found the most relevant is the addition of a note in the docs of pg_stat_file() about the case where "filename" is a link, because the code internally uses stat(). The function name makes that obvious, but that's not commonly known, I guess. Please see the attached, that I would be fine to apply. -- Michael
Attachment
On Wed, Mar 23, 2022 at 03:17:35PM +0900, Michael Paquier wrote: > FWIW, per my review the bit of the patch set that I found the most > relevant is the addition of a note in the docs of pg_stat_file() about > the case where "filename" is a link, because the code internally uses > stat(). The function name makes that obvious, but that's not > commonly known, I guess. Please see the attached, that I would be > fine to apply. Hmm. I am having second thoughts on this one, as on Windows we rely on GetFileInformationByHandle() for the emulation of stat() in win32stat.c, and it looks like this returns some information about the junction point and not the directory or file this is pointing to, it seems. At the end, it looks better to keep things simple here, so let's drop it. -- Michael
Attachment
On Sat, Mar 26, 2022 at 08:23:54PM +0900, Michael Paquier wrote: > On Wed, Mar 23, 2022 at 03:17:35PM +0900, Michael Paquier wrote: > > FWIW, per my review the bit of the patch set that I found the most > > relevant is the addition of a note in the docs of pg_stat_file() about > > the case where "filename" is a link, because the code internally uses > > stat(). The function name makes that obvious, but that's not > > commonly known, I guess. Please see the attached, that I would be > > fine to apply. > > Hmm. I am having second thoughts on this one, as on Windows we rely > on GetFileInformationByHandle() for the emulation of stat() in > win32stat.c, and it looks like this returns some information about the > junction point and not the directory or file this is pointing to, it > seems. Where did you find that ? What metadata does it return about the junction point ? We only care about a handful of fields.
On Mon, Mar 14, 2022 at 09:37:25PM -0500, Justin Pryzby wrote: > The original, minimal goal of this patch was to show shared tempdirs in > pg_ls_tmpfile() - rather than hiding them misleadingly as currently happens. > 20200310183037.GA29065@telsasoft.com > 20200313131232.GO29065@telsasoft.com > > I added the metadata function 2 years ago since it's silly to show metadata for > tmpdir but not other, arbitrary directories. > 20200310183037.GA29065@telsasoft.com > 20200313131232.GO29065@telsasoft.com > 20201223191710.GR30237@telsasoft.com I renamed the CF entry to make even more clear the original motive for the patches (I'm not maintaining the patch to add the metadata function just to avoid writing a lateral join). > > In the whole set, improving the docs as of 0001 makes sense, but the > > change is incomplete. Most of 0002 also makes sense and should be > > stable enough. I am less enthusiastic about any of the other changes > > proposed and what we can gain from these parts. > > It is frustrating to hear this feedback now, after the patch has gone through > multiple rewrites over 2 years - based on other positive feedback and review. > I went to the effort to ask, numerous times, whether to write the patch and how > its interfaces should look. Now, I'm hearing that not only the implementation > but its goals are wrong. What should I have done to avoid that ? > > 20200503024215.GJ28974@telsasoft.com > 20191227195918.GF12890@telsasoft.com > 20200116003924.GJ26045@telsasoft.com > 20200908195126.GB18552@telsasoft.com Michael said he's not enthusiastic about the patch. But I haven't heard a suggestion about how else to address the issue that pg_ls_tmpdir() hides shared filesets. On Mon, Mar 28, 2022 at 09:13:52PM -0500, Justin Pryzby wrote: > On Sat, Mar 26, 2022 at 08:23:54PM +0900, Michael Paquier wrote: > > On Wed, Mar 23, 2022 at 03:17:35PM +0900, Michael Paquier wrote: > > > FWIW, per my review the bit of the patch set that I found the most > > > relevant is the addition of a note in the docs of pg_stat_file() about > > > the case where "filename" is a link, because the code internally uses > > > stat(). The function name makes that obvious, but that's not > > > commonly known, I guess. Please see the attached, that I would be > > > fine to apply. > > > > Hmm. I am having second thoughts on this one, as on Windows we rely > > on GetFileInformationByHandle() for the emulation of stat() in > > win32stat.c, and it looks like this returns some information about the > > junction point and not the directory or file this is pointing to, it > > seems. > > Where did you find that ? What metadata does it return about the junction > point ? We only care about a handful of fields. Pending your feedback, I didn't modify this beyond your original suggestion - which seemed like a good one. This also adds some comments you requested and fixes your coding style complaints, and causes cfbot to test my proposed patch rather than your doc patch. -- Justin
Attachment
- v35-0001-Document-historic-behavior-of-links-to-directori.patch
- v35-0002-Add-tests-before-changing-pg_ls_.patch
- v35-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v35-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v35-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v35-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v35-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
The cfbot is failing testing this patch. It seems... unlikely given the nature of the patch modifying an admin function that doesn't even modify the database that it should be breaking a streaming test. Perhaps the streaming test is using this function in the testing scaffolding? [03:19:29.564] # Failed test 'regression tests pass' [03:19:29.564] # at t/027_stream_regress.pl line 76. [03:19:29.564] # got: '256' [03:19:29.564] # expected: '0' [03:19:29.564] # Looks like you failed 1 test of 5. [03:19:29.565] [03:19:27] t/027_stream_regress.pl .............. [03:19:29.565] Dubious, test returned 1 (wstat 256, 0x100) [03:19:29.565] Failed 1/5 subtests
This thread has been going for 2.5 years, so here's a(nother) recap. This omits the patches for recursion, since they're optional and evidently a distraction from the main patches. On Fri, Dec 27, 2019 at 11:02:20AM -0600, Justin Pryzby wrote: > The goal is to somehow show tmpfiles (or at least dirs) used by parallel > workers. On Thu, Jan 16, 2020 at 08:38:46AM -0600, Justin Pryzby wrote: > I think if someone wants the full generality, they can do this: > > postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s; > name | size | modification | isdir > ------+------+------------------------+------- > .foo | 4096 | 2020-01-16 08:57:04-05 | t > > In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to > SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces: > WITH x AS (SELECT format('/PG_%s_%s', split_part(current_setting('server_version'), '.', 1), catalog_version_no) suffixFROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix,'base') a FROM pg_tablespace,x)a) SELECT a, pg_ls_dir(a||'/pgsql_tmp')FROM y WHERE d='pgsql_tmp'; On Tue, Mar 10, 2020 at 01:30:37PM -0500, Justin Pryzby wrote: > I took a step back, and I wondered whether we should add a generic function for > listing a dir with metadata, possibly instead of changing the existing > functions. Then one could do pg_ls_dir_metadata('pg_wal',false,false); > > Since pg8.1, we have pg_ls_dir() to show a list of files. Since pg10, we've > had pg_ls_logdir and pg_ls_waldir, which show not only file names but also > (some) metadata (size, mtime). And since pg12, we've had pg_ls_tmpfile and > pg_ls_archive_statusdir, which also show metadata. > > ...but there's no a function which lists the metadata of an directory other > than tmp, wal, log. > > One can do this: > |SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c; > ..but that's not as helpful as allowing: > |SELECT * FROM pg_ls_dir_metadata('.',true,true); > > There's also no function which recurses into an arbitrary directory, so it > seems shortsighted to provide a function to recursively list a tmpdir. > > Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can > write a SQL function to show the dir recursively. It'd be trivial to plug in > wal/log/tmp (it seems like tmpdirs of other tablespace's are not entirely > trivial). > |SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp'); > It's pretty unfortunate if a function called > pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that > (it's new in v12). On Fri, Mar 13, 2020 at 08:12:32AM -0500, Justin Pryzby wrote: > The merge conflict presents another opportunity to solicit comments on the new > approach. Rather than making "recurse into tmpdir" the end goal: > > - add a function to show metadata of an arbitrary dir; > - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not > pg_ls_dir). > - maybe add pg_ls_dir_recurse, which satisfies the original need; > - retire pg_ls_dir (does this work with tuplestore?) > - profit > > The alternative seems to be to go back to Alvaro's earlier proposal: > - not only add "isdir", but also recurse; > > I think I would insist on adding a general function to recurse into any dir. > And *optionally* change ps_ls_* to recurse (either by accepting an argument, or > by making that a separate patch to debate). On Tue, Mar 31, 2020 at 03:08:12PM -0500, Justin Pryzby wrote: > The patch intends to fix the issue of "failing to show failed filesets" > (because dirs are skipped) while also generalizing existing functions (to show > directories and "isdir" column) and providing some more flexible ones (to list > file and metadata of a dir, which is currently possible [only] for "special" > directories, or by recursively calling pg_stat_file). On Wed, Dec 23, 2020 at 01:17:10PM -0600, Justin Pryzby wrote: > However, pg_ls_tmpdir is special since it handles tablespace tmpdirs, which it > seems is not trivial to get from sql: > > +SELECT * FROM (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(b.oid),'')||suffix, 'base/pgsql_tmp') AS dir > +FROM pg_tablespace b, pg_control_system() pcs, > +LATERAL format('/PG_%s_%s', left(current_setting('server_version_num'), 2), pcs.catalog_version_no) AS suffix) AS dir, > +LATERAL pg_ls_dir_recurse(dir) AS a; > > For context, the line of reasoning that led me to this patch series was > something like this: > > 0) Why can't I list shared tempfiles (dirs) using pg_ls_tmpdir() ? > 1) Implement recursion for pg_ls_tmpdir(); > 2) Eventually realize that it's silly to implement a function to recurse into > one particular directory when no general feature exists; > 3) Implement generic facility; On Tue, Apr 06, 2021 at 11:01:31AM -0500, Justin Pryzby wrote: > The first handful of patches address the original issue, and I think could be > "ready": > > $ git log --oneline origin..pg-ls-dir-new |tac > ... Document historic behavior of links to directories.. > ... Add tests on pg_ls_dir before changing it > ... Add pg_ls_dir_metadata to list a dir with file metadata.. > ... pg_ls_tmpdir to show directories and "isdir" argument.. > ... pg_ls_*dir to show directories and "isdir" column.. > > These others are optional: > ... pg_ls_logdir to ignore error if initial/top dir is missing.. > ... pg_ls_*dir to return all the metadata from pg_stat_file.. > > ..and these maybe requires more work for lstat on windows: > ... pg_stat_file and pg_ls_dir_* to use lstat().. > ... pg_ls_*/pg_stat_file to show file *type*.. > ... Preserve pg_stat_file() isdir.. > ... Add recursion option in pg_ls_dir_files.. On Tue, Jan 25, 2022 at 01:27:55PM -0600, Justin Pryzby wrote: > The original motive for the patch was that pg_ls_tmpdir doesn't show shared > filesets.
Attachment
- v36-0001-Document-historic-behavior-of-links-to-directori.patch
- v36-0002-Add-tests-before-changing-pg_ls_.patch
- v36-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v36-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v36-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v36-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v36-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
On Fri, Dec 13, 2019 at 03:03:47PM +1300, Thomas Munro wrote: > > Actually, I tried using pg_ls_tmpdir(), but it unconditionally masks > > non-regular files and thus shared filesets. Maybe that's worth > > discussion on a new thread ? > > > > src/backend/utils/adt/genfile.c > > /* Ignore anything but regular files */ > > if (!S_ISREG(attrib.st_mode)) > > continue; > > +1, that's worth fixing. @cfbot: rebased on eddc128be.
Attachment
- v37-0001-Document-historic-behavior-of-links-to-directori.patch
- v37-0002-Add-tests-before-changing-pg_ls_.patch
- v37-0003-Add-pg_ls_dir_metadata-to-list-a-dir-with-file-m.patch
- v37-0004-pg_ls_tmpdir-to-show-directories-and-isdir-argum.patch
- v37-0005-pg_ls_-dir-to-show-directories-and-isdir-column.patch
- v37-0006-pg_ls_logdir-to-ignore-error-if-initial-top-dir-.patch
- v37-0007-pg_ls_-dir-to-return-all-the-metadata-from-pg_st.patch
2024-01 Commitfest. Hi, this patch was marked in CF as "Needs Review", but there has been no activity on this thread for 14+ months. Since there seems not much interest, I have changed the status to "Returned with Feedback" [1]. Feel free to propose a stronger use case for the patch and add an entry for the same. ====== [1] https://commitfest.postgresql.org/46/2377/ Kind Regards, Peter Smith.