Thread: Inconvenience of pg_read_binary_file()
If I want to read a file that I'm not sure of the existence but I want to read the whole file if exists, I would call pg_read_binary_file('path', 0, -1, true) but unfortunately this doesn't work. Does it make sense to change the function so as to accept the parameter specification above? Or the arguments could be ('path', null, null, true) but (0,-1) is simpler considering the characteristics of the function. (We could also rearrange the the parameter order as "filename, missing_ok, offset, length" but that is simply confusing..) If it is, pg_read_file() is worth receive the same modification and I'll post the version containing doc part. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Tue, Jun 07, 2022 at 04:05:20PM +0900, Kyotaro Horiguchi wrote: > If I want to read a file that I'm not sure of the existence but I want > to read the whole file if exists, I would call > pg_read_binary_file('path', 0, -1, true) but unfortunately this > doesn't work. Yeah, the "normal" cases that I have seen in the past just used an extra call to pg_stat_file() to retrieve the size of the file before reading it, but arguably it does not help if the file gets extended between the stat() call and the read() call (we don't need to care about this case with pg_rewind that has been the reason why the missing_ok argument was introduced first, for one, as file extensions don't matter as we'd replay from the LSN point where the rewind began, adding the new blocks at replay). There is also an argument for supporting negative values rather than just -1. For example -2 could mean to read all the file except the last byte. Or you could have an extra flavor, as of pg_read_file(text, bool) to read the whole by default. Supporting just -1 as special value for the amount of data to read would be confusing IMO. So I would tend to choose for a set of arguments based on (text, bool). -- Michael
Attachment
At Tue, 7 Jun 2022 16:33:53 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Tue, Jun 07, 2022 at 04:05:20PM +0900, Kyotaro Horiguchi wrote: > > If I want to read a file that I'm not sure of the existence but I want > > to read the whole file if exists, I would call > > pg_read_binary_file('path', 0, -1, true) but unfortunately this > > doesn't work. > > Yeah, the "normal" cases that I have seen in the past just used an > extra call to pg_stat_file() to retrieve the size of the file before > reading it, but arguably it does not help if the file gets extended > between the stat() call and the read() call (we don't need to care > about this case with pg_rewind that has been the reason why the > missing_ok argument was introduced first, for one, as file extensions > don't matter as we'd replay from the LSN point where the rewind > began, adding the new blocks at replay). Sure. > There is also an argument for supporting negative values rather than > just -1. For example -2 could mean to read all the file except the > last byte. Or you could have an extra flavor, as of > pg_read_file(text, bool) to read the whole by default. Supporting > just -1 as special value for the amount of data to read would be > confusing IMO. So I would tend to choose for a set of arguments based > on (text, bool). I'm not sure about the negative length smaller than -1, since I don't find an apprpriate offset that represents (last byte + 1). pg_read_file(text, bool) makes sense to me, but it doesn't seem like to be able to share C function with other variations. pg_read_binary_file() need to accept some out-of-range value for offset or length to signal that offset and length are not specified. In the attached pg_read(_binary)_file_all() is modifiedf so that they have the different body from pg_read(_binary)_file(). (function comments needs to be edited and docs are needed) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
At Tue, 07 Jun 2022 17:29:31 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > pg_read_file(text, bool) makes sense to me, but it doesn't seem like > to be able to share C function with other variations. > pg_read_binary_file() need to accept some out-of-range value for > offset or length to signal that offset and length are not specified. In this version all the polypmorphic variations share the same body function. I tempted to add tail-reading feature but it would be another feature. > (function comments needs to be edited and docs are needed) - Simplified the implementation (by complexifying argument handling..). - REVOKEd EXECUTE from the new functions. - Edited the signature of the two functions. > - pg_read_file ( filename text [, offset bigint, length bigint [, missing_ok boolean ]] ) → text > + pg_read_file ( filename text [, offset bigint, length bigint ] [, missing_ok boolean ] ) → text And registered this to the next CF. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > - Simplified the implementation (by complexifying argument handling..). > - REVOKEd EXECUTE from the new functions. > - Edited the signature of the two functions. >> - pg_read_file ( filename text [, offset bigint, length bigint [, missing_ok boolean ]] ) $B"*(B text >> + pg_read_file ( filename text [, offset bigint, length bigint ] [, missing_ok boolean ] ) $B"*(B text I'm okay with allowing this variant of the functions. Since there's no implicit cast between bigint and bool, plus the fact that you can't give just offset without length, there shouldn't be much risk of confusion as to which variant to invoke. I don't really like the implementation style though. That mess of PG_NARGS tests is illegible code already and this makes it worse. I think it'd be way cleaner to have all the PG_GETARG calls in the bottom SQL-callable functions (which are already one-per-signature) and then pass them on to a common function that has an ordinary C call signature, along the lines of static Datum pg_read_file_common(text *filename_t, int64 seek_offset, int64 bytes_to_read, bool read_to_eof, bool missing_ok) { if (read_to_eof) bytes_to_read = -1; // or just Assert that it's -1 ? else if (bytes_to_read < 0) ereport(...); ... } Datum pg_read_file_off_len(PG_FUNCTION_ARGS) { text *filename_t = PG_GETARG_TEXT_PP(0); int64 seek_offset = PG_GETARG_INT64(1); int64 bytes_to_read = PG_GETARG_INT64(2); return pg_read_file_common(filename_t, seek_offset, bytes_to_read, false, false); } regards, tom lane
Thanks for taking a look! At Thu, 28 Jul 2022 16:22:17 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > > - Simplified the implementation (by complexifying argument handling..). > > - REVOKEd EXECUTE from the new functions. > > - Edited the signature of the two functions. > > >> - pg_read_file ( filename text [, offset bigint, length bigint [, missing_ok boolean ]] ) → text > >> + pg_read_file ( filename text [, offset bigint, length bigint ] [, missing_ok boolean ] ) → text > > I'm okay with allowing this variant of the functions. Since there's > no implicit cast between bigint and bool, plus the fact that you > can't give just offset without length, there shouldn't be much risk > of confusion as to which variant to invoke. Grad to hear that. > I don't really like the implementation style though. That mess of > PG_NARGS tests is illegible code already and this makes it worse. Ah..., I have to admit that I faintly felt that feeling while on it... > I think it'd be way cleaner to have all the PG_GETARG calls in the > bottom SQL-callable functions (which are already one-per-signature) > and then pass them on to a common function that has an ordinary C > call signature, along the lines of > > static Datum > pg_read_file_common(text *filename_t, > int64 seek_offset, int64 bytes_to_read, > bool read_to_eof, bool missing_ok) > { > if (read_to_eof) > bytes_to_read = -1; // or just Assert that it's -1 ? I prefer assertion since that parameter cannot be passed by users. > else if (bytes_to_read < 0) > ereport(...); > ... > } This function cannot return NULL directly. Without the ability to return NULL, it is pointless for the function to return Datum. In the attached the function returns text*. > Datum > pg_read_file_off_len(PG_FUNCTION_ARGS) > { > text *filename_t = PG_GETARG_TEXT_PP(0); > int64 seek_offset = PG_GETARG_INT64(1); > int64 bytes_to_read = PG_GETARG_INT64(2); > > return pg_read_file_common(filename_t, seek_offset, bytes_to_read, > false, false); As the result this function need to return NULL or TEXT_P according to the returned value from pg_read_file_common. + if (!ret) + PG_RETURN_NULL(); + + PG_RETURN_TEXT_P(ret); > } # I'm tempted to call read_text_file() directly from each SQL functions.. Please find the attached. I added some regression tests for both pg_read_file() and pg_read_binary_file(). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > Please find the attached. I added some regression tests for both > pg_read_file() and pg_read_binary_file(). Yeah, I definitely find this way cleaner even if it's a bit more verbose. I think that the PG_RETURN_NULL code paths are not reachable in the wrappers that don't have missing_ok. I concur with your decision to write them all the same, though. Pushed after some fooling with the docs and test cases. (Notably, I do not think we can assume that pg_hba.conf exists in $PGDATA; some installations keep it elsewhere. I used postgresql.auto.conf instead.) regards, tom lane
On Fri, Jul 29, 2022 at 03:44:25PM -0400, Tom Lane wrote: > Pushed after some fooling with the docs and test cases. (Notably, > I do not think we can assume that pg_hba.conf exists in $PGDATA; some > installations keep it elsewhere. I used postgresql.auto.conf instead.) Are you sure that this last part is a good idea? We don't force the creation of postgresql.auto.conf when starting a server, so this impacts the portability of the tests with installcheck if one decides to remove it from the data folder, and it sounds plausible to me that some distributions do exactly that.. I guess that you could rely on config_file or hba_file instead. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Fri, Jul 29, 2022 at 03:44:25PM -0400, Tom Lane wrote: >> Pushed after some fooling with the docs and test cases. (Notably, >> I do not think we can assume that pg_hba.conf exists in $PGDATA; some >> installations keep it elsewhere. I used postgresql.auto.conf instead.) > Are you sure that this last part is a good idea? We don't force the > creation of postgresql.auto.conf when starting a server, so this > impacts the portability of the tests with installcheck if one decides > to remove it from the data folder, and it sounds plausible to me that > some distributions do exactly that.. Hm. I considered reading PG_VERSION instead, or postmaster.pid. PG_VERSION would be a very boring test case, but it should certainly be present in $PGDATA. regards, tom lane
On Fri, Jul 29, 2022 at 11:35:36PM -0400, Tom Lane wrote: > Hm. I considered reading PG_VERSION instead, or postmaster.pid. > PG_VERSION would be a very boring test case, but it should certainly > be present in $PGDATA. PG_VERSION would be simpler. Looking at postmaster.pid would require a lookup at external_pid_file, and as it is not set by default you would need to add some extra logic in the tests where external_pid_file = NULL <=> PGDATA/postmaster.pid. -- Michael
Attachment
On 2022-Jul-30, Michael Paquier wrote: > PG_VERSION would be simpler. Looking at postmaster.pid would require > a lookup at external_pid_file, and as it is not set by default you > would need to add some extra logic in the tests where > external_pid_file = NULL <=> PGDATA/postmaster.pid. Hmm, no? as I recall external_pid_file is an *additional* PID file; it doesn't supplant postmaster.pid. postmaster.opts is also an option. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "E pur si muove" (Galileo Galilei)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2022-Jul-30, Michael Paquier wrote: >> PG_VERSION would be simpler. Looking at postmaster.pid would require >> a lookup at external_pid_file, and as it is not set by default you >> would need to add some extra logic in the tests where >> external_pid_file = NULL <=> PGDATA/postmaster.pid. > Hmm, no? as I recall external_pid_file is an *additional* PID file; it > doesn't supplant postmaster.pid. Right. postmaster.pid absolutely should be there if the postmaster is up (and if it ain't, you're going to have lots of other difficulty in running the regression tests...). It doesn't feel quite as static as PG_VERSION, though. regards, tom lane
At Sat, 30 Jul 2022 10:24:39 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > On 2022-Jul-30, Michael Paquier wrote: > >> PG_VERSION would be simpler. Looking at postmaster.pid would require > >> a lookup at external_pid_file, and as it is not set by default you > >> would need to add some extra logic in the tests where > >> external_pid_file = NULL <=> PGDATA/postmaster.pid. > > > Hmm, no? as I recall external_pid_file is an *additional* PID file; it > > doesn't supplant postmaster.pid. > > Right. postmaster.pid absolutely should be there if the postmaster > is up (and if it ain't, you're going to have lots of other difficulty > in running the regression tests...). It doesn't feel quite as static > as PG_VERSION, though. Thanks for committing it. Also the revised test (being suggested by Michael) looks good. regards. -- Kyotaro Horiguchi NTT Open Source Software Center