Thread: pg_read_file() with virtual files returns empty string

pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
Since pg11 pg_read_file() and friends can be used with absolute paths as long as
the user is superuser or explicitly granted the role pg_read_server_files.

I noticed that when trying to read a virtual file, e.g.:

  SELECT pg_read_file('/proc/self/status');

the returned result is a zero length string.

However this works fine:

  SELECT pg_read_file('/proc/self/status', 127, 128);

The reason for that is pg_read_file_v2() sets bytes_to_read=-1 if no offset and
length are supplied as arguments when it is called. It passes bytes_to_read down
to read_binary_file().

When the latter function sees bytes_to_read < 0 it tries to read the entire file
by getting the file size via stat, which returns 0 for a virtual file size.

The attached patch fixes this for me. I think it ought to be backpatched through
pg11.

Comments?

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> The attached patch fixes this for me. I think it ought to be backpatched through
> pg11.

> Comments?

1. Doesn't seem to be accounting for the possibility of an error in fread().

2. Don't we want to remove the stat() call altogether, if we're not
going to believe its length?

3. This bit might need to cast the RHS to int64:
    if (bytes_to_read > (MaxAllocSize - VARHDRSZ))
otherwise it might be treated as an unsigned comparison.
Or you could check for bytes_to_read < 0 separately.

4. appendStringInfoString seems like quite the wrong thing to use
when the input is binary data.

5. Don't like the comment.  Whether the file is virtual or not isn't
very relevant here.

6. If the file size exceeds 1GB, I fear we'll get some rather opaque
failure from the stringinfo infrastructure.  It'd be better to
check for that here and give a file-too-large error.

            regards, tom lane



Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 6/27/20 3:43 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> The attached patch fixes this for me. I think it ought to be backpatched through
>> pg11.
>
>> Comments?
>
> 1. Doesn't seem to be accounting for the possibility of an error in fread().
>
> 2. Don't we want to remove the stat() call altogether, if we're not
> going to believe its length?
>
> 3. This bit might need to cast the RHS to int64:
>     if (bytes_to_read > (MaxAllocSize - VARHDRSZ))
> otherwise it might be treated as an unsigned comparison.
> Or you could check for bytes_to_read < 0 separately.
>
> 4. appendStringInfoString seems like quite the wrong thing to use
> when the input is binary data.
>
> 5. Don't like the comment.  Whether the file is virtual or not isn't
> very relevant here.
>
> 6. If the file size exceeds 1GB, I fear we'll get some rather opaque
> failure from the stringinfo infrastructure.  It'd be better to
> check for that here and give a file-too-large error.


All good stuff -- I believe the attached checks all the boxes.

I noted while at this, that the current code can never hit this case:

!     if (bytes_to_read < 0)
!     {
!         if (seek_offset < 0)
!             bytes_to_read = -seek_offset;

The intention here seems to be that if you pass bytes_to_read = -1 with a
negative offset, it will give you the last offset bytes of the file.

But all of the SQL exposed paths disallow an explicit negative value for
bytes_to_read. This was also not documented as far as I can tell so I eliminated
that case in the attached. Is that actually a case I should fix/support instead?

Separately, it seems to me that a two argument version of pg_read_file() would
be useful:

  pg_read_file('filename', offset)

In that case bytes_to_read = -1 could be passed down in order to read the entire
file after the offset. In fact I think that would nicely handle the negative
offset case as well.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> All good stuff -- I believe the attached checks all the boxes.

Looks okay to me, except I think you want

!     if (bytes_to_read > 0)

to be

!     if (bytes_to_read >= 0)

As it stands, a zero request will be treated like -1 (read all the
rest of the file) while ISTM it ought to be an expensive way to
read zero bytes --- perhaps useful to check the filename and seek
offset validity?

> The intention here seems to be that if you pass bytes_to_read = -1 with a
> negative offset, it will give you the last offset bytes of the file.

I think it's just trying to convert bytes_to_read = -1 into an explicit
positive length-to-read in all cases.  We don't need that anymore with
this code, so dropping it is fine.

            regards, tom lane



Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 6/28/20 6:00 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> All good stuff -- I believe the attached checks all the boxes.
>
> Looks okay to me, except I think you want
>
> !     if (bytes_to_read > 0)
>
> to be
>
> !     if (bytes_to_read >= 0)

Yep -- thanks.

I did some performance testing of the worst case/largest possible file and found
that skipping the stat and bulk read does cause a significant regression.
Current HEAD takes about 400ms on my desktop, and with that version of the patch
more like 1100ms.

In the attached patch I was able to get most of the performance degradation back
-- ~600ms. Hopefully you don't think what I did was "too cute by half" :-). Do
you think this is good enough or should we go back to using the stat file size
when it is > 0?

As noted in the comment, the downside of that method is that the largest
supported file size is 1 byte smaller when "reading the entire file" versus
"reading a specified size" due to StringInfo reserving the last byte for a
trailing null.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> I did some performance testing of the worst case/largest possible file and found
> that skipping the stat and bulk read does cause a significant regression.

Yeah, I was wondering a little bit if that'd be an issue.

> In the attached patch I was able to get most of the performance degradation back
> -- ~600ms. Hopefully you don't think what I did was "too cute by half" :-). Do
> you think this is good enough or should we go back to using the stat file size
> when it is > 0?

I don't think it's unreasonable to "get in bed" with the innards of the
StringInfo; plenty of other places do already, such as pqformat.h or
pgp_armor_decode, just to name the first couple that I came across in a
quick grep.

However, if we're going to get in bed with it, let's get all the way in
and just read directly into the StringInfo's buffer, as per attached.
This saves all the extra memcpy'ing and reduces the number of fread calls
to at most log(N).

(This also fixes a bug in your version, which is that it captured
the buf.data pointer before any repalloc that might happen.)

            regards, tom lane

diff --git a/contrib/adminpack/expected/adminpack.out b/contrib/adminpack/expected/adminpack.out
index 5738b0f6c4..edf3ebfcba 100644
--- a/contrib/adminpack/expected/adminpack.out
+++ b/contrib/adminpack/expected/adminpack.out
@@ -79,7 +79,7 @@ SELECT pg_file_rename('test_file1', 'test_file2');
 (1 row)

 SELECT pg_read_file('test_file1');  -- not there
-ERROR:  could not stat file "test_file1": No such file or directory
+ERROR:  could not open file "test_file1" for reading: No such file or directory
 SELECT pg_read_file('test_file2');
  pg_read_file
 --------------
@@ -108,7 +108,7 @@ SELECT pg_file_rename('test_file2', 'test_file3', 'test_file3_archive');
 (1 row)

 SELECT pg_read_file('test_file2');  -- not there
-ERROR:  could not stat file "test_file2": No such file or directory
+ERROR:  could not open file "test_file2" for reading: No such file or directory
 SELECT pg_read_file('test_file3');
  pg_read_file
 --------------
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index ceaa6180da..aa49df4d0c 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -106,33 +106,11 @@ read_binary_file(const char *filename, int64 seek_offset, int64 bytes_to_read,
                  bool missing_ok)
 {
     bytea       *buf;
-    size_t        nbytes;
+    size_t        nbytes = 0;
     FILE       *file;

-    if (bytes_to_read < 0)
-    {
-        if (seek_offset < 0)
-            bytes_to_read = -seek_offset;
-        else
-        {
-            struct stat fst;
-
-            if (stat(filename, &fst) < 0)
-            {
-                if (missing_ok && errno == ENOENT)
-                    return NULL;
-                else
-                    ereport(ERROR,
-                            (errcode_for_file_access(),
-                             errmsg("could not stat file \"%s\": %m", filename)));
-            }
-
-            bytes_to_read = fst.st_size - seek_offset;
-        }
-    }
-
-    /* not sure why anyone thought that int64 length was a good idea */
-    if (bytes_to_read > (MaxAllocSize - VARHDRSZ))
+    /* clamp request size to what we can actually deliver */
+    if (bytes_to_read > (int64) (MaxAllocSize - VARHDRSZ))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                  errmsg("requested length too large")));
@@ -154,9 +132,56 @@ read_binary_file(const char *filename, int64 seek_offset, int64 bytes_to_read,
                 (errcode_for_file_access(),
                  errmsg("could not seek in file \"%s\": %m", filename)));

-    buf = (bytea *) palloc((Size) bytes_to_read + VARHDRSZ);
+    if (bytes_to_read >= 0)
+    {
+        /* If passed explicit read size just do it */
+        buf = (bytea *) palloc((Size) bytes_to_read + VARHDRSZ);
+
+        nbytes = fread(VARDATA(buf), 1, (size_t) bytes_to_read, file);
+    }
+    else
+    {
+        /* Negative read size, read rest of file */
+        StringInfoData sbuf;
+
+        initStringInfo(&sbuf);
+        /* Leave room in the buffer for the varlena length word */
+        sbuf.len += VARHDRSZ;
+        Assert(sbuf.len < sbuf.maxlen);

-    nbytes = fread(VARDATA(buf), 1, (size_t) bytes_to_read, file);
+        while (!(feof(file) || ferror(file)))
+        {
+            size_t        rbytes;
+
+            /* Minimum amount to read at a time */
+#define MIN_READ_SIZE 4096
+
+            /*
+             * Verify that enlargeStringInfo won't fail; we'd rather give the
+             * error message for that ourselves.
+             */
+            if (((Size) MIN_READ_SIZE) >= (MaxAllocSize - (Size) sbuf.len))
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("requested length too large")));
+
+            /* OK, ensure that we can read at least MIN_READ_SIZE */
+            enlargeStringInfo(&sbuf, MIN_READ_SIZE);
+
+            /*
+             * stringinfo.c likes to allocate in powers of 2, so it's likely
+             * that much more space is available than we asked for.  Use all
+             * of it, rather than making more fread calls than necessary.
+             */
+            rbytes = fread(sbuf.data + sbuf.len, 1,
+                           (size_t) (sbuf.maxlen - sbuf.len - 1), file);
+            sbuf.len += rbytes;
+            nbytes += rbytes;
+        }
+
+        /* Now we can commandeer the stringinfo's buffer as the result */
+        buf = (bytea *) sbuf.data;
+    }

     if (ferror(file))
         ereport(ERROR,

Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/1/20 4:12 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> I did some performance testing of the worst case/largest possible file and found
>> that skipping the stat and bulk read does cause a significant regression.
>
> Yeah, I was wondering a little bit if that'd be an issue.
>
>> In the attached patch I was able to get most of the performance degradation back
>> -- ~600ms. Hopefully you don't think what I did was "too cute by half" :-). Do
>> you think this is good enough or should we go back to using the stat file size
>> when it is > 0?
>
> I don't think it's unreasonable to "get in bed" with the innards of the
> StringInfo; plenty of other places do already, such as pqformat.h or
> pgp_armor_decode, just to name the first couple that I came across in a
> quick grep.
>
> However, if we're going to get in bed with it, let's get all the way in
> and just read directly into the StringInfo's buffer, as per attached.
> This saves all the extra memcpy'ing and reduces the number of fread calls
> to at most log(N).


Works for me. I'll retest to see how well it does performance-wise and report back.

> (This also fixes a bug in your version, which is that it captured
> the buf.data pointer before any repalloc that might happen.)

Yeah, I saw that after sending this.

Thanks,

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/1/20 5:17 PM, Joe Conway wrote:
> On 7/1/20 4:12 PM, Tom Lane wrote:
>> Joe Conway <mail@joeconway.com> writes:
>>> I did some performance testing of the worst case/largest possible file and found
>>> that skipping the stat and bulk read does cause a significant regression.
>>
>> Yeah, I was wondering a little bit if that'd be an issue.
>>
>>> In the attached patch I was able to get most of the performance degradation back
>>> -- ~600ms. Hopefully you don't think what I did was "too cute by half" :-). Do
>>> you think this is good enough or should we go back to using the stat file size
>>> when it is > 0?
>>
>> I don't think it's unreasonable to "get in bed" with the innards of the
>> StringInfo; plenty of other places do already, such as pqformat.h or
>> pgp_armor_decode, just to name the first couple that I came across in a
>> quick grep.
>>
>> However, if we're going to get in bed with it, let's get all the way in
>> and just read directly into the StringInfo's buffer, as per attached.
>> This saves all the extra memcpy'ing and reduces the number of fread calls
>> to at most log(N).
>
> Works for me. I'll retest to see how well it does performance-wise and report back.

A quick test shows that this gets performance back on par with HEAD.

The only downside is that the max filesize is reduced to (MaxAllocSize -
MIN_READ_SIZE - 1) compared to MaxAllocSize with the old method.

But anyone pushing that size limit is going to run into other issues anyway. I.e
(on pg11):

8<---------------
select length(pg_read_binary_file('/tmp/rbftest4.bin'));
   length

------------
 1073737726
(1 row)

select pg_read_binary_file('/tmp/rbftest4.bin');
ERROR:  invalid memory alloc request size 2147475455
8<---------------

So probably not worth worrying about.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> The only downside is that the max filesize is reduced to (MaxAllocSize -
> MIN_READ_SIZE - 1) compared to MaxAllocSize with the old method.

Hm, I was expecting that the last successful iteration of
enlargeStringInfo would increase the buffer size to MaxAllocSize,
so that we'd really only be losing one byte (which we can't avoid
if we use stringinfo).  But you're right that it's most likely moot
since later manipulations of such a result would risk hitting overflows.

I marked the CF entry as RFC.

            regards, tom lane



Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/1/20 6:22 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> The only downside is that the max filesize is reduced to (MaxAllocSize -
>> MIN_READ_SIZE - 1) compared to MaxAllocSize with the old method.
>
> Hm, I was expecting that the last successful iteration of
> enlargeStringInfo would increase the buffer size to MaxAllocSize,
> so that we'd really only be losing one byte (which we can't avoid
> if we use stringinfo).  But you're right that it's most likely moot
> since later manipulations of such a result would risk hitting overflows.
>
> I marked the CF entry as RFC.

Sorry to open this can of worms again, but I couldn't get my head past the fact
that reading the entire file would have a different size limit than reading the
exact number of bytes in the file.

So, inspired by what you did (and StringInfo itself) I came up with the
attached. This version performs equivalently to your patch (and HEAD), and
allows files up to and including (MaxAllocSize - VARHDRSZ) -- i.e. exactly the
same as the specified-length case and legacy behavior for the full file read.

But if you object I will just go with your version barring any other opinions.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> On 7/1/20 6:22 PM, Tom Lane wrote:
>> Hm, I was expecting that the last successful iteration of
>> enlargeStringInfo would increase the buffer size to MaxAllocSize,
>> so that we'd really only be losing one byte (which we can't avoid
>> if we use stringinfo).  But you're right that it's most likely moot
>> since later manipulations of such a result would risk hitting overflows.

> Sorry to open this can of worms again, but I couldn't get my head past the fact
> that reading the entire file would have a different size limit than reading the
> exact number of bytes in the file.

Are you sure there actually is any such limit in the other code,
after accounting for the way that stringinfo.c will enlarge its
buffer?  That is, I believe that the limit is MaxAllocSize minus
five bytes, not something less.

> So, inspired by what you did (and StringInfo itself) I came up with the
> attached. This version performs equivalently to your patch (and HEAD), and
> allows files up to and including (MaxAllocSize - VARHDRSZ) -- i.e. exactly the
> same as the specified-length case and legacy behavior for the full file read.

I find this way overcomplicated for what it accomplishes.  In the
real world there's not much difference between MaxAllocSize minus
five and MaxAllocSize minus four.

            regards, tom lane



Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/2/20 3:36 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> On 7/1/20 6:22 PM, Tom Lane wrote:
>>> Hm, I was expecting that the last successful iteration of
>>> enlargeStringInfo would increase the buffer size to MaxAllocSize,
>>> so that we'd really only be losing one byte (which we can't avoid
>>> if we use stringinfo).  But you're right that it's most likely moot
>>> since later manipulations of such a result would risk hitting overflows.
>
>> Sorry to open this can of worms again, but I couldn't get my head past the fact
>> that reading the entire file would have a different size limit than reading the
>> exact number of bytes in the file.
>
> Are you sure there actually is any such limit in the other code,
> after accounting for the way that stringinfo.c will enlarge its
> buffer?  That is, I believe that the limit is MaxAllocSize minus
> five bytes, not something less.
>
>> So, inspired by what you did (and StringInfo itself) I came up with the
>> attached. This version performs equivalently to your patch (and HEAD), and
>> allows files up to and including (MaxAllocSize - VARHDRSZ) -- i.e. exactly the
>> same as the specified-length case and legacy behavior for the full file read.
>
> I find this way overcomplicated for what it accomplishes.  In the
> real world there's not much difference between MaxAllocSize minus
> five and MaxAllocSize minus four.

Ok, so your version was not as bad as I thought.:

ll /tmp/rbftest*.bin
-rw-r--r-- 1 postgres postgres 1073741819 Jul  2 15:48 /tmp/rbftest1.bin
-rw-r--r-- 1 postgres postgres 1073741818 Jul  2 15:47 /tmp/rbftest2.bin
-rw-r--r-- 1 postgres postgres 1073741817 Jul  2 15:53 /tmp/rbftest3.bin

rbftest1.bin == MaxAllocSize - 4
rbftest2.bin == MaxAllocSize - 5
rbftest3.bin == MaxAllocSize - 6

postgres=# select length(pg_read_binary_file('/tmp/rbftest1.bin'));
ERROR:  requested length too large
postgres=# select length(pg_read_binary_file('/tmp/rbftest2.bin'));
ERROR:  requested length too large
postgres=# select length(pg_read_binary_file('/tmp/rbftest3.bin'));
   length
------------
 1073741817

When I saw originally MaxAllocSize - 5 fail I skipped to something smaller by
4096 and it worked. But here I see that the actual max size is MaxAllocSize - 6.
I guess I can live with that.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> When I saw originally MaxAllocSize - 5 fail I skipped to something smaller by
> 4096 and it worked. But here I see that the actual max size is MaxAllocSize - 6.

Huh, I wonder why it's not max - 5.  Probably not worth worrying about,
though.

            regards, tom lane



Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/2/20 4:27 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> When I saw originally MaxAllocSize - 5 fail I skipped to something smaller by
>> 4096 and it worked. But here I see that the actual max size is MaxAllocSize - 6.
>
> Huh, I wonder why it's not max - 5.  Probably not worth worrying about,
> though.

Well this part:

+    rbytes = fread(sbuf.data + sbuf.len, 1,
+       (size_t) (sbuf.maxlen - sbuf.len - 1), file);

could actually be:

+    rbytes = fread(sbuf.data + sbuf.len, 1,
+       (size_t) (sbuf.maxlen - sbuf.len), file);

because there is no actual need to reserve a byte for the trailing null, since
we are not using appendBinaryStringInfo() anymore, and that is where the
trailing NULL gets written.

With that change (and some elog(NOTICE,...) calls) we have:

select length(pg_read_binary_file('/tmp/rbftest2.bin'));
NOTICE:  loop start - buf max len: 1024; buf len 4
NOTICE:  loop end - buf max len: 8192; buf len 8192
NOTICE:  loop start - buf max len: 8192; buf len 8192
NOTICE:  loop end - buf max len: 16384; buf len 16384
NOTICE:  loop start - buf max len: 16384; buf len 16384
[...]
NOTICE:  loop end - buf max len: 536870912; buf len 536870912
NOTICE:  loop start - buf max len: 536870912; buf len 536870912
NOTICE:  loop end - buf max len: 1073741823; buf len 1073741822
   length
------------
 1073741818
(1 row)

Or max - 5, so we got our byte back :-)

In fact, in principle there is no reason we can't get to max - 4 with this code
except that when the filesize is exactly 1073741819, we need to try to read one
more byte to find the EOF that way I did in my patch. I.e.:

-- use 1073741819 byte file
select length(pg_read_binary_file('/tmp/rbftest1.bin'));
NOTICE:  loop start - buf max len: 1024; buf len 4
NOTICE:  loop end - buf max len: 8192; buf len 8192
NOTICE:  loop start - buf max len: 8192; buf len 8192
NOTICE:  loop end - buf max len: 16384; buf len 16384
NOTICE:  loop start - buf max len: 16384; buf len 16384
[...]
NOTICE:  loop end - buf max len: 536870912; buf len 536870912
NOTICE:  loop start - buf max len: 536870912; buf len 536870912
NOTICE:  loop end - buf max len: 1073741823; buf len 1073741823
NOTICE:  loop start - buf max len: 1073741823; buf len 1073741823
ERROR:  requested length too large

Because we read the last byte, but not beyond, EOF is not reached, so on the
next loop iteration we continue and fail on max size rather than exit the loop.

But I am guessing that test in particular was what you thought too complicated
for what it accomplishes?

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> On 7/2/20 4:27 PM, Tom Lane wrote:
>> Huh, I wonder why it's not max - 5.  Probably not worth worrying about,
>> though.

> Well this part:

> +    rbytes = fread(sbuf.data + sbuf.len, 1,
> +       (size_t) (sbuf.maxlen - sbuf.len - 1), file);
> could actually be:
> +    rbytes = fread(sbuf.data + sbuf.len, 1,
> +       (size_t) (sbuf.maxlen - sbuf.len), file);
> because there is no actual need to reserve a byte for the trailing null, since
> we are not using appendBinaryStringInfo() anymore, and that is where the
> trailing NULL gets written.

No, I'd put a big -1 on that, because so far as stringinfo.c is concerned
you're violating the invariant that len must be less than maxlen.  The fact
that you happen to not hit any assertions right at the moment does not
make this code okay.

> In fact, in principle there is no reason we can't get to max - 4 with this code
> except that when the filesize is exactly 1073741819, we need to try to read one
> more byte to find the EOF that way I did in my patch. I.e.:

Ah, right, *that* is where the extra byte is lost: we need a buffer
workspace one byte more than the file size, or we won't ever actually
see the EOF indication.

I still can't get excited about contorting the code to remove that
issue.

            regards, tom lane



Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/2/20 5:37 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> In fact, in principle there is no reason we can't get to max - 4 with this code
>> except that when the filesize is exactly 1073741819, we need to try to read one
>> more byte to find the EOF that way I did in my patch. I.e.:
>
> Ah, right, *that* is where the extra byte is lost: we need a buffer
> workspace one byte more than the file size, or we won't ever actually
> see the EOF indication.
>
> I still can't get excited about contorting the code to remove that
> issue.

It doesn't seem much worse than the oom test that was there before -- see attached.

In any case I will give you the last word and then quit bugging you about it ;-)

Are we in agreement that whatever gets pushed should be backpatched through pg11
(see start of thread)?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment

Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> On 7/2/20 5:37 PM, Tom Lane wrote:
>> I still can't get excited about contorting the code to remove that
>> issue.

> It doesn't seem much worse than the oom test that was there before -- see attached.

Personally I would not bother, but it's your patch.

> Are we in agreement that whatever gets pushed should be backpatched through pg11
> (see start of thread)?

OK by me.

            regards, tom lane



Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/2/20 6:29 PM, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> On 7/2/20 5:37 PM, Tom Lane wrote:
>>> I still can't get excited about contorting the code to remove that
>>> issue.
>
>> It doesn't seem much worse than the oom test that was there before -- see attached.
>
> Personally I would not bother, but it's your patch.

Thanks, committed that way, ...

>> Are we in agreement that whatever gets pushed should be backpatched through pg11
>> (see start of thread)?
>
> OK by me.

... and backpatched to v11.

I changed the new error message to "file length too large" instead of "requested
length too large" since that seems more descriptive of what is actually
happening there. I also changed the corresponding error code to match the one
enlargeStringInfo() would have used because I thought it was more apropos.

Thanks for all the help with this!

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: pg_read_file() with virtual files returns empty string

From
Justin Pryzby
Date:
Hi Joe

Thanks for addressing this.

But I noticed that cfbot is now populating with failures like:

https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/704898559
genfile.c: In function ‘read_binary_file’:
genfile.c:192:5: error: ignoring return value of ‘fread’, declared with attribute warn_unused_result
[-Werror=unused-result]
     fread(rbuf, 1, 1, file);
     ^
cc1: all warnings being treated as errors
<builtin>: recipe for target 'genfile.o' failed

-- 
Justin



Re: pg_read_file() with virtual files returns empty string

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> But I noticed that cfbot is now populating with failures like:

> genfile.c: In function ‘read_binary_file’:
> genfile.c:192:5: error: ignoring return value of ‘fread’, declared with attribute warn_unused_result
[-Werror=unused-result]
>      fread(rbuf, 1, 1, file);
>      ^

Yeah, some of the pickier buildfarm members (eg spurfowl) are showing
that as a warning, too.  Maybe make it like

                if (fread(rbuf, 1, 1, file) != 0 || !feof(file))
                    ereport(ERROR,

Probably the feof test is redundant this way, but I'd be inclined to
leave it in anyhow.

            regards, tom lane



Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/4/20 12:52 PM, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
>> But I noticed that cfbot is now populating with failures like:
>
>> genfile.c: In function ‘read_binary_file’:
>> genfile.c:192:5: error: ignoring return value of ‘fread’, declared with attribute warn_unused_result
[-Werror=unused-result]
>>      fread(rbuf, 1, 1, file);
>>      ^
>
> Yeah, some of the pickier buildfarm members (eg spurfowl) are showing
> that as a warning, too.  Maybe make it like
>
>                 if (fread(rbuf, 1, 1, file) != 0 || !feof(file))
>                     ereport(ERROR,
>
> Probably the feof test is redundant this way, but I'd be inclined to
> leave it in anyhow.

Ok, will fix. Thanks for the heads up.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment

Re: pg_read_file() with virtual files returns empty string

From
Joe Conway
Date:
On 7/4/20 1:10 PM, Joe Conway wrote:
> On 7/4/20 12:52 PM, Tom Lane wrote:
>> Justin Pryzby <pryzby@telsasoft.com> writes:
>>> But I noticed that cfbot is now populating with failures like:
>>
>>> genfile.c: In function ‘read_binary_file’:
>>> genfile.c:192:5: error: ignoring return value of ‘fread’, declared with attribute warn_unused_result
[-Werror=unused-result]
>>>      fread(rbuf, 1, 1, file);
>>>      ^
>>
>> Yeah, some of the pickier buildfarm members (eg spurfowl) are showing
>> that as a warning, too.  Maybe make it like
>>
>>                 if (fread(rbuf, 1, 1, file) != 0 || !feof(file))
>>                     ereport(ERROR,
>>
>> Probably the feof test is redundant this way, but I'd be inclined to
>> leave it in anyhow.
>
> Ok, will fix. Thanks for the heads up.

And pushed -- thanks!

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Attachment