On 7/1/20 5:17 PM, Joe Conway wrote:
> On 7/1/20 4:12 PM, Tom Lane wrote:
>> Joe Conway <mail@joeconway.com> writes:
>>> I did some performance testing of the worst case/largest possible file and found
>>> that skipping the stat and bulk read does cause a significant regression.
>>
>> Yeah, I was wondering a little bit if that'd be an issue.
>>
>>> In the attached patch I was able to get most of the performance degradation back
>>> -- ~600ms. Hopefully you don't think what I did was "too cute by half" :-). Do
>>> you think this is good enough or should we go back to using the stat file size
>>> when it is > 0?
>>
>> I don't think it's unreasonable to "get in bed" with the innards of the
>> StringInfo; plenty of other places do already, such as pqformat.h or
>> pgp_armor_decode, just to name the first couple that I came across in a
>> quick grep.
>>
>> However, if we're going to get in bed with it, let's get all the way in
>> and just read directly into the StringInfo's buffer, as per attached.
>> This saves all the extra memcpy'ing and reduces the number of fread calls
>> to at most log(N).
>
> Works for me. I'll retest to see how well it does performance-wise and report back.
A quick test shows that this gets performance back on par with HEAD.
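For anyone following along, the heart of the new approach is roughly the
following (my paraphrase of the loop, not the patch verbatim;
read_whole_file is just an illustrative name):

8<---------------
#include "postgres.h"
#include "lib/stringinfo.h"

#include <stdio.h>

/* minimum amount to read per fread call */
#define MIN_READ_SIZE 4096

/*
 * Slurp the rest of 'file' directly into a StringInfo's buffer.
 * enlargeStringInfo() doubles the allocation as needed, so a file of
 * N bytes costs at most ~log2(N) fread calls, and each byte is copied
 * exactly once -- no intermediate buffer, no extra memcpy.
 */
static StringInfo
read_whole_file(FILE *file)
{
    StringInfo  sbuf = makeStringInfo();

    while (!feof(file) && !ferror(file))
    {
        size_t      rbytes;

        /* make sure at least MIN_READ_SIZE more bytes will fit */
        enlargeStringInfo(sbuf, MIN_READ_SIZE);

        /*
         * enlargeStringInfo() rounds the allocation up to a power of 2,
         * so use however much space it actually gave us, reserving one
         * byte for the trailing NUL.
         */
        rbytes = fread(sbuf->data + sbuf->len, 1,
                       (size_t) (sbuf->maxlen - sbuf->len - 1), file);
        sbuf->len += (int) rbytes;
    }
    sbuf->data[sbuf->len] = '\0';

    return sbuf;
}
8<---------------

Note that enlargeStringInfo() errors out rather than grow the buffer once
it can no longer guarantee room for the requested bytes plus the trailing
NUL, which is where the size limit below comes from.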
The only downside is that the maximum file size is reduced to
(MaxAllocSize - MIN_READ_SIZE - 1), compared to MaxAllocSize with the old
method. But anyone pushing that limit is going to run into other issues
anyway, e.g. (on pg11):
8<---------------
select length(pg_read_binary_file('/tmp/rbftest4.bin'));
   length
------------
 1073737726
(1 row)

select pg_read_binary_file('/tmp/rbftest4.bin');
ERROR:  invalid memory alloc request size 2147475455
8<---------------
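(The failing request is presumably byteaout's hex output buffer: with
bytea_output = hex, the text form needs 2 bytes per input byte plus the
"\x" prefix and a terminator, i.e. 2 * 1073737726 + 3 = 2147475455, well
past MaxAllocSize.)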
So probably not worth worrying about.
Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development