Home > mailing lists

Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c) - Mailing list pgsql-hackers

From	Ranier Vilela
Subject	Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c)
Date	September 15, 2020 20:43:56
Msg-id	CAEudQArpwWoA1qzpuC3MX_-0HdQi=qsgOyeU+oh3v8HRfEOULA@mail.gmail.com Whole thread Raw
In response to	Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c) (Ranier Vilela <ranier.vf@gmail.com>)
Responses	Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c)
List	pgsql-hackers

Tree view

Em sex., 11 de set. de 2020 às 18:43, Ranier Vilela <ranier.vf@gmail.com> escreveu:

Em sex., 11 de set. de 2020 às 17:44, Tom Lane <tgl@sss.pgh.pa.us> escreveu:
Ranier Vilela <ranier.vf@gmail.com> writes:
> New version, with support to read Virtual File (pipe, FIFO and socket).
> With assert, in case, erroneous, of trying to read a pipe, with offset.

Really, could you do a little more thinking and testing of your own,
rather than expecting the rest of us to point out holes in your thinking?
Yes, I can.

* bytes_to_read == 0 is a legal case.
Ok. Strange call to read a file with zero bytes.

* "Assert(seek_offset != 0)" is an equally awful idea.
I was trying to predict the case of reading a Virtual File, with bytes_to_read == -1 and seek_offset == 0,
which is bound to fail in fseeko.
In any case, 96d1f423f95d it will fail with any Virtual File.

* Removing the seek code from the negative-bytes_to_read path
is just broken.
Ok.

* The only reason this code is shorter than the previous version is
you took out all the comments explaining what it was doing, and
failed to replace them. This is just as subtle as before, so I
don't find that acceptable.
It was not my intention.

I do agree that it might be worth skipping the fseeko call in the
probably-common case where seek_offset is zero. Otherwise I don't
see much reason to change what's there.
Well, I think that v3 is a little better, but it’s just my opinion.
v3 try to copy directly in the final buffer, which will certainly be a little faster.

v4 patch, simple benchmark, read binary file with size > 6GB.

PostgreSQL main:

postgres=# \timing on
Timing is on.
postgres=# select pg_read_file('c:\users\ranier\downloads\macOS_Catalina.7z');
ERROR: file length too large
Time: 3068,459 ms (00:03,068)
postgres=#

PostgreSQL patched (v4 read_binary_file):

postgres=# \timing on
Timing is on.
postgres=# select pg_read_file('c:\users\ranier\downloads\macOS_Catalina.7z');
ERROR: file length too large
Time: 701,035 ms
postgres=#

4.37x faster, very good.

v4 handles pipes very well.

Tested with https://docs.microsoft.com/en-us/windows/win32/ipc/multithreaded-pipe-server

Terminal one:

C:\usr\src\tests\pipes>pipe_server

Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe

Terminal two:

postgres=# \timing on
Timing is on.
postgres=# select pg_read_file('\\.\pipe\mynamedpipe');

Terminal one:

Client connected, creating a processing thread.

Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe
InstanceThread created, receiving and processing messages.

Pipe Server: Main thread awaiting client connection on \\.\pipe\mynamedpipe
InstanceThread created, receiving and processing messages.
^C
C:\usr\src\tests\pipes>

Terminal two:

postgres=# select pg_read_file('\\.\pipe\mynamedpipe');

pg_read_file
--------------

(1 row)

Time: 77267,481 ms (01:17,267)
postgres=#

regards,

Ranier Vilela

Attachment

v4-0001-simplified_read_binary_file.patch

pgsql-hackers by date:

From: Tom Lane
Date: 15 September 2020, 20:41:02
Subject: Re: pg_restore causing deadlocks on partitioned tables

From: Alvaro Herrera
Date: 15 September 2020, 20:54:18
Subject: Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c)

Re: Simplified version of read_binary_file (src/backend/utils/adt/genfile.c) - Mailing list pgsql-hackers

Attachment

Previous

Next