Thread: When does Postgres use binary I/O?

When does Postgres use binary I/O?

From
Paul A Jungwirth
Date:
Hello,

I've read the docs at [1] and also this interesting recent post about
adding binary I/O to the hashtypes extension. I wrote send & recv
functions for my new multirange types, but I'm not sure how to test
them. After running installcheck or installcheck-world, the code
coverage report says they aren't tested, nor are the send/recv
functions for ranges or arrays. When does Postgres actually use these
functions? Is there a good way to test them?

Thanks,
Paul

[1] https://www.postgresql.org/docs/11/xtypes.html
[2] http://www.myrkraverk.com/blog/2019/08/postgresql-retroactively-adding-binary-i-o-to-a-type/



Re: When does Postgres use binary I/O?

From
Tom Lane
Date:
Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> I've read the docs at [1] and also this interesting recent post about
> adding binary I/O to the hashtypes extension. I wrote send & recv
> functions for my new multirange types, but I'm not sure how to test
> them. After running installcheck or installcheck-world, the code
> coverage report says they aren't tested, nor are the send/recv
> functions for ranges or arrays. When does Postgres actually use these
> functions? Is there a good way to test them?

The core regression tests don't systematically exercise binary I/O,
and they certainly wouldn't magically cover a new type they didn't
use to.  You'd need to add test case(s).

Likely it'd be good to have some more consistent approach to
testing that ... right now it's not even very obvious where
is a good place to add such tests.  I do see a small test in
src/test/regress/input/misc.source (COPY BINARY stud_emp),
but that solution doesn't scale easily because of its
dependence on absolute file pathnames.

            regards, tom lane



Re: When does Postgres use binary I/O?

From
Paul Jungwirth
Date:
On 9/18/19 7:26 AM, Tom Lane wrote:
> The core regression tests don't systematically exercise binary I/O,
> and they certainly wouldn't magically cover a new type they didn't
> use to.  You'd need to add test case(s).

Thanks! I thought psql might use binary IO internally when available, or 
at least pg_dump with its custom format, but it looks like neither of 
those are true---and now that I think about it I suppose the code in adt 
isn't something they would be able to use, so it makes sense.

So I guess my choices are to use COPY WITH (FORMAT BINARY) or write 
something that calls PQexecParams. Does anything else use binary I/O?

> Likely it'd be good to have some more consistent approach to
> testing that ... right now it's not even very obvious where
> is a good place to add such tests.  I do see a small test in
> src/test/regress/input/misc.source (COPY BINARY stud_emp),
> but that solution doesn't scale easily because of its
> dependence on absolute file pathnames.

I'm not sure how hard this would be, but we could add a new schema to 
the regression db and then do something like this:

psql -d regression -c 'COPY public.t TO STDOUT WITH (FORMAT BINARY)' \
   | psql -d regression -c 'COPY binaryio.t FROM STDIN WITH (FORMAT BINARY)'

and then verify that `SELECT *` looks the same for the two tables. I can 
think of several complications already though. :-)

But even if that's not an automated test at least it gives me an easy 
way to manually exercise my own multirange_{send,recv} functions.

Thanks!

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com



Re: When does Postgres use binary I/O?

From
Tom Lane
Date:
Paul Jungwirth <pj@illuminatedcomputing.com> writes:
> On 9/18/19 7:26 AM, Tom Lane wrote:
>> Likely it'd be good to have some more consistent approach to
>> testing that ... right now it's not even very obvious where
>> is a good place to add such tests.  I do see a small test in
>> src/test/regress/input/misc.source (COPY BINARY stud_emp),
>> but that solution doesn't scale easily because of its
>> dependence on absolute file pathnames.

> I'm not sure how hard this would be, but we could add a new schema to 
> the regression db and then do something like this:

> psql -d regression -c 'COPY public.t TO STDOUT WITH (FORMAT BINARY)' \
>    | psql -d regression -c 'COPY binaryio.t FROM STDIN WITH (FORMAT BINARY)'

> and then verify that `SELECT *` looks the same for the two tables. I can 
> think of several complications already though. :-)

Yeah, that's more or less what the existing test case does, but only
for a rather small set of types.

The major thing that this style of test misses, IMO, is that it
proves nothing at all about the stability of the binary representation.
In particular, it'd be almost trivially easy for someone to screw up
the expectation of endianness independence for the binary format.
(I live in fear that we will, or have already, shipped such a bug
with nobody noticing.)

To address that, I think we'd not only want to check the round-trip
behavior, but also compare the binary output to a "known good"
reference file.  If we messed up on endianness independence, the
bigendian members of the buildfarm would find it immediately.

            regards, tom lane



Re: When does Postgres use binary I/O?

From
Tom Lane
Date:
I wrote:
> The major thing that this style of test misses, IMO, is that it
> proves nothing at all about the stability of the binary representation.
> In particular, it'd be almost trivially easy for someone to screw up
> the expectation of endianness independence for the binary format.

> To address that, I think we'd not only want to check the round-trip
> behavior, but also compare the binary output to a "known good"
> reference file.  If we messed up on endianness independence, the
> bigendian members of the buildfarm would find it immediately.

I thought of an easier-to-maintain approach to that part than having
a reference file.  Binary send functions are invokable from SQL,
so we could just imagine adding test cases along the lines of

regression=# select int8send(42);
      int8send      
--------------------
 \x000000000000002a
(1 row)

for each data type.  This would be enough to detect endianness issues,
garbage in padding bytes, etc.

The receive functions are not so easy to call, so we still need a
round-trip test, but you could imagine a TAP test framework for that.
Or, perhaps, the thing to do would be to provide a generic test function
that takes a value, runs it through the type's send and then receive
functions, and returns the result (or just complains if it gets different
bits out ...)

            regards, tom lane



Re: When does Postgres use binary I/O?

From
Paul A Jungwirth
Date:
On Sun, Sep 22, 2019 at 11:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I thought of an easier-to-maintain approach to that part than having
> a reference file.  Binary send functions are invokable from SQL,
> so we could just imagine adding test cases along the lines of
>
> regression=# select int8send(42);
>       int8send
> --------------------
>  \x000000000000002a
> (1 row)
>
> for each data type.  This would be enough to detect endianness issues,
> garbage in padding bytes, etc.

I just finished my multirange patch (well, "finished" :-), so I might
be willing to sign up for this. Would you scatter these tests around
in the various existing files? Or add a new cross-cutting file (like
opr_sanity etc)?

> The receive functions are not so easy to call, so we still need a
> round-trip test, but you could imagine a TAP test framework for that.
> Or, perhaps, the thing to do would be to provide a generic test function
> that takes a value, runs it through the type's send and then receive
> functions, and returns the result (or just complains if it gets different
> bits out ...)

So you're saying the latter option is to add a new function that
someone can call from SQL, that just round-trips a value through send
+ recv? And then call that from an ordinary regress test? I guess the
tests themselves can't define the function (like they define
binary_coercible), because you need to call *_recv from C, so this
would actually be a function we ship and document, right? That seems
within my abilities.

Should I move this thread over to pgsql-hackers for this?

Paul



Re: When does Postgres use binary I/O?

From
Tom Lane
Date:
Paul A Jungwirth <pj@illuminatedcomputing.com> writes:
> On Sun, Sep 22, 2019 at 11:53 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I thought of an easier-to-maintain approach to that part than having
>> a reference file.

> I just finished my multirange patch (well, "finished" :-), so I might
> be willing to sign up for this. Would you scatter these tests around
> in the various existing files? Or add a new cross-cutting file (like
> opr_sanity etc)?

I think adding them to the existing datatype-specific regression tests
is probably the way to go.  It seems like it'd be more likely that
someone writing a new datatype would emulate one of those test scripts
than that they'd notice they ought to add a section to some other
script.

> So you're saying the latter option is to add a new function that
> someone can call from SQL, that just round-trips a value through send
> + recv? And then call that from an ordinary regress test?

Yeah, something roughly like "send_recv_round_trip(any) returns bool",
I guess.

> I guess the
> tests themselves can't define the function (like they define
> binary_coercible), because you need to call *_recv from C, so this
> would actually be a function we ship and document, right? That seems
> within my abilities.

I'm not sure we'd want to expose it as a generally available function.
One idea is to put it in regress.c, although most of the functions
in there today are not created till create_function_1.sql which runs
too late to be useful for this.  Maybe it's okay as a core function.

> Should I move this thread over to pgsql-hackers for this?

Yeah.

            regards, tom lane