Thread: [PATCH] Add get_bytes() and set_bytes() functions

[PATCH] Add get_bytes() and set_bytes() functions

From

Aleksander Alekseev

Date:

14 August 2024, 11:01:22

Hi,

While discussing another patch [1] it was discovered that we don't
have a convenient way of casting a bytea to an integer / bigint and
vice versa, extracting integers larger than one byte from byteas, etc.

For instance, casting '\x11223344' to 0x11223344 may look like this:

```
WITH vals AS (
      SELECT '\x11223344'::bytea AS x
)
SELECT
    (get_byte(x, 0) :: bigint << 24) |
    (get_byte(x, 1) << 16) |
    (get_byte(x, 2) << 8) |
    get_byte(x, 3)
FROM vals;
```

There seems to be a demand for this functionality [2][3] and it costs
us nothing to maintain it, so I propose adding it.

The proposed patch adds get_bytes() and set_bytes() functions. The
semantics is similar to get_byte() and set_byte() we already have but
the functions operate with bigints rather than bytes and the user can
specify the size of the integer. This allows working with int2s,
int4s, int8s or even int5s if needed.

Examples:

```
SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
 ?column?
----------
 t

SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
     set_bytes
--------------------
 \x11aabb4455667788
```

Thoughts?

[1]: https://postgr.es/m/CAJ7c6TNMTGnqnG%3DyXXUQh9E88JDckmR45H2Q%2B%3DucaCLMOW1QQw%40mail.gmail.com
[2]: https://stackoverflow.com/questions/32944267/postgresql-converting-bytea-to-bigint
[3]: https://postgr.es/m/AANLkTikip9xs8iXc8e%2BMgz1T1701i8Xk6QtbVB3KJQzX%40mail.gmail.com
-- 
Best regards,
Aleksander Alekseev

Attachment

v1-0001-Add-get_bytes-and-set_bytes-functions.patch

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

"Joel Jacobson"

Date:

14 August 2024, 11:21:49

On Wed, Aug 14, 2024, at 13:01, Aleksander Alekseev wrote:
> The proposed patch adds get_bytes() and set_bytes() functions. The
> semantics is similar to get_byte() and set_byte() we already have but
> the functions operate with bigints rather than bytes and the user can
> specify the size of the integer. This allows working with int2s,
> int4s, int8s or even int5s if needed.

+1

I wanted this myself many times.

I wonder if get_bytes() and set_bytes() will behave differently
on little-endian vs big-endian systems?

If so, then I think it would be nice to enforce a consistent byte order
(e.g., big-endian), to ensure consistent behavior across platforms.

Regards,
Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Aleksander Alekseev

Date:

14 August 2024, 11:31:31

Hi,

> +1
>
> I wanted this myself many times.
>
> I wonder if get_bytes() and set_bytes() will behave differently
> on little-endian vs big-endian systems?
>
> If so, then I think it would be nice to enforce a consistent byte order
> (e.g., big-endian), to ensure consistent behavior across platforms.

No, the returned value will not depend on the CPU endiness. Current
implementation uses big-endian / network order which in my humble
opinion is what most users would expect.

I believe we also need reverse(bytea) and repeat(bytea, integer)
functions e.g. for those who want little-endian. However I want to
propose them separately when we are done with this patch.

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

"Joel Jacobson"

Date:

14 August 2024, 15:34:06

On Wed, Aug 14, 2024, at 13:31, Aleksander Alekseev wrote:
>> I wonder if get_bytes() and set_bytes() will behave differently
>> on little-endian vs big-endian systems?
> No, the returned value will not depend on the CPU endiness. Current
> implementation uses big-endian / network order which in my humble
> opinion is what most users would expect.

Nice.

I've reviewed and tested the patch.
It looks straight-forward to me.
I don't see any potential problems.
I've marked it Ready for Committer.

> I believe we also need reverse(bytea) and repeat(bytea, integer)
> functions e.g. for those who want little-endian. However I want to
> propose them separately when we are done with this patch.

I agree those functions would be nice too.

I also think it would be nice to provide these convenience functions:
to_bytes(bigint) -> bytea
from_bytes(bytea) -> bigint

Since if not having a current bytea value,
and just wanting to convert a bigint to bytea,
then one would need to construct an zeroed bytea
of the proper size first, to then use set_bytes().

And if just wanting to convert the entire bytea to a bigint,
then one would need to pass 0 as offset and the length
of the bytea as size.

Regards,

Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

David Fetter

Date:

14 August 2024, 17:43:51

On Wed, Aug 14, 2024 at 02:34:06PM +0200, Joel Jacobson wrote:
> On Wed, Aug 14, 2024, at 13:31, Aleksander Alekseev wrote:
> >> I wonder if get_bytes() and set_bytes() will behave differently
> >> on little-endian vs big-endian systems?
> > No, the returned value will not depend on the CPU endiness. Current
> > implementation uses big-endian / network order which in my humble
> > opinion is what most users would expect.
> 
> Nice.

Indeed!

> I've reviewed and tested the patch.
> It looks straight-forward to me.
> I don't see any potential problems.
> I've marked it Ready for Committer.
> 
> > I believe we also need reverse(bytea) and repeat(bytea, integer)
> > functions e.g. for those who want little-endian. However I want to
> > propose them separately when we are done with this patch.
> 
> I agree those functions would be nice too.
> 
> I also think it would be nice to provide these convenience functions:
> to_bytes(bigint) -> bytea
> from_bytes(bytea) -> bigint

Along with these, would it make sense to have other forms of these
that won't choke at 63 bits, e.g. NUMERIC or TEXT?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

"Joel Jacobson"

Date:

14 August 2024, 18:39:32

On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:
>> I also think it would be nice to provide these convenience functions:
>> to_bytes(bigint) -> bytea
>> from_bytes(bytea) -> bigint
>
> Along with these, would it make sense to have other forms of these
> that won't choke at 63 bits, e.g. NUMERIC or TEXT?

I wonder what would be good names for such functions though?

Since NUMERIC can have decimal digits, and since BYTEA can already be casted to
TEXT, which will just be \x followed by the hex digits, maybe such names should
include "integer" or some other word, to indicate what is being returned?

It's already quite easy to convert to NUMERIC though,
for users who are aware of tricks like this:
SELECT ('0x'||encode('\xCAFEBABEDEADBEEF'::bytea,'hex'))::numeric;
       numeric
----------------------
 14627333968688430831
(1 row)

But, I think it would be better to provide functions,
since many users probably have to google+stackoverflow or gpt
to learn such tricks, which are not in the official documentation.

Regards,
Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

David Fetter

Date:

14 August 2024, 19:31:44

On Wed, Aug 14, 2024 at 05:39:32PM +0200, Joel Jacobson wrote:
> On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:
> >> I also think it would be nice to provide these convenience functions:
> >> to_bytes(bigint) -> bytea
> >> from_bytes(bytea) -> bigint
> >
> > Along with these, would it make sense to have other forms of these
> > that won't choke at 63 bits, e.g. NUMERIC or TEXT?
> 
> I wonder what would be good names for such functions though?

decimal_to_bytes(numeric), decimal_to_bytes(text) on one side
decimal_from_bytes(bytea, typeoid)

Naming Things™ is one of the hard problems in computer science. Bad
joke that includes cache coherency and off-by-one included by
reference.

> Since NUMERIC can have decimal digits, and since BYTEA can already be casted to
> TEXT, which will just be \x followed by the hex digits, maybe such names should
> include "integer" or some other word, to indicate what is being returned?
> 
> It's already quite easy to convert to NUMERIC though,
> for users who are aware of tricks like this:
> SELECT ('0x'||encode('\xCAFEBABEDEADBEEF'::bytea,'hex'))::numeric;
>        numeric
> ----------------------
>  14627333968688430831
> (1 row)
> 
> But, I think it would be better to provide functions,
> since many users probably have to google+stackoverflow or gpt
> to learn such tricks, which are not in the official documentation.

As usual, I see "official documentation lacks helpful and/or
non-obvious examples" as a problem best approached by making good the
lack. I am aware that my ideas about pedagogy, documentation, etc. are
not shared universally, but they're widely shared by people whose main
interaction with documents is trying to get help from them.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

"Joel Jacobson"

Date:

14 August 2024, 20:25:51

On Wed, Aug 14, 2024, at 18:31, David Fetter wrote:
> On Wed, Aug 14, 2024 at 05:39:32PM +0200, Joel Jacobson wrote:
>> On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:
>> >> I also think it would be nice to provide these convenience functions:
>> >> to_bytes(bigint) -> bytea
>> >> from_bytes(bytea) -> bigint
>> >
>> > Along with these, would it make sense to have other forms of these
>> > that won't choke at 63 bits, e.g. NUMERIC or TEXT?
>>
>> I wonder what would be good names for such functions though?
>
> decimal_to_bytes(numeric), decimal_to_bytes(text) on one side
> decimal_from_bytes(bytea, typeoid)

I assume decimal_to_bytes() will only accept integer numerics,
that is, that don't have a decimal digits part?
Hmm, it's perhaps then a bit counter intuitive that the name
contains "decimal", since some people might associate the word
"decimal" stronger with "decimal digits" rather than the radix/base 10.

What do we want to happen if passing a numeric with decimal digits,
to decimal_to_bytes()? It must be an error, right?

Example: SELECT decimal_to_bytes(1.23);

> Naming Things™ is one of the hard problems in computer science. Bad
> joke that includes cache coherency and off-by-one included by
> reference.

So true :)

> As usual, I see "official documentation lacks helpful and/or
> non-obvious examples" as a problem best approached by making good the
> lack. I am aware that my ideas about pedagogy, documentation, etc. are
> not shared universally, but they're widely shared by people whose main
> interaction with documents is trying to get help from them.

Well spoken.

Regards,
Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

"Joel Jacobson"

Date:

15 August 2024, 07:19:36

On Wed, Aug 14, 2024, at 19:25, Joel Jacobson wrote:
> What do we want to happen if passing a numeric with decimal digits,
> to decimal_to_bytes()? It must be an error, right?
>
> Example: SELECT decimal_to_bytes(1.23);

Hmm, an error feels quite ugly on second thought.
Would be nicer if all numerics could be represented,
in a bytea representation that is meaningful to other systems.

I think we need a tuple somehow.

Example:

SELECT numeric_to_bytes(223195403574957);
(\xcafebabedeadbeef,0,false)

SELECT numeric_to_bytes(-223195403574957);
(\xcafebabedeadbeef,0,true)

SELECT numeric_to_bytes(2231954035749.57);
(\xcafebabedeadbeef,2,false)

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,0,false);
223195403574957

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,0,true);
-223195403574957

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,2,false);
2231954035749.57

But then what about Inf,-Inf,NaN?

Regards,
Joel

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Dean Rasheed

Date:

15 August 2024, 13:14:55

On Thu, 15 Aug 2024 at 05:20, Joel Jacobson <joel@compiler.org> wrote:
>
> On Wed, Aug 14, 2024, at 19:25, Joel Jacobson wrote:
> > What do we want to happen if passing a numeric with decimal digits,
> > to decimal_to_bytes()? It must be an error, right?
> >
> > Example: SELECT decimal_to_bytes(1.23);
>
> Hmm, an error feels quite ugly on second thought.
> Would be nicer if all numerics could be represented,
>
> But then what about Inf,-Inf,NaN?
>

Perhaps we should also add casts between bytea and the integer/numeric
types. That might be easier to use than functions in some
circumstances.

When casting a numeric to an integer, the result is rounded to the
nearest integer, and NaN/Inf generate errors, so we should probably do
the same here.

Regards,
Dean

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Aleksander Alekseev

Date:

15 August 2024, 13:58:03

Hi,

> Perhaps we should also add casts between bytea and the integer/numeric
> types. That might be easier to use than functions in some
> circumstances.
>
> When casting a numeric to an integer, the result is rounded to the
> nearest integer, and NaN/Inf generate errors, so we should probably do
> the same here.

Yes, I was also thinking about adding NUMERIC versions of get_bytes()
/ set_bytes(). This would allow converting more than 8 bytes to/from
an integer. I dropped this idea because I thought there would be not
much practical use for it. On the flip side you never know who uses
Postgres and for what purpose.

I will add corresponding casts unless the idea will get a push-back
from the community. IMO the existence of these casts will at least not
make things worse.

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Yugo Nagata

Date:

16 August 2024, 10:49:36

On Thu, 15 Aug 2024 13:58:03 +0300
Aleksander Alekseev <aleksander@timescale.com> wrote:

> Hi,
> 
> > Perhaps we should also add casts between bytea and the integer/numeric
> > types. That might be easier to use than functions in some
> > circumstances.
> >
> > When casting a numeric to an integer, the result is rounded to the
> > nearest integer, and NaN/Inf generate errors, so we should probably do
> > the same here.
> 
> Yes, I was also thinking about adding NUMERIC versions of get_bytes()
> / set_bytes(). This would allow converting more than 8 bytes to/from
> an integer. I dropped this idea because I thought there would be not
> much practical use for it. On the flip side you never know who uses
> Postgres and for what purpose.
> 
> I will add corresponding casts unless the idea will get a push-back
> from the community. IMO the existence of these casts will at least not
> make things worse.

When we add such casts between bytea and the integer/numeric types,
one of the problems mentioned the first of the thread, that is, 
"we don't have a convenient way of casting a bytea to an integer / bigint
and vice versa", would seem be resolved. 

On the other hand, I suppose get_bytes() and set_bytes() are still useful
for extracting bytes from byteas, etc. If casting is no longer the main
purpose of these functions, are variations that get_bytes returns bytea
instead of bigint, and set_bytes receives bytea as the newvalue argument
useful? I wonder it would eliminate the restrict that size cannot be larger
than 8.

Here are my very trivial comments on the patch.

+ * this routine treats "bytea" as an array of bytes.

Maybe, the sentence should start with "This ... ".

+    while(size)
+    {

I wonder inserting a space after "while" is the standard style.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Peter Eisentraut

Date:

16 August 2024, 13:11:18

On 14.08.24 13:01, Aleksander Alekseev wrote:
> The proposed patch adds get_bytes() and set_bytes() functions. The
> semantics is similar to get_byte() and set_byte() we already have but
> the functions operate with bigints rather than bytes and the user can
> specify the size of the integer. This allows working with int2s,
> int4s, int8s or even int5s if needed.
> 
> Examples:
> 
> ```
> SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
>   ?column?
> ----------
>   t
> 
> SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
>       set_bytes
> --------------------
>   \x11aabb4455667788
> ```

I think these functions do about three things at once, and I don't think 
they address the originally requested purpose very well.

Converting between integers and byte arrays of matching size seems like 
reasonable functionality.  (You can already do one half of that by 
calling int2send(), int4send(), and int8send(), but the other direction 
(intXrecv()) is not user-callable).

The other things are extracting that byte array from a larger byte array 
and sticking it back into a larger byte array; those seem like separate 
operations.  There is already substr() for bytea for the first part, and 
there might be another string-like operationg for the second part, or 
maybe we could add one.

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Yugo Nagata

Date:

16 August 2024, 13:32:24

On Fri, 16 Aug 2024 11:41:37 +0300
Aleksander Alekseev <aleksander@timescale.com> wrote:

> Hi,
> 
> > When we add such casts between bytea and the integer/numeric types,
> > one of the problems mentioned the first of the thread, that is,
> > "we don't have a convenient way of casting a bytea to an integer / bigint
> > and vice versa", would seem be resolved.
> >
> > On the other hand, I suppose get_bytes() and set_bytes() are still useful
> > for extracting bytes from byteas, etc. If casting is no longer the main
> > purpose of these functions, are variations that get_bytes returns bytea
> > instead of bigint, and set_bytes receives bytea as the newvalue argument
> > useful? I wonder it would eliminate the restrict that size cannot be larger
> > than 8.
> 
> No, casting between bytea and numeric will not replace get_bytes() /
> set_bytes() for performance reasons.
> 
> Consider the case when you want to extract an int4 from a bytea.
> get_bytes() is going to be very fast while substr() -> ::numeric ->
> ::integer chain will require unnecessary copying and conversions.
> Casting between bytea and numeric is only useful when one has to deal
> with integers larger than 8 bytes. Whether this happens often is a
> debatable question.

Thank you for explanation. I understood the performance drawback.

I supposed interfaces similar to lo_get, lo_put, loread, lowrite of
large objects since they might be useful to access or modify a part of
bytea like a binary file read by pg_read_binary_file. 

> 
> > Here are my very trivial comments on the patch.
> >
> > + * this routine treats "bytea" as an array of bytes.
> >
> > Maybe, the sentence should start with "This ... ".
> >
> > +       while(size)
> > +       {
> >
> > I wonder inserting a space after "while" is the standard style.
> 
> Thanks, fixed.

Should we fix the comment on byteaGetByte in passing, too?

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Aleksander Alekseev

Date:

19 August 2024, 17:10:15

Hi Peter,

Thanks for the feedback.

> I think these functions do about three things at once, and I don't think
> they address the originally requested purpose very well.

The amount of things that the function does is a matter of
interpretation. I can claim that it does one thing ("extracts an
integer from a bytea"), or as many things as there are lines of code.
IMO the actual question is whether this is a good user interface or
not. Since we already have get_byte() / set_byte() the interface is
arguably OK.

> Converting between integers and byte arrays of matching size seems like
> reasonable functionality.  (You can already do one half of that by
> calling int2send(), int4send(), and int8send(), but the other direction
> (intXrecv()) is not user-callable).
>
> The other things are extracting that byte array from a larger byte array
> and sticking it back into a larger byte array; those seem like separate
> operations.  There is already substr() for bytea for the first part, and
> there might be another string-like operationg for the second part, or
> maybe we could add one.

If I understand correctly, you propose doing (1):

```
SELECT substr('\x1122334455667788'::bytea, 2, 2) :: int2;
```

... instead of:

```
SELECT get_bytes('\x1122334455667788'::bytea, 1, 2)
```

... and (2):

```
WITH vals AS (
    SELECT '\x1122334455667788'::bytea AS x
) SELECT substr(x, 1, 1) || int2send(1234::int2) || substr(x, 4, 5) FROM vals;
```

... instead of:

```
SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
```

There is nothing to do for query (2), it already works. It's not much
better than the query from my first email though.

To clarify, supporting bytea<->integer (and/or bytea<->numeric) casts
doesn't strike me as a terrible idea but it doesn't address the issue
I'm proposing to solve.

-- 
Best regards,
Aleksander Alekseev

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Peter Eisentraut

Date:

21 August 2024, 19:39:21

On 19.08.24 16:10, Aleksander Alekseev wrote:
> To clarify, supporting bytea<->integer (and/or bytea<->numeric) casts
> doesn't strike me as a terrible idea but it doesn't address the issue
> I'm proposing to solve.

What is the issue you are proposing to solve?

You linked to a couple of threads and stackoverflow pages, and you 
concluded from that that we should add get_bytes() and set_bytes() 
functions.  It's not obvious how you get from the former to the latter, 
and I don't think the functions you propose are well-designed in isolation.

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Dean Rasheed

Date:

01 March, 14:30:47

On Fri, 24 Jan 2025 at 13:00, Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Thank you. Here is the corrected patch.
>

This looks pretty good to me. I have a just a couple of minor comments:

* The loop in bytea_integer() can be written more simply as a "for"
loop. Given that it's only a few lines of code, it might as well just
be coded directly in each cast function, which avoids the need to go
via a 64-bit integer for every case. In addition, it should use the
BITS_PER_BYTE macro rather than "8". Doing that leads to code that's
consistent with bittoint4() and bittoint8().

* In pg_proc.dat, it's not necessary to write "proleakproof => 'f'",
because that's the default, and no other function does that.

* I think it's worth using slightly more non-trivial doc examples,
including positive and negative cases, to make the behaviour more
obvious.

* I also tweaked the regression tests a bit, and copied the existing
test style which displays both the expected and actual results from
each test.

With those updates, I think this is ready for commit, which I'll do in
a day or two, if there are no further comments.

Regards,
Dean

Attachment

v10-0001-Allow-casting-between-bytea-and-integer-types.patch

Re: [PATCH] Add get_bytes() and set_bytes() functions

From

Dean Rasheed

Date:

07 March, 13:08:41

On Sat, 1 Mar 2025 at 11:30, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> With those updates, I think this is ready for commit, which I'll do in
> a day or two, if there are no further comments.
>

Committed.

Regards,
Dean