Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
Date
Msg-id CAEudQAro-n+HG+=YCjgVcKQ095OXzjPimo9L1YWjdT5EFdVNMg@mail.gmail.com
Whole thread Raw
In response to AW: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode  (Hans Buschmann <buschmann@nidsa.net>)
Responses Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers

Em seg., 16 de ago. de 2021 às 13:19, Hans Buschmann <buschmann@nidsa.net> escreveu:

Von: Ranier Vilela <ranier.vf@gmail.com>
Gesendet: Montag, 16. August 2021 17:04
An: Hans Buschmann
Cc: pgsql-hackers@postgresql.org
Betreff: Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
 
Hello Ranier,

Thank you for your quick response.

>Is always good to remove immutables from loops, if safe and secure.

I think it's  worse because a loop changed variable is involved in the test, so it seems to be not immutable.

>Decode has regression too.

good catch, I overlooked it.

>I think that we can take one more step here.
>pg_hex_decode can be improved too.

+1

By looking a bit more precisely I detected the same checks in common/base64.c also involving loop-changed variables. Could you please make the same changes to base64.c?
I will take a look.



>We can remove limitations from all functions that use hex functions.
>byteaout from (varlena.c) has an artificial limitation to handle only MaxAllocSize length, but
>nothing prevents her from being prepared for that limitation to be removed one day.

+1, but I don't know all implications of the type change to size_t
uint64 and size_t in 64 bits are equivalents.
uint64 and size_t in 32 bits can be weird, but anyway size_t at 32 bits can handle 1GB.
 

>Please, take a look at the new version attached.

Two remarks for decoding (by eyeball inspection of patch file only because of stilstruggling with git etc.):

1. The odd-number-of-digits-check for decoding is still part of the loop. It should be before the loop and called only once (by testing for an even number of srclen)
So we can eliminate the block if (s == srcend) {} inside the loop!
I'm afraid that is not possible.
I tested some variations for this test and regress fails at strings tests.
Anyway for test purposes, I changed it temporarily in this version, but I'm afraid we'll have to go back.


2. I noticed that the error messages for hex_decode should be changed as 

s/encoding/decoding/

(as in pg_log_fatal("overflow of destination buffer in hex encoding");)
Ok. Changed.


>If possible can you review the tests if there is an overflow at
>pg_hex_encode and pg_hex_decode functions?

I would prefer to wait for another patch version from you (my development troubles), perhaps also dealing with base64.c
base64.c can be made in another patch.


I don't know how and where any call to the encoding/decoding functions creates an overflow situation in normal operation.

I  will nonceless try the tests but I don't have any experience with it.
Thanks.


BTW the root cause for my work is an attempt to vastly accelerate these functions (hex and base64 encode/decode), but this is left for another day (not finished yet) and certainly only on master/new development.
I think this can speed up a little.


Lateron support on this topic would be highly appreciated...
If I can help, count on me.

regards,
Ranier Vilela
Attachment

pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: badly calculated width of emoji in psql
Next
From: Álvaro Herrera
Date:
Subject: Re: Autovacuum on partitioned table (autoanalyze)