Re: custom function for converting human readable sizes to bytes - Mailing list pgsql-hackers
From | Vitaly Burovoy |
---|---|
Subject | Re: custom function for converting human readable sizes to bytes |
Date | |
Msg-id | CAKOSWNn=p8GX-P5Y-B4t4dg-aAHaTup+CrG41hQ9BeobZwX3KQ@mail.gmail.com Whole thread Raw |
In response to | Re: custom function for converting human readable sizes to bytes (Pavel Stehule <pavel.stehule@gmail.com>) |
Responses |
Re: custom function for converting human readable sizes to bytes
(Pavel Stehule <pavel.stehule@gmail.com>)
Re: custom function for converting human readable sizes to bytes (Pavel Stehule <pavel.stehule@gmail.com>) |
List | pgsql-hackers |
On 1/20/16, Pavel Stehule <pavel.stehule@gmail.com> wrote: > ... > New version is attached > > Regards > Pavel Hello! 1. Why the function is marked as VOLATILE? It depends only on input value, it does change nothing in the DB. I guess it should be IMMUTABLE for efficient caching its result. 2. > text *arg = PG_GETARG_TEXT_PP(0); ... > str = text_to_cstring(arg); > Hmm... My question was _why_ you get TEXT and convert it instead of getting an argument of type cstring (not about usage of VARSIZE_ANY_EXHDR). It was enough to give me a link[1] and leave the usage of VARSIZE_ANY_EXHDR as is. I think all the other claims below are mostly cosmetic. Main behavior is OK (considering its usage will not be in heavy queries). === Documentation: Besides currently added row in a table I think it is worth to add detailed comment after a block "<function>pg_size_pretty</> can be used" similar to (but the best way is to get advice for a correct phrase): "pg_size_bytes can be used to get a size in bytes as bigint from a human-readable format for a comparison purposes (it is the opposite to pg_size_pretty function). The format is numeric with an optional size unit: kB, MB, GB or TB." (and delete unit sizes from the table row). === Tests: Since there was a letter with an explanation why units bigger than "TB" can't be used[2], it is a good reason to add that size units ("PB", "EB", "ZB") in tests with a note to update the documentation part when that unit are supported. === Code style: 1. When you declare pointers, you must align _names_ by the left border, i.e. asterisks must be at one position to the left from the aligned names, i.e. one to three spaces instead of TAB before them. 2. > int multiplier; One more TAB to align it with the name at the next line. === Code: > /* allow whitespace between integer and unit */ May be "numeric" instead of "integer"? > "\"%s\" is not in expected format" It is not clear what format is expected. > /* ignore plus symbol */ It seems it is not ignored, but copied... ;-) > to ger hintstr s/ger/get/ > Elsewhere is used space trimmed buffer. I'd write it as "Otherwise trimmed value is used." I'm not good at English, but that full block looks little odd for me. I tried to reword it in the attachment single_buffer.c, but I don't think it is enough. > We allow ending spaces. "trailing"? > but this message can be little bit no intuitive - better text is "is not a valid number" It was a block with a comment "don't allow empty string", i.e. absence of a number... > Automatic memory deallocation doesn't cover all possible situations where > the function can be used - for example DirectFunctionCall - so explicit > deallocation can descrease a memory requirements when you call these > functions from C. DirectFunctionCall uses a memory context of the caller. For example, you don't free "num" allocated by numeric_in and numeric_mul, also mul_num allocated by int8_numeric... I'd ask experienced hackers for importance of freeing (or leaving) "str" and "buffer". > postgres=# select pg_size_bytes('+912+ kB'); > ERROR: invalid unit: "+ kB" > This is difficult - you have to divide string to two parts and first world is number, second world is unit. Your code looks like a duplicated (not by lines, but by behavior). numeric_in has similar checks, let it do them for you. The goal of your function is just split mantissa and size unit, and if the last one is found, turn it into an exponent. See my example in the attachment check_mantissa_by_numeric_in.c. There is just searching non-space char that can't be part of numeric. Then all before it passes to numeric_in and let it check anything it should. I even left first spaces to show full numeric part to a user if an error occurs (for any case). The second attachment single_buffer.c is an attempt to avoid allocating the second buffer (the first is "str" allocated by text_to_cstring) and copying into it. I don't think it gives a big improvement, but nevertheless. === [1] http://www.postgresql.org/message-id/29618.1451882238@sss.pgh.pa.us [2] http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=DHB3O0Pc-nX1v3xJSzKN3z9KBeXgcQTRg@mail.gmail.com -- Best regards, Vitaly Burovoy
Attachment
pgsql-hackers by date: