Home > mailing lists

Re: custom function for converting human readable sizes to bytes - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: custom function for converting human readable sizes to bytes
Date	January 22, 2016 06:49:03
Msg-id	CAFj8pRBNnaPi7Yy7poxo7v5ps_uDrR9134Yj0xWO5Sh0ywztBQ@mail.gmail.com Whole thread Raw
In response to	Re: custom function for converting human readable sizes to bytes (Vitaly Burovoy <vitaly.burovoy@gmail.com>)
List	pgsql-hackers

Tree view

2016-01-22 7:03 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:

On 1/20/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> ...
> New version is attached
>
> Regards
> Pavel

Hello!

1. Why the function is marked as VOLATILE? It depends only on input
value, it does change nothing in the DB.
I guess it should be IMMUTABLE for efficient caching its result.

2.
> text *arg = PG_GETARG_TEXT_PP(0);
...
> str = text_to_cstring(arg);
>
Hmm... My question was _why_ you get TEXT and convert it instead of
getting an argument of type cstring (not about usage of
VARSIZE_ANY_EXHDR).
It was enough to give me a link[1] and leave the usage of
VARSIZE_ANY_EXHDR as is.

I think all the other claims below are mostly cosmetic. Main behavior
is OK (considering its usage will not be in heavy queries).

===
Documentation:
Besides currently added row in a table I think it is worth to add
detailed comment after a block "<function>pg_size_pretty</> can be
used" similar to (but the best way is to get advice for a correct
phrase):
"pg_size_bytes can be used to get a size in bytes as bigint from a
human-readable format for a comparison purposes (it is the opposite to
pg_size_pretty function). The format is numeric with an optional size
unit: kB, MB, GB or TB."
(and delete unit sizes from the table row).

===
Tests:
Since there was a letter with an explanation why units bigger than
"TB" can't be used[2], it is a good reason to add that size units
("PB", "EB", "ZB") in tests with a note to update the documentation
part when that unit are supported.

===
Code style:
1. When you declare pointers, you must align _names_ by the left
border, i.e. asterisks must be at one position to the left from the
aligned names, i.e. one to three spaces instead of TAB before them.

2.
> int multiplier;
One more TAB to align it with the name at the next line.

===
Code:

> /* allow whitespace between integer and unit */
May be "numeric" instead of "integer"?

> "\"%s\" is not in expected format"
It is not clear what format is expected.

> /* ignore plus symbol */
It seems it is not ignored, but copied... ;-)

> to ger hintstr
s/ger/get/
> Elsewhere is used space trimmed buffer.
I'd write it as "Otherwise trimmed value is used."
I'm not good at English, but that full block looks little odd for me.
I tried to reword it in the attachment single_buffer.c, but I don't
think it is enough.

> We allow ending spaces.
"trailing"?

> but this message can be little bit no intuitive - better text is "is not a valid number"
It was a block with a comment "don't allow empty string", i.e. absence
of a number...

> Automatic memory deallocation doesn't cover all possible situations where
> the function can be used - for example DirectFunctionCall - so explicit
> deallocation can descrease a memory requirements when you call these
> functions from C.
DirectFunctionCall uses a memory context of the caller. For example,
you don't free "num" allocated by numeric_in and numeric_mul, also
mul_num allocated by int8_numeric...
I'd ask experienced hackers for importance of freeing (or leaving)
"str" and "buffer".

Sometimes, the memory can be released by caller only - or the memory usage is pretty minimal. But when you can release, memory then you should to do it to decrease push to memory allocator.

> postgres=# select pg_size_bytes('+912+ kB');
> ERROR: invalid unit: "+ kB"
> This is difficult - you have to divide string to two parts and first world is number, second world is unit.
Your code looks like a duplicated (not by lines, but by behavior).
numeric_in has similar checks, let it do them for you. The goal of
your function is just split mantissa and size unit, and if the last
one is found, turn it into an exponent.
See my example in the attachment check_mantissa_by_numeric_in.c. There
is just searching non-space char that can't be part of numeric. Then
all before it passes to numeric_in and let it check anything it
should. I even left first spaces to show full numeric part to a user
if an error occurs (for any case).
The second attachment single_buffer.c is an attempt to avoid
allocating the second buffer (the first is "str" allocated by
text_to_cstring) and copying into it. I don't think it gives a big
improvement, but nevertheless.

This code is little bit more complex than it is necessary (but few lines more) to produce readable error messages. I am thinking so it is correct.

I'll look on this patch next week.

Thank you for review

Regards

Pavel

===
[1] http://www.postgresql.org/message-id/29618.1451882238@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=DHB3O0Pc-nX1v3xJSzKN3z9KBeXgcQTRg@mail.gmail.com

--
Best regards,
Vitaly Burovoy

pgsql-hackers by date:

From: Vitaly Burovoy
Date: 22 January 2016, 06:31:10
Subject: Re: Extracting fields from 'infinity'::TIMESTAMP[TZ]

From: Craig Ringer
Date: 22 January 2016, 07:46:50
Subject: Re: WIP: Failover Slots

Re: custom function for converting human readable sizes to bytes - Mailing list pgsql-hackers

Previous

Next