Re: custom function for converting human readable sizes to bytes - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: custom function for converting human readable sizes to bytes
Date
Msg-id CAFj8pRAvQMcHi-vm0jaF8_DGXG7h83PJqt4932WqCLY+mpCq-Q@mail.gmail.com
Whole thread Raw
In response to Re: custom function for converting human readable sizes to bytes  (Vitaly Burovoy <vitaly.burovoy@gmail.com>)
Responses Re: custom function for converting human readable sizes to bytes  (Vitaly Burovoy <vitaly.burovoy@gmail.com>)
List pgsql-hackers
Hi

2016-01-26 6:25 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
Hello!


Regression tests are present (see p.II).

pg_proc.h has changed, so the CATALOG_VERSION_NO must be also changed.

this is not a part of patch - only a commiter knows CATALOG_VERSION_NO
 

Code comments, error message string, DESCR in pg_proc.h and doc
changes need proof-reading by a native English speaker, which I am
not.


===
The last patch fixes only part of my last notes, so I can only return
it to the next round with the same notes.

The list of my notes has not been done (all points listed below was
mentioned in my last review[1]):

1. The documentation has duplicate parts of size units (in table and
in the text). I think that one of them should be deleted (from the
table).

fixed
 

2. Test has a note about sharing(using) size units via(from)
memory_unit_conversion_table and a requirement to update the
documentation when new units are supported, but the note will not be
seen because there is no tests for that fail for units ("PB", "EB",
"ZB") which will change in the future by the
"memory_unit_conversion_table".

fixed


3. Code design is looked a little odd for me.

3a. There is an attempt to check a correctness of a numeric value
instead of using numeric_in which does the same thing. In the last
case the code looks more compact and nice (see the full explanation in
the previous review[1] and both PoC there).

Possibly that way is chosen to emit its own error messages ("invalid
size:" instead of "invalid input syntax for type numeric:"), but it
still leads errors like 'invalid unit: "+ kB"' for input value "+912+
kB" (see "dbsize.out").

In tried to enhanced this point (I don't agree, so this is a bug)
 

3b. There is an extra copying from the buffer "str" to the "buffer"
(it can be avoided and shown in an PoC "single_buffer.c").

3c. There is freeing buffers "str" and "buffer" that looks as a
counter-design since MemoryContext frees them at once with other
expired buffers (like "num" from numeric_in and numeric_mul; also
"mul_num" from int8_numeric) 

Numeric operations are on critical path for OLAP applications - so the code is rewritten to minimize palloc/free operations. I don't expect usage pg_size_bytes in OLAP critical path.
 


===
While I was writing that text I found five additional notes. Both
parts (above and below) are important (see concusion).

I. The new version of the patch (pg-size-bytes-10.patch) has the
simplest typo in an updated (from v9) row "inavalid size".

fixed 

II. There is no test for an empty input value, but it works as
expected (an error 'invalid size: ""').

fixed 

III. There is no support of 'bytes' unit, it seems such behavior got
majority approval[2].

No, because I have to use the supported units by configuration. The configuration supports only three chars long units. Support for "bytes" was removed, when I removed proprietary unit table.
 

IV. Code design of the function "parse_memory_unit" is returning
"bool" value and set "multiplier" via pointer. But while Tom Lane was
reviewed my patch[3] in the similar case he changed code that now
returns (see NonFiniteTimestampTzPart) not a flag, but a value (where
zero still means an error). Be honest it looks better, but the current
code is written like nearby functions.

It is a commiter decision. This is a implementation detail, and I don't see any variant as significant better - using 0 as error flag isn't the best too. But this is really detail.

V. The documentation lacks a note that the base of the "pg_size_bytes"
is 1024 whereas the base of the "pg_size_pretty" is 1000.

It isn't true, base for both are 1024. It was designed when special table was used for pg_size_bytes. But when we share same control table, then is wrong to use different table. The result can be optically different, but semantically same.

postgres=# select pg_size_pretty(pg_size_bytes('1MB'));
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 1024 kB        │
└────────────────┘
(1 row)

Time: 0.787 ms
postgres=# select pg_size_pretty(pg_size_bytes('1024MB'));
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 1024 MB        │
└────────────────┘
(1 row)


 

===
Concusion:

I *can't* mark this patch as "Ready for Committer" with all previous
notes, and I ask experienced hackers for a help.
I'm not a strong Postgres hacker yet and sending to the next round
with the same notes seems a nitpicking for me.

So questions to the hackers:

a. Are the rest of the notes from the previous review[1] (and listed
in pp.1-3c) a nitpicking or should be done?

b. Does the function in PoC "single_buffer.c" look more complex or easer?

c. Is it worth for p.II to change error message like 'Empty input
value is not a valid size for pg_size_bytes: "%s"'?

It is simply to implement, but It is looking like overengineering
 

d. Should the function "pg_size_bytes" handle "bytes" as a size unit
(see p.III)?
There was no disagreement, but if I saw that thread before I became a
reviewer and saw silent approval I would have argued. Moreover
supporting of negative sizes was explained[4] by a symmetry for the
"pg_size_pretty", which should include all size units produced by the
"pg_size_pretty". On the other hand those functions can't be symmetric
due to difference in bases (see p.V and [5])

negative values is fully supported.

support of "bytes" depends on support "bytes" unit by GUC. When "bytes" unit will be supported, then it can be used in pg_size_bytes immediately.

updated patch attached

Regards

Pavel
 

e. What way is preferred for p.IV?

f. If p.V is actual, how to write it in the documentation (be honest
I've no idea)? Should a description of the "pg_size_pretty" be changed
too?


[1]http://www.postgresql.org/message-id/CAKOSWNn=p8GX-P5Y-B4t4dg-aAHaTup+CrG41hQ9BeobZwX3KQ@mail.gmail.com
[2]http://www.postgresql.org/message-id/CACACo5QW7fFsFfhKsTjtYcP4QF3Oh9zA14SC6Z3DXx2yssJjSw@mail.gmail.com
[3]http://git.postgresql.org/pg/commitdiff/647d87c56ab6da70adb753c08d7cdf7ee905ea8a
[4]http://www.postgresql.org/message-id/CA+TgmoZFomG4eYorZZGf7pzotG9PxpUhtQvxLfsKiM4iZH8KRQ@mail.gmail.com
[5]http://www.postgresql.org/message-id/CA+TgmoYgS_xixUQy+xyw9futtMrTbC_S0c1C0_wOBwVbNf8aZA@mail.gmail.com

Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Add generate_series(date,date) and generate_series(date,date,integer)
Next
From: Pavel Stehule
Date:
Subject: Re: proposal: PL/Pythonu - function ereport