Thread: custom function for converting human readable sizes to bytes

custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

I have to write some filters, and filtering by size is "unfriendly" due calculation in bytes.

I propose inversion function to pg_size_pretty function - like pg_human_size.

Usage:

SELECT * FROM pg_class
  WHERE pg_table_size(oid) > pg_human_size('2GB');


Ideas, comments?

Regards

Pavel
 

Re: custom function for converting human readable sizes to bytes

From
Jim Nasby
Date:
On 11/21/15 12:49 AM, Pavel Stehule wrote:
>
> I propose inversion function to pg_size_pretty function - like
> pg_human_size.
>
> Usage:
>
> SELECT * FROM pg_class
>    WHERE pg_table_size(oid) > pg_human_size('2GB');

I'm not a big fan of the name, but +1 on the general idea.

Maybe pg_size_pretty(text)?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-11-22 18:30 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 11/21/15 12:49 AM, Pavel Stehule wrote:

I propose inversion function to pg_size_pretty function - like
pg_human_size.

Usage:

SELECT * FROM pg_class
   WHERE pg_table_size(oid) > pg_human_size('2GB');

I'm not a big fan of the name, but +1 on the general idea.

I am for any other good name
 

Maybe pg_size_pretty(text)?

I understand to your idea, but it can be too strange - function and inversion function share same name.

What about pg_size(text), pg_size(value bigint, unit text) ?

Regards

Pavel
 
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: custom function for converting human readable sizes to bytes

From
Marko Tiikkaja
Date:
On 2015-11-22 18:30, Jim Nasby wrote:
> On 11/21/15 12:49 AM, Pavel Stehule wrote:
>>
>> I propose inversion function to pg_size_pretty function - like
>> pg_human_size.
>>
>> Usage:
>>
>> SELECT * FROM pg_class
>>     WHERE pg_table_size(oid) > pg_human_size('2GB');
>
> I'm not a big fan of the name, but +1 on the general idea.
>
> Maybe pg_size_pretty(text)?

pg_ytterp_ezis(text)



Re: custom function for converting human readable sizes to bytes

From
Jim Nasby
Date:
On 11/22/15 2:11 PM, Pavel Stehule wrote:
> What about pg_size(text), pg_size(value bigint, unit text) ?

I like, though I'd make it numeric or float. pg_size(3.5, 'GB') 
certainly seems like a reasonable use case...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-11-22 21:19 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 11/22/15 2:11 PM, Pavel Stehule wrote:
What about pg_size(text), pg_size(value bigint, unit text) ?

I like, though I'd make it numeric or float. pg_size(3.5, 'GB') certainly seems like a reasonable use case...

yes, good note.

Regards

Pavel
 

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: custom function for converting human readable sizes to bytes

From
Guillaume Lelarge
Date:
<p dir="ltr"><br /> Le 22 nov. 2015 21:29, "Pavel Stehule" <<a
href="mailto:pavel.stehule@gmail.com">pavel.stehule@gmail.com</a>>a écrit :<br /> ><br /> ><br /> ><br />
>2015-11-22 21:19 GMT+01:00 Jim Nasby <<a
href="mailto:Jim.Nasby@bluetreble.com">Jim.Nasby@bluetreble.com</a>>:<br/> >><br /> >> On 11/22/15 2:11
PM,Pavel Stehule wrote:<br /> >>><br /> >>> What about pg_size(text), pg_size(value bigint, unit
text)?<br /> >><br /> >><br /> >> I like, though I'd make it numeric or float. pg_size(3.5, 'GB')
certainlyseems like a reasonable use case...<br /> ><br /> ><br /> > yes, good note.<br /> ><p
dir="ltr">Whatabout pg_size_unpretty()?  

Re: custom function for converting human readable sizes to bytes

From
Corey Huinker
Date:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><p dir="ltr">What about pg_size_unpretty()? </blockquote></div>I was
goingto suggest pg_size_ugly(), but unpretty does emphasize the inverse (rather than opposite) nature of the
function.</div><divclass="gmail_extra"><br /></div></div> 

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-11-22 23:54 GMT+01:00 Corey Huinker <corey.huinker@gmail.com>:

What about pg_size_unpretty()?

I was going to suggest pg_size_ugly(), but unpretty does emphasize the inverse (rather than opposite) nature of the function.

"unpretty", "ugly" aren't good names

maybe pg_size_bytes or different approach

we can introduce data type - bytesize - default input/output is human readable text - and conversion to bigint is implicit

Some like

select .. where pg_table_size(oid) > bytesize '3.5GB'

and instead pg_size_pretty(pg_table_size(oid)) we can write pg_table_size(oid)::bytesize

Regards

Pavel

Re: custom function for converting human readable sizes to bytes

From
Corey Huinker
Date:
On Sun, Nov 22, 2015 at 11:24 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-11-22 23:54 GMT+01:00 Corey Huinker <corey.huinker@gmail.com>:

What about pg_size_unpretty()?

I was going to suggest pg_size_ugly(), but unpretty does emphasize the inverse (rather than opposite) nature of the function.

"unpretty", "ugly" aren't good names

maybe pg_size_bytes or different approach

we can introduce data type - bytesize - default input/output is human readable text - and conversion to bigint is implicit

Some like

select .. where pg_table_size(oid) > bytesize '3.5GB'

and instead pg_size_pretty(pg_table_size(oid)) we can write pg_table_size(oid)::bytesize

Regards

Pavel

+1 to both pg_size_bytes() and ::bytesize. Both contribute to making the statements more self-documenting.

Re: custom function for converting human readable sizes to bytes

From
Jim Nasby
Date:
On 11/23/15 3:11 AM, Corey Huinker wrote:
> +1 to both pg_size_bytes() and ::bytesize. Both contribute to making the
> statements more self-documenting.

The function seems like overkill to me if we have the type. Just my 
opinion though. I'm thinking the type could just be called 'size' too 
(or prettysize?). No reason it has to be tied to bytes (in particular 
this would work for bits too).

If we're going to add this, I suppose it should support the 'i prefixes' 
too (GiB, MiB, etc).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: custom function for converting human readable sizes to bytes

From
Tom Lane
Date:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 11/23/15 3:11 AM, Corey Huinker wrote:
>> +1 to both pg_size_bytes() and ::bytesize. Both contribute to making the
>> statements more self-documenting.

> The function seems like overkill to me if we have the type. Just my 
> opinion though. I'm thinking the type could just be called 'size' too 
> (or prettysize?). No reason it has to be tied to bytes (in particular 
> this would work for bits too).

Please, no.  That's *way* too generic a name.

I do not actually agree with making a type for this anyway.  I can
tolerate a function, but adding a datatype is overkill; and it will
introduce far more definitional issues than it's worth.  (eg, which
other types should have casts to/from it, and at what level)
        regards, tom lane



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-11-23 18:04 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 11/23/15 3:11 AM, Corey Huinker wrote:
>> +1 to both pg_size_bytes() and ::bytesize. Both contribute to making the
>> statements more self-documenting.

> The function seems like overkill to me if we have the type. Just my
> opinion though. I'm thinking the type could just be called 'size' too
> (or prettysize?). No reason it has to be tied to bytes (in particular
> this would work for bits too).

Please, no.  That's *way* too generic a name.

I do not actually agree with making a type for this anyway.  I can
tolerate a function, but adding a datatype is overkill; and it will
introduce far more definitional issues than it's worth.  (eg, which
other types should have casts to/from it, and at what level)

so pg_size_bytes is good enough for everybody?

Regards

Pavel
 

                        regards, tom lane

Re: custom function for converting human readable sizes to bytes

From
Gavin Flower
Date:
On 24/11/15 06:31, Pavel Stehule wrote:
>
>
> 2015-11-23 18:04 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>>:
>
>     Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>     > On 11/23/15 3:11 AM, Corey Huinker wrote:
>     >> +1 to both pg_size_bytes() and ::bytesize. Both contribute to
>     making the
>     >> statements more self-documenting.
>
>     > The function seems like overkill to me if we have the type. Just my
>     > opinion though. I'm thinking the type could just be called
>     'size' too
>     > (or prettysize?). No reason it has to be tied to bytes (in
>     particular
>     > this would work for bits too).
>
>     Please, no.  That's *way* too generic a name.
>
>     I do not actually agree with making a type for this anyway.  I can
>     tolerate a function, but adding a datatype is overkill; and it will
>     introduce far more definitional issues than it's worth. (eg, which
>     other types should have casts to/from it, and at what level)
>
>
> so pg_size_bytes is good enough for everybody?
>
> Regards
>
> Pavel
>
>
>                             regards, tom lane
>
>
perhaps pg_size_bites for those people who want: KiB,  MiB, GiB, TiB, 
PiB, ,..   ???    :-)



Cheers,
Gavin



Re: custom function for converting human readable sizes to bytes

From
Alvaro Herrera
Date:
Pavel Stehule wrote:

> so pg_size_bytes is good enough for everybody?

That seems good enough to me.

I would have it accept GiB and GB and have both transform to base 2, and
have an optional boolean flag whose non-default value turns the GB
interpretation into base 10, leaving the GiB interpretation unaffected.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: custom function for converting human readable sizes to bytes

From
Robert Haas
Date:
On Mon, Nov 23, 2015 at 1:47 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Pavel Stehule wrote:
>
>> so pg_size_bytes is good enough for everybody?
>
> That seems good enough to me.
>
> I would have it accept GiB and GB and have both transform to base 2, and
> have an optional boolean flag whose non-default value turns the GB
> interpretation into base 10, leaving the GiB interpretation unaffected.

I think it should be orange.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2015-11-23 19:47 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Pavel Stehule wrote:

> so pg_size_bytes is good enough for everybody?

That seems good enough to me.

I would have it accept GiB and GB and have both transform to base 2, and
have an optional boolean flag whose non-default value turns the GB
interpretation into base 10, leaving the GiB interpretation unaffected.

attached proof concept based on parser "parse_int" from guc.c

It works well to 1TB what is enough for memory setting, but too low for proposed target.

There are two ways

1. enhance the "parse_int"

2. using independent implementation - there is some redundant code, but we can support duble insted int, and we can support some additional units.

Regards

Pavel


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: custom function for converting human readable sizes to bytes

From
Jim Nasby
Date:
On 11/24/15 10:57 PM, Pavel Stehule wrote:
> +              errmsg("parameter \"%s\" isn't valid size value",

Should read " isn't a valid size value"
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: custom function for converting human readable sizes to bytes

From
Kevin Grittner
Date:
On Wed, Nov 25, 2015 at 12:57 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> Should read " isn't a valid size value"

http://www.postgresql.org/docs/devel/static/error-style-guide.html#AEN110918

| Contractions. Avoid contractions, like "can't"; use "cannot" instead.

So " is not a valid size value" would be better.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


Hi


2. using independent implementation - there is some redundant code, but we can support duble insted int, and we can support some additional units.

 new patch is based on own implementation - use numeric/bigint calculations

+ regress tests and doc

Regards

Pavel


Attachment

Re: custom function for converting human readable sizes to bytes

From
Kyotaro HORIGUCHI
Date:
Hello, The meaning of "be orange" (I couldn't find it) interests
me but putting it aside..

I have some random comments.

At Mon, 30 Nov 2015 18:30:18 +0100, Pavel Stehule <pavel.stehule@gmail.com> wrote in
<CAFj8pRCd6We3TQMR0VbCN98wF0P3O3H6NaU4BTaosWcj443Qjw@mail.gmail.com>
> Hi
> 
> 
> > 2. using independent implementation - there is some redundant code, but we
> > can support duble insted int, and we can support some additional units.
> >
> 
>  new patch is based on own implementation - use numeric/bigint calculations
> 
> + regress tests and doc

1. What do you think about binary byte prefixes? (kiB, MiB...)
I couldn't read what Robert wrote upthread but I also would liketo have 2-base byte unit prifixes. (Sorry I couldn't
putitaside..)
 


2. Doesn't it allow units in lower case?
I think '1gb' and so should be acceptable as an input.


3. Why are you allow positive sign prefixing digits? I suppose that positive sign also shouldn't be allowed if it
rejectsprifixing negative sign.
 

4. Useless includes
dbsize.c is modified to include guc.h but it is useless.

5. Error message

+     if (ndigits == 0)
+         ereport(ERROR,
+                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                      errmsg("string is empty")));
If pg_size_bytes allows prefixing positive sign or spaces,ndigits == 0 here doesn't mean that the whole string is
empty.


6. Symmetry with pg_size_pretty
pg_size_pretty doesn't have units larger than TB. Just adding PBto it breaks backward compatibility, but leaving as it
isbreakssymmetry with this function. Some possible solution for thisthat I can guess for now are..
 
 - Leave it as is.
 - New GUC to allow PB for pg_size_pretty().
 - Expanded function such like pg_size_pretty2 (oops..)
 - New polymorphic (sibling?) function with additional   parameters to instruct how they work. (The following
involving2-base representations)
 
   pg_size_pretty(n, <2base>, <allow_PB or such..>);   pg_size_bytes(n, <2base>, <allow_PB or such..>);

7. Docs

+          Converts a size in human-readable format with size units
+          to bytes
The descriptions for the functions around use 'into' instead of'to' for the preposition.


8. Regression
The regression in the patch looks good enough as is (except thatit forgets the unit 'B' or prifixing spaces or sings),
buttheyare necessarily have individual tests. The following SQLstatement will do them at once.
 
 SELECT s, pg_size_bytes(s) FROM (VALUES ('1'), ('1B')...) as t(s);

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-12-01 11:02 GMT+01:00 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
Hello, The meaning of "be orange" (I couldn't find it) interests
me but putting it aside..

I have some random comments.

At Mon, 30 Nov 2015 18:30:18 +0100, Pavel Stehule <pavel.stehule@gmail.com> wrote in <CAFj8pRCd6We3TQMR0VbCN98wF0P3O3H6NaU4BTaosWcj443Qjw@mail.gmail.com>
> Hi
>
>
> > 2. using independent implementation - there is some redundant code, but we
> > can support duble insted int, and we can support some additional units.
> >
>
>  new patch is based on own implementation - use numeric/bigint calculations
>
> + regress tests and doc

1. What do you think about binary byte prefixes? (kiB, MiB...)

I don't know this mechanics - can you write it? It can be good idea/
 

 I couldn't read what Robert wrote upthread but I also would like
 to have 2-base byte unit prifixes. (Sorry I couldn't put it
 aside..)


2. Doesn't it allow units in lower case?

The parser is consistent with a behave used in configure file processing. We can change it, but then there will be divergence between this function and GUC parser.
 

 I think '1gb' and so should be acceptable as an input.


3. Why are you allow positive sign prefixing digits? I suppose
  that positive sign also shouldn't be allowed if it rejects
  prifixing negative sign.

I have not strong opinion about it. '-' is exactly wrong, but '+' can be acceptable. But it can be changed.
 

4. Useless includes

 dbsize.c is modified to include guc.h but it is useless.

I'll fix it.
 

5. Error message

+       if (ndigits == 0)
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                        errmsg("string is empty")));

 If pg_size_bytes allows prefixing positive sign or spaces,
 ndigits == 0 here doesn't mean that the whole string is empty.


I'll fix it.
 

6. Symmetry with pg_size_pretty

 pg_size_pretty doesn't have units larger than TB. Just adding PB
 to it breaks backward compatibility, but leaving as it is breaks
 symmetry with this function. Some possible solution for this
 that I can guess for now are..

I prefer a warning about possible compatibility issue in pg_size_pretty and support PB directly there.
 

  - Leave it as is.

  - New GUC to allow PB for pg_size_pretty().

  - Expanded function such like pg_size_pretty2 (oops..)

  - New polymorphic (sibling?) function with additional
    parameters to instruct how they work. (The following
    involving 2-base representations)

    pg_size_pretty(n, <2base>, <allow_PB or such..>);
    pg_size_bytes(n, <2base>, <allow_PB or such..>);

7. Docs

+          Converts a size in human-readable format with size units
+          to bytes

 The descriptions for the functions around use 'into' instead of
 'to' for the preposition.


8. Regression

 The regression in the patch looks good enough as is (except that
 it forgets the unit 'B' or prifixing spaces or sings), but they
 are necessarily have individual tests. The following SQL
 statement will do them at once.

  SELECT s, pg_size_bytes(s) FROM (VALUES ('1'), ('1B')...) as t(s);


I'll fix it.

Thank you for your ideas

Regards

Pavel
 
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-12-01 11:12 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2015-12-01 11:02 GMT+01:00 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
Hello, The meaning of "be orange" (I couldn't find it) interests
me but putting it aside..

I have some random comments.

At Mon, 30 Nov 2015 18:30:18 +0100, Pavel Stehule <pavel.stehule@gmail.com> wrote in <CAFj8pRCd6We3TQMR0VbCN98wF0P3O3H6NaU4BTaosWcj443Qjw@mail.gmail.com>
> Hi
>
>
> > 2. using independent implementation - there is some redundant code, but we
> > can support duble insted int, and we can support some additional units.
> >
>
>  new patch is based on own implementation - use numeric/bigint calculations
>
> + regress tests and doc

1. What do you think about binary byte prefixes? (kiB, MiB...)

I don't know this mechanics - can you write it? It can be good idea/
 

 I couldn't read what Robert wrote upthread but I also would like
 to have 2-base byte unit prifixes. (Sorry I couldn't put it
 aside..)


2. Doesn't it allow units in lower case?

The parser is consistent with a behave used in configure file processing. We can change it, but then there will be divergence between this function and GUC parser.
 

 I think '1gb' and so should be acceptable as an input.


3. Why are you allow positive sign prefixing digits? I suppose
  that positive sign also shouldn't be allowed if it rejects
  prifixing negative sign.

I have not strong opinion about it. '-' is exactly wrong, but '+' can be acceptable. But it can be changed.
 

4. Useless includes

 dbsize.c is modified to include guc.h but it is useless.

I'll fix it.
 

5. Error message

+       if (ndigits == 0)
+               ereport(ERROR,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                        errmsg("string is empty")));

 If pg_size_bytes allows prefixing positive sign or spaces,
 ndigits == 0 here doesn't mean that the whole string is empty.


I'll fix it.
 

6. Symmetry with pg_size_pretty

 pg_size_pretty doesn't have units larger than TB. Just adding PB
 to it breaks backward compatibility, but leaving as it is breaks
 symmetry with this function. Some possible solution for this
 that I can guess for now are..

I prefer a warning about possible compatibility issue in pg_size_pretty and support PB directly there.
 

  - Leave it as is.

  - New GUC to allow PB for pg_size_pretty().

  - Expanded function such like pg_size_pretty2 (oops..)

  - New polymorphic (sibling?) function with additional
    parameters to instruct how they work. (The following
    involving 2-base representations)

    pg_size_pretty(n, <2base>, <allow_PB or such..>);
    pg_size_bytes(n, <2base>, <allow_PB or such..>);

7. Docs

+          Converts a size in human-readable format with size units
+          to bytes

 The descriptions for the functions around use 'into' instead of
 'to' for the preposition.


8. Regression

 The regression in the patch looks good enough as is (except that
 it forgets the unit 'B' or prifixing spaces or sings), but they
 are necessarily have individual tests. The following SQL
 statement will do them at once.

  SELECT s, pg_size_bytes(s) FROM (VALUES ('1'), ('1B')...) as t(s);


I'll fix it.

here is updated patch

Regards

Pavel
 

Thank you for your ideas

Regards

Pavel
 
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center




Attachment

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

new update:

1. unit searching is case insensitive

2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC standard), change behave for SI units

Second point is much more complex then it is looking - if pg_size_bytes should be consistent with pg_size_pretty.

The current pg_size_pretty and transformations in guc.c are based on JEDEC standard. Using this standard for GUC has sense - using it for object sizes is probably unhappy.

I tried to fix (and enhance) pg_size_pretty - now reports correct units, and via second parameter it allows to specify base: 2 (binary, IEC  - default) or 10 (SI).

I think it is good to have it. These standards are generic and wide used, but should to be pretty explained in documentation if we will use JEDEC for configuration. Probably better to leave JEDEC and prefer SI and IEC.

Plan B is fix Postgres on JEDEC only - it is trivial, simple - but it can look like archaic in next years.

Comments, notices?

Regards

Pavel





Attachment

Re: custom function for converting human readable sizes to bytes

From
Robert Haas
Date:
On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> new update:
>
> 1. unit searching is case insensitive
>
> 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC standard),
> change behave for SI units
>
> Second point is much more complex then it is looking - if pg_size_bytes
> should be consistent with pg_size_pretty.
>
> The current pg_size_pretty and transformations in guc.c are based on JEDEC
> standard. Using this standard for GUC has sense - using it for object sizes
> is probably unhappy.
>
> I tried to fix (and enhance) pg_size_pretty - now reports correct units, and
> via second parameter it allows to specify base: 2 (binary, IEC  - default)
> or 10 (SI).

-1 from me.  I don't think we should muck with the way pg_size_pretty works.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
Michael Paquier
Date:
On Tue, Dec 22, 2015 at 12:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> new update:
>>
>> 1. unit searching is case insensitive
>>
>> 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC standard),
>> change behave for SI units
>>
>> Second point is much more complex then it is looking - if pg_size_bytes
>> should be consistent with pg_size_pretty.
>>
>> The current pg_size_pretty and transformations in guc.c are based on JEDEC
>> standard. Using this standard for GUC has sense - using it for object sizes
>> is probably unhappy.
>>
>> I tried to fix (and enhance) pg_size_pretty - now reports correct units, and
>> via second parameter it allows to specify base: 2 (binary, IEC  - default)
>> or 10 (SI).
>
> -1 from me.  I don't think we should muck with the way pg_size_pretty works.

Yeah.

+ static const unit_multiplier unit_multiplier_table[] =
+ {
+     {"B", 1L},
+     {"kiB", 1024L},
+     {"MiB", 1024L * 1024},
+     {"GiB", 1024L * 1024 * 1024},
+     {"TiB", 1024L * 1024 * 1024 * 1024},
+     {"PiB", 1024L * 1024 * 1024 * 1024 * 1024},
This is rather close to memory_unit_conversion_table in guc.c. Would
it be worth refactoring those unit tables into something more generic
instead of duplicating them?
-- 
Michael



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-12-22 6:22 GMT+01:00 Michael Paquier <michael.paquier@gmail.com>:
On Tue, Dec 22, 2015 at 12:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> new update:
>>
>> 1. unit searching is case insensitive
>>
>> 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC standard),
>> change behave for SI units
>>
>> Second point is much more complex then it is looking - if pg_size_bytes
>> should be consistent with pg_size_pretty.
>>
>> The current pg_size_pretty and transformations in guc.c are based on JEDEC
>> standard. Using this standard for GUC has sense - using it for object sizes
>> is probably unhappy.
>>
>> I tried to fix (and enhance) pg_size_pretty - now reports correct units, and
>> via second parameter it allows to specify base: 2 (binary, IEC  - default)
>> or 10 (SI).
>
> -1 from me.  I don't think we should muck with the way pg_size_pretty works.

Yeah.

+ static const unit_multiplier unit_multiplier_table[] =
+ {
+     {"B", 1L},
+     {"kiB", 1024L},
+     {"MiB", 1024L * 1024},
+     {"GiB", 1024L * 1024 * 1024},
+     {"TiB", 1024L * 1024 * 1024 * 1024},
+     {"PiB", 1024L * 1024 * 1024 * 1024 * 1024},
This is rather close to memory_unit_conversion_table in guc.c. Would
it be worth refactoring those unit tables into something more generic
instead of duplicating them?

yes, it is possible with following impacts:

1. We need add PB to memory_unit_conversion_table in guc.c
2. This table holds multipliers in JEDEC standard - and introduce other standards IEC, SI there isn't good idea.

Is it ok?

Regards

Pavel

 
--
Michael

Re: custom function for converting human readable sizes to bytes

From
Michael Paquier
Date:
On Tue, Dec 22, 2015 at 2:33 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2015-12-22 6:22 GMT+01:00 Michael Paquier <michael.paquier@gmail.com>:
>>
>> On Tue, Dec 22, 2015 at 12:11 AM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>> > On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> > wrote:
>> >> new update:
>> >>
>> >> 1. unit searching is case insensitive
>> >>
>> >> 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC
>> >> standard),
>> >> change behave for SI units
>> >>
>> >> Second point is much more complex then it is looking - if pg_size_bytes
>> >> should be consistent with pg_size_pretty.
>> >>
>> >> The current pg_size_pretty and transformations in guc.c are based on
>> >> JEDEC
>> >> standard. Using this standard for GUC has sense - using it for object
>> >> sizes
>> >> is probably unhappy.
>> >>
>> >> I tried to fix (and enhance) pg_size_pretty - now reports correct
>> >> units, and
>> >> via second parameter it allows to specify base: 2 (binary, IEC  -
>> >> default)
>> >> or 10 (SI).
>> >
>> > -1 from me.  I don't think we should muck with the way pg_size_pretty
>> > works.
>>
>> Yeah.
>>
>> + static const unit_multiplier unit_multiplier_table[] =
>> + {
>> +     {"B", 1L},
>> +     {"kiB", 1024L},
>> +     {"MiB", 1024L * 1024},
>> +     {"GiB", 1024L * 1024 * 1024},
>> +     {"TiB", 1024L * 1024 * 1024 * 1024},
>> +     {"PiB", 1024L * 1024 * 1024 * 1024 * 1024},
>> This is rather close to memory_unit_conversion_table in guc.c. Would
>> it be worth refactoring those unit tables into something more generic
>> instead of duplicating them?
>
>
> yes, it is possible with following impacts:
>
> 1. We need add PB to memory_unit_conversion_table in guc.c

No real objection to that. It would would make sense to have it, but
we could not use it directly for a GUC. This just reminded me that
even if we support TB in GUC params, it is not possible to set for
example a GUC_UNIT_KB param to more than 2TB because those are limited
to be int32.

> 2. This table holds multipliers in JEDEC standard - and introduce other
> standards IEC, SI there isn't good idea.
>
> Is it ok?

Do you think it would be interesting to have GUC parameters able to
use those units? If this is a standard, it may make sense to actually
have them, no? Just a random thought, that's not something this patch
should take care of, but it would be good to avoid code duplication
where we can avoid it.
Regards,
-- 
Michael



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-12-22 6:57 GMT+01:00 Michael Paquier <michael.paquier@gmail.com>:
On Tue, Dec 22, 2015 at 2:33 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2015-12-22 6:22 GMT+01:00 Michael Paquier <michael.paquier@gmail.com>:
>>
>> On Tue, Dec 22, 2015 at 12:11 AM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>> > On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> > wrote:
>> >> new update:
>> >>
>> >> 1. unit searching is case insensitive
>> >>
>> >> 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC
>> >> standard),
>> >> change behave for SI units
>> >>
>> >> Second point is much more complex then it is looking - if pg_size_bytes
>> >> should be consistent with pg_size_pretty.
>> >>
>> >> The current pg_size_pretty and transformations in guc.c are based on
>> >> JEDEC
>> >> standard. Using this standard for GUC has sense - using it for object
>> >> sizes
>> >> is probably unhappy.
>> >>
>> >> I tried to fix (and enhance) pg_size_pretty - now reports correct
>> >> units, and
>> >> via second parameter it allows to specify base: 2 (binary, IEC  -
>> >> default)
>> >> or 10 (SI).
>> >
>> > -1 from me.  I don't think we should muck with the way pg_size_pretty
>> > works.
>>
>> Yeah.
>>
>> + static const unit_multiplier unit_multiplier_table[] =
>> + {
>> +     {"B", 1L},
>> +     {"kiB", 1024L},
>> +     {"MiB", 1024L * 1024},
>> +     {"GiB", 1024L * 1024 * 1024},
>> +     {"TiB", 1024L * 1024 * 1024 * 1024},
>> +     {"PiB", 1024L * 1024 * 1024 * 1024 * 1024},
>> This is rather close to memory_unit_conversion_table in guc.c. Would
>> it be worth refactoring those unit tables into something more generic
>> instead of duplicating them?
>
>
> yes, it is possible with following impacts:
>
> 1. We need add PB to memory_unit_conversion_table in guc.c

No real objection to that. It would would make sense to have it, but
we could not use it directly for a GUC. This just reminded me that
even if we support TB in GUC params, it is not possible to set for
example a GUC_UNIT_KB param to more than 2TB because those are limited
to be int32.

> 2. This table holds multipliers in JEDEC standard - and introduce other
> standards IEC, SI there isn't good idea.
>
> Is it ok?

Do you think it would be interesting to have GUC parameters able to
use those units? If this is a standard, it may make sense to actually
have them, no? Just a random thought, that's not something this patch
should take care of, but it would be good to avoid code duplication
where we can avoid it.
Regards,

Nice to have it. But it can introduce a upgrade issues - there isn't possible to specify base. If we do some change here, we have to do related changes everywhere.

Our implementation is little bit obsolete, but nobody did some error reports, so probably now isn't time to change it. It's not too important.

Better to have two conversion tables (simple and small), than opening new topic, where I don't see any agreement to do changes - and these changes can be done separately.

Pavel

 
--
Michael

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2015-12-21 16:11 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> new update:
>
> 1. unit searching is case insensitive
>
> 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC standard),
> change behave for SI units
>
> Second point is much more complex then it is looking - if pg_size_bytes
> should be consistent with pg_size_pretty.
>
> The current pg_size_pretty and transformations in guc.c are based on JEDEC
> standard. Using this standard for GUC has sense - using it for object sizes
> is probably unhappy.
>
> I tried to fix (and enhance) pg_size_pretty - now reports correct units, and
> via second parameter it allows to specify base: 2 (binary, IEC  - default)
> or 10 (SI).

-1 from me.  I don't think we should muck with the way pg_size_pretty works.

new update - I reverted changes in pg_size_pretty

Regards

Pavel
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: custom function for converting human readable sizes to bytes

From
"Shulgin, Oleksandr"
Date:
On Tue, Dec 22, 2015 at 10:45 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2015-12-21 16:11 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> new update:
>
> 1. unit searching is case insensitive
>
> 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC standard),
> change behave for SI units
>
> Second point is much more complex then it is looking - if pg_size_bytes
> should be consistent with pg_size_pretty.
>
> The current pg_size_pretty and transformations in guc.c are based on JEDEC
> standard. Using this standard for GUC has sense - using it for object sizes
> is probably unhappy.
>
> I tried to fix (and enhance) pg_size_pretty - now reports correct units, and
> via second parameter it allows to specify base: 2 (binary, IEC  - default)
> or 10 (SI).

-1 from me.  I don't think we should muck with the way pg_size_pretty works.

new update - I reverted changes in pg_size_pretty

Hi,

I didn't check out earlier versions of this patch, but the latest one still changes pg_size_pretty() to emit PB suffix.

I don't think it is worth it to throw a number of changes together like that.  We should focus on adding pg_size_bytes() first and make it compatible with both pg_size_pretty() and existing GUC units: that is support suffixes up to TB and make sure they have the meaning of powers of 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.

Next, we could think about adding handling of PB suffix on input and output, but I don't see a big problem if that is emitted as 1024TB or the user has to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an minor inconvenience only.

--
Alex

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi


-1 from me.  I don't think we should muck with the way pg_size_pretty works.

new update - I reverted changes in pg_size_pretty

Hi,

I didn't check out earlier versions of this patch, but the latest one still changes pg_size_pretty() to emit PB suffix.

I don't think it is worth it to throw a number of changes together like that.  We should focus on adding pg_size_bytes() first and make it compatible with both pg_size_pretty() and existing GUC units: that is support suffixes up to TB and make sure they have the meaning of powers of 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.

Next, we could think about adding handling of PB suffix on input and output, but I don't see a big problem if that is emitted as 1024TB or the user has to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an minor inconvenience only.

Last version still support BP in pg_size_pretty. It isn't big change. PB isn't issue.

We have to do significant decision - should to support SI units in pg_size_bytes? We cannot to change it later. There is disagreement for SI units in pg_size_pretty, so SI units in pg_size_bytes can be inconsistent.

Regards

Pavel
 

--
Alex


Re: custom function for converting human readable sizes to bytes

From
"Shulgin, Oleksandr"
Date:
On Tue, Dec 29, 2015 at 7:15 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I didn't check out earlier versions of this patch, but the latest one still changes pg_size_pretty() to emit PB suffix.

I don't think it is worth it to throw a number of changes together like that.  We should focus on adding pg_size_bytes() first and make it compatible with both pg_size_pretty() and existing GUC units: that is support suffixes up to TB and make sure they have the meaning of powers of 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.

Next, we could think about adding handling of PB suffix on input and output, but I don't see a big problem if that is emitted as 1024TB or the user has to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an minor inconvenience only.

Last version still support BP in pg_size_pretty. It isn't big change. PB isn't issue.

We have to do significant decision - should to support SI units in pg_size_bytes? We cannot to change it later. There is disagreement for SI units in pg_size_pretty, so SI units in pg_size_bytes can be inconsistent.

There is no way at this point to add support for SI units in a consistent and backwards-compatible manner: both GUC and pg_size_pretty() use SI suffixes (kB, MB, GB, TB) with the meaning of 2^(10*n) (KiB, MiB, GiB, TiB).  But given that the size relates to memory or disk space, it is quite customary *not* to use SI units, so I don't see a point in adding those.

--
Alex

Re: custom function for converting human readable sizes to bytes

From
Robert Haas
Date:
On Mon, Dec 28, 2015 at 8:45 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> I didn't check out earlier versions of this patch, but the latest one still
> changes pg_size_pretty() to emit PB suffix.
>
> I don't think it is worth it to throw a number of changes together like
> that.  We should focus on adding pg_size_bytes() first and make it
> compatible with both pg_size_pretty() and existing GUC units: that is
> support suffixes up to TB and make sure they have the meaning of powers of
> 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.
>
> Next, we could think about adding handling of PB suffix on input and output,
> but I don't see a big problem if that is emitted as 1024TB or the user has
> to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an minor
> inconvenience only.

+1 to everything in this email.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2015-12-30 17:33 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Dec 28, 2015 at 8:45 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> I didn't check out earlier versions of this patch, but the latest one still
> changes pg_size_pretty() to emit PB suffix.
>
> I don't think it is worth it to throw a number of changes together like
> that.  We should focus on adding pg_size_bytes() first and make it
> compatible with both pg_size_pretty() and existing GUC units: that is
> support suffixes up to TB and make sure they have the meaning of powers of
> 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.
>
> Next, we could think about adding handling of PB suffix on input and output,
> but I don't see a big problem if that is emitted as 1024TB or the user has
> to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an minor
> inconvenience only.

+1 to everything in this email.

so I removed support for PB and SI units. Now the memory_unit_conversion_table is shared.

Regards

Pavel
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: custom function for converting human readable sizes to bytes

From
"Shulgin, Oleksandr"
Date:
On Wed, Dec 30, 2015 at 8:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-12-30 17:33 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Dec 28, 2015 at 8:45 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> I didn't check out earlier versions of this patch, but the latest one still
> changes pg_size_pretty() to emit PB suffix.
>
> I don't think it is worth it to throw a number of changes together like
> that.  We should focus on adding pg_size_bytes() first and make it
> compatible with both pg_size_pretty() and existing GUC units: that is
> support suffixes up to TB and make sure they have the meaning of powers of
> 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.
>
> Next, we could think about adding handling of PB suffix on input and output,
> but I don't see a big problem if that is emitted as 1024TB or the user has
> to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an minor
> inconvenience only.

+1 to everything in this email.

so I removed support for PB and SI units. Now the memory_unit_conversion_table is shared.

Looks better, thanks.

I'm not sure why the need to touch the regression test for pg_size_pretty():

!                10.5 | 10.5 bytes     | -10.5 bytes
!              1000.5 | 1000.5 bytes   | -1000.5 bytes
!           1000000.5 | 977 kB         | -977 kB
!        1000000000.5 | 954 MB         | -954 MB
!     1000000000000.5 | 931 GB         | -931 GB
!  1000000000000000.5 | 909 TB         | -909 TB

A nitpick, this loop:

+ while (*cp)
+ {
+ if ((isdigit(*cp) || *cp == '.') && ndigits < MAX_DIGITS)
+ digits[ndigits++] = *cp++;
+ else
+ break;
+ }

would be a bit easier to parse if spelled as:

+ while (*cp && (isdigit(*cp) || *cp == '.') && ndigits < MAX_DIGITS)
+ digits[ndigits++] = *cp++;

On the other hand, this seems to truncate the digits silently:

+ digits[ndigits] = '\0';

I don't think we want that, e.g:

postgres=# select pg_size_bytes('9223372036854775807.9');
ERROR:  invalid unit "9"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

I think making a mutable copy of the input string and truncating it before passing to numeric_in() would make more sense--no need to hard-code MAX_DIGITS.  The same goes for hard-coding MAX_UNIT_LEN, e.g. compare the following two outputs:

postgres=# select pg_size_bytes('1 KiB');
ERROR:  invalid unit "KiB"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

postgres=# select pg_size_bytes('1024 bytes');
ERROR:  invalid format

I believe we should see a similar error message and a hint in the latter case.  (No, I don't think we should add support for 'bytes' as a unit, not even for "compatibility" with pg_size_pretty()--for one, I don't think it would be wise to expect pg_size_bytes() to be able to deparse *every* possible output produced by pg_size_pretty() as it's purpose is human-readable display; also, pg_size_pretty() can easily produce output that doesn't fit into bigint type, or is just negative)

Code comments and doc change need proof-reading by a native English speaker, which I am not.

--
Alex

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2016-01-04 12:46 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Wed, Dec 30, 2015 at 8:28 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2015-12-30 17:33 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Dec 28, 2015 at 8:45 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> I didn't check out earlier versions of this patch, but the latest one still
> changes pg_size_pretty() to emit PB suffix.
>
> I don't think it is worth it to throw a number of changes together like
> that.  We should focus on adding pg_size_bytes() first and make it
> compatible with both pg_size_pretty() and existing GUC units: that is
> support suffixes up to TB and make sure they have the meaning of powers of
> 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.
>
> Next, we could think about adding handling of PB suffix on input and output,
> but I don't see a big problem if that is emitted as 1024TB or the user has
> to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an minor
> inconvenience only.

+1 to everything in this email.

so I removed support for PB and SI units. Now the memory_unit_conversion_table is shared.

Looks better, thanks.

I'm not sure why the need to touch the regression test for pg_size_pretty():

!                10.5 | 10.5 bytes     | -10.5 bytes
!              1000.5 | 1000.5 bytes   | -1000.5 bytes
!           1000000.5 | 977 kB         | -977 kB
!        1000000000.5 | 954 MB         | -954 MB
!     1000000000000.5 | 931 GB         | -931 GB
!  1000000000000000.5 | 909 TB         | -909 TB


fixed
 
A nitpick, this loop:

+ while (*cp)
+ {
+ if ((isdigit(*cp) || *cp == '.') && ndigits < MAX_DIGITS)
+ digits[ndigits++] = *cp++;
+ else
+ break;
+ }

would be a bit easier to parse if spelled as:

+ while (*cp && (isdigit(*cp) || *cp == '.') && ndigits < MAX_DIGITS)
+ digits[ndigits++] = *cp++;

fixed
 

On the other hand, this seems to truncate the digits silently:

+ digits[ndigits] = '\0';

I don't think we want that, e.g:

postgres=# select pg_size_bytes('9223372036854775807.9');
ERROR:  invalid unit "9"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

I think making a mutable copy of the input string and truncating it before passing to numeric_in() would make more sense--no need to hard-code MAX_DIGITS.  The same goes for hard-coding MAX_UNIT_LEN, e.g. compare the following two outputs:

postgres=# select pg_size_bytes('1 KiB');
ERROR:  invalid unit "KiB"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

postgres=# select pg_size_bytes('1024 bytes');
ERROR:  invalid format


fixed
 
I believe we should see a similar error message and a hint in the latter case.  (No, I don't think we should add support for 'bytes' as a unit, not even for "compatibility" with pg_size_pretty()--for one, I don't think it would be wise to expect pg_size_bytes() to be able to deparse *every* possible output produced by pg_size_pretty() as it's purpose is human-readable display; also, pg_size_pretty() can easily produce output that doesn't fit into bigint type, or is just negative)

Code comments and doc change need proof-reading by a native English speaker, which I am not.


Regards

Pavel
 

--
Alex


Attachment

Re: custom function for converting human readable sizes to bytes

From
Robert Haas
Date:
On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> [ new patch ]

+         case '-':
+             ereport(ERROR,
+                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                      errmsg("size cannot be negative")));

Why not?  I bet if you copy any - sign to the buffer, this will Just Work.

+         if ( conv->base_unit == GUC_UNIT_KB &&

Whitespace.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-04 16:51 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> [ new patch ]

+         case '-':
+             ereport(ERROR,
+                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                      errmsg("size cannot be negative")));

Why not?  I bet if you copy any - sign to the buffer, this will Just Work.

true, fixed
 

+         if ( conv->base_unit == GUC_UNIT_KB &&

Whitespace.

I don't see it ??

Regards

Pavel
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: custom function for converting human readable sizes to bytes

From
Thom Brown
Date:
On 4 January 2016 at 15:17, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> 2016-01-04 12:46 GMT+01:00 Shulgin, Oleksandr
> <oleksandr.shulgin@zalando.de>:
>>
>> On Wed, Dec 30, 2015 at 8:28 PM, Pavel Stehule <pavel.stehule@gmail.com>
>> wrote:
>>>
>>>
>>>
>>> 2015-12-30 17:33 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
>>>>
>>>> On Mon, Dec 28, 2015 at 8:45 AM, Shulgin, Oleksandr
>>>> <oleksandr.shulgin@zalando.de> wrote:
>>>> > I didn't check out earlier versions of this patch, but the latest one
>>>> > still
>>>> > changes pg_size_pretty() to emit PB suffix.
>>>> >
>>>> > I don't think it is worth it to throw a number of changes together
>>>> > like
>>>> > that.  We should focus on adding pg_size_bytes() first and make it
>>>> > compatible with both pg_size_pretty() and existing GUC units: that is
>>>> > support suffixes up to TB and make sure they have the meaning of
>>>> > powers of
>>>> > 2^10, not 10^3.  Re-using the table present in guc.c would be a plus.
>>>> >
>>>> > Next, we could think about adding handling of PB suffix on input and
>>>> > output,
>>>> > but I don't see a big problem if that is emitted as 1024TB or the user
>>>> > has
>>>> > to specify it as 1024TB in a GUC or argument to pg_size_bytes(): an
>>>> > minor
>>>> > inconvenience only.
>>>>
>>>> +1 to everything in this email.
>>>
>>>
>>> so I removed support for PB and SI units. Now the
>>> memory_unit_conversion_table is shared.
>>
>>
>> Looks better, thanks.
>>
>> I'm not sure why the need to touch the regression test for
>> pg_size_pretty():
>>
>> !                10.5 | 10.5 bytes     | -10.5 bytes
>> !              1000.5 | 1000.5 bytes   | -1000.5 bytes
>> !           1000000.5 | 977 kB         | -977 kB
>> !        1000000000.5 | 954 MB         | -954 MB
>> !     1000000000000.5 | 931 GB         | -931 GB
>> !  1000000000000000.5 | 909 TB         | -909 TB
>>
>
> fixed
>
>>
>> A nitpick, this loop:
>>
>> + while (*cp)
>> + {
>> + if ((isdigit(*cp) || *cp == '.') && ndigits < MAX_DIGITS)
>> + digits[ndigits++] = *cp++;
>> + else
>> + break;
>> + }
>>
>> would be a bit easier to parse if spelled as:
>>
>> + while (*cp && (isdigit(*cp) || *cp == '.') && ndigits < MAX_DIGITS)
>> + digits[ndigits++] = *cp++;
>
>
> fixed
>
>>
>>
>> On the other hand, this seems to truncate the digits silently:
>>
>> + digits[ndigits] = '\0';
>>
>> I don't think we want that, e.g:
>>
>> postgres=# select pg_size_bytes('9223372036854775807.9');
>> ERROR:  invalid unit "9"
>> HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
>>
>> I think making a mutable copy of the input string and truncating it before
>> passing to numeric_in() would make more sense--no need to hard-code
>> MAX_DIGITS.  The same goes for hard-coding MAX_UNIT_LEN, e.g. compare the
>> following two outputs:
>>
>> postgres=# select pg_size_bytes('1 KiB');
>> ERROR:  invalid unit "KiB"
>> HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
>>
>> postgres=# select pg_size_bytes('1024 bytes');
>> ERROR:  invalid format
>>
>
> fixed

Hi,

Some feedback:

+          Converts a size in human-readable format with size units
+          into bytes.  The parameter is case insensitive string. Following
+          units are supported: kB, MB, GB, TB.

May I suggest:

"Converts a size in human-readable format with size units into bytes.
The parameter is case-insensitive, and the supported size units are
are: kB, MB, GB and TB."

+  * Convert human readable size to long int.

Big int?

+  * Due suppor decimal value and case insensitivity of units
+  * a function parse_intcannot be used.

Is this supposed to be saying:

"Due to support for decimal values..."?

and "a function parse_int cannot be used."?

+          * Use buffer as unit if there are not any nonspace char,
+          * else use a original unit string.

s/use a/use an/

+          * Now, the multiplier is in KB unit. It should be multiplied by 1024
+          * before usage

s/unit/units/

Regards

Thom



Re: custom function for converting human readable sizes to bytes

From
"Shulgin, Oleksandr"
Date:
On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> [ new patch ]

+         case '-':
+             ereport(ERROR,
+                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                      errmsg("size cannot be negative")));

Why not?  I bet if you copy any - sign to the buffer, this will Just Work.

I'm also inclined on dropping that explicit check for empty string below and let numeric_in() error out on that.  Does this look OK, or can it confuse someone:

postgres=# select pg_size_bytes('');
ERROR:  invalid input syntax for type numeric: ""

?

+         if ( conv->base_unit == GUC_UNIT_KB &&

Between "(" and "conv->..." I believe.

---
Alex

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> [ new patch ]

+         case '-':
+             ereport(ERROR,
+                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                      errmsg("size cannot be negative")));

Why not?  I bet if you copy any - sign to the buffer, this will Just Work.

I'm also inclined on dropping that explicit check for empty string below and let numeric_in() error out on that.  Does this look OK, or can it confuse someone:

postgres=# select pg_size_bytes('');
ERROR:  invalid input syntax for type numeric: ""

?

+         if ( conv->base_unit == GUC_UNIT_KB &&

Between "(" and "conv->..." I believe.

both fixed

Regards

Pavel
 

---
Alex


Attachment

Re: custom function for converting human readable sizes to bytes

From
Robert Haas
Date:
On Mon, Jan 4, 2016 at 11:48 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> wrote:
>> > [ new patch ]
>>
>> +         case '-':
>> +             ereport(ERROR,
>> +                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                      errmsg("size cannot be negative")));
>>
>> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.
>
>
> I'm also inclined on dropping that explicit check for empty string below and
> let numeric_in() error out on that.  Does this look OK, or can it confuse
> someone:
>
> postgres=# select pg_size_bytes('');
> ERROR:  invalid input syntax for type numeric: ""

I think that's a pretty bad error message.  I mean, the user is
calling a function that takes text as an input data type.  So, where's
numeric involved?

I'm also kind of wondering what the intended use case for this
function is.  Why do we want it?  Do we want it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
"Shulgin, Oleksandr"
Date:
On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
>>
>> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> > [ new patch ]
>>>
>>> +         case '-':
>>> +             ereport(ERROR,
>>> +                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> +                      errmsg("size cannot be negative")));
>>>
>>> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.
>>
>>
>> I'm also inclined on dropping that explicit check for empty string below and let numeric_in() error out on that.  Does this look OK, or can it confuse someone:
>>
>> postgres=# select pg_size_bytes('');
>> ERROR:  invalid input syntax for type numeric: ""
>>
>> ?
>>
>>> +         if ( conv->base_unit == GUC_UNIT_KB &&
>>
>>
>> Between "(" and "conv->..." I believe.
>
>
> both fixed

Hm...

> + switch (*strptr)
> + {
> + /* ignore plus symbol */
> + case '+':
> + case '-':
> + *bufptr++ = *strptr++;
> + break;
> + }

Well, to that amount you don't need any special checks, I'm just not sure if reported error message is not misleading if we let numeric_in() handle all the errors.  At least it can cope with the leading spaces, +/- and empty input quite well.

--
Alex

Re: custom function for converting human readable sizes to bytes

From
"Shulgin, Oleksandr"
Date:
On Mon, Jan 4, 2016 at 6:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 4, 2016 at 11:48 AM, Shulgin, Oleksandr
>
> postgres=# select pg_size_bytes('');
> ERROR:  invalid input syntax for type numeric: ""

I think that's a pretty bad error message.  I mean, the user is
calling a function that takes text as an input data type.  So, where's
numeric involved?

Is there a way to force CONTEXT output in the reported error?  I guess that could help.

I'm also kind of wondering what the intended use case for this
function is.  Why do we want it?  Do we want it?

As suggested above a usecase could be like the following:

SELECT relname FROM pg_class WHERE pg_relation_size(oid) > pg_size_bytes('100 GB');

I think it's neat and useful.

--
Alex

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-04 18:14 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Jan 4, 2016 at 11:48 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> wrote:
>> > [ new patch ]
>>
>> +         case '-':
>> +             ereport(ERROR,
>> +                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                      errmsg("size cannot be negative")));
>>
>> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.
>
>
> I'm also inclined on dropping that explicit check for empty string below and
> let numeric_in() error out on that.  Does this look OK, or can it confuse
> someone:
>
> postgres=# select pg_size_bytes('');
> ERROR:  invalid input syntax for type numeric: ""

I think that's a pretty bad error message.  I mean, the user is
calling a function that takes text as an input data type.  So, where's
numeric involved?

last version is little bit better

postgres=# select pg_size_bytes('');
ERROR:  22023: "" is not number

 

I'm also kind of wondering what the intended use case for this
function is.  Why do we want it?  Do we want it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
>>
>> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> > [ new patch ]
>>>
>>> +         case '-':
>>> +             ereport(ERROR,
>>> +                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> +                      errmsg("size cannot be negative")));
>>>
>>> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.
>>
>>
>> I'm also inclined on dropping that explicit check for empty string below and let numeric_in() error out on that.  Does this look OK, or can it confuse someone:
>>
>> postgres=# select pg_size_bytes('');
>> ERROR:  invalid input syntax for type numeric: ""
>>
>> ?
>>
>>> +         if ( conv->base_unit == GUC_UNIT_KB &&
>>
>>
>> Between "(" and "conv->..." I believe.
>
>
> both fixed

Hm...

> + switch (*strptr)
> + {
> + /* ignore plus symbol */
> + case '+':
> + case '-':
> + *bufptr++ = *strptr++;
> + break;
> + }

Well, to that amount you don't need any special checks, I'm just not sure if reported error message is not misleading if we let numeric_in() handle all the errors.  At least it can cope with the leading spaces, +/- and empty input quite well.

I don't would to catch a exception from numeric_in - so I try to solve some simple situations, where I can raise possible better error message.

Regards

Pavel
 

--
Alex


Re: custom function for converting human readable sizes to bytes

From
Robert Haas
Date:
On Mon, Jan 4, 2016 at 12:17 PM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
>> I'm also kind of wondering what the intended use case for this
>> function is.  Why do we want it?  Do we want it?
>
> As suggested above a usecase could be like the following:
>
> SELECT relname FROM pg_class WHERE pg_relation_size(oid) >
> pg_size_bytes('100 GB');
>
> I think it's neat and useful.

But you could also write SELECT relname FROM pg_class WHERE
pg_relation_size(oid) > 100 * 1024^3, which is actually fewer
characters.  Maybe pg_size_bytes('100 GB') is easier for some people
to remember than 100 * 1024^3, but I'm probably not one of those
people.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-04 21:29 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Jan 4, 2016 at 12:17 PM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
>> I'm also kind of wondering what the intended use case for this
>> function is.  Why do we want it?  Do we want it?
>
> As suggested above a usecase could be like the following:
>
> SELECT relname FROM pg_class WHERE pg_relation_size(oid) >
> pg_size_bytes('100 GB');
>
> I think it's neat and useful.

But you could also write SELECT relname FROM pg_class WHERE
pg_relation_size(oid) > 100 * 1024^3, which is actually fewer
characters.  Maybe pg_size_bytes('100 GB') is easier for some people
to remember than 100 * 1024^3, but I'm probably not one of those
people.

I am really one who hasn't memory for numbers - "100GB" is much more verbose for me. It is more clean in more complex maintenance queries. And if you need short syntax, you can write own SQL function

CREATE OR REPLACE FUNCTION size(text) RETURNS bigint AS $$ SELECT pg_size_bytes($1) $$ LANGUAGE sql;

then ... pg_relation_size(oid) > size('100GB')

Regards

Pavel
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: custom function for converting human readable sizes to bytes

From
Alvaro Herrera
Date:
Robert Haas wrote:

> But you could also write SELECT relname FROM pg_class WHERE
> pg_relation_size(oid) > 100 * 1024^3, which is actually fewer
> characters.  Maybe pg_size_bytes('100 GB') is easier for some people
> to remember than 100 * 1024^3, but I'm probably not one of those
> people.

Nah, that might work for geek types, but I doubt it's the preferred
spelling for most people.  I think the proposal is quite reasonable.

If we were only catering for people who can do 2^10 arithmetic off the
top of their heads, we wouldn't have pg_size_pretty at all, would we?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: custom function for converting human readable sizes to bytes

From
Robert Haas
Date:
On Mon, Jan 4, 2016 at 3:48 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>
>> But you could also write SELECT relname FROM pg_class WHERE
>> pg_relation_size(oid) > 100 * 1024^3, which is actually fewer
>> characters.  Maybe pg_size_bytes('100 GB') is easier for some people
>> to remember than 100 * 1024^3, but I'm probably not one of those
>> people.
>
> Nah, that might work for geek types, but I doubt it's the preferred
> spelling for most people.  I think the proposal is quite reasonable.
>
> If we were only catering for people who can do 2^10 arithmetic off the
> top of their heads, we wouldn't have pg_size_pretty at all, would we?

Well, I don't know what 100 * 1024^3 is off the top of my head, but I
know that I can compute it by typing exactly that.  So I think one
direction is easier than the other.  However, IJWH.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 1/4/16, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>> [ new patch ]
>
> +         case '-':
> +             ereport(ERROR,
> +                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                      errmsg("size cannot be negative")));
>
> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.

Hmm. The function's name is pg_size_bytes. How number of bytes can be
negative? How any length can be negative? If anyone insert '-' sign to
an argument, it is copy-paste error. I don't see any case where there
is possible negatives as input value.

I prefer error message instead of getting all relations (by using
comparison from the initial letter) just because of copy-paste mistake
or incomplete checking of input values at app-level.

> ...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 1/4/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> :
>> On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
>> >> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >>
>> >> I'm also inclined on dropping that explicit check for empty string
>> >> below and let numeric_in() error out on that.  Does this look OK, or can
>> >> it confuse someone:
>> >> postgres=# select pg_size_bytes('');
>> >> ERROR:  invalid input syntax for type numeric: ""
>> >
>> > both fixed
>>
>> Hm...
>>
>> > + switch (*strptr)
>> > + {
>> > + /* ignore plus symbol */
>> > + case '+':
>> > + case '-':
>> > + *bufptr++ = *strptr++;
>> > + break;
>> > + }
>>
>> Well, to that amount you don't need any special checks, I'm just not sure
>> if reported error message is not misleading if we let numeric_in() handle
>> all the errors.  At least it can cope with the leading spaces, +/- and
>> empty input quite well.
>>
>
> I don't would to catch a exception from numeric_in - so I try to solve some
> simple situations, where I can raise possible better error message.

There are several cases where your behavior gives strange errors (see below).

Next batch of notes:

src/include/catalog/pg_proc.h:
---
+ DATA(insert OID = 3317 ( pg_size_bytes...
now oid 3317 is used (by pg_stat_get_wal_receiver), 3318 is free

---
+ DESCR("convert a human readable text with size units to big int bytes");
May be the best way is to copy the first sentence from the doc?
("convert a size in human-readable format with size units into bytes")

====
src/backend/utils/adt/dbsize.c:
+ text           *arg = PG_GETARG_TEXT_PP(0);
+ char           *str = text_to_cstring(arg);
...
+     /* working buffer cannot be longer than original string */
+     buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1);
Is there any reason to get TEXT for only converting it to cstring and
call VARSIZE_ANY_EXHDR instead of strlen?

---
+     text           *arg = PG_GETARG_TEXT_PP(0);
+     char           *str = text_to_cstring(arg);
+     char    *strptr = str;
+     char           *buffer;
There are wrong offsets of variable names after their types (among all
body of the "pg_size_bytes" function).
See variable declarations in nearby functions ( >> "make the new code look like the existing code around it"
http://www.postgresql.org/docs/devel/static/source-format.html
)

---
+                      errmsg("\"%s\" is not number", str)));
s/is not number/is not a number/
(the second version can be found in a couple places besides translations)

---
+     if (*strptr != '\0')
...
+         while (*strptr && !isspace(*strptr))
Sometimes it explicitly compares to '\0', sometimes implicitly.
Common use is explicit comparison and it is preferred due to different
compilers (their conversions to boolean).

---
+     /* Skip leading spaces */
...
+         /* ignore plus symbol */
...
+     /* copy digits to working buffer */
...
+     /* allow whitespace between integer and unit */
I'm also inclined on dropping that explicit skipping spaces, checking
for +/- symbols, but copying all digits, spaces, dots and '+-' symbols
and let numeric_in() error out on that.

It allows to get correct error messages for something like:
postgres=# select pg_size_bytes('.+912');
ERROR:  invalid input syntax for type numeric: ".+912"
postgres=# select pg_size_bytes('+912+ kB');
ERROR:  invalid input syntax for type numeric: "+912+ "
postgres=# select pg_size_bytes('++123 kB');
ERROR:  invalid input syntax for type numeric: "++123 "

instead of current:
postgres=# select pg_size_bytes('.+912');
ERROR:  invalid input syntax for type numeric: "."
postgres=# select pg_size_bytes('+912+ kB');
ERROR:  invalid unit: "+ kB"
postgres=# select pg_size_bytes('++123 kB');
ERROR:  invalid input syntax for type numeric: "+"

---
+     while (isspace((unsigned char) *strptr))
...
+     while (isspace(*strptr))
...
+         while (*strptr && !isspace(*strptr))
...
+         while (isspace(*strptr))
The first occurece of isspace's parameter is casting to "unsigned
char" whereas the others are not.
Note:
"The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF"
Proof:
http://en.cppreference.com/w/c/string/byte/isspace

---
+     pfree(buffer);
+     pfree(str);
pfree-s here are not necessary. See:
http://www.neilconway.org/talks/hacking/hack_slides.pdf (page 17)

-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 1/18/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> <<overquoting>>
> ---
> +     if (*strptr != '\0')
> ...
> +         while (*strptr && !isspace(*strptr))
> Sometimes it explicitly compares to '\0', sometimes implicitly.
> Common use is explicit comparison and it is preferred due to different
> compilers (their conversions to boolean).
>
> ---
> <<overquoting>>

It seems I distracted on something else... That lines are ok, skip that block.

-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Robert Haas
Date:
On Mon, Jan 18, 2016 at 6:56 PM, Vitaly Burovoy
<vitaly.burovoy@gmail.com> wrote:
> On 1/4/16, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> wrote:
>>> [ new patch ]
>>
>> +         case '-':
>> +             ereport(ERROR,
>> +                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                      errmsg("size cannot be negative")));
>>
>> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.
>
> Hmm. The function's name is pg_size_bytes. How number of bytes can be
> negative? How any length can be negative? If anyone insert '-' sign to
> an argument, it is copy-paste error. I don't see any case where there
> is possible negatives as input value.
>
> I prefer error message instead of getting all relations (by using
> comparison from the initial letter) just because of copy-paste mistake
> or incomplete checking of input values at app-level.

Well, we just recently did this:

commit 8a1fab36aba7506fcf4559c4ef95fcacdd0b439a
Author: Robert Haas <rhaas@postgresql.org>
Date:   Fri Nov 6 11:03:02 2015 -0500
   pg_size_pretty: Format negative values similar to positive ones.
   Previously, negative values were always displayed in bytes, regardless   of how large they were.
   Adrian Vondendriesch, reviewed by Julien Rouhaud and myself

Since we went to the trouble of making the handling of positive and
negative values symmetric for that function, it seems it should be
done here also.  Otherwise, it is asymmetric.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-19 13:42 GMT+01:00 Robert Haas <robertmhaas@gmail.com>:
On Mon, Jan 18, 2016 at 6:56 PM, Vitaly Burovoy
<vitaly.burovoy@gmail.com> wrote:
> On 1/4/16, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com>
>> wrote:
>>> [ new patch ]
>>
>> +         case '-':
>> +             ereport(ERROR,
>> +                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                      errmsg("size cannot be negative")));
>>
>> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.
>
> Hmm. The function's name is pg_size_bytes. How number of bytes can be
> negative? How any length can be negative? If anyone insert '-' sign to
> an argument, it is copy-paste error. I don't see any case where there
> is possible negatives as input value.
>
> I prefer error message instead of getting all relations (by using
> comparison from the initial letter) just because of copy-paste mistake
> or incomplete checking of input values at app-level.

Well, we just recently did this:

commit 8a1fab36aba7506fcf4559c4ef95fcacdd0b439a
Author: Robert Haas <rhaas@postgresql.org>
Date:   Fri Nov 6 11:03:02 2015 -0500

    pg_size_pretty: Format negative values similar to positive ones.

    Previously, negative values were always displayed in bytes, regardless
    of how large they were.

    Adrian Vondendriesch, reviewed by Julien Rouhaud and myself

Since we went to the trouble of making the handling of positive and
negative values symmetric for that function, it seems it should be
done here also.  Otherwise, it is asymmetric.

the last patch (pg-size-bytes-08.patch) at 2016-01-04 17:03:02 allows negative size

Regards

Pavel



--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2016-01-19 0:56 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 1/4/16, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>> [ new patch ]
>
> +         case '-':
> +             ereport(ERROR,
> +                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                      errmsg("size cannot be negative")));
>
> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.

Hmm. The function's name is pg_size_bytes. How number of bytes can be
negative? How any length can be negative? If anyone insert '-' sign to
an argument, it is copy-paste error. I don't see any case where there
is possible negatives as input value.

I prefer error message instead of getting all relations (by using
comparison from the initial letter) just because of copy-paste mistake
or incomplete checking of input values at app-level.

the last version of this patch support negative numbers.

Regards

Pavel
 

> ...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Best regards,
Vitaly Burovoy

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-19 4:45 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 1/4/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> :
>> On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
>> >> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >>
>> >> I'm also inclined on dropping that explicit check for empty string
>> >> below and let numeric_in() error out on that.  Does this look OK, or can
>> >> it confuse someone:
>> >> postgres=# select pg_size_bytes('');
>> >> ERROR:  invalid input syntax for type numeric: ""
>> >
>> > both fixed
>>
>> Hm...
>>
>> > + switch (*strptr)
>> > + {
>> > + /* ignore plus symbol */
>> > + case '+':
>> > + case '-':
>> > + *bufptr++ = *strptr++;
>> > + break;
>> > + }
>>
>> Well, to that amount you don't need any special checks, I'm just not sure
>> if reported error message is not misleading if we let numeric_in() handle
>> all the errors.  At least it can cope with the leading spaces, +/- and
>> empty input quite well.
>>
>
> I don't would to catch a exception from numeric_in - so I try to solve some
> simple situations, where I can raise possible better error message.

There are several cases where your behavior gives strange errors (see below).

Next batch of notes:

src/include/catalog/pg_proc.h:
---
+ DATA(insert OID = 3317 ( pg_size_bytes...
now oid 3317 is used (by pg_stat_get_wal_receiver), 3318 is free

fixed
 

---
+ DESCR("convert a human readable text with size units to big int bytes");
May be the best way is to copy the first sentence from the doc?
("convert a size in human-readable format with size units into bytes")

fixed


====
src/backend/utils/adt/dbsize.c:
+ text             *arg = PG_GETARG_TEXT_PP(0);
+ char             *str = text_to_cstring(arg);
...
+       /* working buffer cannot be longer than original string */
+       buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1);
Is there any reason to get TEXT for only converting it to cstring and
call VARSIZE_ANY_EXHDR instead of strlen?

performance - but these strings should be short, so I can use strlen

fixed
 

---
+       text               *arg = PG_GETARG_TEXT_PP(0);
+       char               *str = text_to_cstring(arg);
+       char    *strptr = str;
+       char               *buffer;
There are wrong offsets of variable names after their types (among all
body of the "pg_size_bytes" function).
See variable declarations in nearby functions (
  >> "make the new code look like the existing code around it"
  http://www.postgresql.org/docs/devel/static/source-format.html
)


fixed
 
---
+                                        errmsg("\"%s\" is not number", str)));
s/is not number/is not a number/
(the second version can be found in a couple places besides translations)

fixed

but this message can be little bit no intuitive - better text is "is not a valid number"
 

---
+       if (*strptr != '\0')
...
+               while (*strptr && !isspace(*strptr))
Sometimes it explicitly compares to '\0', sometimes implicitly.
Common use is explicit comparison and it is preferred due to different
compilers (their conversions to boolean).

fixed

---
+       /* Skip leading spaces */
...
+               /* ignore plus symbol */
...
+       /* copy digits to working buffer */
...
+       /* allow whitespace between integer and unit */
I'm also inclined on dropping that explicit skipping spaces, checking
for +/- symbols, but copying all digits, spaces, dots and '+-' symbols
and let numeric_in() error out on that.

This is difficult - you have to divide string to two parts and first world is number, second world is unit.

For example "+912+ kB" is correct number +912 and broken unit "+ kB".


It allows to get correct error messages for something like:
postgres=# select pg_size_bytes('.+912');
ERROR:  invalid input syntax for type numeric: ".+912"
postgres=# select pg_size_bytes('+912+ kB');
ERROR:  invalid input syntax for type numeric: "+912+ "
postgres=# select pg_size_bytes('++123 kB');
ERROR:  invalid input syntax for type numeric: "++123 "

instead of current:
postgres=# select pg_size_bytes('.+912');
ERROR:  invalid input syntax for type numeric: "."
postgres=# select pg_size_bytes('+912+ kB');
ERROR:  invalid unit: "+ kB"
postgres=# select pg_size_bytes('++123 kB');
ERROR:  invalid input syntax for type numeric: "+"


I redesigned this check. Now it is

popostgres=# select pg_size_bytes('.+912');
ERROR:  22023: ".+912" is not a valid number
stgres=# select pg_size_bytes('++123 kB');
ERROR:  22023: "++123 kB" is not a valid number

 
---
+       while (isspace((unsigned char) *strptr))
...
+       while (isspace(*strptr))
...
+               while (*strptr && !isspace(*strptr))
...
+               while (isspace(*strptr))
The first occurece of isspace's parameter is casting to "unsigned
char" whereas the others are not.
Note:
"The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF"


 
Proof:
http://en.cppreference.com/w/c/string/byte/isspace

fixed

 


---
+       pfree(buffer);
+       pfree(str);
pfree-s here are not necessary. See:
http://www.neilconway.org/talks/hacking/hack_slides.pdf (page 17)

Automatic memory deallocation doesn't cover all possible situations where the function can be used - for example DirectFunctionCall - so explicit deallocation can descrease a memory requirements when you call these functions from C.

New version is attached

Regards

Pavel
 

--
Best regards,
Vitaly Burovoy

Attachment

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 1/20/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> ...
> New version is attached
>
> Regards
> Pavel

I'm sorry I'll do a review only tonight.
-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-21 11:51 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 1/20/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> ...
> New version is attached
>
> Regards
> Pavel

I'm sorry I'll do a review only tonight.

ook :)

Thank you

Pavel

 
--
Best regards,
Vitaly Burovoy

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 1/20/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> ...
> New version is attached
>
> Regards
> Pavel

Hello!

1. Why the function is marked as VOLATILE? It depends only on input
value, it does change nothing in the DB.
I guess it should be IMMUTABLE for efficient caching its result.

2.
>    text        *arg = PG_GETARG_TEXT_PP(0);
...
>    str = text_to_cstring(arg);
>
Hmm... My question was _why_ you get TEXT and convert it instead of
getting an argument of type cstring (not about usage of
VARSIZE_ANY_EXHDR).
It was enough to give me a link[1] and leave the usage of
VARSIZE_ANY_EXHDR as is.



I think all the other claims below are mostly cosmetic. Main behavior
is OK (considering its usage will not be in heavy queries).

===
Documentation:
Besides currently added row in a table I think it is worth to add
detailed comment after a block "<function>pg_size_pretty</> can be
used" similar to (but the best way is to get advice for a correct
phrase):
"pg_size_bytes can be used to get a size in bytes as bigint from a
human-readable format for a comparison purposes (it is the opposite to
pg_size_pretty function). The format is numeric with an optional size
unit: kB, MB, GB or TB."
(and delete unit sizes from the table row).

===
Tests:
Since there was a letter with an explanation why units bigger than
"TB" can't be used[2], it is a good reason to add that size units
("PB", "EB", "ZB") in tests with a note to update the documentation
part when that unit are supported.

===
Code style:
1. When you declare pointers, you must align _names_ by the left
border, i.e. asterisks must be at one position to the left from the
aligned names, i.e. one to three spaces instead of TAB before them.

2.
> int        multiplier;
One more TAB to align it with the name at the next line.

===
Code:

> /* allow whitespace between integer and unit */
May be "numeric" instead of "integer"?

> "\"%s\" is not in expected format"
It is not clear what format is expected.

> /* ignore plus symbol */
It seems it is not ignored, but copied... ;-)

> to ger hintstr
s/ger/get/
> Elsewhere is used space trimmed buffer.
I'd write it as "Otherwise trimmed value is used."
I'm not good at English, but that full block looks little odd for me.
I tried to reword it in the attachment single_buffer.c, but I don't
think it is enough.

> We allow ending spaces.
"trailing"?

> but this message can be little bit no intuitive - better text is "is not a valid number"
It was a block with a comment "don't allow empty string", i.e. absence
of a number...

> Automatic memory deallocation doesn't cover all possible situations where
> the function can be used - for example DirectFunctionCall - so explicit
> deallocation can descrease a memory requirements when you call these
> functions from C.
DirectFunctionCall uses a memory context of the caller. For example,
you don't free "num" allocated by numeric_in and numeric_mul, also
mul_num allocated by int8_numeric...
I'd ask experienced hackers for importance of freeing (or leaving)
"str" and "buffer".

> postgres=# select pg_size_bytes('+912+ kB');
> ERROR:  invalid unit: "+ kB"
> This is difficult - you have to divide string to two parts and first world is number, second world is unit.
Your code looks like a duplicated (not by lines, but by behavior).
numeric_in has similar checks, let it do them for you. The goal of
your function is just split mantissa and size unit, and if the last
one is found, turn it into an exponent.
See my example in the attachment check_mantissa_by_numeric_in.c. There
is just searching non-space char that can't be part of numeric. Then
all before it passes to numeric_in and let it check anything it
should. I even left first spaces to show full numeric part to a user
if an error occurs (for any case).
The second attachment single_buffer.c is an attempt to avoid
allocating the second buffer (the first is "str" allocated by
text_to_cstring) and copying into it. I don't think it gives a big
improvement, but nevertheless.

===
[1] http://www.postgresql.org/message-id/29618.1451882238@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=DHB3O0Pc-nX1v3xJSzKN3z9KBeXgcQTRg@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Attachment

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-01-22 7:03 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 1/20/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> ...
> New version is attached
>
> Regards
> Pavel

Hello!

1. Why the function is marked as VOLATILE? It depends only on input
value, it does change nothing in the DB.
I guess it should be IMMUTABLE for efficient caching its result.

2.
>       text            *arg = PG_GETARG_TEXT_PP(0);
...
>       str = text_to_cstring(arg);
>
Hmm... My question was _why_ you get TEXT and convert it instead of
getting an argument of type cstring (not about usage of
VARSIZE_ANY_EXHDR).
It was enough to give me a link[1] and leave the usage of
VARSIZE_ANY_EXHDR as is.



I think all the other claims below are mostly cosmetic. Main behavior
is OK (considering its usage will not be in heavy queries).

===
Documentation:
Besides currently added row in a table I think it is worth to add
detailed comment after a block "<function>pg_size_pretty</> can be
used" similar to (but the best way is to get advice for a correct
phrase):
"pg_size_bytes can be used to get a size in bytes as bigint from a
human-readable format for a comparison purposes (it is the opposite to
pg_size_pretty function). The format is numeric with an optional size
unit: kB, MB, GB or TB."
(and delete unit sizes from the table row).

===
Tests:
Since there was a letter with an explanation why units bigger than
"TB" can't be used[2], it is a good reason to add that size units
("PB", "EB", "ZB") in tests with a note to update the documentation
part when that unit are supported.

===
Code style:
1. When you declare pointers, you must align _names_ by the left
border, i.e. asterisks must be at one position to the left from the
aligned names, i.e. one to three spaces instead of TAB before them.

2.
> int           multiplier;
One more TAB to align it with the name at the next line.

===
Code:

> /* allow whitespace between integer and unit */
May be "numeric" instead of "integer"?

> "\"%s\" is not in expected format"
It is not clear what format is expected.

> /* ignore plus symbol */
It seems it is not ignored, but copied... ;-)

> to ger hintstr
s/ger/get/
> Elsewhere is used space trimmed buffer.
I'd write it as "Otherwise trimmed value is used."
I'm not good at English, but that full block looks little odd for me.
I tried to reword it in the attachment single_buffer.c, but I don't
think it is enough.

> We allow ending spaces.
"trailing"?

> but this message can be little bit no intuitive - better text is "is not a valid number"
It was a block with a comment "don't allow empty string", i.e. absence
of a number...

> Automatic memory deallocation doesn't cover all possible situations where
> the function can be used - for example DirectFunctionCall - so explicit
> deallocation can descrease a memory requirements when you call these
> functions from C.
DirectFunctionCall uses a memory context of the caller. For example,
you don't free "num" allocated by numeric_in and numeric_mul, also
mul_num allocated by int8_numeric...
I'd ask experienced hackers for importance of freeing (or leaving)
"str" and "buffer".

Sometimes, the memory can be released by caller only - or the memory usage is pretty minimal. But when you can release, memory then you should to do it to decrease push to memory allocator.

 

> postgres=# select pg_size_bytes('+912+ kB');
> ERROR:  invalid unit: "+ kB"
> This is difficult - you have to divide string to two parts and first world is number, second world is unit.
Your code looks like a duplicated (not by lines, but by behavior).
numeric_in has similar checks, let it do them for you. The goal of
your function is just split mantissa and size unit, and if the last
one is found, turn it into an exponent.
See my example in the attachment check_mantissa_by_numeric_in.c. There
is just searching non-space char that can't be part of numeric. Then
all before it passes to numeric_in and let it check anything it
should. I even left first spaces to show full numeric part to a user
if an error occurs (for any case).
The second attachment single_buffer.c is an attempt to avoid
allocating the second buffer (the first is "str" allocated by
text_to_cstring) and copying into it. I don't think it gives a big
improvement, but nevertheless.

This code is little bit more complex than it is necessary (but few lines more) to produce readable error messages. I am thinking so it is correct.

I'll look on this patch next week.

Thank you for review

Regards

Pavel
 

===
[1] http://www.postgresql.org/message-id/29618.1451882238@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=DHB3O0Pc-nX1v3xJSzKN3z9KBeXgcQTRg@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2016-01-22 7:03 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 1/20/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> ...
> New version is attached
>
> Regards
> Pavel

Hello!

1. Why the function is marked as VOLATILE? It depends only on input
value, it does change nothing in the DB.
I guess it should be IMMUTABLE for efficient caching its result.

sure, it should be immutable,
fixed

copy/paste bug

?? why pg_size_pretty is volatile??
 

2.
>       text            *arg = PG_GETARG_TEXT_PP(0);
...
>       str = text_to_cstring(arg);
>
Hmm... My question was _why_ you get TEXT and convert it instead of
getting an argument of type cstring (not about usage of
VARSIZE_ANY_EXHDR).
It was enough to give me a link[1] and leave the usage of
VARSIZE_ANY_EXHDR as is.


I understand to this question now. This is PostgreSQL pattern - cstring is internal type only - used for in/out functions only

you can try

select * from pg_proc where 'cstring'::regtype::oid = any(proargtypes::oid[]);

With cstring you are bypass PostgreSQL type system (cast from cstring and to cstring is allowed for anytype) - so it should not be used in usual functions.
 


I think all the other claims below are mostly cosmetic. Main behavior
is OK (considering its usage will not be in heavy queries).

===
Documentation:
Besides currently added row in a table I think it is worth to add
detailed comment after a block "<function>pg_size_pretty</> can be
used" similar to (but the best way is to get advice for a correct
phrase):
"pg_size_bytes can be used to get a size in bytes as bigint from a
human-readable format for a comparison purposes (it is the opposite to
pg_size_pretty function). The format is numeric with an optional size
unit: kB, MB, GB or TB."
(and delete unit sizes from the table row).

fixed partially

===
Tests:
Since there was a letter with an explanation why units bigger than
"TB" can't be used[2], it is a good reason to add that size units
("PB", "EB", "ZB") in tests with a note to update the documentation
part when that unit are supported.

fixed
 

===
Code style:
1. When you declare pointers, you must align _names_ by the left
border, i.e. asterisks must be at one position to the left from the
aligned names, i.e. one to three spaces instead of TAB before them.

fixed

2.
> int           multiplier;
One more TAB to align it with the name at the next line.

fixed

===
Code:

> /* allow whitespace between integer and unit */
May be "numeric" instead of "integer"?

used "number"

> "\"%s\" is not in expected format"
It is not clear what format is expected.

I changed to string "invalid size: \"%s\"", but the final form of these messages should be written by native speaker.
 

> /* ignore plus symbol */
It seems it is not ignored, but copied... ;-)

removed

> to ger hintstr
s/ger/get/
> Elsewhere is used space trimmed buffer.
 
I'd write it as "Otherwise trimmed value is used."
I'm not good at English, but that full block looks little odd for me.
I tried to reword it in the attachment single_buffer.c, but I don't
think it is enough.

> We allow ending spaces.
"trailing"?


fixed
 
> but this message can be little bit no intuitive - better text is "is not a valid number"
It was a block with a comment "don't allow empty string", i.e. absence
of a number...

> Automatic memory deallocation doesn't cover all possible situations where
> the function can be used - for example DirectFunctionCall - so explicit
> deallocation can descrease a memory requirements when you call these
> functions from C.
DirectFunctionCall uses a memory context of the caller. For example,
you don't free "num" allocated by numeric_in and numeric_mul, also
mul_num allocated by int8_numeric...
I'd ask experienced hackers for importance of freeing (or leaving)
"str" and "buffer".

> postgres=# select pg_size_bytes('+912+ kB');
> ERROR:  invalid unit: "+ kB"
> This is difficult - you have to divide string to two parts and first world is number, second world is unit.
Your code looks like a duplicated (not by lines, but by behavior).
numeric_in has similar checks, let it do them for you. The goal of
your function is just split mantissa and size unit, and if the last
one is found, turn it into an exponent.
See my example in the attachment check_mantissa_by_numeric_in.c. There
is just searching non-space char that can't be part of numeric. Then
all before it passes to numeric_in and let it check anything it
should. I even left first spaces to show full numeric part to a user
if an error occurs (for any case).
The second attachment single_buffer.c is an attempt to avoid
allocating the second buffer (the first is "str" allocated by
text_to_cstring) and copying into it. I don't think it gives a big
improvement, but nevertheless.

I don't think, so it is improvement - now the code is relative easy, and I need a original string, because it is part of error message.

sending updated patch

Regards

Pavel
 

===
[1] http://www.postgresql.org/message-id/29618.1451882238@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/CAB7nPqS6Wob4WnZb=DHB3O0Pc-nX1v3xJSzKN3z9KBeXgcQTRg@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Attachment

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
Hello!

I have reviewed this patch. It applies and compiles cleanly at the
current master 1129c2b0ad2732f301f696ae2cf98fb063a4c1f8 and implements
the behavior reached by a consensus.

All size units (the same as used in the GUC) are supported.

The documentation is present and describes behavior of the function.

The code is mostly clear and commented enough, but I doubt about
design (see p.3).

Regression tests are present (see p.II).

pg_proc.h has changed, so the CATALOG_VERSION_NO must be also changed.

Code comments, error message string, DESCR in pg_proc.h and doc
changes need proof-reading by a native English speaker, which I am
not.


===
The last patch fixes only part of my last notes, so I can only return
it to the next round with the same notes.

The list of my notes has not been done (all points listed below was
mentioned in my last review[1]):

1. The documentation has duplicate parts of size units (in table and
in the text). I think that one of them should be deleted (from the
table).

2. Test has a note about sharing(using) size units via(from)
memory_unit_conversion_table and a requirement to update the
documentation when new units are supported, but the note will not be
seen because there is no tests for that fail for units ("PB", "EB",
"ZB") which will change in the future by the
"memory_unit_conversion_table".

3. Code design is looked a little odd for me.

3a. There is an attempt to check a correctness of a numeric value
instead of using numeric_in which does the same thing. In the last
case the code looks more compact and nice (see the full explanation in
the previous review[1] and both PoC there).

Possibly that way is chosen to emit its own error messages ("invalid
size:" instead of "invalid input syntax for type numeric:"), but it
still leads errors like 'invalid unit: "+ kB"' for input value "+912+
kB" (see "dbsize.out").

3b. There is an extra copying from the buffer "str" to the "buffer"
(it can be avoided and shown in an PoC "single_buffer.c").

3c. There is freeing buffers "str" and "buffer" that looks as a
counter-design since MemoryContext frees them at once with other
expired buffers (like "num" from numeric_in and numeric_mul; also
"mul_num" from int8_numeric)


===
While I was writing that text I found five additional notes. Both
parts (above and below) are important (see concusion).

I. The new version of the patch (pg-size-bytes-10.patch) has the
simplest typo in an updated (from v9) row "inavalid size".

II. There is no test for an empty input value, but it works as
expected (an error 'invalid size: ""').

III. There is no support of 'bytes' unit, it seems such behavior got
majority approval[2].

IV. Code design of the function "parse_memory_unit" is returning
"bool" value and set "multiplier" via pointer. But while Tom Lane was
reviewed my patch[3] in the similar case he changed code that now
returns (see NonFiniteTimestampTzPart) not a flag, but a value (where
zero still means an error). Be honest it looks better, but the current
code is written like nearby functions.

V. The documentation lacks a note that the base of the "pg_size_bytes"
is 1024 whereas the base of the "pg_size_pretty" is 1000.

===
Concusion:

I *can't* mark this patch as "Ready for Committer" with all previous
notes, and I ask experienced hackers for a help.
I'm not a strong Postgres hacker yet and sending to the next round
with the same notes seems a nitpicking for me.

So questions to the hackers:

a. Are the rest of the notes from the previous review[1] (and listed
in pp.1-3c) a nitpicking or should be done?

b. Does the function in PoC "single_buffer.c" look more complex or easer?

c. Is it worth for p.II to change error message like 'Empty input
value is not a valid size for pg_size_bytes: "%s"'?

d. Should the function "pg_size_bytes" handle "bytes" as a size unit
(see p.III)?
There was no disagreement, but if I saw that thread before I became a
reviewer and saw silent approval I would have argued. Moreover
supporting of negative sizes was explained[4] by a symmetry for the
"pg_size_pretty", which should include all size units produced by the
"pg_size_pretty". On the other hand those functions can't be symmetric
due to difference in bases (see p.V and [5])

e. What way is preferred for p.IV?

f. If p.V is actual, how to write it in the documentation (be honest
I've no idea)? Should a description of the "pg_size_pretty" be changed
too?


[1]http://www.postgresql.org/message-id/CAKOSWNn=p8GX-P5Y-B4t4dg-aAHaTup+CrG41hQ9BeobZwX3KQ@mail.gmail.com
[2]http://www.postgresql.org/message-id/CACACo5QW7fFsFfhKsTjtYcP4QF3Oh9zA14SC6Z3DXx2yssJjSw@mail.gmail.com
[3]http://git.postgresql.org/pg/commitdiff/647d87c56ab6da70adb753c08d7cdf7ee905ea8a
[4]http://www.postgresql.org/message-id/CA+TgmoZFomG4eYorZZGf7pzotG9PxpUhtQvxLfsKiM4iZH8KRQ@mail.gmail.com
[5]http://www.postgresql.org/message-id/CA+TgmoYgS_xixUQy+xyw9futtMrTbC_S0c1C0_wOBwVbNf8aZA@mail.gmail.com



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2016-01-26 6:25 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
Hello!


Regression tests are present (see p.II).

pg_proc.h has changed, so the CATALOG_VERSION_NO must be also changed.

this is not a part of patch - only a commiter knows CATALOG_VERSION_NO
 

Code comments, error message string, DESCR in pg_proc.h and doc
changes need proof-reading by a native English speaker, which I am
not.


===
The last patch fixes only part of my last notes, so I can only return
it to the next round with the same notes.

The list of my notes has not been done (all points listed below was
mentioned in my last review[1]):

1. The documentation has duplicate parts of size units (in table and
in the text). I think that one of them should be deleted (from the
table).

fixed
 

2. Test has a note about sharing(using) size units via(from)
memory_unit_conversion_table and a requirement to update the
documentation when new units are supported, but the note will not be
seen because there is no tests for that fail for units ("PB", "EB",
"ZB") which will change in the future by the
"memory_unit_conversion_table".

fixed


3. Code design is looked a little odd for me.

3a. There is an attempt to check a correctness of a numeric value
instead of using numeric_in which does the same thing. In the last
case the code looks more compact and nice (see the full explanation in
the previous review[1] and both PoC there).

Possibly that way is chosen to emit its own error messages ("invalid
size:" instead of "invalid input syntax for type numeric:"), but it
still leads errors like 'invalid unit: "+ kB"' for input value "+912+
kB" (see "dbsize.out").

In tried to enhanced this point (I don't agree, so this is a bug)
 

3b. There is an extra copying from the buffer "str" to the "buffer"
(it can be avoided and shown in an PoC "single_buffer.c").

3c. There is freeing buffers "str" and "buffer" that looks as a
counter-design since MemoryContext frees them at once with other
expired buffers (like "num" from numeric_in and numeric_mul; also
"mul_num" from int8_numeric) 

Numeric operations are on critical path for OLAP applications - so the code is rewritten to minimize palloc/free operations. I don't expect usage pg_size_bytes in OLAP critical path.
 


===
While I was writing that text I found five additional notes. Both
parts (above and below) are important (see concusion).

I. The new version of the patch (pg-size-bytes-10.patch) has the
simplest typo in an updated (from v9) row "inavalid size".

fixed 

II. There is no test for an empty input value, but it works as
expected (an error 'invalid size: ""').

fixed 

III. There is no support of 'bytes' unit, it seems such behavior got
majority approval[2].

No, because I have to use the supported units by configuration. The configuration supports only three chars long units. Support for "bytes" was removed, when I removed proprietary unit table.
 

IV. Code design of the function "parse_memory_unit" is returning
"bool" value and set "multiplier" via pointer. But while Tom Lane was
reviewed my patch[3] in the similar case he changed code that now
returns (see NonFiniteTimestampTzPart) not a flag, but a value (where
zero still means an error). Be honest it looks better, but the current
code is written like nearby functions.

It is a commiter decision. This is a implementation detail, and I don't see any variant as significant better - using 0 as error flag isn't the best too. But this is really detail.

V. The documentation lacks a note that the base of the "pg_size_bytes"
is 1024 whereas the base of the "pg_size_pretty" is 1000.

It isn't true, base for both are 1024. It was designed when special table was used for pg_size_bytes. But when we share same control table, then is wrong to use different table. The result can be optically different, but semantically same.

postgres=# select pg_size_pretty(pg_size_bytes('1MB'));
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 1024 kB        │
└────────────────┘
(1 row)

Time: 0.787 ms
postgres=# select pg_size_pretty(pg_size_bytes('1024MB'));
┌────────────────┐
│ pg_size_pretty │
╞════════════════╡
│ 1024 MB        │
└────────────────┘
(1 row)


 

===
Concusion:

I *can't* mark this patch as "Ready for Committer" with all previous
notes, and I ask experienced hackers for a help.
I'm not a strong Postgres hacker yet and sending to the next round
with the same notes seems a nitpicking for me.

So questions to the hackers:

a. Are the rest of the notes from the previous review[1] (and listed
in pp.1-3c) a nitpicking or should be done?

b. Does the function in PoC "single_buffer.c" look more complex or easer?

c. Is it worth for p.II to change error message like 'Empty input
value is not a valid size for pg_size_bytes: "%s"'?

It is simply to implement, but It is looking like overengineering
 

d. Should the function "pg_size_bytes" handle "bytes" as a size unit
(see p.III)?
There was no disagreement, but if I saw that thread before I became a
reviewer and saw silent approval I would have argued. Moreover
supporting of negative sizes was explained[4] by a symmetry for the
"pg_size_pretty", which should include all size units produced by the
"pg_size_pretty". On the other hand those functions can't be symmetric
due to difference in bases (see p.V and [5])

negative values is fully supported.

support of "bytes" depends on support "bytes" unit by GUC. When "bytes" unit will be supported, then it can be used in pg_size_bytes immediately.

updated patch attached

Regards

Pavel
 

e. What way is preferred for p.IV?

f. If p.V is actual, how to write it in the documentation (be honest
I've no idea)? Should a description of the "pg_size_pretty" be changed
too?


[1]http://www.postgresql.org/message-id/CAKOSWNn=p8GX-P5Y-B4t4dg-aAHaTup+CrG41hQ9BeobZwX3KQ@mail.gmail.com
[2]http://www.postgresql.org/message-id/CACACo5QW7fFsFfhKsTjtYcP4QF3Oh9zA14SC6Z3DXx2yssJjSw@mail.gmail.com
[3]http://git.postgresql.org/pg/commitdiff/647d87c56ab6da70adb753c08d7cdf7ee905ea8a
[4]http://www.postgresql.org/message-id/CA+TgmoZFomG4eYorZZGf7pzotG9PxpUhtQvxLfsKiM4iZH8KRQ@mail.gmail.com
[5]http://www.postgresql.org/message-id/CA+TgmoYgS_xixUQy+xyw9futtMrTbC_S0c1C0_wOBwVbNf8aZA@mail.gmail.com

Attachment

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
Hello, Pavel!

That letter was not a complain against you. I'm sorry if it seems like
that for you.
It was an intermediate review with several points to be clear for _me_
from experienced hackers, mostly about a code design.

26.01.2016 07:05, Pavel Stehule пишет:
>> pg_proc.h has changed, so the CATALOG_VERSION_NO must be also changed.
> this is not a part of patch - only a commiter knows CATALOG_VERSION_NO
>
CATALOG_VERSION_NO is mentioned for a committer, it is not your fault.

>> III. There is no support of 'bytes' unit, it seems such behavior got
>> majority approval[2].
>
> No, because I have to use the supported units by configuration. The configuration supports only three chars long
units.Support for "bytes" was removed, when I removed proprietary unit table. 
>
Point "III" is the only for the question "d". Also to collect any
possible features from the thread in one place.

>> V. The documentation lacks a note that the base of the "pg_size_bytes"
>> is 1024 whereas the base of the "pg_size_pretty" is 1000.
>
> It isn't true, base for both are 1024. It was designed when special table was used for pg_size_bytes. But when we
sharesame control table, then is wrong to use different table. The result can be optically different, but semantically
same.
>
Oops, I was wrong about a base of pg_size_pretty. It was a morning
after a hard night...

> negative values is fully supported.
You have mentioned it at least three times before. It is not my new
requirement or a point to your fault, it is an argument for
symmetry/asymmetry of the function.

> support of "bytes" depends on support "bytes" unit by GUC. When "bytes" unit will be supported, then it can be used
inpg_size_bytes immediately. 
By the way there can be a comparison for a special size unit before
calling parse_memory_unit.

> Regards
> Pavel
>
>> [2]http://www.postgresql.org/message-id/CACACo5QW7fFsFfhKsTjtYcP4QF3Oh9zA14SC6Z3DXx2yssJjSw@mail.gmail.com



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2016-01-26 13:48 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
Hello, Pavel!

That letter was not a complain against you. I'm sorry if it seems like
that for you.

ok.

It was an intermediate review with several points to be clear for _me_
from experienced hackers, mostly about a code design.

26.01.2016 07:05, Pavel Stehule пишет:
>> pg_proc.h has changed, so the CATALOG_VERSION_NO must be also changed.
> this is not a part of patch - only a commiter knows CATALOG_VERSION_NO
>
CATALOG_VERSION_NO is mentioned for a committer, it is not your fault.

>> III. There is no support of 'bytes' unit, it seems such behavior got
>> majority approval[2].
>
> No, because I have to use the supported units by configuration. The configuration supports only three chars long units. Support for "bytes" was removed, when I removed proprietary unit table.
>
Point "III" is the only for the question "d". Also to collect any
possible features from the thread in one place.

>> V. The documentation lacks a note that the base of the "pg_size_bytes"
>> is 1024 whereas the base of the "pg_size_pretty" is 1000.
>
> It isn't true, base for both are 1024. It was designed when special table was used for pg_size_bytes. But when we share same control table, then is wrong to use different table. The result can be optically different, but semantically same.
>
Oops, I was wrong about a base of pg_size_pretty. It was a morning
after a hard night... 

> negative values is fully supported.
You have mentioned it at least three times before. It is not my new
requirement or a point to your fault, it is an argument for
symmetry/asymmetry of the function.

> support of "bytes" depends on support "bytes" unit by GUC. When "bytes" unit will be supported, then it can be used in pg_size_bytes immediately.
By the way there can be a comparison for a special size unit before
calling parse_memory_unit.

there was more requirements based on original proposal (based in  separated control tables). When I leaved this design, I dropped more requirements and now is little bit hard to distinguish what is related and with. Some features can be added later without any possible compatibility issues in next commit fest or later, and I am searching some good borders for this patch. Now, this border is a shared memory unit control table. And I am trying to hold this border.

I am grateful for review - now this patch is better, and I hope near final stage.

Regards

Pavel


 

> Regards
> Pavel
>
>> [2]http://www.postgresql.org/message-id/CACACo5QW7fFsFfhKsTjtYcP4QF3Oh9zA14SC6Z3DXx2yssJjSw@mail.gmail.com

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
Hello,Pavel!

On 1/26/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I am grateful for review - now this patch is better, and I hope near final
> stage.
>
> Regards
> Pavel

I'm pretty sure we are close to it.

Now besides a code design I mentioned[1] before (and no one has
answered me about it), there are a few small notes.

Now you have all messages in one style ("invalid size: \"%s\"") except
size units ("invalid unit \"%s\", there are two places). I guess
according to the Error Style Guide[2] it should be like
"errmsg("invalid size: \"%s\"", str), errdetail("Invalid size unit
\"%s\"", unitstr),".

If it is not hard for you, it'll look better if "select" is uppercased
in tests and a comment about sharing "memory_unit_conversion_table" is
close to the "SELECT pg_size_bytes('1PB');". Moreover your comment has
a flaw: "pg_size_pretty" doesn't use that array at all, so the phrase
"These functions shares" is wrong.

Also I've just found that numeric format supports values with exponent
like "1e5" which is not allowed in pg_size_bytes. I guess the phrase
"The format is numeric with an optional size unit" should be replaced
to "The format is a fixed point number with an optional size unit" in
the documentation.

Have a look for a formatting: in the rows "num = DatumGetNumeric",
"mul_num = DatumGetNumeric", "num = DatumGetNumeric" and in error
reporting close to the "/* only alphabetic character are allowed */"
parameters in the next lines are not under the first parameters.

I insist there is no reason to call "pfree" for "str" and "buffer".
But if you do so, clean up also buffers from numeric_in, numeric_mul
and int8_numeric. If you insist it should be left as is, I leave that
decision to a committer.

P.S.: Have you thought to wrap the call "numeric_in" by a
PG_TRY/PG_CATCH instead of checking for correctness by yourself?

[1]http://www.postgresql.org/message-id/CAKOSWNm+SSgGKYsv09n_6HhfWzmRr+cSjpgfhBPFf5nNPfJ5JQ@mail.gmail.com
[2]http://www.postgresql.org/docs/devel/static/error-style-guide.html#AEN111165
-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi


P.S.: Have you thought to wrap the call "numeric_in" by a
PG_TRY/PG_CATCH instead of checking for correctness by yourself?

I though about it, but it is not possible. Every PG_TRY/CATCH must be finished by RETHROW. Only when you will open subtransaction and you are playing with resource manager, you can do it. It is pretty expensive.

You can see in our code lot of situation when some function returns true, false, "error message" instead raising a exception. I would not to refactor numericin function in this style. This function is in critical path of COPY FROM, and any more calls can decrease performance. And then I have to do these checks before calling.

Regards

Pavel
 

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 1/30/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> P.S.: Have you thought to wrap the call "numeric_in" by a
>> PG_TRY/PG_CATCH instead of checking for correctness by yourself?
>
> I though about it, but it is not possible. Every PG_TRY/CATCH must be
> finished by RETHROW.

No, src/include/utils/elog.h tells different (emphasizes are mine):
"The error recovery code can _optionally_ do PG_RE_THROW() to
propagate the _same_ error outwards."

So you can use it without rethrowing.

> Only when you will open subtransaction and you are playing with resource manager, you can do it.

Really? I did not find it around the "#define PG_TRY()" definition in
"src/include/utils/elog.h".

I guess it is important to use a subtransaction if you want to catch
an exception and go further. In case of calling "numeric_in" from the
"pg_size_bytes" there is no reason to use a subtransaction that may
close any open relation etc., because a new ereport with different
message is emitted, which fails the current transaction anyway.

PG_TRY is only calls sigsetjmp and sets PG_exception_stack. If no
exception is emitted, penalty is the only sigsetjmp call (but I don't
know how heavy it is), if an exception is emitted, there is no matter
how long it handles.

> It is pretty expensive.

Ok. Performance makes sense.

> You can see in our code lot of situation when some function returns true,
> false, "error message" instead raising a exception.

I know. It is a common style in C programs.

> I would not to refactor numeric_in function in this style.

No doubt. It is not necessary.

> This function is in critical path of COPY
> FROM, and any more calls can decrease performance. And then I have to do
> these checks before calling.
>
> Regards
> Pavel

-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Tom Lane
Date:
Vitaly Burovoy <vitaly.burovoy@gmail.com> writes:
> On 1/30/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> I though about it, but it is not possible. Every PG_TRY/CATCH must be
>> finished by RETHROW.

> No, src/include/utils/elog.h tells different (emphasizes are mine):
> "The error recovery code can _optionally_ do PG_RE_THROW() to
> propagate the _same_ error outwards."

> So you can use it without rethrowing.

You can use it without re-throwing, but ONLY if you use the full
subtransaction mechanism to clean up.  Assuming that the only possible
errors are ones that don't need cleanup will get your patch rejected.

IME, in practically all cases in simple C functions, it's better to find
a way that doesn't require using TRY/CATCH to catch an expected error.
        regards, tom lane



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi Vitaly,

please, can you send your version of this patch, how we talked about it in Moscow?

Thank you

Pavel

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 2/10/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi Vitaly,
>
> please, can you send your version of this patch, how we talked about it in
> Moscow?
>
> Thank you
>
> Pavel

Hello, Pavel!

Please find attached my version of the patch (it applies cleanly on
top of 64d89a9 which is current master).

It is time to change oid. 3331 is used by "bytea_sortsupport", I set
3334 to "pg_size_bytes".

I got a code design of numbers checking from json_lex_number in
src/backend/utils/adt/json.c
For me it seems more structured. I adapted it a little and it allows
to add parsing an exponent (like '10е3 Mb') easily for full support of
numeric (if sometimes it is necessary).

When I added "trimming" for size units (playing with avoiding an extra
buffer), I found there is easy to support "bytes" unit (but "byte" is
still unsupported).


Also this version includes all changes I mentioned in my last review[1]:
1. parse_memory_unit returns value instead of using a pointer (return
zero if noting is found) for it;
2. all messages are in a single style (nuances are in errdetails);
3. "select"s are in uppercase, rephrased and moved a comment block in test;
4. several tests are added (also with supporting of "bytes" unit);
5. a sentence in a documentation is rephrased (numeric->fixed-point
number); "bytes" unit is added to both functions;
6. fixed indentation a little;
7. pfree is removed (it is easier than removing all other allocated resources).


I still think my changes are little and they are based on your work
(and research).

[1]http://www.postgresql.org/message-id/CAKOSWNk13WVDem06LRo-HucR0pR6Et29+dvqY6J5sKxzarUbRw@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Attachment

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-02-11 7:44 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 2/10/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi Vitaly,
>
> please, can you send your version of this patch, how we talked about it in
> Moscow?
>
> Thank you
>
> Pavel

Hello, Pavel!

Please find attached my version of the patch (it applies cleanly on
top of 64d89a9 which is current master).

It is time to change oid. 3331 is used by "bytea_sortsupport", I set
3334 to "pg_size_bytes".

I got a code design of numbers checking from json_lex_number in
src/backend/utils/adt/json.c
For me it seems more structured. I adapted it a little and it allows
to add parsing an exponent (like '10е3 Mb') easily for full support of
numeric (if sometimes it is necessary).

yes, it is better structured
 

When I added "trimming" for size units (playing with avoiding an extra
buffer), I found there is easy to support "bytes" unit (but "byte" is
still unsupported).

I am little bit unsure about support the unit unsupported by GUC parser. But for usage in custom space and for this current usage, it is acceptable.
 


Also this version includes all changes I mentioned in my last review[1]:
1. parse_memory_unit returns value instead of using a pointer (return
zero if noting is found) for it;
2. all messages are in a single style (nuances are in errdetails);
3. "select"s are in uppercase, rephrased and moved a comment block in test;
4. several tests are added (also with supporting of "bytes" unit);
5. a sentence in a documentation is rephrased (numeric->fixed-point
number); "bytes" unit is added to both functions;
6. fixed indentation a little;
7. pfree is removed (it is easier than removing all other allocated resources).

ok, thank you
 


I still think my changes are little and they are based on your work
(and research).

thank you very much - but you refactoring is significant and helpful. I'll reassign your version to opened commitfest.

Regards

Pavel
 

[1]http://www.postgresql.org/message-id/CAKOSWNk13WVDem06LRo-HucR0pR6Et29+dvqY6J5sKxzarUbRw@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
This is last incarnation of my patch (cleaned and refactored by Vitaly Burovoy)

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:

Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2016-02-11 7:44 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 2/10/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi Vitaly,
>
> please, can you send your version of this patch, how we talked about it in
> Moscow?
>
> Thank you
>
> Pavel

Hello, Pavel!

Please find attached my version of the patch (it applies cleanly on
top of 64d89a9 which is current master).


I tested this patch and all tests passed

Regards

Pavel
 

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
Hello!

On 2/11/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> assigned https://commitfest.postgresql.org/9/514/
>
> Regards
> Pavel


This patch was almost done by the end of the previous CF(2016-01):
there was few little flaws which are solved by now.

I have reviewed this patch.
It applies cleanly at the top of current master
(f144f73242acef574bc27a4c70e809a64806e4a4), compiles silently and
implements the behavior reached by consensus.

All size units (the same as used in the GUC) are supported.
In addition "bytes" is also supported that makes it be a inverse
function for the pg_size_pretty[1].

The documentation describes behavior of the function and possible size
units. I don't see what else needs to be said.

The code is clean and commented.

Regression tests cover possible use cases and corner cases.


Notes for a committer:
1. pg_proc.h has changed, so the CATALOG_VERSION_NO must also be changed.
2. Code comments, error message strings, DESCR in pg_proc.h and
changes in the documentation need proof-reading by a native English
speaker, which the Author and I are not.


[CF] https://commitfest.postgresql.org/9/514/
[1] http://www.postgresql.org/message-id/CA+TgmoZFomG4eYorZZGf7pzotG9PxpUhtQvxLfsKiM4iZH8KRQ@mail.gmail.com

-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2016-02-12 2:28 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
Hello!

On 2/11/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Hi
>
> assigned https://commitfest.postgresql.org/9/514/
>
> Regards
> Pavel


This patch was almost done by the end of the previous CF(2016-01):
there was few little flaws which are solved by now.

I have reviewed this patch.
It applies cleanly at the top of current master
(f144f73242acef574bc27a4c70e809a64806e4a4), compiles silently and
implements the behavior reached by consensus.

All size units (the same as used in the GUC) are supported.
In addition "bytes" is also supported that makes it be a inverse
function for the pg_size_pretty[1].

The documentation describes behavior of the function and possible size
units. I don't see what else needs to be said.

The code is clean and commented.

Regression tests cover possible use cases and corner cases.


Notes for a committer:
1. pg_proc.h has changed, so the CATALOG_VERSION_NO must also be changed.
2. Code comments, error message strings, DESCR in pg_proc.h and
changes in the documentation need proof-reading by a native English
speaker, which the Author and I are not.

Thank you for review and other work

Nice day

Pavel
 


[CF] https://commitfest.postgresql.org/9/514/
[1] http://www.postgresql.org/message-id/CA+TgmoZFomG4eYorZZGf7pzotG9PxpUhtQvxLfsKiM4iZH8KRQ@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Re: custom function for converting human readable sizes to bytes

From
Dean Rasheed
Date:
On 12 February 2016 at 06:25, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Thank you for review and other work
>

This seems like a reasonable first patch for me as a committer, so
I'll take it unless anyone else was planning to do so.

Regards,
Dean



Re: custom function for converting human readable sizes to bytes

From
Heikki Linnakangas
Date:
On 12/02/16 10:19, Dean Rasheed wrote:
> On 12 February 2016 at 06:25, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Thank you for review and other work
>>
>
> This seems like a reasonable first patch for me as a committer, so
> I'll take it unless anyone else was planning to do so.

Great! You'll want to copy-edit the comments and docs a bit, to fix 
spellings etc, and run pgindent on it. And don't forget to bump 
catversion! :-)

- Heikki




Re: custom function for converting human readable sizes to bytes

From
Dean Rasheed
Date:
> On 12/02/16 10:19, Dean Rasheed wrote:
>> This seems like a reasonable first patch for me as a committer, so
>> I'll take it unless anyone else was planning to do so.
>

So looking at this, it seems that for the most part pg_size_bytes()
will parse any output produced by pg_size_pretty(). The exception is
that there are 2 versions of pg_size_pretty(), one that takes bigint
and one that takes numeric, whereas pg_size_bytes() returns bigint, so
it can't handle all inputs. Is there any reason not to make
pg_size_bytes() return numeric?

It would still be compatible with the example use cases, but it would
be a better inverse of both variants of pg_size_pretty() and would be
more future-proof. It already works internally using numeric, so it's
a trivial change to make now, but impossible to change in the future
without introducing a new function with a different name, which is
messy.

Thoughts?

Regards,
Dean



Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 2/15/16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 12/02/16 10:19, Dean Rasheed wrote:
>>> This seems like a reasonable first patch for me as a committer, so
>>> I'll take it unless anyone else was planning to do so.
>>
>
> So looking at this, it seems that for the most part pg_size_bytes()
> will parse any output produced by pg_size_pretty(). The exception is
> that there are 2 versions of pg_size_pretty(), one that takes bigint
> and one that takes numeric, whereas pg_size_bytes() returns bigint, so
> it can't handle all inputs. Is there any reason not to make
> pg_size_bytes() return numeric?

I guess the only reason is a comparison purpose. The example from the
first letter[1]:

SELECT * FROM pg_class WHERE pg_table_size(oid) > pg_human_size('2GB');

... but not for something like:
WITH sel AS ( SELECT pg_size_pretty(oid) AS _size, * FROM pg_class WHERE ...
)
SELECT pg_size_bytes(_size), * FROM sel;


So the use case of the "pg_size_bytes" is getting a value to compare
with a result from pg_table_size, ..., even with pg_database_size,
_each_ of whom returns bigint (or just int), but not numeric:

postgres=# \df pg_*_size                                  List of functions  Schema   |          Name          | Result
datatype | Argument
 
data types |  Type
------------+------------------------+------------------+---------------------+--------pg_catalog | pg_column_size
  | integer          | "any"      | normalpg_catalog | pg_database_size       | bigint           | name      |
normalpg_catalog| pg_database_size       | bigint           | oid      | normalpg_catalog | pg_indexes_size        |
bigint          | regclass      | normalpg_catalog | pg_relation_size       | bigint           | regclass      |
normalpg_catalog| pg_relation_size       | bigint           | regclass,
 
text      | normalpg_catalog | pg_table_size          | bigint           | regclass      | normalpg_catalog |
pg_tablespace_size    | bigint           | name      | normalpg_catalog | pg_tablespace_size     | bigint           |
oid     | normalpg_catalog | pg_total_relation_size | bigint           | regclass      | normal
 
(10 rows)

The commit message for "pg_size_pretty(numeric)" explains[2] it was
introduced for using with "pg_xlog_location_diff"[3] only.
Since "pg_size_bytes" supports up to (4 EiB-1B)[4] I hope nobody will
want to send a query like "tell me where difference between two
transaction log locations is bigger or equal than 4EiB"...

But comparison between int8 and numeric is OK, so usage of rational
input will not break queries:

postgres=# select '10e200'::numeric > 10::int8 as check;check
-------t
(1 row)


P.S.: "bytes" size unit was added just for consistency: each group
should have a name, even with an exponent of 1.

> It would still be compatible with the example use cases, but it would
> be a better inverse of both variants of pg_size_pretty() and would be
> more future-proof. It already works internally using numeric, so it's
> a trivial change to make now, but impossible to change in the future
> without introducing a new function with a different name, which is
> messy.
>
> Thoughts?
>
> Regards,
> Dean

[1]http://www.postgresql.org/message-id/CAFj8pRD-tGoDKnxdYgECzA4On01_uRqPrwF-8LdkSE-6bDHp0w@mail.gmail.com
[2]http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4a2d7ad76f5f275ef2d6a57e1a61d5bf756349e8
[3]http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE
[4]https://en.wikipedia.org/wiki/Binary_prefix#Adoption_by_IEC.2C_NIST_and_ISO

-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 2/15/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> P.S.: "bytes" size unit was added just for consistency: each group
> should have a name, even with an exponent of 1.

Oops... Of course, "even with an exponent of 0".
-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:
Hi

2016-02-15 10:16 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
> On 12/02/16 10:19, Dean Rasheed wrote:
>> This seems like a reasonable first patch for me as a committer, so
>> I'll take it unless anyone else was planning to do so.
>

So looking at this, it seems that for the most part pg_size_bytes()
will parse any output produced by pg_size_pretty(). The exception is
that there are 2 versions of pg_size_pretty(), one that takes bigint
and one that takes numeric, whereas pg_size_bytes() returns bigint, so
it can't handle all inputs. Is there any reason not to make
pg_size_bytes() return numeric? 

It would still be compatible with the example use cases, but it would
be a better inverse of both variants of pg_size_pretty() and would be
more future-proof. It already works internally using numeric, so it's
a trivial change to make now, but impossible to change in the future
without introducing a new function with a different name, which is
messy.

Thoughts?

This is a question. I have not a strong opinion about it. There are no any technical objection - the result will be +/- same. But you will enforce Numeric into outer expression evaluation.

The result will not be used together with function pg_size_pretty, but much more with functions pg_relation_size, pg_relation_size, .. and these functions doesn't return Numeric. These functions returns bigint. Bigint is much more natural type for this purpose.

Is there any use case for Numeric?

Regards

Pavel
 

Regards,
Dean

Re: custom function for converting human readable sizes to bytes

From
Dean Rasheed
Date:
On 16 February 2016 at 05:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-02-15 10:16 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
>> Is there any reason not to make
>> pg_size_bytes() return numeric?
>>
> This is a question. I have not a strong opinion about it. There are no any
> technical objection - the result will be +/- same. But you will enforce
> Numeric into outer expression evaluation.
>
> The result will not be used together with function pg_size_pretty, but much
> more with functions pg_relation_size, pg_relation_size, .. and these
> functions doesn't return Numeric. These functions returns bigint. Bigint is
> much more natural type for this purpose.
>
> Is there any use case for Numeric?
>

[Shrug] I don't really have a strong opinion about it either, but it
seemed that maybe the function might have some wider uses beyond just
comparing object sizes, and since it's already computing the numeric
size, it might as well just return it.


Looking at the rest of the patch, I think there are other things that
need tidying up -- the unit parsing code for one. This is going to
some effort to reuse the memory_unit_conversion_table from guc.c, but
the result is that it has added quite a bit of new code and now the
responsibility for parsing different units is handled by different
functions in different files, which IMO is quite messy. Worse, the
error message is wrong and misleading:

select pg_size_bytes('10 bytes'); -- OK
select pg_size_bytes('10 gallons');
ERROR:  invalid size: "10 gallons"
DETAIL:  Invalid size unit "gallons"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

which fails to mention that "bytes" is also a valid unit.

Fixing that in parse_memory_unit() would be messy because it assumes a
base unit of kB, so it would require a negative multiplier, and
pg_size_bytes() would have to be taught to divide by the magnitude of
negative multipliers in the same way that guc.c does.

ISTM that it would be far less code, and much simpler and more
readable to just parse the supported units directly in
pg_size_bytes(), rather than trying to share code with guc.c, when the
supported units are actually different and may well diverge further in
the future.

I'll try to hack something up, and see what it looks like.

Regards,
Dean



Re: custom function for converting human readable sizes to bytes

From
Pavel Stehule
Date:


2016-02-16 11:25 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
On 16 February 2016 at 05:01, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-02-15 10:16 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
>> Is there any reason not to make
>> pg_size_bytes() return numeric?
>>
> This is a question. I have not a strong opinion about it. There are no any
> technical objection - the result will be +/- same. But you will enforce
> Numeric into outer expression evaluation.
>
> The result will not be used together with function pg_size_pretty, but much
> more with functions pg_relation_size, pg_relation_size, .. and these
> functions doesn't return Numeric. These functions returns bigint. Bigint is
> much more natural type for this purpose.
>
> Is there any use case for Numeric?
>

[Shrug] I don't really have a strong opinion about it either, but it
seemed that maybe the function might have some wider uses beyond just
comparing object sizes, and since it's already computing the numeric
size, it might as well just return it.

I see only one disadvantage - when we return numeric, then all following expression will be in numeric, and for some functions, or some expression the users will have to cast explicitly to bigint.
 


Looking at the rest of the patch, I think there are other things that
need tidying up -- the unit parsing code for one. This is going to
some effort to reuse the memory_unit_conversion_table from guc.c, but
the result is that it has added quite a bit of new code and now the
responsibility for parsing different units is handled by different
functions in different files, which IMO is quite messy. Worse, the
error message is wrong and misleading:

select pg_size_bytes('10 bytes'); -- OK
select pg_size_bytes('10 gallons');
ERROR:  invalid size: "10 gallons"
DETAIL:  Invalid size unit "gallons"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

which fails to mention that "bytes" is also a valid unit.

true
 

Fixing that in parse_memory_unit() would be messy because it assumes a
base unit of kB, so it would require a negative multiplier, and
pg_size_bytes() would have to be taught to divide by the magnitude of
negative multipliers in the same way that guc.c does.

ISTM that it would be far less code, and much simpler and more
readable to just parse the supported units directly in
pg_size_bytes(), rather than trying to share code with guc.c, when the
supported units are actually different and may well diverge further in
the future.

yes, if we have different units, then independent control tables has more sense.

 

I'll try to hack something up, and see what it looks like.

Regards,
Dean

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 2/16/16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 16 February 2016 at 05:01, Pavel Stehule <pavel.stehule@gmail.com>
> wrote:
>> 2016-02-15 10:16 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
>>> Is there any reason not to make
>>> pg_size_bytes() return numeric?
>>>
>> This is a question. I have not a strong opinion about it. There are no
>> any
>> technical objection - the result will be +/- same. But you will enforce
>> Numeric into outer expression evaluation.
>>
>> The result will not be used together with function pg_size_pretty, but
>> much
>> more with functions pg_relation_size, pg_relation_size, .. and these
>> functions doesn't return Numeric. These functions returns bigint. Bigint
>> is
>> much more natural type for this purpose.
>>
>> Is there any use case for Numeric?
>>
>
> [Shrug] I don't really have a strong opinion about it either, but it
> seemed that maybe the function might have some wider uses beyond just
> comparing object sizes, and since it's already computing the numeric
> size, it might as well just return it.

I agree with Pavel, it leads to a comparison in numeric, which is
overhead. int8 can always be casted to numeric on-demand, but not vice
versa. The main reasons to return int8 instead of numeric are
performance and inability to imagine use case where so big numbers
could take place.

> Looking at the rest of the patch, I think there are other things that
> need tidying up -- the unit parsing code for one. This is going to
> some effort to reuse the memory_unit_conversion_table from guc.c, but
> the result is that it has added quite a bit of new code and now the
> responsibility for parsing different units is handled by different
> functions in different files, which IMO is quite messy. Worse, the
> error message is wrong and misleading:
>
> select pg_size_bytes('10 bytes'); -- OK
> select pg_size_bytes('10 gallons');
> ERROR:  invalid size: "10 gallons"
> DETAIL:  Invalid size unit "gallons"
> HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".
>
> which fails to mention that "bytes" is also a valid unit.

:-D

Yes, it is fail. I missed it...

> Fixing that in parse_memory_unit() would be messy because it assumes a
> base unit of kB, so it would require a negative multiplier, and
> pg_size_bytes() would have to be taught to divide by the magnitude of
> negative multipliers in the same way that guc.c does.

I guess the best way is to simply make "bytes" a valid size unit even
in GUC by adding it to the memory_unit_conversion_table with
reflecting it in memory_units_hint and removing an extra checking from
pg_size_bytes.

> ISTM that it would be far less code, and much simpler and more
> readable to just parse the supported units directly in
> pg_size_bytes(), rather than trying to share code with guc.c, when the
> supported units are actually different and may well diverge further in
> the future.
>
> I'll try to hack something up, and see what it looks like.
>
> Regards,
> Dean

-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 2/16/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> On 2/16/16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> On 16 February 2016 at 05:01, Pavel Stehule <pavel.stehule@gmail.com>
>> wrote:
>>> 2016-02-15 10:16 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
>> Fixing that in parse_memory_unit() would be messy because it assumes a
>> base unit of kB, so it would require a negative multiplier, and
>> pg_size_bytes() would have to be taught to divide by the magnitude of
>> negative multipliers in the same way that guc.c does.

Well... pg_size_bytes already knows what negative multiplier is (see
the constant skip_unit), but it seems it is hard to teach other
functions expecting values in KB to understand negative multipliers as
dividers.

> I guess the best way is to simply make "bytes" a valid size unit even
> in GUC by adding it to the memory_unit_conversion_table

Oops. I forgot they are not in bytes.

> with reflecting it in memory_units_hint and removing
> an extra checking from pg_size_bytes.
>
>> ISTM that it would be far less code, and much simpler and more
>> readable to just parse the supported units directly in
>> pg_size_bytes(), rather than trying to share code with guc.c, when the
>> supported units are actually different and may well diverge further in
>> the future.

Initially it was not shared with guc.c and now it is by the letter[1]
of Oleksandr Shulgin and Robert Haas.

I guess since parse_memory_unit uses memory_unit_conversion_table
where values (not for GUC_UNIT_KB but nevertheless) can be negatives
and for KB it can be a single value (-1024), move "bytes" check to
parse_memory_unit, add a constant memory_units_bytes_hint as
copy-paste with included "bytes" size unit (and with a comment it is
special for parse_memory_unit function).

Also change "skip_unit" in dbsize.c to -1024.

>> I'll try to hack something up, and see what it looks like.

[1]http://www.postgresql.org/message-id/CA+TgmoaNoT7uGjsbiBFUQgacHk2LZrypMKJVVUgp5r197YUvTg@mail.gmail.com
-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 2/16/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> On 2/16/16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> Fixing that in parse_memory_unit() would be messy because it assumes a
>> base unit of kB, so it would require a negative multiplier, and
>> pg_size_bytes() would have to be taught to divide by the magnitude of
>> negative multipliers in the same way that guc.c does.

Now parse_memory_unit returns -1024 for bytes as divider, constant
"bytes" has moved there.
Add new memory_units_bytes_hint which differs from an original
memory_units_int by "bytes" size unit.
Allow hintmsg be NULL and if so, skip setting dereferenced variable to
memory_units_bytes_hint.

>> ISTM that it would be far less code, and much simpler and more
>> readable to just parse the supported units directly in
>> pg_size_bytes(), rather than trying to share code with guc.c, when the
>> supported units are actually different and may well diverge further in
>> the future.
>
> Initially it was not shared with guc.c and now it is by the letter[1]
> of Oleksandr Shulgin and Robert Haas.
>
>> I'll try to hack something up, and see what it looks like.
>>
>> Regards,
>> Dean

Current version contains correct hint message, it passes tests.
Besides mentioned above there are some comment rewordings, and
variable renaming.

[1]http://www.postgresql.org/message-id/CA+TgmoaNoT7uGjsbiBFUQgacHk2LZrypMKJVVUgp5r197YUvTg@mail.gmail.com

--
Best regards,
Vitaly Burovoy

Attachment

Re: custom function for converting human readable sizes to bytes

From
Dean Rasheed
Date:
On 17 February 2016 at 00:39, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
> On 2/16/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>> On 2/16/16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> Fixing that in parse_memory_unit() would be messy because it assumes a
>>> base unit of kB, so it would require a negative multiplier, and
>>> pg_size_bytes() would have to be taught to divide by the magnitude of
>>> negative multipliers in the same way that guc.c does.
>
> Now parse_memory_unit returns -1024 for bytes as divider, constant
> "bytes" has moved there.
> Add new memory_units_bytes_hint which differs from an original
> memory_units_int by "bytes" size unit.
> Allow hintmsg be NULL and if so, skip setting dereferenced variable to
> memory_units_bytes_hint.
>

I think that approach is getting more and more unwieldy, and it simply
isn't worth the effort just to share a few values from the unit
conversion table, especially given that the set of supported units
differs between the two places.


>>> ISTM that it would be far less code, and much simpler and more
>>> readable to just parse the supported units directly in
>>> pg_size_bytes(), rather than trying to share code with guc.c, when the
>>> supported units are actually different and may well diverge further in
>>> the future.
>>

I've gone with this approach and it is indeed far less code, and much
simpler and easier to read. This will also make it easier to
maintain/extend in the future.

I've made a few minor tweaks to the docs, and added a note to make it
clear that the units in these functions work in powers of 2 not 10.

I also took the opportunity to tidy up the number scanning code
somewhat (I was tempted to rip it out entirely, since it feels awfully
close to duplicating the numeric code, but it's probably worth it for
the better error message).

Additionally, note that I replaced strcasecmp() with pg_strcasecmp(),
since AIUI the former is not available on all supported platforms.

Barring objections, and subject to some more testing, I intend to
commit this version.

Regards,
Dean

Attachment

Re: custom function for converting human readable sizes to bytes

From
Vitaly Burovoy
Date:
On 2/17/16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 17 February 2016 at 00:39, Vitaly Burovoy <vitaly.burovoy@gmail.com>
> wrote:
>> Now parse_memory_unit returns -1024 for bytes as divider, constant
>> "bytes" has moved there.
>> Add new memory_units_bytes_hint which differs from an original
>> memory_units_int by "bytes" size unit.
>> Allow hintmsg be NULL and if so, skip setting dereferenced variable to
>> memory_units_bytes_hint.
>>
>
> I think that approach is getting more and more unwieldy, and it simply
> isn't worth the effort just to share a few values from the unit
> conversion table, especially given that the set of supported units
> differs between the two places.
>
>> On 2/16/16, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> ISTM that it would be far less code, and much simpler and more
>>> readable to just parse the supported units directly in
>>> pg_size_bytes(), rather than trying to share code with guc.c, when the
>>> supported units are actually different and may well diverge further in
>>> the future.
>
> I've gone with this approach and it is indeed far less code, and much
> simpler and easier to read. This will also make it easier to
> maintain/extend in the future.
>
> I've made a few minor tweaks to the docs, and added a note to make it
> clear that the units in these functions work in powers of 2 not 10.
>
> I also took the opportunity to tidy up the number scanning code
> somewhat (I was tempted to rip it out entirely, since it feels awfully
> close to duplicating the numeric code, but it's probably worth it for
> the better error message).

Yes, the better error message was the only reason to leave that scan.

> Additionally, note that I replaced strcasecmp() with pg_strcasecmp(),
> since AIUI the former is not available on all supported platforms.

Thank you. I didn't know that function exists.

> Barring objections, and subject to some more testing, I intend to
> commit this version.
>
> Regards,
> Dean

Than you for your work. Your version seems even better that refactored by me.
But I have several questions (below).

> +    bool        have_digits;
Agree, "main_digits" and "error" together was redundant, "have_digits"
is enough.


> +    else
> +        have_digits = false;
Does it worth to move it to the declaration and remove that "else" block?
+    bool        have_digits = false;


Is it important to use:
> +  ObjectIdGetDatum(InvalidOid),
> +  Int32GetDatum(-1)));
instead of "0, -1"? Previously I left immediate constants because I
found the same thing in jsonb.c (rows 363, 774)...


> +  if (pg_strcasecmp(strptr, "bytes") == 0)
> +  else if
> +  else if
> +  else if
It seems little ugly for me. In that case (not using values from GUC)
I'd create a static array for it and do a small cycle via it. Would it
better?


> + errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
Agree. Usage of "text_to_cstring" again is a good idea for improving
readability.


> + NumericGetDatum(mul_num),
Two more space for indentation.


Also I've found that with allowing exponent there is a weird result
(we tried to avoid "type numeric" in the error message):
SELECT pg_size_bytes('123e100000000000000000000000000000000000000000000000000000000000000000000000
kB');
ERROR:  invalid input syntax for type numeric:
"123e100000000000000000000000000000000000000000000000000000000000000000000000"


[1]http://www.postgresql.org/docs/9.5/static/error-style-guide.html#AEN110702
-- 
Best regards,
Vitaly Burovoy



Re: custom function for converting human readable sizes to bytes

From
Dean Rasheed
Date:
On 18 February 2016 at 02:00, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
>> +     else
>> +             have_digits = false;
> Does it worth to move it to the declaration and remove that "else" block?
> +       bool            have_digits = false;

OK, that's probably a bit neater.


> Is it important to use:
>> +  ObjectIdGetDatum(InvalidOid),
>> +  Int32GetDatum(-1)));
> instead of "0, -1"? Previously I left immediate constants because I
> found the same thing in jsonb.c (rows 363, 774)...

I think it's preferable to use the macros because they make it clearer
what datatypes are being passed around. I think that's the more common
style, but the JSONB code is not technically wrong either.


>> +  if (pg_strcasecmp(strptr, "bytes") == 0)
>> +  else if
>> +  else if
>> +  else if
> It seems little ugly for me. In that case (not using values from GUC)
> I'd create a static array for it and do a small cycle via it. Would it
> better?

That's a matter of personal preference. I prefer the values inline
because then the values are closer to where they're being used, which
for me makes it easier to follow (see e.g., convert_priv_string()).

In guc.c they're in lookup tables because they're referred to from
multiple places, and because it needs to switch between tables based
on context, neither of which is true here. If there is a need to
re-use these values elsewhere in the future it can be refactored, but
right now that feels like an over-engineered solution for this
problem.


>> + NumericGetDatum(mul_num),
> Two more space for indentation.

pgindent pulled that back.


> Also I've found that with allowing exponent there is a weird result
> (we tried to avoid "type numeric" in the error message):
> SELECT pg_size_bytes('123e100000000000000000000000000000000000000000000000000000000000000000000000
> kB');
> ERROR:  invalid input syntax for type numeric:
> "123e100000000000000000000000000000000000000000000000000000000000000000000000"

OK, I'll add a check for that.
Thanks for testing.

Regards,
Dean



Re: custom function for converting human readable sizes to bytes

From
Dean Rasheed
Date:
On 18 February 2016 at 10:05, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> OK, I'll add a check for that.
> Thanks for testing.
>

Pushed, with a catversion bump.

Regards,
Dean