Thread: [PATCH] expand the units that pg_size_pretty supports on output

[PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
Hi folks,

Enclosed is a patch that expands the unit output for pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing numeric output code to account for the larger number of units we're using rather than just adding nesting levels.

There are also a few other places that could benefit from expanded units, but this is a standalone starting point.

Best,

David
Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Michael Paquier
Date:
On Wed, Apr 14, 2021 at 11:13:47AM -0500, David Christensen wrote:
> Enclosed is a patch that expands the unit output for
> pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
> numeric output code to account for the larger number of units we're using
> rather than just adding nesting levels.
>
> There are also a few other places that could benefit from expanded units,
> but this is a standalone starting point.

Please don't forget to add this patch to the next commit fest of July
if you want to get some reviews:
https://commitfest.postgresql.org/33/

Note that the development of Postgres 14 is over, and that there was a
feature freeze last week, but this can be considered for 15.
--
Michael

Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
A second patch to teach the same units to pg_size_bytes().

Best,

David

On Wed, Apr 14, 2021 at 11:13 AM David Christensen <david.christensen@crunchydata.com> wrote:
Hi folks,

Enclosed is a patch that expands the unit output for pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing numeric output code to account for the larger number of units we're using rather than just adding nesting levels.

There are also a few other places that could benefit from expanded units, but this is a standalone starting point.

Best,

David
Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Asif Rehman
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Hi David,

I was reviewing this patch and the compilation failed with following error on CentOS 7.

dbsize.c: In function ‘pg_size_bytes’:
dbsize.c:808:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   const int unit_count = 9; /* sizeof units table */
   ^
dbsize.c:809:3: error: variable length array ‘units’ is used [-Werror=vla]
   const char *units[unit_count] = {
   ^

I believe "unit_count" ought to be a #define here.

Regards,
Asif

The new status of this patch is: Waiting on Author

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
New versions attached to address the initial CF feedback and rebase on HEAD as of now.

0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch 

- expands the units that pg_size_pretty() can handle up to YB.

0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch

- expands the units that pg_size_bytes() can handle, up to YB.
Attachment
>From: David Christensen <david.christensen@crunchydata.com> 
>Sent: Friday, June 4, 2021 4:18 AM
>To: PostgreSQL-development <pgsql-hackers@postgresql.org>
>Subject: Re: [PATCH] expand the units that pg_size_pretty supports on output
>
>New versions attached to address the initial CF feedback and rebase on HEAD as of now.
>
>0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch 
>
>- expands the units that pg_size_pretty() can handle up to YB.
>
>0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch
>
>- expands the units that pg_size_bytes() can handle, up to YB.
>
I don't see the need to extend the unit to YB.
What use case do you have in mind?

Regards,
Shinya Kato

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
> I don't see the need to extend the unit to YB.
> What use case do you have in mind?

Practical or no, I saw no reason not to support all defined units. I assume we’ll get to a need sooner or later. :)

David




>> I don't see the need to extend the unit to YB.
>> What use case do you have in mind?
>
>Practical or no, I saw no reason not to support all defined units. I assume we’ll
>get to a need sooner or later. :)

Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?

Best regards,
Shinya Kato


Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Tue, 15 Jun 2021 at 21:24, <Shinya11.Kato@nttdata.com> wrote:
> Hmmm, I didn't think YB was necessary, but what do others think?

For me personally, without consulting Wikipedia, I know that Petabyte
comes after Terabyte and then I'm pretty sure it's Exabyte.  After
that, I'd need to check.

Assuming I'm not the only person who can't tell exactly how many bytes
are in a Yottabyte, would it actually be a readability improvement if
we started showing these units to people?

I'd say there might be some argument to implement as far as PB one
day, maybe not that far out into the future, especially if we got
something like built-in clustering. But I just don't think there's any
need to go all out and take it all the way to YB.  There's an above
zero chance we'll break something of someones by doing this, so I
think any changes here should be driven off an actual requirement.

I really think this change is more likely to upset someone than please someone.

Just my thoughts.

David



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Isaac Morland
Date:
On Tue, 15 Jun 2021 at 05:24, <Shinya11.Kato@nttdata.com> wrote:
>> I don't see the need to extend the unit to YB.
>> What use case do you have in mind?
>
>Practical or no, I saw no reason not to support all defined units. I assume we’ll
>get to a need sooner or later. :)

Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?

If I’m reading the code correctly, the difference between supporting YB and not supporting it is whether there is a line for it in the list of prefixes and their multiples. As such, I don’t see why we’re even discussing whether or not to include all the standard prefixes. A YB is still an absurd amount of storage, but that’s not the point; just put all the standard prefixes and be done with it. If actual code changes were required in the new code as they are in the old it might be worth discussing.

One question: why is there no “k” in the list of prefixes?

Also: why not have only the prefixes in the array, and use a single fixed output format "%s %sB" all the time?

It feels like it should be possible to calculate the appropriate idx to use (while adjusting the number to print as is done now) and then just have one psprintf call for all cases.

A more significant question is YB vs. YiB. I know there is a long tradition within computer-related fields of saying that k = 1024, M = 1024^2, etc., but we’re not special enough to override the more general principles of SI (Système International) which provide that k = 1000, M = 1000^2 and so on universally and provide the alternate prefixes ki, Mi, etc. which use 1024 as the multiple.

So I would suggest either display 2000000 as 2MB or as 1.907MiB.

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
On Tue, Jun 15, 2021 at 8:31 AM Isaac Morland <isaac.morland@gmail.com> wrote:
On Tue, 15 Jun 2021 at 05:24, <Shinya11.Kato@nttdata.com> wrote:
>> I don't see the need to extend the unit to YB.
>> What use case do you have in mind?
>
>Practical or no, I saw no reason not to support all defined units. I assume we’ll
>get to a need sooner or later. :)

Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?

If I’m reading the code correctly, the difference between supporting YB and not supporting it is whether there is a line for it in the list of prefixes and their multiples. As such, I don’t see why we’re even discussing whether or not to include all the standard prefixes. A YB is still an absurd amount of storage, but that’s not the point; just put all the standard prefixes and be done with it. If actual code changes were required in the new code as they are in the old it might be worth discussing.

Agreed, this is why I went this way.  One and done.
 
One question: why is there no “k” in the list of prefixes?

kB has a special-case code block before you get to this point.  I didn't look into the reasons, but assume there are some.
 
Also: why not have only the prefixes in the array, and use a single fixed output format "%s %sB" all the time?

It feels like it should be possible to calculate the appropriate idx to use (while adjusting the number to print as is done now) and then just have one psprintf call for all cases.

Sure, if that seems more readable/understandable.
 
A more significant question is YB vs. YiB. I know there is a long tradition within computer-related fields of saying that k = 1024, M = 1024^2, etc., but we’re not special enough to override the more general principles of SI (Système International) which provide that k = 1000, M = 1000^2 and so on universally and provide the alternate prefixes ki, Mi, etc. which use 1024 as the multiple.

So I would suggest either display 2000000 as 2MB or as 1.907MiB.

Heh, I was just expanding the existing logic; if others want to have this particular battle go ahead and I'll adjust the code/prefixes, but obviously the logic will need to change if we want to support true MB instead of MiB as MB.

Also, this will presumably be a breaking change for anyone using the existing units MB == 1024 * 1024, as we've had for something like 20 years.  Changing these units to the *iB will be trivial with this patch, but not looking forward to garnering the consensus to change this part.

David

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
On Tue, Jun 15, 2021 at 8:26 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 15 Jun 2021 at 21:24, <Shinya11.Kato@nttdata.com> wrote:
> Hmmm, I didn't think YB was necessary, but what do others think?

For me personally, without consulting Wikipedia, I know that Petabyte
comes after Terabyte and then I'm pretty sure it's Exabyte.  After
that, I'd need to check.

Assuming I'm not the only person who can't tell exactly how many bytes
are in a Yottabyte, would it actually be a readability improvement if
we started showing these units to people?

I hadn't really thought about that TBH; to me it seemed like an improvement, but I do see that others might not, and adding confusion is definitely not helpful.  That said, it seems like having the code structured in a way that we can expand via adding an element to a table instead of the existing way it's written with nested if blocks is still a useful refactor, whatever we decide the cutoff units should be.
 
I'd say there might be some argument to implement as far as PB one
day, maybe not that far out into the future, especially if we got
something like built-in clustering. But I just don't think there's any
need to go all out and take it all the way to YB.  There's an above
zero chance we'll break something of someones by doing this, so I
think any changes here should be driven off an actual requirement.

I got motivated to do this due to some (granted synthetic) work/workloads, where I was seeing 6+digit TB numbers and thought it was ugly.  Looked at the code and thought the refactor was the way to go, and just stuck all of the known units in.
 
I really think this change is more likely to upset someone than please someone.

I'd be interested to see reactions from people; to me, it seems a +1, but seems like -1, 0, +1 all valid opinions here; I'd expect more 0's and +1s, but I'm probably biased since I wrote this. :-)
 

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Wed, 16 Jun 2021 at 02:58, David Christensen
<david.christensen@crunchydata.com> wrote:
> That said, it seems like having the code structured in a way that we can expand via adding an element to a table
insteadof the existing way it's written with nested if blocks is still a useful refactor, whatever we decide the cutoff
unitsshould be.
 

I had not really looked at the patch, but if there's a cleanup portion
to the same patch as you're adding the YB too, then maybe it's worth
separating those out into another patch so that the two can be
considered independently.

David



>I had not really looked at the patch, but if there's a cleanup portion to the same
>patch as you're adding the YB too, then maybe it's worth separating those out
>into another patch so that the two can be considered independently.

I agree with this opinion. It seems to me that we should think about units and refactoring separately.
Sorry for the confusion.

Best regards,
Shinya Kato

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
>> I had not really looked at the patch, but if there's a cleanup portion to the same
>> patch as you're adding the YB too, then maybe it's worth separating those out
>> into another patch so that the two can be considered independently.
>
> I agree with this opinion. It seems to me that we should think about units and refactoring separately.
> Sorry for the confusion.

Sure thing, I think that makes sense. Refactor with existing units and debate the number of additions units to include.
Ido think Petabytes and Exabytes are at least within the realm of ones we should include; less tied to ZB and YB; just
includedfor completeness.  


Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
Shinya11.Kato@nttdata.com writes:

>>I had not really looked at the patch, but if there's a cleanup portion to the same
>>patch as you're adding the YB too, then maybe it's worth separating those out
>>into another patch so that the two can be considered independently.
>
> I agree with this opinion. It seems to me that we should think about units and refactoring separately.
> Sorry for the confusion.
>
> Best regards,
> Shinya Kato

Hi folks,

Had some time to rework this patch from the two that had previously been
here into two separate parts:

1) A basic refactor of the existing code to easily handle expanding the
units we use into a table-based format.  This also includes changing the
return value of `pg_size_bytes()` from an int64 into a numeric, and
minor test adjustments to reflect this.

2) Expanding the units that both pg_size_bytes() and pg_size_pretty()
recognize up through Yottabytes.  This includes documentation and test
updates to reflect the changes made here.  How many additional units we
add here is up for discussion (inevitably), but my opinion remains that
there is no harm in supporting all units available.


Best,

David

From ac30b06e3ddcb57eebb380560c2f4a47430dfd74 Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Tue, 29 Jun 2021 10:20:05 -0500
Subject: [PATCH 1/2] Refactor pg_size_pretty and pg_size_bytes to allow for
 supported unit expansion

---
 src/backend/utils/adt/dbsize.c       | 90 ++++++++++++++++------------
 src/include/catalog/pg_proc.dat      |  2 +-
 src/test/regress/expected/dbsize.out |  4 --
 src/test/regress/sql/dbsize.sql      |  2 -
 4 files changed, 53 insertions(+), 45 deletions(-)

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 9c39e7d3b3..df08845932 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -638,7 +638,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
     Numeric        size = PG_GETARG_NUMERIC(0);
     Numeric        limit,
                 limit2;
-    char       *result;
+    char       *result = NULL;
 
     limit = int64_to_numeric(10 * 1024);
     limit2 = int64_to_numeric(10 * 1024 * 2 - 1);
@@ -660,31 +660,32 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
         }
         else
         {
-            /* size >>= 10 */
-            size = numeric_shift_right(size, 10);
-            if (numeric_is_less(numeric_absolute(size), limit2))
-            {
-                size = numeric_half_rounded(size);
-                result = psprintf("%s MB", numeric_to_cstring(size));
-            }
-            else
-            {
+            int idx, max_iter = 2; /* highest index of table_below */
+            char *output_formats[] = {
+                "%s MB",
+                "%s GB",
+                "%s TB"
+            };
+
+            for (idx = 0; idx < max_iter; idx++) {
                 /* size >>= 10 */
                 size = numeric_shift_right(size, 10);
-
                 if (numeric_is_less(numeric_absolute(size), limit2))
                 {
                     size = numeric_half_rounded(size);
-                    result = psprintf("%s GB", numeric_to_cstring(size));
-                }
-                else
-                {
-                    /* size >>= 10 */
-                    size = numeric_shift_right(size, 10);
-                    size = numeric_half_rounded(size);
-                    result = psprintf("%s TB", numeric_to_cstring(size));
+                    result = psprintf(output_formats[idx], numeric_to_cstring(size));
+                    break;
                 }
             }
+
+            if (!result) {
+                /* this uses the last format in the table above for anything else */
+
+                /* size >>= 10 */
+                size = numeric_shift_right(size, 10);
+                size = numeric_half_rounded(size);
+                result = psprintf(output_formats[max_iter], numeric_to_cstring(size));
+            }
         }
     }
 
@@ -703,7 +704,6 @@ pg_size_bytes(PG_FUNCTION_ARGS)
                *endptr;
     char        saved_char;
     Numeric        num;
-    int64        result;
     bool        have_digits = false;
 
     str = text_to_cstring(arg);
@@ -786,7 +786,16 @@ pg_size_bytes(PG_FUNCTION_ARGS)
     /* Handle possible unit */
     if (*strptr != '\0')
     {
-        int64        multiplier = 0;
+        int64        multiplier = 1;
+        int         i;
+        int         unit_count = 5; /* sizeof units table */
+        char       *units[] = {
+            "bytes",
+            "kb",
+            "mb",
+            "gb",
+            "tb",
+        };
 
         /* Trim any trailing whitespace */
         endptr = str + VARSIZE_ANY_EXHDR(arg) - 1;
@@ -797,21 +806,20 @@ pg_size_bytes(PG_FUNCTION_ARGS)
         endptr++;
         *endptr = '\0';
 
-        /* Parse the unit case-insensitively */
-        if (pg_strcasecmp(strptr, "bytes") == 0)
-            multiplier = (int64) 1;
-        else if (pg_strcasecmp(strptr, "kb") == 0)
-            multiplier = (int64) 1024;
-        else if (pg_strcasecmp(strptr, "mb") == 0)
-            multiplier = ((int64) 1024) * 1024;
-
-        else if (pg_strcasecmp(strptr, "gb") == 0)
-            multiplier = ((int64) 1024) * 1024 * 1024;
-
-        else if (pg_strcasecmp(strptr, "tb") == 0)
-            multiplier = ((int64) 1024) * 1024 * 1024 * 1024;
+        for (i = 0; i < unit_count; i++) {
+            printf("strptr: %s units: %s", strptr, units[i]);
+            if (pg_strcasecmp(strptr, units[i]) == 0)
+                break;
+            /* 
+             * Note: int64 isn't large enough to store the full multiplier
+             * going past ~ 9EB, but since this is a fixed value, we can apply
+             * it twice, thus storing use 2 ** 5 = 32 here, but 2 ** 10 = 1024
+             * on actual conversion to numeric.
+             */
+            multiplier *= 32;
+        }
 
-        else
+        if (i == unit_count)
             ereport(ERROR,
                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                      errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
@@ -827,13 +835,19 @@ pg_size_bytes(PG_FUNCTION_ARGS)
             num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
                                                       NumericGetDatum(mul_num),
                                                       NumericGetDatum(num)));
+
+            /* second application to get around int64 limitations in unit multipliers */
+            num = DatumGetNumeric(DirectFunctionCall2(numeric_mul,
+                                                      NumericGetDatum(mul_num),
+                                                      NumericGetDatum(num)));
         }
     }
 
-    result = DatumGetInt64(DirectFunctionCall1(numeric_int8,
-                                               NumericGetDatum(num)));
+    /* now finally truncate, since this is always in integer-like units */
+    num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil,
+                                              NumericGetDatum(num)));
 
-    PG_RETURN_INT64(result);
+    PG_RETURN_NUMERIC(num);
 }
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fde251fa4f..73326e3618 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7187,7 +7187,7 @@
   prosrc => 'pg_size_pretty_numeric' },
 { oid => '3334',
   descr => 'convert a size in human-readable format with size units into bytes',
-  proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text',
+  proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text',
   prosrc => 'pg_size_bytes' },
 { oid => '2997',
   descr => 'disk space usage for the specified table, including TOAST, free space and visibility map',
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index e901a2c92a..624948ccd2 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -114,10 +114,6 @@ SELECT pg_size_bytes('1 AB A    ');
 ERROR:  invalid size: "1 AB A    "
 DETAIL:  Invalid size unit: "AB A".
 HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
-SELECT pg_size_bytes('9223372036854775807.9');
-ERROR:  bigint out of range
-SELECT pg_size_bytes('1e100');
-ERROR:  bigint out of range
 SELECT pg_size_bytes('1e1000000000000000000');
 ERROR:  value overflows numeric format
 SELECT pg_size_bytes('1 byte');  -- the singular "byte" is not supported
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index d10a4d7f68..37023a01c3 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -34,8 +34,6 @@ SELECT size, pg_size_bytes(size) FROM
 SELECT pg_size_bytes('1 AB');
 SELECT pg_size_bytes('1 AB A');
 SELECT pg_size_bytes('1 AB A    ');
-SELECT pg_size_bytes('9223372036854775807.9');
-SELECT pg_size_bytes('1e100');
 SELECT pg_size_bytes('1e1000000000000000000');
 SELECT pg_size_bytes('1 byte');  -- the singular "byte" is not supported
 SELECT pg_size_bytes('');
-- 
2.30.1 (Apple Git-130)

From e3fc63d9be6af5725a0ed24e16513cb711631f4a Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Tue, 29 Jun 2021 10:02:50 -0500
Subject: [PATCH 2/2] Full expansion of all units

---
 doc/src/sgml/func.sgml               |   2 +-
 src/backend/utils/adt/dbsize.c       |  22 +++-
 src/test/regress/expected/dbsize.out | 180 +++++++++++++++++----------
 src/test/regress/sql/dbsize.sql      |  39 ++++--
 4 files changed, 164 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6388385edc..687d2a7ac8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26354,7 +26354,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
        </para>
        <para>
         Converts a size in bytes into a more easily human-readable format with
-        size units (bytes, kB, MB, GB or TB as appropriate).  Note that the
+        size units (bytes, kB, MB, GB, TB, PB, EB, ZB, or YB as appropriate).  Note that the
         units are powers of 2 rather than powers of 10, so 1kB is 1024 bytes,
         1MB is 1024<superscript>2</superscript> = 1048576 bytes, and so on.
        </para></entry>
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index df08845932..5650bb9f2b 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -660,11 +660,15 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
         }
         else
         {
-            int idx, max_iter = 2; /* highest index of table_below */
+            int idx, max_iter = 6; /* highest index of table_below */
             char *output_formats[] = {
                 "%s MB",
                 "%s GB",
-                "%s TB"
+                "%s TB",
+                "%s PB",
+                "%s EB",
+                "%s ZB",
+                "%s YB"
             };
 
             for (idx = 0; idx < max_iter; idx++) {
@@ -755,8 +759,9 @@ pg_size_bytes(PG_FUNCTION_ARGS)
         char       *cp;
 
         /*
-         * Note we might one day support EB units, so if what follows 'E'
-         * isn't a number, just treat it all as a unit to be parsed.
+         * If what follows 'e' isn't a number, we just treat it all as a unit
+         * to be parsed; this allows us to support both exponential notation
+         * and EB units.
          */
         exponent = strtol(endptr + 1, &cp, 10);
         (void) exponent;        /* Silence -Wunused-result warnings */
@@ -788,13 +793,17 @@ pg_size_bytes(PG_FUNCTION_ARGS)
     {
         int64        multiplier = 1;
         int         i;
-        int         unit_count = 5; /* sizeof units table */
+        int         unit_count = 9; /* sizeof units table */
         char       *units[] = {
             "bytes",
             "kb",
             "mb",
             "gb",
             "tb",
+            "pb",
+            "eb",
+            "zb",
+            "yb",
         };
 
         /* Trim any trailing whitespace */
@@ -824,7 +833,8 @@ pg_size_bytes(PG_FUNCTION_ARGS)
                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                      errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
                      errdetail("Invalid size unit: \"%s\".", strptr),
-                     errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", and \"TB\".")));
+                     errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", \"TB\", "
+                         "\"PB\", \"EB\", \"ZB\", and \"YB\".")));
 
         if (multiplier > 1)
         {
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index 624948ccd2..8993522904 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -13,77 +13,116 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
 (6 rows)
 
 SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
-    (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
-            (1000000000::numeric), (1000000000000::numeric),
+    (VALUES (10::numeric),
+            (1000::numeric),
+            (1000000::numeric),
+            (1000000000::numeric),
+            (1000000000000::numeric),
             (1000000000000000::numeric),
-            (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
-            (1000000000.5::numeric), (1000000000000.5::numeric),
-            (1000000000000000.5::numeric)) x(size);
-        size        | pg_size_pretty | pg_size_pretty 
---------------------+----------------+----------------
-                 10 | 10 bytes       | -10 bytes
-               1000 | 1000 bytes     | -1000 bytes
-            1000000 | 977 kB         | -977 kB
-         1000000000 | 954 MB         | -954 MB
-      1000000000000 | 931 GB         | -931 GB
-   1000000000000000 | 909 TB         | -909 TB
-               10.5 | 10.5 bytes     | -10.5 bytes
-             1000.5 | 1000.5 bytes   | -1000.5 bytes
-          1000000.5 | 977 kB         | -977 kB
-       1000000000.5 | 954 MB         | -954 MB
-    1000000000000.5 | 931 GB         | -931 GB
- 1000000000000000.5 | 909 TB         | -909 TB
-(12 rows)
+            (1000000000000000000::numeric),
+            (1000000000000000000000::numeric),
+            (1000000000000000000000000::numeric),
+            (1000000000000000000000000000::numeric),
+            (1000000000000000000000000000000::numeric),
+            (10.5::numeric),
+            (1000.5::numeric),
+            (1000000.5::numeric),
+            (1000000000.5::numeric),
+            (1000000000000.5::numeric),
+            (1000000000000000.5::numeric),
+            (1000000000000000000.5::numeric),
+            (1000000000000000000000.5::numeric),
+            (1000000000000000000000000.5::numeric),
+            (1000000000000000000000000000.5::numeric),
+            (1000000000000000000000000000000.5::numeric)
+            ) x(size);
+               size                | pg_size_pretty | pg_size_pretty 
+-----------------------------------+----------------+----------------
+                                10 | 10 bytes       | -10 bytes
+                              1000 | 1000 bytes     | -1000 bytes
+                           1000000 | 977 kB         | -977 kB
+                        1000000000 | 954 MB         | -954 MB
+                     1000000000000 | 931 GB         | -931 GB
+                  1000000000000000 | 909 TB         | -909 TB
+               1000000000000000000 | 888 PB         | -888 PB
+            1000000000000000000000 | 867 EB         | -867 EB
+         1000000000000000000000000 | 847 ZB         | -847 ZB
+      1000000000000000000000000000 | 827 YB         | -827 YB
+   1000000000000000000000000000000 | 827181 YB      | -827181 YB
+                              10.5 | 10.5 bytes     | -10.5 bytes
+                            1000.5 | 1000.5 bytes   | -1000.5 bytes
+                         1000000.5 | 977 kB         | -977 kB
+                      1000000000.5 | 954 MB         | -954 MB
+                   1000000000000.5 | 931 GB         | -931 GB
+                1000000000000000.5 | 909 TB         | -909 TB
+             1000000000000000000.5 | 888 PB         | -888 PB
+          1000000000000000000000.5 | 867 EB         | -867 EB
+       1000000000000000000000000.5 | 847 ZB         | -847 ZB
+    1000000000000000000000000000.5 | 827 YB         | -827 YB
+ 1000000000000000000000000000000.5 | 827181 YB      | -827181 YB
+(22 rows)
 
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
-            ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
-   size   |  pg_size_bytes   
-----------+------------------
- 1        |                1
- 123bytes |              123
- 1kB      |             1024
- 1MB      |          1048576
-  1 GB    |       1073741824
- 1.5 GB   |       1610612736
- 1TB      |    1099511627776
- 3000 TB  | 3298534883328000
- 1e6 MB   |    1048576000000
-(9 rows)
+            ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+            ('1.17 YB')) x(size);
+   size   |       pg_size_bytes       
+----------+---------------------------
+ 1        |                         1
+ 123bytes |                       123
+ 1kB      |                      1024
+ 1MB      |                   1048576
+  1 GB    |                1073741824
+ 1.5 GB   |                1610612736
+ 1TB      |             1099511627776
+ 3000 TB  |          3298534883328000
+ 1e6 MB   |             1048576000000
+ 99 PB    |        111464090777419776
+ 45 EB    |      51881467707308113920
+ 5.1ZB    |    6021017265658797647463
+ 1.17 YB  | 1414443208949116134406226
+(13 rows)
 
 -- case-insensitive units are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
-            ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
-   size   |  pg_size_bytes   
-----------+------------------
- 1        |                1
- 123bYteS |              123
- 1kb      |             1024
- 1mb      |          1048576
-  1 Gb    |       1073741824
- 1.5 gB   |       1610612736
- 1tb      |    1099511627776
- 3000 tb  | 3298534883328000
- 1e6 mb   |    1048576000000
-(9 rows)
+            ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+            ('1.17 yb')) x(size);
+   size   |       pg_size_bytes       
+----------+---------------------------
+ 1        |                         1
+ 123bYteS |                       123
+ 1kb      |                      1024
+ 1mb      |                   1048576
+  1 Gb    |                1073741824
+ 1.5 gB   |                1610612736
+ 1tb      |             1099511627776
+ 3000 tb  |          3298534883328000
+ 1e6 mb   |             1048576000000
+ 99 pb    |        111464090777419776
+ 45 eB    |      51881467707308113920
+ 5.1Zb    |    6021017265658797647463
+ 1.17 yb  | 1414443208949116134406226
+(13 rows)
 
 -- negative numbers are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
-            ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
-   size    |   pg_size_bytes   
------------+-------------------
- -1        |                -1
- -123bytes |              -123
- -1kb      |             -1024
- -1mb      |          -1048576
-  -1 Gb    |       -1073741824
- -1.5 gB   |       -1610612736
- -1tb      |    -1099511627776
- -3000 TB  | -3298534883328000
- -10e-1 MB |          -1048576
-(9 rows)
+            ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
+   size    |        pg_size_bytes        
+-----------+-----------------------------
+ -1        |                          -1
+ -123bytes |                        -123
+ -1kb      |                       -1024
+ -1mb      |                    -1048576
+  -1 Gb    |                 -1073741824
+ -1.5 gB   |                 -1610612736
+ -1tb      |              -1099511627776
+ -3000 TB  |           -3298534883328000
+ -10e-1 MB |                    -1048576
+ -19e-4eb  |           -2190550858753009
+ -18YB     | -21760664753063325144711168
+(11 rows)
 
 -- different cases with allowed points
 SELECT size, pg_size_bytes(size) FROM
@@ -101,25 +140,38 @@ SELECT size, pg_size_bytes(size) FROM
  -.0 gb |             0
 (8 rows)
 
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+    pg_size_bytes    
+---------------------
+ 9223372036854775808
+(1 row)
+
+SELECT pg_size_bytes('1e100');
+                                             pg_size_bytes                                             
+-------------------------------------------------------------------------------------------------------
+ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+(1 row)
+
 -- invalid inputs
 SELECT pg_size_bytes('1 AB');
 ERROR:  invalid size: "1 AB"
 DETAIL:  Invalid size unit: "AB".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('1 AB A');
 ERROR:  invalid size: "1 AB A"
 DETAIL:  Invalid size unit: "AB A".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('1 AB A    ');
 ERROR:  invalid size: "1 AB A    "
 DETAIL:  Invalid size unit: "AB A".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('1e1000000000000000000');
 ERROR:  value overflows numeric format
 SELECT pg_size_bytes('1 byte');  -- the singular "byte" is not supported
 ERROR:  invalid size: "1 byte"
 DETAIL:  Invalid size unit: "byte".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('');
 ERROR:  invalid size: ""
 SELECT pg_size_bytes('kb');
@@ -137,6 +189,6 @@ ERROR:  invalid size: ".+912"
 SELECT pg_size_bytes('+912+ kB');
 ERROR:  invalid size: "+912+ kB"
 DETAIL:  Invalid size unit: "+ kB".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('++123 kB');
 ERROR:  invalid size: "++123 kB"
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index 37023a01c3..2a4abcfc65 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -4,32 +4,55 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
             (1000000000000000::bigint)) x(size);
 
 SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
-    (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
-            (1000000000::numeric), (1000000000000::numeric),
+    (VALUES (10::numeric),
+            (1000::numeric),
+            (1000000::numeric),
+            (1000000000::numeric),
+            (1000000000000::numeric),
             (1000000000000000::numeric),
-            (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
-            (1000000000.5::numeric), (1000000000000.5::numeric),
-            (1000000000000000.5::numeric)) x(size);
+            (1000000000000000000::numeric),
+            (1000000000000000000000::numeric),
+            (1000000000000000000000000::numeric),
+            (1000000000000000000000000000::numeric),
+            (1000000000000000000000000000000::numeric),
+            (10.5::numeric),
+            (1000.5::numeric),
+            (1000000.5::numeric),
+            (1000000000.5::numeric),
+            (1000000000000.5::numeric),
+            (1000000000000000.5::numeric),
+            (1000000000000000000.5::numeric),
+            (1000000000000000000000.5::numeric),
+            (1000000000000000000000000.5::numeric),
+            (1000000000000000000000000000.5::numeric),
+            (1000000000000000000000000000000.5::numeric)
+            ) x(size);
 
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
-            ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
+            ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+            ('1.17 YB')) x(size);
 
 -- case-insensitive units are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
-            ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
+            ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+            ('1.17 yb')) x(size);
 
 -- negative numbers are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
-            ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+            ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
 
 -- different cases with allowed points
 SELECT size, pg_size_bytes(size) FROM
      (VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. gb'),
              ('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size);
 
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+SELECT pg_size_bytes('1e100');
+
 -- invalid inputs
 SELECT pg_size_bytes('1 AB');
 SELECT pg_size_bytes('1 AB A');
-- 
2.30.1 (Apple Git-130)


Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Wed, 30 Jun 2021 at 05:11, David Christensen
<david.christensen@crunchydata.com> wrote:
> 1) A basic refactor of the existing code to easily handle expanding the
> units we use into a table-based format.  This also includes changing the
> return value of `pg_size_bytes()` from an int64 into a numeric, and
> minor test adjustments to reflect this.

This is not quite what I had imagined when you said about adding a
table to make it easier to add new units in the future. I expected a
single table that handles all units, not just the ones above kB and
not one for each function.

There are actually two pg_size_pretty functions, one for BIGINT and
one for NUMERIC. I see you only changed the NUMERIC version.  I'd
expect them both to have the same treatment and use the same table so
there's consistency between the two functions.

The attached is more like what I had in mind. There's a very small net
reduction in lines of code with this and it also helps keep
pg_size_pretty() and pg_size_pretty_numeric() in sync.

I don't really like the fact that I had to add the doHalfRound field
to get the same rounding behaviour as the original functions.  I'm
wondering if it would just be too clever just to track how many bits
we've shifted right by in pg_size_pretty* and compare that to the
value of multishift for the current unit and do appropriate rounding
to get us to the value of multishift.  In theory, we could just keep
calling the half_rounded macro until we make it to the multishift
value.  My current thoughts are that it's unlikely that anyone would
twiddle with the size_pretty_units array in such a way for the extra
code to be worth it. Maybe someone else feels differently.

David

Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Mon, 5 Jul 2021 at 20:00, David Rowley <dgrowleyml@gmail.com> wrote:
> I don't really like the fact that I had to add the doHalfRound field
> to get the same rounding behaviour as the original functions.  I'm
> wondering if it would just be too clever just to track how many bits
> we've shifted right by in pg_size_pretty* and compare that to the
> value of multishift for the current unit and do appropriate rounding
> to get us to the value of multishift.  In theory, we could just keep
> calling the half_rounded macro until we make it to the multishift
> value.  My current thoughts are that it's unlikely that anyone would
> twiddle with the size_pretty_units array in such a way for the extra
> code to be worth it. Maybe someone else feels differently.

I made another pass over this and ended up removing the doHalfRound
field in favour of just doing rounding based on the previous
bitshifts.

I did a few other tidy ups and I think it's a useful patch as it
reduces the amount of code a bit and makes it dead simple to add new
units in the future.  Most importantly it'll help keep pg_size_pretty,
pg_size_pretty_numeric and pg_size_bytes all in sync in regards to
what units they support.

Does anyone want to have a look over this?  If not, I plan to push it
in the next day or so.

(I'm not sure why pgindent removed the space between != and NULL, but
it did, so I left it.)

David

Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Dean Rasheed
Date:
On Tue, 6 Jul 2021 at 10:20, David Rowley <dgrowleyml@gmail.com> wrote:
>
> I made another pass over this and ended up removing the doHalfRound
> field in favour of just doing rounding based on the previous
> bitshifts.
>

When I first read this:

+            /* half-round until we get down to unitBits */
+            while (rightshifts++ < unit->unitBits)
+                size = half_rounded(size);

it looked to me like it would be invoking half_rounded() multiple
times, which raised alarm bells because that would risk rounding the
wrong way. Eventually I realised that by the time it reaches that,
rightshifts will always equal unit->unitBits or unit->unitBits - 1, so
it'll never do more than one half-round, which is important.

So perhaps using doHalfRound would be clearer, but it could just be a
local variable tracking whether or not it's the first time through the
loop.

Regards,
Dean



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Tue, 6 Jul 2021 at 23:39, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> When I first read this:
>
> +            /* half-round until we get down to unitBits */
> +            while (rightshifts++ < unit->unitBits)
> +                size = half_rounded(size);
>
> it looked to me like it would be invoking half_rounded() multiple
> times, which raised alarm bells because that would risk rounding the
> wrong way. Eventually I realised that by the time it reaches that,
> rightshifts will always equal unit->unitBits or unit->unitBits - 1, so
> it'll never do more than one half-round, which is important.

It's true that based on how the units table is set up now, it'll only
ever do it once for all but the first loop.

I wrote the attached .c file just to try to see if it ever goes wrong
and I didn't manage to find any inputs where it did.  I always seem to
get the half rounded value either the same as the shifted value or 1
higher towards positive infinity

$ ./half_rounded -102 3
1. half_round(-102) == -51 :: -102 >> 1 = -51
2. half_round(-51) == -25 :: -51 >> 1 = -26
3. half_round(-25) == -12 :: -26 >> 1 = -13

$ ./half_rounded 6432 3
1. half_round(6432) == 3216 :: 6432 >> 1 = 3216
2. half_round(3216) == 1608 :: 3216 >> 1 = 1608
3. half_round(1608) == 804 :: 1608 >> 1 = 804

Can you give an example where calling half_rounded too many times will
give the wrong value? Keeping in mind we call half_rounded the number
of times that the passed in value would need to be left-shifted by to
get the equivalent truncated value.

David

Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Dean Rasheed
Date:
On Tue, 6 Jul 2021 at 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Can you give an example where calling half_rounded too many times will
> give the wrong value? Keeping in mind we call half_rounded the number
> of times that the passed in value would need to be left-shifted by to
> get the equivalent truncated value.
>

./half_rounded 10241 10
1. half_round(10241) == 5121 :: 10241 >> 1 = 5120
2. half_round(5121) == 2561 :: 5120 >> 1 = 2560
3. half_round(2561) == 1281 :: 2560 >> 1 = 1280
4. half_round(1281) == 641 :: 1280 >> 1 = 640
5. half_round(641) == 321 :: 640 >> 1 = 320
6. half_round(321) == 161 :: 320 >> 1 = 160
7. half_round(161) == 81 :: 160 >> 1 = 80
8. half_round(81) == 41 :: 80 >> 1 = 40
9. half_round(41) == 21 :: 40 >> 1 = 20
10. half_round(21) == 11 :: 20 >> 1 = 10

The correct result should be 10 (it would be very odd to claim that
10241 bytes should be displayed as 11kb), but the half-rounding keeps
rounding up at each stage.

That's a general property of rounding -- you need to be very careful
when rounding more than once, since otherwise errors will propagate.
C.f. 4083f445c0, which removed a double-round in numeric sqrt().

To be clear, I'm not saying that the current code half-rounds more
than once, just that it reads as if it does.

Regards,
Dean



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
David Rowley writes:

> On Mon, 5 Jul 2021 at 20:00, David Rowley <dgrowleyml@gmail.com> wrote:
>> I don't really like the fact that I had to add the doHalfRound field
>> to get the same rounding behaviour as the original functions.  I'm
>> wondering if it would just be too clever just to track how many bits
>> we've shifted right by in pg_size_pretty* and compare that to the
>> value of multishift for the current unit and do appropriate rounding
>> to get us to the value of multishift.  In theory, we could just keep
>> calling the half_rounded macro until we make it to the multishift
>> value.  My current thoughts are that it's unlikely that anyone would
>> twiddle with the size_pretty_units array in such a way for the extra
>> code to be worth it. Maybe someone else feels differently.
>
> I made another pass over this and ended up removing the doHalfRound
> field in favour of just doing rounding based on the previous
> bitshifts.
>
> I did a few other tidy ups and I think it's a useful patch as it
> reduces the amount of code a bit and makes it dead simple to add new
> units in the future.  Most importantly it'll help keep pg_size_pretty,
> pg_size_pretty_numeric and pg_size_bytes all in sync in regards to
> what units they support.
>
> Does anyone want to have a look over this?  If not, I plan to push it
> in the next day or so.
>
> (I'm not sure why pgindent removed the space between != and NULL, but
> it did, so I left it.)
>
> David

I like the approach you took here; much cleaner to have one table for all of the individual
codepaths.  Testing worked as expected; if we do decide to expand the units table there will be a
few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch
to `numeric`).

Thanks,

David



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> Does anyone want to have a look over this?  If not, I plan to push it
> in the next day or so.

Minor nit: use "const char *text" in the struct declaration, so
that all of the static data can be placed in fixed storage.

> (I'm not sure why pgindent removed the space between != and NULL, but
> it did, so I left it.)

It did that because "text" is a typedef name, so it's a bit confused
about whether the statement is really a declaration.  Personally I'd
have used "name" or something like that for that field, anyway.

            regards, tom lane



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Wed, 7 Jul 2021 at 00:51, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> 10. half_round(21) == 11 :: 20 >> 1 = 10
>
> The correct result should be 10 (it would be very odd to claim that
> 10241 bytes should be displayed as 11kb), but the half-rounding keeps
> rounding up at each stage.
>
> That's a general property of rounding -- you need to be very careful
> when rounding more than once, since otherwise errors will propagate.
> C.f. 4083f445c0, which removed a double-round in numeric sqrt().

Thanks. I've adjusted the patch to re-add the round bool flag and get
rid of the rightShift field.  I'm now calculating how many bits to
shift right by based on the difference between the unitbits of the
current and next unit then taking 1 bit less if the next unit does
half rounding and the current one does not, or adding an extra bit on
in the opposite case.

I'll post another patch shortly.

David



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Wed, 7 Jul 2021 at 02:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Minor nit: use "const char *text" in the struct declaration, so
> that all of the static data can be placed in fixed storage.

Thanks for pointing that out.

> David Rowley <dgrowleyml@gmail.com> writes:
> > (I'm not sure why pgindent removed the space between != and NULL, but
> > it did, so I left it.)
>
> It did that because "text" is a typedef name, so it's a bit confused
> about whether the statement is really a declaration.  Personally I'd
> have used "name" or something like that for that field, anyway.

I should have thought of that. Thanks for highlighting it.  I've
renamed the field.

Updated patch attached.

David

Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Wed, 7 Jul 2021 at 02:46, David Christensen
<david.christensen@crunchydata.com> wrote:
> if we do decide to expand the units table there will be a
> few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch
> to `numeric`).

I wonder if it's worth changing pg_size_bytes() to return NUMERIC
regardless of if we add any additional units or not.

Would you like to create 2 patches, one to change the return type and
another to add the new units, both based on top of the v2 patch I sent
earlier?

David



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Dean Rasheed
Date:
On Wed, 7 Jul 2021 at 03:47, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Updated patch attached.
>

Hmm, this looked easy, but...

It occurred to me that there ought to be regression tests for the edge
cases where it steps from one unit to the next. So, in the style of
the existing regression tests, I tried the following:

SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
    (VALUES (10239::bigint), (10240::bigint),
            (10485247::bigint), (10485248::bigint),
            (10736893951::bigint), (10736893952::bigint),
            (10994579406847::bigint), (10994579406848::bigint),
            (11258449312612351::bigint), (11258449312612352::bigint)) x(size);

       size        | pg_size_pretty | pg_size_pretty
-------------------+----------------+----------------
             10239 | 10239 bytes    | -10239 bytes
             10240 | 10 kB          | -10 kB
          10485247 | 10239 kB       | -10 MB
          10485248 | 10 MB          | -10 MB
       10736893951 | 10239 MB       | -10 GB
       10736893952 | 10 GB          | -10 GB
    10994579406847 | 10239 GB       | -10 TB
    10994579406848 | 10 TB          | -10 TB
 11258449312612351 | 10239 TB       | -10239 TB
 11258449312612352 | 10240 TB       | -10239 TB
(10 rows)

SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
    (VALUES (10239::numeric), (10240::numeric),
            (10485247::numeric), (10485248::numeric),
            (10736893951::numeric), (10736893952::numeric),
            (10994579406847::numeric), (10994579406848::numeric),
            (11258449312612351::numeric), (11258449312612352::numeric)) x(size);

       size        | pg_size_pretty | pg_size_pretty
-------------------+----------------+----------------
             10239 | 10239 bytes    | -10239 bytes
             10240 | 10 kB          | -10 kB
          10485247 | 10239 kB       | -10239 kB
          10485248 | 10 MB          | -10 MB
       10736893951 | 10239 MB       | -10239 MB
       10736893952 | 10 GB          | -10 GB
    10994579406847 | 10239 GB       | -10239 GB
    10994579406848 | 10 TB          | -10 TB
 11258449312612351 | 10239 TB       | -10239 TB
 11258449312612352 | 10240 TB       | -10240 TB
(10 rows)

Under the assumption that what we're trying to achieve here is
schoolbook rounding (ties away from zero), the numeric results are
correct and the bigint results are wrong.

The reason is that bit shifting isn't the same as division for
negative numbers, since bit shifting rounds towards negative infinity
whereas division rounds towards zero (truncates), which is what I
think we really need.

Regards,
Dean



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
David Rowley writes:

> On Wed, 7 Jul 2021 at 02:46, David Christensen
> <david.christensen@crunchydata.com> wrote:
>> if we do decide to expand the units table there will be a
>> few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch
>> to `numeric`).
>
> I wonder if it's worth changing pg_size_bytes() to return NUMERIC
> regardless of if we add any additional units or not.
>
> Would you like to create 2 patches, one to change the return type and
> another to add the new units, both based on top of the v2 patch I sent
> earlier?
>
> David

Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
add PB and EB.

If we decide to expand further, the current implementation will need to change, as
ZB and YB have 70 and 80 bits needing to be shifted accordingly, so int64 isn't enough to hold
it. (I fixed this particular issue in the original version of this patch, so there is at least a
blueprint of how to fix.)

I figured that PB and EB are probably good enough additions at this point, so we can debate whether
to add the others.

Best,

David

From 57bd2eafff5404313426a10f63b0b7098314998a Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 7 Jul 2021 11:46:09 -0500
Subject: [PATCH] Make pg_size_bytes() return numeric instead of bigint

---
 src/backend/utils/adt/dbsize.c       |  7 ++++---
 src/include/catalog/pg_proc.dat      |  2 +-
 src/test/regress/expected/dbsize.out | 17 +++++++++++++----
 src/test/regress/sql/dbsize.sql      |  6 ++++--
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4b7331d85c..a5a3ebe34e 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -813,10 +813,11 @@ pg_size_bytes(PG_FUNCTION_ARGS)
         }
     }
 
-    result = DatumGetInt64(DirectFunctionCall1(numeric_int8,
-                                               NumericGetDatum(num)));
+    /* now finally truncate, since this is always in integer-like units */
+    num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil,
+                                              NumericGetDatum(num)));
 
-    PG_RETURN_INT64(result);
+    PG_RETURN_NUMERIC(num);
 }
 
 /*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fde251fa4f..73326e3618 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7187,7 +7187,7 @@
   prosrc => 'pg_size_pretty_numeric' },
 { oid => '3334',
   descr => 'convert a size in human-readable format with size units into bytes',
-  proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text',
+  proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text',
   prosrc => 'pg_size_bytes' },
 { oid => '2997',
   descr => 'disk space usage for the specified table, including TOAST, free space and visibility map',
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index e901a2c92a..2950e017d0 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -101,6 +101,19 @@ SELECT size, pg_size_bytes(size) FROM
  -.0 gb |             0
 (8 rows)
 
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+    pg_size_bytes    
+---------------------
+ 9223372036854775808
+(1 row)
+
+SELECT pg_size_bytes('1e100');
+                                             pg_size_bytes                                             
+-------------------------------------------------------------------------------------------------------
+ 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
+(1 row)
+
 -- invalid inputs
 SELECT pg_size_bytes('1 AB');
 ERROR:  invalid size: "1 AB"
@@ -114,10 +127,6 @@ SELECT pg_size_bytes('1 AB A    ');
 ERROR:  invalid size: "1 AB A    "
 DETAIL:  Invalid size unit: "AB A".
 HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
-SELECT pg_size_bytes('9223372036854775807.9');
-ERROR:  bigint out of range
-SELECT pg_size_bytes('1e100');
-ERROR:  bigint out of range
 SELECT pg_size_bytes('1e1000000000000000000');
 ERROR:  value overflows numeric format
 SELECT pg_size_bytes('1 byte');  -- the singular "byte" is not supported
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index d10a4d7f68..e88184e9c9 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -30,12 +30,14 @@ SELECT size, pg_size_bytes(size) FROM
      (VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. gb'),
              ('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size);
 
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+SELECT pg_size_bytes('1e100');
+
 -- invalid inputs
 SELECT pg_size_bytes('1 AB');
 SELECT pg_size_bytes('1 AB A');
 SELECT pg_size_bytes('1 AB A    ');
-SELECT pg_size_bytes('9223372036854775807.9');
-SELECT pg_size_bytes('1e100');
 SELECT pg_size_bytes('1e1000000000000000000');
 SELECT pg_size_bytes('1 byte');  -- the singular "byte" is not supported
 SELECT pg_size_bytes('');
-- 
2.30.1 (Apple Git-130)

From 8d6972abf01c01d2c8713d94bbb46ae44682195e Mon Sep 17 00:00:00 2001
From: David Christensen <david.christensen@crunchydata.com>
Date: Wed, 7 Jul 2021 12:37:41 -0500
Subject: [PATCH] Expand units for pg_size_*() up through EB

---
 src/backend/utils/adt/dbsize.c       |   5 +-
 src/test/regress/expected/dbsize.out | 138 +++++++++++++++++----------
 src/test/regress/sql/dbsize.sql      |  33 +++++--
 3 files changed, 114 insertions(+), 62 deletions(-)

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index a5a3ebe34e..cd68ff10c8 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -54,6 +54,8 @@ static const struct size_pretty_unit size_pretty_units[] = {
     {"MB", 20 * 1024 - 1, 10, 20},
     {"GB", 20 * 1024 - 1, 10, 30},
     {"TB", 20 * 1024 - 1, 10, 40},
+    {"PB", 20 * 1024 - 1, 10, 50},
+    {"EB", 20 * 1024 - 1, 10, 60},
     {NULL, 0, 0, 0}
 };
 
@@ -799,7 +801,8 @@ pg_size_bytes(PG_FUNCTION_ARGS)
                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                      errmsg("invalid size: \"%s\"", text_to_cstring(arg)),
                      errdetail("Invalid size unit: \"%s\".", strptr),
-                     errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", and \"TB\".")));
+                     errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", \"TB\", "
+                         "\"PB\", and \"EB\".")));
 
         if (multiplier > 1)
         {
diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out
index 2950e017d0..cb260f69a3 100644
--- a/src/test/regress/expected/dbsize.out
+++ b/src/test/regress/expected/dbsize.out
@@ -13,65 +13,96 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
 (6 rows)
 
 SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
-    (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
-            (1000000000::numeric), (1000000000000::numeric),
+    (VALUES (10::numeric),
+            (1000::numeric),
+            (1000000::numeric),
+            (1000000000::numeric),
+            (1000000000000::numeric),
             (1000000000000000::numeric),
-            (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
-            (1000000000.5::numeric), (1000000000000.5::numeric),
-            (1000000000000000.5::numeric)) x(size);
-        size        | pg_size_pretty | pg_size_pretty 
---------------------+----------------+----------------
-                 10 | 10 bytes       | -10 bytes
-               1000 | 1000 bytes     | -1000 bytes
-            1000000 | 977 kB         | -977 kB
-         1000000000 | 954 MB         | -954 MB
-      1000000000000 | 931 GB         | -931 GB
-   1000000000000000 | 909 TB         | -909 TB
-               10.5 | 10.5 bytes     | -10.5 bytes
-             1000.5 | 1000.5 bytes   | -1000.5 bytes
-          1000000.5 | 977 kB         | -977 kB
-       1000000000.5 | 954 MB         | -954 MB
-    1000000000000.5 | 931 GB         | -931 GB
- 1000000000000000.5 | 909 TB         | -909 TB
-(12 rows)
+            (1000000000000000000::numeric),
+            (1000000000000000000000::numeric),
+            (1000000000000000000000000::numeric),
+            (1000000000000000000000000000::numeric),
+            (1000000000000000000000000000000::numeric),
+            (10.5::numeric),
+            (1000.5::numeric),
+            (1000000.5::numeric),
+            (1000000000.5::numeric),
+            (1000000000000.5::numeric),
+            (1000000000000000.5::numeric),
+            (1000000000000000000.5::numeric),
+            (1000000000000000000000.5::numeric),
+            (1000000000000000000000000.5::numeric),
+            (1000000000000000000000000000.5::numeric),
+            (1000000000000000000000000000000.5::numeric)
+            ) x(size);
+               size                | pg_size_pretty  |  pg_size_pretty  
+-----------------------------------+-----------------+------------------
+                                10 | 10 bytes        | -10 bytes
+                              1000 | 1000 bytes      | -1000 bytes
+                           1000000 | 977 kB          | -977 kB
+                        1000000000 | 954 MB          | -954 MB
+                     1000000000000 | 931 GB          | -931 GB
+                  1000000000000000 | 909 TB          | -909 TB
+               1000000000000000000 | 888 PB          | -888 PB
+            1000000000000000000000 | 867 EB          | -867 EB
+         1000000000000000000000000 | 867362 EB       | -867362 EB
+      1000000000000000000000000000 | 867361738 EB    | -867361738 EB
+   1000000000000000000000000000000 | 867361737988 EB | -867361737988 EB
+                              10.5 | 10.5 bytes      | -10.5 bytes
+                            1000.5 | 1000.5 bytes    | -1000.5 bytes
+                         1000000.5 | 977 kB          | -977 kB
+                      1000000000.5 | 954 MB          | -954 MB
+                   1000000000000.5 | 931 GB          | -931 GB
+                1000000000000000.5 | 909 TB          | -909 TB
+             1000000000000000000.5 | 888 PB          | -888 PB
+          1000000000000000000000.5 | 867 EB          | -867 EB
+       1000000000000000000000000.5 | 867362 EB       | -867362 EB
+    1000000000000000000000000000.5 | 867361738 EB    | -867361738 EB
+ 1000000000000000000000000000000.5 | 867361737988 EB | -867361737988 EB
+(22 rows)
 
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
-            ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
-   size   |  pg_size_bytes   
-----------+------------------
- 1        |                1
- 123bytes |              123
- 1kB      |             1024
- 1MB      |          1048576
-  1 GB    |       1073741824
- 1.5 GB   |       1610612736
- 1TB      |    1099511627776
- 3000 TB  | 3298534883328000
- 1e6 MB   |    1048576000000
-(9 rows)
+            ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB')) x(size);
+   size   |    pg_size_bytes     
+----------+----------------------
+ 1        |                    1
+ 123bytes |                  123
+ 1kB      |                 1024
+ 1MB      |              1048576
+  1 GB    |           1073741824
+ 1.5 GB   |           1610612736
+ 1TB      |        1099511627776
+ 3000 TB  |     3298534883328000
+ 1e6 MB   |        1048576000000
+ 99 PB    |   111464090777419776
+ 45 EB    | 51881467707308113920
+(11 rows)
 
 -- case-insensitive units are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
-            ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
-   size   |  pg_size_bytes   
-----------+------------------
- 1        |                1
- 123bYteS |              123
- 1kb      |             1024
- 1mb      |          1048576
-  1 Gb    |       1073741824
- 1.5 gB   |       1610612736
- 1tb      |    1099511627776
- 3000 tb  | 3298534883328000
- 1e6 mb   |    1048576000000
-(9 rows)
+            ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB')) x(size);
+   size   |    pg_size_bytes     
+----------+----------------------
+ 1        |                    1
+ 123bYteS |                  123
+ 1kb      |                 1024
+ 1mb      |              1048576
+  1 Gb    |           1073741824
+ 1.5 gB   |           1610612736
+ 1tb      |        1099511627776
+ 3000 tb  |     3298534883328000
+ 1e6 mb   |        1048576000000
+ 99 pb    |   111464090777419776
+ 45 eB    | 51881467707308113920
+(11 rows)
 
 -- negative numbers are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
-            ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+            ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb')) x(size);
    size    |   pg_size_bytes   
 -----------+-------------------
  -1        |                -1
@@ -83,7 +114,8 @@ SELECT size, pg_size_bytes(size) FROM
  -1tb      |    -1099511627776
  -3000 TB  | -3298534883328000
  -10e-1 MB |          -1048576
-(9 rows)
+ -19e-4eb  | -2190550858753009
+(10 rows)
 
 -- different cases with allowed points
 SELECT size, pg_size_bytes(size) FROM
@@ -118,21 +150,21 @@ SELECT pg_size_bytes('1e100');
 SELECT pg_size_bytes('1 AB');
 ERROR:  invalid size: "1 AB"
 DETAIL:  Invalid size unit: "AB".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
 SELECT pg_size_bytes('1 AB A');
 ERROR:  invalid size: "1 AB A"
 DETAIL:  Invalid size unit: "AB A".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
 SELECT pg_size_bytes('1 AB A    ');
 ERROR:  invalid size: "1 AB A    "
 DETAIL:  Invalid size unit: "AB A".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
 SELECT pg_size_bytes('1e1000000000000000000');
 ERROR:  value overflows numeric format
 SELECT pg_size_bytes('1 byte');  -- the singular "byte" is not supported
 ERROR:  invalid size: "1 byte"
 DETAIL:  Invalid size unit: "byte".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
 SELECT pg_size_bytes('');
 ERROR:  invalid size: ""
 SELECT pg_size_bytes('kb');
@@ -150,6 +182,6 @@ ERROR:  invalid size: ".+912"
 SELECT pg_size_bytes('+912+ kB');
 ERROR:  invalid size: "+912+ kB"
 DETAIL:  Invalid size unit: "+ kB".
-HINT:  Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT:  Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB".
 SELECT pg_size_bytes('++123 kB');
 ERROR:  invalid size: "++123 kB"
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index e88184e9c9..cfa993f63f 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -4,26 +4,43 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
             (1000000000000000::bigint)) x(size);
 
 SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
-    (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
-            (1000000000::numeric), (1000000000000::numeric),
+    (VALUES (10::numeric),
+            (1000::numeric),
+            (1000000::numeric),
+            (1000000000::numeric),
+            (1000000000000::numeric),
             (1000000000000000::numeric),
-            (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
-            (1000000000.5::numeric), (1000000000000.5::numeric),
-            (1000000000000000.5::numeric)) x(size);
+            (1000000000000000000::numeric),
+            (1000000000000000000000::numeric),
+            (1000000000000000000000000::numeric),
+            (1000000000000000000000000000::numeric),
+            (1000000000000000000000000000000::numeric),
+            (10.5::numeric),
+            (1000.5::numeric),
+            (1000000.5::numeric),
+            (1000000000.5::numeric),
+            (1000000000000.5::numeric),
+            (1000000000000000.5::numeric),
+            (1000000000000000000.5::numeric),
+            (1000000000000000000000.5::numeric),
+            (1000000000000000000000000.5::numeric),
+            (1000000000000000000000000000.5::numeric),
+            (1000000000000000000000000000000.5::numeric)
+            ) x(size);
 
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
-            ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
+            ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB')) x(size);
 
 -- case-insensitive units are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
-            ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
+            ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB')) x(size);
 
 -- negative numbers are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
-            ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+            ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb')) x(size);
 
 -- different cases with allowed points
 SELECT size, pg_size_bytes(size) FROM
-- 
2.30.1 (Apple Git-130)


Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Tom Lane
Date:
David Christensen <david.christensen@crunchydata.com> writes:
> Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
> add PB and EB.

Can we really get away with changing the return type?  That would
by no stretch of the imagination be free; one could expect breakage
of a few user views, for example.

Independently of that, I'm pretty much -1 on going further than PB.
Even if the additional abbreviations you mention are actually recognized
standards, I think not that many people are familiar with them, and such
input is way more likely to be a typo than intended data.

            regards, tom lane



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Christensen
Date:
Tom Lane writes:

> David Christensen <david.christensen@crunchydata.com> writes:
>> Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
>> add PB and EB.
>
> Can we really get away with changing the return type?  That would
> by no stretch of the imagination be free; one could expect breakage
> of a few user views, for example.

Hmm, that's a good point, and we can't really make the return type polymorphic (being as there isn't
a source type of the given return value).

> Independently of that, I'm pretty much -1 on going further than PB.
> Even if the additional abbreviations you mention are actually recognized
> standards, I think not that many people are familiar with them, and such
> input is way more likely to be a typo than intended data.

If we do go ahead and restrict the expansion to just PB, the return value of pg_size_bytes()  would
still support up to 8192 PB before running into range limitations.  I assume it's not worth creating
a pg_size_bytes_numeric() with the full range of supported units, but that is presumably an option
as well.

Best,

David



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Thu, 8 Jul 2021 at 01:32, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Hmm, this looked easy, but...
>
> It occurred to me that there ought to be regression tests for the edge
> cases where it steps from one unit to the next. So, in the style of
> the existing regression tests, I tried the following:
>
> SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
>     (VALUES (10239::bigint), (10240::bigint),
>             (10485247::bigint), (10485248::bigint),
>             (10736893951::bigint), (10736893952::bigint),
>             (10994579406847::bigint), (10994579406848::bigint),
>             (11258449312612351::bigint), (11258449312612352::bigint)) x(size);


>  11258449312612352 | 10240 TB       | -10239 TB

Hmm, yeah, I noticed that pg_size_pretty(bigint) vs
pg_size_pretty(numeric) didn't always match when I was testing this
patch, but I was more focused on having my results matching the
unpatched version than I was with making sure bigint and numeric
matched.

I imagine this must date back to 8a1fab36ab.  Do you feel like it's
something this patch should fix?  I was mostly hoping to keep this
patch about rewriting the code to both make it easier to add new units
and also to make it easier to keep all 3 functions in sync.

It feels like if we're going to fix this negative rounding thing then
we should maybe do it and backpatch a fix then rebase this work on top
of that.

What are your thoughts?

David



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
> It feels like if we're going to fix this negative rounding thing then
> we should maybe do it and backpatch a fix then rebase this work on top
> of that.

Here's a patch which I believe makes pg_size_pretty() and
pg_size_pretty_numeric() match in regards to negative values.

Maybe this plus your regression test would be ok to back-patch?

David

Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Dean Rasheed
Date:
On Thu, 8 Jul 2021 at 05:30, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
> > It feels like if we're going to fix this negative rounding thing then
> > we should maybe do it and backpatch a fix then rebase this work on top
> > of that.

Yes, that was my thinking too.

> Here's a patch which I believe makes pg_size_pretty() and
> pg_size_pretty_numeric() match in regards to negative values.

LGTM, except I think it's worth also making the numeric code not refer
to bit shifting either.

> Maybe this plus your regression test would be ok to back-patch?

+1

Here's an update with matching updates to the numeric code, plus the
regression tests.

Regards,
Dean

Attachment

Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Thu, 8 Jul 2021 at 07:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Christensen <david.christensen@crunchydata.com> writes:
> > Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
> > add PB and EB.
>
> Can we really get away with changing the return type?  That would
> by no stretch of the imagination be free; one could expect breakage
> of a few user views, for example.

That's a good point. We should probably leave it alone then.  I had
had it in mind that it might be ok since we did this for extract() in
14. At least we have date_part() as a backup there. I'm fine to leave
the return value of pg_size_bytes as-is.

> Independently of that, I'm pretty much -1 on going further than PB.
> Even if the additional abbreviations you mention are actually recognized
> standards, I think not that many people are familiar with them, and such
> input is way more likely to be a typo than intended data.

I'm fine with that too.  In [1] I mentioned my concerns with adding
all the defined units up to Yottabyte. David reduced that down to just
exabytes, but I think if we're keeping pg_size_bytes returning bigint
then drawing the line at PB seems ok to me. Anything more than
pg_size_bytes('8 EB') would overflow.

David

[1] https://www.postgresql.org/message-id/CAApHDvp9ym+RSQNGoSRPjH+j6TJ1tFBhfT+JoLFf_RbZq1EszQ@mail.gmail.com



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
 tOn Thu, 8 Jul 2021 at 20:23, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> > On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
> > Here's a patch which I believe makes pg_size_pretty() and
> > pg_size_pretty_numeric() match in regards to negative values.
>
> LGTM, except I think it's worth also making the numeric code not refer
> to bit shifting either.
>
> > Maybe this plus your regression test would be ok to back-patch?
>
> +1
>
> Here's an update with matching updates to the numeric code, plus the
> regression tests.

Looks good.

I gave it a bit of exercise by running pgbench and calling this procedure:

CREATE OR REPLACE PROCEDURE public.test_size_pretty2()
 LANGUAGE plpgsql
AS $procedure$
declare b bigint;
begin
        FOR i IN 1..1000 LOOP
                b := 0 - (random() * 9223372036854775807)::bigint;
                if pg_size_pretty(b) <> pg_size_pretty(b::numeric) then
                        raise notice '%. % != %', b,
pg_size_pretty(b), pg_size_pretty(b::numeric);
                end if;
        END LOOP;
END;
$procedure$

It ran 8526956 times, so with the loop that's 8.5 billion random
numbers. No variations between the two functions.  I got the same
after removing the 0 - to test positive numbers.

If you like, I can push this in my morning, or if you'd rather do it
yourself, please go ahead.

David



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
Dean Rasheed
Date:
On Thu, 8 Jul 2021 at 14:38, David Rowley <dgrowleyml@gmail.com> wrote:
>
> I gave it a bit of exercise by running pgbench and calling this procedure:
>
> It ran 8526956 times, so with the loop that's 8.5 billion random
> numbers. No variations between the two functions.  I got the same
> after removing the 0 - to test positive numbers.

Wow, that's a lot of testing! I just tried a few hand-picked edge cases.

> If you like, I can push this in my morning, or if you'd rather do it
> yourself, please go ahead.

No, I didn't get as much time as I thought I would today, so please go ahead.

Regards,
Dean



Re: [PATCH] expand the units that pg_size_pretty supports on output

From
David Rowley
Date:
On Thu, 8 Jul 2021 at 05:44, David Christensen
<david.christensen@crunchydata.com> wrote:
> Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
> add PB and EB.

I ended up not changing the return type of pg_size_bytes().

> I figured that PB and EB are probably good enough additions at this point, so we can debate whether
> to add the others.

Per Tom's concern both with changing the return type of
pg_size_bytes() and his and my concern about going too far adding more
units, I've adjusted your patch to only add petabytes and pushed it.
The maximum range of BIGINT is only 8 exabytes, so the BIGINT version
would never show in exabytes anyway. It would still be measuring in
petabytes at the 64-bit range limit.

After a bit of searching, I found reports that the estimated entire
stored digital data on Earth as of 2020 to be 59 zettabytes, or about
0.06 yottabytes.  I feel like we've gone far enough by adding
petabytes today. Maybe that's worth revisiting in a couple of storage
generations. After we're done there, we can start working on the LSN
wraparound code.

David