Thread: [PATCH] expand the units that pg_size_pretty supports on output
Hi folks,
Enclosed is a patch that expands the unit output for pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing numeric output code to account for the larger number of units we're using rather than just adding nesting levels.
There are also a few other places that could benefit from expanded units, but this is a standalone starting point.
Best,
David
Attachment
On Wed, Apr 14, 2021 at 11:13:47AM -0500, David Christensen wrote:
> Enclosed is a patch that expands the unit output for
> pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing
> numeric output code to account for the larger number of units we're using
> rather than just adding nesting levels.
>
> There are also a few other places that could benefit from expanded units,
> but this is a standalone starting point.

Please don't forget to add this patch to the next commit fest of July if you want to get some reviews: https://commitfest.postgresql.org/33/

Note that the development of Postgres 14 is over, and that there was a feature freeze last week, but this can be considered for 15.
--
Michael
Attachment
A second patch to teach the same units to pg_size_bytes().
Best,
David
On Wed, Apr 14, 2021 at 11:13 AM David Christensen <david.christensen@crunchydata.com> wrote:
Hi folks,

Enclosed is a patch that expands the unit output for pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing numeric output code to account for the larger number of units we're using rather than just adding nesting levels.

There are also a few other places that could benefit from expanded units, but this is a standalone starting point.

Best,
David
Attachment
The following review has been posted through the commitfest application:

make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Hi David,

I was reviewing this patch and the compilation failed with the following error on CentOS 7.

dbsize.c: In function ‘pg_size_bytes’:
dbsize.c:808:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   const int unit_count = 9; /* sizeof units table */
   ^
dbsize.c:809:3: error: variable length array ‘units’ is used [-Werror=vla]
   const char *units[unit_count] = {
   ^

I believe "unit_count" ought to be a #define here.

Regards,
Asif

The new status of this patch is: Waiting on Author
New versions attached to address the initial CF feedback and rebase on HEAD as of now.
0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch
- expands the units that pg_size_pretty() can handle up to YB.
0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch
- expands the units that pg_size_bytes() can handle, up to YB.
Attachment
>From: David Christensen <david.christensen@crunchydata.com>
>Sent: Friday, June 4, 2021 4:18 AM
>To: PostgreSQL-development <pgsql-hackers@postgresql.org>
>Subject: Re: [PATCH] expand the units that pg_size_pretty supports on output
>
>New versions attached to address the initial CF feedback and rebase on HEAD as of now.
>
>0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch
>
>- expands the units that pg_size_pretty() can handle up to YB.
>
>0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch
>
>- expands the units that pg_size_bytes() can handle, up to YB.

I don't see the need to extend the unit to YB. What use case do you have in mind?

Regards,
Shinya Kato
> I don't see the need to extend the unit to YB.
> What use case do you have in mind?

Practical or no, I saw no reason not to support all defined units. I assume we’ll get to a need sooner or later. :)

David
>> I don't see the need to extend the unit to YB.
>> What use case do you have in mind?
>
>Practical or no, I saw no reason not to support all defined units. I assume we’ll
>get to a need sooner or later. :)

Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?

Best regards,
Shinya Kato
On Tue, 15 Jun 2021 at 21:24, <Shinya11.Kato@nttdata.com> wrote:
> Hmmm, I didn't think YB was necessary, but what do others think?

For me personally, without consulting Wikipedia, I know that Petabyte comes after Terabyte, and then I'm pretty sure it's Exabyte. After that, I'd need to check.

Assuming I'm not the only person who can't tell exactly how many bytes are in a Yottabyte, would it actually be a readability improvement if we started showing these units to people?

I'd say there might be some argument to implement as far as PB one day, maybe not that far out into the future, especially if we got something like built-in clustering. But I just don't think there's any need to go all out and take it all the way to YB. There's an above-zero chance we'll break something of someone's by doing this, so I think any changes here should be driven off an actual requirement.

I really think this change is more likely to upset someone than please someone.

Just my thoughts.

David
On Tue, 15 Jun 2021 at 05:24, <Shinya11.Kato@nttdata.com> wrote:
>> I don't see the need to extend the unit to YB.
>> What use case do you have in mind?
>
>Practical or no, I saw no reason not to support all defined units. I assume we’ll
>get to a need sooner or later. :)
Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?
If I’m reading the code correctly, the difference between supporting YB and not supporting it is whether there is a line for it in the list of prefixes and their multiples. As such, I don’t see why we’re even discussing whether or not to include all the standard prefixes. A YB is still an absurd amount of storage, but that’s not the point; just put all the standard prefixes and be done with it. If actual code changes were required in the new code as they are in the old it might be worth discussing.
One question: why is there no “k” in the list of prefixes?
Also: why not have only the prefixes in the array, and use a single fixed output format "%s %sB" all the time?
It feels like it should be possible to calculate the appropriate idx to use (while adjusting the number to print as is done now) and then just have one psprintf call for all cases.
A more significant question is YB vs. YiB. I know there is a long tradition within computer-related fields of saying that k = 1024, M = 1024^2, etc., but we’re not special enough to override the more general principles of SI (Système International) which provide that k = 1000, M = 1000^2 and so on universally and provide the alternate prefixes ki, Mi, etc. which use 1024 as the multiple.
So I would suggest either display 2000000 as 2MB or as 1.907MiB.
On Tue, Jun 15, 2021 at 8:31 AM Isaac Morland <isaac.morland@gmail.com> wrote:
On Tue, 15 Jun 2021 at 05:24, <Shinya11.Kato@nttdata.com> wrote:
>> I don't see the need to extend the unit to YB.
>> What use case do you have in mind?
>
>Practical or no, I saw no reason not to support all defined units. I assume we’ll
>get to a need sooner or later. :)
Thank you for your reply!
Hmmm, I didn't think YB was necessary, but what do others think?

If I’m reading the code correctly, the difference between supporting YB and not supporting it is whether there is a line for it in the list of prefixes and their multiples. As such, I don’t see why we’re even discussing whether or not to include all the standard prefixes. A YB is still an absurd amount of storage, but that’s not the point; just put all the standard prefixes and be done with it. If actual code changes were required in the new code as they are in the old it might be worth discussing.
Agreed, this is why I went this way. One and done.
One question: why is there no “k” in the list of prefixes?
kB has a special-case code block before you get to this point. I didn't look into the reasons, but assume there are some.
Also: why not have only the prefixes in the array, and use a single fixed output format "%s %sB" all the time?

It feels like it should be possible to calculate the appropriate idx to use (while adjusting the number to print as is done now) and then just have one psprintf call for all cases.
Sure, if that seems more readable/understandable.
A more significant question is YB vs. YiB. I know there is a long tradition within computer-related fields of saying that k = 1024, M = 1024^2, etc., but we’re not special enough to override the more general principles of SI (Système International) which provide that k = 1000, M = 1000^2 and so on universally and provide the alternate prefixes ki, Mi, etc. which use 1024 as the multiple.

So I would suggest either display 2000000 as 2MB or as 1.907MiB.
Heh, I was just expanding the existing logic; if others want to have this particular battle, go ahead and I'll adjust the code/prefixes, but obviously the logic will need to change if we want to support true MB instead of MiB-as-MB.

Also, this will presumably be a breaking change for anyone using the existing units, MB == 1024 * 1024, as we've had for something like 20 years. Changing these units to the *iB forms will be trivial with this patch, but I'm not looking forward to garnering the consensus to change this part.
David
On Tue, Jun 15, 2021 at 8:26 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 15 Jun 2021 at 21:24, <Shinya11.Kato@nttdata.com> wrote:
> Hmmm, I didn't think YB was necessary, but what do others think?
For me personally, without consulting Wikipedia, I know that Petabyte
comes after Terabyte and then I'm pretty sure it's Exabyte. After
that, I'd need to check.
Assuming I'm not the only person who can't tell exactly how many bytes
are in a Yottabyte, would it actually be a readability improvement if
we started showing these units to people?
I hadn't really thought about that TBH; to me it seemed like an improvement, but I do see that others might not, and adding confusion is definitely not helpful. That said, it seems like having the code structured in a way that we can expand via adding an element to a table instead of the existing way it's written with nested if blocks is still a useful refactor, whatever we decide the cutoff units should be.
I'd say there might be some argument to implement as far as PB one
day, maybe not that far out into the future, especially if we got
something like built-in clustering. But I just don't think there's any
need to go all out and take it all the way to YB. There's an above
zero chance we'll break something of someones by doing this, so I
think any changes here should be driven off an actual requirement.
I got motivated to do this due to some (granted, synthetic) workloads, where I was seeing 6+ digit TB numbers and thought it was ugly. Looked at the code and thought the refactor was the way to go, and just stuck all of the known units in.
I really think this change is more likely to upset someone than please someone.
I'd be interested to see reactions from people; to me, it seems a +1, but seems like -1, 0, +1 all valid opinions here; I'd expect more 0's and +1s, but I'm probably biased since I wrote this. :-)
On Wed, 16 Jun 2021 at 02:58, David Christensen <david.christensen@crunchydata.com> wrote:
> That said, it seems like having the code structured in a way that we can expand via adding an element to a table instead of the existing way it's written with nested if blocks is still a useful refactor, whatever we decide the cutoff units should be.

I had not really looked at the patch, but if there's a cleanup portion to the same patch as you're adding the YB too, then maybe it's worth separating those out into another patch so that the two can be considered independently.

David
>I had not really looked at the patch, but if there's a cleanup portion to the same
>patch as you're adding the YB too, then maybe it's worth separating those out
>into another patch so that the two can be considered independently.

I agree with this opinion. It seems to me that we should think about units and refactoring separately. Sorry for the confusion.

Best regards,
Shinya Kato
>> I had not really looked at the patch, but if there's a cleanup portion to the same
>> patch as you're adding the YB too, then maybe it's worth separating those out
>> into another patch so that the two can be considered independently.
>
> I agree with this opinion. It seems to me that we should think about units and refactoring separately.
> Sorry for the confusion.

Sure thing, I think that makes sense. Refactor with existing units and debate the number of additional units to include. I do think Petabytes and Exabytes are at least within the realm of ones we should include; less tied to ZB and YB; just included for completeness.
Shinya11.Kato@nttdata.com writes:
>>I had not really looked at the patch, but if there's a cleanup portion to the same
>>patch as you're adding the YB too, then maybe it's worth separating those out
>>into another patch so that the two can be considered independently.
>
> I agree with this opinion. It seems to me that we should think about units and refactoring separately.
> Sorry for the confusion.
>
> Best regards,
> Shinya Kato

Hi folks,

Had some time to rework this patch from the two that had previously been here into two separate parts:

1) A basic refactor of the existing code to easily handle expanding the units we use into a table-based format. This also includes changing the return value of `pg_size_bytes()` from an int64 into a numeric, and minor test adjustments to reflect this.

2) Expanding the units that both pg_size_bytes() and pg_size_pretty() recognize up through Yottabytes. This includes documentation and test updates to reflect the changes made here.

How many additional units we add here is up for discussion (inevitably), but my opinion remains that there is no harm in supporting all units available.
Best, David From ac30b06e3ddcb57eebb380560c2f4a47430dfd74 Mon Sep 17 00:00:00 2001 From: David Christensen <david.christensen@crunchydata.com> Date: Tue, 29 Jun 2021 10:20:05 -0500 Subject: [PATCH 1/2] Refactor pg_size_pretty and pg_size_bytes to allow for supported unit expansion --- src/backend/utils/adt/dbsize.c | 90 ++++++++++++++++------------ src/include/catalog/pg_proc.dat | 2 +- src/test/regress/expected/dbsize.out | 4 -- src/test/regress/sql/dbsize.sql | 2 - 4 files changed, 53 insertions(+), 45 deletions(-) diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index 9c39e7d3b3..df08845932 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -638,7 +638,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) Numeric size = PG_GETARG_NUMERIC(0); Numeric limit, limit2; - char *result; + char *result = NULL; limit = int64_to_numeric(10 * 1024); limit2 = int64_to_numeric(10 * 1024 * 2 - 1); @@ -660,31 +660,32 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) } else { - /* size >>= 10 */ - size = numeric_shift_right(size, 10); - if (numeric_is_less(numeric_absolute(size), limit2)) - { - size = numeric_half_rounded(size); - result = psprintf("%s MB", numeric_to_cstring(size)); - } - else - { + int idx, max_iter = 2; /* highest index of table_below */ + char *output_formats[] = { + "%s MB", + "%s GB", + "%s TB" + }; + + for (idx = 0; idx < max_iter; idx++) { /* size >>= 10 */ size = numeric_shift_right(size, 10); - if (numeric_is_less(numeric_absolute(size), limit2)) { size = numeric_half_rounded(size); - result = psprintf("%s GB", numeric_to_cstring(size)); - } - else - { - /* size >>= 10 */ - size = numeric_shift_right(size, 10); - size = numeric_half_rounded(size); - result = psprintf("%s TB", numeric_to_cstring(size)); + result = psprintf(output_formats[idx], numeric_to_cstring(size)); + break; } } + + if (!result) { + /* this uses the last format in the table above for anything else */ + + /* size >>= 10 */ + size = 
numeric_shift_right(size, 10); + size = numeric_half_rounded(size); + result = psprintf(output_formats[max_iter], numeric_to_cstring(size)); + } } } @@ -703,7 +704,6 @@ pg_size_bytes(PG_FUNCTION_ARGS) *endptr; char saved_char; Numeric num; - int64 result; bool have_digits = false; str = text_to_cstring(arg); @@ -786,7 +786,16 @@ pg_size_bytes(PG_FUNCTION_ARGS) /* Handle possible unit */ if (*strptr != '\0') { - int64 multiplier = 0; + int64 multiplier = 1; + int i; + int unit_count = 5; /* sizeof units table */ + char *units[] = { + "bytes", + "kb", + "mb", + "gb", + "tb", + }; /* Trim any trailing whitespace */ endptr = str + VARSIZE_ANY_EXHDR(arg) - 1; @@ -797,21 +806,20 @@ pg_size_bytes(PG_FUNCTION_ARGS) endptr++; *endptr = '\0'; - /* Parse the unit case-insensitively */ - if (pg_strcasecmp(strptr, "bytes") == 0) - multiplier = (int64) 1; - else if (pg_strcasecmp(strptr, "kb") == 0) - multiplier = (int64) 1024; - else if (pg_strcasecmp(strptr, "mb") == 0) - multiplier = ((int64) 1024) * 1024; - - else if (pg_strcasecmp(strptr, "gb") == 0) - multiplier = ((int64) 1024) * 1024 * 1024; - - else if (pg_strcasecmp(strptr, "tb") == 0) - multiplier = ((int64) 1024) * 1024 * 1024 * 1024; + for (i = 0; i < unit_count; i++) { + printf("strptr: %s units: %s", strptr, units[i]); + if (pg_strcasecmp(strptr, units[i]) == 0) + break; + /* + * Note: int64 isn't large enough to store the full multiplier + * going past ~ 9EB, but since this is a fixed value, we can apply + * it twice, thus storing use 2 ** 5 = 32 here, but 2 ** 10 = 1024 + * on actual conversion to numeric. 
+ */ + multiplier *= 32; + } - else + if (i == unit_count) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid size: \"%s\"", text_to_cstring(arg)), @@ -827,13 +835,19 @@ pg_size_bytes(PG_FUNCTION_ARGS) num = DatumGetNumeric(DirectFunctionCall2(numeric_mul, NumericGetDatum(mul_num), NumericGetDatum(num))); + + /* second application to get around int64 limitations in unit multipliers */ + num = DatumGetNumeric(DirectFunctionCall2(numeric_mul, + NumericGetDatum(mul_num), + NumericGetDatum(num))); } } - result = DatumGetInt64(DirectFunctionCall1(numeric_int8, - NumericGetDatum(num))); + /* now finally truncate, since this is always in integer-like units */ + num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil, + NumericGetDatum(num))); - PG_RETURN_INT64(result); + PG_RETURN_NUMERIC(num); } /* diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fde251fa4f..73326e3618 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7187,7 +7187,7 @@ prosrc => 'pg_size_pretty_numeric' }, { oid => '3334', descr => 'convert a size in human-readable format with size units into bytes', - proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text', + proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text', prosrc => 'pg_size_bytes' }, { oid => '2997', descr => 'disk space usage for the specified table, including TOAST, free space and visibility map', diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out index e901a2c92a..624948ccd2 100644 --- a/src/test/regress/expected/dbsize.out +++ b/src/test/regress/expected/dbsize.out @@ -114,10 +114,6 @@ SELECT pg_size_bytes('1 AB A '); ERROR: invalid size: "1 AB A " DETAIL: Invalid size unit: "AB A". HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB". 
-SELECT pg_size_bytes('9223372036854775807.9'); -ERROR: bigint out of range -SELECT pg_size_bytes('1e100'); -ERROR: bigint out of range SELECT pg_size_bytes('1e1000000000000000000'); ERROR: value overflows numeric format SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql index d10a4d7f68..37023a01c3 100644 --- a/src/test/regress/sql/dbsize.sql +++ b/src/test/regress/sql/dbsize.sql @@ -34,8 +34,6 @@ SELECT size, pg_size_bytes(size) FROM SELECT pg_size_bytes('1 AB'); SELECT pg_size_bytes('1 AB A'); SELECT pg_size_bytes('1 AB A '); -SELECT pg_size_bytes('9223372036854775807.9'); -SELECT pg_size_bytes('1e100'); SELECT pg_size_bytes('1e1000000000000000000'); SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported SELECT pg_size_bytes(''); -- 2.30.1 (Apple Git-130) From e3fc63d9be6af5725a0ed24e16513cb711631f4a Mon Sep 17 00:00:00 2001 From: David Christensen <david.christensen@crunchydata.com> Date: Tue, 29 Jun 2021 10:02:50 -0500 Subject: [PATCH 2/2] Full expansion of all units --- doc/src/sgml/func.sgml | 2 +- src/backend/utils/adt/dbsize.c | 22 +++- src/test/regress/expected/dbsize.out | 180 +++++++++++++++++---------- src/test/regress/sql/dbsize.sql | 39 ++++-- 4 files changed, 164 insertions(+), 79 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 6388385edc..687d2a7ac8 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -26354,7 +26354,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); </para> <para> Converts a size in bytes into a more easily human-readable format with - size units (bytes, kB, MB, GB or TB as appropriate). Note that the + size units (bytes, kB, MB, GB, TB, PB, EB, ZB, or YB as appropriate). Note that the units are powers of 2 rather than powers of 10, so 1kB is 1024 bytes, 1MB is 1024<superscript>2</superscript> = 1048576 bytes, and so on. 
</para></entry> diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index df08845932..5650bb9f2b 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -660,11 +660,15 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) } else { - int idx, max_iter = 2; /* highest index of table_below */ + int idx, max_iter = 6; /* highest index of table_below */ char *output_formats[] = { "%s MB", "%s GB", - "%s TB" + "%s TB", + "%s PB", + "%s EB", + "%s ZB", + "%s YB" }; for (idx = 0; idx < max_iter; idx++) { @@ -755,8 +759,9 @@ pg_size_bytes(PG_FUNCTION_ARGS) char *cp; /* - * Note we might one day support EB units, so if what follows 'E' - * isn't a number, just treat it all as a unit to be parsed. + * If what follows 'e' isn't a number, we just treat it all as a unit + * to be parsed; this allows us to support both exponential notation + * and EB units. */ exponent = strtol(endptr + 1, &cp, 10); (void) exponent; /* Silence -Wunused-result warnings */ @@ -788,13 +793,17 @@ pg_size_bytes(PG_FUNCTION_ARGS) { int64 multiplier = 1; int i; - int unit_count = 5; /* sizeof units table */ + int unit_count = 9; /* sizeof units table */ char *units[] = { "bytes", "kb", "mb", "gb", "tb", + "pb", + "eb", + "zb", + "yb", }; /* Trim any trailing whitespace */ @@ -824,7 +833,8 @@ pg_size_bytes(PG_FUNCTION_ARGS) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid size: \"%s\"", text_to_cstring(arg)), errdetail("Invalid size unit: \"%s\".", strptr), - errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", and \"TB\"."))); + errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", \"TB\", " + "\"PB\", \"EB\", \"ZB\", and \"YB\"."))); if (multiplier > 1) { diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out index 624948ccd2..8993522904 100644 --- a/src/test/regress/expected/dbsize.out +++ b/src/test/regress/expected/dbsize.out @@ -13,77 +13,116 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * 
size) FROM (6 rows) SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM - (VALUES (10::numeric), (1000::numeric), (1000000::numeric), - (1000000000::numeric), (1000000000000::numeric), + (VALUES (10::numeric), + (1000::numeric), + (1000000::numeric), + (1000000000::numeric), + (1000000000000::numeric), (1000000000000000::numeric), - (10.5::numeric), (1000.5::numeric), (1000000.5::numeric), - (1000000000.5::numeric), (1000000000000.5::numeric), - (1000000000000000.5::numeric)) x(size); - size | pg_size_pretty | pg_size_pretty ---------------------+----------------+---------------- - 10 | 10 bytes | -10 bytes - 1000 | 1000 bytes | -1000 bytes - 1000000 | 977 kB | -977 kB - 1000000000 | 954 MB | -954 MB - 1000000000000 | 931 GB | -931 GB - 1000000000000000 | 909 TB | -909 TB - 10.5 | 10.5 bytes | -10.5 bytes - 1000.5 | 1000.5 bytes | -1000.5 bytes - 1000000.5 | 977 kB | -977 kB - 1000000000.5 | 954 MB | -954 MB - 1000000000000.5 | 931 GB | -931 GB - 1000000000000000.5 | 909 TB | -909 TB -(12 rows) + (1000000000000000000::numeric), + (1000000000000000000000::numeric), + (1000000000000000000000000::numeric), + (1000000000000000000000000000::numeric), + (1000000000000000000000000000000::numeric), + (10.5::numeric), + (1000.5::numeric), + (1000000.5::numeric), + (1000000000.5::numeric), + (1000000000000.5::numeric), + (1000000000000000.5::numeric), + (1000000000000000000.5::numeric), + (1000000000000000000000.5::numeric), + (1000000000000000000000000.5::numeric), + (1000000000000000000000000000.5::numeric), + (1000000000000000000000000000000.5::numeric) + ) x(size); + size | pg_size_pretty | pg_size_pretty +-----------------------------------+----------------+---------------- + 10 | 10 bytes | -10 bytes + 1000 | 1000 bytes | -1000 bytes + 1000000 | 977 kB | -977 kB + 1000000000 | 954 MB | -954 MB + 1000000000000 | 931 GB | -931 GB + 1000000000000000 | 909 TB | -909 TB + 1000000000000000000 | 888 PB | -888 PB + 1000000000000000000000 | 867 EB | -867 EB + 
1000000000000000000000000 | 847 ZB | -847 ZB + 1000000000000000000000000000 | 827 YB | -827 YB + 1000000000000000000000000000000 | 827181 YB | -827181 YB + 10.5 | 10.5 bytes | -10.5 bytes + 1000.5 | 1000.5 bytes | -1000.5 bytes + 1000000.5 | 977 kB | -977 kB + 1000000000.5 | 954 MB | -954 MB + 1000000000000.5 | 931 GB | -931 GB + 1000000000000000.5 | 909 TB | -909 TB + 1000000000000000000.5 | 888 PB | -888 PB + 1000000000000000000000.5 | 867 EB | -867 EB + 1000000000000000000000000.5 | 847 ZB | -847 ZB + 1000000000000000000000000000.5 | 827 YB | -827 YB + 1000000000000000000000000000000.5 | 827181 YB | -827181 YB +(22 rows) SELECT size, pg_size_bytes(size) FROM (VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '), - ('1TB'), ('3000 TB'), ('1e6 MB')) x(size); - size | pg_size_bytes -----------+------------------ - 1 | 1 - 123bytes | 123 - 1kB | 1024 - 1MB | 1048576 - 1 GB | 1073741824 - 1.5 GB | 1610612736 - 1TB | 1099511627776 - 3000 TB | 3298534883328000 - 1e6 MB | 1048576000000 -(9 rows) + ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'), + ('1.17 YB')) x(size); + size | pg_size_bytes +----------+--------------------------- + 1 | 1 + 123bytes | 123 + 1kB | 1024 + 1MB | 1048576 + 1 GB | 1073741824 + 1.5 GB | 1610612736 + 1TB | 1099511627776 + 3000 TB | 3298534883328000 + 1e6 MB | 1048576000000 + 99 PB | 111464090777419776 + 45 EB | 51881467707308113920 + 5.1ZB | 6021017265658797647463 + 1.17 YB | 1414443208949116134406226 +(13 rows) -- case-insensitive units are supported SELECT size, pg_size_bytes(size) FROM (VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '), - ('1tb'), ('3000 tb'), ('1e6 mb')) x(size); - size | pg_size_bytes -----------+------------------ - 1 | 1 - 123bYteS | 123 - 1kb | 1024 - 1mb | 1048576 - 1 Gb | 1073741824 - 1.5 gB | 1610612736 - 1tb | 1099511627776 - 3000 tb | 3298534883328000 - 1e6 mb | 1048576000000 -(9 rows) + ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'), + ('1.17 
yb')) x(size); + size | pg_size_bytes +----------+--------------------------- + 1 | 1 + 123bYteS | 123 + 1kb | 1024 + 1mb | 1048576 + 1 Gb | 1073741824 + 1.5 gB | 1610612736 + 1tb | 1099511627776 + 3000 tb | 3298534883328000 + 1e6 mb | 1048576000000 + 99 pb | 111464090777419776 + 45 eB | 51881467707308113920 + 5.1Zb | 6021017265658797647463 + 1.17 yb | 1414443208949116134406226 +(13 rows) -- negative numbers are supported SELECT size, pg_size_bytes(size) FROM (VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '), - ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size); - size | pg_size_bytes ------------+------------------- - -1 | -1 - -123bytes | -123 - -1kb | -1024 - -1mb | -1048576 - -1 Gb | -1073741824 - -1.5 gB | -1610612736 - -1tb | -1099511627776 - -3000 TB | -3298534883328000 - -10e-1 MB | -1048576 -(9 rows) + ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size); + size | pg_size_bytes +-----------+----------------------------- + -1 | -1 + -123bytes | -123 + -1kb | -1024 + -1mb | -1048576 + -1 Gb | -1073741824 + -1.5 gB | -1610612736 + -1tb | -1099511627776 + -3000 TB | -3298534883328000 + -10e-1 MB | -1048576 + -19e-4eb | -2190550858753009 + -18YB | -21760664753063325144711168 +(11 rows) -- different cases with allowed points SELECT size, pg_size_bytes(size) FROM @@ -101,25 +140,38 @@ SELECT size, pg_size_bytes(size) FROM -.0 gb | 0 (8 rows) +-- valid inputs outside bigint range (previous errors) +SELECT pg_size_bytes('9223372036854775807.9'); + pg_size_bytes +--------------------- + 9223372036854775808 +(1 row) + +SELECT pg_size_bytes('1e100'); + pg_size_bytes +------------------------------------------------------------------------------------------------------- + 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 +(1 row) + -- invalid inputs SELECT pg_size_bytes('1 AB'); ERROR: invalid size: "1 AB" DETAIL: Invalid size unit: "AB". 
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('1 AB A');
 ERROR: invalid size: "1 AB A"
 DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('1 AB A ');
 ERROR: invalid size: "1 AB A "
 DETAIL: Invalid size unit: "AB A".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('1e1000000000000000000');
 ERROR: value overflows numeric format
 SELECT pg_size_bytes('1 byte');  -- the singular "byte" is not supported
 ERROR: invalid size: "1 byte"
 DETAIL: Invalid size unit: "byte".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('');
 ERROR: invalid size: ""
 SELECT pg_size_bytes('kb');
@@ -137,6 +189,6 @@ ERROR: invalid size: ".+912"
 SELECT pg_size_bytes('+912+ kB');
 ERROR: invalid size: "+912+ kB"
 DETAIL: Invalid size unit: "+ kB".
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB".
+HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", and "YB".
 SELECT pg_size_bytes('++123 kB');
 ERROR: invalid size: "++123 kB"
diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql
index 37023a01c3..2a4abcfc65 100644
--- a/src/test/regress/sql/dbsize.sql
+++ b/src/test/regress/sql/dbsize.sql
@@ -4,32 +4,55 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
             (1000000000000000::bigint)) x(size);
 
 SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
-    (VALUES (10::numeric), (1000::numeric), (1000000::numeric),
-            (1000000000::numeric), (1000000000000::numeric),
+    (VALUES (10::numeric),
+            (1000::numeric),
+            (1000000::numeric),
+            (1000000000::numeric),
+            (1000000000000::numeric),
             (1000000000000000::numeric),
-            (10.5::numeric), (1000.5::numeric), (1000000.5::numeric),
-            (1000000000.5::numeric), (1000000000000.5::numeric),
-            (1000000000000000.5::numeric)) x(size);
+            (1000000000000000000::numeric),
+            (1000000000000000000000::numeric),
+            (1000000000000000000000000::numeric),
+            (1000000000000000000000000000::numeric),
+            (1000000000000000000000000000000::numeric),
+            (10.5::numeric),
+            (1000.5::numeric),
+            (1000000.5::numeric),
+            (1000000000.5::numeric),
+            (1000000000000.5::numeric),
+            (1000000000000000.5::numeric),
+            (1000000000000000000.5::numeric),
+            (1000000000000000000000.5::numeric),
+            (1000000000000000000000000.5::numeric),
+            (1000000000000000000000000000.5::numeric),
+            (1000000000000000000000000000000.5::numeric)
+    ) x(size);
 
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '),
-            ('1TB'), ('3000 TB'), ('1e6 MB')) x(size);
+            ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB'), ('5.1ZB'),
+            ('1.17 YB')) x(size);
 
 -- case-insensitive units are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '),
-            ('1tb'), ('3000 tb'), ('1e6 mb')) x(size);
+            ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB'), ('5.1Zb'),
+            ('1.17 yb')) x(size);
 
 -- negative numbers are supported
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '),
-            ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size);
+            ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb'), ('-18YB')) x(size);
 
 -- different cases with allowed points
 SELECT size, pg_size_bytes(size) FROM
     (VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. gb'),
             ('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size);
 
+-- valid inputs outside bigint range (previous errors)
+SELECT pg_size_bytes('9223372036854775807.9');
+SELECT pg_size_bytes('1e100');
+
 -- invalid inputs
 SELECT pg_size_bytes('1 AB');
 SELECT pg_size_bytes('1 AB A');
-- 
2.30.1 (Apple Git-130)
On Wed, 30 Jun 2021 at 05:11, David Christensen
<david.christensen@crunchydata.com> wrote:
> 1) A basic refactor of the existing code to easily handle expanding the
> units we use into a table-based format. This also includes changing the
> return value of `pg_size_bytes()` from an int64 into a numeric, and
> minor test adjustments to reflect this.

This is not quite what I had imagined when you talked about adding a table to make it easier to add new units in the future. I expected a single table that handles all units, not just the ones above kB, and not one for each function.

There are actually two pg_size_pretty functions, one for BIGINT and one for NUMERIC. I see you only changed the NUMERIC version. I'd expect them both to have the same treatment and use the same table so there's consistency between the two functions.

The attached is more like what I had in mind. There's a very small net reduction in lines of code with this, and it also helps keep pg_size_pretty() and pg_size_pretty_numeric() in sync.

I don't really like the fact that I had to add the doHalfRound field to get the same rounding behaviour as the original functions. I'm wondering if it would just be too clever just to track how many bits we've shifted right by in pg_size_pretty* and compare that to the value of multishift for the current unit and do appropriate rounding to get us to the value of multishift. In theory, we could just keep calling the half_rounded macro until we make it to the multishift value. My current thoughts are that it's unlikely that anyone would twiddle with the size_pretty_units array in such a way for the extra code to be worth it. Maybe someone else feels differently.

David
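For readers following the thread, the table-driven shape being discussed might look roughly like this. This is a hypothetical sketch — the field names, limits, and exact layout here are illustrative, not the committed struct:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One row per supported unit.  Each row gives the unit's name, the largest
 * value that should still be printed in that unit, whether to half-round
 * when converting into it, and the unit's power-of-two byte exponent.
 */
struct size_pretty_unit
{
	const char *name;		/* unit name, e.g. "kB" */
	uint32_t	limit;		/* upper bound for printing in this unit */
	bool		round;		/* half-round when converting to this unit */
	uint8_t		unitbits;	/* log2 of the unit's byte multiplier */
};

/* A NULL name terminates the table; adding a unit is just one new row. */
static const struct size_pretty_unit size_pretty_units[] = {
	{"bytes", 10 * 1024, false, 0},
	{"kB", 20 * 1024 - 1, true, 10},
	{"MB", 20 * 1024 - 1, true, 20},
	{"GB", 20 * 1024 - 1, true, 30},
	{"TB", 20 * 1024 - 1, true, 40},
	{NULL, 0, false, 0}
};
```

With a single table like this, pg_size_pretty(), pg_size_pretty_numeric() and pg_size_bytes() can all iterate the same rows, which is the consistency argument made above.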
On Mon, 5 Jul 2021 at 20:00, David Rowley <dgrowleyml@gmail.com> wrote:
> I don't really like the fact that I had to add the doHalfRound field
> to get the same rounding behaviour as the original functions. I'm
> wondering if it would just be too clever just to track how many bits
> we've shifted right by in pg_size_pretty* and compare that to the
> value of multishift for the current unit and do appropriate rounding
> to get us to the value of multishift. In theory, we could just keep
> calling the half_rounded macro until we make it to the multishift
> value. My current thoughts are that it's unlikely that anyone would
> twiddle with the size_pretty_units array in such a way for the extra
> code to be worth it. Maybe someone else feels differently.

I made another pass over this and ended up removing the doHalfRound field in favour of just doing rounding based on the previous bitshifts.

I did a few other tidy ups and I think it's a useful patch as it reduces the amount of code a bit and makes it dead simple to add new units in the future. Most importantly it'll help keep pg_size_pretty, pg_size_pretty_numeric and pg_size_bytes all in sync in regards to what units they support.

Does anyone want to have a look over this? If not, I plan to push it in the next day or so.

(I'm not sure why pgindent removed the space between != and NULL, but it did, so I left it.)

David
On Tue, 6 Jul 2021 at 10:20, David Rowley <dgrowleyml@gmail.com> wrote:
>
> I made another pass over this and ended up removing the doHalfRound
> field in favour of just doing rounding based on the previous
> bitshifts.
>

When I first read this:

+	/* half-round until we get down to unitBits */
+	while (rightshifts++ < unit->unitBits)
+		size = half_rounded(size);

it looked to me like it would be invoking half_rounded() multiple times, which raised alarm bells because that would risk rounding the wrong way. Eventually I realised that by the time it reaches that, rightshifts will always equal unit->unitBits or unit->unitBits - 1, so it'll never do more than one half-round, which is important.

So perhaps using doHalfRound would be clearer, but it could just be a local variable tracking whether or not it's the first time through the loop.

Regards,
Dean
On Tue, 6 Jul 2021 at 23:39, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> When I first read this:
>
> +	/* half-round until we get down to unitBits */
> +	while (rightshifts++ < unit->unitBits)
> +		size = half_rounded(size);
>
> it looked to me like it would be invoking half_rounded() multiple
> times, which raised alarm bells because that would risk rounding the
> wrong way. Eventually I realised that by the time it reaches that,
> rightshifts will always equal unit->unitBits or unit->unitBits - 1, so
> it'll never do more than one half-round, which is important.

It's true that based on how the units table is set up now, it'll only ever do it once for all but the first loop. I wrote the attached .c file just to try to see if it ever goes wrong and I didn't manage to find any inputs where it did. I always seem to get the half rounded value either the same as the shifted value or 1 higher towards positive infinity:

$ ./half_rounded -102 3
1. half_round(-102) == -51 :: -102 >> 1 = -51
2. half_round(-51) == -25 :: -51 >> 1 = -26
3. half_round(-25) == -12 :: -26 >> 1 = -13

$ ./half_rounded 6432 3
1. half_round(6432) == 3216 :: 6432 >> 1 = 3216
2. half_round(3216) == 1608 :: 3216 >> 1 = 1608
3. half_round(1608) == 804 :: 1608 >> 1 = 804

Can you give an example where calling half_rounded too many times will give the wrong value? Keeping in mind we call half_rounded the number of times that the passed in value would need to be left-shifted by to get the equivalent truncated value.

David
On Tue, 6 Jul 2021 at 13:15, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Can you give an example where calling half_rounded too many times will
> give the wrong value? Keeping in mind we call half_rounded the number
> of times that the passed in value would need to be left-shifted by to
> get the equivalent truncated value.
>

./half_rounded 10241 10
1. half_round(10241) == 5121 :: 10241 >> 1 = 5120
2. half_round(5121) == 2561 :: 5120 >> 1 = 2560
3. half_round(2561) == 1281 :: 2560 >> 1 = 1280
4. half_round(1281) == 641 :: 1280 >> 1 = 640
5. half_round(641) == 321 :: 640 >> 1 = 320
6. half_round(321) == 161 :: 320 >> 1 = 160
7. half_round(161) == 81 :: 160 >> 1 = 80
8. half_round(81) == 41 :: 80 >> 1 = 40
9. half_round(41) == 21 :: 40 >> 1 = 20
10. half_round(21) == 11 :: 20 >> 1 = 10

The correct result should be 10 (it would be very odd to claim that 10241 bytes should be displayed as 11 kB), but the half-rounding keeps rounding up at each stage.

That's a general property of rounding -- you need to be very careful when rounding more than once, since otherwise errors will propagate. C.f. 4083f445c0, which removed a double-round in numeric sqrt().

To be clear, I'm not saying that the current code half-rounds more than once, just that it reads as if it does.

Regards,
Dean
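Dean's trace is easy to reproduce outside the backend. The sketch below is an editorial illustration (not code from the patch): it uses a half_rounded() macro matching the one in dbsize.c, which rounds ties away from zero, and a helper that applies it repeatedly the way the questioned loop appeared to:

```c
#include <assert.h>
#include <stdint.h>

/* Divide by two, rounding ties away from zero (as in dbsize.c). */
#define half_rounded(x)	(((x) + ((x) < 0 ? 0 : 1)) / 2)

/* Apply half_rounded() n times in a row -- the suspected misbehaviour. */
static int64_t
repeat_half_round(int64_t v, int n)
{
	while (n-- > 0)
		v = half_rounded(v);
	return v;
}
```

Half-rounding 10241 ten times yields 11, whereas shifting right nine times and half-rounding exactly once yields the correct 10 kB — the point being that the tie-breaking bias compounds when the rounding step is repeated.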
David Rowley writes:
> On Mon, 5 Jul 2021 at 20:00, David Rowley <dgrowleyml@gmail.com> wrote:
>> I don't really like the fact that I had to add the doHalfRound field
>> to get the same rounding behaviour as the original functions. I'm
>> wondering if it would just be too clever just to track how many bits
>> we've shifted right by in pg_size_pretty* and compare that to the
>> value of multishift for the current unit and do appropriate rounding
>> to get us to the value of multishift. In theory, we could just keep
>> calling the half_rounded macro until we make it to the multishift
>> value. My current thoughts are that it's unlikely that anyone would
>> twiddle with the size_pretty_units array in such a way for the extra
>> code to be worth it. Maybe someone else feels differently.
>
> I made another pass over this and ended up removing the doHalfRound
> field in favour of just doing rounding based on the previous
> bitshifts.
>
> I did a few other tidy ups and I think it's a useful patch as it
> reduces the amount of code a bit and makes it dead simple to add new
> units in the future. Most importantly it'll help keep pg_size_pretty,
> pg_size_pretty_numeric and pg_size_bytes all in sync in regards to
> what units they support.
>
> Does anyone want to have a look over this? If not, I plan to push it
> in the next day or so.
>
> (I'm not sure why pgindent removed the space between != and NULL, but
> it did, so I left it.)
>
> David

I like the approach you took here; much cleaner to have one table for all of the individual codepaths. Testing worked as expected; if we do decide to expand the units table there will be a few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch to `numeric`).

Thanks,
David
David Rowley <dgrowleyml@gmail.com> writes:
> Does anyone want to have a look over this? If not, I plan to push it
> in the next day or so.

Minor nit: use "const char *text" in the struct declaration, so that all of the static data can be placed in fixed storage.

> (I'm not sure why pgindent removed the space between != and NULL, but
> it did, so I left it.)

It did that because "text" is a typedef name, so it's a bit confused about whether the statement is really a declaration. Personally I'd have used "name" or something like that for that field, anyway.

regards, tom lane
On Wed, 7 Jul 2021 at 00:51, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> 10. half_round(21) == 11 :: 20 >> 1 = 10
>
> The correct result should be 10 (it would be very odd to claim that
> 10241 bytes should be displayed as 11kb), but the half-rounding keeps
> rounding up at each stage.
>
> That's a general property of rounding -- you need to be very careful
> when rounding more than once, since otherwise errors will propagate.
> C.f. 4083f445c0, which removed a double-round in numeric sqrt().

Thanks. I've adjusted the patch to re-add the round bool flag and get rid of the rightShift field. I'm now calculating how many bits to shift right by based on the difference between the unitbits of the current and next unit, then taking 1 bit less if the next unit does half rounding and the current one does not, or adding an extra bit on in the opposite case.

I'll post another patch shortly.

David
On Wed, 7 Jul 2021 at 02:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Minor nit: use "const char *text" in the struct declaration, so
> that all of the static data can be placed in fixed storage.

Thanks for pointing that out.

> David Rowley <dgrowleyml@gmail.com> writes:
> > (I'm not sure why pgindent removed the space between != and NULL, but
> > it did, so I left it.)
>
> It did that because "text" is a typedef name, so it's a bit confused
> about whether the statement is really a declaration. Personally I'd
> have used "name" or something like that for that field, anyway.

I should have thought of that. Thanks for highlighting it. I've renamed the field.

Updated patch attached.

David
On Wed, 7 Jul 2021 at 02:46, David Christensen
<david.christensen@crunchydata.com> wrote:
> if we do decide to expand the units table there will be a
> few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch
> to `numeric`).

I wonder if it's worth changing pg_size_bytes() to return NUMERIC regardless of if we add any additional units or not.

Would you like to create 2 patches, one to change the return type and another to add the new units, both based on top of the v2 patch I sent earlier?

David
On Wed, 7 Jul 2021 at 03:47, David Rowley <dgrowleyml@gmail.com> wrote:
>
> Updated patch attached.
>

Hmm, this looked easy, but...

It occurred to me that there ought to be regression tests for the edge cases where it steps from one unit to the next. So, in the style of the existing regression tests, I tried the following:

SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
    (VALUES (10239::bigint), (10240::bigint),
            (10485247::bigint), (10485248::bigint),
            (10736893951::bigint), (10736893952::bigint),
            (10994579406847::bigint), (10994579406848::bigint),
            (11258449312612351::bigint), (11258449312612352::bigint)) x(size);

       size        | pg_size_pretty | pg_size_pretty
-------------------+----------------+----------------
             10239 | 10239 bytes    | -10239 bytes
             10240 | 10 kB          | -10 kB
          10485247 | 10239 kB       | -10 MB
          10485248 | 10 MB          | -10 MB
       10736893951 | 10239 MB       | -10 GB
       10736893952 | 10 GB          | -10 GB
    10994579406847 | 10239 GB       | -10 TB
    10994579406848 | 10 TB          | -10 TB
 11258449312612351 | 10239 TB       | -10239 TB
 11258449312612352 | 10240 TB       | -10239 TB
(10 rows)

SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
    (VALUES (10239::numeric), (10240::numeric),
            (10485247::numeric), (10485248::numeric),
            (10736893951::numeric), (10736893952::numeric),
            (10994579406847::numeric), (10994579406848::numeric),
            (11258449312612351::numeric), (11258449312612352::numeric)) x(size);

       size        | pg_size_pretty | pg_size_pretty
-------------------+----------------+----------------
             10239 | 10239 bytes    | -10239 bytes
             10240 | 10 kB          | -10 kB
          10485247 | 10239 kB       | -10239 kB
          10485248 | 10 MB          | -10 MB
       10736893951 | 10239 MB       | -10239 MB
       10736893952 | 10 GB          | -10 GB
    10994579406847 | 10239 GB       | -10239 GB
    10994579406848 | 10 TB          | -10 TB
 11258449312612351 | 10239 TB       | -10239 TB
 11258449312612352 | 10240 TB       | -10240 TB
(10 rows)

Under the assumption that what we're trying to achieve here is schoolbook rounding (ties away from zero), the numeric results are correct and the bigint results are wrong.

The reason is that bit shifting isn't the same as division for negative numbers, since bit shifting rounds towards negative infinity whereas division rounds towards zero (truncates), which is what I think we really need.

Regards,
Dean
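The shift-versus-division difference Dean describes can be shown in isolation. This is an editorial sketch, not patch code; note that right-shifting a negative value is strictly implementation-defined in C, though it behaves as an arithmetic shift (rounding toward negative infinity) on common platforms:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two ways to reduce a byte count to whole kilobytes.  An arithmetic
 * right shift rounds toward negative infinity, while integer division
 * truncates toward zero -- the behaviour wanted for symmetric output.
 */
static int64_t
kb_by_shift(int64_t size)
{
	return size >> 10;		/* floor(size / 1024) for negative inputs */
}

static int64_t
kb_by_division(int64_t size)
{
	return size / 1024;		/* truncates toward zero */
}
```

For positive inputs the two agree, but for -10239 bytes the shift yields -10 while division yields -9, which is why the bigint output above is asymmetric around zero.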
David Rowley writes:
> On Wed, 7 Jul 2021 at 02:46, David Christensen
> <david.christensen@crunchydata.com> wrote:
>> if we do decide to expand the units table there will be a
>> few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch
>> to `numeric`).
>
> I wonder if it's worth changing pg_size_bytes() to return NUMERIC
> regardless of if we add any additional units or not.
>
> Would you like to create 2 patches, one to change the return type and
> another to add the new units, both based on top of the v2 patch I sent
> earlier?
>
> David

Enclosed is the patch to change the return type to numeric, as well as one for expanding the units to add PB and EB. If we decide to expand further, the current implementation will need to change, as ZB and YB have 70 and 80 bits needing to be shifted accordingly, so int64 isn't enough to hold them. (I fixed this particular issue in the original version of this patch, so there is at least a blueprint of how to fix it.) I figured that PB and EB are probably good enough additions at this point, so we can debate whether to add the others.
Best, David From 57bd2eafff5404313426a10f63b0b7098314998a Mon Sep 17 00:00:00 2001 From: David Christensen <david.christensen@crunchydata.com> Date: Wed, 7 Jul 2021 11:46:09 -0500 Subject: [PATCH] Make pg_size_bytes() return numeric instead of bigint --- src/backend/utils/adt/dbsize.c | 7 ++++--- src/include/catalog/pg_proc.dat | 2 +- src/test/regress/expected/dbsize.out | 17 +++++++++++++---- src/test/regress/sql/dbsize.sql | 6 ++++-- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index 4b7331d85c..a5a3ebe34e 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -813,10 +813,11 @@ pg_size_bytes(PG_FUNCTION_ARGS) } } - result = DatumGetInt64(DirectFunctionCall1(numeric_int8, - NumericGetDatum(num))); + /* now finally truncate, since this is always in integer-like units */ + num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil, + NumericGetDatum(num))); - PG_RETURN_INT64(result); + PG_RETURN_NUMERIC(num); } /* diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fde251fa4f..73326e3618 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7187,7 +7187,7 @@ prosrc => 'pg_size_pretty_numeric' }, { oid => '3334', descr => 'convert a size in human-readable format with size units into bytes', - proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text', + proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text', prosrc => 'pg_size_bytes' }, { oid => '2997', descr => 'disk space usage for the specified table, including TOAST, free space and visibility map', diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out index e901a2c92a..2950e017d0 100644 --- a/src/test/regress/expected/dbsize.out +++ b/src/test/regress/expected/dbsize.out @@ -101,6 +101,19 @@ SELECT size, pg_size_bytes(size) FROM -.0 gb | 0 (8 rows) +-- valid inputs outside 
bigint range (previous errors) +SELECT pg_size_bytes('9223372036854775807.9'); + pg_size_bytes +--------------------- + 9223372036854775808 +(1 row) + +SELECT pg_size_bytes('1e100'); + pg_size_bytes +------------------------------------------------------------------------------------------------------- + 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 +(1 row) + -- invalid inputs SELECT pg_size_bytes('1 AB'); ERROR: invalid size: "1 AB" @@ -114,10 +127,6 @@ SELECT pg_size_bytes('1 AB A '); ERROR: invalid size: "1 AB A " DETAIL: Invalid size unit: "AB A". HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB". -SELECT pg_size_bytes('9223372036854775807.9'); -ERROR: bigint out of range -SELECT pg_size_bytes('1e100'); -ERROR: bigint out of range SELECT pg_size_bytes('1e1000000000000000000'); ERROR: value overflows numeric format SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql index d10a4d7f68..e88184e9c9 100644 --- a/src/test/regress/sql/dbsize.sql +++ b/src/test/regress/sql/dbsize.sql @@ -30,12 +30,14 @@ SELECT size, pg_size_bytes(size) FROM (VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. 
gb'), ('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size); +-- valid inputs outside bigint range (previous errors) +SELECT pg_size_bytes('9223372036854775807.9'); +SELECT pg_size_bytes('1e100'); + -- invalid inputs SELECT pg_size_bytes('1 AB'); SELECT pg_size_bytes('1 AB A'); SELECT pg_size_bytes('1 AB A '); -SELECT pg_size_bytes('9223372036854775807.9'); -SELECT pg_size_bytes('1e100'); SELECT pg_size_bytes('1e1000000000000000000'); SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported SELECT pg_size_bytes(''); -- 2.30.1 (Apple Git-130) From 8d6972abf01c01d2c8713d94bbb46ae44682195e Mon Sep 17 00:00:00 2001 From: David Christensen <david.christensen@crunchydata.com> Date: Wed, 7 Jul 2021 12:37:41 -0500 Subject: [PATCH] Expand units for pg_size_*() up through EB --- src/backend/utils/adt/dbsize.c | 5 +- src/test/regress/expected/dbsize.out | 138 +++++++++++++++++---------- src/test/regress/sql/dbsize.sql | 33 +++++-- 3 files changed, 114 insertions(+), 62 deletions(-) diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index a5a3ebe34e..cd68ff10c8 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -54,6 +54,8 @@ static const struct size_pretty_unit size_pretty_units[] = { {"MB", 20 * 1024 - 1, 10, 20}, {"GB", 20 * 1024 - 1, 10, 30}, {"TB", 20 * 1024 - 1, 10, 40}, + {"PB", 20 * 1024 - 1, 10, 50}, + {"EB", 20 * 1024 - 1, 10, 60}, {NULL, 0, 0, 0} }; @@ -799,7 +801,8 @@ pg_size_bytes(PG_FUNCTION_ARGS) (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid size: \"%s\"", text_to_cstring(arg)), errdetail("Invalid size unit: \"%s\".", strptr), - errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", and \"TB\"."))); + errhint("Valid units are \"bytes\", \"kB\", \"MB\", \"GB\", \"TB\", " + "\"PB\", and \"EB\"."))); if (multiplier > 1) { diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out index 2950e017d0..cb260f69a3 100644 --- 
a/src/test/regress/expected/dbsize.out +++ b/src/test/regress/expected/dbsize.out @@ -13,65 +13,96 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM (6 rows) SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM - (VALUES (10::numeric), (1000::numeric), (1000000::numeric), - (1000000000::numeric), (1000000000000::numeric), + (VALUES (10::numeric), + (1000::numeric), + (1000000::numeric), + (1000000000::numeric), + (1000000000000::numeric), (1000000000000000::numeric), - (10.5::numeric), (1000.5::numeric), (1000000.5::numeric), - (1000000000.5::numeric), (1000000000000.5::numeric), - (1000000000000000.5::numeric)) x(size); - size | pg_size_pretty | pg_size_pretty ---------------------+----------------+---------------- - 10 | 10 bytes | -10 bytes - 1000 | 1000 bytes | -1000 bytes - 1000000 | 977 kB | -977 kB - 1000000000 | 954 MB | -954 MB - 1000000000000 | 931 GB | -931 GB - 1000000000000000 | 909 TB | -909 TB - 10.5 | 10.5 bytes | -10.5 bytes - 1000.5 | 1000.5 bytes | -1000.5 bytes - 1000000.5 | 977 kB | -977 kB - 1000000000.5 | 954 MB | -954 MB - 1000000000000.5 | 931 GB | -931 GB - 1000000000000000.5 | 909 TB | -909 TB -(12 rows) + (1000000000000000000::numeric), + (1000000000000000000000::numeric), + (1000000000000000000000000::numeric), + (1000000000000000000000000000::numeric), + (1000000000000000000000000000000::numeric), + (10.5::numeric), + (1000.5::numeric), + (1000000.5::numeric), + (1000000000.5::numeric), + (1000000000000.5::numeric), + (1000000000000000.5::numeric), + (1000000000000000000.5::numeric), + (1000000000000000000000.5::numeric), + (1000000000000000000000000.5::numeric), + (1000000000000000000000000000.5::numeric), + (1000000000000000000000000000000.5::numeric) + ) x(size); + size | pg_size_pretty | pg_size_pretty +-----------------------------------+-----------------+------------------ + 10 | 10 bytes | -10 bytes + 1000 | 1000 bytes | -1000 bytes + 1000000 | 977 kB | -977 kB + 1000000000 | 954 MB | -954 MB + 
1000000000000 | 931 GB | -931 GB + 1000000000000000 | 909 TB | -909 TB + 1000000000000000000 | 888 PB | -888 PB + 1000000000000000000000 | 867 EB | -867 EB + 1000000000000000000000000 | 867362 EB | -867362 EB + 1000000000000000000000000000 | 867361738 EB | -867361738 EB + 1000000000000000000000000000000 | 867361737988 EB | -867361737988 EB + 10.5 | 10.5 bytes | -10.5 bytes + 1000.5 | 1000.5 bytes | -1000.5 bytes + 1000000.5 | 977 kB | -977 kB + 1000000000.5 | 954 MB | -954 MB + 1000000000000.5 | 931 GB | -931 GB + 1000000000000000.5 | 909 TB | -909 TB + 1000000000000000000.5 | 888 PB | -888 PB + 1000000000000000000000.5 | 867 EB | -867 EB + 1000000000000000000000000.5 | 867362 EB | -867362 EB + 1000000000000000000000000000.5 | 867361738 EB | -867361738 EB + 1000000000000000000000000000000.5 | 867361737988 EB | -867361737988 EB +(22 rows) SELECT size, pg_size_bytes(size) FROM (VALUES ('1'), ('123bytes'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '), - ('1TB'), ('3000 TB'), ('1e6 MB')) x(size); - size | pg_size_bytes -----------+------------------ - 1 | 1 - 123bytes | 123 - 1kB | 1024 - 1MB | 1048576 - 1 GB | 1073741824 - 1.5 GB | 1610612736 - 1TB | 1099511627776 - 3000 TB | 3298534883328000 - 1e6 MB | 1048576000000 -(9 rows) + ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB')) x(size); + size | pg_size_bytes +----------+---------------------- + 1 | 1 + 123bytes | 123 + 1kB | 1024 + 1MB | 1048576 + 1 GB | 1073741824 + 1.5 GB | 1610612736 + 1TB | 1099511627776 + 3000 TB | 3298534883328000 + 1e6 MB | 1048576000000 + 99 PB | 111464090777419776 + 45 EB | 51881467707308113920 +(11 rows) -- case-insensitive units are supported SELECT size, pg_size_bytes(size) FROM (VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '), - ('1tb'), ('3000 tb'), ('1e6 mb')) x(size); - size | pg_size_bytes -----------+------------------ - 1 | 1 - 123bYteS | 123 - 1kb | 1024 - 1mb | 1048576 - 1 Gb | 1073741824 - 1.5 gB | 1610612736 - 1tb | 1099511627776 - 3000 tb | 
3298534883328000 - 1e6 mb | 1048576000000 -(9 rows) + ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB')) x(size); + size | pg_size_bytes +----------+---------------------- + 1 | 1 + 123bYteS | 123 + 1kb | 1024 + 1mb | 1048576 + 1 Gb | 1073741824 + 1.5 gB | 1610612736 + 1tb | 1099511627776 + 3000 tb | 3298534883328000 + 1e6 mb | 1048576000000 + 99 pb | 111464090777419776 + 45 eB | 51881467707308113920 +(11 rows) -- negative numbers are supported SELECT size, pg_size_bytes(size) FROM (VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '), - ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size); + ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb')) x(size); size | pg_size_bytes -----------+------------------- -1 | -1 @@ -83,7 +114,8 @@ SELECT size, pg_size_bytes(size) FROM -1tb | -1099511627776 -3000 TB | -3298534883328000 -10e-1 MB | -1048576 -(9 rows) + -19e-4eb | -2190550858753009 +(10 rows) -- different cases with allowed points SELECT size, pg_size_bytes(size) FROM @@ -118,21 +150,21 @@ SELECT pg_size_bytes('1e100'); SELECT pg_size_bytes('1 AB'); ERROR: invalid size: "1 AB" DETAIL: Invalid size unit: "AB". -HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB". +HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB". SELECT pg_size_bytes('1 AB A'); ERROR: invalid size: "1 AB A" DETAIL: Invalid size unit: "AB A". -HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB". +HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB". SELECT pg_size_bytes('1 AB A '); ERROR: invalid size: "1 AB A " DETAIL: Invalid size unit: "AB A". -HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB". +HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB". SELECT pg_size_bytes('1e1000000000000000000'); ERROR: value overflows numeric format SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported ERROR: invalid size: "1 byte" DETAIL: Invalid size unit: "byte". 
-HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB". +HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB". SELECT pg_size_bytes(''); ERROR: invalid size: "" SELECT pg_size_bytes('kb'); @@ -150,6 +182,6 @@ ERROR: invalid size: ".+912" SELECT pg_size_bytes('+912+ kB'); ERROR: invalid size: "+912+ kB" DETAIL: Invalid size unit: "+ kB". -HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB". +HINT: Valid units are "bytes", "kB", "MB", "GB", "TB", "PB", and "EB". SELECT pg_size_bytes('++123 kB'); ERROR: invalid size: "++123 kB" diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql index e88184e9c9..cfa993f63f 100644 --- a/src/test/regress/sql/dbsize.sql +++ b/src/test/regress/sql/dbsize.sql @@ -4,26 +4,43 @@ SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM (1000000000000000::bigint)) x(size); SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM - (VALUES (10::numeric), (1000::numeric), (1000000::numeric), - (1000000000::numeric), (1000000000000::numeric), + (VALUES (10::numeric), + (1000::numeric), + (1000000::numeric), + (1000000000::numeric), + (1000000000000::numeric), (1000000000000000::numeric), - (10.5::numeric), (1000.5::numeric), (1000000.5::numeric), - (1000000000.5::numeric), (1000000000000.5::numeric), - (1000000000000000.5::numeric)) x(size); + (1000000000000000000::numeric), + (1000000000000000000000::numeric), + (1000000000000000000000000::numeric), + (1000000000000000000000000000::numeric), + (1000000000000000000000000000000::numeric), + (10.5::numeric), + (1000.5::numeric), + (1000000.5::numeric), + (1000000000.5::numeric), + (1000000000000.5::numeric), + (1000000000000000.5::numeric), + (1000000000000000000.5::numeric), + (1000000000000000000000.5::numeric), + (1000000000000000000000000.5::numeric), + (1000000000000000000000000000.5::numeric), + (1000000000000000000000000000000.5::numeric) + ) x(size); SELECT size, pg_size_bytes(size) FROM (VALUES ('1'), ('123bytes'), 
('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '), - ('1TB'), ('3000 TB'), ('1e6 MB')) x(size); + ('1TB'), ('3000 TB'), ('1e6 MB'), ('99 PB'), ('45 EB')) x(size); -- case-insensitive units are supported SELECT size, pg_size_bytes(size) FROM (VALUES ('1'), ('123bYteS'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '), - ('1tb'), ('3000 tb'), ('1e6 mb')) x(size); + ('1tb'), ('3000 tb'), ('1e6 mb'), ('99 pb'), ('45 eB')) x(size); -- negative numbers are supported SELECT size, pg_size_bytes(size) FROM (VALUES ('-1'), ('-123bytes'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '), - ('-1tb'), ('-3000 TB'), ('-10e-1 MB')) x(size); + ('-1tb'), ('-3000 TB'), ('-10e-1 MB'), ('-19e-4eb')) x(size); -- different cases with allowed points SELECT size, pg_size_bytes(size) FROM -- 2.30.1 (Apple Git-130)
David Christensen <david.christensen@crunchydata.com> writes:
> Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
> add PB and EB.

Can we really get away with changing the return type? That would by no stretch of the imagination be free; one could expect breakage of a few user views, for example.

Independently of that, I'm pretty much -1 on going further than PB. Even if the additional abbreviations you mention are actually recognized standards, I think not that many people are familiar with them, and such input is way more likely to be a typo than intended data.

regards, tom lane
Tom Lane writes:
> David Christensen <david.christensen@crunchydata.com> writes:
>> Enclosed is the patch to change the return type to numeric, as well as one for expanding units to
>> add PB and EB.
>
> Can we really get away with changing the return type? That would
> by no stretch of the imagination be free; one could expect breakage
> of a few user views, for example.

Hmm, that's a good point, and we can't really make the return type polymorphic (being as there isn't a source type of the given return value).

> Independently of that, I'm pretty much -1 on going further than PB.
> Even if the additional abbreviations you mention are actually recognized
> standards, I think not that many people are familiar with them, and such
> input is way more likely to be a typo than intended data.

If we do go ahead and restrict the expansion to just PB, the return value of pg_size_bytes() would still support up to 8192 PB before running into range limitations. I assume it's not worth creating a pg_size_bytes_numeric() with the full range of supported units, but that is presumably an option as well.

Best,
David
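The headroom figures above are easy to sanity-check with a small sketch (an editorial illustration, not part of any patch): an int64 byte count tops out at 2^63 - 1, i.e. just below 8 EB, so 8191 whole petabytes fit, while ZB (2^70) and YB (2^80) multipliers cannot even be represented in an int64 at all:

```c
#include <assert.h>
#include <stdint.h>

/* Power-of-two byte multipliers for the proposed units. */
#define PB	(INT64_C(1) << 50)
#define EB	(INT64_C(1) << 60)

/* Largest whole number of the given unit an int64 byte count can hold. */
static int64_t
max_whole_units(int64_t unit)
{
	return INT64_MAX / unit;
}
```

This is why drawing the line at PB (or EB) is comfortable for a bigint-returning pg_size_bytes(), and why anything larger forces a switch to numeric arithmetic.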
On Thu, 8 Jul 2021 at 01:32, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Hmm, this looked easy, but...
>
> It occurred to me that there ought to be regression tests for the edge
> cases where it steps from one unit to the next. So, in the style of
> the existing regression tests, I tried the following:
>
> SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM
> (VALUES (10239::bigint), (10240::bigint),
> (10485247::bigint), (10485248::bigint),
> (10736893951::bigint), (10736893952::bigint),
> (10994579406847::bigint), (10994579406848::bigint),
> (11258449312612351::bigint), (11258449312612352::bigint)) x(size);
>
> 11258449312612352 | 10240 TB | -10239 TB

Hmm, yeah, I noticed that pg_size_pretty(bigint) vs pg_size_pretty(numeric) didn't always match when I was testing this patch, but I was more focused on having my results match the unpatched version than I was with making sure bigint and numeric matched. I imagine this must date back to 8a1fab36ab.

Do you feel like it's something this patch should fix? I was mostly hoping to keep this patch about rewriting the code to both make it easier to add new units and also to make it easier to keep all 3 functions in sync.

It feels like if we're going to fix this negative rounding thing then we should maybe do it and backpatch a fix, then rebase this work on top of that.

What are your thoughts?

David
On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
> It feels like if we're going to fix this negative rounding thing then
> we should maybe do it and backpatch a fix then rebase this work on top
> of that.

Here's a patch which I believe makes pg_size_pretty() and
pg_size_pretty_numeric() match in regards to negative values.

Maybe this plus your regression test would be ok to back-patch?

David
Attachment
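[Editorial note: to make the behavior under discussion concrete, here is a Python sketch of the post-fix rounding scheme; the function names are mine, and the real implementation is the C code in src/backend/utils/adt/dbsize.c (this sketch also includes the PB unit added later in the thread). The key points are that division truncates toward zero as in C, one extra bit is kept while stepping down through the units, and rounding half away from zero happens only at the final unit, so positive and negative inputs come out symmetric.]

```python
def c_div(n: int, d: int) -> int:
    """Integer division truncating toward zero, as C's / operator does."""
    q = abs(n) // abs(d)
    return -q if (n < 0) != (d < 0) else q

def half_rounded(x: int) -> int:
    """Convert a value held in half-units to whole units, rounding half away from zero."""
    return c_div(x + (1 if x >= 0 else -1), 2)

def size_pretty(size: int) -> str:
    """Sketch of pg_size_pretty(bigint)'s unit selection after the negative-rounding fix."""
    limit = 10 * 1024
    if abs(size) < limit:
        return f"{size} bytes"
    size = c_div(size, 1 << 9)      # keep one extra bit for rounding
    limit2 = limit * 2 - 1
    for unit in ("kB", "MB", "GB", "TB"):
        if abs(size) < limit2:
            return f"{half_rounded(size)} {unit}"
        size = c_div(size, 1 << 10)
    return f"{half_rounded(size)} PB"
```

With this scheme, size_pretty(10485247) gives '10239 kB' and size_pretty(-10485247) gives '-10239 kB', i.e. the negative result mirrors the positive one, which is the symmetry the patch is after.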
On Thu, 8 Jul 2021 at 05:30, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
> > It feels like if we're going to fix this negative rounding thing then
> > we should maybe do it and backpatch a fix then rebase this work on top
> > of that.

Yes, that was my thinking too.

> Here's a patch which I believe makes pg_size_pretty() and
> pg_size_pretty_numeric() match in regards to negative values.

LGTM, except I think it's worth also making the numeric code not refer
to bit shifting either.

> Maybe this plus your regression test would be ok to back-patch?

+1

Here's an update with matching updates to the numeric code, plus the
regression tests.

Regards,
Dean
Attachment
On Thu, 8 Jul 2021 at 07:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Christensen <david.christensen@crunchydata.com> writes:
> > Enclosed is the patch to change the return type to numeric, as well as
> > one for expanding units to add PB and EB.
>
> Can we really get away with changing the return type?  That would
> by no stretch of the imagination be free; one could expect breakage
> of a few user views, for example.

That's a good point. We should probably leave it alone then. I had had
it in mind that it might be ok since we did this for extract() in 14.
At least we have date_part() as a backup there. I'm fine with leaving
the return value of pg_size_bytes() as-is.

> Independently of that, I'm pretty much -1 on going further than PB.
> Even if the additional abbreviations you mention are actually recognized
> standards, I think not that many people are familiar with them, and such
> input is way more likely to be a typo than intended data.

I'm fine with that too. In [1] I mentioned my concerns with adding all
the defined units up to yottabytes. David reduced that down to just
exabytes, but I think if we're keeping pg_size_bytes() returning bigint
then drawing the line at PB seems ok to me. Anything more than
pg_size_bytes('8 EB') would overflow.

David

[1] https://www.postgresql.org/message-id/CAApHDvp9ym+RSQNGoSRPjH+j6TJ1tFBhfT+JoLFf_RbZq1EszQ@mail.gmail.com
On Thu, 8 Jul 2021 at 20:23, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> > On Thu, 8 Jul 2021 at 13:31, David Rowley <dgrowleyml@gmail.com> wrote:
> > Here's a patch which I believe makes pg_size_pretty() and
> > pg_size_pretty_numeric() match in regards to negative values.
>
> LGTM, except I think it's worth also making the numeric code not refer
> to bit shifting either.
>
> > Maybe this plus your regression test would be ok to back-patch?
>
> +1
>
> Here's an update with matching updates to the numeric code, plus the
> regression tests.

Looks good.

I gave it a bit of exercise by running pgbench and calling this procedure:

CREATE OR REPLACE PROCEDURE public.test_size_pretty2()
 LANGUAGE plpgsql
AS $procedure$
declare b bigint;
begin
  FOR i IN 1..1000 LOOP
    b := 0 - (random() * 9223372036854775807)::bigint;
    if pg_size_pretty(b) <> pg_size_pretty(b::numeric) then
      raise notice '%. % != %', b, pg_size_pretty(b), pg_size_pretty(b::numeric);
    end if;
  END LOOP;
END;
$procedure$

The procedure ran 8526956 times, so with the loop that's about 8.5
billion random numbers. No variations between the two functions. I got
the same result after removing the "0 -" to test positive numbers.

If you like, I can push this in my morning, or if you'd rather do it
yourself, please go ahead.

David
On Thu, 8 Jul 2021 at 14:38, David Rowley <dgrowleyml@gmail.com> wrote:
>
> I gave it a bit of exercise by running pgbench and calling this procedure:
>
> It ran 8526956 times, so with the loop that's 8.5 billion random
> numbers. No variations between the two functions. I got the same
> after removing the 0 - to test positive numbers.

Wow, that's a lot of testing! I just tried a few hand-picked edge cases.

> If you like, I can push this in my morning, or if you'd rather do it
> yourself, please go ahead.

No, I didn't get as much time as I thought I would today, so please go
ahead.

Regards,
Dean
On Thu, 8 Jul 2021 at 05:44, David Christensen
<david.christensen@crunchydata.com> wrote:
> Enclosed is the patch to change the return type to numeric, as well as
> one for expanding units to add PB and EB.

I ended up not changing the return type of pg_size_bytes().

> I figured that PB and EB are probably good enough additions at this
> point, so we can debate whether to add the others.

Per Tom's concern with changing the return type of pg_size_bytes(),
and his and my concern about going too far in adding more units, I've
adjusted your patch to only add petabytes and pushed it. The maximum
range of bigint is only 8 exabytes, so the bigint version would never
show in exabytes anyway; it would still be measuring in petabytes at
the 64-bit range limit.

After a bit of searching, I found reports estimating the entire amount
of digital data stored on Earth as of 2020 at 59 zettabytes, or about
0.06 yottabytes. I feel like we've gone far enough by adding petabytes
today. Maybe that's worth revisiting in a couple of storage
generations. After we're done there, we can start working on the LSN
wraparound code.

David