Thread: Add numeric_trim(numeric)

Add numeric_trim(numeric)

From
Marko Tiikkaja
Date:
Hi,

Here's a patch for the second function suggested in
5643125E.1030605@joh.to.  This is my first patch trying to do anything
with numerics, so please be gentle.  I'm sure it's full of stupid.

January's commit fest, feedback welcome, yada yada..


.m

Attachment

Re: Add numeric_trim(numeric)

From
Pavel Stehule
Date:
Hi

2015-11-19 3:58 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
Hi,

Here's a patch for the second function suggested in 5643125E.1030605@joh.to.  This is my first patch trying to do anything with numerics, so please be gentle.  I'm sure it's full of stupid.

January's commit fest, feedback welcome, yada yada..

I am looking on this patch and I don't understand to formula

dscale = (ndigits - arg.weight - 1) * DEC_DIGITS;

the following rule is valid

DEC_DIGITS * ndigits >= dscale + arg.weight + 1

so dscale should be calculated like

dscale <= DEC_DIGITS * ndigits - arg.weight - 1

?

but your formula is correct and working. Can you explain it?

Regards

Pavel




 


.m

Re: Add numeric_trim(numeric)

From
Pavel Stehule
Date:
Hi

2015-12-26 21:44 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2015-11-19 3:58 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
Hi,

Here's a patch for the second function suggested in 5643125E.1030605@joh.to.  This is my first patch trying to do anything with numerics, so please be gentle.  I'm sure it's full of stupid.

January's commit fest, feedback welcome, yada yada..

I am looking on this patch and I don't understand to formula

dscale = (ndigits - arg.weight - 1) * DEC_DIGITS;

the following rule is valid

DEC_DIGITS * ndigits >= dscale + arg.weight + 1

so dscale should be calculated like

dscale <= DEC_DIGITS * ndigits - arg.weight - 1

?

but your formula is correct and working. Can you explain it?

I understand to it now. I didn't catch the semantic of arg.weight well.

So I am sending a review of this patch.

1. There is not any objection against this feature. I am thinking so it is good idea, and this mechanism can be used more often in other routines by default. But it is different topic.

2. The implementation is simple, without any possible side effects, performance impacts, etc

3. The patch is clean, small, it does what is expected.

4. There is good enough doc and regress tests

5. The patch respects PostgreSQL formatting - original version is maybe too compact, I am sending a little bit edited code with few more empty lines.

6. All regress tests was passed

I'll mark this patch as ready for commiter

Regards

Pavel

 

Regards

Pavel




 


.m


Attachment

Re: Add numeric_trim(numeric)

From
Dean Rasheed
Date:
On 27 December 2015 at 07:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> 2015-11-19 3:58 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
>>> Here's a patch for the second function suggested in
>>> 5643125E.1030605@joh.to.
>>>
> So I am sending a review of this patch.
>

I took a quick look at this too.

It seems like a useful function to have, but perhaps it should just be
called trim() rather than numeric_trim(), for consistency with the
names of the other numeric functions, which don't start with
"numeric_".


I found a bug in the dscale calculation:

select numeric_trim(1e100);
ERROR:  value overflows numeric format

The problem is that the trailing zeros are already stripped off, and
so the computation
   dscale = (arg.ndigits - arg.weight - 1) * DEC_DIGITS;

results in a negative dscale, which needs to be guarded against.


Actually I found the implementation a little confusing, partly because
the first comment doesn't really match the line of code that follows
it, and partly because of the complexity of the loop, the macros and
working out all the exit conditions from the loop. A much simpler
implementation would be to first call strip_var(), and then you'd only
need to inspect the final digit (known to be non-zero). In
pseudo-code, it might look a little like the following:
   init_var_from_num()   strip_var()   if ndigits > 0:       dscale = (arg.ndigits - arg.weight - 1) * DEC_DIGITS
ifdscale < 0:           dscale = 0       if dscale > 0:           // Here we know the last digit is non-zero and that
itis to the           // right of the decimal point, so inspect it and adjust dscale           // (reducing it by up to
3)to trim any remaining zero decimal           // digits           dscale -= ...   else:       dscale = 0
 

This then only has to test for non-zero decimal digits in the final
base-NBASE digit, rather than looping over digits from the right. In
addition, it removes the need to do the following at the start:
   /* for simplicity in the loop below, do full NBASE digits at a time */   dscale = ((arg.dscale + DEC_DIGITS - 1) /
DEC_DIGITS)* DEC_DIGITS;
 

which is the comment I found confusing because doesn't really match up
with what that line of code is doing.

Also, it then won't be necessary to do
   /* If it's zero, normalize the sign and weight */   if (ndigits == 0)   {       arg.sign = NUMERIC_POS;
arg.weight= 0;       arg.dscale = 0;   }
 

because strip_var() will have taken care of that.

In fact, in most (all?) cases, the equivalent of strip_var() will have
already been called by whatever produced the input Numeric, so it
won't actually have any work to do, but it will fall through very
quickly in those cases.


One final comment -- in the regression tests the quotes around all the
numbers are unnecessary.

Regards,
Dean



Re: Add numeric_trim(numeric)

From
Pavel Stehule
Date:
Hi

2016-01-06 16:21 GMT+01:00 Dean Rasheed <dean.a.rasheed@gmail.com>:
On 27 December 2015 at 07:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> 2015-11-19 3:58 GMT+01:00 Marko Tiikkaja <marko@joh.to>:
>>> Here's a patch for the second function suggested in
>>> 5643125E.1030605@joh.to.
>>>
> So I am sending a review of this patch.
>

I took a quick look at this too.

It seems like a useful function to have, but perhaps it should just be
called trim() rather than numeric_trim(), for consistency with the
names of the other numeric functions, which don't start with
"numeric_".


I found a bug in the dscale calculation:

select numeric_trim(1e100);
ERROR:  value overflows numeric format

my oversight :(

 

The problem is that the trailing zeros are already stripped off, and
so the computation

    dscale = (arg.ndigits - arg.weight - 1) * DEC_DIGITS;

results in a negative dscale, which needs to be guarded against.


Actually I found the implementation a little confusing, partly because
the first comment doesn't really match the line of code that follows
it, and partly because of the complexity of the loop, the macros and
working out all the exit conditions from the loop. A much simpler
implementation would be to first call strip_var(), and then you'd only
need to inspect the final digit (known to be non-zero). In
pseudo-code, it might look a little like the following:

    init_var_from_num()
    strip_var()
    if ndigits > 0:
        dscale = (arg.ndigits - arg.weight - 1) * DEC_DIGITS
        if dscale < 0:
            dscale = 0
        if dscale > 0:
            // Here we know the last digit is non-zero and that it is to the
            // right of the decimal point, so inspect it and adjust dscale
            // (reducing it by up to 3) to trim any remaining zero decimal
            // digits
            dscale -= ...
    else:
        dscale = 0

This then only has to test for non-zero decimal digits in the final
base-NBASE digit, rather than looping over digits from the right. In
addition, it removes the need to do the following at the start:

    /* for simplicity in the loop below, do full NBASE digits at a time */
    dscale = ((arg.dscale + DEC_DIGITS - 1) / DEC_DIGITS) * DEC_DIGITS;

which is the comment I found confusing because doesn't really match up
with what that line of code is doing.

Also, it then won't be necessary to do

    /* If it's zero, normalize the sign and weight */
    if (ndigits == 0)
    {
        arg.sign = NUMERIC_POS;
        arg.weight = 0;
        arg.dscale = 0;
    }

because strip_var() will have taken care of that.

In fact, in most (all?) cases, the equivalent of strip_var() will have
already been called by whatever produced the input Numeric, so it
won't actually have any work to do, but it will fall through very
quickly in those cases.


One final comment -- in the regression tests the quotes around all the
numbers are unnecessary.

so I changed status of this patch to "waiting on author"

Thank you

Pavel
 

Regards,
Dean

Re: Add numeric_trim(numeric)

From
Robert Haas
Date:
On Wed, Jan 6, 2016 at 10:21 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> It seems like a useful function to have, but perhaps it should just be
> called trim() rather than numeric_trim(), for consistency with the
> names of the other numeric functions, which don't start with
> "numeric_".

That wouldn't work in this case, because we have hard-coded parser
productions for TRIM().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add numeric_trim(numeric)

From
Dean Rasheed
Date:
On 6 January 2016 at 20:09, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 6, 2016 at 10:21 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>> It seems like a useful function to have, but perhaps it should just be
>> called trim() rather than numeric_trim(), for consistency with the
>> names of the other numeric functions, which don't start with
>> "numeric_".
>
> That wouldn't work in this case, because we have hard-coded parser
> productions for TRIM().
>

Ah. Good point.

Pity -- I would have liked a nice short name for this, in a similar
vein to the existing numeric functions.

Regards,
Dean



Re: Add numeric_trim(numeric)

From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 6 January 2016 at 20:09, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jan 6, 2016 at 10:21 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> It seems like a useful function to have, but perhaps it should just be
>>> called trim() rather than numeric_trim(), for consistency with the
>>> names of the other numeric functions, which don't start with
>>> "numeric_".

>> That wouldn't work in this case, because we have hard-coded parser
>> productions for TRIM().

Does it have to be called TRIM()?  After looking at the spec for it
I'd think rtrim() is the more correct analogy.

Also worth noting is that those hard-wired parser productions aren't
as hard-wired as all that.

regression=# select trim(43.5);
ERROR:  function pg_catalog.btrim(numeric) does not exist

If we wanted to call the function btrim() underneath, this would
Just Work.  However, to alleviate confusion, it might be better
if we altered the grammar productions to output "trim" not "btrim"
for the not-LEADING-or-TRAILING cases, and of course renamed the
relevant string functions to match.

A different approach is that I'm not real sure why we want a function
that returns a modified numeric value at all.  To the extent I understood
Marko's original use case, it seems like what you'd invariably do with the
result is extract its scale().  Why not skip the middleman and define a
function named something like minscale() or leastscale(), which returns an
int that is the smallest scale that would not drop data?  (If you actually
did want the modified numeric value, you could use round(x, minscale(x))
to get it.)
        regards, tom lane



Re: Add numeric_trim(numeric)

From
Robert Haas
Date:
On Wed, Jan 6, 2016 at 6:51 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On 6 January 2016 at 20:09, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jan 6, 2016 at 10:21 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> It seems like a useful function to have, but perhaps it should just be
>>> called trim() rather than numeric_trim(), for consistency with the
>>> names of the other numeric functions, which don't start with
>>> "numeric_".
>>
>> That wouldn't work in this case, because we have hard-coded parser
>> productions for TRIM().
>>
>
> Ah. Good point.
>
> Pity -- I would have liked a nice short name for this, in a similar
> vein to the existing numeric functions.

My experiences with function overloading haven't been enormously
positive - things that we think will work out sometimes don't, a la
the whole pg_size_pretty mess.

In this case, trim(stringy-thingy) and trim(numberish-thingy) aren't
even really doing the same thing, which makes me even less excited
about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add numeric_trim(numeric)

From
Pavel Stehule
Date:


2016-01-07 1:11 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 6 January 2016 at 20:09, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jan 6, 2016 at 10:21 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> It seems like a useful function to have, but perhaps it should just be
>>> called trim() rather than numeric_trim(), for consistency with the
>>> names of the other numeric functions, which don't start with
>>> "numeric_".

>> That wouldn't work in this case, because we have hard-coded parser
>> productions for TRIM().

Does it have to be called TRIM()?  After looking at the spec for it
I'd think rtrim() is the more correct analogy.

Also worth noting is that those hard-wired parser productions aren't
as hard-wired as all that.

regression=# select trim(43.5);
ERROR:  function pg_catalog.btrim(numeric) does not exist

If we wanted to call the function btrim() underneath, this would
Just Work.  However, to alleviate confusion, it might be better
if we altered the grammar productions to output "trim" not "btrim"
for the not-LEADING-or-TRAILING cases, and of course renamed the
relevant string functions to match.

A different approach is that I'm not real sure why we want a function
that returns a modified numeric value at all.  To the extent I understood
Marko's original use case, it seems like what you'd invariably do with the
result is extract its scale().  Why not skip the middleman and define a
function named something like minscale() or leastscale(), which returns an
int that is the smallest scale that would not drop data?  (If you actually
did want the modified numeric value, you could use round(x, minscale(x))
to get it.)

A example "round(x, minscale(x))" looks nice, but there can be a performance issues - you have to unpack varlena 2x

I'll try to some performance tests

Regards

Pavel


                        regards, tom lane

Re: Add numeric_trim(numeric)

From
Pavel Stehule
Date:


2016-01-07 8:12 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:


2016-01-07 1:11 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 6 January 2016 at 20:09, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Jan 6, 2016 at 10:21 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> It seems like a useful function to have, but perhaps it should just be
>>> called trim() rather than numeric_trim(), for consistency with the
>>> names of the other numeric functions, which don't start with
>>> "numeric_".

>> That wouldn't work in this case, because we have hard-coded parser
>> productions for TRIM().

Does it have to be called TRIM()?  After looking at the spec for it
I'd think rtrim() is the more correct analogy.

Also worth noting is that those hard-wired parser productions aren't
as hard-wired as all that.

regression=# select trim(43.5);
ERROR:  function pg_catalog.btrim(numeric) does not exist

If we wanted to call the function btrim() underneath, this would
Just Work.  However, to alleviate confusion, it might be better
if we altered the grammar productions to output "trim" not "btrim"
for the not-LEADING-or-TRAILING cases, and of course renamed the
relevant string functions to match.

A different approach is that I'm not real sure why we want a function
that returns a modified numeric value at all.  To the extent I understood
Marko's original use case, it seems like what you'd invariably do with the
result is extract its scale().  Why not skip the middleman and define a
function named something like minscale() or leastscale(), which returns an
int that is the smallest scale that would not drop data?  (If you actually
did want the modified numeric value, you could use round(x, minscale(x))
to get it.)

A example "round(x, minscale(x))" looks nice, but there can be a performance issues - you have to unpack varlena 2x

the overhead of two numeric functions instead is about 100ms on 1M rows - that can be acceptable

I prefer it over string like design

Regards

Pavel
 

I'll try to some performance tests

Regards

Pavel


                        regards, tom lane


Re: Add numeric_trim(numeric)

From
Dean Rasheed
Date:
On 7 January 2016 at 00:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A different approach is that I'm not real sure why we want a function
> that returns a modified numeric value at all.  To the extent I understood
> Marko's original use case, it seems like what you'd invariably do with the
> result is extract its scale().  Why not skip the middleman and define a
> function named something like minscale() or leastscale(), which returns an
> int that is the smallest scale that would not drop data?  (If you actually
> did want the modified numeric value, you could use round(x, minscale(x))
> to get it.)
>

minscale() sounds good to me.

Re-reading Marko's original use case, it sounds like that specific
example would boil down to a check that minscale(x) <= 2, although
that can be done today using trunc(x,2) = x. Still, it seems that
minscale() would be a useful general purpose function to have.

Regards,
Dean



Re: Add numeric_trim(numeric)

From
Dean Rasheed
Date:
On 6 January 2016 at 15:21, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> Actually I found the implementation a little confusing, partly because
> the first comment doesn't really match the line of code that follows
> it, and partly because of the complexity of the loop, the macros and
> working out all the exit conditions from the loop. A much simpler
> implementation would be to first call strip_var() ...

Additionally, the central part of the algorithm where it computes dig
% 10, dig % 100 and dig % 1000 inside a bunch of conditional macros is
overly complex and hard to read. Once you've got it down to the case
where you know dig is non-zero, you can just do

while ((dig % 10) == 0)
{   dscale--;   dig /= 10;
}

which works for any NBASE, rather than having to introduce an
NBASE-dependent block of code with all those preprocessor
conditionals.

(Actually, given the other thread of this discussion, I would name
that local variable min_scale now.)

Regards,
Dean



Re: Add numeric_trim(numeric)

From
Marko Tiikkaja
Date:
On 2016-01-07 1:11 AM, Tom Lane wrote:
> A different approach is that I'm not real sure why we want a function
> that returns a modified numeric value at all.  To the extent I understood
> Marko's original use case, it seems like what you'd invariably do with the
> result is extract its scale().

Well, no, both are useful.  I think as far as the interface goes, having 
both a scale() and a way to "rtrim" a numeric is optimal.  rtrim() can 
also be used before storing and/or displaying values to get rid of 
unnecessary zeroes.

As for what the actual function should be called, I don't much care.  I 
wanted to avoid making trim() work because I thought it would interfere 
with the  trim('foo')  case where the input argument's type is unknown, 
but after some testing it appears that that would not be interfered with.


.m



Re: Add numeric_trim(numeric)

From
Alvaro Herrera
Date:
Marko Tiikkaja wrote:
> Hi,
> 
> Here's a patch for the second function suggested in 5643125E.1030605@joh.to.

I think this patch got useful feedback but we never saw a followup
version posted.  I closed it as returned-with-feedback.  Feel free to
submit a new version for the 2016-03 commitfest.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add numeric_trim(numeric)

From
Robert Haas
Date:
On Wed, Jan 27, 2016 at 7:09 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> Here's a patch for the second function suggested in 5643125E.1030605@joh.to.
>
> I think this patch got useful feedback but we never saw a followup
> version posted.  I closed it as returned-with-feedback.  Feel free to
> submit a new version for the 2016-03 commitfest.

This was switched back to Waiting on Author despite not having been
updated.  I have switched it back to Returned with Feedback.  It's
very difficult to focus on the patches in a CommitFest that have a
chance of actually being ready to commit when the list is cluttered
with entries that have not even been updated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company