Re: Greatest Common Divisor - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: Greatest Common Divisor
Date
Msg-id CAEZATCXO7rCCTjmk25hF9MEpvhHwOVucG11ETXf8G5ibhpzV1w@mail.gmail.com
Whole thread Raw
In response to Re: Greatest Common Divisor  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Greatest Common Divisor  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On Thu, 2 Jan 2020 at 15:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Stephen Frost <sfrost@snowman.net> writes:
> > * Dean Rasheed (dean.a.rasheed@gmail.com) wrote:
> >> I'm not objecting to adding it, I'm just curious. In fact, I think
> >> that if we do add this, then we should probably add lcm() at the same
> >> time, since handling its overflow cases is sufficiently non-trivial to
> >> justify not requiring users to have to implement it themselves.
>
> > I tend to agree with this.
>
> Does this impact the decision about whether we need a variant for
> numeric?  I was leaning against that, primarily because (a)
> it'd introduce a set of questions about what to do with non-integral
> inputs, and (b) it'd make the patch quite a lot larger, I imagine.
> But a variant of lcm() that returns numeric would have much more
> resistance to overflow.
>

Well Vik has now provided a numeric implementation and it doesn't
appear to be too much code.

BTW, there is actually no need to restrict the inputs to integral
values because GCD is something that has a perfectly natural extension
to floating point inputs (see for example [1]). Moreover, since we
already have a mod(numeric, numeric) that works for arbitrary inputs,
Euclid's algorithm just works. For example:

SELECT gcd(285, 7845);
 gcd
-----
  15

SELECT gcd(28.5, 7.845);
  gcd
-------
 0.015

Essentially, this is because gcd(a*10^n, b*10^n) = gcd(a, b) * 10^n,
so you can think of it as pre-multiplying by a power of 10 large
enough to make both inputs integers, and then dividing the result by
that power of 10.

If it were more work to support non-integer inputs, I'd say that it's
not worth the effort, but since it's actually less work to just allow
it, then why not?


> Maybe we could just define "lcm(bigint, bigint) returns numeric"
> and figure that that covers all cases, but it feels slightly
> weird.  You couldn't do lcm(lcm(a,b),c) without casting.
> I guess that particular use-case could be addressed with
> "lcm(variadic bigint[]) returns numeric", but that's getting
> really odd.
>

Having thought about that, I don't like defining these functions to
return a different type than their inputs. I think most people tend to
be working with a given type, and are used to having to move to a
wider type if necessary. We don't, for example, define "mul(bigint,
bigint) returns numeric".

Also I don't think it really buys you all that much -- the problem
with lcm(lcm(a,b),c) where bigint inputs produce a numeric output
isn't just that you need to add casting; the lcm(a,b) result may not
fit in a bigint, so the cast might fail. So really, this is just
postponing the problem a bit, without really fixing it. As for
"lcm(variadic bigint[]) returns numeric", to implement that you'd need
to use numeric computations internally, so I suspect it's
implementation would be at least as complex as lcm(numeric, numeric).

FWIW, looking for precedents elsewhere, I note that the C++ standard
library defines these functions to return the same type as the inputs.
To me, that seems more natural.

Regards,
Dean

[1] https://www.geeksforgeeks.org/program-find-gcd-floating-point-numbers/



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: Greatest Common Divisor
Next
From: Dean Rasheed
Date:
Subject: Re: Greatest Common Divisor