On Sun, Feb 12, 2017 at 10:35:04AM +0000, Dean Rasheed wrote:
> On 11 February 2017 at 01:17, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > Thanks for the feedback, I'll fix this. I've allowed myself to be a bit
> > sloppy because the number of attributes in the statistics is currently
> > limited to 8, so the overflows are currently not an issue. But it doesn't
> > hurt to make it future-proof, in case we change that mostly artificial limit
> > sometime in the future.
> >
>
> Ah right, so it can't overflow at present, but it's neater to have an
> overflow-proof algorithm.
>
> Thinking about the exactness of the division steps is quite
> interesting. Actually, the order of the multiplying factors doesn't
> matter as long as the divisors are in increasing order. So in both my
> proposal:
>
> result = 1
> for (i = 1; i <= k; i++)
> result = (result * (n-k+i)) / i;
>
> and David's proposal, which is equivalent but has the multiplying
> factors in the opposite order, equivalent to:
>
> result = 1
> for (i = 1; i <= k; i++)
> result = (result * (n-i+1)) / i;
>
> the divisions are exact at each step. The first time through the loop
> it divides by 1 which is trivially exact. The second time it divides
> by 2, having multiplied by 2 consecutive factors, one of which is
> therefore guaranteed to be divisible by 2. The third time it divides
> by 3, having multiplied by 3 consecutive factors, one of which is
> therefore guaranteed to be divisible by 3, and so on.
Right. You know you can use integer division, which make sense as
permutations of discrete sets are always integers.
> My approach originally seemed more logical to me because of the way it
> derives from the recurrence relation binomial(n, k) = binomial(n-1,
> k-1) * n / k, but they both work fine as long as they have suitable
> overflow checks.
Right. We could even cache those checks (sorry) based on data type
limits by architecture and OS if performance on those operations ever
matters that much.
> It's also interesting that descriptions of this algorithm tend to
> talk about setting k to min(k, n-k) at the start as an optimisation
> step, as I did in fact, whereas it's actually more than that -- it
> helps prevent unnecessary intermediate overflows when k > n/2. Of
> course, that's not a worry for the current use of this function, but
> it's good to have a robust algorithm.
Indeed. :)
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate