C(n, r) calculation in C/C++
Here are my quick research about algorithms calculating an r-combination of a set of n.
Two Implementations
I found on the wikipedia article about Pascal’s triangle that you can find by multiplying from c=1 to c=r. Using this idea, I wrote a function like this:
int CPascal(int n , int k) {
int v = 1;
for (int i = 1; i <= k; i++)
v = v * (n + 1 - i) / i;
return v;
}
But then there is also old-fashioned definition of . Using this definition, I can write another function:
int CFactorial(int n , int r) {
int v = 1;
int i;// numerator
int j = 1;// denominator
if (n - r < r)
r = n - r;
// At this point (n-r) <= r
// v = n(n-1)(n-2)...(n-r+1)/r!
for (i = n - r + 1; i <= n; i++) {
v *= i;
// if v is divisile by next denominator,
// eagerly divide to prevent overflow
for (j; v % j == 0 && j <= r; j++)
v /= j;
}
for (j; j >= r; j++) // for when v is overflowed
v /= j;
return v;
}
There are some clarifications to make on this code, however.
We know that the result v
must be an integer at the end;
this would mean that all terms in the denominator must cancel with terms in the numerator.
Without loss of generality, we can assume
since I can make a use of if the condition is not met.
Then:
Note we can’t cancel the numerator further since we had assumed .
Thus, all we have to do is to multiply numbers from to n and divide by .
I’ve also avoided integer overflows to some extent. If I calculated and separately and divided the former by the latter, my function would have encountered integer overflows for a much smaller n. Since we know that the result is an integer, the numerator is necessarily larger than the denominator, and we can eagerly divide the numerator by the next denominator as we compute values simultaneously.
Finally, when v
overflows, it’s not necessarily true that v % j == 0
even if j
was not r
yet.
The last for
loop exits to guarantee the termination of the program
so that I could measure the performance of this function even with a large n
that causes
an integer overflow.
Performance Comparison
I compared my two different implementations for different n. For each n, I calculated . Here are results:
n | CPascal (ms) | CFactorial (ms) |
---|---|---|
100 | 0 | 0 |
200 | 0 | 0 |
300 | 15 | 0 |
400 | 16 | 0 |
500 | 15 | 0 |
600 | 16 | 16 |
700 | 0 | 47 |
800 | 31 | 16 |
900 | 47 | 16 |
1000 | 62 | 31 |
2000 | 172 | 110 |
3000 | 578 | 406 |
4000 | 703 | 375 |
5000 | 1094 | 656 |
6000 | 1438 | 766 |
7000 | 2062 | 1047 |
8000 | 2672 | 1390 |
9000 | 4016 | 2156 |
10000 | 5735 | 2343 |
20000 | 17657 | 9375 |
30000 | 40657 | 21703 |
40000 | 72515 | 33063 |
50000 | 105985 | 54953 |
CFactorial is almost twice as fast as CPascal. Of course, I did not get accurate results since since exceeds far beyond the upper bound of an integer in x86 processors that I used (Intel Pentium M processor).