Tutorial for Just a simple sum
Warning - mathematical content ahead!
If you've participated in programming contests before, you'll know that most of the time, the simpler the problem statement, the tougher the problem turns out to be. This one looks like it follows the trend, but as long as you are grounded in the correct mathematical knowledge, each step in the solution is reasonably straightforward - the trick is not getting sidetracked into alternate methods that were never going to work.
Let's start by looking at a couple of key algorithms in problems like these.
Two initial algorithms
Calculating powers
Calculating a large power of a number in modular arithmetic is a common problem, and can be done very quickly. To calculate a^b mod m, we calculate a, a^2, a^4, a^8, a^16, ... by repeatedly squaring the previous value. We then multiply together the right terms to reach whatever power b we want.
We can combine both these steps into one function as follows:
[blockcode] long long result = 1;
long long power = a % m;
while (b > 0) {
    if (b % 2 == 1) result = (result * power) % m;  // include this power of a
    power = (power * power) % m;                    // square: a, a^2, a^4, ...
    b /= 2;
} [/blockcode]
Make sure you understand how this works - we are basically multiplying in the correct power of a every time we see a '1' bit in the binary representation of b.
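The snippet above can be wrapped into a self-contained function. A minimal sketch (the name `modpow` and the `long long` types are my choices, not from the original code):

```cpp
#include <cassert>

// Binary exponentiation: compute (a^b) mod m in O(log b) multiplications.
// 64-bit integers keep (m-1)*(m-1) from overflowing for m up to
// about 3*10^9 - far more than the m <= 200000 needed here.
long long modpow(long long a, long long b, long long m) {
    long long result = 1 % m;   // 1 % m handles the degenerate case m == 1
    long long power = a % m;    // reduce the base up front
    while (b > 0) {
        if (b % 2 == 1) result = (result * power) % m;  // '1' bit: multiply in
        power = (power * power) % m;                    // a, a^2, a^4, a^8, ...
        b /= 2;
    }
    return result;
}
```

For example, `modpow(2, 10, 1000)` returns 24, since 2^10 = 1024.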
Calculating inverses
If we are working mod m and want to divide two numbers (ie find a/b), we can do so when b and m share no common factors. In that case, we find the inverse of b mod m, and then multiply this by a.
How do we calculate an inverse? By using the Extended Euclidean Algorithm. This isn't a complicated algorithm to learn, and you should store it away for future use so you don't have to rewrite it every time. (Which, as it happens, I always end up doing anyway, because I keep losing it :)) Wikipedia covers this algorithm in detail, so I won't go over it here.
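For completeness, here is one common way to code it (a sketch; the names are mine). Given b and m with gcd(b, m) = 1, `mod_inverse` returns the x in [0, m) with b*x == 1 mod m:

```cpp
#include <cassert>

// Extended Euclidean algorithm: fills in x, y with a*x + b*y == gcd(a, b),
// and returns gcd(a, b).
long long ext_gcd(long long a, long long b, long long &x, long long &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1;
    long long g = ext_gcd(b, a % b, x1, y1);
    x = y1;                    // unwind using a % b = a - (a/b)*b
    y = x1 - (a / b) * y1;
    return g;
}

// Inverse of b mod m; only valid when gcd(b, m) == 1.
long long mod_inverse(long long b, long long m) {
    long long x, y;
    ext_gcd(((b % m) + m) % m, m, x, y);
    return ((x % m) + m) % m;  // normalise into [0, m)
}
```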
Initial thoughts
We should always start by looking at the constraints. n can be huge in this problem - up to 10^18. This means we are going to need to look for an algorithm whose speed doesn't really depend on n at all - at the worst, with time O(log n). Instead, m is going to be the key value in our algorithm.
We should also see straight away that 64-bit integers will be required - while we can reduce calculations mod 200000 at each stage, we could still end up with a calculation like 199999*199999, which will not fit inside a 32-bit int.
The last thing we should see immediately is that as we are doing calculations mod m, we can immediately reduce all of the base numbers in each power modulo m.
Combining the bases
In order to do things in a faster way than looping through each of the n terms, we're going to have to group things together in some way.
After we reduce all of the bases mod m, there are going to be lots of powers of 1, lots of powers of 2, all the way up to powers of m-1. There will be some powers of 0 as well, but that's not going to affect the sum at all. Perhaps we can work out how to add all the powers with the same base at the same time.
The sum we are looking at is of the form i^i + i^(i+m) + i^(i+2m) + ... + i^(i+(k-1)m) for some values of i and k (we can work out k in terms of i and n). Each term in the sum can be found from the previous term by multiplying by i^m at each stage.
This tells you the sum is a geometric series. If we forget about the fact we need to calculate the value mod m, there is a formula for calculating the exact sum of such a series:
a * (r^n - 1) / (r - 1)
where a = the first term, r = the fixed ratio between adjacent terms, and n is the number of terms we are adding. In this case a = i^i, r = i^m, and n = k.
Now, if r-1 and m share no common factors, then we can calculate this division mod m by multiplying by the inverse of r-1 mod m instead of the division. But what if r-1 and m do share common factors? There doesn't seem to be any easy way forward.
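For the coprime case, the whole formula can be sketched as follows (reusing a binary-exponentiation routine and an extended-Euclid inverse; all names here are my own):

```cpp
#include <cassert>

long long modpow(long long a, long long b, long long m) {
    long long result = 1 % m, power = a % m;
    while (b > 0) {
        if (b % 2 == 1) result = (result * power) % m;
        power = (power * power) % m;
        b /= 2;
    }
    return result;
}

long long ext_gcd(long long a, long long b, long long &x, long long &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1, g = ext_gcd(b, a % b, x1, y1);
    x = y1; y = x1 - (a / b) * y1;
    return g;
}

// a + a*r + ... + a*r^(n-1) mod m, valid ONLY when gcd(r-1, m) == 1.
long long geo_sum(long long a, long long r, long long n, long long m) {
    long long x, y;
    ext_gcd((((r - 1) % m) + m) % m, m, x, y);
    long long inv = ((x % m) + m) % m;              // inverse of r-1 mod m
    long long num = (modpow(r, n, m) - 1 + m) % m;  // r^n - 1, reduced mod m
    return ((a % m) * num % m) * inv % m;
}
```

For instance, `geo_sum(1, 2, 10, 1000)` gives (2^10 - 1) mod 1000 = 23.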
Another approach
Let's go back a step and have another look at the geometric series in its expanded form. Often when we are given a sequence and told to find values mod some number, the sequence is going to start repeating itself at some stage. Perhaps the terms in this sequence repeat as well.
Since the only thing to do to get from one number to the next is multiply by a fixed value, as soon as one term repeats, the whole sequence will repeat. A possible approach would be to keep calculating terms until you have a repeat. After that, all you need to do is count how many times the cycle will occur, and you have a much faster way of calculating the entire sum.
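A sketch of that cycle idea (correct, but as the next paragraphs explain, too slow in the worst case; all names here are mine):

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Sum of a, a*r, a*r^2, ..., a*r^(n-1) mod m by detecting when the term
// sequence starts to cycle. The cycle can be nearly as long as m, which is
// exactly why this approach times out on prime m.
long long geo_sum_by_cycle(long long a, long long r, long long n, long long m) {
    a %= m; r %= m;
    std::unordered_map<long long, long long> seen;   // term value -> first index
    std::vector<long long> prefix = {0};             // prefix[j] = first j terms, mod m
    long long v = a, j = 0;
    while ((long long)prefix.size() <= n) {          // stop once n terms are summed
        if (seen.count(v)) break;                    // ...or a term repeats
        seen[v] = j;
        prefix.push_back((prefix.back() + v) % m);
        v = (v * r) % m;
        ++j;
    }
    if (n < (long long)prefix.size()) return prefix[n];
    long long i = seen[v], L = j - i;                // cycle covers indices [i, j)
    long long cycleSum = ((prefix[j] - prefix[i]) % m + m) % m;
    long long full = ((n - i) / L) % m, rem = (n - i) % L;
    long long tail = ((prefix[i + rem] - prefix[i]) % m + m) % m;
    return ((prefix[i] + full * cycleSum) % m + tail) % m;
}
```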
If you code up that sort of approach, it will even appear to work as it calculates the solution for the largest possible input reasonably quickly. But the judge will give TLE - why?
Choose your boundary cases carefully
While testing your code with the smallest/largest inputs is a good idea, you have to be careful when deciding what 'largest' means. In this case, m=200000 has plenty of small factors, which will make the terms repeat very quickly. Testing with 199999, which is a prime, will take a very, very long time. (You could have predicted this in advance with some help from Euler's Theorem.)
Back to the geometric series
Since primes caused a problem with our last approach, let's see what happens with the geometric series approach when m is a prime. The only way r-1 can share a factor with m is if r-1 is divisible by m; so r == 1 mod m. But in this case, we can calculate the sum straight away - all of the terms are exactly the same, with sum a*n (or i^i * k with the original variables).
So now m being a prime is the *best* possible case. Perhaps this method will lead to a solution after all.
What about other values of m - for example, m=p^2 where p is a prime? Again, if r==1 mod m, we can calculate the sum directly. But now we have the added case where r-1 is a multiple of p, but not of p^2. Is there anything we can do here?
(Note: the following is not 100% rigorous, but hopefully you get the idea :))
We can see the problem with dividing by p if we turn things into pure algebra, which can often help when working with mods. We know r-1 is a multiple of p, so r^n - 1 must be as well (since the fraction is an integer).
Suppose r^n - 1 == B mod p^2 - in other words, r^n - 1 = A*p^2 + B for some A. When we divide this by p, we get A*p + (B/p) - so basically, we are getting a result which is congruent to B/p, mod p. But we want a result mod p^2, not mod p.
So why not calculate r^n - 1 mod p^3? Suppose it turns out to be B mod p^3, ie r^n - 1 = A*p^3 + B for some A. When we divide this by p, we get A*p^2 + (B/p), so the result is congruent to B/p mod p^2. Of course, there may be other factors of r-1 that we need to divide by - but we can handle these by multiplying by inverses as in the simpler case.
The same idea is going to work for all other prime powers - if we want to calculate a division mod p^5, and the bottom is divisible by p^2, then we should calculate the top mod p^7.
So if m is a prime power (p^t), we do the following:
- Work out the largest power of p dividing r-1 - suppose this is p^u.
- If r == 1 mod m (ie p^u is at least m itself), then every term is congruent to a, so the sum is simply a*n.
- Otherwise calculate r^n - 1 mod p^(t+u). The result will always be a multiple of p^u.
- Divide both the numerator (r^n - 1 calculated above) and the denominator (r-1) by p^u.
- Now we can calculate the division by calculating the inverse as normal.
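Putting those steps together, a sketch for a single prime power (the helper routines are repeated so the snippet stands alone; it assumes p^(t+u) is small enough that its square fits in 64 bits):

```cpp
#include <cassert>

long long modpow(long long a, long long b, long long m) {
    long long result = 1 % m, power = a % m;
    while (b > 0) {
        if (b % 2 == 1) result = (result * power) % m;
        power = (power * power) % m;
        b /= 2;
    }
    return result;
}

long long ext_gcd(long long a, long long b, long long &x, long long &y) {
    if (b == 0) { x = 1; y = 0; return a; }
    long long x1, y1, g = ext_gcd(b, a % b, x1, y1);
    x = y1; y = x1 - (a / b) * y1;
    return g;
}

long long mod_inverse(long long b, long long m) {
    long long x, y;
    ext_gcd(((b % m) + m) % m, m, x, y);
    return ((x % m) + m) % m;
}

// a + a*r + ... + a*r^(n-1) modulo m = p^t, following the steps above.
long long geo_sum_prime_power(long long a, long long r, long long n,
                              long long p, int t) {
    long long m = 1;
    for (int i = 0; i < t; ++i) m *= p;
    if (r % m == 1) return ((a % m) * (n % m)) % m;    // r == 1 mod m: n equal terms
    long long pu = 1, d = r - 1;                       // d will become (r-1)/p^u
    while (d % p == 0) { d /= p; pu *= p; }
    long long M = m * pu;                              // p^(t+u)
    long long num = (modpow(r % M, n, M) - 1 + M) % M; // r^n - 1 mod p^(t+u)
    num /= pu;                                         // always divisible by p^u
    return ((a % m) * (num % m) % m) * mod_inverse(d % m, m) % m;
}
```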
But what about all the other possible values of m that aren't prime powers?
They don't matter. If we can solve the problem where m is a prime power, we can solve it for all m by combining these answers, thanks to the Chinese Remainder Theorem, which tells us there is a unique answer mod m that satisfies each smaller result.
While there are ways of combining these results in general, all we really need to do is loop through each of the m possible answers and check if it satisfies our results.
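With m at most 200000, that brute-force check is only a handful of lines (a sketch; `parts` holds (prime power, residue) pairs, and the names are mine):

```cpp
#include <cassert>
#include <vector>
#include <utility>

// Find the unique x in [0, m) with x == r_i (mod m_i) for every
// (m_i, r_i) pair, by checking each candidate in turn.
// O(m * #pairs), which is comfortable for m <= 200000.
long long crt_by_scan(const std::vector<std::pair<long long, long long>>& parts,
                      long long m) {
    for (long long x = 0; x < m; ++x) {
        bool ok = true;
        for (const auto& pr : parts)
            if (x % pr.first != pr.second) { ok = false; break; }
        if (ok) return x;
    }
    return -1;  // unreachable if the residues really came from one number
}
```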
Our full algorithm becomes:
- Factorise m into a product of prime factors.
- For each prime power, calculate the sum mod that prime power (ie by setting m = that prime power in our original equation) by looping through each possible base and adding the results together.
- Check each number between 0 and m-1 to see if it satisfies the results for each prime power mod.
This approach will fit inside the generous time limit comfortably. Have a look at some of the accepted solutions to this problem on the contest page and see if you can see these ideas in action.
An alternative approach
A clever alternative approach was proposed by narri on the Topcoder Forums. In calculating the sum of the geometric series mentioned earlier, you can pair up the terms as follows:
1 + r + r^2 + r^3 + ... + r^(k-1)
= (1+r) + r^2(1+r) + r^4(1+r) + ... + r^(2(floor(k/2)-1))(1+r) + (r^(k-1) if k is odd)
= (1+r)(1 + r^2 + r^4 + ... + r^(2(floor(k/2)-1))) + (r^(k-1) if k is odd)
This has reduced the problem to calculating the sum of a geometric series with half as many terms. So we can calculate this recursively in O(log k) time, without any divisions required at all.
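A recursive sketch of that pairing trick (function names are mine; modpow is the standard binary-exponentiation routine from earlier):

```cpp
#include <cassert>

long long modpow(long long a, long long b, long long m) {
    long long result = 1 % m, power = a % m;
    while (b > 0) {
        if (b % 2 == 1) result = (result * power) % m;
        power = (power * power) % m;
        b /= 2;
    }
    return result;
}

// 1 + r + r^2 + ... + r^(k-1) mod m, with no divisions at all:
// pair the terms, factor out (1+r), and recurse on a series of
// ratio r^2 with half as many terms.
long long geo_sum_halving(long long r, long long k, long long m) {
    if (k == 0) return 0;
    if (k == 1) return 1 % m;
    r %= m;
    long long half = geo_sum_halving(r * r % m, k / 2, m);  // floor(k/2) pairs
    long long sum = (1 + r) % m * half % m;
    if (k % 2 == 1)                        // odd k: one unpaired term r^(k-1)
        sum = (sum + modpow(r, k - 1, m)) % m;
    return sum;
}
```

For example, `geo_sum_halving(2, 10, 1000)` gives (2^10 - 1) mod 1000 = 23.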
In case someone would like to revise the formulae for series and progressions:
A tutorial with problems and solutions on Arithmetic Progression, Geometric Progression, and Harmonic Progression (AP, GP, HP).