Zac and Duc's conversation on Flood |
The Problem Statement can be found here.
Duc:
This can be seen as a weighted version of the Steiner Tree (http://en.wikipedia.org/wiki/Steiner_tree_problem) problem on a grid graph. The problem can be shown to be NP-hard and various methods can be used to get a good approximate solution.
Zac:
First, I want to comment on the writeup for FLOOD Duc provided. Technically, it isn't a Steiner Tree problem in the classic sense. It is an instance of *Node Weighted* Steiner Tree, which is a very different problem from the view of approximation algorithms. That is, for the classic Edge Weighted Steiner Tree (which is what the wikipedia article is about), there is a simple 2-approximation (and even better approximations exist), but one can prove that no approximation for Node Weighted Steiner Tree can, in polynomial time, find a solution within o(log n) of the optimum unless P = NP where n is the number of nodes. We have a much stronger characterization of it's difficulty than simply saying it is NP-hard. There are also polynomial-time approximation algorithms that find solutions within O(log n) of the optimum, so the poly-time approximability of the problem is essentially known. Do you want references? The algorithm I implemented is actually not a great approximation algorithm and I had Duc include a test case that makes my approach perform fairly poorly, but my algorithm does seem to work well on the randomly generated cases.
Duc:
For FLOOD, you said that "no approximation for Node Weighted Steiner Tree can, in polynomial time, find a solution within o(log n) ", but then you also said that " There are also polynomial-time approximation algorithms that find solutions within O(log n) of the optimum. Am I missing something here? Graphs of cubic, quartic and quadratic polynomials may give you an idea of how polynomial functions grow with input size. Perhaps we can include your approximation algorithms in to the discussion as one possible way. We also say that it can be combined with local search techniques to improve the solution.
Zac:
Look closely at the first o(log n)... it's a "little oh". For clarity, what I meant was no approximation algorithm can guarantee an approximation ratio that is asymptotically better than log n. On the other hand, we do have approximation algorithms that guarantee the solution is within c log n of the optimum for some fixed constant c.
Actually, I would like to make a more precise (and slightly stronger) statement. There is some constant c such that no c log n (the natural logarithm) approximation exists unless P = NP. On the other hand, there is a 2 log n (again, the natural logarithm) approximation for the problem. So, up to optimizing the constant in front of the log n, we pretty much have a full characterization of the approximability of this problem. I can dig up references if you would like.
My heuristic was:
- let all the cells be nodes in a graph
- for each cell/node, add edges to the surrounding (up to) eight cells/nodes
- for each edge, set the weight of this edge to be the sum of the weights of the endpoint cells (an island has weight 0)
- from now on, ignore the node weights and consider only the edge weights
- find a minimum spanning tree of this graph
- iteratively prune all leaves until only islands are leaves
- each non-island node remaining in this tree is to be used in the final solution
Of course, there are other tricks (e.g. local search improvements) that one can employ to try to improve this solution. I'm just explaining the basic idea behind my approach.
The solution is valid since the sub-tree we find is connected and spans all islands. The weights of the edges are supposed to somehow capture the cost of using the endpoints in the final solution. Of course, this is not a great approximation as the bad example I gave you shows (it's not even an O(log n) approximation... it can be as bad as Omega(n)). However, it did work fairly well on the randomly generated instances. The 2 log n approximation I alluded to earlier is practical for a reasonable number of nodes, but the graphs in your instance had MANY nodes and only a few seconds of time were allowed, so I had to resort to this weaker heuristic.
Zac:
I've attached the paper that gives the 2 ln n approximation; you can search for the title online to get the full citation. The authors comment on the hardness in that paper, but don't give the precise statement about hardness I mentioned earlier. Still, for one who is used to "hardness of approximation" results, it's a simple observation to node that the problem can't be approximated better than c ln n for some constant c unless P = NP due to a similar result for a related problem called "set cover".
Keep in mind, all of the comments I've made are for the general problem. Maybe there are better approximations for instances that arise from these "grid graphs" that we dealt with in this problem.