Monday, January 24, 2011

Puzzle: Find Sophie

A friend told me the facebook puzzles page, so I started to work on some puzzles for fun. I started to work on "Find Sophie" which is at buffet level (hardest). This one is pretty interesting and it is not as hard as I thought.

http://www.facebook.com/careers/puzzles.php?puzzle_id=11

As usual I started with a recursive solution. The algorithm is simple:

  1. starts with a node (first node) and find all the neighbors. Also calculate the current distance traveled so far.
  2. for each neighbor, if it has not been visited yet, compute the current cost and call the recursive method with that node

// pseudo code

double minCost, curCost;

void backtrack (int node, double distSoFar)
{
    if (currentPath.size() == numberOfNodes) 
    {
        // this is a solution
        if (minCost > curCost) minCost = curCost;
        return;
    }

    foreach n in node's neighbors
    {
        if n has not been accessed
        {
            double p; // possibility
            double dist; // the distance to this neighbor
            curCost = curCost + (distSoFar + dist) * p;
 
// set the access flag of node n 
// backtrack
            backtrack(n, distSoFar + dist);
 
// restore curCost, accessflag of n
        }
    }
} 



So what is  the complexity of this puzzle? Obviously it is O(n!). For a 17 node example, it took several hours running on my server and still not finished. So how to improve it? Simple. Prune the search path.  A few simple techniques were used and all of a sudden, it took less than 1 second to run all the examples posted in this blog. http://www.facebook.com/topic.php?uid=15325934266&topic=7058


  1. At any time, a minimal cost is maintained. In the search process, if we find the cost of the search is larger than the minimal cost, the current search can be terminated earlier
  2. Which neighbor to try first? Simple math could show that if the node has higher probability and lower distance, that node should go first. So we can sort the neighbors for each node by "dist * (1-p)", where "dist" is the distance to current node and "p" is the probability of the current node. This is very important. The earlier we find a relative lower cost, the more search paths can be pruned.


Another critical point about this puzzle is not all nodes are connected, so we need to calculate a path for each pair of nodes (preferably the shortest path between each pair). This is a O(n3) operation and should be OK comparing to the O(n!) nature of the problem.


Puzzles are fun!

    0 comments:

    Post a Comment