Description
For all the algorithms you design, in addition to describe your algorithm clearly, please also (a) briefly justify the correctness of the algorithm; (b) present the time complexity of the algorithm and briefly justify the reason. Partial credits will be given if your algorithm has complexity slightly worse than the solution for all the problems.
1. (20 pt) Given an undirected connected graph where each edge is associated with a positive weight, we want to find a set of edges such that removing those edges will make the graph acyclic. Design an algorithm to find such edge set with the smallest total weight. The algorithm should run in O((m + n) log n) time.
2. (25 pt) In this problem, our goal is to design sublinear time algorithms for finding a “hill” in a given 1D or 2D array. We say an element in a 1D or 2D array is a “hill” if and only if its value is larger than all its neighbors. In 1D array the neighbors for A[i] are A[i 1] and A[i + 1] and in 2D the neighbors for A[i, j] are A[i 1, j], A[i + 1, j], A[i, j 1], A[i, j + 1]. Elements on the boundary of arrays will have less neighbors, for instance A[0] only has one neighbor A[1]; A[0, 0] only has two neighbors A[0, 1], A[1, 0]. An array could have multiple hills, and we only need to find one of them. Figure 1 illustrates two examples, one in 1D and another in 2D.
Figure 1: The right panel illustrates a 1D example and the left panel illustrates a 2D example, where the blue cells are hills. There could be multiple hills and our goal is to find one of them.
(a) (10 pt) Given a 1D integer array of size n and assume the values are distinct. Design an algorithm to find a hill in O(log n) time.
(b) (15 pt) Now we extend the algorithm to find a hill in a 2D array of size n n. Design an algorithm to return the position of one of the hills in O(n) time. Partial credits will be given to algorithms with slightly higher complexity, for instance, a solution with time complexity O(n log n) will get 10 points.
3. (25 pt) There are n cities on a highway with coordinates x1, . . . , xn and we aim to build K < n fire stations to cover these cities. Each fire station has to be built in one of the cities, and we hope to minimize the average distance from each city to the closest fire station. Please give an algorithm to compute the optimal way to place these K fire stations. The algorithm should run in O(n2 K) time. Partial credits will be given to algorithms with slightly higher complexity, for instance, a solution with time complexity O(n3 K) will get 15 points.
4. (30 pt) Decision tree is an important model for binary classification. Given an input binary string x =
• 1 x2 . . . xd , each xi denotes a binary attribute of an input instance (e.g., in practice an input instance could be a document, an image, or a job application). A decision tree tries to map this string to a prediction value based on a tree structure—starting from root node, at each node we decide going left or right by the value
of an attribute xi ; and at each leaf node will assign either +1 or 1 to the input. A decision forest consists of multiple decision trees, and the final prediction value is the sum of all these predictions. If we use ft (x ) to denote the prediction value of the t-th tree and assume there are in total T trees, the final prediction of the decision forest is
¤
P
T
T rue
if t=1 ft (x )
0
False
otherwise.
For example, Figure 2 illustrates a decision forest and the prediction values for several input strings.
Figure 2: A decision forest. For input x1 x2 x3 x4 = 0100 it wil traverse the trees based on the dashed arrow, so the first tree outputs +1, the second tree output +1, and the final output is True. For the same decision forest, the input x1 x2 x3 x4 = 0011 will produce 2, thus False.
An important property for a machine learning model is that the model can’t always produce the same output. Therefore, we want to solve the Forest-Verify problem such that given a decision forest, determine whether there exists a d-dimensional input binary string x such that the prediction of this decision forest is T rue. (The same procedure can also detect whether there exists an input to produce False).
Show the Forest-Verify problem is NP-complete.
(a) (7 pt) Show the Forest-Verify problem belongs to NP.
(b) (7 pt) Let’s first assume there’s only one Clause in 3-SAT, can you turn this into a single decision tree such that the prediction of Decision tree corresponds to the value of this Clause?
(c) (16 pt) Derive a polynomial time reduction from 3-SAT to Forest-Verify.