Description
-
(25 pt) You’re helping to run a high-performance computing system capable of processing several terabytes of data per day. For each of n days, you’re presented with a quantity of data; on day i, you’re presented with xi terabytes. For each terabyte you process, you receive a fixed revenue, but any unprocessed data becomes unavailable at the end of the day (i.e., you can’t work on it in any future day).
You can’t always process everything each day because you’re constrained by the capabilities of your computing system, which can only process a fixed number of terabytes in a given day. In fact, it’s running some one-of-a-kind software that, while very sophisticated, is not totally reliable, and so the amount of data you can process goes down with each day that passes since the most recent reboot of the system. On the first day after a reboot, you can process s1 terabytes, on the second day after a reboot, you can process s2 terabytes, and so on, up to sn; we assume s1 > s2 > s3 > … > sn > 0. (Of course, on day i you can only process up to xi terabytes, regardless of how fast your system is.) To get the system back to peak performance, you can choose to reboot it; but on any day you choose to reboot the system, you can’t process any data at all.
The problem. Given the amounts of available data x1, x2, …, xn for the next n days, and given the profile of your system as expressed by s1, s2, …, sn (and starting from a freshly rebooted system on day 1), choose the days on which you’re going to reboot so as to maximize the total amount of data you process.
Example. Suppose n = 4, and the values of xi and si are given by the following table.
-
     day 1   day 2   day 3   day 4
x    10      1       7       7
s    8       4       2       1
The best solution would be to reboot on day 2 only; this way, you process 8 terabytes on day 1, then 0 on day 2, then 7 on day 3, then 4 on day 4, for a total of 19. (Note that if you didn’t reboot at all, you’d process 8 + 1 + 2 + 1 = 12; and other rebooting strategies give you less than 19 as well.)
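The arithmetic in this example can be checked mechanically. The sketch below is only an illustration and not part of the assignment: the helper names `total_processed` and `best_by_brute_force` are hypothetical, days are numbered from 1, and the brute force tries every subset of reboot days, so it is practical only for very small n.

```python
from itertools import combinations


def total_processed(x, s, reboot_days):
    """Total terabytes processed for a given set of 1-indexed reboot days."""
    total = 0
    days_since_reboot = 0  # the system starts freshly rebooted on day 1
    for day, available in enumerate(x, start=1):
        if day in reboot_days:
            days_since_reboot = 0  # nothing is processed on a reboot day
            continue
        days_since_reboot += 1
        total += min(available, s[days_since_reboot - 1])
    return total


def best_by_brute_force(x, s):
    """Try every subset of reboot days (exponential; tiny inputs only)."""
    days = range(1, len(x) + 1)
    best = (-1, frozenset())
    for r in range(len(x) + 1):
        for subset in combinations(days, r):
            value = total_processed(x, s, set(subset))
            if value > best[0]:
                best = (value, frozenset(subset))
    return best


x = [10, 1, 7, 7]
s = [8, 4, 2, 1]
print(total_processed(x, s, set()))  # no reboots: 12
print(total_processed(x, s, {2}))    # reboot on day 2 only: 19
print(best_by_brute_force(x, s))     # (19, frozenset({2}))
```

The brute force is useful for sanity-checking small instances, including a candidate instance in which the optimal solution reboots at least twice.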
- Give an example of an instance with the following properties.
  - There is a “surplus” of data in the sense that xi > s1 for every i.
  - The optimal solution reboots the system at least twice.
  In addition to the example, you should say what the optimal solution is. You do not need to provide a proof that it is optimal.
- Give an efficient algorithm that takes values for x1, x2, …, xn and s1, s2, …, sn and returns the total number of terabytes processed by an optimal solution.
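One possible approach to the efficient-algorithm part is dynamic programming over pairs (day, days since the last reboot). The sketch below is an assumption about how such a solution might look, not the intended answer: the function name `max_processed` is hypothetical, and the recurrence it uses is opt(i, j) = max(min(xi, sj) + opt(i+1, j+1), opt(i+1, 1)), where the second term corresponds to rebooting on day i. A proof of correctness is still required and is not given here.

```python
def max_processed(x, s):
    """Maximum total terabytes processed; O(n^2) time, O(n) extra space."""
    n = len(x)
    # nxt[j] = best total from the following day onward, assuming that day
    # is the j-th day since the last reboot (1-indexed; index 0 is unused).
    nxt = [0] * (n + 2)
    for i in range(n, 0, -1):        # day i, 1-indexed
        cur = [0] * (n + 2)
        for j in range(1, i + 1):    # day i is at most the i-th day since a reboot
            work = min(x[i - 1], s[j - 1]) + nxt[j + 1]
            reboot = nxt[1]          # process nothing today, restart at s1 tomorrow
            cur[j] = max(work, reboot)
        nxt = cur
    return nxt[1]                    # day 1 is the first day after a reboot


print(max_processed([10, 1, 7, 7], [8, 4, 2, 1]))  # 19
```

On the four-day example from the problem statement this returns 19, matching the stated optimum.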
-
(25 pt) A palindrome is a string that reads the same from left to right and from right to left. Design an algorithm to find the minimum number of characters that must be inserted to turn a given string into a palindrome, where characters may be inserted at any position of the string. For example, for the input “aab” the output should be 1 (we add a ‘b’ at the beginning so that it becomes “baab”).
The algorithm should run in O(n^2) time if the input string has length n.
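As a hedged illustration of what an O(n^2) solution might look like, the sketch below fills a table over substrings in order of increasing length. The function name `min_insertions_to_palindrome` and the table layout are assumptions made for this example; the justification of the recurrence is left to the write-up.

```python
def min_insertions_to_palindrome(text):
    """Minimum number of insertions that turn `text` into a palindrome."""
    n = len(text)
    if n == 0:
        return 0
    # dp[i][j] = minimum insertions needed to make text[i..j] a palindrome.
    # Entries with i >= j stay 0 (empty or single-character substrings).
    dp = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            if text[i] == text[j]:
                # The two ends already match; only the inside matters.
                dp[i][j] = dp[i + 1][j - 1]
            else:
                # Insert a copy of one end next to the other end.
                dp[i][j] = 1 + min(dp[i + 1][j], dp[i][j - 1])
    return dp[0][n - 1]


print(min_insertions_to_palindrome("aab"))  # 1
```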
-
(25 pt) Given an undirected graph G = (V, E), an independent set is a subset I ⊆ V such that no two nodes in I are adjacent in G; that is, for any two nodes u, v ∈ I, (u, v) ∉ E. Finding a maximum cardinality independent set in a graph is a hard problem, but the problem becomes easy when the graph is a tree. Design an algorithm which, given a tree T = (V, E), runs in O(|V|) time and returns a maximum cardinality independent set in T.
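One linear-time candidate is a leaf-first greedy rule: process nodes from the leaves upward and add a node to the set exactly when none of its children was added. The sketch below is an illustration under assumptions, not a reference solution: the function name `max_independent_set` and the adjacency-dictionary input format are hypothetical, the input is assumed to be a connected tree, and the argument for why the greedy choice is optimal is not included.

```python
def max_independent_set(adj):
    """adj: dict mapping each node to a list of its neighbours in a tree."""
    if not adj:
        return set()
    root = next(iter(adj))
    parent = {root: None}
    order = []                 # nodes in DFS preorder (parents before children)
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)
    chosen = set()
    blocked = set()            # nodes that have at least one chosen child
    for u in reversed(order):  # children are visited before their parents
        if u not in blocked:
            chosen.add(u)
            if parent[u] is not None:
                blocked.add(parent[u])
    return chosen


path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(len(max_independent_set(path)))  # 2
```

Each node and each edge is touched a constant number of times, so the running time is O(|V|) for a tree.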
-
(10 pt) Consider the set A = {a1, …, an} and a collection B1, B2, …, Bm of subsets of A (i.e., Bi ⊆ A for each i). We say that a set H ⊆ A is a hitting set for the collection B1, B2, …, Bm if H contains at least one element from each Bi; that is, if H ∩ Bi is not empty for each i (so H “hits” all the sets Bi).
We now define the Hitting Set Problem as follows. We are given a set A = {a1, …, an}, a collection B1, B2, …, Bm of subsets of A, and a number k. We are asked: Is there a hitting set H ⊆ A for B1, …, Bm so that the size of H is at most k?
Prove that the vertex cover problem ≤_p the hitting set problem.
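To make the shape of such a reduction concrete, the sketch below builds a Hitting Set instance from a Vertex Cover instance by taking A = V and one two-element set per edge, keeping k unchanged. This is an illustration of one candidate mapping, not a proof: the function names are hypothetical, and the brute-force checker is exponential and meant only for tiny sanity checks.

```python
from itertools import combinations


def vertex_cover_to_hitting_set(vertices, edges, k):
    """Map a Vertex Cover instance (G, k) to a Hitting Set instance (A, B, k)."""
    A = set(vertices)
    B = [frozenset(edge) for edge in edges]  # one two-element set per edge
    return A, B, k


def has_hitting_set(A, B, k):
    """Brute-force decision procedure; exponential, for tiny checks only."""
    for size in range(min(k, len(A)) + 1):
        for candidate in combinations(A, size):
            chosen = set(candidate)
            if all(chosen & b for b in B):
                return True
    return False


# A triangle needs two vertices to cover all three edges.
triangle_vertices = [1, 2, 3]
triangle_edges = [(1, 2), (2, 3), (1, 3)]
A, B, _ = vertex_cover_to_hitting_set(triangle_vertices, triangle_edges, 1)
print(has_hitting_set(A, B, 1))  # False
print(has_hitting_set(A, B, 2))  # True
```

The mapping itself runs in time linear in |V| + |E|; a full answer still has to argue that G has a vertex cover of size at most k if and only if the constructed instance has a hitting set of size at most k.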
-
(15 pt) An undirected graph G = (V, E) is called “k-colorable” if there exists a way to color the nodes with k colors such that no two adjacent nodes are assigned the same color. That is, G is k-colorable iff there exists a k-coloring χ : V → {1, …, k} such that for all (u, v) ∈ E, χ(u) ≠ χ(v) (the function χ is called a proper k-coloring). The “k-colorable problem” is the problem of determining whether an input graph G = (V, E) is k-colorable. Prove that the 3-colorable problem ≤_p the 4-colorable problem.
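One candidate construction for this reduction adds a single new vertex adjacent to every existing vertex. The sketch below only illustrates that construction on tiny graphs and is not a proof: the function names and the default label "w" for the new vertex are hypothetical, and the colorability check is a brute force over all k^|V| colorings.

```python
from itertools import product


def add_universal_vertex(vertices, edges, new_vertex="w"):
    """Return G' = G plus one new vertex joined to every vertex of G."""
    vertices = list(vertices)
    if new_vertex in vertices:
        raise ValueError("new_vertex must not already be in the graph")
    new_vertices = vertices + [new_vertex]
    new_edges = list(edges) + [(v, new_vertex) for v in vertices]
    return new_vertices, new_edges


def is_k_colorable(vertices, edges, k):
    """Brute force over all colorings; tiny graphs only."""
    vertices = list(vertices)
    for colors in product(range(k), repeat=len(vertices)):
        coloring = dict(zip(vertices, colors))
        if all(coloring[u] != coloring[v] for u, v in edges):
            return True
    return False


# K4 is not 3-colorable, and K4 plus a universal vertex (K5) is not 4-colorable.
k4_vertices = [1, 2, 3, 4]
k4_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
k5_vertices, k5_edges = add_universal_vertex(k4_vertices, k4_edges)
print(is_k_colorable(k4_vertices, k4_edges, 3))  # False
print(is_k_colorable(k5_vertices, k5_edges, 4))  # False
```

A complete answer must still show both directions: G is 3-colorable if and only if the constructed graph is 4-colorable.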
-
Homework assignments are due at the exact time indicated. Please submit your homework using the Gradescope system. Email attachments or other electronic delivery methods are not acceptable. To learn how to use Gradescope, you can:
– 1. Watch the one-minute video with complete instructions from here:
https://www.youtube.com/watch?v=-wemznvGPfg
– 2. Follow the instructions to generate a PDF scan of the assignments:
http://gradescope-static-assets.s3-us-west-2.amazonaws.com/help/submitting_hw_guide.pdf
– 3. Make sure you start each problem on a new page.
-
We recommend using LaTeX, LyX, or other word-processing software to prepare your homework. This is not a requirement, but it helps us grade the homework and give feedback. For grading, we will take into account both correctness and clarity. Your answers should be written in a simple and understandable manner. Sloppy answers can be expected to receive fewer points.
-
Unless otherwise specified, you should justify your algorithm with a proof of correctness and an analysis of its time complexity.