Project 1: Random Graphs and Random Walks


One can use the igraph library to generate different networks and measure various properties of a given network. The library has R and Python implementations, and you may choose either language. For this project, however, R is strongly recommended, as some functions might not be implemented in the Python version of the package.

Submission: Upload a zip file containing your report and code to CCLE. One submission per group, from any member, is sufficient.

Generating Random Networks

    1. Create random networks using the Erdos-Renyi (ER) model

      (a) Create undirected random networks with n = 1000 nodes and the probability p of drawing an edge between two arbitrary vertices equal to 0.003, 0.004, 0.01, 0.05, and 0.1. Plot the degree distributions. What distribution is observed? Explain why. Also, report the mean and variance of the degree distributions and compare them to the theoretical values.

Hint: Useful function(s): sample_gnp, degree, degree_distribution, plot
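As a rough check for part (a), the sketch below builds G(n, p) in plain Python instead of calling sample_gnp (the helper names gnp_adjacency and degree_stats are hypothetical). Each vertex has n - 1 potential neighbors, each present independently with probability p, so its degree is Binomial(n - 1, p), with mean (n - 1)p and variance (n - 1)p(1 - p); the function returns the empirical values next to those.

```python
import random
import statistics


def gnp_adjacency(n, p, rng):
    """Adjacency sets of an undirected G(n, p) graph: each of the n(n-1)/2
    vertex pairs is joined independently with probability p."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def degree_stats(n, p, seed=0):
    """Empirical (mean, variance) of the degrees, paired with the
    Binomial(n - 1, p) values (n - 1)p and (n - 1)p(1 - p)."""
    rng = random.Random(seed)
    degrees = [len(nbrs) for nbrs in gnp_adjacency(n, p, rng)]
    empirical = (statistics.mean(degrees), statistics.pvariance(degrees))
    theoretical = ((n - 1) * p, (n - 1) * p * (1 - p))
    return empirical, theoretical
```

For example, calling degree_stats(1000, p) for each p in the list gives the numbers to tabulate; the pure-Python pair loop is O(n²), which is acceptable at n = 1000.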

      (b) For each p and n = 1000, answer the following questions:

Are all random realizations of the ER network connected? Numerically estimate the probability that a generated network is connected. For one instance of the networks with that p, find the giant connected component (GCC) if the network is not connected. What is the diameter of the GCC?

Hint: Useful function(s): is_connected, clusters, diameter
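A plain-Python sketch of part (b), with hypothetical helper names standing in for is_connected, clusters, and diameter: components come from breadth-first search, the probability of connectivity is estimated as the fraction of connected samples, and the GCC diameter is the largest BFS eccentricity among GCC vertices.

```python
import random
from collections import deque


def gnp_adjacency(n, p, rng):
    """Undirected G(n, p) as adjacency sets."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every vertex it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def components(adj):
    """Connected components, each found by one breadth-first search."""
    seen, comps = set(), []
    for s in range(len(adj)):
        if s not in seen:
            comp = list(bfs_distances(adj, s))
            seen.update(comp)
            comps.append(comp)
    return comps


def gcc_and_diameter(adj):
    """Largest component and its diameter (maximum BFS eccentricity). This runs
    one BFS per GCC vertex, so it gets slow for large graphs."""
    gcc = max(components(adj), key=len)
    diameter = max(max(bfs_distances(adj, s).values()) for s in gcc)
    return gcc, diameter


def prob_connected(n, p, trials=100, seed=0):
    """Fraction of `trials` independent G(n, p) samples that are connected."""
    rng = random.Random(seed)
    connected = sum(len(components(gnp_adjacency(n, p, rng))) == 1
                    for _ in range(trials))
    return connected / trials
```

The same components helper, with the GCC length divided by n, also yields the normalized GCC sizes needed for the sweep in part (c).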

      (c) It turns out that the normalized GCC size (i.e., the size of the GCC as a fraction of the total network size) is a highly nonlinear function of p, with interesting properties occurring for values where $p = O(1/n)$ and $p = O(\ln n / n)$.

For n = 1000, sweep over values of p from 0 to a $p_{\max}$ that makes the network almost surely connected, and create 100 random networks for each p. You should determine $p_{\max}$ roughly yourself. Then scatter plot the normalized GCC sizes vs. p, and plot a line of the average normalized GCC size for each p along with the scatter plot.

        i. Empirically estimate the value of p where a giant connected component starts to emerge (define your criterion of "emergence"). Does it match the theoretical values mentioned or derived in lectures?

        ii. Empirically estimate the value of p where the giant connected component takes up over 99% of the nodes in almost every experiment.

      (d) i. Define the average degree of nodes as $c = np = 0.5$. Sweep over the number of nodes, n, ranging from 100 to 10000. Plot the expected size of the GCC of ER networks with n nodes and edge-formation probability $p = c/n$, as a function of n. What trend is observed?

        ii. Repeat the same for c = 1.

        iii. Repeat the same for c = 1.1, 1.2, and 1.3, and show the results for these three values in a single plot.

        iv. What is the relation between the expected GCC size and n in each case?
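As a theoretical baseline for part (d), a standard mean-field result for large ER networks with mean degree c is that the fraction S of nodes in the giant component satisfies

$$S = 1 - e^{-cS}.$$

For c ≤ 1 the only solution in [0, 1] is S = 0, while for c > 1 a positive root appears and S·n grows linearly with n. The helper below (hypothetical name giant_fraction) finds the largest root by fixed-point iteration from S = 1; it is a sketch for comparison, not a replacement for the simulations.

```python
import math


def giant_fraction(c, tol=1e-12, max_iter=100_000):
    """Largest root in [0, 1] of S = 1 - exp(-c * S).

    Iterating S <- 1 - exp(-c * S) from S = 1 decreases monotonically to that
    root, which is 0 for c <= 1 and positive for c > 1. Convergence is slow
    near c = 1, so the loop is capped at max_iter steps."""
    s = 1.0
    for _ in range(max_iter):
        nxt = 1.0 - math.exp(-c * s)
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s
```

For instance, n * giant_fraction(c) for c = 1.1, 1.2, 1.3 can be overlaid on the simulated curves from part iii.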

    2. Create networks using the preferential attachment model

      (a) Create an undirected network with n = 1000 nodes using the preferential attachment model, where each new node attaches to m = 1 old node. Is such a network always connected?

Hint: Useful function(s): sample_pa (barabasi.game)
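A minimal plain-Python stand-in for part (a), with hypothetical helper names: each arriving node attaches to m earlier nodes chosen with probability proportional to their current degree, drawn uniformly from a list in which every node appears once per unit of degree. igraph's sample_pa may differ in details such as its zero-appeal term, and in R it builds a directed graph unless directed = FALSE is passed. With m = 1 every new node adds one edge into the existing connected graph, so the result is a tree with n - 1 edges, which the BFS check confirms.

```python
import random
from collections import deque


def pa_edges(n, m=1, seed=0):
    """Undirected preferential attachment edge list. `targets` lists every node
    once per unit of its degree, so a uniform draw from it is a
    degree-proportional draw. Node 1 can only attach to node 0, and with m > 1
    repeated picks create multi-edges."""
    rng = random.Random(seed)
    edges, targets = [], []
    for t in range(1, n):
        chosen = [rng.choice(targets) if targets else 0 for _ in range(m)]
        for old in chosen:
            edges.append((t, old))
            targets.extend((t, old))
    return edges


def is_connected(n, edges):
    """Breadth-first search from node 0; connected iff every node is reached."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = {0}, deque([0])
    while queue:
        for v in adj[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n
```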

      (b) Use the fast greedy method to find the community structure. Measure modularity.

Hint: Useful function(s): cluster_fast_greedy, modularity

      (c) Try to generate a larger network with 10000 nodes using the same model. Compute its modularity. How does it compare to the smaller network's modularity?

      (d) Plot the degree distribution on a log-log scale for both n = 1000 and n = 10000, then estimate the slope of the plot using linear regression.
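One way to get the slope in part (d), sketched under the assumption that a plain least-squares fit of log P(k) on log k is acceptable (the helper name loglog_slope is hypothetical): count how often each degree occurs, drop k = 0 because its logarithm is undefined, and fit a line. Sparse counts make the high-degree tail noisy, so logarithmic binning or fitting only part of the range are common alternatives.

```python
import math
from collections import Counter


def loglog_slope(degrees):
    """Least-squares slope of log P(k) against log k over the degrees k >= 1
    that actually occur."""
    counts = Counter(d for d in degrees if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / total) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```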

      (e) In the two networks generated in 2(d), perform the following:

Randomly pick a node i, and then randomly pick a neighbor j of that node. Plot the degree distribution of the nodes j picked by this process on a log-log scale. Is the distribution linear on the log-log scale? If so, what is the slope? How does this differ from the node degree distribution?

Hint: Useful function(s): sample
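The sampling procedure in part (e) can be sketched as follows, assuming the graph is stored as a list of neighbor lists (a multi-edge simply repeats a neighbor); the helper name is hypothetical and plays the role that R's sample plays in the hint. High-degree nodes are neighbors of many nodes, so they are reached more often than under uniform node sampling.

```python
import random


def neighbor_degrees(adj, samples, seed=0):
    """Repeat `samples` times: pick a uniform random node i (among nodes with
    at least one neighbor), then a uniform random neighbor j of i, and record
    deg(j) = len(adj[j])."""
    rng = random.Random(seed)
    candidates = [u for u in range(len(adj)) if adj[u]]
    result = []
    for _ in range(samples):
        i = rng.choice(candidates)
        j = rng.choice(adj[i])
        result.append(len(adj[j]))
    return result
```

On a star graph, for example, the sampled values are almost always the hub's degree, even though most nodes are leaves of degree 1.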

      (f) Estimate the expected degree of a node that is added at time step i, for $1 \le i \le 1000$. Show the relationship between the age of nodes and their expected degree through an appropriate plot.
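Part (f) can be estimated by averaging, over many independent realizations, the final degree of the node that arrived at each step. The sketch assumes the same degree-proportional attachment rule as a plain-Python stand-in for sample_pa (hypothetical helper names), and entry i - 1 of the result holds the node added at step i. For this rule, the standard mean-field prediction is that the node added at step i ends with degree about $m\sqrt{n/i}$, so older nodes should have larger expected degree.

```python
import random


def pa_degrees(n, m, rng):
    """Final degrees of one undirected preferential attachment graph. `targets`
    lists each node once per unit of degree, so a uniform draw from it is a
    degree-proportional draw; node 1 can only attach to node 0."""
    degree = [0] * n
    targets = []
    for t in range(1, n):
        chosen = [rng.choice(targets) if targets else 0 for _ in range(m)]
        for old in chosen:
            degree[t] += 1
            degree[old] += 1
            targets.extend((t, old))
    return degree


def mean_degree_by_arrival(n=1000, m=1, runs=100, seed=0):
    """Entry i - 1 is the final degree of the node added at step i, averaged
    over `runs` independent networks."""
    rng = random.Random(seed)
    totals = [0] * n
    for _ in range(runs):
        for idx, d in enumerate(pa_degrees(n, m, rng)):
            totals[idx] += d
    return [total / runs for total in totals]
```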

      (g) Repeat the previous parts for m = 2 and m = 5. Compare the results of each part for different values of m.

      (h) Again, generate a preferential attachment network with n = 1000, m = 1. Take its degree sequence and create a new network with the same degree sequence through the stub-matching procedure. Plot both networks, mark communities on their plots, and measure their modularity. Compare the two procedures for creating random power-law networks.

Hint: In case fast greedy community detection fails because of self-loops, you may use "walktrap" community detection.

Useful function(s): sample_degseq
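The stub-matching step in part (h) can be sketched in plain Python (the helper name is hypothetical; sample_degseq is the igraph route named in the hint): each node contributes as many stubs as its degree, the stubs are shuffled, and consecutive pairs become edges. Nothing prevents a node from being paired with itself or the same pair from recurring, which is why self-loops and multi-edges can appear, as the hint anticipates.

```python
import random


def stub_matching(degrees, seed=0):
    """Configuration-model edge list: list node v degrees[v] times, shuffle the
    stubs, and pair consecutive entries. Self-loops and multi-edges may occur,
    and every node keeps exactly its prescribed degree (a loop counts twice)."""
    if sum(degrees) % 2:
        raise ValueError("degree sum must be even")
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    random.Random(seed).shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))
```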

    3. Create a modified preferential attachment model that penalizes the age of a node

      (a) Each time a new vertex is added, it creates m links to old vertices, and the probability that an old vertex is cited depends on its degree (preferential attachment) and age. In particular, the probability that a newly added vertex connects to an old vertex i is proportional to:

$$P[i] \sim (c\,k_i^{\alpha} + a)(d\,l_i^{\beta} + b),$$

where $k_i$ is the degree of vertex i in the current time step, and $l_i$ is the age of vertex i. Produce such an undirected network with 1000 nodes and parameters m = 1, $\alpha = 1$, $\beta = -1$, and a = c = d = 1, b = 0. Plot the degree distribution. What is the power-law exponent?

Hint: Useful function(s): sample_pa_age
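For intuition about the attachment rule in part (a), here is a plain-Python sketch that evaluates the weight $(c\,k_i^{\alpha} + a)(d\,l_i^{\beta} + b)$ for every old vertex at each step and samples m targets with those weights (hypothetical helper name). It assumes the age of vertex i when vertex t arrives is t - i steps, and takes β = -1 so that age is penalized; igraph's sample_pa_age may measure age differently (for example, in bins), so results need not match it exactly. The O(n²) weight computation is fine for 1000 nodes.

```python
import random


def aging_pa_edges(n, m=1, alpha=1.0, beta=-1.0, a=1.0, b=0.0, c=1.0, d=1.0,
                   seed=0):
    """Undirected aging preferential attachment. Vertex t links to m earlier
    vertices; vertex i is chosen with weight
    (c * k_i**alpha + a) * (d * l_i**beta + b), where k_i is its current degree
    and l_i = t - i is its (assumed) age. Targets are drawn with replacement,
    so m > 1 can produce multi-edges."""
    rng = random.Random(seed)
    degree = [0] * n
    edges = []
    for t in range(1, n):
        weights = [(c * degree[i] ** alpha + a) * (d * (t - i) ** beta + b)
                   for i in range(t)]
        for old in rng.choices(range(t), weights=weights, k=m):
            edges.append((t, old))
            degree[t] += 1
            degree[old] += 1
    return edges
```

The resulting degrees can then be fed to a log-log regression, as in question 2(d), to estimate the exponent.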

      (b) Use the fast greedy method to find the community structure. What is the modularity?

Random Walk on Networks

    1. Random walk on Erdos-Renyi networks

      (a) Create an undirected random network with 1000 nodes and the probability p of drawing an edge between any pair of nodes equal to 0.01.

      (b) Let a random walker start from a randomly selected node (no teleportation). We use t to denote the number of steps that the walker has taken. Measure the average distance (defined as the shortest path length) $\langle s(t) \rangle$ of the walker from its starting point at step t. Also, measure the variance $\sigma^2(t) = \langle (s(t) - \langle s(t) \rangle)^2 \rangle$ of this distance. Plot $\langle s(t) \rangle$ vs. t and $\sigma^2(t)$ vs. t. Here, the average $\langle \cdot \rangle$ is over random choices of the starting nodes.

      (c) Measure the degree distribution of the nodes reached at the end of the random walk. How does it compare to the degree distribution of the graph?
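A plain-Python sketch for parts (b) and (c), with hypothetical helper names and the graph stored as a list of neighbor lists (for example, built from a G(n, p) sample): for each walker, one BFS from the start node gives the shortest-path distance to every node in its component, so s(t) becomes a lookup as the walker moves. Means and variances are taken across walkers at each t, and the end nodes are returned so their degrees can be compared with the graph's degree distribution. Start nodes are restricted to nodes with at least one neighbor, an assumption that avoids walkers stuck on isolated vertices. The same function applies unchanged to the preferential attachment networks of question 2.

```python
import random
import statistics
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to each node (-1 if unreachable)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def walk_distance_stats(adj, steps, walkers, seed=0):
    """Run `walkers` independent walks without teleportation, recording the
    distance s(t) to the start for t = 0..steps. Returns the per-t mean, the
    per-t variance, and the node where each walk ended."""
    rng = random.Random(seed)
    starts = [u for u in range(len(adj)) if adj[u]]
    samples = [[] for _ in range(steps + 1)]
    end_nodes = []
    for _ in range(walkers):
        start = rng.choice(starts)
        dist = bfs_distances(adj, start)
        node = start
        for t in range(steps + 1):
            samples[t].append(dist[node])
            if t < steps:
                node = rng.choice(adj[node])
        end_nodes.append(node)
    means = [statistics.mean(s) for s in samples]
    variances = [statistics.pvariance(s) for s in samples]
    return means, variances, end_nodes
```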

      (d) Repeat 1(b) for undirected random networks with 10000 nodes. Compare the results and explain qualitatively. Does the diameter of the network play a role?

    2. Random walk on networks with fat-tailed degree distribution

      (a) Generate an undirected preferential attachment network with 1000 nodes, where each new node attaches to m = 1 old node.

      (b) Let a random walker start from a randomly selected node. Measure and plot $\langle s(t) \rangle$ vs. t and $\sigma^2(t)$ vs. t.

      (c) Measure the degree distribution of the nodes reached at the end of the random walk on this network. How does it compare with the degree distribution of the graph?

      (d) Repeat 2(b) for preferential attachment networks with 100 and 10000 nodes, and m = 1. Compare the results and explain qualitatively. Does the diameter of the network play a role?

    3. PageRank

The PageRank algorithm, as used by the Google search engine, exploits the linkage structure of the web to compute global "importance" scores that can be used to influence the ranking of search results. Here, we use random walks to simulate PageRank.

      (a) We are going to create a directed random network with 1000 nodes using the preferential attachment model. Note that in a directed preferential attachment network, the out-degree of every node (except the first) is m, while the in-degrees follow a power-law distribution.

One problem with performing a random walk on such a network is that the very first node has no outgoing edges and becomes a "black hole" from which a random walker can never "escape". To address this, generate another 1000-node random network with the preferential attachment model, and merge the two networks by adding the edges of the second graph to the first graph after shuffling the node indices. For example,

Create such a network using m = 4. Measure the probability that the walker visits each node. Is this probability related to the degree of the nodes?

Hint: Useful function(s): as_edgelist, sample, permute, add_edges
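The construction in part (a) can be sketched in plain Python as a stand-in for the as_edgelist, sample/permute, and add_edges route in the hint (hypothetical helper names). It assumes each new node picks its m targets with probability proportional to in-degree + 1, sampled with replacement so multi-edges are allowed; the +1 lets nodes with in-degree 0 be chosen, and igraph's own rule may differ. Relabeling the second graph with a random permutation before merging means its edge-less first node coincides with the first graph's only with probability 1/n.

```python
import random


def directed_pa_edges(n, m, rng):
    """Directed preferential attachment: node t adds m out-edges to earlier
    nodes chosen with probability proportional to in-degree + 1 (an assumed
    zero-appeal term). Node 0 gets no out-edges."""
    in_deg = [0] * n
    edges = []
    for t in range(1, n):
        weights = [in_deg[i] + 1 for i in range(t)]
        for old in rng.choices(range(t), weights=weights, k=m):
            edges.append((t, old))
            in_deg[old] += 1
    return edges


def merged_network(n=1000, m=4, seed=0):
    """Edges of one directed PA graph plus the edges of a second one whose node
    labels have been randomly permuted."""
    rng = random.Random(seed)
    first = directed_pa_edges(n, m, rng)
    second = directed_pa_edges(n, m, rng)
    perm = list(range(n))
    rng.shuffle(perm)  # node v of the second graph becomes node perm[v]
    return first + [(perm[u], perm[v]) for u, v in second]
```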

      (b) In all previous questions, we didn't have any teleportation. Now, we use a teleportation probability of $\alpha = 0.15$. By performing random walks on the network created in 3(a), measure the probability that the walker visits each node. Is this probability related to the degree of the node?
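Parts (a) and (b) can be simulated with one long random walk whose visit frequencies estimate the visiting probabilities; here is a plain-Python sketch under stated assumptions (hypothetical names). With probability alpha the walker teleports to a node drawn from a weight vector (uniform by default); otherwise it follows a uniformly random out-edge. Whenever the current node has no out-edges, the sketch also teleports, an assumed rule that keeps the walk going at any dead end left after the merge. Setting alpha = 0 gives part (a), and passing non-uniform weights covers the personalized variants in question 4.

```python
import random
from collections import Counter
from itertools import accumulate


def walk_visit_frequencies(n, edges, steps, alpha=0.15, teleport=None, seed=0):
    """Fraction of `steps` that one long walk spends at each node of a directed
    multigraph on nodes 0..n-1.

    With probability alpha, or when the current node has no out-edges (an
    assumed dead-end rule), the walker teleports to a node drawn with the
    weights in `teleport` (uniform if None); otherwise it follows a uniformly
    random out-edge."""
    rng = random.Random(seed)
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    cum = list(accumulate(teleport if teleport is not None else [1.0] * n))
    nodes = range(n)
    node = rng.randrange(n)
    visits = Counter()
    for _ in range(steps):
        if rng.random() < alpha or not out[node]:
            node = rng.choices(nodes, cum_weights=cum)[0]
        else:
            node = rng.choice(out[node])
        visits[node] += 1
    return [visits[v] / steps for v in range(n)]
```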

    4. Personalized PageRank

While the use of PageRank has proven very effective, the web's rapid growth in size and diversity drives an increasing demand for greater flexibility in ranking. Ideally, each user should be able to define their own notion of importance for each individual query.

      (a) Suppose you have your own notion of importance. Your interest in a node is proportional to the node's PageRank, because you totally rely upon Google to decide which website to visit (assume that these nodes represent websites). Again, use random walk on the network generated in question 3 to simulate this personalized PageRank. Here the teleportation probability to each node is proportional to its PageRank (as opposed to the regular PageRank, where at teleportation the chance of visiting each node is the same and equal to $1/N$). Again, let the teleportation probability be equal to $\alpha = 0.15$. Compare the results with 3(a).

      (b) Find two nodes in the network with median PageRanks. Repeat part 4(a) if teleportations land only on those two nodes (with probabilities 1/2, 1/2). How are the PageRank values affected?
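To check simulated values, the stationary distribution of a teleporting random walk can also be computed directly. Writing v for the normalized teleport vector and P for the row-stochastic transition matrix over out-edges, a graph without dead ends satisfies

$$\pi = \alpha\, v + (1 - \alpha)\, P^{\top} \pi.$$

The power-iteration sketch below (hypothetical helper name) additionally sends the probability mass sitting at nodes without out-edges back according to v, an assumption matching a walker that teleports from dead ends. A uniform v gives regular PageRank; for part (b), v puts weight 1/2 on each of the two median nodes.

```python
def pagerank_power(n, edges, alpha=0.15, teleport=None, iters=200):
    """Power iteration for pi = alpha * v + (1 - alpha) * P^T pi on a directed
    multigraph, where v is `teleport` normalized to sum to 1 (uniform if None).
    Mass at nodes without out-edges is redistributed according to v."""
    out = [[] for _ in range(n)]
    for u, w in edges:
        out[u].append(w)
    weights = teleport if teleport is not None else [1.0] * n
    total = sum(weights)
    v = [x / total for x in weights]
    pi = v[:]
    for _ in range(iters):
        nxt = [alpha * x for x in v]
        dangling = 0.0
        for u in range(n):
            if out[u]:
                share = (1 - alpha) * pi[u] / len(out[u])
                for w in out[u]:
                    nxt[w] += share
            else:
                dangling += (1 - alpha) * pi[u]
        pi = [x + dangling * vi for x, vi in zip(nxt, v)]
    return pi
```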

      (c) More or less, 4(b) is what happens in the real world, in that a user browsing the web only teleports to a set of trusted web pages. However, this is against the assumption of normal PageRank, where we assume that people's interest in all nodes is the same. Can you take into account the effect of this self-reinforcement and adjust the PageRank equation?

Final Remarks

The following functions from the igraph library are useful for this project:

degree, degree.distribution, diameter, clusters, vcount, ecount

random.graph.game, barabasi.game, aging.prefatt.game, degree.sequence.game, page_rank

For part 2 of the project, you can start off with the Jupyter notebook provided to you.
