# Edit Create an Image Page: Social Networks

You have to log in to edit pages.

To create an image page, simply complete the form below and then hit the 'Save Page' button at the bottom of the form. As you complete the form, remember that one of the main goals of the Math Images Project is to provide explanations of the images on our site at various levels, so that everyone can understand some of the math behind the images. Try to complete the form as fully as possible, but remember that other users will have the opportunity to add more information to your image pages in the future. Also, please note that by contributing to the Math Images Project, you agree to comply to the guidelines as stated in our general disclaimer.

As always, thank you for your contributions! --The Math Images Project

If you need help filling out this page, please consult our Help sections: Want to Contribute and Math Resources.

Note: * Indicates a required field.

Please note: When you are filling in the below explanations, you should feel free to use standard wikitext.

 Image Title*: Upload a Math Image Friend network of a particular Facebook account. The pink indicates a "mob" of tightly interconnected friends, such as high school or college friends. The picture above may seem like an innocent model of someone's friend network on Facebook, but it reveals plenty about how modern society operates. First, it indicates that people inherently have ''preferences'' for who they choose to associate with. Taken in the context of economic transactions, these preferences can determine whether or not a firm, state, or country prospers economically. Think about it: there are infinite ''possibilities'' of associations, but people sometimes ''prefer'' not to associate. Second, the pink area (which probably represents high school friends, college friends, or coworkers) indicates something called ''clustering'', which is kind of like grouping within a group. Given the context above, grouping can be thought of as a similar preference among certain individuals. Being in groups has allowed people to protest for their rights, start a new company, take over countries, etc. The odds are more in your favor in a group if you want to achieve something. These kinds of preferences have undoubtedly shaped our world. Therefore, it is beneficial to analyze such powerful phenomenon. In other words, '''social networks''', or the patterns of connections between people, are of enough importance to study through mathematics. Social network analysis (SNA) can provide valuable information about a network by answering the following questions: *"Who is most central to a network?" *"Who has the greatest influence in a group? *"How many different clusters can you find in a network?" *"Which connections are pivotal for the functioning of a group? *"Who is Mr./Ms. Popular?" Most importantly, this analysis allows us to make more rational decisions based on data (rather than intuition) and to make predictions about the behavior of a group. For example, if we are able to determine that in the case of an epidemic, the virus is concentrated generally around a select few, there could be something done to prevent its spread. Alternatively, if we wanted to make sure an important idea would spread quickly (a pitch for a school candidate, for example) we would first target the most influential and central people. ==Centrality Measures== Perhaps the easiest and most fundamental answers to all of the the questions above revolve around the concept of '''centrality'''. The most frequently used measures of network structure, therefore, are '''centrality measures'''. These include Degree, Eigenvector, Closeness and Betweenness measures of centrality. More information about these measures is found below. Keep in mind, however, that these measures quantify characteristics of people (or whatever objects pertain to the network) rather than the network as a whole. For illustrative and simplification purposes, these explanations will be in the context of human interactions. ===Degree Centrality=== [[image:Degree_centrality_hamlet.jpg|right|thumb|200px| (Social Network of Shakespeare's ''Hamlet'', illustrating how protagonists are mathematically central to the plot, as evidenced by their degree centrality)]] Degree centrality measures how well connected a person is to a network. It does this by simply counting how many people a person is directly connected to. It seeks to measure: *The level of influence someone can establish in a community, organization, group, etc. *The opportunity to be influenced by someone in a community, organization, group, etc *How exposed someone is in a network, mostly known as the ''index of exposure'' Example: The characters in the image to the right will probably seem familiar. It is a network analyzing the ties among characters of one of William Shakespeare's famous plays, ''Hamlet.''If you are not familiar with the play, just know that Hamlet (a prince) is the protagonist and is next in line to inherit the throne of his father, King Hamlet. However, prince Hamlet's uncle, Claudius, inherits the throne. Hamlet later finds out that Claudius was responsible for killing his father. The story mostly revolves around Hamlet's quest to avenge the death of his father, which involves his relations with almost all the characters in the play. It seems like prince Hamlet is rather ''central'', right? In the simplest sense, because he has the most ties to other characters in the play (as shown in the image) he has the highest ''degree centrality.'' Thus it makes sense to conceive of him as the protagonist, or ''main'' character. ===Eigenvector Centrality=== Eigenvector centrality is a more sophisticated version of degree centrality that measures not only the number of people someone is connected to, but also the ''quality'' of the connections. For a connection to have ''quality'', in this context, it means that that connection leads to lots of other connections. It is necessary because, in an intuitive sense, connections with influential people will make you more influential than non-influential connections will. Eigenvector Centrality seeks to measure: *Who the most popular person is in the group *How well someone is connected to the well-connected, in other words, whether they know the "right people" ===Closeness Centrality=== Closeness centrality, without a surprise, measures how ''close'' a person is within a network.''Close'' in this context refers to how many "friends of friends" one would need to be related to another person. A not so close relationship, for example, would be someone in California being connected to some random person in North Dakota through 1 friend who knows another friend who knows another friend...(6 intermediate friends in total) between the person in California and the person in North Dakota. The 6 in the last example indicates the ''degrees of separation''. A person who is well connected has, on average, fewer degrees of separation to reach everyone on the network. ===Betweenness Centrality=== Betweenness centrality measures how "in between" someone is in the network. "In between" refers to how often the shortest route for information to get from one person to another in a network passes through a specific third person. A person with a high measure of betweenness usually "knows what's going on" and can act as a liaison to separate parts of the network. From now on, we will add to our understanding of social networks by looking at them as graphs in the context of interacting people. The people will now be thought of as vertices and the connections between people will be thought of as edges. For more information on the basics of graphs, which will be imperative to the mathematical understanding of social netoworks, look at the page [[Graph Theory]]. The following graph of a hypothetical social network will serve as the example for the following sections, where its centrality measures will be determined: [[image:graph_math_1.png]] If we are to do mathematical analyses, there must be one way to store information about the network as a whole. Luckily there is such a way! It is through the '''Adjacency Matrix'''. The adjacency [[Matrix|matrix]] $A=[a_{ij}]$ is defined as follows: :$a_{ij}=\left\{ \begin{array}{rcl} 1 & \mbox{for} & i, j \mbox{ adjacent}\\ 0 & \mbox{for} & i, j \mbox{ non-adjacent} \end{array}\right.$ As an example, we will determine the adjacency matrix, ''A'', of the graph above: $A= \begin{pmatrix} 0 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}$ For a better visual and mathematical understanding of the adjacency matrix, you can look at the following chart: [[Image:new_adjacency_matrix.png]] Now with the graph and adjacency matrix at our disposal, we are ready to find certain measures of centrality. Typically, computers are used to measure centrality, but our graph is small enough that we can show most of the procedures to find certain measures of centrality. ==Degree Centrality== Definition: The '''degree''' $k_{i}$ of a vertex ''i'' is ::$k_{i} = \sum_{j=1}^n(A)_{ij}$ in other words, it means adding all of the edges adjacent to a given vertex. Having the adjacency matrix in mind, it is like counting all of the ones in a column or row for a specific vertex. We can use Chris's row in the adjacency matrix as an example: [[image:Degree_centrality_matrix.png]] As you can see, Chris's degree centrality is: 1+1+1+1=4 Let's calculate the degree centralities of the others: Jason: 1+1+1=3 Austin: 1+1+1=3 Donald: 1 Bernie: 1 Mark: 1 David: 1 Elissa: 1 We can now conclude that Chris has the highest degree centrality (he knows the most people!) and Donald, Bernie, Mark, David and Elissa are tied for the lowest degree centrality, each only knowing one person. ==Eigenvector Centrality== (Note: This section requires a knowledge of Linear Algebra) If you can remember one thing about eigenvector centrality, it is that it's a more sophisticated version of degree centrality (counting edges from a vertex) that takes in account of the ''quality'' of the connections! How can math store information regarding relative centralities (quality) and connections toward other vertices? If you thought we could do this through [[Matrix|Matrices]] and [[Vector|Vectors]], you were right! We will utilize both in our focus on eigenvector centrality. Let $x_{i}$ be the eigenvector centrality score of a vertex ''i'' and let $A=[a_{ij}]$ be the adjacency matrix of the graph containing ''i''. The use of the adjacency matrix is important because we want to consider everyone's data, and we don't want to include ones that aren't connected to us. The adjacency matrix tells us which vertices are connected to each other and how many vertices are connected. Let $x_{j}$ be the eigenvector centrality scores of the vertices adjacent to ''i''. The eigenvector centrality score for vertex ''i'' is proportional to the sums of the scores of all vertices that are connected to it: :::$x_{i}=\frac{1}{\lambda}\sum_{j=1}^Na_{ij}x_{j}$ where $\lambda$ is a constant and ''N'' is the number of vertices in a graph. What this basically says that one person's eigenvector score depends on the eigenvector scores of its neighbors. Following the properties of matrix multiplication, the equation above can be written as: ::::$\overrightarrow{x}=\frac{1}{\lambda}A\overrightarrow{x}$ Where $overrightarrow{x}$ is the vector containing all the verticesâ€™ eigenvector centrality scores. This simplifies into the eigenvector equation: ::::$A\overrightarrow{x}=\lambda\overrightarrow{x}$ Where $\overrightarrow{x}$ is an [[Eigenvalues and Eigenvectors|eigenvector]]. This is all nice and dandy. The eigenvector centrality score of one vertex relies on the eigenvector scores of the others. ''But how would we find the eigenvector centrality score of a particular vertex if it depends on the others??'' Well the answer is that, strangely, we find them all at once! We just find the eigenvector associated with the highest eigenvalue, according to the '''Perron-Frobenious Theorem.''' The ''i''th entry of that vector will correspond to the eigenvector centrality score of ''i''th vertex. We will now find the [[Eigenvalues and Eigenvectors|eigenvector]] associated with the highest eigenvalue, and thus the vector of eigenvector centralities, of our hypothetical network: [[image:Eigenvector_Centrality_2.png]] With $\lambda=2.50$ Let's test out one of the formulas, $x_{i}=\frac{1}{\lambda}\sum_{j=1}^Na_{ij}x_{j}$, to Jason. $x_{1}=\frac{1}{2.50}*(2+2.65+2)=2.65$, which matches Jason's eigenvector score. We can clearly see here that even though Chris had a higher degree centrality than Jason, Jason and Chris had the ''same'' eigenvector centrality. Eigenvector centrality was created for reasons like these, to determine someone's importance in a network by looking more than the amount of connections. In many settings,''who'' you are connected to is more important than how many people you are connected to. ==Closeness and Betweenness Centrality== Background Definitions: *A '''path''' is a sequence of vertices in which each vertex is adjacent to the next. We can think of this as a sequence of vertices that include or start from a vertex ''i''. More information can be found in [[Graph Theory]] *A '''geodesic path''' is the shortest path between a pair of vertices. '''Closeness Centrality''' of a vertex $i$ is related to the mean geodesic path length between ''i'' and every other vertex in the graph. Intuitively, if mean geodesic path length is low, we say ''i'' is ''closely located'' to every other vertex. Notice that a low number in this case indicates a higher centrality score. Because this may be confusing, the closeness centrality of a vertex ''i''is typically represented as the reciprocal of the average geodesic path length between i and every other vertex in the graph, as follows: ::$c_{j}=\frac{n-1}{\sum_{j=1}^ng_{ij}}$ where ''n'' is the number of vertices in the graph and $g_{ij}$ is the length of the geodesic path between vertex ''i'' to vertex ''j''. We will use our example graph once more to illustrate the shortest paths for certain individuals (the rest will be calculated and left to be verified by the reader): Let's look at the shortest path from Donald to Elissa: [[image:Donald_centrality.png]] We can count the number of edges required to get to the other vertex (person). To get to Elissa, Donald's geodesic path has a length of 1+1+1+1=4. We hope that the shortest path seems is relatively easy to find given the simplicity of the model. For most real and complicated models, computers calculate these shortest paths with a specific algorithm, '''Dijksta's Algorithm''', mentioned below. Now to see an example in which someone may be "close" within a network, which is to have high closeness centrality. Let's look at some of Jason's geodesic paths to other people: [[image:Jason_centrality.png]] We see that to get to the two most distant people, Donald and Elissa, Jason only has to "walk" along two edges. In other words, Jason has 2 degrees of separation to Donald and to Elissa. Again, to have high closeness centrality, one have must the lowest mean degree of separation to ''everyone'' in the network. Things are looking good for Jason so far. Let's now compute the closeness centralities for everyone. The entries of the following chart indicate the length of the geodesic path for each possible pairing: [[image: closeness_measures_2.png|thumb|800px]] You can check for your own answers for the geodesic path/closeness centrality. It looks like Jason came in first with a closeness centrality score of 0.63 and Donald, Bernie, David and Elissa tied for last with a closeness centrality score of 0.39. Jason's high score makes sense because visually, he is centrally located. The tie between Donald, Bernie, David and Elissa makes sense because they are directly connected to one other person in the system, and thus have to travel "farther" (meaning having to go through more "friends of friends") to reach someone in the network. '''Betweenness Centrality''' of a vertex ''i'' is the fraction of geodesic paths within the graph that include ''i''. In other words, it assembles every pair of vertices and computes the geodesic paths between them, then it determines the fraction of those paths that include ''i''. Mathematically, $b_{i}=\sum_{j,k}^n\frac{\gamma_{jk}(i)}{\gamma_{jk}}\times\frac{2}{(n-1)(n-2)}$ Where ''j'', ''k'' and ''i'' are vertices in the graph such that $i\neq j\neq k$, ''n'' is the total number of vertices in the graph, $\gamma_{jk}$ is the number of geodesic paths between ''j'' and ''k'', and $\gamma_{jk}(i)$ is the number of those paths that pass through ''i''. Again, we will use our example network to find betweenness. The following picture illustrates how betweenness is calculated (although not completely) for Chris and Austin in the context of paths between Jason and Elissa, Donald and Mark: [[image:betweenness_centrality.png]] As you can see, shortest paths consisting of only one traversed edge (they are first degree friends), won't have any "middle men", or intermediaries. These entries of 1 contribute a value of zero to betweenness centrality but will be counted regardless because they still represent a geodesic path of length one. Notice that the formula has ''i'', ''j'', ''k'' not equal to each other. So when we calculate the betweenness centrality of i, we don't count any geodesic paths that start or end on i, but we count all the others. What matters most are the entries in the closeness centrality matrix that are greater than one, because those entries suggest that someone had to know an intermediary (or several) to be associated with a friend. In the next image, the closeness centrality matrix will be modified in such a way that highlights the intermediaries for each possible geodesic path. For every path, the intermediary's (if any) initial will replace the numerical entry. If there are several intermediaries, then they will each have their own initial in the entry. The 0 entries of the former closeness centrality matrix in will be blank in the betweenness centrality matrix. The 1 entries in the former centrality matrix (the geodesic paths that were length 1) will have an asterisk because they are still counted, but are counted as zero. In addition, we will not count the geodesic path between vertex ''i'' and ''j'' twice. In other words, the geodesic path from ''i'' to ''j'' and the geodesic path from ''j'' to ''i''will only be recorded once, not twice. Therefore, we will only see the upper bound triangle of the matrix (above the diagonal). For our example, the total number of connections is $\frac{(n-1)(n-2)}{2}=\frac{(8-1)(8-2)}{2}=\frac{(n-1)(n-2)}{2}$, or 21. The two in the divisor comes from the fact that we are not counting pairs. This graph also gives an example for excluding the row and column of a person (Donald) when computing betweenness centrality. [[image:betweenness_centrality_2.2.png]] With this image, we can now calculate everyone's betweenness centrality! All we do is add a person's "interruptions", or times they are on a geodesic path, and divide by the total amount of geodesic paths, 21. [[image:betweenness_centrality_3.1.png]] This result makes intuitive sense. Jason has highest betweenness centrality because of his central location in the network diagram. Donald, Bernie, Mark, David and Elissa all have low betweenness centrality because they are located in the periphery of the network. ===Dijkstra's Algorithm=== {{Hide|1= For a visual demonstration of Dijkstra's Algorithm, you can [http://optlab-server.sce.carleton.ca/POAnimations2007/DijkstrasAlgo.html see here] }} ==Bernoulli Graph and Configuration Models== Previously, we were establishing measures concerning the properties of nodes, or people. We will now focus our attention on network properties. Our previous analyses consisted of taking measurements and analyzing them, now we are taking a step further to model real life situations knowing only very little of the network. Before we begin, here are some definitions: *'''Degree Sequence''': The degree sequence for the graph $G=(V,E), V=\{v_{1},v_{2}...v_{n}\}$ is $\{\text{deg}(v_{a}), \text{deg}(v_{b})...\text{deg}(v_{n}) Algebra Analysis Calculus Dynamic Systems Fractals Geometry Graph Theory Number Theory Polyhedra Probability Topology Other None Algebra Analysis Calculus Dynamic Systems Fractals Geometry Graph Theory Number Theory Polyhedra Probability Topology Other None Algebra Analysis Calculus Dynamic Systems Fractals Geometry Graph Theory Number Theory Polyhedra Probability Topology Other Social networks are definitely applicable to the real world. Why? Because they ''model'' real world activity! Think of the possibilities for social growth once we have with these analytic tools at our disposal. The following examples illustrate the power of Social Network Analysis (SNA). ==Epidemics== As hinted at throughout the page, Social Network Analysis can help prevent the spread of an epidemic. It makes sense that SNA can prevent it because an epidemic is nothing more than the interaction of humans, which we can analyze! The basic idea is to target the people that are the most central, in either degree, eigenvector, closeness, or betweenness centrality (or combination thereof). The picture to the right portrays epidemics in a global sense: [[image:epidemic_world.jpg|thumb|right|300px|Picture illustrating the spread of an epidemic. Yellow/orange areas indicate high levels of contagion]] Graphically, there is a short window of time in which the epidemic can be stopped or mitigated. In a '''response graph''', for this example, the incidence of contagion is graphed against time. If we make two graphs on the same plot, one with a sample of central people, and the other with a random sample, that graph would look like the picture below (with Influenza as the virus): [[image: epidemic_graph.jpg|thumb|center|400px|Response graph for influenza illustrating the window of time necessary for prevention]] With this information of time, we can define the window of time necessary to mitigate epidemics as the difference in time between the random graph (not to be confused with our discussion of Bernoulli random graphs) and the central friend graph. These findings make intuitive sense. Clearly, the people who are most central to the network will have a high index of exposure and would thus catch our example virus, the flu, more quickly than a random person would. We can use that fact to our advantage to know what time there is left and to take direct action toward the people who are most central. ==Google PageRank== Why does almost every first entry of a google search contain a link to a Wikipedia article? Well it related to SNA, and more specifically, it's a method that uses a variant of eigenvector centrality! The PageRank for a certain page ''i'' relates the PageRank of the pages that it links to. They all depend on each other! Just as in the case for eigenvector centrality. As a reminder: The ''i''th node eigenvector centrality score corresponded to the ''i''th entry of the eigenvector from the adjacency matrix that was attributed to the greatest positive eigenvalue. The key differences between the PageRank calculation and regular eigenvector centrality calculation. *The entries make a stochastic matrix, meaning that the entries are normalized so that the sum of entries of certain column ''j'' equals 1. *The entries are defined as [itex]\frac{1}{L(p{j})}$, where $L(p{j})$ is the number of outbound links from ''j'' if ''i'' is connected to ''j'', and 0 otherwise. *There is a damping factor, ''d'', empirically estimated to be 0.85, that is the probability of a random surfer to keep surfing. This is like the value of ''r'' in the reproduction number section. *The PageRank value of a page is the chance that a random surfer will land on that page by clicking on a link. The formula is as follows: $PR(p_{i})=\frac{1-d}{N}+d\sum_{p_{j}\in{M(p_{i})}}\frac{PR(p_{j})}{L{p_{j}}}$ where $PR(p_{i})$ is the PageRank of a page ''i'', $p_{1}...p(N)$ are the considered pages, $M(p_{i})$ is the set of pages that link to $P_{i}$, $L(p_{i})$ is the number of outbound links on page $p_{i}$, and ''N'' is the total number of pages. The PageRank values are the entries in the dominant eigenvector of the adjacency matrix defined earlier. You might be thinking, "how does this relate to the world?" Well it might not save millions of lives like preventing an epidemic can, but it facilitates the speed and relevance of finding information online! You also encounter this every time you make a Google search, and that is pretty interesting. Image for Hamlet's Network is found here:http://magmods.wordpress.com/2011/08/18/how-to-do-things-with-networks-a-response-to-franco-moretti/ http://www.mpg.de/4406928/Travelling_epidemics Yes, it is.