Network Science - Node Strength, Centrality and Communicability
December 27th, 2020
Overview🔗
H-degrees are a fast, edge-based distribution measure in weighted networks that capture more of it’s structural properties than node degree or edge weights, and can make distinctions that node degrees and edge weights individually cannot. H-degrees have applications in any discipline and can be applied to any weighted network. H-degree is also used to create the communication centrality, which, unlike many other network centralities, can be applied to networks that do not have geodesic flows.
This post will also cover methods based on the H-degree that can be used to measure and compare networks by their distribution of edges and edge weights.
In this post I will cover the following measures:
- Lobby Index
- w-Lobby Index
- Lobby Gain
- l-core
- G-index
- H-degree
- In-H-degree
- Out-H-degree
- H-degree centrality
- H-centralization
- H-ratio index
- Communication Centrality
- \(C_c\) -centrality
- c-core nodes
- c-centralization
Lobby Index🔗
The first measure of node strength was the Hirsch Index for measuring the output of an academic scholar in terms of publications and citations. This metric inspires all of the metrics that are in this post and their calculations are based on those of the Hirsch Index.
A scientist has index \( h \) if \( h \) of his or her \( N \) papers have at least \( h \) citations each and the other \( N - h \) papers have \( h \) or fewer citations each
The following _h_index_helper
function is a function that computes the h-index on a given iterable of values, and will be used throughout this post to significantly shorten implementations of h-index-like metrics.
Here is an implementation of the H-index:
This measure was later the inspiration for the lobby index, which is a generalization of the Hirsch index to be applied to any weighted or unweighted network. Formally, the lobby index is defined as
The l-index or **lobby index** of a node \( n \) is the largest integer \( k \) such that \( n \) has at least \( k \) neighbors with a degree of at least \( k \)... $$ l(n) = \max{\{ k: deg(y_k) \geq k \}} $$ such that all \( y_k \) are neighbors of node \( n \)
The lobby index is called a lobby index because it is a quantified indicator for how strong a node’s lobbying/influence is in a network both in terms of number of neighbors a node has and how highly connected their neighbors are. So, to put it simply, the lobby index is based on the degree distribution of a node’s direct neighbors. This makes the lobby index useful for identifying strong/weak nodes based on the degree distribution of it’s neighbors, such as in communication networks and social networks, or political influence networks.
It is clear that a person has strong lobby power, the ability to influence people’s opinions, if he or she has many highly connected neighbors. This is exactly the aim of a lobbyist or a diplomat. The diplomat’s goal is to have strong influence on the community while keeping the number of his connections (which have a cost) low. If \( x \) has a high lobby index, then the l-core \( L(x) \) (those neighbors which provide the index) has high connectivity.
l-core nodes are the neighbor nodes that contribute to the lobby index of a node. The author also provides a metric called lobby gain which shows how the access to the network is multiplied using a link to the l-core nodes
. The lobby gain can be determined using the following equation:
where ( deg_2^L(x) ) is the number of second neighbors reachable via the l-core nodes of ( x ). These formulas quantify the effects of a neighborhood’s degree distribution on a node’s ability to access the rest of the network. The lobby gain quantifies a node’s accessibility to the network through it’s most important connections and makes the connections of different nodes comparable in terms of connectivity. Between the lobby index and the lobby gain, developers can identify the high-degree neighbors of a node and quantify the access that they provide to the network.
The lobby-index has a moderate correlation with closeness, betweenness and eigenvector centrality. The distribution of the lobby index can be approximated to be the following in a scale-free network:
\[P( l(x) \geq k) \eqsim k^{-\alpha(\alpha + 1)}\]where ( \alpha ) is the tail exponent of the degree distribution. If a node has many neighbors with high-degree it has a high lobby index. But how so we know if a given lobby-index is good? The author of the original paper gives the reader the following equation to evaluate a lobby index in a scale-free network:
\[l(x) \approx c deg(x)^{1 \over{\alpha + 1}}\]where ( c ) is a constant.
An alternative to the lobby index in weighted graphs is the weighted lobby index, or w-lobby index. The w-lobby index accounts for the number of connections a node has, and the strength of relationships that those neighbors have via their edge weights. The strength of relationships a node has with it’s neighbors is called it’s node strength. Node strength is the sum of the weights of edges to/from a node.
the w-lobby index (weighted network lobby index) of node \( n \), is defined as the largest integer \( k \) such that node \( n \) has at least \( k \) neighbors with node strength at least \( k \).
I have been able to find remarkably little research on the w-lobby index, so I won’t dedicate anymore time/space to this measure.
The following code implements the node_strength
, lobby index
, and l-core
calculations.
Here is an example of how to use the lobby_index
function
Update: C-index and G-index🔗
In 2006, Leo Egghe proposed the G-index as an improvement to the h-index was proposed which makes highly connected neighbors of a node benefit increase the index of the node. This makes it a more fair approach than the h-index. However, the g-index becomes saturated when the average number of edges for all nodes exceeds the total number of nodes, which makes it unsuitable for situations where that can occur.
H-Degree🔗
The H-index and lobby index inspired the creation of the H-degree in 2011, which is based on the relationships of a node with its neighbors rather than on the relationships of the node’s neighbors. The H-degree is a measure based on the distribution of edge weights of a given node with it’s neighbors and can be applied to both undirected and directed graphs. The h-degree of a node increases as the node’s degree increases and as the weights of the node’s edges increase.
The h-degree (\( d_h \)) of node \( n \) in a weighted network is equal to \( d_h(n) \) if \( d_h(n) \) is the largest natural number such that \( n \) has at least \( d_h(n) \) links each with strength at least equal to \( d_h(n) \)
The H-degree can be used in communications network to measure the strength and breadth of a node’s relationships with it’s direct neighbors based purely on the strength of their connecting edges.
Directed H-Degree🔗
In a directed graph, the H-degree can computed for in/out degrees, and is called the In-h-Degree and Out-h-Degree.
**Definition 1**. In a directed weighted network, the In-h-degree (\( h_I \)) of Node \( n \) is equal to \( h_I(n) \) if \( h_I(n) \) is the largest natural number such that \( n \) has at least \( h_I(n) \) In-links each with strength at least equal to \( h_I(n) \). **Definition 2**. In a directed weighted network, the Out-h-degree (\( h_O \)) of Node \( n \) is equal to \( h_O(n) \) if \( h_O(n) \) is the largest natural number such that \( n \) has at least \( h_O(n) \) Out-links each with strength at least equal to \( h_O(n) \).
Here is an example of how to use the hdegree
function
These directed H-degree measures adds a directional component to the H-degree and opens the door to comparing the in/out H-degree measures using the ( H_{dif} ).
**Definition 5.** In a directed weighted network with \( N \) nodes ( \( N > 1 \) ), the h-difference \( (h_{dif}(n) ) \) of node \( n \) is defined as: $$ h_{dif}(n) = h_I(n) − h_O(n) $$
The H-difference can be used to measure the difference between the input to a node, and the output from a node. In a communications network, where nodes represent users and edges represent communications from one user to another, and the edge weight represents the frequency with which they communicate with them. The users who have sent messages to the given user are the input nodes, and the output nodes are those users who the given. If a user has a high in-H-degree and a low out-H-degree, or a negative h-difference, then the user is an information consumer, receiving more communications from more users, and sending fewer emails to fewer users. If a user has a low in-H-degree and a high-H-degree, or a positive h-difference, then they are an information producer, receiving fewer communications from less people, and sending more communications to more users.
H-degree Centrality🔗
H-degree can be used to create a node centrality based on the total number of nodes in the graph:
**Definition 2** In a weighted network with \( N \) nodes, the h-centrality, \( C_h \), of node \( n \) is defined as: $$ C_h(n) = \Large{ d_h(n) \over N − 1} $$ where \( d_h(n) \) is the h-degree of node \( n \). h-Centrality is just a normalized form of the h-degree.
Directed forms of the H-degree centrality are also offered.
**Definition 3** In a directed weighted network with \( N \) nodes ( \( N > 1 \)), the In-h-centrality ( \(C_{hI}(n) \) ) of node \( n \) is defined as: $$ C_{hI}(n) = \Large{ h_I(n) \over N - 1 } $$ **Definition 4** In a directed weighted network with \( N \) nodes ( \( N > 1 \)), the Out-h-centrality ( \(C_{hO}(n) \) ) of node \( n \) is defined as: $$ C_{hO}(n) = \Large{ h_O(n) \over N - 1 } $$
Since both of the directed H-centralities are normalized by the number of nodes in the graph, the In/Out-H centralities of nodes in different graphs, or the same graph over time are comparable. The directed H-degree centralities are subject to the following constraints, which can be used as minimum and maximum values:
**Proposition 1.** In a directed weighted network with \( N \) nodes (\( N > 1 \)), for a non-isolated node \( n \), the following inequalities always hold: $$ 0 \leq C_{hI}(n) \leq \Large{ N_I(n) \over N - 1 } \leq 1 $$ $$ 0 \leq C_{hO}(n) \leq \Large{ N_O(n) \over N - 1 } \le 1 $$
where ( N_I(n) ) denotes the numbers of node ( n )’s neighbor nodes which maintain In-links with node ( n )
.
**Proposition 2.** In a directed weighted network with \( N \) nodes (\( N > 1\)), for a non-isolated node \( n \), the following inequalities always hold: $$ C_{hI}(n) \leq \Large{ \sqrt{C_I(n)} \over N - 1 } $$ $$ C_{hO}(n) \leq \Large{ \sqrt{C_O(n)} \over N - 1 } $$ where the \( C_I(n) \) and \( C_O(n) \) denote the in-degree centrality and out-degree centrality, respectively.
Here is an example of how to use the hdegree_centrality
function
H-Centralization🔗
The H-degree allows for developers to quantify the edge weight distribution of network in graphs using the H-centralization ( C_h ). If the H-centralization of a network is high, a subset of the nodes in the network have higher degree and higher edge weights than the rest of the network. If the H-centralization is low, then the edges and edge weight are distributed more evenly between all the nodes in the network.
**Definition 3** In a weighted network \( G \) with \( N \) nodes, the h-centralization, $$C_h(G) \) of this network is $$ C_h(G) = \Large{ \sum_{i=1}^N [\text{MAX}(G) - d_h(n_i)] \over (N - 1)(N - 2) } $$ where \( d_h(n) \) is the h-degree of node \( n \). h-Centrality is just a normalized form of the h-degree.
The directed H-centralization measures for directed networks quantify the distribution of In/Out-edges and the edge-weights in In/Out edges. The equations for calculating these measures are very similar to the undirected H-centralization and are quoted from the original paper below:
**Definition 6.** In a directed weighted network with \( N \) nodes (\( N > 1 \)), the In-h-centralization (\( C_{hI}(G) \)) of this network is defined as: $$ C_{hI}(G) = \Large{ \sum_{i=1}^N [\text{MAX}_I(G) - d_{hI}(n_i)] \over (N - 1)^2 } $$ **Definition 7.** In a directed weighted network with \( N \) nodes (\( N > 1 \)), the Out-h-centralization (\( C_{hO}(G) \)) of this network is defined as: $$ C_{hO}(G) = \Large{ \sum_{i=1}^N [\text{MAX}_O(G) - d_{hO}(n_i)] \over (N - 1)^2 } $$ where the \( C_I(n) \) and \( C_O(n) \) denote the in-degree centrality and out-degree centrality, respectively.
These H-centralization measures allow developers to quantify and compare the degree of total centralization, and quantify the in/out centralizations within networks as a method to quantify the differences in distributions of degrees and edge weights between the nodes in networks.
The hdegree_centralization
function can be used as follows:
Communication Centrality🔗
The H-degree eventually led to the development of the communication centrality , which bases the index of a node on the relationships of it’s neighbors weighted by the node’s relationship with it’s neighbor. Simply, this measure recognizes that having well-connected neighbors is not valuable to a node, unless they have strong relationships to those neighbors.
**Definition 1** (Communication Centrality). The communication centrality \( c(x) \) of node \( x \) is the largest integer \( k \) such that the node \( x \) has at least \( k \) neighbor nodes satisfying the product of each node’s h-degree and the weight of the edge linked with node \( x \) is no fewer than \( k \).
In the context of a communications network, weights can be thought of as an indicator of the ease of communication, or cheapness of communication costs between two nodes. The degree of a node indicates how many communication channels a node has with other nodes in the network. Together, these indicate the ease of which a information can flow to/from a node and it’s neighbors in a network.
Communication centrality does not make assumptions about how information flows/diffuses through a graph. This is an important attribute, as how information, goods, etc. flow throughout a graph is a key component to whether a centrality measure can be meaningfully applied. For example, in a computer network, packets always take the shortest path between their source and destination and may not replicated, making closeness centrality and betweenness centrality optimal measures as they are calculated based on the shortest path between nodes. However, gossip, another valid form of communication, is spread randomly, can be spread by multiple gossipers simultaneously and has no target destination, so closeness and betweenness centrality would not be effective as they do not accurately assume how information flows in a network of gossip.
Communication centrality of a node can be normalized based on the size of the network, making communication centrality comparable across networks. The normalized form of communication centrality is denoted as ( C_c )-centrality.
In a weighted network with \( N \) nodes, the \( c_c \)-centrality, \( C_c \), of node \( x \) is defined as: \( C_c (x) = \Large{ c(x) \over (N − 1) } \), here \( N − 1 \) is the maximum value attained by communication centrality \( c(x) \).
The communication_centrality
function can be used as follows:
As with H-degree, communication centrality offers a method for measuring the overall distribution of communication ability/costs using a ( C_c )-centralization function. As with H-centralization, a high ( C_c )-centralization value indicates that a subset of nodes are highly communicable with low communication costs in the network, and a low ( C_c )-centralization value indicates that most of the nodes are equally communicable and have similar communication costs.
**Definition 4**. In a weighted network \( G \) with \( N \) nodes, the \( C_c \) -centralization, \( C_c( G ) \) of this network is $$ C_c(G) = \Large{ \sum_{i = 1}^N (\max(G) - c(x_i) \over (N - 1)(N - 2) } $$ Here \( c( x_i ) \) represents the communication centrality of node \( x_i \). The denominator of Eq. (1) is obtained as follows. The largest possible value for \( \max(G) \) is \( N − 1 \), which can be reached in a star network. Then, there are \( N − 1 \) nodes with communication centrality equal to 1, and hence a difference with the largest value of \( N − 2 \), leading to a denominator equal to \( ( N − 1 )( N − 2 ) \).
In the context of a telecommunications network, the ( C_c(G) ) indicates how dependent a network is on specific infrastructure nodes in the network. In a network with a high ( C_c(G) ) value, the network has a subset of instrastructure nodes that have a high number of connection with other network components, and have a large volume of traffic. If any of those susbset nodes were to fail, it would significantly damage the network and cause a significant impact on the service to customers. So the ( C_c(G) ) metric could be used to as a metric for evaluating the reliability of the structure of a network based on the distribution of services. While weighted betweenness centrality could also be used to identify bottlenecks, the ( C_c(G) ) metric is many orders of magnitude faster and is moderately correlated with betweenness centrality.
The c_centralization
function can be used as follows:
Summary🔗
I’ve presented a couple measures for measuring the connectivity of a node and the distributions of edges and edge weights within a network.
H-degree measures the connectivity of a node based on it’s relationships (edges & edge weights) with it’s neighbors. Communication centrality measures connectivity of a node based on it’s edge weight with it’s neighbors and the H-degree of those neighbors. These measures are moderately correlated as H-degree is used when calculating the communication centrality.
You may not be sure how to decide when to use H-degree or communication centrality. If you are only concerned with thes edges/weights of a node, you should use the H-degree. If you are concerned with the neighbors as well, you should use the communication centrality.
H-centralitization and communication centralization are graph level measures, that indicates the distribution of H-degree and communication centralities between the nodes in the graph. If the centralization is higher, the distribution is more unequal, whereas if it is lower, the distribution is more equal.
But sometimes more concrete examples can be more helpful, so here are some applications of the centrality and centralization measures:
-
Communication Network A directed graph where each node is an email address, and edges indicate emails sent/received by a user, and the edge weight indicated the frequency of emails. Alternatives could be telecommunications (Zoom/Microsoft Teams calls), computer networks, etc.
- H-degree - Nodes with high H-degree are involved in communications with many other users and are communicated with often.
- Directed H-degree - Nodes with a high In-H-degree are users who are contacted by many users and are contacted by them often. Nodes that have a high Out-H-Degree contact many other users on a very regular basis.
- H-difference - Users with a negative H-difference are producers, meaning that they initiate more communications than they receive. Users with a positive H-difference are consumers, meaning they are contacted by more people and more often than they contact others. The degree to which the H-difference is negative or positive indicates the magnitude of their role as a producer/consumer.
- H-degree centrality - A high H-degree centrality indicates that the user has sent/received more emails from a high proportion of the overall network.
- H-centralization - If the network has a high H-centralization, then a subset of the users are driving most of the communications, as consumers or producers of many communications. If the H-degree is low, each user produces/receive a more equal amount of communications.
- Communication Centrality - A high communication centrality indicates that an email address sends/receives emails from other email addresses that also send/receive many emails from many email addresses in the network.
- Communication centralization - A high communication centralization in a trade network means that some email addresses in the network send/receive much more email from other email addresses from emails that are well-connected and have high volumes of emails.
-
Trade NetworkA directed graph where each node is a entity (person/organization), and edges indicate a transfer of goods sent/received by a entity, and the edge weight indicate the sum of the transaction values of goods that have been sent/received.
- H-degree - Nodes with a high H-degree trade with many other entities with large total transaction values.
- Directed H-degree - A high In-H-degree indicates that an entity receives goods from many entities and pays larger transaction costs. A high Out-H-degree indicates that an entity sends goods to many entities with large transaction costs.
- H-difference - A negative H-difference indicates that an entity sends out more goods that it receives from other entities. A positive H-difference indicates that the an entity receives more goods that it sends to other entities. The magnitude of these relationships is determined by the distance of the H-difference from 0.
- H-degree centrality - A high H-degree centrality indicates that an entity has established high value trade with a high proportion of the entities in the overall network.
- H-centralization - A trade network with a high H-centralization indicates that a subset of the entities in the network dominate the trade network in terms of trading partners and in money sent/received. If the H-centralization is low, then there are fewer/no dominant players in the trade network.
- Communication Centrality - A high communication centrality indicates that an entity trades heavily with many other entities with many trading partners and large transaction values.
- Communication centralization - A high communication centralization indicate that some entities have much better access to high quality trading partners (ex. many partners, high transaction values) throughout the network, as they have strong trading partners with access to other strong trading partners.
-
Web Page NetworkA directed graph where nodes represent web pages/domains. Edges represent links from one page/domain to another. The edge weight indicates the number of links that exist.
- H-degree - Pages/domains with a high H-degree link to/are linked to by many other pages/domains in the network and are linked to many times by each page.
- Directed H-degree - Pages/domains with a high In-H-degree are linked to by many other pages/domains and each page/domain has many links to them. A high Out-H-degree indicates that the page/domain links to many other web pages/domains and links to them often.
- H-difference - An H-difference of 0 indicates that the page/domain links to other sites about as often as they link to it. A positive H-difference indicates that the page/domain is linked to more often than it links to other neighbors, which would imply that it has a strong reputation. A negative H-difference indicates that the page/domain links to other sites more often than it is linked to by others. This would imply that it does not have a strong reputation.
- H-degree centrality - A high H-degree centrality indicates that a page/domain has more links to/from a high proportion of the pages/domains in the overall network
- H-centralization - If the In-H-centralization is high, it means that a subset of pages/domains are linked to more than the rest of the pages/domains in the network. A high Out-H-centralization indicates that a subset of pages are linking to more of the pages in the network than the rest of the pages/domains in the network. A high H-centralization would mean either the In-H-centralization is high or the Out-H-centralization is high.
- Communication Centrality - A high communication centrality indicates that a page/domain has more links to/from pages that also have more links to/from them.
- Communication centralization - A high communication centralization value in the network means that some pages/domains in the network are more accessible than others, as they have many links to/from other pages/domains that also have many links.
-
Transportation NetworkA direct graph where nodes represent physical locations. Edges represent lanes of transportation directly connecting two physical locations and edge weights represent the volume of users of the transportation path. For example, airports, roads, sea lanes, etc.
- H-degree - A high H-degree indicates that a location has many high traffic paths directly to/from it.
- Directed H-degree - A high In-H-degree indicates that the location has many direct routes to it with a high volume of traffic going to the location. A high Out-H-degree indicates that the location has many routes with traffic leading away from the location.
- H-difference - A negative H-difference indicates that the location has more routes with a high volume leading away from it than towards it. A positive H-difference indicates that the location has a more routes with a high volume of traffic leading towards it. An H-difference near 0 indicates that the number of routes/traffic to/from the location is about the same, which may indicate that it is a way-point for travelers.
- H-degree centrality - A high H-degree centrality indicates that an locations has created many direct high volume routes with a high proportion of the locations in the overall network
- H-centralization - A high In-H-centralization indicates that the network has some locations that are more easily accessible and with higher volume of destination traffic. A high Out-H-centralization indicates that the network has some locations that are more highly accessible and with high traffic heading away from it.
- Communication Centrality - A high communication centrality indicates a location has many high volume routes that also have many high-volume routes.
- Communication centralization - A high communication centrality indicates that the network has more locations that are more easily accessible than others in the network due to having established, high-volume routes with locations with many high volume routes.
-
Social NetworkA directed graph where nodes represent user/organization profiles. Edges represent the strength of a relationship, which is determined by the actions a user takes on another profile (comments, likes, etc.) and edge weights represent the frequency of the occurrences.
- H-degree - A high H-degree indicates that the user has a high number of users who they interact with often or who interact with them often.
- Directed H-degree - A high In-H-degree indicates that the user is interacted with by many users and is interacted with often. A high Out-H-degree indicates that the user interacts with a large number of users and interacts with many of them very often.
- H-difference - A negative H-difference indicates that interacts with more people than they interact with. A positive H-difference indicates that a user is interacted with more than they are interact with others.
- H-degree centrality - A high H-degree centrality indicates that a user has had more interactions with a higher proportion of the users in the overall network
- H- centralization - A social network with a high In-H- centralization indicates that the network has some users that are interacted with more often than the majority of users. These would be celebrity users. A high Out-H- centralization indicates that a network has a subset of users that interact with more of the users significantly more than the rest of the users do. A user with a high H-degree would likely be considered an automated profile
- Communication Centrality - A high communication centrality indicates that user has many interactions/has been interacted with many other users who have also interacted/been interacted with by many users.
- Communication centralization - A high communication centrality indicates that the network has many users that are more easily accessible in the network due to having established relationships (many interactions) with other users with many established relationships
The research below made me want to write up this blog post, and I can attribute all the mathematics I’ve presented to them. I hope you get as much out of them as I did.
Recommended Reading🔗
- H-degree as a Basic Measure in Directed Networks
- Exploring the directed h-degree in directed weighted networks
- Centrality and Network Flow (I got a lot out of this one, as it as helped me to think much more critically about when a centrality is applicable and when it is not).
- A centrality measure for communication ability in weighted network
- The h-index