Background:
Social media networks have experienced a meteoric rise in popularity. Information spread much faster in social networks than any other communication method as of this writing. This ease of information dissemination can be a double-edged sword, however, as social networks can also be used to spread rumors or computer malware. In such circumstances, detecting and determining the source of rumors or misinformation in a social network becomes valuable as a part of an affected party's damage control.
One potential source of information/misinformation may be a result of a node with a high degree of centrality (e.g., a node with a large number of friends on Facebook). This, however, is unlikely, because, in general, every node in a social network has the potential to spread information/misinformation.
It may be possible to use information from a snapshot of infected nodes to identify the source of information/misinformation. This requires the assumption that all nodes in the network monitor and report their status, which is not practical in large-scale social networks. Furthermore, this assumes that the underlying social graph is a regular tree. In general, however, an underlying social graph can be any type of graph.
It may also be possible to use a subset of nodes (called sensors) in the social network to find the source of information/misinformation. The foregoing methods require a large number of nodes in the network to act as sensors which is generally impractical. Furthermore, these methods do not consider the varying inter-node relationship strengths.
Summary:
In view of the foregoing background, a system and method of detecting a source of a rumor in a social media network is disclosed. The system and method involves identifying a plurality of node clusters in the network, each of the plurality of node clusters including a plurality of nodes, each of the plurality of nodes from each of the plurality of node clusters having at least one edge connection defined by a connection to a different node from a same one of the node clusters; identifying a plurality of gateway nodes from each of the plurality of node clusters, each gateway node from each of the plurality of node clusters as having at least one weak tie connection with a corresponding gateway node from a different one of the plurality of node clusters; selecting a subset of the plurality of gateway nodes as sensor nodes; measuring arrival times of the information from a source node at each of the sensor nodes; selecting a candidate node cluster from the plurality of node clusters based on high betweenness centrality, the candidate node cluster having a high probability of including the source node from among its corresponding plurality of nodes; selecting a subset of the plurality of nodes in the candidate cluster as candidate sensor nodes; measuring arrival times of the information from a source node at each of the candidate sensor nodes; and selecting a candidate node from the candidate cluster based on high betweenness centrality, the candidate node having a high probability of being the source node.
FY15-018
Information Assurance/Cyber Security
Alireza Louni Koduvayur Subbalakshmi
David Zimmerman Director of Technology Commercialization Stevens Institute of Technology dzimmer3@stevens.edu