Announcing the Final Examination of Mr. James Scanlon for the degree of Doctor of Philosophy
Date: November 5, 2007
Time: 1:00 pm
Room: HEC 356 (EECS Conference Room)
Dissertation title: Identifying Communities in Complex Networks: A local Approach
Abstract
Identification and discovery of communities has long been recognized as an important problem in the social sciences. In recent years, researchers have realized that a wide variety of other complex networks share some structural characteristics that are reminiscent of social networks. As a result, the nature and identification of community structures in complex networks is an active area of current research. A new definition for communities in complex networks is proposed and a new method for their identification is presented. Given a node within a network, the proposed method finds all communities within a bounded distance of that node thus detecting all communities to which the node may belong.
A review of complex networks and some of the properties that they have been found to share is provided. Then, random graph models used to generate synthetic complex networks are discussed. In addition, existing definitions and algorithms for communities in complex networks are reviewed.
A definition of community is proposed where a community is defined as a subgraph having maximum average degree among all of its subgraphs or containing graphs. A local view of this definition is provided by considering its restriction to those subgraphs induced by nodes within a bounded distance from a given node. This definition is distinguished by both its local nature and its allowance of overlapping communities. A process is provided for the identification of subgraphs that approximately meet the proposed definition. Letting H be a subgraph in the vicinity of the node, it is shown that that the time complexity of finding a community with the proposed method is O(mH2) where mH is the number of edges in H.
A new algorithm for the production of synthetic networks having known community structure is proposed. This algorithm produces networks having the the power-law degree distribution that is a characteristic of real, complex networks. Using synthetic networks created with this algorithm the results of the community identification process proposed are contrasted with a well-regarded divisive method.
The effectiveness of the proposed method is illustrated by identifying communities within citation graphs. It is shown that the proposed method provides a better prediction of article similarity than does author-supplied classification.
The proposed method is applied to subgraphs obtained from the world wide web. Having more than two million nodes and eighty million edges, these subgraphs demonstrate the scalability of the method. The method finds communities in almost all cases. The nature of the discovered communities is discussed, as well as some of the unique challenges posed in community detection in the world-wide-web.
Outline of Studies:
Major: Computer Science
Educational Career:
B.S., 1975, Massachusetts Institute of Technology, Cambridge, Massachusetts
Committee in Charge:
Dr. Narsingh Deo, Chair
Dr. Charles E. Hughes
Dr. Ronald Dutton
Dr. Liqiang Ni
Approved for distribution by Dr. Narsingh Deo, Committee Chair, on October 25, 2007.
The public is welcome to attend.
The public is welcome to attend.
|