Oral Defense of Doctoral Dissertation – Computational Sciences and Informatics – Network Neighborhood Analysis for Detecting Anomalies in Time Series of Graphs – Suchismita Goswami

Notice and Invitation

Oral Defense of Doctoral Dissertation

Doctor of Philosophy in Computational Sciences and Informatics

Department of Computational and Data Sciences

College of Science

George Mason University

Suchismita Goswami

BNUS, University of Calcutta, 1990

Master of Science, State University of New York, Stony Brook, 2001

Master of Science, George Mason University, 2013


Tuesday, April 2, 2019, 11:00 a.m.

Research Hall, Room 162

All are invited to attend.


Igor Griva, Chair

Edward Wegman, Dissertation Director

Jeff Solka

Dhafer Marzougui

Terabytes of unstructured electronic data are generated every day by Twitter networks, scientific collaborations, organizational e-mails, telephone calls and websites. Excessive communication in such social networks continues to be a major problem. In some cases, such as the Enron e-mails, frequent contact or excessive activity on interconnected networks has led to fraudulent activity. In a social network, anomalies can arise from abrupt changes in the interactions among a group of individuals. Analyzing such changes is therefore important for understanding the behavior of individuals in a subregion of the network. The motivation of this dissertation is to investigate excessive communications, or anomalies, and to make inferences about the dynamic subnetworks in which they occur. Here I present three major contributions of this research toward detecting anomalies in dynamic networks obtained from inter-organizational e-mails.

I develop a two-step scan process that detects excessive activity by using the maximum log-likelihood ratio as a scan statistic, computed over overlapping windows of variable size, to rank clusters. The first step determines the structural stability of the time series, applies differencing and de-seasonalizing operations to make the series stationary, and obtains a primary cluster under a Poisson process model. The second step constructs neighborhood ego subnetworks around the observed primary cluster and obtains a more refined cluster by using the graph invariant betweenness as the locality statistic under a binomial model. I demonstrate that this two-step scan statistics algorithm is more scalable for detecting excessive activity in large dynamic social networks.
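To illustrate the scan statistic in the first step, here is a minimal Python sketch of a maximum log-likelihood ratio scan over a single count series (for example, daily e-mail counts) with overlapping, variable-width windows. It assumes a Poisson model with a uniform baseline, so a window's expected count is proportional to its length, and it works directly on nonnegative counts, leaving out the stationarity preprocessing. The function names are hypothetical, the betweenness-based second step is not shown, and this is the standard Kulldorff-style Poisson likelihood ratio rather than necessarily the exact form used in the dissertation.

```python
import math


def poisson_llr(c, e, total):
    """Kulldorff-style Poisson log-likelihood ratio for a window with
    observed count c and expected count e, given the grand total."""
    # Only windows with an excess over expectation can be anomalous.
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term vanishes when the window holds everything
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def scan_time_series(counts, min_width=1, max_width=None):
    """Scan every overlapping window of width min_width..max_width and return
    (start, end_exclusive, llr) for the window with the largest LLR.

    Assumes a uniform null baseline: the expected count in a window is
    total * width / len(counts)."""
    n = len(counts)
    total = sum(counts)
    max_width = max_width or n
    # Prefix sums give each window's count in O(1).
    prefix = [0]
    for x in counts:
        prefix.append(prefix[-1] + x)
    best = (0, 0, 0.0)
    for width in range(min_width, max_width + 1):
        expected = total * width / n
        for start in range(n - width + 1):
            c = prefix[start + width] - prefix[start]
            llr = poisson_llr(c, expected, total)
            if llr > best[2]:
                best = (start, start + width, llr)
    return best


if __name__ == "__main__":
    daily = [3, 2, 4, 3, 2, 3, 15, 18, 14, 3, 2, 4, 3]
    print(scan_time_series(daily))
```

On the sample series the burst at indices 6 to 8 should rank highest. In practice the significance of the top window is usually assessed by Monte Carlo replication under the null hypothesis, which this sketch omits.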

I implement multivariate time series models for the first time to detect a group of influential people associated with excessive communications, which cannot be assessed using scan statistics models. Because the vertices of the subgraphs are interrelated, I employ a vector autoregressive (VAR) model of time series of subgraphs constructed using the graph edit distance. Anomalies are flagged where the residuals of the fitted time series models exceed three times their standard deviation.
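The residual-threshold rule can be sketched in Python with NumPy. This sketch assumes a VAR model of lag order 1 fitted by ordinary least squares, and assumes each column of the input is one per-subgraph series. The hypothetical `edge_edit_distance` helper stands in for the graph edit distance under a simplification: unit-cost edge insertions and deletions between snapshots over the same labelled vertex set, ignoring vertex operations. Lag selection, stationarity checks and the dissertation's actual subgraph construction are not shown.

```python
import numpy as np


def edge_edit_distance(edges_a, edges_b):
    """Simplified graph edit distance between two snapshots sharing a
    labelled vertex set: each edge insertion or deletion costs 1 and vertex
    operations are ignored, so the distance is the size of the symmetric
    difference of the (directed) edge sets."""
    return len(set(edges_a) ^ set(edges_b))


def fit_var1(Y):
    """Fit y_t = c + A @ y_{t-1} + e_t by ordinary least squares.

    Y has shape (T, k): T time steps, one column per subgraph series.
    Returns (c, A, residuals), with residuals of shape (T - 1, k)."""
    Y = np.asarray(Y, dtype=float)
    X = np.hstack([np.ones((Y.shape[0] - 1, 1)), Y[:-1]])  # intercept + lag-1 values
    target = Y[1:]
    B, *_ = np.linalg.lstsq(X, target, rcond=None)  # B has shape (k + 1, k)
    residuals = target - X @ B
    c, A = B[0], B[1:].T
    return c, A, residuals


def flag_anomalies(Y, k_sigma=3.0):
    """Return (time_index, series_index) pairs whose absolute VAR(1) residual
    exceeds k_sigma times that series' residual standard deviation."""
    _, _, resid = fit_var1(Y)
    sd = resid.std(axis=0, ddof=1)
    rows, cols = np.nonzero(np.abs(resid) > k_sigma * sd)
    # Residual row r belongs to time t = r + 1 of the original series.
    return [(int(r) + 1, int(j)) for r, j in zip(rows, cols)]


if __name__ == "__main__":
    # Synthetic example: three interrelated series with one injected burst.
    rng = np.random.default_rng(0)
    A_true = np.array([[0.5, 0.1, 0.0], [0.0, 0.4, 0.2], [0.1, 0.0, 0.3]])
    Y = np.zeros((200, 3))
    for t in range(1, 200):
        Y[t] = A_true @ Y[t - 1] + rng.normal(size=3)
    Y[150, 0] += 15.0
    print(flag_anomalies(Y))
```

Fitting all series jointly lets each subgraph's prediction draw on the lagged values of the others, which is why a VAR model rather than separate univariate models suits interrelated vertices.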

Finally, I devise a new method for detecting excessive topic activity in the unstructured text of e-mail contents by combining probabilistic topic modeling with scan statistics algorithms. I first identify the major topics discussed using latent Dirichlet allocation (LDA), and then apply scan statistics to detect excessive topic activity using the largest log-likelihood ratio in the neighborhood of the primary cluster.
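A minimal Python sketch of the scan part of this step follows. It assumes LDA has already been fitted elsewhere and each e-mail has been reduced to its single most probable topic label, with e-mails grouped by day and, hypothetically, already restricted to those sent by vertices in the neighborhood of the primary cluster. It scans overlapping, variable-width windows of days for an excess share of one topic using the Kulldorff Bernoulli (binomial) likelihood ratio. That choice of model is an assumption made for illustration, and all function names are hypothetical.

```python
import math


def _xlogy(x, y):
    # x * log(y), with the convention 0 * log(0) = 0.
    return x * math.log(y) if x > 0 else 0.0


def bernoulli_llr(c, n, C, N):
    """Bernoulli-model LLR for a window holding n e-mails, c of them on the
    topic, given C on-topic e-mails among N in total."""
    out_n, out_c = N - n, C - c
    inside = c / n if n else 0.0
    outside = out_c / out_n if out_n else 0.0
    if inside <= outside:  # only an elevated topic share counts
        return 0.0
    ll_alt = (_xlogy(c, inside) + _xlogy(n - c, 1 - inside)
              + _xlogy(out_c, outside) + _xlogy(out_n - out_c, 1 - outside))
    ll_null = _xlogy(C, C / N) + _xlogy(N - C, (N - C) / N)
    return ll_alt - ll_null


def scan_topic(days, topic, min_width=1, max_width=None):
    """days: per-day lists of dominant-topic labels, one label per e-mail.
    Returns (start, end_exclusive, llr) for the window of days with the
    largest LLR for the given topic."""
    hits = [sum(1 for label in day if label == topic) for day in days]
    sizes = [len(day) for day in days]
    C, N, T = sum(hits), sum(sizes), len(days)
    max_width = max_width or T
    best = (0, 0, 0.0)
    for width in range(min_width, max_width + 1):
        for start in range(T - width + 1):
            c = sum(hits[start:start + width])
            n = sum(sizes[start:start + width])
            llr = bernoulli_llr(c, n, C, N)
            if llr > best[2]:
                best = (start, start + width, llr)
    return best


if __name__ == "__main__":
    days = [["a", "b", "b"], ["b", "a"], ["a", "a", "a", "a"],
            ["a", "a", "a"], ["b", "b", "a"], ["b", "b"]]
    print(scan_topic(days, "a"))
```

Here days 2 and 3 consist entirely of topic "a", so that two-day window should score highest. A fuller treatment could use the whole per-document topic mixture rather than a single dominant label.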

These methods provide new ways of detecting excessive communications and topic flow through influential vertices in dynamic networks, and they can be applied to other dynamic social networks to critically investigate excessive activity.