Calendar
During the fall 2018 semester, the Computational Social Science (CSS) and Computational Sciences and Informatics (CSI) programs have merged their seminar/colloquium series, in which students, faculty, and guest speakers present their latest research. These seminars are free and open to the public. The series takes place on Fridays from 3:00 to 4:30 p.m. in the Center for Social Complexity Suite, located on the third floor of Research Hall.
If you would like to join the seminar mailing list, please email Karen Underwood.
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
Karl Battams
Bachelor of Science – Astrophysics, University College London, 2002
Master of Science – Computational Sciences, George Mason University, 2008
Reduction and Synopses of Multi-Scale Time Series with Applications to Massive Solar Data
Monday, July 30, 2018, 11:00 a.m.
Exploratory Hall, Room 3301
All are invited to attend.
Committee
Robert Weigel, Dissertation Director/Chair
Jie Jhang
Robert Meier
Huzefa Rangwala
In this dissertation, we explore new methodologies and techniques applicable to aspects of Big Solar Data to enable new analyses of temporally long, or volumetrically large, solar physics imaging data sets. Specifically, we consider observations returned by two space-based solar physics missions, the Solar Dynamics Observatory (SDO) and the Solar and Heliospheric Observatory (SOHO); the former has operated for over 7 years to date, returning around 1.5 terabytes of data daily, and the latter has been operational for more than 22 years to date. Despite ongoing improvements in desktop computing performance and storage capabilities, temporally and volumetrically massive datasets in the solar physics community continue to be challenging to manipulate and analyze. While historically popular but more simplistic analysis methods continue to provide new insights, the results from those studies are often driven by improved observations rather than by the computational methods themselves. To fully exploit the increasingly high volumes of observations returned by current and future missions, computational methods must be developed that enable reduction, synopsis, and parameterization of observations to reduce the data volume while retaining the physical meaning of those data.
In the first part of this study we consider time series of 4 to 12 hours in length extracted from the high spatial and temporal resolution data recorded by the Atmospheric Imaging Assembly (AIA) instrument on the NASA Solar Dynamics Observatory (SDO). We present a new methodology that enables the reduction and parameterization of full spatial and temporal resolution SDO/AIA data sets into unique components of a model that accurately describes the power spectra of these observations. Specifically, we compute the power spectra of pixel-level time series extracted from derotated AIA image sequences in several wavelength channels of the AIA instrument, and fit one of two models to their power spectra as a function of frequency. This enables us to visualize and study the spatial dependence of the individual model parameters in each AIA channel. We find that the power spectra are well described by at least one of these models for all pixel locations, with unique model parameterizations corresponding directly to visible solar features. Computational efficiency in all aspects of this code is provided by a flexible Python-based Message Passing Interface (MPI) framework that distributes all workloads across the available processing cores. Key scientific results include clear identification of numerous quasi-periodic 3- and 5-minute oscillations throughout the solar corona; identification and new characterizations of the known ~4.0-minute chromospheric oscillation, including a previously unidentified solar-cycle-driven trend in these oscillations; identification of “Coronal Bullseyes”, which present radially decaying periodicities over sunspots and sporadic foot-point regions; and identification of features we label “Penumbral Periodic Voids”, which appear as annular regions surrounding sunspots in the chromosphere, bordered by 3- and 5-minute oscillations but exhibiting no periodic features.
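As a rough sketch of the pixel-level spectral analysis described above, the following toy example computes the power spectrum of a synthetic pixel time series and fits a power law to it. The single power-law model, the 12-second cadence, and all numerical values are illustrative assumptions, not the dissertation's actual two-model pipeline:

```python
import numpy as np

def power_spectrum(ts, dt):
    """One-sided power spectrum of a mean-subtracted pixel time series."""
    ts = ts - ts.mean()
    spec = np.abs(np.fft.rfft(ts)) ** 2
    freq = np.fft.rfftfreq(len(ts), d=dt)
    return freq[1:], spec[1:]  # drop the zero-frequency bin

def fit_powerlaw(freq, spec):
    """Least-squares fit of log P = log A - n log f."""
    slope, intercept = np.polyfit(np.log(freq), np.log(spec), 1)
    return np.exp(intercept), -slope  # amplitude A, spectral index n

# Build a synthetic series whose power spectrum follows P(f) ~ f^-2.
n, dt = 4096, 12.0                 # 12-second cadence (assumed)
f = np.fft.rfftfreq(n, d=dt)[1:]
amps = f ** -1.0                   # amplitude ~ f^-1 gives power ~ f^-2
phases = np.exp(2j * np.pi * np.linspace(0.0, 1.0, f.size))
ts = np.fft.irfft(np.concatenate(([0.0], amps * phases)), n)

freq, spec = power_spectrum(ts, dt)
A, index = fit_powerlaw(freq, spec)
print(round(index, 2))  # → 2.0
```

In the dissertation, a fit of this kind is performed per pixel across derotated image sequences, and the per-pixel model parameters are then mapped spatially.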
The second part of this study considers the entire mission archive returned by the Large Angle Spectrometric Coronagraph (LASCO) C2 instrument, operating for more than 20 years on the joint ESA/NASA Solar and Heliospheric Observatory (SOHO) mission. We present a technique that enables the reduction of this entire data set to a fully calibrated, spatially located time series known as the LASCO Coronal Brightness Index (CBI). We compare these time series to a number of concurrent solar activity indices via correlation analyses to identify relationships between these indices and coronal brightness, both globally across the entire corona and locally over small spatial scales within the corona, demonstrating that the LASCO observations can reliably be used to derive proxies for a number of geophysical indices. Furthermore, via analysis of these time series in the frequency domain, we highlight the effects of long-timescale variability in long solar time series, considering sources of both solar origin (e.g., solar rotation, solar cycle) and instrumental/operational origin (e.g., spacecraft rolls, stray light contamination), and demonstrate how filtering temporally long time series reduces the influence of these uncertain variables on the signals. Primary findings include identification of a strong correlation between coronal brightness and both Total and Spectral Solar Irradiance, leading to the development of a LASCO-based proxy of solar irradiance, as well as identification of significant correlations with several other geophysical indices, with plausible driving mechanisms demonstrated via a newly developed correlation mapping technique. We also present a number of new results regarding LASCO data processing and instrumental stray light that are important to the calibration of the data and to its long-term stability.
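The benefit of removing long-timescale variability before correlating indices can be illustrated with a toy example. The synthetic indices, the running-mean high-pass filter, and all constants below are assumptions for illustration only, not the CBI pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(2000)

# Two synthetic indices sharing a fast ~27-day rotation-like signal but
# carrying different slow drifts (stand-ins for solar-cycle or stray-light
# trends) plus independent noise.
fast = np.sin(2 * np.pi * t / 27.0)
a = fast + 0.002 * t + rng.normal(scale=0.2, size=t.size)
b = fast - 0.001 * t + rng.normal(scale=0.2, size=t.size)

def highpass(x, width=201):
    """Remove slow variability by subtracting a running mean."""
    kernel = np.ones(width) / width
    return x - np.convolve(x, kernel, mode="same")

r_raw = np.corrcoef(a, b)[0, 1]
# Trim the edges, where the running mean is distorted by zero padding.
ha, hb = highpass(a)[300:-300], highpass(b)[300:-300]
r_filt = np.corrcoef(ha, hb)[0, 1]
print(r_filt > r_raw)  # filtering recovers the shared-signal correlation
```

The opposing drifts mask the common signal in the raw correlation; after high-pass filtering, the shared periodic component dominates and the correlation strengthens markedly.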
Computational Social Science Research Colloquium /
Colloquium in Computational and Data Sciences
Robert Axtell, Professor
Computational Social Science Program,
Department of Computational and Data Sciences
College of Science
and
Department of Economics
College of Humanities and Social Sciences
George Mason University
Are Cities Agglomerations of People or of Firms? Data and a Model
Friday, September 28, 3:00 p.m.
Center for Social Complexity, 3rd Floor Research Hall
All are welcome to attend.
Abstract: Business firms are not uniformly distributed over space. In every country there are large swaths of land on which there are very few or no firms, coexisting with relatively small areas on which large numbers of businesses are located—these are the cities. Since the dawn of civilization, the earliest cities have husbanded a variety of business activities. Indeed, often the raison d’être for the growth of villages into towns and then into cities was the presence of weekly markets and fairs facilitating the exchange of goods. City theorists of today tend to see cities as amalgams of people, housing, jobs, transportation, specialized skills, congestion, patents, pollution, and so on, with the role of firms demoted to merely providing jobs and wages. Reciprocally, very little of the conventional theory of the firm is grounded in the fact that most firms are located in space, generally, and in cities, specifically. Consider the well-known facts that both firm and city sizes are approximately Zipf distributed. Is it merely a coincidence that the same extreme size distribution approximately describes firms and cities? Or is it the case that skew firm sizes create skew city sizes? Perhaps it is the other way round, that skew cities permit skew firms to arise? Or is it something more intertwined and complex, the coevolution of firm and city sizes, some kind of dialectical interplay of people working in companies doing business in cities? If firm sizes were not heavy-tailed, but followed an exponential distribution instead, say, could giant cities still exist? Or if cities were not so varied in size, as they were not, apparently, in feudal times, would firm sizes be significantly attenuated? In this talk I develop the empirical foundations of this puzzle, one that has been little emphasized in the extant literatures on firms and cities, probably because these are, for the most part, distinct literatures.
I then go on to describe a model of individual people (agents) who arrange themselves into both firms and cities in approximate agreement with U.S. data.
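For readers unfamiliar with the terminology: a Zipf size distribution means the r-th largest city (or firm) has size roughly proportional to 1/r, so a rank-size plot is a straight line of slope -1 on log-log axes. A minimal, purely illustrative sketch (the leading constant is arbitrary):

```python
import numpy as np

# Zipf's law: size of the r-th largest unit ~ C / r, so
# log(size) vs. log(rank) is a line of slope -1.
ranks = np.arange(1, 1001)
sizes = 8.6e6 / ranks  # C chosen arbitrarily for illustration
slope = np.polyfit(np.log(ranks), np.log(sizes), 1)[0]
print(round(slope, 3))  # → -1.0
```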
Computational Social Science Research Colloquium /
Colloquium in Computational and Data Sciences
Maciej Latek, Chief Technology Officer, trovero.io
Ph.D. in Computational Social Science, 2011
George Mason University
Industrializing multi-agent simulations:
The case of social media marketing, advertising and influence campaigns
Friday, October 12, 3:00 p.m.
Center for Social Complexity, 3rd Floor Research Hall
All are welcome to attend.
Abstract: System engineering approaches required to transition multi-agent simulations out of science into decision support share features with AI, machine learning and application development, but also present unique challenges. In this talk, I will use trovero as an example to illustrate how some of these challenges can be addressed.
As a platform that helps advertisers and marketers plan and implement campaigns on social media, trovero comprises social network simulations, used for optimization and automation, and network population synthesis, used to preserve people’s privacy while maintaining a robust picture of social media communities. The social network simulations forecast campaign outcomes and pick the right campaigns for given KPIs. Simulation is the only viable way to reliably forecast campaign outcomes: big-data methods fail at this task because they are fundamentally unfit for social network data. Network population synthesis enables working with aggregate data without relying on data-sharing agreements with social media platforms, which are ever more reluctant to share user data with third parties after GDPR and the Cambridge Analytica debacle.
I will outline how these two approaches complement one another, what computational and data infrastructure is required to support them and how workflows and interactions with social media platforms are organized.
Computational Social Science Research Colloquium /
Colloquium in Computational and Data Sciences
J. Brent Williams
Founder and CEO
Euclidian Trust
Improved Entity Resolution as a Foundation for Model Precision
Friday, November 2, 3:00 p.m.
Center for Social Complexity, 3rd Floor Research Hall
All are welcome to attend.
Abstract: Analyzing behavior, identifying and classifying micro-differentiations, and predicting outcomes rely on the establishment of a core foundation of reliable and complete data linking. Whether the data describe individuals, families, companies, or markets, acquiring data from orthogonal sources results in significant matching challenges. These challenges are difficult because attempts to eliminate (or minimize) false positives yield an increase in false negatives, and the converse is also true.
This discussion will focus on the business challenges in matching data and their primary and compounded impact on subsequent outcome analysis. Drawing on practical experience, the speaker led the development and first commercialization of a novel approach to “referential matching.” This approach leads to a more comprehensive unit data model (patient, customer, company, etc.), which enables greater computational resolution and model accuracy through hyper-accurate linking, disambiguation, and detection of obfuscation. The discussion also covers the impact of enumeration strategies, data obfuscation/hashing, and natural changes in unit data models over time.
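To make the false-positive/false-negative tension concrete, here is a minimal sketch of threshold-based name matching using Python's standard library. The record pairs and labels are hypothetical, and real referential matching systems are far more sophisticated than a single string-similarity threshold:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalized edit-based similarity between two name strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical record pairs labeled as true matches / non-matches.
pairs = [
    ("Jon Smith",  "John Smith",       True),
    ("ACME Corp",  "Acme Corporation", True),
    ("Mary Jones", "Marie Jons",       True),
    ("Bob Lee",    "Ann Chu",          False),
    ("Delta LLC",  "Delta Airlines",   False),
]

def counts(threshold):
    """False positives and false negatives at a given match threshold."""
    fp = sum(1 for a, b, m in pairs if not m and similarity(a, b) >= threshold)
    fn = sum(1 for a, b, m in pairs if m and similarity(a, b) < threshold)
    return fp, fn

for t in (0.5, 0.7, 0.9):
    print(t, counts(t))
```

Raising the threshold drives false positives down while pushing false negatives up, which is exactly the trade-off the talk addresses.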
There will be no Computational Social Science Research Colloquium /Colloquium in Computational and Data Sciences talk on Friday, November 23 due to Thanksgiving break.
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
Joseph Shaheen
Bachelor of Science, Murray State University, 2003
Master of Professional Studies, Georgetown University, 2011
Master of Business Administration, Georgetown University, 2013
Data Explorations in Firm Dynamics:
Firm Birth, Life, & Death Through Age, Wage, Size & Labor
Monday, November 26, 2018, 12:30 p.m.
Research Hall
All are invited to attend.
Committee
Robert Axtell, Dissertation Director
Eduardo Lopez
John Shortle
William Rand
Marc Smith
A better understanding of firm birth, life, and death yields a richer picture of firms’ life cycles and dynamical labor processes. Through “big data” analysis of a collection of universal fundamental distributions, beginning with firm age, wage, and size, I discuss stationarity, their functional forms, and the consequences emanating from their defects. I describe and delineate the potential complications of the firm age defect (caused by the Great Recession) and speculate on a stark future in which a single firm may control the U.S. economy. I follow with an analysis of firm sizes, tensions in heavy-tailed model fitting, how firm growth depends on firm size, and, consequently, the apparent conflict between empirical evidence and Gibrat’s Law. Also included is an introduction to the U.S. firm wage distribution. The ever-changing nature of firm dynamical processes played an important role in selecting the conditional distributions of age and size, and of wage and size, in my analysis. A closer look at these dynamical processes reveals the role played by mode wage and mode size in the dynamical processes of firms, and thus in the firm life cycle. Analysis of firm labor offers preliminary evidence that the firm labor distribution conforms to scaling properties, i.e., that it is power-law distributed. Moreover, I report empirical evidence supporting the existence of two separate and distinct labor processes, dubbed labor regimes, a primary and a secondary, coupled with a third, unknown regime. I hypothesize that this unknown regime must be drawn from the primary labor regime, and that it is either emergent from systemic fraudulent activity or an artifact of data corruption. The collection of explorations in this dissertation provides a fuller, richer picture of firm birth, life, and death through age, wage, size, and labor, while advancing our understanding of firm dynamics in many directions.
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
Doug Reitz
Bachelor of Science, Pennsylvania State University, 1995
Master of Science, Binghamton University, 2007
Atomistic Monte Carlo Simulation and
Machine Learning Data Analysis of
Eutectic Alkali Metal Alloys
Tuesday, November 27, 2018, 10:00 a.m.
Research Hall, Room 92
All are invited to attend.
Committee
Estela Blaisten-Barojas, Dissertation Director
Igor Griva
Dmitri Klimov
Howard Sheng
Combining atomistic simulations and machine learning techniques can significantly expedite the materials discovery process. Here an application of such a methodological combination to the prediction of the configuration phase (liquid, amorphous solid, and crystalline solid), melting transition, and amorphous-solid behavior of three eutectic alkali metal alloys (Na-K, Na-Cs, K-Cs) is presented. It is shown that efficient prediction of these properties is possible via machine learning methods trained on topological local structural properties alone. The atomic configurations resulting from Monte Carlo annealing of the eutectic alkali alloys are analyzed with topological attributes based on the Voronoi tessellation, using expectation-maximization clustering, Random Forest classification, and Support Vector Machine classification. It is shown that the Voronoi topological fingerprints enable an accurate and fast prediction of the alloy thermal behavior by cataloging the atomic configurations into three distinct phases: liquid, amorphous solid, and crystalline solid. Using as few as eight topological features, the configurations can be categorized into these three phases. With the proposed metrics, arrest-motion and melting temperature ranges are identified through a top-down clustering of the atomic configurations cataloged as amorphous solid and liquid.
The methodology presented here is of direct relevance for identifying or screening unknown materials in a targeted class with a desired combination of topological properties, efficiently and with high fidelity. The results demonstrate explicitly the exceptional power of domain-based machine learning in discovering topological influences on thermodynamic properties, while providing valuable guidance for machine learning workflows in the analysis of other condensed systems. This statistical learning paradigm is not restricted to eutectic alloys or thermodynamics; it extends the utility of topological attributes in a significant way and facilitates the discovery of new material properties.
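As a simplified sketch of the core idea, clustering configurations by a handful of topological features, the example below runs plain k-means on synthetic 8-component feature vectors. The dissertation itself uses expectation-maximization clustering, Random Forest, and SVM on real Voronoi attributes, so everything here is an illustrative stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for 8-component Voronoi topological fingerprints:
# three well-separated "phases" (liquid, amorphous, crystalline), 50
# configurations each.
centers = rng.normal(size=(3, 8)) * 5.0
labels_true = np.repeat([0, 1, 2], 50)
X = centers[labels_true] + rng.normal(scale=0.3, size=(150, 8))

def kmeans(X, init, iters=25):
    """Plain k-means: assign to nearest centroid, recompute, repeat."""
    cent = X[init].copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - cent[None, :, :], axis=2)
        lab = d.argmin(axis=1)
        cent = np.array([X[lab == j].mean(axis=0) for j in range(len(init))])
    return lab

# Seed with one configuration from each phase (illustrative shortcut).
labels = kmeans(X, init=[0, 50, 100])
purity = sum(np.bincount(labels_true[labels == j]).max()
             for j in range(3)) / len(X)
print(round(purity, 2))  # phase-pure clusters for well-separated features
```

When the feature-space phases are well separated, as the dissertation reports for the Voronoi fingerprints, even this naive clustering recovers the phase labels almost perfectly.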
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
Yang Xu
Bachelor of Science, Nanjing Normal University, 2006
Master of Science, University of Nebraska-Lincoln, 2009
Almost Regular Graphs and Hamiltonian Cycles
Tuesday, December 4, 2018, 3:00 p.m.
Research Hall, Room 92
All are invited to attend.
Committee
Edward Wegman, Dissertation Director
Eduardo Lopez
Geir Agnarrson
Joseph Mar
This dissertation is the third in a series aimed at finding a method to optimize computer architectures for robustness and efficiency. HADI graphs were first introduced in Hadi Rezazad’s dissertation and were further examined in Roger Shores’ dissertation. This dissertation explores this particular class of graph structure in detail and gives it a precise mathematical definition. HADI graphs are a subset of almost regular graphs with certain invariants. A bound on the number of edges is presented that ensures the new structure is Hamiltonian. Another interesting alternative interconnect graph, the hypercube, is also discussed in this dissertation. The main focus is to find how many edges can be removed while still retaining the Hamiltonian property.
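The hypercube's Hamiltonicity can be demonstrated constructively with the reflected binary Gray code, whose successive codewords differ in exactly one bit and therefore trace a Hamiltonian cycle on the n-cube Q_n. A short sketch (Q_4 chosen for illustration):

```python
def gray_cycle(n):
    """Reflected binary Gray code: visits all 2^n vertices of Q_n such that
    successive codewords (including last back to first) differ in one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n)]

def is_hamiltonian_cycle(codes, n):
    """Check the sequence visits every vertex once and each step is an edge."""
    ok_vertices = sorted(codes) == list(range(2 ** n))
    one_bit = all(bin(codes[i] ^ codes[(i + 1) % len(codes)]).count("1") == 1
                  for i in range(len(codes)))
    return ok_vertices and one_bit

n = 4
cycle = gray_cycle(n)
print(is_hamiltonian_cycle(cycle, n))  # → True
print(len(cycle), n * 2 ** (n - 1))    # cycle edges (16) vs. edges in Q_4 (32)
```

Since the cycle uses only 2^n of the n·2^(n-1) hypercube edges, for Q_4 half of the 32 edges could in principle be deleted while this particular Hamiltonian cycle survives; the dissertation's question is how far such removal can go in general.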
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
Suchismita Goswami
BNUS, University of Calcutta, 1990
Master of Science, State University of New York, Stony Brook, 2001
Master of Science, George Mason University, 2013
Network Neighborhood Analysis for Detecting
Anomalies in Time Series of Graphs
Tuesday, April 2, 2019, 11:00 a.m.
Research Hall, Room 162
All are invited to attend.
Committee
Igor Griva, Chair
Edward Wegman, Dissertation Director
Jeff Solka
Dhafter Marzougui
Terabytes of unstructured electronic data are generated every day from Twitter networks, scientific collaborations, organizational emails, telephone calls, and websites. Excessive communication in such social networks continues to be a major problem. In some cases, for example the Enron e-mails, frequent contact or excessive activity on interconnected networks leads to fraudulent activities. In a social network, anomalies can occur as a result of abrupt changes in the interactions among a group of individuals. Analyzing such changes in a social network is thus important for understanding the behavior of individuals in a subregion of a network. The motivation of this dissertation is to investigate excessive communications and anomalies and to make inferences about the dynamic subnetworks involved. Here I present the three major contributions of this research toward detecting anomalies in dynamic networks obtained from interorganizational emails.
I develop a two-step scan process to detect excessive activities, invoking the maximum log-likelihood ratio as a scan statistic with overlapping and variable window sizes to rank the clusters. The initial step is to determine the structural stability of the time series, perform differencing and de-seasonalizing operations to make the time series stationary, and obtain a primary cluster with a Poisson process model. I then construct neighborhood ego subnetworks around the observed primary cluster to obtain a more refined cluster, invoking the graph invariant betweenness as the locality statistic under a binomial model. I demonstrate that the two-step scan statistics algorithm is more scalable in detecting excessive activities in large dynamic social networks.
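A minimal sketch of the window-scanning idea under a Poisson model follows. The one-sided log-likelihood-ratio form, the window widths, and the synthetic counts are illustrative assumptions, not the dissertation's exact statistic:

```python
import math

def poisson_llr(count, expected):
    """One-sided Poisson log-likelihood ratio for an elevated window."""
    if count <= expected or count == 0:
        return 0.0
    return count * math.log(count / expected) - (count - expected)

def scan(series, rate, windows=(4, 8, 12)):
    """Slide overlapping, variable-width windows over a stationary count
    series and rank them by the LLR scan statistic (largest first)."""
    hits = []
    for w in windows:
        for start in range(len(series) - w + 1):
            c = sum(series[start:start + w])
            hits.append((poisson_llr(c, rate * w), start, w))
    return sorted(hits, reverse=True)

# Stationary background of ~2 events/step with a burst at steps 20-27.
series = [2] * 50
for t in range(20, 28):
    series[t] = 9

top = scan(series, rate=2.0)[0]
print(top[1:])  # → (20, 8): the top-ranked window matches the burst
```

The highest-ranked window coincides with the injected burst, illustrating how the maximum LLR localizes a primary cluster before the ego-network refinement step.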
I implement multivariate time series models, for the first time, to detect groups of influential people associated with excessive communications, which cannot be assessed using scan statistics models. I employ a vector autoregressive (VAR) model of the time series of subgraphs, constructed using the graph edit distance, as the nodes or vertices of the subgraphs are interrelated. Anomalies are assessed using residuals greater than three times the standard deviation obtained from the fitted time series models.
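The residual-thresholding idea can be sketched with a univariate AR(1) stand-in for the VAR models; the synthetic series and injected shock below are assumptions, and the dissertation's actual models are multivariate and built on graph edit distances:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a graph-distance time series: AR(1) dynamics
# with one injected shock (the "anomaly") at t = 70.
n, phi = 200, 0.6
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(scale=0.5)
x[70] += 6.0

# Fit AR(1) by least squares, then flag residuals beyond three sigma.
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
resid = x[1:] - phi_hat * x[:-1]
sigma = resid.std()
anomalies = np.where(np.abs(resid) > 3 * sigma)[0] + 1  # shift to x's index
print(anomalies)  # the injected shock at t = 70 is flagged
```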
Finally, I devise a new method for detecting excessive topic activity in the unstructured text of e-mail contents by combining probabilistic topic modeling with scan statistics algorithms. I first identify the major topics discussed using latent Dirichlet allocation (LDA) modeling, and then apply scan statistics, using the largest log-likelihood ratio in the neighborhood of the primary cluster, to find excessive topic activity.
These processes provide new ways of detecting the excessive communications and topic flow through the influential vertices in dynamic networks, and can be employed in other dynamic social networks to critically investigate excessive activities.
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Social Science
Department of Computational and Data Sciences
College of Science
George Mason University
Gary Keith Bogle
Bachelor of Arts, University of California, Davis, 1990
Master of Arts, University of Illinois at Urbana-Champaign, 1995
Master of Science, Marymount University, 2003
Polity Cycling in Great Zimbabwe via Agent-Based Modeling:
The Effects of Timing and Magnitude of External Factors
Thursday, April 11, 2019, 1:00 p.m.
Research Hall, Room 92
All are invited to attend.
Committee
Claudio Cioffi-Revilla, Chair
William Kennedy
Amy Best
This research explores polity cycling at the site of Great Zimbabwe. It lays out the possibilities that may explain what is seen in the archaeological record by modeling which external factors, operating at specific times and magnitudes, can cause a rapid rise and decline in the polity. This is explored in terms of the attachment that individuals feel towards the small groups of which they are a part, and the change in this attachment in response to their own resources and the history of success that the group enjoys in conducting collective action. The model presented in this research is based on the Canonical Theory of politogenesis. It is implemented as an agent-based model, as this type of model excels at generating macro-level behavior from micro-level decisions. The results of this research cover the relationship between environmental inputs and the pattern of growth and decline of groups; the differences in group fealty and resources between successful and unsuccessful groups; the change in the number of groups throughout the simulation; and the relationship between the probability of success in collective action and the success of the groups themselves. The input parameters to the model presented here are the collective action frequency (CAF) and the environmental effect multiplier. The results show that a prehistoric polity can be modeled to demonstrate a sharp rise and fall in community groups, and that this rise and fall emerges from individual decision-making. Different sets of input parameters represent different environmental conditions, from stable and predictable, to less stable, to quite unpredictable. Regardless of the environmental variability, the overall value of fealty experienced by community members moves in a similar fashion for all input sets. However, the more stable environment of Set A means the overall feelings of attachment to leadership do not fall as fast as they do in the more variable environments.
In all, there is a two-stage process in which members of the community are sorted into the surviving groups. Success in collective action leads to overall group success. The significance of this research is that it provides a basis for understanding that, while the archaeological record is incomplete, what happened at Great Zimbabwe lies within the range of what has happened in other areas. What seems at first glance to be unusual can be explained through the expected environmental and social factors that affect prehistoric societies on other continents. Furthermore, this research provides a basis for further quantifying the analysis of prehistoric societies by offering a model for laying out external factors along the lines of collective action frequencies and environmental effect multipliers.