Calendar
For the fall 2018 semester, the Computational Social Science (CSS) and Computational Sciences and Informatics (CSI) programs have merged their seminar/colloquium series, in which students, faculty, and guest speakers present their latest research. These seminars are free and open to the public. The series takes place on Fridays from 3:00 to 4:30 p.m. in the Center for Social Complexity Suite, located on the third floor of Research Hall.
If you would like to join the seminar mailing list, please email Karen Underwood.
COLLOQUIUM ON COMPUTATIONAL SCIENCES AND INFORMATICS
Sean Mallon, Associate Vice President
Entrepreneurship and Innovation
George Mason University
and
Eric Koefoot
Founder and CEO of PublicRelay
The Journey and Stories of a Data Science Entrepreneur
Monday, February 19, 4:30-5:45 p.m.
Exploratory Hall, Room 3301
This session will feature a conversation between Sean Mallon, Mason’s AVP for Entrepreneurship and Innovation, and Eric Koefoot, founder and CEO of PublicRelay, a venture-backed data analytics and media intelligence startup based in McLean, VA. The discussion will explore a wide range of topics: what inspired the initial business idea, customer discovery, product development challenges, fundraising, customer acquisition strategies, and much more. This will be a highly interactive seminar, and participants are encouraged to come with questions and personal experiences to share.

Sean Mallon Bio: Sean Mallon is Mason’s Associate Vice President for Entrepreneurship and Innovation. Before joining Mason in 2016, Sean spent many years as an entrepreneur and early-stage technology investor. Sean holds a Bachelor’s in History from Princeton and an MBA from the Wharton School of the University of Pennsylvania.
Eric Koefoot Bio: Formerly the CEO of U.S. News Ventures, CEO at Five Star Alliance, CFO and later VP Global Sales at Washington Post Digital, Eric is the founder and CEO of PublicRelay and brings substantial media experience and understanding. Eric holds a Bachelor’s in Engineering from MIT and an MBA from the Sloan School at MIT.
COMPUTATIONAL SOCIAL SCIENCE FRIDAY SEMINAR
Qing Tian, Assistant Professor
Computational and Data Sciences
George Mason University
Introduction to R for Computational and Data Science
Friday, February 23, 3:00 p.m.
Center for Social Complexity Suite
3rd Floor, Research Hall
ABSTRACT: R is a programming language and free software environment for statistical computing and graphics, and its popularity has increased substantially in recent years. In addition to classical statistical analysis functionality, it offers a wide range of packages for data mining and machine learning, text analysis, spatial statistics, social network analysis, and more. This seminar will focus mostly on a suite of R packages designed to facilitate the visualization of data, including social networks. These visualization functions are useful for exploratory analysis of real-world data as well as output data from simulations.
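As a rough illustration of the kind of exploratory visualization of network data the seminar describes, the following minimal sketch uses Python’s networkx and matplotlib; the seminar itself focuses on R packages, which are not reproduced here, and the data below is just a built-in example graph.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Built-in example network (Zachary's karate club), standing in for real data.
G = nx.karate_club_graph()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Exploratory view 1: force-directed layout of the network.
nx.draw_networkx(G, pos=nx.spring_layout(G, seed=1), node_size=80,
                 with_labels=False, ax=ax1)
ax1.set_title("Network layout")

# Exploratory view 2: degree distribution, a first look at connectivity.
degrees = [d for _, d in G.degree()]
ax2.hist(degrees, bins=range(1, max(degrees) + 2))
ax2.set_title("Degree distribution")

plt.tight_layout()
plt.show()
```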
COLLOQUIUM ON COMPUTATIONAL SCIENCES AND INFORMATICS
Eduardo Lopez, Assistant Professor
Department of Computational and Data Sciences
George Mason University
A Network Theory of Inter-Firm Labor Flows
Monday, March 5, 4:30-5:45 p.m.
Exploratory Hall, Room 3301
Abstract: Using detailed administrative microdata for two countries, we build a modeling framework that yields new explanations for the origin of firm sizes, the firm contributions to unemployment, and the job-to-job mobility of workers between firms. Firms are organized as nodes in networks whose connections represent low mobility barriers for workers. These labor flow networks are determined empirically and serve as the substrate in which workers transition between jobs. We show that highly skewed firm size distributions are a direct consequence of the connectivity of firms. Further, our model permits the reconceptualization of unemployment as a local phenomenon, induced by individual firms, leading to the notion of firm-specific unemployment, which is also highly skewed. By coupling the study of job mobility and firm dynamics, the model provides a new analytical tool for industrial organization and may make it possible to synthesize more targeted policies for managing job mobility.
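As an illustrative sketch of the mechanism the abstract describes (firms as nodes, edges as low mobility barriers, workers transitioning along them), the following Python snippet simulates job-to-job moves on a toy network; it is not the authors’ calibrated model, and the network and parameters are arbitrary placeholders.

```python
import random
from collections import Counter
import networkx as nx

random.seed(1)

# Toy labor flow network: firms are nodes, edges are low mobility barriers.
G = nx.barabasi_albert_graph(n=200, m=2)   # heterogeneous firm connectivity
workers = range(5000)
employer = {w: random.choice(list(G.nodes)) for w in workers}  # initial jobs

# Workers make job-to-job transitions only along network edges.
for _ in range(50000):
    w = random.choice(workers)
    neighbors = list(G.neighbors(employer[w]))
    if neighbors:
        employer[w] = random.choice(neighbors)

# Firm "size" = number of employees; well-connected firms tend to end up largest,
# producing a skewed size distribution.
firm_sizes = Counter(employer.values())
print("five largest firms:", firm_sizes.most_common(5))
```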
COLLOQUIUM ON COMPUTATIONAL SCIENCES AND INFORMATICS
James Glasbrenner, Assistant Professor
Department of Computational and Data Sciences
George Mason University
Using data science and materials simulations to control the corkscrew magnetism of MnAu₂
Monday, March 19, 4:30-5:45 p.m.
Exploratory Hall, Room 3301
Materials occupy a foundational role in our society, from the silicon-based chips in our smartphones to the metals used to manufacture automobiles and construct buildings. The sheer variety in materials properties enables this wide range of use, and studying the atoms that bond together to form solids reveals the microscopic origin behind these properties. Remarkably, many properties can be traced to the behavior of and interaction between electrons, and computational simulations such as density functional theory calculations are used to study the features and macroscopic effects of this electronic structure. This computational approach can be further enhanced through recent advances in data science, which provide powerful tools and methods for analyzing and modeling data and for handling and storing large datasets.
In this talk, I will: 1) introduce the basic concepts of computational materials science and density functional theory in an accessible manner, and 2) present calculations on the material MnAu₂ where I use density functional theory and modeling to analyze its magnetic properties. The MnAu₂ structure is layered and its magnetic ground state forms a noncollinear corkscrew that rotates approximately 50° between neighboring manganese layers. Using the results of my calculations, I will explain the electronic origin of this corkscrew state and how to control its angle using external pressure and chemical substitution. In addition to discussing the electron physics, I will place a particular emphasis on the connection between data science and how modeling was used to analyze and interpret the density functional theory calculations. This will include a new, critical reexamination of my model fitting procedure using cross-validation and feature selection techniques, which will formally test the underlying assumptions I made in the original study.
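Since the talk highlights re-examining a model fitting procedure with cross-validation and feature selection, the following scikit-learn sketch shows what such a check can look like in Python; the data, feature count, and model form below are placeholders, not the actual DFT-derived quantities from the study.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: rows = calculations, columns = candidate model terms,
# target = quantity being modeled (e.g., a corkscrew angle). Placeholder only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = X[:, 0] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=60)

model = Pipeline([
    ("select", SelectKBest(score_func=f_regression, k=3)),  # feature selection
    ("fit", LinearRegression()),                             # simple linear model
])

# Cross-validation tests whether the fitted model generalizes beyond the data
# it was fit on, rather than merely interpolating it.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean())
```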
COMPUTATIONAL SOCIAL SCIENCE FRIDAY SEMINAR
Ryan Zelnio, Ph.D.
Chief Analytics Officer
Office of Naval Research
The Creation of the Office of Naval Research’s Data & Analytics Lab
Friday, March 23, 3:00-4:30 p.m.
Center for Social Complexity Suite
Research Hall, 3rd Floor
The Office of Naval Research (ONR) coordinates, executes, and promotes the science and technology programs of the United States Navy and Marine Corps. It administers the Naval Research Enterprise (NRE) investment portfolio of $2B annually in Naval-relevant science and technology (S&T), ranging from basic research to technology prototyping. This portfolio covers over 3,000 grant and contract awards annually across a wide variety of technologies. In FY2017 alone, the basic and applied research portfolio (which is less than 50% of its budget) funded 4,411 scientific articles, 2,732 conference papers, 343 theses, 204 books and book chapters, and 88 patents. However, while this portfolio is large, it is a drop in the bucket within the global research and development (R&D) enterprise. In an attempt to understand the vast amount of data being produced both within the NRE and globally, ONR recently stood up the Data & Analytics Lab. Its mission is to support strategic decision making at the Office of Naval Research with in-depth analysis of the NRE portfolio to enhance mission effectiveness for U.S. Naval Forces. The new lab is led by Mr. Matt Poe and includes Dr. Ryan Zelnio (2013 GMU SPP graduate) serving as Chief Analytics Officer and LCDR Nick Benes serving as Chief Data Officer. The lab seeks to harness ONR’s investments in social network analysis, machine learning, natural language processing, data visualization, supervised and unsupervised clustering, and many other data science tools to support decision processes across the NRE. The talk will cover the range of challenges the lab faces as it stands up its effort and discuss the broader move within the federal government to better apply the tools of data science to understand the complexity of the R&D enterprise. It will also cover future partnering and internship opportunities.
COLLOQUIUM ON COMPUTATIONAL SCIENCES AND INFORMATICS
Dr. Peer Kröger, Professor
Chair of Database Systems and Data Mining
Ludwig-Maximilians-University Munich
TBA
Monday, March 26, 4:30-5:45 p.m.
Exploratory Hall, Room 3301
Details coming soon….
COMPUTATIONAL SOCIAL SCIENCE FRIDAY SEMINAR
By Christopher Carroll and Jacquelyn Kazil
Christopher Carroll, Professor
Department of Economics
Johns Hopkins University
Title: Introduction to The Economics ARK (Algorithmic Repository and toolKit)
Abstract:
The Econ-ARK/HARK toolkit is a modular and extensible open source toolkit for solving, simulating, and estimating heterogeneous-agent (HA) models in economics and the social sciences. Although the value of models of this kind has been clear both to academics and to policymakers for a long time, the code for implementing such models has so far been handcrafted and idiosyncratic. As a result, it may take years of human capital development for a new researcher to become proficient enough in these methods to contribute to the literature. The seminar will describe how the Heterogeneous Agents Resources and toolKit (HARK) eases this burden by providing a robust framework in which canonical examples of such models are solved. The toolkit provides object-oriented tools for representing heterogeneous agents, solution methods for solving or characterizing their dynamic choice problems, and a framework for representing the environment in which agents interact. The aim of the toolkit is to become the go-to resource for heterogeneous agent modelers, by providing a well-designed, well-documented, and powerful platform in which they can develop their own work in a robust and replicable manner.
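To illustrate the kind of object-oriented structure the abstract describes (agent types that bundle preferences with solve and simulate methods), here is a deliberately simplified Python sketch; the class and method names are hypothetical placeholders and do not reproduce HARK’s actual API.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical, minimal structure only -- not HARK's actual classes or solvers.
@dataclass
class ConsumerType:
    """A heterogeneous agent with its own preferences."""
    discount_factor: float
    risk_aversion: float

    def solve(self, grid):
        # Placeholder "solution": consume a fixed share of cash on hand.
        share = 1.0 - self.discount_factor
        return lambda m: share * m

    def simulate(self, policy, periods=5, m0=1.0):
        m, history = m0, []
        for _ in range(periods):
            c = policy(m)
            history.append(c)
            m = (m - c) * 1.03 + 1.0   # crude budget transition, for illustration
        return history

# Heterogeneity: a list of agent types with different parameters.
agents = [ConsumerType(0.96, 2.0), ConsumerType(0.90, 3.0)]
for a in agents:
    policy = a.solve(grid=np.linspace(0, 20, 50))
    print(a, a.simulate(policy)[:3])
```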
Bio:
I am a professor of economics at JHU and co-chair of the National Bureau of Economic Research’s working group on the Aggregate Implications of Microeconomic Consumption Behavior. Originally from Knoxville, Tennessee, I received my A.B. in Economics from Harvard University in 1986 and a Ph.D. from the Massachusetts Institute of Technology in 1990. After graduating from M.I.T., I worked at the Federal Reserve Board in Washington, DC, where I prepared forecasts for consumer expenditure. I moved to Johns Hopkins University in 1995 and also spent 1997-98 working at the Council of Economic Advisers in Washington, where I analyzed Social Security reform proposals, tax and pension policy, and bankruptcy reform. Aside from my current work at Hopkins and the NBER, I am also an associate editor at the Review of Economics and Statistics (ReStat), the Journal of Business and Economic Statistics (JBES), and the Berkeley Electronic Journal of Macroeconomics (BEJM).
My research has primarily focused on consumption and saving behavior, with an emphasis on reconciling the empirical evidence from both microeconomic and macroeconomic sources with theoretical models. (In addition to articles in economic journals, I’ve authored Encyclopedia Britannica articles on consumption related topics.) My most recent research has focused on the dynamics of expectations formation, particularly on how expectations reflect households’ learning from each other and from experts. This focus flows from a career-long interest in consumer sentiment and its determinants.
Jacquelyn Kazil
CSS PhD Student
Title: Mesa, an Agent-Based Modeling Library in Python 3
Abstract:
Python has grown significantly in the scientific community, but until recently there was no reusable framework for agent-based modeling (ABM) in Python. While there are well-established frameworks in other languages, the lack of one in Python is at odds with the language’s growth in the scientific community. As a result, we created Mesa, an ABM framework in Python 3 with sustained community contributions. Mesa is built to be modular: the backend server, the frontend visualization and tooling, the batch runner, and the data collector are separate components that can be upgraded independently of one another. In addition, Mesa is extensible and designed to be decoupled from domain-specific add-ons, which empowers the community to develop features and add-ons independently of the core Mesa library. In this talk, Jackie will set the stage for her Ph.D. by providing an overview of Mesa’s past, present, and proposed future, along with how it fits into the ABM ecosystem of other tooling.
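As a concrete illustration of the modular pieces the abstract mentions (agents, a scheduler, and a data collector), here is a minimal wealth-exchange model following the pattern in Mesa’s tutorials; exact class names and module paths vary across Mesa versions, so treat this as a sketch rather than a definitive reference.

```python
import random
from mesa import Agent, Model
from mesa.time import RandomActivation
from mesa.datacollection import DataCollector

class MoneyAgent(Agent):
    """Agent that gives one unit of wealth to a random peer each step."""
    def __init__(self, unique_id, model):
        super().__init__(unique_id, model)
        self.wealth = 1

    def step(self):
        if self.wealth > 0:
            other = random.choice(self.model.schedule.agents)
            other.wealth += 1
            self.wealth -= 1

class MoneyModel(Model):
    """Minimal model wiring together a scheduler and a data collector."""
    def __init__(self, n):
        super().__init__()
        self.schedule = RandomActivation(self)            # activation regime
        for i in range(n):
            self.schedule.add(MoneyAgent(i, self))
        self.datacollector = DataCollector(agent_reporters={"Wealth": "wealth"})

    def step(self):
        self.datacollector.collect(self)                  # record agent state
        self.schedule.step()                              # advance all agents

model = MoneyModel(50)
for _ in range(20):
    model.step()
print(model.datacollector.get_agent_vars_dataframe().tail())
```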
COMPUTATIONAL SOCIAL SCIENCE FRIDAY SEMINAR
Henry Smart, III, PhD candidate
Virginia Tech
A Proof of Concept: An Agent-Based Model of Colorism
within an Organizational Context (Local Policing)
Friday, April 20, 3:00 p.m.
Center for Social Complexity Suite
3rd Floor, Research Hall
Abstract:
Colorism is the allocation of privilege and disadvantage based on skin color, with a prejudice for lighter skin. This project uses agent-based modeling (computational simulation) to explore the potential effects of colorism on local policing. I argue that colorism might help to explain some of the racial disparities in the United States’ criminal justice system. I use simulated scenarios to explore the plausibility of this notion in the form of two questions: 1) How might colorism function within an organization? and 2) What might occur when managers apply the typical dilemmatic responses to detected colorism? The simulated world consists of three citizen groups (lights, mediums, and darks), five policy responses to detected colorism, and two policing behaviors (fair and biased). Using NetLogo, one hundred simulations were conducted for each policy response and analyzed using one-way ANOVA and pairwise comparison of means. When the tenets of colorism were applied to an organizational setting, only some of the tenets held true. For instance, those in the middle of the skin color spectrum experienced higher rates of incarceration when aggressive steps were taken to counter colorism, which ran counter to the expectations of the thought experiment. The study identified an opportunity to expand the description of colorism to help describe the plight of those in the middle of the skin color spectrum. The major contributions of this work include a conceptual model that describes the relationship between the distinct levels of colorism, and it advances the notion of interactive colorism. The study also produced conditional statements that can be converted into hypotheses for future experiments.
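The analysis step described above (one-way ANOVA across policy responses, followed by pairwise comparison of means) can be sketched in Python as follows; the numbers below are synthetic placeholders rather than the study’s simulation output, and the pairwise test shown (Welch’s t-test) may differ from the exact procedure used.

```python
from itertools import combinations
import numpy as np
from scipy import stats

# Placeholder outcome: one incarceration-rate value per run, 100 runs per policy.
rng = np.random.default_rng(42)
policies = {f"policy_{k}": rng.normal(loc=0.30 + 0.02 * k, scale=0.05, size=100)
            for k in range(5)}

# One-way ANOVA across the five policy responses.
f_stat, p_value = stats.f_oneway(*policies.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.4f}")

# Pairwise comparison of means (here, Welch t-tests on each pair of policies).
for a, b in combinations(policies, 2):
    t, p = stats.ttest_ind(policies[a], policies[b], equal_var=False)
    print(f"{a} vs {b}: p={p:.4f}")
```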
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
John T. Rigsby
Bachelor of Science, Mississippi State University, 1999
Master of Science, George Mason University, 2005
Automated Storytelling: Generating and Evaluating Story Chains
Monday, April 30, 2018, 11:00 a.m.
Research Hall, Room 162
All are invited to attend.
Committee
Daniel Barbara, Dissertation Director
Estela Blaisten
Carlotta Domeniconi
Igor Griva
Abstract: Automated storytelling attempts to create a chain of documents linking one article to another while telling a coherent and cohesive story that explains events connecting the two article end points. The need to understand the relationship between documents is a common problem for analysts; they often have two snippets of information and want to find the other pieces that relate them. These two snippets of information form the bookends (beginning and ending) of a story chain. The story chain formed using automated storytelling provides the analyst with better situational awareness by collecting and parsing intermediary documents to form a coherent story that explains the relationships of people, places, and events.
The promise of the Data Age is that the truth really is in there, somewhere. But our age has a curse, too: apophenia, the tendency to see patterns that may or may not exist. — Daniel Conover, Post and Courier, Charleston, South Carolina, 30 Aug. 2004
The above quote expresses a common problem in all areas of pattern recognition and data mining. For text data mining, several fields of study are dedicated to solving aspects of this problem. Some of these include literature-based discovery (LBD), topic detection and tracking (TDT), and automated storytelling. Methods to pull the signal from the noise are often the first step in text data analytics. This step usually takes the form of organizing the data into groups (i.e. clustering). Another common step is understanding the vocabulary of the dataset; this could be as simple as phrase frequency analysis or as complex as topic modeling. TDT and automated storytelling come into play once the analyst has specific documents for which they want more information.
In our world of ever more numerous sources of information, which includes scientific publications, news articles, web resources, emails, blogs, tweets, etc., automated storytelling mitigates information overload by presenting readers with the clarified chain of information most pertinent to their needs. Sometimes referred to as connecting the dots, automated storytelling attempts to create a chain of documents linking one article to another that tells a coherent and cohesive story and explains the events that connect the two articles. In the crafted story, articles next to each other should have enough similarity that readers easily comprehend why the next article in the chain was chosen. However, adjacent articles should also be different enough to move the reader farther along the chain of events with each successive article making significant progress toward the destination article.
The research in this thesis concentrates on three areas:
- story chain generation
- quantitative storytelling evaluation
- focusing storytelling with signal injection.
Evaluating the quality of the created stories is difficult and has routinely involved human judgment. Existing storytelling evaluation methodologies have been qualitative in nature, based on results from crowdsourcing and subject matter experts. Limited quantitative evaluation methods currently exist, and they are generally only used for filtering results before qualitative evaluation. In addition, quantitative evaluation methods become essential to discern good stories from bad when two or more story chains exist for the same bookends. The work described herein extends the state of the art by providing quantitative methods of story quality evaluation that are shown to have good agreement with human judgment. Two methods of automated storytelling evaluation are developed, dispersion and coherence, which are later used as criteria for a storytelling algorithm. Dispersion, a measure of story flow, ascertains how well the generated story flows away from the beginning document and towards the ending document. Coherence measures how well the articles in the middle of the story provide information about the relationship of the beginning and ending document pair. Kullback-Leibler divergence (KLD) is used to measure the ability to encode the vocabulary of the beginning and ending story documents using the set of middle documents in the story. The dispersion and coherence methodologies developed here have the added benefit that they do not require parameterization or user inputs and are easily automated.
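As a rough sketch of the coherence idea described above (using Kullback-Leibler divergence to measure how well the middle documents encode the vocabulary of the bookend documents), here is a minimal bag-of-words example in Python; the smoothing, vocabulary handling, and toy documents are illustrative assumptions, not the dissertation’s exact formulation.

```python
from collections import Counter
import numpy as np

def word_dist(texts, vocab):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for t in texts for w in t.split())
    freqs = np.array([counts[w] + 1 for w in vocab], dtype=float)  # add-one smoothing
    return freqs / freqs.sum()

def kld(p, q):
    """Kullback-Leibler divergence D(p || q)."""
    return float(np.sum(p * np.log(p / q)))

# Toy story chain: two bookend documents and two candidate middle documents.
bookends = ["acme buys rival firm", "regulators block acme merger"]
middle = ["acme merger review opens", "rival firm shareholders approve deal"]
vocab = sorted({w for t in bookends + middle for w in t.split()})

p_bookends = word_dist(bookends, vocab)
q_middle = word_dist(middle, vocab)

# Lower divergence -> the middle documents encode the bookend vocabulary better.
print("coherence proxy (KLD):", kld(p_bookends, q_middle))
```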
An automated storytelling algorithm is proposed as a multi-criteria optimization problem that maximizes dispersion and coherence simultaneously. The developed storytelling methodologies allow for the automated identification of information which associates disparate documents in support of literature-based discovery and link analysis tasking. In addition, the methods provide quantitative measures of the strength of these associations.
We also present a modification of our storytelling algorithm as a multi-criteria optimization problem that allows for signal injection by the analyst without sacrificing good story flow and content. This is valuable because analysts often have an understanding of the situation or prior knowledge that could be used to focus the story in a better way as compared to the story chain formed without signal injection. Storytelling with signal injection allows an analyst to create alternative stories which incorporate the domain knowledge of the analyst into the story chain generation process.
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
Karl Battams
Bachelor of Science – Astrophysics, University College London, 2002
Master of Science – Computational Sciences, George Mason University, 2008
Reduction and Synopses of Multi-Scale Time Series with Applications to Massive Solar Data
Monday, July 30, 2018, 11:00 a.m.
Exploratory Hall, Room 3301
All are invited to attend.
Committee
Robert Weigel, Dissertation Director/Chair
Jie Zhang
Robert Meier
Huzefa Rangwala
In this dissertation, we explore new methodologies and techniques applicable to aspects of Big Solar Data to enable new analyses of temporally long, or volumetrically large, solar physics imaging data sets. Specifically, we consider observations returned by two space-based solar physics missions, the Solar Dynamics Observatory (SDO) and the Solar and Heliospheric Observatory (SOHO): the former has operated for over seven years to date, returning around 1.5 terabytes of data daily, and the latter has been operational for more than 22 years to date. Despite ongoing improvements in desktop computing performance and storage capabilities, temporally and volumetrically massive datasets in the solar physics community continue to be challenging to manipulate and analyze. While historically popular, but more simplistic, analysis methods continue to provide new insights, the results from those studies are often driven by improved observations rather than by the computational methods themselves. To fully exploit the increasingly high volumes of observations returned by current and future missions, computational methods must be developed that enable reduction, synopsis, and parameterization of observations to reduce the data volume while retaining the physical meaning of those data.
In the first part of this study we consider time series of 4-12 hours in length extracted from the high spatial and temporal resolution data recorded by the Atmospheric Imaging Assembly (AIA) instrument on the NASA Solar Dynamics Observatory (SDO). We present a new methodology that enables the reduction and parameterization of full spatial and temporal resolution SDO/AIA data sets into unique components of a model that accurately describes the power spectra of these observations. Specifically, we compute the power spectra of pixel-level time series extracted from derotated AIA image sequences in several wavelength channels of the AIA instrument, and fit one of two models to their power spectra as a function of frequency. This enables us to visualize and study the spatial dependence of the individual model parameters in each AIA channel. We find that the power spectra are well described by at least one of these models for all pixel locations, with unique model parameterizations corresponding directly to visible solar features. Computational efficiency in all aspects of this code is provided by a flexible Python-based Message Passing Interface (MPI) framework that distributes workloads across all available processing cores. Key scientific results include clear identification of numerous quasi-periodic 3- and 5-minute oscillations throughout the solar corona; identification and new characterizations of the known ~4.0-minute chromospheric oscillation, including a previously unidentified solar-cycle-driven trend in these oscillations; and identification of “Coronal Bullseyes,” which present radially decaying periodicities over sunspots and sporadic foot-point regions, and of features we label “Penumbral Periodic Voids,” which appear as annular regions surrounding sunspots in the chromosphere, bordered by 3- and 5-minute oscillations but exhibiting no periodic features.
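A stripped-down version of the per-pixel pipeline described above (compute a time series’ power spectrum, then characterize it with a simple spectral model) might look like the following Python sketch; the synthetic signal, cadence, and crude power-law fit are placeholders and are far simpler than the models actually fit to the AIA data.

```python
import numpy as np
from scipy.signal import periodogram

# Placeholder pixel time series: 12 s cadence, red-noise-like background plus
# a 5-minute (~3.3 mHz) oscillation. Illustrative only -- not actual AIA data.
dt = 12.0
t = np.arange(0, 4 * 3600, dt)
rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=t.size)) + 5.0 * np.sin(2 * np.pi * t / 300.0)

# Power spectrum of the pixel time series.
freq, power = periodogram(signal, fs=1.0 / dt)
mask = freq > 0

# Crude spectral model: a single power law, fit in log-log space.
slope, intercept = np.polyfit(np.log10(freq[mask]), np.log10(power[mask]), 1)
print("estimated power-law index:", -slope)

# Look for excess power near the injected 5-minute (1/300 Hz) oscillation.
band = (freq > 2e-3) & (freq < 5e-3)
print("peak frequency in 2-5 mHz band (Hz):", freq[band][np.argmax(power[band])])
```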
The second part of this study considers the entire mission archive returned by the Large Angle Spectrometric Coronagraph (LASCO) C2 instrument, operating for more than 20 years on the joint ESA/NASA Solar and Heliospheric Observatory (SOHO) mission. We present a technique that enables the reduction of this entire data set to a fully calibrated, spatially located time series known as the LASCO Coronal Brightness Index (CBI). We compare these time series to a number of concurrent solar activity indices via correlation analyses to indicate relationships between these indices and coronal brightness, both globally across the entire corona and locally over small spatial scales within the corona, demonstrating that the LASCO observations can be reliably used to derive proxies for a number of geophysical indices. Furthermore, via analysis of these time series in the frequency domain, we highlight the effects of long-time-scale variability in long solar time series, considering sources of both solar origin (e.g., solar rotation, solar cycle) and instrumental/operational origin (e.g., spacecraft rolls, stray light contamination), and demonstrate how filtering temporally long time series reduces the impact of these uncertain variables in the signals. Primary findings include identification of a strong correlation between coronal brightness and both Total and Spectral Solar Irradiance, leading to the development of a LASCO-based proxy of solar irradiance, as well as identification of significant correlations with several other geophysical indices, with plausible driving mechanisms demonstrated via a correlation mapping technique developed here. We also present a number of new results regarding LASCO data processing and instrumental stray light that are important to the calibration of the data and have important implications for its long-term stability.
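The correlation-with-filtering idea described above can be illustrated with a small Python sketch: two synthetic series share a rotational modulation but carry different slow trends, and removing the trends before correlating isolates the shared signal. The series, trend shapes, and simple linear detrending below are stand-ins for the much richer LASCO/index analysis.

```python
import numpy as np
from scipy.signal import detrend
from scipy.stats import pearsonr

# Synthetic stand-ins: a coronal-brightness-like series and an activity index,
# both carrying a shared ~27-day rotational modulation plus independent trends.
rng = np.random.default_rng(1)
days = np.arange(2000)
rotation = np.sin(2 * np.pi * days / 27.0)
brightness = rotation + 0.0005 * days + 0.3 * rng.normal(size=days.size)
index = 0.8 * rotation - 0.0003 * days + 0.3 * rng.normal(size=days.size)

raw_r, _ = pearsonr(brightness, index)
filt_r, _ = pearsonr(detrend(brightness), detrend(index))  # remove slow trends
print(f"correlation before detrending: {raw_r:.2f}, after: {filt_r:.2f}")
```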