Calendar
During the fall 2018 semester, the Computational Social Science (CSS) and the Computational Sciences and Informatics (CSI) Programs have merged their seminar/colloquium series, where students, faculty, and guest speakers present their latest research. These seminars are free and open to the public. The series takes place on Fridays from 3:00 to 4:30 p.m. in the Center for Social Complexity Suite, located on the third floor of Research Hall.
If you would like to join the seminar mailing list, please email Karen Underwood.
COMPUTATIONAL SOCIAL SCIENCE FRIDAY SEMINAR
Henry Smart, III, PhD candidate
Virginia Tech
A Proof of Concept: An Agent-Based Model of Colorism
within an Organizational Context (Local Policing)
Friday, April 20, 3:00 p.m.
Center for Social Complexity Suite
3rd Floor, Research Hall
Abstract:
Colorism is the allocation of privilege and disadvantage based on skin color, with a prejudice for lighter skin. This project uses agent-based modeling (computational simulation) to explore the potential effects of colorism on local policing. I argue that colorism might help to explain some of the racial disparities in the United States’ criminal justice system. I use simulated scenarios to explore the plausibility of this notion in the form of two questions: 1) How might colorism function within an organization? and 2) What might occur when managers apply the typical dilemmatic responses to detected colorism? The simulated world consists of three citizen groups (lights, mediums, and darks), five policy responses to detected colorism, and two policing behaviors (fair and biased). Using NetLogo, I conducted one hundred simulations for each policy response and analyzed them using one-way ANOVA and pairwise comparisons of means. When the tenets of colorism were applied to an organizational setting, only some of them held true. For instance, those in the middle of the skin color spectrum experienced higher rates of incarceration when aggressive steps were taken to counter colorism, which ran counter to the expectations of the thought experiment. The study identified an opportunity to expand the description of colorism to help describe the plight of those in the middle of the skin color spectrum. The major contributions of this work include a conceptual model that describes the relationship between the distinct levels of colorism, and it advances the notion of interactive colorism. The study also produced conditional statements that can be converted into hypotheses for future experiments.
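For readers unfamiliar with the analysis step described above, the sketch below shows how one hundred simulated outcomes per policy response might be compared with a one-way ANOVA and pairwise comparisons of means. It is illustrative only: the data are randomly generated stand-ins for NetLogo output, and the policy names and choice of pairwise test are assumptions rather than the author's actual setup.

```python
# Illustrative only: hypothetical per-run incarceration rates standing in for
# NetLogo output; policy names and the choice of pairwise test are assumptions.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# One hundred simulated runs per policy response (values are invented).
policies = {
    "no_response":      rng.normal(0.30, 0.05, 100),
    "retrain_officers": rng.normal(0.27, 0.05, 100),
    "aggressive_audit": rng.normal(0.33, 0.05, 100),
}

# One-way ANOVA across policy conditions.
f_stat, p_value = stats.f_oneway(*policies.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4g}")

# Pairwise comparison of means (Welch t-tests used here as one common choice).
for a, b in combinations(policies, 2):
    t, p = stats.ttest_ind(policies[a], policies[b], equal_var=False)
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4g}")
```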
COLLOQUIUM ON COMPUTATIONAL SCIENCES AND INFORMATICS
Dr. Foteini Baldimtsi, Assistant Professor
Department of Computer Science
George Mason University
Moving off the blockchain: a payment hub for fast, anonymous off-chain Bitcoin payments
Monday, April 23, 4:30-5:45
Exploratory Hall, Room 3301
Abstract: In this talk I will focus on two major technical challenges faced by Bitcoin today: (1) scaling Bitcoin to meet increasing use, and (2) protecting the privacy of payments made via Bitcoin. To address these challenges, I will present TumbleBit, a unidirectional, unlinkable payment hub that uses an untrusted intermediary, the Tumbler, to perform transactions off the blockchain. TumbleBit allows the volume and velocity of bitcoin-backed payments to scale while remaining fully compatible with today’s Bitcoin protocol. At the same time, TumbleBit offers anonymity for the transactions routed through the Tumbler, guaranteeing that no one, not even the Tumbler, can link a payment from its payer to its payee. I will explain how a combination of cryptographic tools and blockchain properties is used to make TumbleBit work and discuss how these techniques are relevant beyond Bitcoin.
Based on joint work with: Ethan Heilman, Leen Alshenibr, Alessandra Scafuro and Sharon Goldberg
Bio: Foteini Baldimtsi is an Assistant Professor in the Computer Science Department at George Mason University. She received her Ph.D. from Brown University in May 2014 and worked as a postdoctoral researcher at Boston University and the University of Athens. Her research interests are in the areas of cryptography, security, and data privacy, with a special focus on electronic payments, Bitcoin and blockchain technologies, and private authentication techniques. During her Ph.D. she was a recipient of a Paris Kanellakis fellowship, and her research is currently supported by NSF and an IBM faculty award.
COMPUTATIONAL SOCIAL SCIENCE FRIDAY SEMINAR
Center for Social Complexity Suite
3rd Floor, Research Hall
Abstract:
Institutions are created by people interacting in complex ways with others in their socio-economic environment. A study of institutions should therefore study the people and interactions that create them. Acemoglu and Robinson (A&R) developed a theory of the creation and consolidation of democracy through a game-theoretic framework. They studied how economic incentives influence the way social groups shape institutions to allocate political and economic power. The A&R models assume groups of people are completely rational and identical within each group in order to make the models mathematically tractable. My dissertation uses an agent-based computational methodology to reproduce the A&R formal models under the same restrictions in order to validate my model. Specifically, with intra-group homogeneity the agent-based model reproduces the group-level threshold conditions affecting institutional choices found by A&R. I show that these results are robust to parameter changes within the ranges defined by A&R. The more flexible computational methodology then allows me to relax the restrictive assumptions and explore how a more realistic set of assumptions, such as heterogeneous incomes and limited intelligence, affects the larger outcomes for all groups. The population structure with heterogeneity can include a more realistic middle class. Modeling a middle class with heterogeneous agents shows that its effect is non-linear and does not make democratization more likely for all ranges of underlying economic conditions. This work demonstrates the usefulness of agent-based modeling as a viable alternative quantitative methodology for studying complex institutions.
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
John T. Rigsby
Bachelor of Science, Mississippi State University, 1999
Master of Science, George Mason University, 2005
Automated Storytelling: Generating and Evaluating Story Chains
Monday, April 30, 2018, 11:00 a.m.
Research Hall, Room 162
All are invited to attend.
Committee
Daniel Barbara, Dissertation Director
Estela Blaisten
Carlotta Domeniconi
Igor Griva
Abstract: Automated storytelling attempts to create a chain of documents linking one article to another while telling a coherent and cohesive story that explains events connecting the two article end points. The need to understand the relationship between documents is a common problem for analysts; they often have two snippets of information and want to find the other pieces that relate them. These two snippets of information form the bookends (beginning and ending) of a story chain. The story chain formed using automated storytelling provides the analyst with better situational awareness by collecting and parsing intermediary documents to form a coherent story that explains the relationships of people, places, and events.
The promise of the Data Age is that the truth really is in there, somewhere. But our age has a curse, too: apophenia, the tendency to see patterns that may or may not exist. — Daniel Conover, Post and Courier, Charleston, South Carolina, 30 Aug. 2004
The above quote expresses a common problem in all areas of pattern recognition and data mining. For text data mining, several fields of study are dedicated to solving aspects of this problem. Some of these include literature-based discovery (LBD), topic detection and tracking (TDT), and automated storytelling. Methods to pull the signal from the noise are often the first step in text data analytics. This step usually takes the form of organizing the data into groups (i.e. clustering). Another common step is understanding the vocabulary of the dataset; this could be as simple as phrase frequency analysis or as complex as topic modeling. TDT and automated storytelling come into play once the analyst has specific documents for which they want more information.
In our world of ever more numerous sources of information, which includes scientific publications, news articles, web resources, emails, blogs, tweets, etc., automated storytelling mitigates information overload by presenting readers with the clarified chain of information most pertinent to their needs. Sometimes referred to as connecting the dots, automated storytelling attempts to create a chain of documents linking one article to another that tells a coherent and cohesive story and explains the events that connect the two articles. In the crafted story, articles next to each other should have enough similarity that readers easily comprehend why the next article in the chain was chosen. However, adjacent articles should also be different enough to move the reader farther along the chain of events with each successive article making significant progress toward the destination article.
The research in this thesis concentrates on three areas:
- story chain generation
- quantitative storytelling evaluation
- focusing storytelling with signal injection.
Storytelling evaluation of the quality of the created stories is difficult and has routinely involved human judgment. Existing storytelling evaluation methodologies have been qualitative in nature, based on results from crowdsourcing and subject matter experts. Limited quantitative evaluation methods currently exist, and they are generally used only for filtering results before qualitative evaluation. In addition, quantitative evaluation methods become essential for discerning good stories from bad when two or more story chains exist for the same bookends. The work described herein extends the state of the art by providing quantitative methods of story quality evaluation that are shown to have good agreement with human judgment. Two methods of automated storytelling evaluation are developed, dispersion and coherence, which are later used as criteria for a storytelling algorithm. Dispersion, a measure of story flow, ascertains how well the generated story flows away from the beginning document and towards the ending document. Coherence measures how well the articles in the middle of the story provide information about the relationship of the beginning and ending document pair. Kullback-Leibler divergence (KLD) is used to measure the ability to encode the vocabulary of the beginning and ending story documents using the set of middle documents in the story. The dispersion and coherence methodologies developed here have the added benefit that they do not require parameterization or user inputs and are easily automated.
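As a rough illustration of the coherence idea, the sketch below scores a story chain by the Kullback-Leibler divergence between a smoothed unigram distribution over the bookend documents and one over the middle documents. The smoothing, vocabulary handling, and direction of the divergence are simplifying assumptions; the thesis's exact formulation may differ.

```python
# Simplified sketch of a KLD-based coherence score; the thesis's exact
# formulation (smoothing, weighting, divergence direction) may differ.
from collections import Counter
import math

def word_dist(texts, vocab, alpha=1.0):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kld(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def coherence(bookends, middles):
    """Lower divergence = the middle documents encode the bookend vocabulary better."""
    vocab = {w for t in bookends + middles for w in t.lower().split()}
    p = word_dist(bookends, vocab)    # beginning and ending documents
    q = word_dist(middles, vocab)     # middle documents of the chain
    return kld(p, q)

chain = ["acme acquires widgetco",
         "regulators open probe into widgetco deal",
         "acme ceo resigns amid widgetco probe"]
print(coherence([chain[0], chain[-1]], chain[1:-1]))
```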
An automated storytelling algorithm is proposed as a multi-criteria optimization problem that maximizes dispersion and coherence simultaneously. The developed storytelling methodologies allow for the automated identification of information which associates disparate documents in support of literature-based discovery and link analysis tasking. In addition, the methods provide quantitative measures of the strength of these associations.
We also present a modification of our storytelling algorithm as a multi-criteria optimization problem that allows for signal injection by the analyst without sacrificing good story flow and content. This is valuable because analysts often have an understanding of the situation or prior knowledge that could be used to focus the story in a better way as compared to the story chain formed without signal injection. Storytelling with signal injection allows an analyst to create alternative stories which incorporate the domain knowledge of the analyst into the story chain generation process.
COLLOQUIUM ON COMPUTATIONAL SCIENCES AND INFORMATICS
Dr. Jim Jones, Associate Professor
Digital Forensics and Cyber Analysis program, ECE Department
George Mason University
Digital Data Persistence, Decay, and Recovery
Monday, April 30, 4:30-5:45
Exploratory Hall, Room 3301
Abstract:
Digital data dies an uncertain death. Delete a file today, and the content might be entirely destroyed immediately, or maybe some of it survives for a few hours, days, or longer. For a forensic investigator, this is good news – residual fragments of a deleted file might be recoverable days, months, even years after the file was deleted. But why? What factors drive this persistence, and can those factors be understood well enough to predict the decay pattern of different files on different systems and under different circumstances? To help answer this question, we developed a methodology and software tools to trace the contents of a deleted file over time using sequential snapshots. By recording the actions taken between each snapshot, and by conducting controlled experiments with many files, we generate decay curves and datasets which can be subsequently analyzed for factors affecting deleted file content persistence. Understanding these factors can support triage decisions and interpretation of results, e.g., should I expect to find anything on media X from event Y at time T, and what does it mean if I don’t? I will present our methodology and software tools (GitHub: jjonesu/DeletedFilePersistence), as well as a collection of preliminary results on magnetic hard disks, flash memory sticks, SD cards, and embedded flash memory.
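The GitHub tools referenced above are not reproduced here, but the core decay-curve idea can be sketched as follows: given which of a deleted file's sectors still hold matching content in each sequential snapshot, report the surviving fraction over time. The sector sets below are hypothetical.

```python
# Illustrative decay-curve computation, not the DeletedFilePersistence tools:
# given which of a deleted file's sectors still match in each snapshot,
# report the surviving fraction over time. Sector sets here are hypothetical.
def decay_curve(file_sectors, snapshots):
    """file_sectors: sectors the file originally occupied.
    snapshots: per-snapshot sets of sectors whose content still matches."""
    return [len(file_sectors & snap) / len(file_sectors) for snap in snapshots]

original = set(range(100))                 # a hypothetical 100-sector file
snaps = [set(range(100)),                  # snapshot 0: fully intact
         set(range(70)),                   # snapshot 1: partially overwritten
         set(range(25)),                   # snapshot 2: mostly gone
         set()]                            # snapshot 3: nothing left

for i, frac in enumerate(decay_curve(original, snaps)):
    print(f"snapshot {i}: {frac:.0%} of deleted file content remains")
```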
Bio:
Jim Jones is an Associate Professor in the Digital Forensics and Cyber Analysis program within the ECE Department. Dr. Jones earned his Bachelor’s degree from Georgia Tech (Industrial and Systems Engineering, 1989), Master’s degree from Clemson University (Mathematical Sciences, 1995), and PhD from George Mason University (Computational Sciences and Informatics, 2008). He has been a cyber security practitioner, researcher, and educator for over 20 years. During that time, he has led and performed network and system vulnerability and penetration tests, led a cyber incident response team, conducted digital forensics investigations, and taught university courses in cyber security, penetration testing, digital forensics, and programming. Past and current funded research sponsors include DARPA, DHS, NSF, and DoD. His research interests are focused on digital artifact persistence, extraction, analysis, and manipulation.
COMPUTATIONAL SOCIAL SCIENCE FRIDAY SEMINAR
Sanjay Nayar, CSS PhD Student
George Mason University
Modeling Panic with Psychological Agents
Friday, May 4, 3:00PM
Center for Social Complexity Suite
3rd Floor, Research Hall
Abstract: Agent-based modeling (ABM) is steadily gaining traction in real-world financial models built and used by organizations such as the Office of Financial Research, the IMF, the European Central Bank, and others. As expected, these models have grown more complex over the years, but they still lack much detailed modeling of agents at the psychological level. This becomes especially important in a crisis, as individuals panic and make emotional decisions that are far from fully rational, or perhaps even boundedly rational in the traditional definition of the term. This exploratory talk will cover some of the recent ABM efforts in modeling financial crises and discuss possible design elements for implementing and enhancing the psychological modeling of individual agents, focusing on panic behavior in highly stressful or disastrous situations. Similarities and differences between financial panic and pedestrian/evacuation panic models will also be discussed, along with underlying theories of panic such as panics of “escape” and panics of “affiliation”.
Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
Karl Battams
Bachelor of Science – Astrophysics, University College London, 2002
Master of Science – Computational Sciences, George Mason University, 2008
Reduction and Synopses of Multi-Scale Time Series with Applications to Massive Solar Data
Monday, July 30, 2018, 11:00 a.m.
Exploratory Hall, Room 3301
All are invited to attend.
Committee
Robert Weigel, Dissertation Director/Chair
Jie Jhang
Robert Meier
Huzefa Rangwala
In this dissertation, we explore new methodologies and techniques applicable to aspects of Big Solar Data to enable new analyses of temporally long, or volumetrically large, solar physics imaging data sets. Specifically, we consider observations returned by two space-based solar physics missions, the Solar Dynamics Observatory (SDO) and the Solar and Heliospheric Observatory (SOHO), the former operating for over 7 years to date and returning around 1.5 terabytes of data daily, and the latter having been operational for more than 22 years to date. Despite ongoing improvements in desktop computing performance and storage capabilities, temporally and volumetrically massive datasets in the solar physics community continue to be challenging to manipulate and analyze. While historically popular but more simplistic analysis methods continue to provide new insights, the results from those studies are often driven by improved observations rather than by the computational methods themselves. To fully exploit the increasingly high volumes of observations returned by current and future missions, computational methods must be developed that enable reduction, synopsis, and parameterization of observations to reduce the data volume while retaining the physical meaning of those data.
In the first part of this study we consider time series of 4 to 12 hours in length extracted from the high spatial and temporal resolution data recorded by the Atmospheric Imaging Assembly (AIA) instrument on the NASA Solar Dynamics Observatory (SDO). We present a new methodology that enables the reduction and parameterization of full spatial and temporal resolution SDO/AIA data sets into unique components of a model that accurately describes the power spectra of these observations. Specifically, we compute the power spectra of pixel-level time series extracted from derotated AIA image sequences in several wavelength channels of the AIA instrument, and fit one of two models to their power spectra as a function of frequency. This enables us to visualize and study the spatial dependence of the individual model parameters in each AIA channel. We find that the power spectra are well described by at least one of these models for all pixel locations, with unique model parameterizations corresponding directly to visible solar features. Computational efficiency of all aspects of this code is provided by a flexible Python-based Message Passing Interface (MPI) framework that enables distribution of all workloads across all available processing cores. Key scientific results include clear identification of numerous quasi-periodic 3- and 5-minute oscillations throughout the solar corona; identification and new characterizations of the known ~4.0-minute chromospheric oscillation, including a previously unidentified solar-cycle-driven trend in these oscillations; identification of “Coronal Bullseyes,” which present radially decaying periodicities over sunspots and sporadic foot-point regions; and identification of features we label “Penumbral Periodic Voids,” which appear as annular regions surrounding sunspots in the chromosphere, bordered by 3- and 5-minute oscillations but exhibiting no periodic features.
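As a toy stand-in for the per-pixel analysis described above (not the dissertation's actual models or MPI pipeline), the sketch below builds a synthetic time series containing red noise plus a 5-minute oscillation, computes its power spectrum, fits a simple power law in log-log space, and locates the strongest residual periodicity. The cadence, signal, and model choice are assumptions for illustration.

```python
# Toy stand-in for the per-pixel spectral analysis; cadence, signal, and the
# simple power-law model are assumptions, not the dissertation's actual setup.
import numpy as np

dt = 12.0                                    # assumed 12 s cadence
n = 2048
t = np.arange(n) * dt
rng = np.random.default_rng(0)

# Synthetic pixel time series: red noise plus a 5-minute (300 s) oscillation.
series = np.cumsum(rng.normal(size=n)) + 2.0 * np.sin(2 * np.pi * t / 300.0)

freq = np.fft.rfftfreq(n, d=dt)[1:]          # drop the zero-frequency bin
power = np.abs(np.fft.rfft(series - series.mean()))[1:] ** 2

# Fit a simple power law P(f) ~ a * f^-alpha in log-log space.
slope, intercept = np.polyfit(np.log10(freq), np.log10(power), 1)
print(f"fitted spectral index: {-slope:.2f}")

# Divide out the fitted background and locate the strongest residual peak.
background = 10.0 ** intercept * freq ** slope
peak_freq = freq[np.argmax(power / background)]
print(f"strongest periodicity: {1.0 / peak_freq / 60.0:.1f} minutes")
```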
The second part of this study considers the entire mission archive returned by the Large Angle Spectrometric Coronagraph (LASCO) C2 instrument, operating for more than 20 years on the joint ESA/NASA Solar and Heliospheric Observatory (SOHO) mission. We present a technique that enables the reduction of this entire data set to a fully calibrated, spatially located time series known as the LASCO Coronal Brightness Index (CBI). We compare these time series to a number of concurrent solar activity indices via correlation analyses to indicate relationships between these indices and coronal brightness, both globally across the entire corona and locally over small spatial scales within the corona, demonstrating that the LASCO observations can be reliably used to derive proxies for a number of geophysical indices. Furthermore, via analysis of these time series in the frequency domain, we highlight the effects of long-timescale variability in long solar time series, considering sources of both solar origin (e.g., solar rotation, solar cycle) and instrumental/operational origin (e.g., spacecraft rolls, stray light contamination), and demonstrate the effect of filtering temporally long time series to reduce the impact of these uncertain variables in the signals. Primary findings include identification of a strong correlation between coronal brightness and both Total and Spectral Solar Irradiance, leading to the development of a LASCO-based proxy of solar irradiance, as well as identification of significant correlations with several other geophysical indices, with plausible driving mechanisms demonstrated via a developed correlation mapping technique. We also present a number of new results regarding LASCO data processing and instrumental stray light that are important to the calibration of the data and have important implications for its long-term stability.
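A minimal sketch of the correlation-and-filtering step, using synthetic stand-ins for the coronal brightness series and an activity index rather than LASCO or index data: both series share a slow cycle-like trend and a 27-day rotation signal, and a high-pass filter is applied before correlating to suppress the long-term variability discussed above. All amplitudes, cadences, and cutoffs are invented for illustration.

```python
# Synthetic stand-ins for a coronal brightness series and an activity index;
# the shared trend, rotation signal, noise levels, and cutoff are invented.
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(1)
days = np.arange(3650.0)                       # ~10 years at daily cadence

cycle = np.sin(2 * np.pi * days / 4015)        # slow, solar-cycle-like trend
rotation = np.sin(2 * np.pi * days / 27)       # ~27-day rotation modulation

activity = cycle + 0.5 * rotation + rng.normal(0, 0.1, days.size)
brightness = 0.0003 * days + 0.3 * cycle + 0.5 * rotation \
             + rng.normal(0, 0.1, days.size)   # add a slow instrumental drift

# High-pass filter (cutoff ~1 year) to suppress long-term variability before
# correlating, analogous to the filtering discussed in the abstract.
b, a = signal.butter(3, 1.0 / 365.0, btype="highpass", fs=1.0)
r_raw, _ = stats.pearsonr(brightness, activity)
r_filt, _ = stats.pearsonr(signal.filtfilt(b, a, brightness),
                           signal.filtfilt(b, a, activity))
print(f"correlation before filtering: {r_raw:.2f}, after filtering: {r_filt:.2f}")
```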
Computational Social Science Research Colloquium / Colloquium in Computational and Data Sciences
Kieran Marray, Laidlaw Scholar
St. Catherine’s College
University of Oxford
FORTEC: Forecasting the Development of Artificial Intelligence up to 2050 Using Agent-Based Modeling
Friday, August 31, 3:00 p.m.
Center for Social Complexity, 3rd Floor Research Hall
All are welcome to attend.
Kieran is a Laidlaw Scholar from St Catherine’s College, University of Oxford. He has been visiting the Center for Social Complexity over the summer to do research in complexity economics supervised by Professor Rob Axtell.
Due to a welcome reception for new and returning CDS students, there will be no colloquium on Friday, September 7. The next one will be held on Friday, September 14. Speaker and topic to be announced later.
Computational Social Science Research Colloquium / Colloquium in Computational and Data Sciences
William Kennedy, PhD, Captain, USN (Ret.)
Research Assistant Professor
Center for Social Complexity
Computational and Data Sciences
College of Science
Characterizing the Reaction of the Population of NYC to a Nuclear WMD
Friday, September 14, 3:00 p.m.
Center for Social Complexity, 3rd Floor Research Hall
All are welcome to attend.
Abstract: This talk will again review the status, two years in, of our multi-year project to characterize the reaction of the population of a US megacity to a nuclear WMD event. Our approach has been to develop an agent-based model of the New York City area, with agents representing each of the 23 million people, and to establish a baseline of normal behaviors before exploring the population’s reactions to small (5-10 kt) nuclear weapon explosions. We have modeled the environment, the agents, and their interactions, but there have been some challenges in the last year. I will review our status, successes, and challenges, as well as near-term plans.