Notice and Invitation
Oral Defense of Doctoral Dissertation
Doctor of Philosophy in Computational Sciences and Informatics
Department of Computational and Data Sciences
College of Science
George Mason University
John T. Rigsby
Bachelor of Science, Mississippi State University, 1999
Master of Science, George Mason University, 2005
Automated Storytelling: Generating and Evaluating Story Chains
Monday, April 30, 2018, 11:00 a.m.
Research Hall, Room 162
All are invited to attend.
Daniel Barbara, Dissertation Director
Abstract: Automated storytelling attempts to create a chain of documents linking one article to another while telling a coherent and cohesive story that explains the events connecting the two article endpoints. The need to understand the relationship between documents is a common problem for analysts; they often have two snippets of information and want to find the other pieces that relate them. These two snippets form the bookends (beginning and ending) of a story chain. The story chain formed by automated storytelling provides the analyst with better situational awareness by collecting and parsing intermediary documents to form a coherent story that explains the relationships among people, places, and events.
The promise of the Data Age is that the truth really is in there, somewhere. But our age has a curse, too: apophenia, the tendency to see patterns that may or may not exist. — Daniel Conover, Post and Courier, Charleston, South Carolina, 30 Aug. 2004
The above quote expresses a common problem across all areas of pattern recognition and data mining. For text data mining, several fields of study are dedicated to solving aspects of this problem, including literature-based discovery (LBD), topic detection and tracking (TDT), and automated storytelling. Methods to pull the signal from the noise are often the first step in text data analytics. This step usually takes the form of organizing the data into groups (i.e., clustering). Another common step is understanding the vocabulary of the dataset; this can be as simple as phrase-frequency analysis or as complex as topic modeling. TDT and automated storytelling come into play once the analyst has specific documents about which they want more information.
In a world of ever more numerous information sources, including scientific publications, news articles, web resources, emails, blogs, and tweets, automated storytelling mitigates information overload by presenting readers with the clarified chain of information most pertinent to their needs. Sometimes referred to as connecting the dots, automated storytelling attempts to create a chain of documents linking one article to another that tells a coherent and cohesive story and explains the events connecting the two articles. In the crafted story, adjacent articles should be similar enough that readers easily comprehend why the next article in the chain was chosen, yet different enough to move the reader farther along the chain of events, with each successive article making significant progress toward the destination article.
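As a rough sketch of the adjacency property described above, one could require every adjacent pair of documents to fall inside a similarity band. The bag-of-words cosine measure, token-list representation, and threshold values here are illustrative assumptions, not the dissertation's implementation:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token lists via bag-of-words counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

def valid_chain(docs, lo=0.2, hi=0.8):
    """Every adjacent pair must be similar enough to read coherently (>= lo)
    but different enough to make progress (<= hi).
    The lo/hi thresholds are illustrative assumptions."""
    return all(lo <= cosine(a, b) <= hi for a, b in zip(docs, docs[1:]))

# A chain whose neighbors overlap partially passes; identical neighbors do not.
ok = valid_chain([["a", "b"], ["b", "c"], ["c", "d"]])
```

A production version would of course use richer document representations (e.g., tf-idf or topic vectors), but the band constraint captures the similar-yet-progressing intuition.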
The research in this thesis concentrates on three areas:
- story chain generation
- quantitative storytelling evaluation
- focusing storytelling with signal injection
Evaluating the quality of generated stories is difficult and has routinely relied on human judgment. Existing storytelling evaluation methodologies have been qualitative in nature, based on results from crowdsourcing and subject matter experts. The few quantitative evaluation methods that exist are generally used only to filter results before qualitative evaluation. Moreover, quantitative evaluation becomes essential for discerning good stories from bad when two or more story chains exist for the same bookends. The work described herein extends the state of the art by providing quantitative methods of story quality evaluation that are shown to have good agreement with human judgment. Two methods of automated storytelling evaluation are developed, dispersion and coherence, which are later used as criteria for a storytelling algorithm. Dispersion, a measure of story flow, ascertains how well the generated story flows away from the beginning document and toward the ending document. Coherence measures how well the articles in the middle of the story explain the relationship between the beginning and ending documents. Kullback-Leibler divergence (KLD) is used to measure how well the set of middle documents encodes the vocabulary of the beginning and ending story documents. The dispersion and coherence methodologies developed here have the added benefit that they require no parameterization or user input and are easily automated.
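A minimal sketch of the KLD-based coherence idea, assuming smoothed unigram language models over a shared vocabulary. The function names, add-one smoothing, and toy tokens are illustrative assumptions, not the dissertation's exact formulation:

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab, alpha=1.0):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kld(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def coherence_score(bookend_tokens, middle_tokens):
    """Lower divergence => the middle documents better encode
    the vocabulary of the bookend documents."""
    vocab = set(bookend_tokens) | set(middle_tokens)
    p = unigram_dist(bookend_tokens, vocab)
    q = unigram_dist(middle_tokens, vocab)
    return kld(p, q)

# Toy example: tokens pooled from the bookends vs. the middle documents.
bookends = "attack port ship harbor attack".split()
middles = "ship left the harbor after the attack".split()
score = coherence_score(bookends, middles)
```

Because the smoothed distributions share a vocabulary, the divergence is finite and nonnegative, and it is zero exactly when the middle documents reproduce the bookend vocabulary distribution.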
An automated storytelling algorithm is proposed as a multi-criteria optimization problem that maximizes dispersion and coherence simultaneously. The developed storytelling methodologies allow for the automated identification of information that associates disparate documents in support of literature-based discovery and link analysis tasks. In addition, the methods provide quantitative measures of the strength of these associations.
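One simple way to realize such a multi-criteria optimization is weighted-sum scalarization over candidate chains. The sketch below assumes both criteria are oriented so that larger is better; the weighting scheme and toy criteria are assumptions for illustration, not the dissertation's algorithm:

```python
def select_best_chain(chains, dispersion, coherence, w=0.5):
    """Pick the candidate chain maximizing a weighted combination of
    dispersion and coherence (a scalarization of the multi-criteria
    problem; the single weight w is an illustrative assumption)."""
    return max(chains, key=lambda c: w * dispersion(c) + (1 - w) * coherence(c))

# Toy criteria: dispersion rewards longer chains, coherence mildly penalizes them.
chains = [["a", "b"], ["a", "b", "c"], ["a", "c"]]
best = select_best_chain(
    chains,
    dispersion=len,
    coherence=lambda c: -0.1 * len(c),
    w=0.5,
)
```

Other standard multi-criteria strategies (e.g., keeping the Pareto-optimal set of chains) would slot into the same interface.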
We also present a modification of our storytelling algorithm, again as a multi-criteria optimization problem, that allows for signal injection by the analyst without sacrificing good story flow and content. This is valuable because analysts often have situational understanding or prior knowledge that can focus the story more effectively than the chain formed without signal injection. Storytelling with signal injection allows an analyst to create alternative stories that incorporate the analyst's domain knowledge into the story chain generation process.