Publications

  1. Jun Araki, Lamana Mulaffer, Arun Pandian, Yukari Yamakawa, Kemal Oflazer, and Teruko Mitamura. 2018.
    Interoperable Annotation of Events and Event Relations across Domains.
    In Proceedings of the 14th Workshop on Interoperable Semantic Annotation.

    This paper presents methodologies for interoperable annotation of events and event relations across different domains. In addition to interoperability, our annotation scheme supports a wider coverage of events and event relations than prior work. We employ the methodologies to annotate events and event relations on Simple Wikipedia articles in 10 different domains. Our analysis demonstrates that the methodologies allow us to annotate events and event relations in a principled manner across a wide variety of domains. Despite our relatively wide and flexible annotation of events, we achieve high inter-annotator agreement on event annotation. We also provide an analysis of issues in annotating events and event relations.
    @inproceedings{Araki2018Interoperable,
      author    = {Jun Araki and Lamana Mulaffer and Arun Pandian and Yukari Yamakawa and Kemal Oflazer and Teruko Mitamura},
      title     = {Interoperable Annotation of Events and Event Relations across Domains},
      booktitle = {Proceedings of the 14th Workshop on Interoperable Semantic Annotation},
      pages     = {10--20},
      month     = {August},
      year      = {2018},
      address   = {Santa Fe, NM, USA},
    }
    

  2. Jun Araki and Teruko Mitamura. 2018.
    Open-Domain Event Detection using Distant Supervision.
    In Proceedings of COLING 2018.

    This paper introduces open-domain event detection, a new event detection paradigm to address issues of prior work on restricted domains and event annotation. The goal is to detect all kinds of events regardless of domains. Given the absence of training data, we propose a distant supervision method that is able to generate high-quality training data. Using a manually annotated event corpus as gold standard, our experiments show that despite no direct supervision, the model outperforms supervised models. This result indicates that the distant supervision enables robust event detection in various domains, while obviating the need for human annotation of events.
    @inproceedings{Araki2018Open,
      author    = {Jun Araki and Teruko Mitamura},
      title     = {Open-Domain Event Detection using Distant Supervision},
      booktitle = {Proceedings of COLING},
      pages     = {878--891},
      month     = {August},
      year      = {2018},
      address   = {Santa Fe, NM, USA},
    }
    

  3. Jun Araki. 2018.
    Extraction of Event Structures from Text.
    Ph.D. Thesis, Carnegie Mellon University.

    Events are a key semantic component integral to information extraction and natural language understanding, which can potentially enhance many downstream applications. Despite their importance, they have received less attention in research on natural language processing. Salient properties of events are that they are a ubiquitous linguistic phenomenon appearing in various domains and that they compose rich discourse structures via event coreferences, forming a coherent story over multiple sentences.

    The central goal of this thesis is to devise a computational method that models the structural property of events in a principled framework to enable more sophisticated event detection and event coreference resolution. To achieve this goal, we address five important problems in these areas: (1) restricted domains in event detection, (2) data sparsity in event detection, (3) lack of subevent detection, (4) error propagation in pipeline models, and (5) limited applications of events. For the first two problems, we introduce a new paradigm of open-domain event detection and show that it is feasible for a distant supervision method to build models detecting events robustly in various domains while obviating the need for human annotation of events. For the third and fourth problems, we show how structured learning models are capable of capturing event interdependencies and making more informed decisions on event coreference resolution and subevent detection. Lastly, we present a novel application of event structures for question generation, illustrating the usefulness of event structures as inference steps in reading comprehension by humans.
    @phdthesis{Araki2018Event,
      author    = {Jun Araki},
      title     = {Extraction of Event Structures from Text},
      school    = {Carnegie Mellon University},
      month     = {August},
      year      = {2018},
    }
    

  4. Jun Araki, Dheeraj Rajagopal, Sreecharan Sankaranarayanan, Susan Holm, Yukari Yamakawa, and Teruko Mitamura. 2016.
    Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts.
    In Proceedings of COLING 2016.

    We present a novel approach to automated question generation that improves upon prior work both from a technology perspective and from an assessment perspective. Our system is aimed at engaging language learners by generating multiple-choice questions which utilize specific inference steps over multiple sentences, namely coreference resolution and paraphrase detection. The system also generates correct answers and semantically-motivated phrase-level distractors as answer choices. Evaluation by human annotators indicates that our approach requires a larger number of inference steps, which necessitate deeper semantic understanding of texts than a traditional single-sentence approach.
    @inproceedings{Araki2016Generating,
      author    = {Jun Araki and Dheeraj Rajagopal and Sreecharan Sankaranarayanan and Susan Holm and Yukari Yamakawa and Teruko Mitamura},
      title     = {Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts},
      booktitle = {Proceedings of COLING},
      pages     = {1125--1136},
      month     = {December},
      year      = {2016},
      address   = {Osaka, Japan},
    }
    

  5. Abhishek Kumar and Jun Araki. 2016.
    Incorporating Relational Knowledge into Word Representations using Subspace Regularization.
    In Proceedings of ACL 2016 (Short Papers).

    Incorporating lexical knowledge from semantic resources (e.g., WordNet) has been shown to improve the quality of distributed word representations. This knowledge often comes in the form of relational triplets (x, r, y) where words x and y are connected by a relation type r. Existing methods either ignore the relation types, essentially treating the word pairs as generic related words, or employ rather restrictive assumptions to model the relational knowledge. We propose a novel approach to model relational knowledge based on low-rank subspace regularization, and conduct experiments on standard tasks to evaluate its effectiveness.
    @inproceedings{Kumar2016Incorporating,
      author    = {Abhishek Kumar and Jun Araki},
      title     = {Incorporating Relational Knowledge into Word Representations using Subspace Regularization},
      booktitle = {Proceedings of ACL},
      pages     = {506--511},
      month     = {August},
      year      = {2016},
      address   = {Berlin, Germany},
    }
    

  6. Jun Araki and Teruko Mitamura. 2015.
    Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron.
    In Proceedings of EMNLP 2015 (Short Papers).

    Events and their coreference offer useful semantic and discourse resources. We show that the semantic and discourse aspects of events interact with each other. However, traditional approaches addressed event extraction and event coreference resolution either separately or sequentially, which limits their interactions. This paper proposes a document-level structured learning model that simultaneously identifies event triggers and resolves event coreference. We demonstrate that the joint model outperforms a pipelined model by 6.9 BLANC F1 and 1.8 CoNLL F1 points in event coreference resolution using a corpus in the biology domain.
    @inproceedings{Araki2015Joint,
      author    = {Jun Araki and Teruko Mitamura},
      title     = {Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron},
      booktitle = {Proceedings of EMNLP},
      pages     = {2074--2080},
      month     = {September},
      year      = {2015},
      address   = {Lisbon, Portugal},
    }
    

  7. Di Wang, Leonid Boytsov, Jun Araki, Alkesh Patel, Jeff Gee, Zhengzhong Liu, Eric Nyberg, and Teruko Mitamura. 2014.
    CMU Multiple-choice Question Answering System at NTCIR-11 QA-Lab.
    In Proceedings of NTCIR-11.

    We describe CMU's UIMA-based modular automatic question answering (QA) system. This system answers multiple-choice English questions for the world history entrance exam. Questions are preceded by short descriptions providing a historical context. Given the context and question-specific instructions, we generate verifiable assertions for each answer choice. These assertions are evaluated using several evidencing modules, which assign a plausibility score to each assertion. These scores are then aggregated to produce the most plausible answer choice. In the NTCIR-11 QA-Lab evaluations, our system achieved 51.6% accuracy on the training set, 47.2% on the Phase 1 test set, and 34.1% on the Phase 2 test set.
    @inproceedings{Wang2014CMU,
      author    = {Di Wang and Leonid Boytsov and Jun Araki and Alkesh Patel and Jeff Gee and Zhengzhong Liu and Eric Nyberg and Teruko Mitamura},
      title     = {{CMU} Multiple-choice Question Answering System at {NTCIR-11} {QA-Lab}},
      booktitle = {Proceedings of NTCIR-11},
      pages     = {542--549},
      month     = {December},
      year      = {2014},
      address   = {Tokyo, Japan},
    }
    

  8. Jun Araki and Jamie Callan. 2014.
    An Annotation Similarity Model in Passage Ranking for Historical Fact Validation.
    In Proceedings of SIGIR 2014 (Short Papers).

    State-of-the-art question answering (QA) systems employ passage retrieval based on bag-of-words similarity models between a query and a passage. We propose a combination of a traditional bag-of-words similarity model and an annotation similarity model to improve passage ranking. The proposed annotation similarity model is generic enough to process annotations of arbitrary types. Historical fact validation is the subtask of determining whether a given sentence states historically correct information, which is important for a QA task on world history. Experimental results show that the combined model improves historical fact validation by up to 7.7% in precision at rank 1 and 4.2% in mean reciprocal rank.
    @inproceedings{Araki2014Annotation,
      author    = {Jun Araki and Jamie Callan},
      title     = {An Annotation Similarity Model in Passage Ranking for Historical Fact Validation},
      booktitle = {Proceedings of SIGIR},
      pages     = {1111--1114},
      month     = {July},
      year      = {2014},
      address   = {Gold Coast, Australia},
    }
    

  9. Jun Araki, Eduard Hovy, and Teruko Mitamura. 2014.
    Evaluation for Partial Event Coreference.
    In Proceedings of the 2nd Workshop on Events: Definition, Detection, Coreference, and Representation.

    This paper proposes an evaluation scheme to measure the performance of a system that detects hierarchical event structure for event coreference resolution. We show that each system output is represented as a forest of unordered trees, and introduce the notion of conceptual event hierarchy to simplify the evaluation process. We enumerate the desiderata for a similarity metric to measure the system performance. We examine three metrics against the desiderata, and show that metrics extended from MUC and BLANC are more adequate than a metric based on Simple Tree Matching.
    @inproceedings{Araki2014Evaluation,
      author    = {Jun Araki and Eduard Hovy and Teruko Mitamura},
      title     = {Evaluation for Partial Event Coreference},
      booktitle = {Proceedings of the 2nd Workshop on Events: Definition, Detection, Coreference, and Representation},
      pages     = {68--76},
      month     = {June},
      year      = {2014},
      address   = {Baltimore, MD, USA},
    }
    

  10. Jun Araki, Zhengzhong Liu, Eduard Hovy, and Teruko Mitamura. 2014.
    Detecting Subevent Structure for Event Coreference Resolution.
    In Proceedings of LREC 2014.

    In the task of event coreference resolution, recent work has shown the need to perform not only full coreference but also partial coreference of events. We show that subevents can form a particular hierarchical event structure. This paper examines a novel two-stage approach to finding and improving subevent structures. First, we introduce a multiclass logistic regression model that can detect subevent relations in addition to full coreference. Second, we propose a method to improve subevent structure based on subevent clusters detected by the model. Using a corpus in the Intelligence Community domain, we show that the method achieves a gain of over 3.2 BLANC F1 points in detecting subevent relations compared with the logistic regression model.
    @inproceedings{Araki2014Detecting,
      author    = {Jun Araki and Zhengzhong Liu and Eduard Hovy and Teruko Mitamura},
      title     = {Detecting Subevent Structure for Event Coreference Resolution},
      booktitle = {Proceedings of LREC},
      pages     = {4553--4558},
      month     = {May},
      year      = {2014},
      address   = {Reykjavik, Iceland},
    }
    

  11. Zhengzhong Liu, Jun Araki, Eduard Hovy, and Teruko Mitamura. 2014.
    Supervised Within-Document Event Coreference using Information Propagation.
    In Proceedings of LREC 2014.

    Event coreference is an important task for full text analysis. However, previous work uses a variety of approaches, sources and evaluation, making the literature confusing and the results incommensurate. First, we provide a description of the differences to facilitate future research. Second, we present a supervised method for event coreference resolution that uses a rich feature set and propagates information alternately between events and their arguments, adapting appropriately for each type of argument.
    @inproceedings{Liu2014Supervised,
      author    = {Zhengzhong Liu and Jun Araki and Eduard Hovy and Teruko Mitamura},
      title     = {Supervised Within-Document Event Coreference using Information Propagation},
      booktitle = {Proceedings of LREC},
      pages     = {4539--4544},
      month     = {May},
      year      = {2014},
      address   = {Reykjavik, Iceland},
    }
    

  12. Mahmoud Azab, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura. 2013.
    An English Reading Tool as an NLP Showcase.
    In Proceedings of IJCNLP 2013: System Demonstrations.

    We introduce SmartReader, an English reading tool that helps non-native English readers overcome language-related hindrances while reading a text. It makes extensive use of widely available NLP tools and resources. SmartReader is a web-based application that can be accessed from standard browsers running on PCs or tablets. A user can choose a document to read from the system's library or upload a new document of their own, and the system will display an interactive version of the text that provides the reader with intelligent e-book functionality.
    @inproceedings{Azab2013English,
      author    = {Mahmoud Azab and Ahmed Salama and Kemal Oflazer and Hideki Shima and Jun Araki and Teruko Mitamura},
      title     = {An {E}nglish Reading Tool as a {NLP} Showcase},
      booktitle = {Proceedings of IJCNLP: System Demonstrations},
      pages     = {5--8},
      month     = {October},
      year      = {2013},
      address   = {Nagoya, Japan},
    }
    

  13. Mahmoud Azab, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura. 2013.
    An NLP-based Reading Tool for Aiding Non-native English Readers.
    In Proceedings of RANLP 2013.

    This paper describes a text-reading tool that makes extensive use of widely available NLP tools and resources to help non-native English speakers overcome language-related hindrances while reading a text. It is a web-based tool that can be accessed from browsers running on PCs or tablets, and it provides the reader with intelligent e-book functionality.
    @inproceedings{Azab2013NLP,
      author    = {Mahmoud Azab and Ahmed Salama and Kemal Oflazer and Hideki Shima and Jun Araki and Teruko Mitamura},
      title     = {An {NLP}-based Reading Tool for Aiding Non-native English Readers},
      booktitle = {Proceedings of RANLP},
      pages     = {41--48},
      month     = {September},
      year      = {2013},
      address   = {Hissar, Bulgaria},
    }
    

  14. Eduard Hovy, Teruko Mitamura, Felisa Verdejo, Jun Araki, and Andrew Philpot. 2013.
    Events are Not Simple: Identity, Non-Identity, and Quasi-Identity.
    In Proceedings of the 1st Workshop on Events: Definition, Detection, Coreference, and Representation.

    Despite considerable theoretical and computational work on coreference, deciding when two entities or events are identical is very difficult. In a project to build corpora containing coreference links between events, we have identified three levels of event identity (full, partial, and none). Event coreference annotation on two corpora was performed to validate the findings.
    @inproceedings{Hovy2013Events,
      author    = {Eduard Hovy and Teruko Mitamura and Felisa Verdejo and Jun Araki and Andrew Philpot},
      title     = {Events are Not Simple: {I}dentity, Non-Identity, and Quasi-Identity},
      booktitle = {Proceedings of the 1st Workshop on Events: Definition, Detection, Coreference, and Representation},
      pages     = {21--28},
      month     = {June},
      year      = {2013},
      address   = {Atlanta, GA, USA},
    }
    

  15. Jun Araki. 2003.
    Text Classification with a Polysemy Considered Feature Set.
    Master's Thesis, The University of Tokyo.

    As we store and distribute ever larger amounts of electronic text, extracting useful data from that text effectively has become an important issue. For this reason, techniques for automatically classifying text have attracted attention.

    In text classification, a common approach is the Vector Space Model (VSM), which maps each document to a point in a multidimensional vector space whose axes correspond to a feature set of keywords that characterize the categories. Many previous attempts at selecting feature words extract the words that score highly on some measure, such as the mutual information between categories and words. However, some words are polysemous, having multiple meanings rather than a single one. Documents containing such words may belong to categories other than the intended one, which causes problems for classification.

    In our research, we regard polysemous words as features that pose a risk to classification. We propose a method that determines whether each feature word is such a risk factor, using mutual information as the measure for feature selection, and that disambiguates feature sets by removing the features judged to be risk factors. We compare the classification results of our method with those of an existing method, and evaluate its efficiency using the Reuters-21578 corpus as the target data for classification.
    @mastersthesis{Araki2003Text,
      author    = {Jun Araki},
      title     = {Text Classification with a Polysemy Considered Feature Set},
      school    = {The University of Tokyo},
      month     = {March},
      year      = {2003},
    }
    

  16. Jun Araki, Fumitaka Nakamura, and Masaya Nakayama. 2003.
    Text Classification with a Polysemy Considered Feature Set.
    In Proceedings of Information Processing Society of Japan, Special Interest Group on Natural Language Processing (IPSJ-SIGNL). In Japanese.

  17. Jun Araki, Fumitaka Nakamura, and Masaya Nakayama. 2002.
    Automated Categorization of Newspaper Articles using Sectorial Dictionary with Relevant Terms.
    In Proceedings of the Forum on Information Technology 2002 (FIT2002). In Japanese.