Publications

  1. Jun Araki, Dheeraj Rajagopal, Sreecharan Sankaranarayanan, Susan Holm, Yukari Yamakawa, and Teruko Mitamura. 2016.
    Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts.
    In Proceedings of COLING 2016.
    [ paper | abstract | bib | poster ]

    We present a novel approach to automated question generation that improves upon prior work both from a technology perspective and from an assessment perspective. Our system is aimed at engaging language learners by generating multiple-choice questions which utilize specific inference steps over multiple sentences, namely coreference resolution and paraphrase detection. The system also generates correct answers and semantically-motivated phrase-level distractors as answer choices. Evaluation by human annotators indicates that our approach requires a larger number of inference steps than a traditional single-sentence approach, which necessitates deeper semantic understanding of texts.
    @inproceedings{Araki2016Generating,
      author    = {Jun Araki and Dheeraj Rajagopal and Sreecharan Sankaranarayanan and Susan Holm and Yukari Yamakawa and Teruko Mitamura},
      title     = {Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts},
      booktitle = {Proceedings of COLING 2016},
      pages     = {1125--1136},
      month     = {December},
      year      = {2016},
      address   = {Osaka, Japan},
    }
    

  2. Abhishek Kumar and Jun Araki. 2016.
    Incorporating Relational Knowledge into Word Representations using Subspace Regularization.
    In Proceedings of ACL 2016 (Short Papers).
    [ paper | abstract | bib | slides ]

    Incorporating lexical knowledge from semantic resources (e.g., WordNet) has been shown to improve the quality of distributed word representations. This knowledge often comes in the form of relational triplets (x, r, y) where words x and y are connected by a relation type r. Existing methods either ignore the relation types, essentially treating the word pairs as generic related words, or employ rather restrictive assumptions to model the relational knowledge. We propose a novel approach to model relational knowledge based on low-rank subspace regularization, and conduct experiments on standard tasks to evaluate its effectiveness.
    @inproceedings{Kumar2016Incorporating,
      author    = {Abhishek Kumar and Jun Araki},
      title     = {Incorporating Relational Knowledge into Word Representations using Subspace Regularization},
      booktitle = {Proceedings of ACL 2016},
      pages     = {506--511},
      month     = {August},
      year      = {2016},
      address   = {Berlin, Germany},
    }
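
    A minimal NumPy sketch of the general low-rank subspace idea, for intuition only: the offset vectors y - x for word pairs of a single relation type are penalized by their squared distance to a low-dimensional subspace with an orthonormal basis. The basis, the toy vectors, and the squared-residual form of the penalty are illustrative assumptions, not the paper's exact objective.

      # Illustrative low-rank subspace penalty on relation offsets (not the paper's
      # exact formulation). The relation basis and word vectors are assumed inputs.
      import numpy as np

      def subspace_penalty(word_vecs, pairs, basis):
          """Sum of squared residuals of relation offsets projected onto a subspace.

          word_vecs: dict mapping a word to its embedding (1-D numpy array)
          pairs:     list of (x, y) word pairs connected by one relation type r
          basis:     (d, k) matrix with orthonormal columns spanning r's subspace
          """
          penalty = 0.0
          for x, y in pairs:
              offset = word_vecs[y] - word_vecs[x]      # relation offset y - x
              projection = basis @ (basis.T @ offset)   # component inside the subspace
              residual = offset - projection            # component outside the subspace
              penalty += float(residual @ residual)     # squared distance to the subspace
          return penalty

      # Toy usage with random 50-dimensional vectors and a rank-5 subspace.
      rng = np.random.default_rng(0)
      vecs = {w: rng.normal(size=50) for w in ["dog", "animal", "car", "vehicle"]}
      B, _ = np.linalg.qr(rng.normal(size=(50, 5)))     # orthonormal (50, 5) basis
      print(subspace_penalty(vecs, [("dog", "animal"), ("car", "vehicle")], B))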
    

  3. Jun Araki and Teruko Mitamura. 2015.
    Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron.
    In Proceedings of EMNLP 2015 (Short Papers).
    [ paper+supplement | abstract | bib | slides ]

    Events and their coreference offer useful semantic and discourse resources. We show that the semantic and discourse aspects of events interact with each other. However, traditional approaches have addressed event extraction and event coreference resolution either separately or sequentially, which limits their interactions. This paper proposes a document-level structured learning model that simultaneously identifies event triggers and resolves event coreference. We demonstrate that the joint model outperforms a pipelined model by 6.9 BLANC F1 and 1.8 CoNLL F1 points in event coreference resolution using a corpus in the biology domain.
    @inproceedings{Araki2015Joint,
      author    = {Jun Araki and Teruko Mitamura},
      title     = {Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron},
      booktitle = {Proceedings of EMNLP 2015},
      pages     = {2074--2080},
      month     = {September},
      year      = {2015},
      address   = {Lisbon, Portugal},
    }
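
    For readers unfamiliar with the learning algorithm, the sketch below shows a generic structured perceptron loop (decode the highest-scoring structure under the current weights, then update toward the gold structure). The toy feature map and two-label candidate set are placeholders; the paper's joint model decodes document-level structures of event triggers and coreference clusters, which is not reproduced here.

      # Generic structured perceptron, for intuition only; the decoder and feature
      # map below are toy placeholders, not the paper's document-level model.
      import numpy as np

      def train_structured_perceptron(data, phi, candidates, dim, epochs=10):
          """data: list of (x, y_gold); phi(x, y) -> feature vector; candidates(x) -> list of y."""
          w = np.zeros(dim)
          for _ in range(epochs):
              for x, y_gold in data:
                  # Decode: pick the candidate structure scoring highest under w.
                  y_pred = max(candidates(x), key=lambda y: w @ phi(x, y))
                  if y_pred != y_gold:
                      w += phi(x, y_gold) - phi(x, y_pred)  # reward gold, penalize prediction
          return w

      # Toy usage: "structures" are just labels A/B and phi is an indicator feature map.
      def phi(x, y):
          v = np.zeros(4)
          v[(0 if y == "A" else 2) + x] = 1.0
          return v

      data = [(0, "A"), (1, "B"), (0, "A"), (1, "B")]
      w = train_structured_perceptron(data, phi, lambda x: ["A", "B"], dim=4)
      print([max(["A", "B"], key=lambda y: w @ phi(x, y)) for x, _ in data])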
    

  4. Di Wang, Leonid Boytsov, Jun Araki, Alkesh Patel, Jeff Gee, Zhengzhong Liu, Eric Nyberg, and Teruko Mitamura. 2014.
    CMU Multiple-choice Question Answering System at NTCIR-11 QA-Lab.
    In Proceedings of NTCIR-11.
    [ paper | abstract | bib ]

    We describe CMU's UIMA-based modular automatic question answering (QA) system. This system answers multiple-choice English questions for the world history entrance exam. Questions are preceded by short descriptions providing a historical context. Given the context and question-specific instructions, we generate verifiable assertions for each answer choice. These assertions are evaluated using several evidencing modules, which assign a plausibility score to each assertion. These scores are then aggregated to produce the most plausible answer choice. In the NTCIR-11 QA-Lab evaluations, our system achieved 51.6% accuracy on the training set, 47.2% on the Phase 1 test set, and 34.1% on the Phase 2 test set.
    @inproceedings{Wang2014CMU,
      author    = {Di Wang and Leonid Boytsov and Jun Araki and Alkesh Patel and Jeff Gee and Zhengzhong Liu and Eric Nyberg and Teruko Mitamura},
      title     = {{CMU} Multiple-choice Question Answering System at {NTCIR-11} {QA-Lab}},
      booktitle = {Proceedings of NTCIR-11},
      pages     = {542--549},
      month     = {December},
      year      = {2014},
      address   = {Tokyo, Japan},
    }
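
    A toy sketch of the final aggregation step only: per-module plausibility scores for each answer choice are combined by a weighted sum, and the highest-scoring choice is returned. The module names and weights are invented for illustration; the paper's actual evidencing modules and aggregation scheme may differ.

      # Toy aggregation of plausibility scores from evidencing modules into a single
      # answer choice. Module names and weights are invented for illustration.
      def aggregate(choice_scores, weights):
          """choice_scores: {choice: {module: score}}; weights: {module: weight}."""
          totals = {choice: sum(weights.get(m, 1.0) * s for m, s in scores.items())
                    for choice, scores in choice_scores.items()}
          return max(totals, key=totals.get), totals

      scores = {"1": {"retrieval": 0.4, "factoid": 0.2},
                "2": {"retrieval": 0.7, "factoid": 0.5},
                "3": {"retrieval": 0.3, "factoid": 0.6},
                "4": {"retrieval": 0.1, "factoid": 0.1}}
      best, totals = aggregate(scores, weights={"retrieval": 1.0, "factoid": 0.5})
      print(best, totals)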
    

  5. Jun Araki and Jamie Callan. 2014.
    An Annotation Similarity Model in Passage Ranking for Historical Fact Validation.
    In Proceedings of SIGIR 2014 (Short Papers).
    [ paper | abstract | bib | poster ]

    State-of-the-art question answering (QA) systems employ passage retrieval based on bag-of-words similarity models with respect to a query and a passage. We propose a combination of a traditional bag-of-words similarity model and an annotation similarity model to improve passage ranking. The proposed annotation similarity model is generic enough to process annotations of arbitrary types. Historical fact validation is a subtask to determine whether a given sentence tells us historically correct information, which is important for a QA task on world history. Experimental results show that the combined model gains up to 7.7% and 4.2% improvements in historical fact validation in terms of precision at rank 1 and mean reciprocal rank, respectively.
    @inproceedings{Araki2014Annotation,
      author    = {Jun Araki and Jamie Callan},
      title     = {An Annotation Similarity Model in Passage Ranking for Historical Fact Validation},
      booktitle = {Proceedings of SIGIR 2014},
      pages     = {1111--1114},
      month     = {July},
      year      = {2014},
      address   = {Gold Coast, Australia},
    }
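
    A small sketch of the general idea of combining a bag-of-words similarity with an annotation-based similarity for passage ranking. The cosine bag-of-words score, the Jaccard overlap of annotation sets, and the interpolation weight are assumptions made for illustration, not the paper's exact models.

      # Illustrative combination of bag-of-words and annotation similarities for
      # passage scoring; the similarity functions and the weight `lam` are assumptions.
      from collections import Counter
      from math import sqrt

      def cosine_bow(query_tokens, passage_tokens):
          q, p = Counter(query_tokens), Counter(passage_tokens)
          dot = sum(q[t] * p[t] for t in q)
          norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in p.values()))
          return dot / norm if norm else 0.0

      def annotation_overlap(query_annotations, passage_annotations):
          """Jaccard overlap of annotation sets, e.g. {('PERSON', 'Napoleon'), ...}."""
          q, p = set(query_annotations), set(passage_annotations)
          return len(q & p) / len(q | p) if q | p else 0.0

      def combined_score(query, passage, lam=0.7):
          return (lam * cosine_bow(query["tokens"], passage["tokens"])
                  + (1 - lam) * annotation_overlap(query["annotations"], passage["annotations"]))

      query = {"tokens": ["napoleon", "exiled", "island"],
               "annotations": [("PERSON", "Napoleon"), ("LOCATION", "Elba")]}
      passage = {"tokens": ["napoleon", "was", "exiled", "to", "elba"],
                 "annotations": [("PERSON", "Napoleon"), ("LOCATION", "Elba"), ("DATE", "1814")]}
      print(combined_score(query, passage))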
    

  6. Jun Araki, Eduard Hovy, and Teruko Mitamura. 2014.
    Evaluation for Partial Event Coreference.
    In Proceedings of ACL 2014 Workshop on Events: Definition, Detection, Coreference, and Representation.
    [ paper | abstract | bib | poster ]

    This paper proposes an evaluation scheme to measure the performance of a system that detects hierarchical event structure for event coreference resolution. We show that each system output is represented as a forest of unordered trees, and introduce the notion of conceptual event hierarchy to simplify the evaluation process. We enumerate the desiderata for a similarity metric to measure the system performance. We examine three metrics along with the desiderata, and show that metrics extended from MUC and BLANC are more adequate than a metric based on Simple Tree Matching.
    @inproceedings{Araki2014Evaluation,
      author    = {Jun Araki and Eduard Hovy and Teruko Mitamura},
      title     = {Evaluation for Partial Event Coreference},
      booktitle = {Proceedings of ACL 2014 Workshop on Events: Definition, Detection, Coreference, and Representation},
      pages     = {68--76},
      month     = {June},
      year      = {2014},
      address   = {Baltimore, MD, USA},
    }
    

  7. Jun Araki, Zhengzhong Liu, Eduard Hovy, and Teruko Mitamura. 2014.
    Detecting Subevent Structure for Event Coreference Resolution.
    In Proceedings of LREC 2014.
    [ paper | abstract | bib | slides ]

    In the task of event coreference resolution, recent work has shown the need to perform not only full coreference but also partial coreference of events. We show that subevents can form a particular hierarchical event structure. This paper examines a novel two-stage approach to finding and improving subevent structures. First, we introduce a multiclass logistic regression model that can detect subevent relations in addition to full coreference. Second, we propose a method to improve subevent structure based on subevent clusters detected by the model. Using a corpus in the Intelligence Community domain, we show that the method achieves a gain of over 3.2 BLANC F1 points in detecting subevent relations over the logistic regression model.
    @inproceedings{Araki2014Detecting,
      author    = {Jun Araki and Zhengzhong Liu and Eduard Hovy and Teruko Mitamura},
      title     = {Detecting Subevent Structure for Event Coreference Resolution},
      booktitle = {Proceedings of LREC 2014},
      pages     = {4553--4558},
      month     = {May},
      year      = {2014},
      address   = {Reykjavik, Iceland},
    }
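
    A toy sketch of the first stage only: a multiclass logistic regression that labels an event-mention pair as full coreference, subevent, or neither. The three pairwise features below are invented for illustration, and the second stage (improving subevent structure from the detected clusters) is not shown.

      # Pairwise multiclass classification of event-mention pairs; the features and
      # tiny training set are invented for illustration, not the paper's feature set.
      from sklearn.linear_model import LogisticRegression

      # Each row: [same_lemma, shares_argument, sentence_distance]
      X = [[1, 1, 0], [1, 0, 2], [0, 1, 1], [0, 0, 5], [1, 1, 1], [0, 1, 3]]
      y = ["full", "full", "subevent", "none", "full", "none"]

      clf = LogisticRegression(max_iter=1000).fit(X, y)
      print(clf.predict([[0, 1, 1], [0, 0, 4]]))  # predicted relation labels for two new pairs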
    

  8. Zhengzhong Liu, Jun Araki, Eduard Hovy, and Teruko Mitamura. 2014.
    Supervised Within-Document Event Coreference using Information Propagation.
    In Proceedings of LREC 2014.
    [ paper | abstract | bib ]

    Event coreference is an important task for full text analysis. However, previous work uses a variety of approaches, sources and evaluation, making the literature confusing and the results incommensurate. First, we provide a description of these differences to facilitate future research. Second, we present a supervised method for event coreference resolution that uses a rich feature set and propagates information alternately between events and their arguments, adapting appropriately for each type of argument.
    @inproceedings{Liu2014Supervised,
      author    = {Zhengzhong Liu and Jun Araki and Eduard Hovy and Teruko Mitamura},
      title     = {Supervised Within-Document Event Coreference using Information Propagation},
      booktitle = {Proceedings of LREC 2014},
      pages     = {4539--4544},
      month     = {May},
      year      = {2014},
      address   = {Reykjavik, Iceland},
    }
    

  9. Mahmoud Azab, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura. 2013.
    An English Reading Tool as an NLP Showcase.
    In Proceedings of IJCNLP 2013: System Demonstrations.
    [ paper | abstract | bib ]

    We introduce SmartReader, an English reading tool that helps non-native English readers overcome language-related hindrances while reading a text. It makes extensive use of widely available NLP tools and resources. SmartReader is a web-based application that can be accessed from standard browsers running on PCs or tablets. A user can choose a document they want to read from the system's library or upload a new document of their own, and the system displays an interactive version of the text, providing the reader with intelligent e-book functionality.
    @inproceedings{Azab2013English,
      author    = {Mahmoud Azab and Ahmed Salama and Kemal Oflazer and Hideki Shima and Jun Araki and Teruko Mitamura},
      title     = {An {E}nglish Reading Tool as an {NLP} Showcase},
      booktitle = {Proceedings of IJCNLP 2013: System Demonstrations},
      pages     = {5--8},
      month     = {October},
      year      = {2013},
      address   = {Nagoya, Japan},
    }
    

  10. Mahmoud Azab, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura. 2013.
    An NLP-based Reading Tool for Aiding Non-native English Readers.
    In Proceedings of RANLP 2013.
    [ paper | abstract | bib ]

    This paper describes a text-reading tool that makes extensive use of widely available NLP tools and resources to help non-native English speakers overcome language-related hindrances while reading a text. It is a web-based tool that can be accessed from browsers running on PCs or tablets and provides the reader with intelligent e-book functionality.
    @inproceedings{Azab2013NLP,
      author    = {Mahmoud Azab and Ahmed Salama and Kemal Oflazer and Hideki Shima and Jun Araki and Teruko Mitamura},
      title     = {An {NLP}-based Reading Tool for Aiding Non-native English Readers},
      booktitle = {Proceedings of RANLP 2013},
      pages     = {41--48},
      month     = {September},
      year      = {2013},
      address   = {Hissar, Bulgaria},
    }
    

  11. Eduard Hovy, Teruko Mitamura, Felisa Verdejo, Jun Araki, and Andrew Philpot. 2013.
    Events are Not Simple: Identity, Non-Identity, and Quasi-Identity.
    In Proceedings of NAACL-HLT 2013 Workshop on Events: Definition, Detection, Coreference, and Representation.
    [ paper | abstract | bib | poster ]

    Despite considerable theoretical and computational work on coreference, deciding when two entities or events are identical is very difficult. In a project to build corpora containing coreference links between events, we have identified three levels of event identity (full, partial, and none). Event coreference annotation on two corpora was performed to validate the findings.
    @inproceedings{Hovy2013Events,
      author    = {Eduard Hovy and Teruko Mitamura and Felisa Verdejo and Jun Araki and Andrew Philpot},
      title     = {Events are Not Simple: {I}dentity, Non-Identity, and Quasi-Identity},
      booktitle = {Proceedings of NAACL-HLT 2013 Workshop on Events: Definition, Detection, Coreference, and Representation},
      pages     = {21--28},
      month     = {June},
      year      = {2013},
      address   = {Atlanta, GA, USA},
    }
    

  12. Jun Araki. 2003.
    Text Classification with a Polysemy Considered Feature Set.
    Master's Thesis, The University of Tokyo.
    [ thesis | abstract | bib ]

    As large amounts of text are stored and distributed electronically, how to extract useful information from that text effectively has become an important issue. For this reason, techniques for classifying text automatically have attracted attention.

    Text classification generally relies on the Vector Space Model (VSM), in which a document is mapped to a point in a multi-dimensional vector space whose axes are derived from sets of feature words that characterize the categories. Previous work on selecting such feature words has largely extracted words that score highly on measures such as the mutual information between categories and words. However, some words are polysemous, carrying multiple meanings rather than a single one, and documents containing those words may belong to categories other than the intended one, which causes problems for classification.

    In this research, we treat polysemous words as risk factors for classification. We propose a method that uses mutual information as a feature selection measure to determine whether each feature word is a risk factor, and that disambiguates the feature set by removing the features judged to be risk factors. We compare the classification results of our method with those of an existing method and evaluate its effectiveness using the Reuters-21578 corpus as the classification target.
    @mastersthesis{Araki2003Text,
      author    = {Jun Araki},
      title     = {Text Classification with a Polysemy Considered Feature Set},
      school    = {The University of Tokyo},
      month     = {March},
      year      = {2003},
    }
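
    A small sketch of mutual-information-based feature selection with a polysemy-style filter, in the spirit of the thesis. The specific risk criterion used here (a term ranking among the top features of more than one category) is a hypothetical stand-in, since the abstract does not spell out the thesis's criterion.

      # Mutual-information-based feature selection with a hypothetical polysemy
      # filter; the "risky term" criterion below is an assumption for illustration.
      from collections import defaultdict
      from math import log

      def mutual_information(docs):
          """docs: list of (category, set_of_terms). Returns MI(term, category) scores."""
          n = len(docs)
          term_count, cat_count, joint = defaultdict(int), defaultdict(int), defaultdict(int)
          for cat, terms in docs:
              cat_count[cat] += 1
              for t in terms:
                  term_count[t] += 1
                  joint[(t, cat)] += 1
          # Pointwise MI between "document contains t" and "document is in cat".
          return {(t, cat): log((c / n) / ((term_count[t] / n) * (cat_count[cat] / n)))
                  for (t, cat), c in joint.items()}

      def select_features(docs, top_k=3):
          mi = mutual_information(docs)
          candidates = defaultdict(set)
          for (t, cat) in mi:
              candidates[cat].add(t)
          # Keep the top_k terms per category by mutual information.
          top = {cat: set(sorted(terms, key=lambda t: -mi[(t, cat)])[:top_k])
                 for cat, terms in candidates.items()}
          # Hypothetical polysemy filter: drop terms selected for more than one category.
          risky = {t for cat in top for t in top[cat]
                   if sum(t in top[c] for c in top) > 1}
          return {cat: terms - risky for cat, terms in top.items()}

      docs = [("finance", {"bank", "loan"}), ("finance", {"bank", "interest"}),
              ("geography", {"bank", "river"}), ("geography", {"river", "delta"})]
      print(select_features(docs))   # "bank" is dropped as a cross-category (polysemous) term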
    

  13. Jun Araki, Fumitaka Nakamura, and Masaya Nakayama. 2003.
    Text Classification with a Polysemy Considered Feature Set.
    In Proceedings of Information Processing Society of Japan, Special Interest Group on Natural Language Processing (IPSJ-SIGNL). In Japanese.

  14. Jun Araki, Fumitaka Nakamura, and Masaya Nakayama. 2002.
    Automated Categorization of Newspaper Articles using Sectorial Dictionary with Relevant Terms.
    In Proceedings of the Forum on Information Technology 2002 (FIT2002). In Japanese.