Publications

Most of my publications are also available at my Google Scholar profile.

  1. Pei Chen, Haibo Ding, Jun Araki, and Ruihong Huang.
    Explicitly Capturing Relations between Entity Mentions via Graph Neural Networks for Domain-specific Named Entity Recognition.
    Association for Computational Linguistics (ACL): Short Papers. 2021 (To Appear).
    [ paper | summary | abstract | bibtex ]

    @inproceedings{Chen2021Explicitly,
      author    = {Pei Chen and Haibo Ding and Jun Araki and Ruihong Huang},
      title     = {Explicitly Capturing Relations between Entity Mentions via Graph Neural Networks for Domain-specific Named Entity Recognition},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP)},
      month     = {8},
      year      = {2021},
      address   = {Online},
      note      = {To Appear},
    }
    
  2. Zhengbao Jiang, Jun Araki, Haibo Ding, and Graham Neubig.
    How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering.
    Transactions of the Association for Computational Linguistics (TACL). 2021 (To Appear).
    [ paper | summary | abstract | bibtex | arxiv ]

    Abstract: Recent works have shown that language models (LM) capture different types of knowledge regarding facts or common sense. However, because no model is perfect, they still fail to provide appropriate answers in many cases. In this paper, we ask the question "how can we know when language models know, with confidence, the answer to a particular query?" We examine this question from the point of view of calibration, the property of a probabilistic model's predicted probabilities actually being well correlated with the probability of correctness. We first examine a state-of-the-art generative QA model, T5, and examine whether its probabilities are well calibrated, finding the answer is a relatively emphatic no. We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness through fine-tuning, post-hoc probability modification, or adjustment of the predicted outputs or inputs. Experiments on a diverse range of datasets demonstrate the effectiveness of our methods. We also perform analysis to study the strengths and limitations of these methods, shedding light on further improvements that may be made in methods for calibrating LMs.
    @article{Jiang2021How,
      author    = {Zhengbao Jiang and Jun Araki and Haibo Ding and Graham Neubig},
      title     = {How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering},
      journal   = {Transactions of the Association for Computational Linguistics (TACL)},
      year      = {2021},
      note      = {To Appear},
    }
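    The calibration property studied in this paper can be illustrated with expected calibration error (ECE), a standard metric in this line of work. The sketch below is illustrative only (not the paper's code, and the toy numbers are made up): predictions are grouped into equal-width confidence bins, and ECE is the bin-weighted average gap between accuracy and mean confidence.

    ```python
    # Illustrative sketch of expected calibration error (ECE) over toy QA
    # confidence scores, assuming 10 equal-width confidence bins.

    def expected_calibration_error(confidences, corrects, n_bins=10):
        """ECE: weighted average of |accuracy - mean confidence| per bin."""
        bins = [[] for _ in range(n_bins)]
        for conf, correct in zip(confidences, corrects):
            idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
            bins[idx].append((conf, correct))
        total = len(confidences)
        ece = 0.0
        for bucket in bins:
            if not bucket:
                continue
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
            ece += (len(bucket) / total) * abs(accuracy - avg_conf)
        return ece

    # A perfectly calibrated model scores 0; an overconfident one scores higher.
    overconfident = expected_calibration_error(
        [0.9, 0.9, 0.9, 0.9], [True, False, False, False])
    print(round(overconfident, 2))  # 0.65
    ```

    A model that says 0.9 but is right only 25% of the time is poorly calibrated, which is the failure mode the paper observes for T5 before applying its calibration methods.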
    
  3. Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, and Graham Neubig.
    X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models.
    Empirical Methods in Natural Language Processing (EMNLP). 2020.
    [ paper | summary | abstract | bibtex | slides | arxiv | code/data ]

    Summary: We provide a multilingual benchmark of cloze-style probes to assess the factual knowledge retrieval capability of language models in typologically diverse languages.
    Abstract: Language models (LMs) have proven surprisingly successful at capturing factual knowledge by completing cloze-style fill-in-the-blank questions such as “Punta Cana is located in _.” However, while knowledge is both written and queried in many languages, studies on LMs’ factual representation ability have almost invariably been performed on English. To assess factual knowledge retrieval in LMs in different languages, we create a multilingual benchmark of cloze-style probes for typologically diverse languages. To properly handle language variations, we expand probing methods from single- to multi-word entities, and develop several decoding algorithms to generate multi-token predictions. Extensive experimental results provide insights about how well (or poorly) current state-of-the-art LMs perform at this task in languages with more or fewer available resources. We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages. Benchmark data and code have been released at https://x-factr.github.io.
    @inproceedings{Jiang2020XFACTR,
      author    = {Zhengbao Jiang and Antonios Anastasopoulos and Jun Araki and Haibo Ding and Graham Neubig},
      title     = {{X-FACTR}: {M}ultilingual Factual Knowledge Retrieval from Pretrained Language Models},
      booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
      pages     = {5943--5959},
      month     = {11},
      year      = {2020},
      address   = {Online},
    }
    
  4. Zhengbao Jiang, Frank F. Xu, Jun Araki, and Graham Neubig.
    How Can We Know What Language Models Know?
    Transactions of the Association for Computational Linguistics (TACL). 2020.
    [ paper | summary | abstract | bibtex | slides | arxiv | code/data ]

    Summary: Generating high-quality and diverse prompts yields a more accurate estimate of the factual knowledge retrieved by language models, because language models are sensitive to how we query them.
    Abstract: Recent work has presented intriguing results examining the knowledge contained in language models (LM) by having the LM fill in the blanks of prompts such as "Obama is a _ by profession". These prompts are usually manually created, and quite possibly sub-optimal; another prompt such as "Obama worked as a _" may result in more accurately predicting the correct profession. Because of this, given an inappropriate prompt, we might fail to retrieve facts that the LM does know, and thus any given prompt only provides a lower bound estimate of the knowledge contained in an LM. In this paper, we attempt to more accurately estimate the knowledge contained in LMs by automatically discovering better prompts to use in this querying process. Specifically, we propose mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts, as well as ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark for extracting relational knowledge from LMs demonstrate that our methods can improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on what LMs know. We have released the code and the resulting LM Prompt And Query Archive (LPAQA) at https://github.com/jzbjyb/LPAQA.
    @article{Jiang2020How,
      author    = {Zhengbao Jiang and Frank F. Xu and Jun Araki and Graham Neubig},
      title     = {How Can We Know What Language Models Know?},
      journal   = {Transactions of the Association for Computational Linguistics (TACL)},
      volume    = {8},
      pages     = {423--438},
      year      = {2020},
    }
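    The ensembling idea from this paper can be sketched in a few lines. This is a hedged illustration, not the released LPAQA code: the function name, the toy prompts, and the probabilities below are all made up for exposition. Each prompt querying the same relation yields an answer distribution, and averaging the distributions keeps any single badly worded prompt from dominating.

    ```python
    # Hedged sketch of prompt ensembling: combine answer distributions from
    # several prompts for the same query by (weighted) averaging, then take
    # the highest-scoring answer. Illustrative only; not the LPAQA API.

    from collections import defaultdict

    def ensemble_prompts(distributions, weights=None):
        """Combine per-prompt answer distributions and return the top answer."""
        if weights is None:
            weights = [1.0] * len(distributions)
        total = sum(weights)
        combined = defaultdict(float)
        for dist, w in zip(distributions, weights):
            for answer, prob in dist.items():
                combined[answer] += w * prob / total
        return max(combined, key=combined.get)

    # Two hypothetical prompts probing the same fact:
    p1 = {"politician": 0.4, "lawyer": 0.6}  # "Obama is a _ by profession"
    p2 = {"politician": 0.7, "actor": 0.3}   # "Obama worked as a _"
    print(ensemble_prompts([p1, p2]))  # politician
    ```

    The mining- and paraphrasing-based methods in the paper supply the candidate prompts; the ensemble step then aggregates their answers as above.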
    
  5. Zhengbao Jiang, Wei Xu, Jun Araki, and Graham Neubig.
    Generalizing Natural Language Analysis through Span-relation Representations.
    Association for Computational Linguistics (ACL). 2020.
    [ paper | summary | abstract | bibtex | slides | arxiv | code ]

    Summary: A single task-agnostic model based on span-relation representations can address a wide variety of NLP tasks predicting syntax, semantics, and information content, while achieving performance comparable to state-of-the-art specialized models.
    Abstract: Natural language processing covers a wide variety of tasks predicting syntax, semantics, and information content, and usually each type of output is generated with specially designed architectures. In this paper, we provide the simple insight that a great variety of tasks can be represented in a single unified format consisting of labeling spans and relations between spans, thus a single task-independent model can be used across different tasks. We perform extensive experiments to test this insight on 10 disparate tasks spanning dependency parsing (syntax), semantic role labeling (semantics), relation extraction (information content), aspect based sentiment analysis (sentiment), and many others, achieving performance comparable to state-of-the-art specialized models. We further demonstrate benefits of multi-task learning, and also show that the proposed method makes it easy to analyze differences and similarities in how the model handles different tasks. Finally, we convert these datasets into a unified format to build a benchmark, which provides a holistic testbed for evaluating future models for generalized natural language analysis.
    @inproceedings{Jiang2020Generalizing,
      author    = {Zhengbao Jiang and Wei Xu and Jun Araki and Graham Neubig},
      title     = {Generalizing Natural Language Analysis through Span-relation Representations},
      booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
      pages     = {2120--2133},
      month     = {7},
      year      = {2020},
      address   = {Online},
    }
    
  6. Zhengbao Jiang, Jun Araki, Donghan Yu, Ruohong Zhang, Wei Xu, Yiming Yang, and Graham Neubig.
    Learning Relation Entailment with Structured and Textual Information.
    Automated Knowledge Base Construction (AKBC). 2020.
    [ paper | summary | abstract | bibtex | slides | code/data ]

    Summary: We define and explore the task of predicting relation entailment, which allows us to construct relation hierarchies and potentially benefits a wide variety of downstream applications such as knowledge graph representation learning, question answering, relation extraction, and summarization.
    Abstract: Relations among words and entities are important for semantic understanding of text, but previous work has largely not considered relations between relations, or meta-relations. In this paper, we specifically examine relation entailment, where the existence of one relation can entail the existence of another relation. Relation entailment allows us to construct relation hierarchies, enabling applications in representation learning, question answering, relation extraction, and summarization. To this end, we formally define the new task of predicting relation entailment and construct a dataset by expanding the existing Wikidata relation hierarchy without expensive human intervention. We propose several methods that incorporate both structured and textual information to represent relations for this task. Experiments and analysis demonstrate that this task is challenging, and we provide insights into task characteristics that may form a basis for future work. The dataset and code have been released at https://github.com/jzbjyb/RelEnt.
    @inproceedings{Jiang2020Learning,
      author    = {Zhengbao Jiang and Jun Araki and Donghan Yu and Ruohong Zhang and Wei Xu and Yiming Yang and Graham Neubig},
      title     = {Learning Relation Entailment with Structured and Textual Information},
      booktitle = {Proceedings of the 2nd Conference on Automated Knowledge Base Construction (AKBC)},
      month     = {6},
      year      = {2020},
      address   = {Online},
    }
    
  7. Jun Araki, Lamana Mulaffer, Arun Pandian, Yukari Yamakawa, Kemal Oflazer, and Teruko Mitamura.
    Interoperable Annotation of Events and Event Relations across Domains.
    Workshop on Interoperable Semantic Annotation (ISA). 2018.
    [ paper | summary | abstract | bibtex | slides ]

    Summary: We present methodologies for annotating a wide coverage of events and event relations on different genres of text in a principled and consistent manner, thereby improving interoperability in the annotation of events and their relations.
    Abstract: This paper presents methodologies for interoperable annotation of events and event relations across different domains. In addition to the interoperability, our annotation scheme supports a wider coverage of events and event relations than prior work. We employ the methodologies to annotate events and event relations on Simple Wikipedia articles in 10 different domains. Our analysis demonstrates that the methodologies can allow us to annotate events and event relations in a principled manner against the wide variety of domains. Despite our relatively wide and flexible annotation of events, we achieve high inter-annotator agreement on event annotation. We also provide an analysis of issues on annotation of events and event relations.
    @inproceedings{Araki2018Interoperable,
      author    = {Jun Araki and Lamana Mulaffer and Arun Pandian and Yukari Yamakawa and Kemal Oflazer and Teruko Mitamura},
      title     = {Interoperable Annotation of Events and Event Relations across Domains},
      booktitle = {Proceedings of the 14th Interoperable Semantic Annotation Workshop (ISA)},
      pages     = {10--20},
      month     = {8},
      year      = {2018},
      address   = {Santa Fe, NM, USA},
    }
    
  8. Jun Araki and Teruko Mitamura.
    Open-Domain Event Detection using Distant Supervision.
    International Conference on Computational Linguistics (COLING). 2018.
    [ paper | summary | abstract | bibtex | poster | code/data ]

    Summary: We introduce open-domain event detection which aims to detect all kinds of events regardless of domains, and show that a distant supervision method can mitigate the issue of training data scarcity in that task.
    Abstract: This paper introduces open-domain event detection, a new event detection paradigm to address issues of prior work on restricted domains and event annotation. The goal is to detect all kinds of events regardless of domains. Given the absence of training data, we propose a distant supervision method that is able to generate high-quality training data. Using a manually annotated event corpus as gold standard, our experiments show that despite no direct supervision, the model outperforms supervised models. This result indicates that the distant supervision enables robust event detection in various domains, while obviating the need for human annotation of events.
    @inproceedings{Araki2018Open,
      author    = {Jun Araki and Teruko Mitamura},
      title     = {Open-Domain Event Detection using Distant Supervision},
      booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING)},
      pages     = {878--891},
      month     = {8},
      year      = {2018},
      address   = {Santa Fe, NM, USA},
    }
    
  9. Jun Araki.
    Extraction of Event Structures from Text.
    Ph.D. Thesis, Carnegie Mellon University. 2018.
    [ thesis | abstract | bibtex | slides ]

    Abstract: Events are a key semantic component integral to information extraction and natural language understanding, which can potentially enhance many downstream applications. Despite their importance, they have received less attention in research on natural language processing. Salient properties of events are that they are a ubiquitous linguistic phenomenon appearing in various domains and that they compose rich discourse structures via event coreferences, forming a coherent story over multiple sentences.

    The central goal of this thesis is to devise a computational method that models the structural property of events in a principled framework to enable more sophisticated event detection and event coreference resolution. To achieve this goal, we address five important problems in these areas: (1) restricted domains in event detection, (2) data sparsity in event detection, (3) lack of subevent detection, (4) error propagation in pipeline models, and (5) limited applications of events. For the first two problems, we introduce a new paradigm of open-domain event detection and show that it is feasible for a distant supervision method to build models detecting events robustly in various domains while obviating the need for human annotation of events. For the third and fourth problems, we show how structured learning models are capable of capturing event interdependencies and making more informed decisions on event coreference resolution and subevent detection. Lastly, we present a novel application of event structures for question generation, illustrating the usefulness of event structures as inference steps in reading comprehension by humans.
    @phdthesis{Araki2018Extraction,
      author    = {Jun Araki},
      title     = {Extraction of Event Structures from Text},
      school    = {Carnegie Mellon University},
      month     = {8},
      year      = {2018},
    }
    
  10. Hans Chalupsky, Jun Araki, Eduard Hovy, Andrew Hsi, Zhengzhong Liu, Xuezhe Ma, Evangelia Spiliopoulou, and Shuxin Yao.
    Multi-lingual Extraction and Integration of Entities, Relations, Events and Sentiments into ColdStart++ KBs with the SAFT System.
    Text Analysis Conference (TAC). 2017.
    [ paper | abstract | bibtex ]

    Abstract: This paper describes our participation in the TAC-KBP 2017 Cold Start++ Knowledge Base Population task. Our SAFT system is a loosely-coupled integration of individual components processing documents in English, Spanish and Chinese. The system extracts entities, slot relations, event nuggets and arguments, performs entity linking against Freebase and event coreference, and also integrates sentiment relations extracted by external collaborators at Columbia and Cornell. The various extractions get combined, linked, deconflicted and integrated into a consistent knowledge base (one per language) for query-based evaluation.
    @inproceedings{Chalupsky2017Multi,
      author    = {Hans Chalupsky and Jun Araki and Eduard Hovy and Andrew Hsi and Zhengzhong Liu and Xuezhe Ma and Evangelia Spiliopoulou and Shuxin Yao},
      title     = {Multi-lingual Extraction and Integration of Entities, Relations, Events and Sentiments into {ColdStart++} {KBs} with the {SAFT} System},
      booktitle = {Proceedings of the 2017 Text Analysis Conference (TAC)},
      month     = {11},
      year      = {2017},
      address   = {Gaithersburg, MD, USA},
    }
    
  11. Jun Araki, Dheeraj Rajagopal, Sreecharan Sankaranarayanan, Susan Holm, Yukari Yamakawa, and Teruko Mitamura.
    Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts.
    International Conference on Computational Linguistics (COLING). 2016.
    [ paper | summary | abstract | bibtex | poster | code ]

    Summary: We propose a question generation method that engages language learners through the use of specific inference steps over multiple sentences.
    Abstract: We present a novel approach to automated question generation that improves upon prior work both from a technology perspective and from an assessment perspective. Our system is aimed at engaging language learners by generating multiple-choice questions which utilize specific inference steps over multiple sentences, namely coreference resolution and paraphrase detection. The system also generates correct answers and semantically-motivated phrase-level distractors as answer choices. Evaluation by human annotators indicates that our approach requires a larger number of inference steps, which necessitate deeper semantic understanding of texts than a traditional single-sentence approach.
    @inproceedings{Araki2016Generating,
      author    = {Jun Araki and Dheeraj Rajagopal and Sreecharan Sankaranarayanan and Susan Holm and Yukari Yamakawa and Teruko Mitamura},
      title     = {Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts},
      booktitle = {Proceedings of the 26th International Conference on Computational Linguistics (COLING)},
      pages     = {1125--1136},
      month     = {12},
      year      = {2016},
      address   = {Osaka, Japan},
    }
    
  12. Zhengzhong Liu, Jun Araki, Teruko Mitamura, and Eduard Hovy.
    CMU-LTI at KBP 2016 Event Nugget Track.
    Text Analysis Conference (TAC). 2016.
    [ paper | abstract | bibtex ]

    Abstract: In this paper, we describe the CMU LTI team's participation in the TAC KBP 2016 event nugget track. This year, we extend our feature-based event detection and coreference systems to also process Chinese documents. We also conduct experiments using neural network-based models for English event nugget detection, which can enable us to build models that transfer easily to different languages. Our feature-based English Nugget Detection and Coreference systems both rank second among all participants. The Chinese counterpart ranks first in Chinese Nugget Detection and second in Chinese Coreference.
    @inproceedings{Liu2016CMU,
      author    = {Zhengzhong Liu and Jun Araki and Teruko Mitamura and Eduard Hovy},
      title     = {{CMU-LTI} at {KBP} 2016 Event Nugget Track},
      booktitle = {Proceedings of the 2016 Text Analysis Conference (TAC)},
      month     = {11},
      year      = {2016},
      address   = {Gaithersburg, MD, USA},
    }
    
  13. Abhishek Kumar and Jun Araki.
    Incorporating Relational Knowledge into Word Representations using Subspace Regularization.
    Association for Computational Linguistics (ACL): Short Papers. 2016.
    [ paper | summary | abstract | bibtex | slides ]

    Summary: We propose a regularization method for word representation learning that models relations with a low-rank subspace, thereby incorporating relational knowledge in a more relaxed manner than the constant translation assumption.
    Abstract: Incorporating lexical knowledge from semantic resources (e.g., WordNet) has been shown to improve the quality of distributed word representations. This knowledge often comes in the form of relational triplets (x, r, y) where words x and y are connected by a relation type r. Existing methods either ignore the relation types, essentially treating the word pairs as generic related words, or employ rather restrictive assumptions to model the relational knowledge. We propose a novel approach to model relational knowledge based on low-rank subspace regularization, and conduct experiments on standard tasks to evaluate its effectiveness.
    @inproceedings{Kumar2016Incorporating,
      author    = {Abhishek Kumar and Jun Araki},
      title     = {Incorporating Relational Knowledge into Word Representations using Subspace Regularization},
      booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL)},
      pages     = {506--511},
      month     = {8},
      year      = {2016},
      address   = {Berlin, Germany},
    }
    
  14. Zhengzhong Liu, Jun Araki, Dheeru Dua, Teruko Mitamura, and Eduard Hovy.
    CMU-LTI at KBP 2015 Event Track.
    Text Analysis Conference (TAC). 2015.
    [ paper | abstract | bibtex ]

    Abstract: We describe CMU LTI's participation in the KBP 2015 Event Track. We officially participated in Task 1: Event Nugget Detection track and Task 3: Event Coreference track. Our systems ranked high in both tracks. We found that our combined system is competitive but has room to improve. In addition, we conducted follow-up experiments by creating a simple pipelined system, which we found competitive compared to the official submissions.
    @inproceedings{Liu2015CMU,
      author    = {Zhengzhong Liu and Jun Araki and Dheeru Dua and Teruko Mitamura and Eduard Hovy},
      title     = {{CMU-LTI} at {KBP} 2015 Event Track},
      booktitle = {Proceedings of the 2015 Text Analysis Conference (TAC)},
      month     = {11},
      year      = {2015},
      address   = {Gaithersburg, MD, USA},
    }
    
  15. Jun Araki and Teruko Mitamura.
    Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron.
    Empirical Methods in Natural Language Processing (EMNLP): Short Papers. 2015.
    [ paper+supplement | abstract | bibtex | slides ]

    Abstract: Events and their coreference offer useful semantic and discourse resources. We show that the semantic and discourse aspects of events interact with each other. However, traditional approaches addressed event extraction and event coreference resolution either separately or sequentially, which limits their interactions. This paper proposes a document-level structured learning model that simultaneously identifies event triggers and resolves event coreference. We demonstrate that the joint model outperforms a pipelined model by 6.9 BLANC F1 and 1.8 CoNLL F1 points in event coreference resolution using a corpus in the biology domain.
    @inproceedings{Araki2015Joint,
      author    = {Jun Araki and Teruko Mitamura},
      title     = {Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron},
      booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
      pages     = {2074--2080},
      month     = {9},
      year      = {2015},
      address   = {Lisbon, Portugal},
    }
    
  16. Di Wang, Leonid Boytsov, Jun Araki, Alkesh Patel, Jeff Gee, Zhengzhong Liu, Eric Nyberg, and Teruko Mitamura.
    CMU Multiple-choice Question Answering System at NTCIR-11 QA-Lab.
    NII Testbeds and Community for Information access Research (NTCIR). 2014.
    [ paper | abstract | bibtex ]

    Abstract: We describe CMU's UIMA-based modular automatic question answering (QA) system. This system answers multiple-choice English questions for the world history entrance exam. Questions are preceded by short descriptions providing a historical context. Given the context and question-specific instructions, we generate verifiable assertions for each answer choice. These assertions are evaluated using several evidencing modules, which assign a plausibility score to each assertion. These scores are then aggregated to produce the most plausible answer choice. In the NTCIR-11 QALab evaluations, our system achieved 51.6% accuracy on the training set, 47.2% on Phase 1 testing set, and 34.1% on Phase 2 testing set.
    @inproceedings{Wang2014CMU,
      author    = {Di Wang and Leonid Boytsov and Jun Araki and Alkesh Patel and Jeff Gee and Zhengzhong Liu and Eric Nyberg and Teruko Mitamura},
      title     = {{CMU} Multiple-choice Question Answering System at {NTCIR-11} {QA-Lab}},
      booktitle = {Proceedings of the 11th NII Testbeds and Community for Information access Research Conference (NTCIR)},
      pages     = {542--549},
      month     = {12},
      year      = {2014},
      address   = {Tokyo, Japan},
    }
    
  17. Jun Araki and Jamie Callan.
    An Annotation Similarity Model in Passage Ranking for Historical Fact Validation.
    ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR): Short Papers. 2014.
    [ paper | abstract | bibtex | poster ]

    Abstract: State-of-the-art question answering (QA) systems employ passage retrieval based on bag-of-words similarity models with respect to a query and a passage. We propose a combination of a traditional bag-of-words similarity model and an annotation similarity model to improve passage ranking. The proposed annotation similarity model is generic enough to process annotations of arbitrary types. Historical fact validation is a subtask to determine whether a given sentence tells us historically correct information, which is important for a QA task on world history. Experimental results show that the combined model gains up to 7.7% and 4.2% improvements in historical fact validation in terms of precision at rank 1 and mean reciprocal rank, respectively.
    @inproceedings{Araki2014Annotation,
      author    = {Jun Araki and Jamie Callan},
      title     = {An Annotation Similarity Model in Passage Ranking for Historical Fact Validation},
      booktitle = {Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)},
      pages     = {1111--1114},
      month     = {7},
      year      = {2014},
      address   = {Gold Coast, Australia},
    }
    
  18. Jun Araki, Eduard Hovy, and Teruko Mitamura.
    Evaluation for Partial Event Coreference.
    Workshop on Events: Definition, Detection, Coreference, and Representation. 2014.
    [ paper | abstract | bibtex | poster ]

    Abstract: This paper proposes an evaluation scheme to measure the performance of a system that detects hierarchical event structure for event coreference resolution. We show that each system output is represented as a forest of unordered trees, and introduce the notion of conceptual event hierarchy to simplify the evaluation process. We enumerate the desiderata for a similarity metric to measure the system performance. We examine three metrics along with the desiderata, and show that metrics extended from MUC and BLANC are more adequate than a metric based on Simple Tree Matching.
    @inproceedings{Araki2014Evaluation,
      author    = {Jun Araki and Eduard Hovy and Teruko Mitamura},
      title     = {Evaluation for Partial Event Coreference},
      booktitle = {Proceedings of the 2nd Workshop on Events: Definition, Detection, Coreference, and Representation},
      pages     = {68--76},
      month     = {6},
      year      = {2014},
      address   = {Baltimore, MD, USA},
    }
    
  19. Jun Araki, Zhengzhong Liu, Eduard Hovy, and Teruko Mitamura.
    Detecting Subevent Structure for Event Coreference Resolution.
    International Conference on Language Resources and Evaluation (LREC). 2014.
    [ paper | abstract | bibtex | slides ]

    Abstract: In the task of event coreference resolution, recent work has shown the need to perform not only full coreference but also partial coreference of events. We show that subevents can form a particular hierarchical event structure. This paper examines a novel two-stage approach to finding and improving subevent structures. First, we introduce a multiclass logistic regression model that can detect subevent relations in addition to full coreference. Second, we propose a method to improve subevent structure based on subevent clusters detected by the model. Using a corpus in the Intelligence Community domain, we show that the method achieves over 3.2 BLANC F1 gain in detecting subevent relations against the logistic regression model.
    @inproceedings{Araki2014Detecting,
      author    = {Jun Araki and Zhengzhong Liu and Eduard Hovy and Teruko Mitamura},
      title     = {Detecting Subevent Structure for Event Coreference Resolution},
      booktitle = {Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC)},
      pages     = {4553--4558},
      month     = {5},
      year      = {2014},
      address   = {Reykjavik, Iceland},
    }
    
  20. Zhengzhong Liu, Jun Araki, Eduard Hovy, and Teruko Mitamura.
    Supervised Within-Document Event Coreference using Information Propagation.
    International Conference on Language Resources and Evaluation (LREC). 2014.
    [ paper | abstract | bibtex ]

    Abstract: Event coreference is an important task for full text analysis. However, previous work uses a variety of approaches, sources and evaluation, making the literature confusing and the results incommensurate. We first provide a description of the differences to facilitate future research. Second, we present a supervised method for event coreference resolution that uses a rich feature set and propagates information alternatively between events and their arguments, adapting appropriately for each type of argument.
    @inproceedings{Liu2014Supervised,
      author    = {Zhengzhong Liu and Jun Araki and Eduard Hovy and Teruko Mitamura},
      title     = {Supervised Within-Document Event Coreference using Information Propagation},
      booktitle = {Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC)},
      pages     = {4539--4544},
      month     = {5},
      year      = {2014},
      address   = {Reykjavik, Iceland},
    }
    
  21. Mahmoud Azab, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura.
    An English Reading Tool as an NLP Showcase.
    International Joint Conference on Natural Language Processing (IJCNLP): System Demonstrations. 2013.
    [ paper | abstract | bibtex ]

    Abstract: We introduce SmartReader, an English reading tool that helps non-native English readers overcome language-related hindrances while reading a text. It makes extensive use of widely available NLP tools and resources. SmartReader is a web-based application that can be accessed from standard browsers running on PCs or tablets. A user can choose a text document to read from the system's library or upload a new document of their own, and the system displays an interactive version of the text that provides the reader with intelligent e-book functionality.
    @inproceedings{Azab2013English,
      author    = {Mahmoud Azab and Ahmed Salama and Kemal Oflazer and Hideki Shima and Jun Araki and Teruko Mitamura},
      title     = {An {E}nglish Reading Tool as an {NLP} Showcase},
      booktitle = {Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP): System Demonstrations},
      pages     = {5--8},
      month     = {10},
      year      = {2013},
      address   = {Nagoya, Japan},
    }
    
  22. Mahmoud Azab, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura.
    An NLP-based Reading Tool for Aiding Non-native English Readers.
    International Conference on Recent Advances in Natural Language Processing (RANLP). 2013.
    [ paper | abstract | bibtex ]

    Abstract: This paper describes a text-reading tool that makes extensive use of widely available NLP tools and resources to help non-native English speakers overcome language-related hindrances while reading a text. It is a web-based tool that can be accessed from browsers running on PCs or tablets, and it provides the reader with intelligent e-book functionality.
    @inproceedings{Azab2013NLP,
      author    = {Mahmoud Azab and Ahmed Salama and Kemal Oflazer and Hideki Shima and Jun Araki and Teruko Mitamura},
      title     = {An {NLP}-based Reading Tool for Aiding Non-native English Readers},
      booktitle = {Proceedings of the 9th International Conference on Recent Advances in Natural Language Processing (RANLP)},
      pages     = {41--48},
      month     = {9},
      year      = {2013},
      address   = {Hissar, Bulgaria},
    }
    
  23. Eduard Hovy, Teruko Mitamura, Felisa Verdejo, Jun Araki, and Andrew Philpot.
    Events are Not Simple: Identity, Non-Identity, and Quasi-Identity.
    Workshop on Events: Definition, Detection, Coreference, and Representation. 2013.
    [ paper | abstract | bibtex | poster ]

    Abstract: Despite considerable theoretical and computational work on coreference, deciding when two entities or events are identical is very difficult. In a project to build corpora containing coreference links between events, we have identified three levels of event identity (full, partial, and none). Event coreference annotation on two corpora was performed to validate the findings.
    @inproceedings{Hovy2013Events,
      author    = {Eduard Hovy and Teruko Mitamura and Felisa Verdejo and Jun Araki and Andrew Philpot},
      title     = {Events are Not Simple: {I}dentity, Non-Identity, and Quasi-Identity},
      booktitle = {Proceedings of the 1st Workshop on Events: Definition, Detection, Coreference, and Representation},
      pages     = {21--28},
      month     = {6},
      year      = {2013},
      address   = {Atlanta, GA, USA},
    }
    
  24. Jun Araki.
    Text Classification with a Polysemy Considered Feature Set.
    Master's Thesis, The University of Tokyo. 2003.
    [ thesis | abstract | bibtex ]

    Abstract: As large amounts of computerized text are stored and distributed, extracting useful information from this text effectively has become an important issue. For this reason, techniques for classifying text automatically with computers have attracted attention.

    Generally, in the field of text classification, we use a model called the Vector Space Model (VSM), in which we map a document to a point in a multi-dimensional vector space whose axes are based on feature sets of keywords that characterize categories. In the past, many attempts have been made to select feature words that characterize categories by extracting words that score highly under measures such as the mutual information between categories and words. However, some words are polysemous, carrying multiple meanings rather than a single one; documents containing such words may belong to categories other than the intended one, which causes problems for classification.

    In our research, we regard polysemous words as risk factors for classification, and we propose a method that determines whether each feature word is a risk factor, using mutual information as a feature-selection measure, and disambiguates the feature set by removing features judged to be risk factors. We compare the classification results of our method with those of an existing method, and evaluate its effectiveness using the Reuters-21578 corpus as the classification target.
    @mastersthesis{Araki2003Text,
      author    = {Jun Araki},
      title     = {Text Classification with a Polysemy Considered Feature Set},
      school    = {The University of Tokyo},
      month     = {3},
      year      = {2003},
    }
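    The feature-selection idea described in the abstract above (scoring candidate feature words by their mutual information with categories, so that polysemous words spread across categories score low and can be removed) can be sketched as follows. This is an illustrative toy, not the thesis implementation; the data and function names are invented for the example:

    ```python
    import math
    from collections import Counter

    def mutual_information(docs):
        """docs: list of (category, set_of_words). Returns {word: MI score}."""
        n = len(docs)
        cat_counts = Counter(cat for cat, _ in docs)
        word_counts = Counter()
        joint_counts = Counter()
        for cat, words in docs:
            for w in words:
                word_counts[w] += 1
                joint_counts[(cat, w)] += 1
        scores = {}
        for w, n_w in word_counts.items():
            mi = 0.0
            for cat, n_c in cat_counts.items():
                n_cw = joint_counts[(cat, w)]
                if n_cw:
                    # pointwise MI of (category, word), weighted by joint probability
                    mi += (n_cw / n) * math.log((n_cw * n) / (n_c * n_w))
            scores[w] = mi
        return scores

    docs = [
        ("sports", {"game", "bank"}),   # "bank" occurs in two categories:
        ("finance", {"bank", "loan"}),  # a polysemy-style risk factor
        ("sports", {"game", "team"}),
    ]
    scores = mutual_information(docs)
    # Words concentrated in one category score higher than words spread across
    # categories, so low-scoring words like "bank" can be dropped from the
    # feature set before classification.
    ```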
    
  25. Jun Araki, Fumitaka Nakamura, and Masaya Nakayama.
    Text Classification with a Polysemy Considered Feature Set.
    Information Processing Society of Japan, Special Interest Group on Natural Language Processing (IPSJ-SIGNL). In Japanese. 2003.

  26. Jun Araki, Fumitaka Nakamura, and Masaya Nakayama.
    Automated Categorization of Newspaper Articles using Sectorial Dictionary with Relevant Terms.
    Forum on Information Technology (FIT). In Japanese. 2002.