Publications

Most of my publications are also available at my Google Scholar profile.

  1. Anthony Colas, Jun Araki, Zhengyu Zhou, Bingqing Wang, and Zhe Feng.
    Knowledge-grounded Natural Language Recommendation Explanation.
    In Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP. 2023. Best Paper Award.
    [ paper | summary | abstract | bibtex | arxiv | slides | code/data ]

    Abstract: Explanations accompanying a recommendation can assist users in understanding the decision made by recommendation systems, which in turn increases a user's confidence and trust in the system. Recently, research has focused on generating natural language explanations in a human-readable format. Thus far, the proposed approaches leverage item reviews written by users, which are often subjective, sparse in language, and unable to account for new items that have not been purchased or reviewed before. Instead, we aim to generate fact-grounded recommendation explanations that are objectively described with item features while implicitly considering a user's preferences, based on the user's purchase history. To achieve this, we propose a knowledge graph (KG) approach to natural language explainable recommendation. Our approach draws on user-item features through a novel collaborative filtering-based KG representation to produce fact-grounded, personalized explanations, while jointly learning user-item representations for recommendation scoring. Experimental results show that our approach consistently outperforms previous state-of-the-art models on natural language explainable recommendation metrics.
    @inproceedings{Colas2023Knowledge,
      author    = {Anthony Colas and Jun Araki and Zhengyu Zhou and Bingqing Wang and Zhe Feng},
      title     = {Knowledge-grounded Natural Language Recommendation Explanation},
      booktitle = {Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP},
      pages     = {1--15},
      month     = {12},
      year      = {2023},
      address   = {Singapore},
    }
    
  2. Mobashir Sadat, Zhengyu Zhou, Lukas Lange, Jun Araki, Arsalan Gundroo, Bingqing Wang, Rakesh R. Menon, Md Rizwan Parvez, and Zhe Feng.
    DelucionQA: Detecting Hallucinations in Domain-specific Question Answering.
    In Findings of the Association for Computational Linguistics: EMNLP 2023. 2023.
    [ paper | summary | abstract | bibtex | arxiv | slides | poster | code/data ]

    Abstract: Hallucination is a well-known phenomenon in text generated by large language models (LLMs). The existence of hallucinatory responses is found in almost all application scenarios, e.g., summarization, question-answering (QA), etc. For applications requiring high reliability (e.g., customer-facing assistants), the potential existence of hallucination in LLM-generated text is a critical problem. The amount of hallucination can be reduced by leveraging information retrieval to provide relevant background information to the LLM. However, LLMs can still generate hallucinatory content for various reasons (e.g., prioritizing its parametric knowledge over the context, failure to capture the relevant information from the context, etc.). Detecting hallucinations through automated methods is thus paramount. To facilitate research in this direction, we introduce a sophisticated dataset, DelucionQA, that captures hallucinations made by retrieval-augmented LLMs for a domain-specific QA task. Furthermore, we propose a set of hallucination detection methods to serve as baselines for future works from the research community. Analysis and case study are also provided to share valuable insights on hallucination phenomena in the target scenario.
    @inproceedings{Sadat2023DelucionQA,
      author    = {Mobashir Sadat and Zhengyu Zhou and Lukas Lange and Jun Araki and Arsalan Gundroo and Bingqing Wang and Rakesh R. Menon and Md Rizwan Parvez and Zhe Feng},
      title     = {{DelucionQA}: {D}etecting Hallucinations in Domain-specific Question Answering},
      booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023},
      pages     = {822--835},
      month     = {12},
      year      = {2023},
      address   = {Singapore},
    }
    
  3. Naoki Otani, Jun Araki, HyeongSik Kim, and Eduard Hovy.
    On the Underspecification of Situations in Open-domain Conversational Datasets.
    In Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI). 2023. Outstanding Paper Award.
    [ paper | summary | abstract | bibtex | slides ]

    Abstract: Advances of open-domain conversational systems have been achieved through the creation of numerous conversation datasets. However, many of the commonly used datasets contain little or no information about the conversational situation, such as relevant objects/people, their properties, and relationships. This absence leads to underspecification of the problem space and typically results in undesired dialogue system behavior. This position paper discusses the current state of the field associated with processing situational information. An analysis of response generation using three datasets shows that explicitly provided situational information can improve the coherence and specificity of generated responses, but further experiments reveal that generation systems can be misled by irrelevant information. Our conclusions from this evaluation provide insights into the problem and directions for future research.
    @inproceedings{Otani2023Underspecification,
      author    = {Naoki Otani and Jun Araki and HyeongSik Kim and Eduard Hovy},
      title     = {On the Underspecification of Situations in Open-domain Conversational Datasets},
      booktitle = {Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI)},
      pages     = {12--28},
      month     = {7},
      year      = {2023},
      address   = {Toronto, Canada},
    }
    
  4. Naoki Otani, Jun Araki, HyeongSik Kim, and Eduard Hovy.
    A Textual Dataset for Situated Proactive Response Selection.
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL). 2023.
    [ paper | summary | abstract | bibtex | poster | data ]

    Abstract: Recent data-driven conversational models are able to return fluent, consistent, and informative responses to many kinds of requests and utterances in task-oriented scenarios. However, these responses are typically limited to just the immediate local topic instead of being wider-ranging and proactively taking the conversation further, for example making suggestions to help customers achieve their goals. This inadequacy reflects a lack of understanding of the interlocutor’s situation and implicit goal. To address the problem, we introduce a task of proactive response selection based on situational information. We present a manually-curated dataset of 1.7k English conversation examples that include situational background information plus for each conversation a set of responses, only some of which are acceptable in the situation. A responsive and informed conversation system should select the appropriate responses and avoid inappropriate ones; doing so demonstrates the ability to adequately understand the initiating request and situation. Our benchmark experiments show that this is not an easy task even for strong neural models, offering opportunities for future research.
    @inproceedings{Otani2023Textual,
      author    = {Naoki Otani and Jun Araki and HyeongSik Kim and Eduard Hovy},
      title     = {A Textual Dataset for Situated Proactive Response Selection},
      booktitle = {Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)},
      pages     = {3856--3874},
      month     = {7},
      year      = {2023},
      address   = {Toronto, Canada},
    }
    
  5. Rakesh R. Menon, Bingqing Wang, Jun Araki, Zhengyu Zhou, Zhe Feng, and Liu Ren.
    CoAug: Combining Augmentation of Labels and Labelling Rules.
    In Findings of the Association for Computational Linguistics: ACL 2023. 2023.
    [ paper | summary | abstract | bibtex | poster | code ]

    Abstract: Collecting labeled data for Named Entity Recognition (NER) tasks is challenging due to the high cost of manual annotations. Instead, researchers have proposed few-shot self-training and rule-augmentation techniques to minimize the reliance on large datasets. However, inductive biases and restricted logical language lexicon, respectively, can limit the ability of these models to perform well. In this work, we propose CoAug, a co-augmentation framework that allows us to improve few-shot models and rule-augmentation models by bootstrapping predictions from each model. By leveraging rules and neural model predictions to train our models, we complement the benefits of each and achieve the best of both worlds. In our experiments, we show that our best CoAug model can outperform strong weak-supervision-based NER models at least by 6.5 F1 points.
    @inproceedings{Menon2023CoAug,
      author    = {Rakesh R. Menon and Bingqing Wang and Jun Araki and Zhengyu Zhou and Zhe Feng and Liu Ren},
      title     = {{CoAug}: {C}ombining Augmentation of Labels and Labelling Rules},
      booktitle = {Findings of the Association for Computational Linguistics: ACL 2023},
      pages     = {9062--9071},
      month     = {7},
      year      = {2023},
      address   = {Toronto, Canada},
    }
    
  6. Koustava Goswami, Lukas Lange, Jun Araki, and Heike Adel.
    SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains.
    In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL): Short Papers. 2023.
    [ paper | summary | abstract | bibtex | arxiv | poster | code ]

    Summary: We propose a trainable domain-oriented prompt that provides effective guidance on the target domains for general-domain language models.
    Abstract: Prompting pre-trained language models leads to promising results across natural language processing tasks but is less effective when applied in low-resource domains, due to the domain gap between the pre-training data and the downstream task. In this work, we bridge this gap with a novel and lightweight prompting methodology called SwitchPrompt for the adaptation of language models trained on datasets from the general domain to diverse low-resource domains. Using domain-specific keywords with a trainable gated prompt, SwitchPrompt offers domain-oriented prompting, that is, effective guidance on the target domains for general-domain language models. Our few-shot experiments on three text classification benchmarks demonstrate the efficacy of the general-domain pre-trained language models when used with SwitchPrompt. They often even outperform their domain-specific counterparts trained with baseline state-of-the-art prompting methods by up to 10.7% performance increase in accuracy. This result indicates that SwitchPrompt effectively reduces the need for domain-specific language model pre-training.
    @inproceedings{Goswami2023SwitchPrompt,
      author    = {Koustava Goswami and Lukas Lange and Jun Araki and Heike Adel},
      title     = {{S}witch{P}rompt: {L}earning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains},
      booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
      pages     = {2689--2695},
      month     = {5},
      year      = {2023},
      address   = {Dubrovnik, Croatia},
    }
    
  7. Zhengbao Jiang, Luyu Gao, Jun Araki, Haibo Ding, Zhiruo Wang, Jamie Callan, and Graham Neubig.
    Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer.
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2022.
    [ paper | summary | abstract | bibtex | slides | arxiv | code ]

    Summary: We propose a single encoder-decoder model for question answering (QA) that jointly trains retrieval and reading components on supervision from the QA task only, and demonstrate that the model achieves competitive performance in both retrieval and QA tasks, compared to state-of-the-art separately trained retrievers and readers.
    Abstract: Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers. Retrievers and readers are usually modeled separately, which necessitates a cumbersome implementation and is hard to train and adapt in an end-to-end fashion. In this paper, we revisit this design and eschew the separate architecture and training in favor of a single Transformer that performs Retrieval as Attention (ReAtt), and end-to-end training solely based on supervision from the end QA task. We demonstrate for the first time that a single model trained end-to-end can achieve both competitive retrieval and QA performance, matching or slightly outperforming state-of-the-art separately trained retrievers and readers. Moreover, end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings, making our model a simple and adaptable solution for knowledge-intensive tasks. Code and models are available at https://github.com/jzbjyb/ReAtt.
    @inproceedings{Jiang2022Retrieval,
      author    = {Zhengbao Jiang and Luyu Gao and Jun Araki and Haibo Ding and Zhiruo Wang and Jamie Callan and Graham Neubig},
      title     = {Retrieval as Attention: {E}nd-to-end Learning of Retrieval and Reading within a Single Transformer},
      booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
      pages     = {2336--2349},
      month     = {12},
      year      = {2022},
      address   = {Abu Dhabi, United Arab Emirates},
    }
    
  8. Zhengbao Jiang, Jun Araki, Haibo Ding, and Graham Neubig.
    Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering.
    In Proceedings of the International Conference on Computational Linguistics (COLING). 2022.
    [ paper | summary | abstract | bibtex | slides | arxiv | code ]

    Summary: We investigate the multi-hop reasoning capabilities of generative question answering models and show that they tend to take shortcuts when answering multi-hop questions, but there are possibilities of improving the capabilities through approximations with single-hop questions and logical forms.
    Abstract: Generative question answering (QA) models generate answers to questions either solely based on the parameters of the model (the closed-book setting) or additionally retrieving relevant evidence (the open-book setting). Generative QA models can answer some relatively complex questions, but the mechanism through which they do so is still poorly understood. We perform several studies aimed at better understanding the multi-hop reasoning capabilities of generative QA models. First, we decompose multi-hop questions into multiple corresponding single-hop questions, and find marked inconsistency in QA models' answers on these pairs of ostensibly identical question chains. Second, we find that models lack zero-shot multi-hop reasoning ability: when trained only on single-hop questions, models generalize poorly to multi-hop questions. Finally, we demonstrate that it is possible to improve models' zero-shot multi-hop reasoning capacity through two methods that approximate real multi-hop natural language (NL) questions by training on either concatenation of single-hop questions or logical forms (SPARQL). In sum, these results demonstrate that multi-hop reasoning does not emerge naturally in generative QA models, but can be encouraged by advances in training or modeling techniques. Code is available at https://github.com/jzbjyb/multihop.
    @inproceedings{Jiang2022Understanding,
      author    = {Zhengbao Jiang and Jun Araki and Haibo Ding and Graham Neubig},
      title     = {Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering},
      booktitle = {Proceedings of the 29th International Conference on Computational Linguistics (COLING)},
      pages     = {1765--1775},
      month     = {10},
      year      = {2022},
      address   = {Gyeongju, Republic of Korea},
    }
    
  9. Zhengbao Jiang, Jun Araki, Haibo Ding, and Graham Neubig.
    How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering.
    Transactions of the Association for Computational Linguistics (TACL). 2021.
    [ paper | summary | abstract | bibtex | arxiv | code ]

    Summary: We first show that language models (LMs) applied to question answering tasks are poorly calibrated and then examine the effectiveness of both generic and LM-specific calibration methods.
    Abstract: Recent works have shown that language models (LM) capture different types of knowledge regarding facts or common sense. However, because no model is perfect, they still fail to provide appropriate answers in many cases. In this paper, we ask the question, "How can we know when language models know, with confidence, the answer to a particular query?" We examine this question from the point of view of calibration, the property of a probabilistic model’s predicted probabilities actually being well correlated with the probabilities of correctness. We examine three strong generative models—T5, BART, and GPT-2—and study whether their probabilities on QA tasks are well calibrated, finding the answer is a relatively emphatic no. We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness through fine-tuning, post-hoc probability modification, or adjustment of the predicted outputs or inputs. Experiments on a diverse range of datasets demonstrate the effectiveness of our methods. We also perform analysis to study the strengths and limitations of these methods, shedding light on further improvements that may be made in methods for calibrating LMs. We have released the code at https://github.com/jzbjyb/lm-calibration.
    @article{Jiang2021How,
      author    = {Zhengbao Jiang and Jun Araki and Haibo Ding and Graham Neubig},
      title     = {How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering},
      journal   = {Transactions of the Association for Computational Linguistics (TACL)},
      volume    = {9},
      pages     = {962--977},
      month     = {9},
      year      = {2021},
    }
    
  10. Pei Chen, Haibo Ding, Jun Araki, and Ruihong Huang.
    Explicitly Capturing Relations between Entity Mentions via Graph Neural Networks for Domain-specific Named Entity Recognition.
    In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP): Short Papers. 2021.
    [ paper | summary | abstract | bibtex | poster ]

    Summary: We incorporate global coreference and local dependency relations using graph neural networks, thereby improving the performance of domain-specific named entity recognition.
    Abstract: Named entity recognition (NER) is well studied for the general domain, and recent systems have achieved human-level performance for identifying common entity types. However, the NER performance is still moderate for specialized domains that tend to feature complicated contexts and jargonistic entity types. To address these challenges, we propose explicitly connecting entity mentions based on both global coreference relations and local dependency relations for building better entity mention representations. In our experiments, we incorporate entity mention relations by Graph Neural Networks and show that our system noticeably improves the NER performance on two datasets from different domains. We further show that the proposed lightweight system can effectively elevate the NER performance to a higher level even when only a tiny amount of labeled data is available, which is desirable for domain-specific NER.
    @inproceedings{Chen2021Explicitly,
      author    = {Pei Chen and Haibo Ding and Jun Araki and Ruihong Huang},
      title     = {Explicitly Capturing Relations between Entity Mentions via Graph Neural Networks for Domain-specific Named Entity Recognition},
      booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP)},
      pages     = {735--742},
      month     = {8},
      year      = {2021},
      address   = {Online},
    }
    
  11. Zhengbao Jiang, Antonios Anastasopoulos, Jun Araki, Haibo Ding, and Graham Neubig.
    X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models.
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
    [ paper | summary | abstract | bibtex | slides | arxiv | code/data ]

    Summary: We provide a multilingual benchmark of cloze-style probes to assess the factual knowledge retrieval capability of language models in typologically diverse languages.
    Abstract: Language models (LMs) have proven surprisingly successful at capturing factual knowledge by completing cloze-style fill-in-the-blank questions such as “Punta Cana is located in _.” However, while knowledge is both written and queried in many languages, studies on LMs’ factual representation ability have almost invariably been performed on English. To assess factual knowledge retrieval in LMs in different languages, we create a multilingual benchmark of cloze-style probes for typologically diverse languages. To properly handle language variations, we expand probing methods from single- to multi-word entities, and develop several decoding algorithms to generate multi-token predictions. Extensive experimental results provide insights about how well (or poorly) current state-of-the-art LMs perform at this task in languages with more or fewer available resources. We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages. Benchmark data and code have been released at https://x-factr.github.io.
    @inproceedings{Jiang2020XFACTR,
      author    = {Zhengbao Jiang and Antonios Anastasopoulos and Jun Araki and Haibo Ding and Graham Neubig},
      title     = {{X-FACTR}: {M}ultilingual Factual Knowledge Retrieval from Pretrained Language Models},
      booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
      pages     = {5943--5959},
      month     = {11},
      year      = {2020},
      address   = {Online},
    }
    
  12. Zhengbao Jiang, Frank F. Xu, Jun Araki, and Graham Neubig.
    How Can We Know What Language Models Know?
    Transactions of the Association for Computational Linguistics (TACL). 2020.
    [ paper | summary | abstract | bibtex | slides | arxiv | code/data ]

    Summary: Generating high-quality and diverse prompts achieves a more accurate estimate of the factual knowledge retrieved by language models because language models are sensitive to how we query them.
    Abstract: Recent work has presented intriguing results examining the knowledge contained in language models (LM) by having the LM fill in the blanks of prompts such as "Obama is a _ by profession". These prompts are usually manually created, and quite possibly sub-optimal; another prompt such as "Obama worked as a _" may result in more accurately predicting the correct profession. Because of this, given an inappropriate prompt, we might fail to retrieve facts that the LM does know, and thus any given prompt only provides a lower bound estimate of the knowledge contained in an LM. In this paper, we attempt to more accurately estimate the knowledge contained in LMs by automatically discovering better prompts to use in this querying process. Specifically, we propose mining-based and paraphrasing-based methods to automatically generate high-quality and diverse prompts, as well as ensemble methods to combine answers from different prompts. Extensive experiments on the LAMA benchmark for extracting relational knowledge from LMs demonstrate that our methods can improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on what LMs know. We have released the code and the resulting LM Prompt And Query Archive (LPAQA) at https://github.com/jzbjyb/LPAQA.
    @article{Jiang2020How,
      author    = {Zhengbao Jiang and Frank F. Xu and Jun Araki and Graham Neubig},
      title     = {How Can We Know What Language Models Know?},
      journal   = {Transactions of the Association for Computational Linguistics (TACL)},
      volume    = {8},
      pages     = {423--438},
      month     = {7},
      year      = {2020},
    }
    
  13. Zhengbao Jiang, Wei Xu, Jun Araki, and Graham Neubig.
    Generalizing Natural Language Analysis through Span-relation Representations.
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2020.
    [ paper | summary | abstract | bibtex | slides | arxiv | code ]

    Summary: A single task-agnostic model based on span-relation representations can address a wide variety of NLP tasks predicting syntax, semantics, and information contents, while being able to achieve performance comparable to state-of-the-art specialized models.
    Abstract: Natural language processing covers a wide variety of tasks predicting syntax, semantics, and information content, and usually each type of output is generated with specially designed architectures. In this paper, we provide the simple insight that a great variety of tasks can be represented in a single unified format consisting of labeling spans and relations between spans, thus a single task-independent model can be used across different tasks. We perform extensive experiments to test this insight on 10 disparate tasks spanning dependency parsing (syntax), semantic role labeling (semantics), relation extraction (information content), aspect based sentiment analysis (sentiment), and many others, achieving performance comparable to state-of-the-art specialized models. We further demonstrate benefits of multi-task learning, and also show that the proposed method makes it easy to analyze differences and similarities in how the model handles different tasks. Finally, we convert these datasets into a unified format to build a benchmark, which provides a holistic testbed for evaluating future models for generalized natural language analysis.
    @inproceedings{Jiang2020Generalizing,
      author    = {Zhengbao Jiang and Wei Xu and Jun Araki and Graham Neubig},
      title     = {Generalizing Natural Language Analysis through Span-relation Representations},
      booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)},
      pages     = {2120--2133},
      month     = {7},
      year      = {2020},
      address   = {Online},
    }
    
  14. Zhengbao Jiang, Jun Araki, Donghan Yu, Ruohong Zhang, Wei Xu, Yiming Yang, and Graham Neubig.
    Learning Relation Entailment with Structured and Textual Information.
    In Proceedings of the Conference on Automated Knowledge Base Construction (AKBC). 2020.
    [ paper | summary | abstract | bibtex | slides | code/data ]

    Summary: We define and explore the task of predicting relation entailment, which allows us to construct relation hierarchies and potentially benefits a wide variety of downstream applications such as knowledge graph representation learning, question answering, relation extraction, and summarization.
    Abstract: Relations among words and entities are important for semantic understanding of text, but previous work has largely not considered relations between relations, or meta-relations. In this paper, we specifically examine relation entailment, where the existence of one relation can entail the existence of another relation. Relation entailment allows us to construct relation hierarchies, enabling applications in representation learning, question answering, relation extraction, and summarization. To this end, we formally define the new task of predicting relation entailment and construct a dataset by expanding the existing Wikidata relation hierarchy without expensive human intervention. We propose several methods that incorporate both structured and textual information to represent relations for this task. Experiments and analysis demonstrate that this task is challenging, and we provide insights into task characteristics that may form a basis for future work. The dataset and code have been released at https://github.com/jzbjyb/RelEnt.
    @inproceedings{Jiang2020Learning,
      author    = {Zhengbao Jiang and Jun Araki and Donghan Yu and Ruohong Zhang and Wei Xu and Yiming Yang and Graham Neubig},
      title     = {Learning Relation Entailment with Structured and Textual Information},
      booktitle = {Proceedings of the 2nd Conference on Automated Knowledge Base Construction (AKBC)},
      month     = {6},
      year      = {2020},
      address   = {Online},
    }
    
  15. Jun Araki, Lamana Mulaffer, Arun Pandian, Yukari Yamakawa, Kemal Oflazer, and Teruko Mitamura.
    Interoperable Annotation of Events and Event Relations across Domains.
    In Proceedings of the Workshop on Interoperable Semantic Annotation (ISA). 2018.
    [ paper | summary | abstract | bibtex | slides ]

    Summary: We present methodologies for annotating a wide coverage of events and event relations on different genres of text in a principled and consistent manner, thereby improving interoperability in the annotation of events and their relations.
    Abstract: This paper presents methodologies for interoperable annotation of events and event relations across different domains. In addition to the interoperability, our annotation scheme supports a wider coverage of events and event relations than prior work. We employ the methodologies to annotate events and event relations on Simple Wikipedia articles in 10 different domains. Our analysis demonstrates that the methodologies can allow us to annotate events and event relations in a principled manner against the wide variety of domains. Despite our relatively wide and flexible annotation of events, we achieve high inter-annotator agreement on event annotation. We also provide an analysis of issues on annotation of events and event relations.
    @inproceedings{Araki2018Interoperable,
      author    = {Jun Araki and Lamana Mulaffer and Arun Pandian and Yukari Yamakawa and Kemal Oflazer and Teruko Mitamura},
      title     = {Interoperable Annotation of Events and Event Relations across Domains},
      booktitle = {Proceedings of the 14th Joint {ACL} - {ISO} Workshop on Interoperable Semantic Annotation (ISA)},
      pages     = {10--20},
      month     = {8},
      year      = {2018},
      address   = {Santa Fe, NM, USA},
    }
    
  16. Jun Araki and Teruko Mitamura.
    Open-Domain Event Detection using Distant Supervision.
    In Proceedings of the International Conference on Computational Linguistics (COLING). 2018.
    [ paper | summary | abstract | bibtex | poster | code/data ]

    Summary: We introduce open-domain event detection which aims to detect all kinds of events regardless of domains, and show that a distant supervision method can mitigate the issue of training data scarcity in that task.
    Abstract: This paper introduces open-domain event detection, a new event detection paradigm to address issues of prior work on restricted domains and event annotation. The goal is to detect all kinds of events regardless of domains. Given the absence of training data, we propose a distant supervision method that is able to generate high-quality training data. Using a manually annotated event corpus as gold standard, our experiments show that despite no direct supervision, the model outperforms supervised models. This result indicates that the distant supervision enables robust event detection in various domains, while obviating the need for human annotation of events.
    @inproceedings{Araki2018Open,
      author    = {Jun Araki and Teruko Mitamura},
      title     = {Open-Domain Event Detection using Distant Supervision},
      booktitle = {Proceedings of the 27th International Conference on Computational Linguistics (COLING)},
      pages     = {878--891},
      month     = {8},
      year      = {2018},
      address   = {Santa Fe, NM, USA},
    }
    
  17. Jun Araki.
    Extraction of Event Structures from Text.
    Ph.D. Thesis, Carnegie Mellon University. 2018.
    [ thesis | abstract | bibtex | slides ]

    Abstract: Events are a key semantic component integral to information extraction and natural language understanding, which can potentially enhance many downstream applications. Despite their importance, they have received less attention in research on natural language processing. Salient properties of events are that they are a ubiquitous linguistic phenomenon appearing in various domains and that they compose rich discourse structures via event coreferences, forming a coherent story over multiple sentences.

    The central goal of this thesis is to devise a computational method that models the structural property of events in a principled framework to enable more sophisticated event detection and event coreference resolution. To achieve this goal, we address five important problems in these areas: (1) restricted domains in event detection, (2) data sparsity in event detection, (3) lack of subevent detection, (4) error propagation in pipeline models, and (5) limited applications of events. For the first two problems, we introduce a new paradigm of open-domain event detection and show that it is feasible for a distant supervision method to build models detecting events robustly in various domains while obviating the need for human annotation of events. For the third and fourth problems, we show how structured learning models are capable of capturing event interdependencies and making more informed decisions on event coreference resolution and subevent detection. Lastly, we present a novel application of event structures for question generation, illustrating the usefulness of event structures as inference steps in reading comprehension by humans.
    @phdthesis{Araki2018Extraction,
      author    = {Jun Araki},
      title     = {Extraction of Event Structures from Text},
      school    = {Carnegie Mellon University},
      month     = {8},
      year      = {2018},
    }
    
  18. Hans Chalupsky, Jun Araki, Eduard Hovy, Andrew Hsi, Zhengzhong Liu, Xuezhe Ma, Evangelia Spiliopoulou, and Shuxin Yao.
    Multi-lingual Extraction and Integration of Entities, Relations, Events and Sentiments into ColdStart++ KBs with the SAFT System.
    In Proceedings of the Text Analysis Conference (TAC). 2017.
    [ paper | abstract | bibtex ]

    Abstract: This paper describes our participation in the TAC-KBP 2017 Cold Start++ Knowledge Base Population task. Our SAFT system is a loosely-coupled integration of individual components processing documents in English, Spanish and Chinese. The system extracts entities, slot relations, event nuggets and arguments, performs entity linking against Freebase and event coreference, and also integrates sentiment relations extracted by external collaborators at Columbia and Cornell. The various extractions get combined, linked, deconflicted and integrated into a consistent knowledge base (one per language) for query-based evaluation.
    @inproceedings{Chalupsky2017Multi,
      author    = {Hans Chalupsky and Jun Araki and Eduard Hovy and Andrew Hsi and Zhengzhong Liu and Xuezhe Ma and Evangelia Spiliopoulou and Shuxin Yao},
      title     = {Multi-lingual Extraction and Integration of Entities, Relations, Events and Sentiments into {ColdStart++} {KBs} with the {SAFT} System},
      booktitle = {Proceedings of the 2017 Text Analysis Conference (TAC)},
      month     = {11},
      year      = {2017},
      address   = {Gaithersburg, MD, USA},
    }
    
  19. Jun Araki, Dheeraj Rajagopal, Sreecharan Sankaranarayanan, Susan Holm, Yukari Yamakawa, and Teruko Mitamura.
    Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts.
    In Proceedings of the International Conference on Computational Linguistics (COLING). 2016.
    [ paper | summary | abstract | bibtex | poster | code ]

    Summary: We propose a question generation method that engages language learners through the use of specific inference steps over multiple sentences.
    Abstract: We present a novel approach to automated question generation that improves upon prior work both from a technology perspective and from an assessment perspective. Our system is aimed at engaging language learners by generating multiple-choice questions which utilize specific inference steps over multiple sentences, namely coreference resolution and paraphrase detection. The system also generates correct answers and semantically-motivated phrase-level distractors as answer choices. Evaluation by human annotators indicates that our approach requires a larger number of inference steps, which necessitate deeper semantic understanding of texts than a traditional single-sentence approach.
    @inproceedings{Araki2016Generating,
      author    = {Jun Araki and Dheeraj Rajagopal and Sreecharan Sankaranarayanan and Susan Holm and Yukari Yamakawa and Teruko Mitamura},
      title     = {Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts},
      booktitle = {Proceedings of the 26th International Conference on Computational Linguistics (COLING)},
      pages     = {1125--1136},
      month     = {12},
      year      = {2016},
      address   = {Osaka, Japan},
    }
    
  20. Zhengzhong Liu, Jun Araki, Teruko Mitamura, and Eduard Hovy.
    CMU-LTI at KBP 2016 Event Nugget Track.
    In Proceedings of the Text Analysis Conference (TAC). 2016.
    [ paper | abstract | bibtex ]

    Abstract: In this paper, we describe the CMU LTI team's participation in the TAC KBP 2016 event nugget track. This year, we extend our feature based event detection and coreference systems to also process Chinese documents. We also conduct experiments using Neural Network based models for English event nugget detection, which can enable us to build models that can be easily transferred to different languages. Our feature based English Nugget Detection and Coreference systems both rank number 2 among all the participants. The Chinese counterpart ranks first in English Nugget Detection and second in English Coreference.
    @inproceedings{Liu2016CMU,
      author    = {Zhengzhong Liu and Jun Araki and Teruko Mitamura and Eduard Hovy},
      title     = {{CMU-LTI} at {KBP} 2016 Event Nugget Track},
      booktitle = {Proceedings of the 2016 Text Analysis Conference (TAC)},
      month     = {11},
      year      = {2016},
      address   = {Gaithersburg, MD, USA},
    }
    
  21. Abhishek Kumar and Jun Araki.
    Incorporating Relational Knowledge into Word Representations using Subspace Regularization.
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL): Short Papers. 2016.
    [ paper | summary | abstract | bibtex | slides ]

    Summary: We propose a regularization method for word representation learning that models relations with a low-rank subspace, thereby incorporating relational knowledge in a more relaxed manner than the constant translation assumption.
    Abstract: Incorporating lexical knowledge from semantic resources (e.g., WordNet) has been shown to improve the quality of distributed word representations. This knowledge often comes in the form of relational triplets (x, r, y) where words x and y are connected by a relation type r. Existing methods either ignore the relation types, essentially treating the word pairs as generic related words, or employ rather restrictive assumptions to model the relational knowledge. We propose a novel approach to model relational knowledge based on low-rank subspace regularization, and conduct experiments on standard tasks to evaluate its effectiveness.
    @inproceedings{Kumar2016Incorporating,
      author    = {Abhishek Kumar and Jun Araki},
      title     = {Incorporating Relational Knowledge into Word Representations using Subspace Regularization},
      booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL)},
      pages     = {506--511},
      month     = {8},
      year      = {2016},
      address   = {Berlin, Germany},
    }
    
  22. Zhengzhong Liu, Jun Araki, Dheeru Dua, Teruko Mitamura, and Eduard Hovy.
    CMU-LTI at KBP 2015 Event Track.
    In Proceedings of the Text Analysis Conference (TAC). 2015.
    [ paper | abstract | bibtex ]

    Abstract: We describe CMU LTI's participation in the KBP 2015 Event Track. We officially participated in Task 1: Event Nugget Detection track and Task 3: Event Coreference track. Our system ranks high in both tracks. We found that our combined system is competitive but has room to improve. In addition, we have conducted follow-up experiments by creating a simple pipelined system, and we found it competitive compared to the official submissions.
    @inproceedings{Liu2015CMU,
      author    = {Zhengzhong Liu and Jun Araki and Dheeru Dua and Teruko Mitamura and Eduard Hovy},
      title     = {{CMU-LTI} at {KBP} 2015 Event Track},
      booktitle = {Proceedings of the 2015 Text Analysis Conference (TAC)},
      month     = {11},
      year      = {2015},
      address   = {Gaithersburg, MD, USA},
    }
    
  23. Jun Araki and Teruko Mitamura.
    Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron.
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP): Short Papers. 2015.
    [ paper+supplement | summary | abstract | bibtex | slides ]

    Summary: We propose a document-level joint learning model for event detection and event coreference resolution in order to capture interdependencies between event mentions.
    Abstract: Events and their coreference offer useful semantic and discourse resources. We show that the semantic and discourse aspects of events interact with each other. However, traditional approaches addressed event extraction and event coreference resolution either separately or sequentially, which limits their interactions. This paper proposes a document-level structured learning model that simultaneously identifies event triggers and resolves event coreference. We demonstrate that the joint model outperforms a pipelined model by 6.9 BLANC F1 and 1.8 CoNLL F1 points in event coreference resolution using a corpus in the biology domain.
    @inproceedings{Araki2015Joint,
      author    = {Jun Araki and Teruko Mitamura},
      title     = {Joint Event Trigger Identification and Event Coreference Resolution with Structured Perceptron},
      booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
      pages     = {2074--2080},
      month     = {9},
      year      = {2015},
      address   = {Lisbon, Portugal},
    }
    
  24. Di Wang, Leonid Boytsov, Jun Araki, Alkesh Patel, Jeff Gee, Zhengzhong Liu, Eric Nyberg, and Teruko Mitamura.
    CMU Multiple-choice Question Answering System at NTCIR-11 QA-Lab.
    In Proceedings of the NII Testbeds and Community for Information access Research (NTCIR). 2014.
    [ paper | abstract | bibtex ]

    Abstract: We describe CMU's UIMA-based modular automatic question answering (QA) system. This system answers multiple-choice English questions for the world history entrance exam. Questions are preceded by short descriptions providing a historical context. Given the context and question-specific instructions, we generate verifiable assertions for each answer choice. These assertions are evaluated using several evidencing modules, which assign a plausibility score to each assertion. These scores are then aggregated to produce the most plausible answer choice. In the NTCIR-11 QA-Lab evaluations, our system achieved 51.6% accuracy on the training set, 47.2% on the Phase 1 testing set, and 34.1% on the Phase 2 testing set.
    @inproceedings{Wang2014CMU,
      author    = {Di Wang and Leonid Boytsov and Jun Araki and Alkesh Patel and Jeff Gee and Zhengzhong Liu and Eric Nyberg and Teruko Mitamura},
      title     = {{CMU} Multiple-choice Question Answering System at {NTCIR-11} {QA-Lab}},
      booktitle = {Proceedings of the 11th NII Testbeds and Community for Information access Research Conference (NTCIR)},
      pages     = {542--549},
      month     = {12},
      year      = {2014},
      address   = {Tokyo, Japan},
    }
    
  25. Jun Araki and Jamie Callan.
    An Annotation Similarity Model in Passage Ranking for Historical Fact Validation.
    In Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR): Short Papers. 2014.
    [ paper | summary | abstract | bibtex | poster ]

    Summary: We propose to compute an annotation similarity with a graph matching algorithm and combine it with a traditional bag-of-words similarity model to improve passage ranking for the task of historical fact validation.
    Abstract: State-of-the-art question answering (QA) systems employ passage retrieval based on bag-of-words similarity models with respect to a query and a passage. We propose a combination of a traditional bag-of-words similarity model and an annotation similarity model to improve passage ranking. The proposed annotation similarity model is generic enough to process annotations of arbitrary types. Historical fact validation is a subtask to determine whether a given sentence tells us historically correct information, which is important for a QA task on world history. Experimental results show that the combined model gains up to 7.7% and 4.2% improvements in historical fact validation in terms of precision at rank 1 and mean reciprocal rank, respectively.
    @inproceedings{Araki2014Annotation,
      author    = {Jun Araki and Jamie Callan},
      title     = {An Annotation Similarity Model in Passage Ranking for Historical Fact Validation},
      booktitle = {Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)},
      pages     = {1111--1114},
      month     = {7},
      year      = {2014},
      address   = {Gold Coast, Australia},
    }
    
  26. Jun Araki, Eduard Hovy, and Teruko Mitamura.
    Evaluation for Partial Event Coreference.
    In Proceedings of the Workshop on Events: Definition, Detection, Coreference, and Representation. 2014.
    [ paper | abstract | bibtex | poster ]

    Abstract: This paper proposes an evaluation scheme to measure the performance of a system that detects hierarchical event structure for event coreference resolution. We show that each system output is represented as a forest of unordered trees, and introduce the notion of conceptual event hierarchy to simplify the evaluation process. We enumerate the desiderata for a similarity metric to measure the system performance. We examine three metrics along with the desiderata, and show that metrics extended from MUC and BLANC are more adequate than a metric based on Simple Tree Matching.
    @inproceedings{Araki2014Evaluation,
      author    = {Jun Araki and Eduard Hovy and Teruko Mitamura},
      title     = {Evaluation for Partial Event Coreference},
      booktitle = {Proceedings of the 2nd Workshop on Events: Definition, Detection, Coreference, and Representation},
      pages     = {68--76},
      month     = {6},
      year      = {2014},
      address   = {Baltimore, MD, USA},
    }
    
  27. Jun Araki, Zhengzhong Liu, Eduard Hovy, and Teruko Mitamura.
    Detecting Subevent Structure for Event Coreference Resolution.
    In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2014.
    [ paper | abstract | bibtex | slides ]

    Abstract: In the task of event coreference resolution, recent work has shown the need to perform not only full coreference but also partial coreference of events. We show that subevents can form a particular hierarchical event structure. This paper examines a novel two-stage approach to finding and improving subevent structures. First, we introduce a multiclass logistic regression model that can detect subevent relations in addition to full coreference. Second, we propose a method to improve subevent structure based on subevent clusters detected by the model. Using a corpus in the Intelligence Community domain, we show that the method achieves over 3.2 BLANC F1 gain in detecting subevent relations against the logistic regression model.
    @inproceedings{Araki2014Detecting,
      author    = {Jun Araki and Zhengzhong Liu and Eduard Hovy and Teruko Mitamura},
      title     = {Detecting Subevent Structure for Event Coreference Resolution},
      booktitle = {Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC)},
      pages     = {4553--4558},
      month     = {5},
      year      = {2014},
      address   = {Reykjavik, Iceland},
    }
    
  28. Zhengzhong Liu, Jun Araki, Eduard Hovy, and Teruko Mitamura.
    Supervised Within-Document Event Coreference using Information Propagation.
    In Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2014.
    [ paper | abstract | bibtex ]

    Abstract: Event coreference is an important task for full text analysis. However, previous work uses a variety of approaches, sources and evaluation, making the literature confusing and the results incommensurate. First, we provide a description of the differences to facilitate future research. Second, we present a supervised method for event coreference resolution that uses a rich feature set and propagates information alternatively between events and their arguments, adapting appropriately for each type of argument.
    @inproceedings{Liu2014Supervised,
      author    = {Zhengzhong Liu and Jun Araki and Eduard Hovy and Teruko Mitamura},
      title     = {Supervised Within-Document Event Coreference using Information Propagation},
      booktitle = {Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC)},
      pages     = {4539--4544},
      month     = {5},
      year      = {2014},
      address   = {Reykjavik, Iceland},
    }
    
  29. Mahmoud Azab, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura.
    An English Reading Tool as an NLP Showcase.
    In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP): System Demonstrations. 2013.
    [ paper | abstract | bibtex ]

    Abstract: We introduce SmartReader, an English reading tool that helps non-native English readers overcome language-related hindrances while reading a text. It makes extensive use of widely available NLP tools and resources. SmartReader is a web-based application that can be accessed from standard browsers running on PCs or tablets. A user can choose a document to read from the system's library or upload a new document of their own, and the system will display an interactive version of the text that provides the reader with intelligent e-book functionality.
    @inproceedings{Azab2013English,
      author    = {Mahmoud Azab and Ahmed Salama and Kemal Oflazer and Hideki Shima and Jun Araki and Teruko Mitamura},
      title     = {An {E}nglish Reading Tool as a {NLP} Showcase},
      booktitle = {Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP): System Demonstrations},
      pages     = {5--8},
      month     = {10},
      year      = {2013},
      address   = {Nagoya, Japan},
    }
    
  30. Mahmoud Azab, Ahmed Salama, Kemal Oflazer, Hideki Shima, Jun Araki, and Teruko Mitamura.
    An NLP-based Reading Tool for Aiding Non-native English Readers.
    In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP). 2013.
    [ paper | abstract | bibtex ]

    Abstract: This paper describes a text-reading tool that makes extensive use of widely available NLP tools and resources to help non-native English speakers overcome language-related hindrances while reading a text. It is a web-based tool that can be accessed from browsers running on PCs or tablets, and it provides the reader with intelligent e-book functionality.
    @inproceedings{Azab2013NLP,
      author    = {Mahmoud Azab and Ahmed Salama and Kemal Oflazer and Hideki Shima and Jun Araki and Teruko Mitamura},
      title     = {An {NLP}-based Reading Tool for Aiding Non-native English Readers},
      booktitle = {Proceedings of the 9th International Conference on Recent Advances in Natural Language Processing (RANLP)},
      pages     = {41--48},
      month     = {9},
      year      = {2013},
      address   = {Hissar, Bulgaria},
    }
    
  31. Eduard Hovy, Teruko Mitamura, Felisa Verdejo, Jun Araki, and Andrew Philpot.
    Events are Not Simple: Identity, Non-Identity, and Quasi-Identity.
    In Proceedings of the Workshop on Events: Definition, Detection, Coreference, and Representation. 2013.
    [ paper | abstract | bibtex | poster ]

    Abstract: Despite considerable theoretical and computational work on coreference, deciding when two entities or events are identical is very difficult. In a project to build corpora containing coreference links between events, we have identified three levels of event identity (full, partial, and none). Event coreference annotation on two corpora was performed to validate the findings.
    @inproceedings{Hovy2013Events,
      author    = {Eduard Hovy and Teruko Mitamura and Felisa Verdejo and Jun Araki and Andrew Philpot},
      title     = {Events are Not Simple: {I}dentity, Non-Identity, and Quasi-Identity},
      booktitle = {Proceedings of the 1st Workshop on Events: Definition, Detection, Coreference, and Representation},
      pages     = {21--28},
      month     = {6},
      year      = {2013},
      address   = {Atlanta, GA, USA},
    }
    
  32. Jun Araki.
    Text Classification with a Polysemy Considered Feature Set.
    Master's Thesis, The University of Tokyo. 2003.
    [ thesis | abstract | bibtex ]

    Abstract: As we store and distribute large amounts of computerized text, how to effectively extract useful data from that text has become an important issue. For this reason, techniques for classifying text automatically with computers have attracted attention.

    Generally, in the field of text classification, we use a model called the Vector Space Model (VSM), in which we map a document to a point in a multidimensional vector space whose axes are based on feature sets of keywords that characterize categories. In the past, many attempts have been made to select feature words that characterize categories by extracting words that score highly on some measure, such as mutual information between categories and words. However, some words are polysemous, having multiple meanings rather than a single one, and so some documents containing those words belong to categories other than the intended one, which causes problems for classification.

    In our research, we consider polysemous words as features that pose a risk for classification. We propose a method that determines whether each feature word is a risk factor, using mutual information as the measure for feature selection, and disambiguates feature sets by removing features judged to be risk factors. We compare the classification results of our method with those of an existing method, and evaluate its efficiency using the Reuters-21578 corpus as the target data for classification.
    @mastersthesis{Araki2003Text,
      author    = {Jun Araki},
      title     = {Text Classification with a Polysemy Considered Feature Set},
      school    = {The University of Tokyo},
      month     = {3},
      year      = {2003},
    }
    
  33. Jun Araki, Fumitaka Nakamura, and Masaya Nakayama.
    Text Classification with a Polysemy Considered Feature Set.
    Information Processing Society of Japan, Special Interest Group on Natural Language Processing (IPSJ-SIGNL). In Japanese. 2003.

  34. Jun Araki, Fumitaka Nakamura, and Masaya Nakayama.
    Automated Categorization of Newspaper Articles using Sectorial Dictionary with Relevant Terms.
    In Proceedings of the Forum on Information Technology (FIT). In Japanese. 2002.