With data reaching unprecedented volumes, coming from diverse sources, and covering a variety of domains in heterogeneous formats, information providers face the critical challenge of processing, retrieving, and presenting information in a way that satisfies their users' complex information needs. In this paper, we present Thomson Reuters’ effort to develop a family of services for building and querying an enterprise knowledge graph in order to address this challenge. We first acquire data from various sources via different approaches. We then mine useful information from the data using a variety of techniques, including Named Entity Recognition and Relation Extraction; the mined information is further integrated with existing structured data (e.g., via Entity Linking techniques) in order to obtain relatively comprehensive descriptions of the entities. By modeling the data as an RDF graph, we enable easy data management and the embedding of rich semantics in our data. Finally, to facilitate the querying of this mined and integrated data, i.e., the knowledge graph, we propose TR Discover, a natural language interface that allows users to ask questions of our knowledge graph in their own words; these natural language questions are translated into executable queries for answer retrieval. We evaluate our services, i.e., named entity recognition, relation extraction, entity linking, and the natural language interface, on real-world datasets, and we demonstrate and discuss their practicability and limitations.
Index Terms—Knowledge Graph, Data Acquisition, Data Transformation, Data Modeling, Data Interlinking, Natural Language Interface
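
As a rough illustration of the graph modeling and question-to-query translation summarized above, the following minimal sketch stores facts as subject-predicate-object triples and answers one question by pattern matching. It is not the TR Discover implementation: the identifiers (`ex:CompanyA`, `ex:develops`, and so on), the `match` helper, and the hand-written `companies_developing` query are all hypothetical, and a production system would use an RDF store and a query language such as SPARQL rather than an in-memory set.

```python
from typing import List, Optional, Set, Tuple

# A triple is (subject, predicate, object). All identifiers are made up
# for illustration and do not come from the paper's data.
Triple = Tuple[str, str, str]

GRAPH: Set[Triple] = {
    ("ex:CompanyA", "rdf:type", "ex:Company"),
    ("ex:CompanyB", "rdf:type", "ex:Company"),
    ("ex:DrugX", "rdf:type", "ex:Drug"),
    ("ex:CompanyA", "ex:develops", "ex:DrugX"),
}


def match(
    graph: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def companies_developing(graph: Set[Triple], drug: str) -> List[str]:
    """Hand-written stand-in for the executable query that a question such as
    'Which companies develop DrugX?' might be translated into (assumption)."""
    developers = {s for s, _, _ in match(graph, p="ex:develops", o=drug)}
    companies = {s for s, _, _ in match(graph, p="rdf:type", o="ex:Company")}
    return sorted(developers & companies)


if __name__ == "__main__":
    print(companies_developing(GRAPH, "ex:DrugX"))
```

The point of the sketch is only that, once mined and linked facts share one triple representation, a natural language question reduces to a conjunction of triple patterns over the graph.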