Longitudinal Analytics of Web Archive Data


Extraction of Temporal Facts and Events from Wikipedia

The paper "Extraction of Temporal Facts and Events from Wikipedia" by Erdal Kuzey and Gerhard Weikum has been accepted for the second Temporal Web Analytics Workshop (TempWeb 2012), held in conjunction with the WWW 2012 conference.

Recently, large-scale knowledge bases have been constructed by automatically extracting relational facts from text. Unfortunately, most of the current knowledge bases focus on static facts and ignore the temporal dimension. However, the vast majority of facts are evolving with time or are valid only during a particular time period. Thus, time is a significant dimension that should be included in knowledge bases.
In this paper, we introduce a complete information extraction framework that harvests temporal facts and events from semi-structured data and free text of Wikipedia articles to create a temporal ontology. First, we extend a temporal data representation model by making it aware of events. Second, we develop an information extraction method which harvests temporal facts and events from Wikipedia infoboxes, categories, lists, and article titles in order to build a temporal knowledge base. Third, we show how the system can use its extracted knowledge for further growing the knowledge base.
We demonstrate the effectiveness of our proposed methods through several experiments. We extracted more than one million temporal facts with precision over 90% for extraction from semi-structured data and almost 70% for extraction from text.
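
As a rough illustration of the second step described in the abstract (harvesting temporal facts from infoboxes), the Python sketch below attaches a validity interval to a fact when an infobox value carries an explicit year span. It is not the authors' implementation: the `TemporalFact` class, the `extract_temporal_facts` function, the "value (begin-end)" pattern, and the input format (a plain dictionary of infobox fields) are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TemporalFact:
    """A relational fact with an optional validity interval (hypothetical model)."""
    subject: str
    relation: str
    obj: str
    begin: Optional[int]  # first year in which the fact holds
    end: Optional[int]    # last year in which the fact holds; None if open-ended


# Assumed value pattern: "Some value (1999-2004)" or "Some value (1999-)".
# Both a plain hyphen and an en dash (U+2013) are accepted between the years.
_SPAN = re.compile(
    r"^(?P<value>.+?)\s*\((?P<begin>\d{4})\s*[-\u2013]\s*(?P<end>\d{4})?\)$"
)


def extract_temporal_facts(subject: str, infobox: Dict[str, str]) -> List[TemporalFact]:
    """Return one TemporalFact per infobox field whose value has a year span."""
    facts = []
    for relation, raw in infobox.items():
        match = _SPAN.match(raw.strip())
        if match is None:
            continue  # no explicit time span, so this sketch skips the field
        end = match.group("end")
        facts.append(
            TemporalFact(
                subject=subject,
                relation=relation,
                obj=match.group("value"),
                begin=int(match.group("begin")),
                end=int(end) if end else None,
            )
        )
    return facts
```

Calling `extract_temporal_facts("Example Person", {"club": "Example FC (1999-2004)", "birth_place": "Exampleville"})` on this made-up input would yield a single fact for the `club` field, while the undated `birth_place` field is ignored. A real extractor would also need to handle full dates, multiple spans per field, and the category, list, and title sources the abstract mentions.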

TempWeb 2012 homepage