PhD proposal: Finding Story Chains in Newswire Articles

Ph.D. Dissertation Proposal

Finding Story Chains in Newswire Articles

Xianshu Zhu

1:30pm Thursday 2 June 2011, ITE 325B

Huge amounts of information are shared on the Internet every day through online newspapers, digital libraries, blogs, and social network messages. While excellent search engines such as Google help users retrieve information from a few keywords, the large volumes of unstructured results they return make it hard to keep a clear picture of how an event evolves. Moreover, beyond the events themselves, people may be more interested in the hidden relationships among different events, or in the causes and effects of a single event. Traditional search engines provide limited support for such sophisticated search tasks. In this dissertation, we try to enrich the search options of existing search engines and to organize search results in a more structured and meaningful way.

More specifically, we propose to develop a News Story Reader, with functionality similar to Google Maps, that has the following characteristics: (1) search results are organized into groups of causes and impacts of events, helping web users navigate the results in a more directed and efficient way; (2) enriched search options allow users to search for correlations between two stories by selecting two articles as start and end points, respectively, producing a coherent story chain as output; (3) an interactive user interface lets users zoom in and out and add via points to the search result.

In our preliminary work, we start with a relatively simple problem: given a start article and an end article, we want to find a chain of articles that coherently connects them. We developed a random-walk-based algorithm that finds story chains that are coherent, relevant, and low in redundancy. We applied two intelligent pruning methods to reduce the size of the graph so that the algorithm runs efficiently. Our next goal is to find hierarchical story chains that show the evolution of stories at different levels of granularity. To that end, we extended our current algorithm to use random walks on the word-document co-clustering graph, with weights biased toward named entities, to find hierarchical story chains.
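To make the start-to-end chain problem concrete, here is a minimal Python sketch of one way a random walk can connect two articles. It is an illustration under stated assumptions, not the algorithm developed in this proposal: the bag-of-words cosine similarity, the single `min_sim` edge threshold (a stand-in for the two pruning methods, which are not described here), the restart probability, and all function names are hypothetical. It runs a random walk with restart from each endpoint, keeps the articles that score well from both ends, and orders them by how close they sit to the start. It does not model coherence or redundancy explicitly.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists (bag-of-words counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca if w in cb)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(docs, min_sim=0.1):
    """Undirected article graph; edges below min_sim are pruned (assumed rule)."""
    graph = {i: {} for i in range(len(docs))}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            sim = cosine(docs[i], docs[j])
            if sim >= min_sim:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def random_walk_with_restart(graph, seed, restart=0.15, iters=100):
    """Approximate visiting probabilities of a walk that restarts at seed."""
    scores = {v: 0.0 for v in graph}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for u, nbrs in graph.items():
            total = sum(nbrs.values())
            if total == 0:
                # Isolated node: return its walking mass to the seed.
                nxt[seed] += (1 - restart) * scores[u]
                continue
            for v, w in nbrs.items():
                nxt[v] += (1 - restart) * scores[u] * w / total
        nxt[seed] += restart  # scores sum to 1, so this is the restart mass
        scores = nxt
    return scores


def story_chain(docs, start, end, k=2, min_sim=0.1):
    """Pick up to k intermediate articles relevant to both endpoints."""
    graph = build_graph(docs, min_sim)
    from_start = random_walk_with_restart(graph, start)
    from_end = random_walk_with_restart(graph, end)
    # Keep only articles reachable from both endpoints.
    candidates = [
        v for v in graph
        if v not in (start, end) and from_start[v] * from_end[v] > 0
    ]
    candidates.sort(key=lambda v: from_start[v] * from_end[v], reverse=True)
    middle = candidates[:k]
    # Articles relatively closer to the start come first in the chain.
    middle.sort(
        key=lambda v: from_start[v] / (from_start[v] + from_end[v]),
        reverse=True,
    )
    return [start] + middle + [end]


if __name__ == "__main__":
    # Toy articles, already tokenized; invented purely for illustration.
    articles = [
        "earthquake hits coast city".split(),
        "earthquake damages nuclear plant coast".split(),
        "nuclear plant leak prompts evacuation".split(),
        "evacuation policy debate energy".split(),
        "football team wins cup final".split(),
    ]
    print(story_chain(articles, start=0, end=3, k=2))
```

Scoring each candidate by the product of the two walks favors articles that are relevant to both endpoints rather than to only one of them, and the unrelated article never enters the chain because no walk reaches it. A fuller treatment would also need to penalize near-duplicate articles and check that consecutive links read coherently.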

The contributions of this dissertation include (1) a News Story Reader system that can help alleviate the information overload problem; (2) the design and development of two story chain finding algorithms; (3) an exploration of methods for finding story chains in which news articles are connected via causes and impacts; (4) an exploration of methods for story chain visualization.

Committee:

  • Dr. Tim Oates (chair)
  • Dr. Charles Nicholas
  • Dr. Tim Finin
  • Dr. Sergei Nirenburg
