Event extraction from unstructured data such as news corpora can support many knowledge-based systems, including predictive models, risk analysis applications, and new policy-making tools. We propose a more generic framework for defining and extracting events from a large news corpus. This way of defining events is generic and can capture kinds of events that existing frameworks cannot represent. To cover the events in a news corpus completely, the event representation scheme needs to be more flexible and generic. Our model assumes that every news article is about a single event, and from this assumption it automatically tries to learn the types of events from a large corpus of news articles. Our model intuitively defines an event as a combination of a central event that the article describes and some subsidiary events that are associated with the central event and are also described in the article. Based on the occurrences of such events in the entire corpus, our model learns the different event types and the most likely subsidiary events. In addition, the model tries to attach the most probable locations and times to an event. Although many events do not depend on time and location, it tries to associate the most likely set of locations and times based on past occurrences.
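The counting idea behind this model can be sketched as follows. This is a minimal sketch, assuming each article has already been reduced to a (central event, subsidiary events, location) record. The record layout, the event labels, and the function name `learn_event_profiles` are hypothetical and do not come from the paper, and plain relative frequencies stand in for whatever estimator the model actually uses.

```python
from collections import Counter, defaultdict


def learn_event_profiles(articles):
    """Estimate, per central event type, how often each subsidiary event
    and each location co-occurs with it (plain relative frequencies)."""
    totals = Counter()
    subsidiary = defaultdict(Counter)
    locations = defaultdict(Counter)
    for central, subs, location in articles:
        totals[central] += 1
        for s in set(subs):            # count each subsidiary once per article
            subsidiary[central][s] += 1
        if location is not None:       # many events carry no location
            locations[central][location] += 1
    profiles = {}
    for central, n in totals.items():
        profiles[central] = {
            "subsidiary": {s: c / n for s, c in subsidiary[central].items()},
            "locations": {loc: c / n for loc, c in locations[central].items()},
        }
    return profiles


# Toy, made-up records: (central event, subsidiary events, location)
articles = [
    ("flood", ["evacuation", "crop_loss"], "region_a"),
    ("flood", ["evacuation"], "region_b"),
    ("flood", ["crop_loss", "relief_fund"], "region_a"),
    ("strike", ["price_rise"], None),
]
profiles = learn_event_profiles(articles)
```

Here `profiles["flood"]["subsidiary"]` ranks the subsidiary events most often reported alongside a flood, and `profiles["flood"]["locations"]` does the same for locations. An event type with no recorded location, such as the toy "strike" record, simply gets an empty location profile.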
Agriculture forms the backbone of several emerging economies. In the past few years, many agrarian regions have been severely affected by a combination of factors, including climate, lack of water availability, and soil infertility. However, many policymakers and members of the general public are often unaware of the status of agricultural conditions across different localities within their countries. We have designed a system that automatically constructs a location-specific portal for aggregating and summarizing climatic and agricultural information from disparate sources on the Web. Given a location, the system searches the Web for information on the different parameters that influence agriculture and climate and presents a summary of the relevant information.
Our system is built around three key ideas. First, we (manually) identify target topics of interest within climate and agriculture (such as soil and water) and construct a list of search queries that comprehensively describe the different aspects of each target topic. Second, for each target topic, we download the top search result pages and perform information extraction on the textual content of these pages. The information extraction process aims to extract the critical textual snippets that capture the key trends within the target area. Finally, we perform information summarization, where the goal is to identify the key trends corresponding to each target topic. We have tailored standard information retrieval techniques to address these problems. The summarized information on a location can be used to detect different problems and to infer possible remedies. Hence, the aim is to bring to the fore both important and lesser-known facts, thereby increasing the availability of knowledge.
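The first two steps, query construction and snippet extraction, can be sketched as follows. This is a minimal sketch under stated assumptions: the query templates, the keyword-overlap scoring, and the names `QUERY_TEMPLATES`, `build_queries`, and `extract_snippets` are all hypothetical. The abstract does not specify the actual query lists or extraction technique, and the Web search and download step is omitted.

```python
import re

# Hypothetical topic -> query templates; the actual query lists are not given.
QUERY_TEMPLATES = {
    "soil": ["{loc} soil fertility", "{loc} soil erosion"],
    "water": ["{loc} groundwater level", "{loc} irrigation water availability"],
}


def build_queries(location, topic):
    """Instantiate every query template of a topic for one location."""
    return [t.format(loc=location) for t in QUERY_TEMPLATES[topic]]


def extract_snippets(text, keywords, top_k=2):
    """Return up to top_k sentences, ranked by how many topic keywords
    they contain (a simple stand-in for the extraction step)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for s in sentences:
        words = set(re.findall(r"[a-z]+", s.lower()))
        score = len(words & keywords)
        if score > 0:
            scored.append((score, s))
    # Python's sort is stable, so ties keep their order in the page.
    scored.sort(key=lambda pair: -pair[0])
    return [s for _, s in scored[:top_k]]


# Made-up page text for illustration only.
page = (
    "Groundwater levels have fallen sharply. "
    "The festival drew large crowds. "
    "Irrigation canals supply water to most farms."
)
queries = build_queries("Punjab", "water")
snippets = extract_snippets(page, {"groundwater", "levels", "irrigation", "water"})
```

In this toy run the sentence about the festival is dropped because it shares no keyword with the water topic, while the two water-related sentences are kept as candidate snippets for summarization.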
Agricultural land availability is undergoing dramatic changes across the globe. This phenomenon is more widespread in the developing world, where rapid economic growth and an increasing population are resulting in unplanned development. Loss of arable land has a direct impact on food security. Most developing regions are also predominantly agrarian economies, and changes in arable land can significantly affect food production and availability. We developed an automated satellite image analytics tool that leverages publicly available satellite image data sources to provide a fine-grained longitudinal analysis of changes in land patterns in a given region. Our goal is to design a data analytics system that can capture the longitudinal relationship between changes in agricultural land availability in a given small geographic area and the corresponding impact on food production. This paper focuses on West Bengal, which lies in the delta of the Gangetic plains and is traditionally considered one of the most fertile areas in the world. We used a corpus of satellite images gathered from Google Earth, which maintains an updated repository of satellite images along with archives of older images from across the globe. Based on detailed food production data gathered in collaboration with the bureau of statistics of West Bengal, we analyze the correlations between changes in agricultural land patterns and the corresponding changes in food production at fine-grained district granularity. The key building block of our analytics tool is a satellite image analysis engine that can analyze potentially noisy satellite images and classify the regions within each image into categories such as arable land, water body, developed land, and forest. Given historical data about the same location, the image analysis engine can provide a detailed analysis of land pattern changes.
Our engine can detect such changes at different location granularities (small region, district, state level, etc.). In the case of West Bengal, we obtained data over a 13-year period from 2000 to 2012 and could track land evolution over this entire period. We correlate this land change pattern with food production data gathered over the same period by the government's Bureau of Statistics. This tool can help policymakers monitor changes in the land pattern and take appropriate steps if any drastic changes are noticed.
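The correlation step described above can be sketched as follows. This is a minimal sketch, assuming the image analysis engine has already produced a grid of class labels per year. The tiny label grids, the production figures, and the function names `class_fraction` and `pearson` are hypothetical illustrations, and the Pearson coefficient is an assumed choice of correlation measure that the abstract does not confirm.

```python
from math import sqrt


def class_fraction(label_grid, target):
    """Fraction of pixels in a classified image that carry the target label."""
    cells = [label for row in label_grid for label in row]
    return sum(1 for label in cells if label == target) / len(cells)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up 2x2 classified images of one district for three years.
grids_by_year = {
    2000: [["arable", "arable"], ["arable", "water"]],
    2006: [["arable", "developed"], ["arable", "water"]],
    2012: [["developed", "developed"], ["arable", "water"]],
}
# Made-up production figures for the same years (arbitrary units).
production = [30.0, 20.0, 10.0]

years = sorted(grids_by_year)
arable_share = [class_fraction(grids_by_year[y], "arable") for y in years]
r = pearson(arable_share, production)
```

In the toy data the arable share falls from 0.75 to 0.25 while production falls in step, so `r` comes out close to 1. With real district data the same two series would be built per district and compared over the full period.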
This study measures the effects of Federal Open Market Committee (FOMC) communications on interest rates. We find that, in recent years, the typical FOMC statement moved mid-term interest rates by 1.1 to 2.8 basis points (bps) up or down. If a press conference is held on the day of the meeting, this movement is 0.9 bps greater. Releasing the meeting minutes three weeks later moves interest rates only if the FOMC meeting was not followed by a press conference, indicating that the press conference and the meeting minutes contain duplicate information. We propose this measure of the volatility effect of the communication itself as a check on the accuracy of text-mining methods that measure the market effects of specific words or sentiments. Next, we identify keywords describing the sentiments conveyed and the topics discussed in FOMC meetings. The frequencies of these words appear to predict interest rate movements in a regression context. However, two pieces of evidence suggest that these regressions overfit the interest rate data. First, the regression-predicted interest rate movements far exceed the average market effects of an entire statement or press conference. Second, if the frequencies of our keywords summarize the market-moving content in the data, as the regression results suggest, and if the press conferences and minutes contain duplicate information, as our first results indicate, then we would expect the keyword frequencies to be correlated between the press conferences and minutes.
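One simple way to picture the volatility effect of a communication is to compare the average absolute rate change on announcement days with that on other days. This is a minimal sketch of that comparison, not the estimator used in the study. The function names `mean_abs` and `excess_abs_move` and all numbers are hypothetical.

```python
def mean_abs(changes):
    """Average absolute daily change, in basis points."""
    return sum(abs(c) for c in changes) / len(changes)


def excess_abs_move(event_day_changes, other_day_changes):
    """How much larger the typical absolute move is on event days."""
    return mean_abs(event_day_changes) - mean_abs(other_day_changes)


# Made-up daily changes in a mid-term rate, in basis points.
statement_days = [4.0, -5.0, 3.0]
ordinary_days = [1.0, -2.0, 3.0, -2.0]
excess = excess_abs_move(statement_days, ordinary_days)
```

A figure of this kind gives an upper bound to check text-mining results against: if a keyword regression predicts moves much larger than the whole communication's typical excess move, that points to overfitting, which is the first piece of evidence the abstract describes.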