Social, Political and Economic Event Database (SPEED)

The Cline Center for Democracy at the University of Illinois at Urbana-Champaign is committed to furthering scientific research concerning the operation of democratic processes and the relationship between democracy and societal welfare.

One of the Cline Center's signature projects is the Social, Political and Economic Event Database (SPEED) project. SPEED is a technology-intensive effort to extract data from news reports about small-scale civil unrest events like protests and acts of political violence, as well as governmental responses to those activities. SPEED documents civil unrest activity for every country in the world from World War II to the present using a global archive of news reports. Within SPEED, event data are generated by a hybrid system combining fully automated machine-learning and natural language processing technologies with human analysts who draw from a suite of sophisticated tools to implement carefully structured and pretested protocols.

Geo-referencing civil unrest events to the city-day level provides unprecedented insight into behavioral patterns and relationships that hold across countries and over time. The Cline Center is interested in developing synergistic collaborations, both within the United States and internationally, to realize the potential of SPEED data and to enhance our understanding of how civil unrest affects the course of nations.
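
To illustrate the kind of analysis that city-day geo-referencing enables, the following minimal Python sketch (a hypothetical example; the input file and column names are assumptions, not part of SPEED) aggregates individual event records into city-day counts that can be compared across countries and over time.

    # Hypothetical sketch: aggregate geo-referenced event records to city-days.
    # The input file and column names ("country", "city", "event_date") are assumptions.
    import pandas as pd

    events = pd.read_csv("speed_events.csv", parse_dates=["event_date"])

    # One row per city-day, with the number of events observed on that day.
    city_days = (
        events
        .groupby(["country", "city", events["event_date"].dt.date])
        .size()
        .rename("event_count")
        .reset_index()
    )

    # Example query: event counts over time for one country.
    print(city_days[city_days["country"] == "Example Country"].head())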

High-Resolution Hydrogeologic Predictions through Advanced Spatial, Spatiotemporal, and Visual Analytics

Groundwater recharge and shallow groundwater flow are two components of the hydrologic cycle that are affected by and directly influence many natural and societally important activities.

Understanding and predicting recharge and shallow groundwater flow has historically involved data and models whose spatial and temporal resolutions perform poorly in predicting both extreme events and fine-scale behavior. Advances in measurement technology over the past two decades have produced data with sub-meter spatial and sub-hour temporal resolutions. Although these data offer the potential for significant advances in hydrogeologic prediction, the resulting file sizes and computational complexity have effectively limited their use.

The aim of this research is to develop new applications of statistical and machine-learning algorithms to improve our ability to characterize and predict shallow hydrogeologic properties and processes, and to develop novel applications of visual analytics to enhance the description, analysis, and understanding of this complex, correlated spatiotemporal system. These goals will be achieved by developing new programs that exploit GPU processing to improve computational performance, making it possible to apply these computationally expensive algorithms and applications to modern high-resolution spatial and spatiotemporal data sets.
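
As a rough illustration of how GPU-friendly array computation supports this kind of high-resolution prediction (this is not the project's algorithm, just a generic sketch with synthetic data), the example below performs inverse-distance-weighted interpolation of groundwater observations onto a dense grid. Because it is written as dense NumPy array operations, substituting a GPU array library with a NumPy-compatible interface (such as CuPy) moves the computation onto the GPU with minimal code changes.

    # Illustrative sketch (not the project's algorithm): inverse-distance-weighted
    # interpolation of synthetic shallow groundwater observations onto a grid.
    # Swapping numpy for a NumPy-compatible GPU array library (e.g., CuPy)
    # would move these dense array operations to the GPU.
    import numpy as np

    rng = np.random.default_rng(0)
    obs_xy = rng.uniform(0, 1000, size=(300, 2))    # observation locations (m)
    obs_z = rng.normal(10.0, 2.0, size=300)         # e.g., water-table depth (m)

    gx, gy = np.meshgrid(np.linspace(0, 1000, 100), np.linspace(0, 1000, 100))
    grid_xy = np.column_stack([gx.ravel(), gy.ravel()])

    # Pairwise distances between grid nodes and observations.
    d = np.linalg.norm(grid_xy[:, None, :] - obs_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-6) ** 2              # inverse-distance weights
    z_hat = (w @ obs_z) / w.sum(axis=1)             # weighted average per node

    prediction_surface = z_hat.reshape(gx.shape)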

Contact: Don Keefer (dkeefer@illinois.edu)

Network Routing of Snow Plows with Resource Replenishment and Plowing Priorities: Formulation, Algorithm, and Application

Leila Hajibabai, Seyed Mohammad Nourbakhsh, Yanfeng Ouyang, and Fan Peng (2014), Transportation Research Record: Journal of the Transportation Research Board. In Press.

Routing of snow plow trucks in urban and regional areas encompasses a variety of complex decisions, especially for jurisdictions with heavy snowfall. The main activities involve dispatching a fleet of plow trucks from a central depot and/or satellite facilities to clear snow and spread salt/chemicals on designated snow routes. This work develops advanced mathematical models to minimize the total operation time/cost for a fleet of snow plow trucks to complete a given set of snow removal tasks with priorities. Customized solution algorithms are developed to effectively solve the full-scale application for the Lake County Division of Transportation (LCDOT).
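
The toy sketch below gives a flavor of priority-aware routing; it is purely illustrative and far simpler than the models and algorithms developed in this work. A single truck clears higher-priority street segments before lower-priority ones, choosing the nearest remaining segment within each priority class. The segment data and the straight-line distance proxy are assumptions.

    # Toy sketch only (not the paper's formulation): route a single plow truck
    # over street segments so that higher-priority segments are cleared first,
    # using a greedy nearest-neighbor rule within each priority class.
    from math import dist

    depot = (0.0, 0.0)
    # (segment id, midpoint coordinates, priority: 1 = plow first)
    segments = [
        ("A", (1.0, 2.0), 1), ("B", (4.0, 1.0), 2),
        ("C", (2.5, 3.5), 1), ("D", (5.0, 4.0), 2),
    ]

    route, position = [], depot
    for priority in sorted({p for _, _, p in segments}):
        remaining = [s for s in segments if s[2] == priority]
        while remaining:
            nxt = min(remaining, key=lambda s: dist(position, s[1]))
            route.append(nxt[0])
            position = nxt[1]
            remaining.remove(nxt)

    print("Plowing order:", route)   # -> ['A', 'C', 'D', 'B']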

Snow control operations involve intensive spatial information, and geographic information systems (GIS) have been used to provide a suitable platform for creating, maintaining, and analyzing the relevant data. The proposed models and algorithms are incorporated into a C++ optimization module and embedded in a state-of-the-art snow plow routing analysis and design software tool built on ESRI ArcGIS. The user interface in the GIS environment passes user-specified inputs to the optimization module, and visualizes and reports the output (e.g., snow plow routes and system performance statistics). The system performance statistics provide a scientific basis for decision makers to conduct “what-if” analyses on fleet management and resource allocation, and the tool supports visualization of the results. Other research efforts, such as integrated planning of supply chain networks for biofuel production and agricultural logistics modeling, are informed and influenced by the progress in this project.

Concurrent Science, Engineering, and Technology for the Prevention of Postharvest Loss

This project studies the problem of postharvest losses and identifies prevention strategies, addressing them by developing a platform and modeling framework based on Concurrent Science, Engineering, and Technology (ConSEnT) tools. The long-term goal of the project is to provide holistic solutions to food security issues in a world of increasing population and demand; the immediate objective is to lay the foundation for educators, practitioners, and policy makers to address postharvest loss issues in countries such as India and Brazil.

System Informatics & Analysis for Biomass Feedstock Provision

Biomass Implementation Optimization Modeling Analysis Simulation Software (BIOMASS) has been developed to conduct systems informatics and analysis for biomass provision scenario management. The software includes the following modules:

  1. BioFeed model, which is used for tactical-level biomass supply chain optimization, including farm-level biomass production and delivery schedules as well as farming equipment and vehicle selection;
  2. BioScope model, which is used for strategic-level biomass supply chain optimization, including facility location and capacity as well as optimal biomass flow patterns (a simplified facility-location sketch follows this list);
  3. BioTraNS model, which is used for operational-level biomass supply chain optimization including real-time vehicle routing and dispatch schedules;
  4. BioAgent model, which provides an agent-based simulation tool for analysis of the equilibrium performance between biorefineries and biomass producers; and
  5. BPSys platform, which provides a web interface and computational tools for web-based decision support.
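
To give a concrete, heavily simplified flavor of the strategic-level decisions handled by BioScope, the sketch below enumerates candidate biorefinery sites for a toy facility-location problem. The data and cost model are made up for illustration and are not part of the BIOMASS software.

    # Toy facility-location sketch (illustrative only; not the BioScope model):
    # pick the candidate biorefinery site that minimizes total biomass
    # transport cost plus a fixed facility cost. All numbers are made up.
    from math import dist

    farms = {"farm1": ((0, 0), 500), "farm2": ((10, 2), 300), "farm3": ((4, 8), 700)}
    candidates = {"site_A": ((2, 3), 1000.0), "site_B": ((8, 6), 800.0)}  # (xy, fixed cost)
    cost_per_ton_km = 0.5

    def total_cost(site_xy, fixed_cost):
        transport = sum(cost_per_ton_km * tons * dist(site_xy, farm_xy)
                        for farm_xy, tons in farms.values())
        return fixed_cost + transport

    best = min(candidates.items(), key=lambda kv: total_cost(*kv[1]))
    print("Selected site:", best[0])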

DOE/NSF: Open Science Grid

The CyberGIS Center and the CyberInfrastructure and Geospatial Information (CIGI) Laboratory are part of the Open Science Grid (OSG), a national cyberinfrastructure that brings together distributed computing and storage resources from many campuses and diverse research communities. The CyberGIS Center and CIGI contribute technical and management expertise to OSG. Dr. Shaowen Wang has served two terms as a member of the OSG Council, which governs the OSG Consortium. Dr. Anand Padmanabhan is a member of the OSG Security team, which is responsible for the OSG security framework. Additionally, we operate the CIGI Virtual Organization (VO) on OSG, which consists of members from the earth and social science communities who develop and use geospatial information systems and technologies; the VO allows its members to access High Throughput Computing (HTC) resources available through OSG.

 

NSF XSEDE: Extending and Sustaining CyberGIS Discovery Environment

The CyberGIS Center and the CyberInfrastructure and Geospatial Information (CIGI) Laboratory have been awarded access to computational resources through the NSF eXtreme Science and Engineering Discovery Environment (XSEDE, formerly TeraGrid) program continuously since 2007. We were awarded 9.35 million supercomputing hours for the 2013-2014 academic year to advance data-rich geospatial sciences and technologies based on the research and development of CyberGIS. A set of high-performance and scalable spatial analysis and modeling methods and services has been developed to fully exploit these high-end computational resources. This project provides access to a rich set of the most powerful supercomputers in the world for open scientific research and thus enables CyberGIS researchers and educators to continue to lead cutting-edge computing- and data-intensive geospatial research and education. These resources cover a wide spectrum of supercomputer architectures and parallel computing models; our allocated supercomputing time is distributed across seven high-end resources operated by four supercomputer centers. The following research and development projects are supported:

  • Parallel spatial optimization
  • Parallel agent-based modeling
  • Social media analytics
  • High-performance map data processing
  • CyberGIS Toolkit
  • CyberGIS Gateway and outreach
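
As a minimal, generic illustration of the parallel computing model that several of these projects build on (this is not code from the CyberGIS Toolkit), the sketch below uses mpi4py to give each processor rank its own chunk of a point data set, compute a local spatial statistic, and combine the results on a single rank.

    # Minimal domain-decomposition sketch with mpi4py (illustrative only):
    # each rank computes a statistic over its share of the points, and the
    # results are reduced to a global value on rank 0.
    # Run with, e.g.:  mpiexec -n 4 python parallel_mean_nn.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    rng = np.random.default_rng(rank)               # each rank gets its own chunk
    local_points = rng.uniform(0, 100, size=(2_000, 2))

    # Local statistic: mean nearest-neighbor distance within the local chunk.
    d = np.linalg.norm(local_points[:, None, :] - local_points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    local_mean_nn = d.min(axis=1).mean()

    global_mean_nn = comm.reduce(local_mean_nn, op=MPI.SUM, root=0)
    if rank == 0:
        print("Approximate mean nearest-neighbor distance:", global_mean_nn / size)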
 

NSF CAREER: Formalizing & Resolving Computational Intensity of Spatial Analysis to Establish a Cyber-GIS Framework

This project will transform the current state of three fields, GIScience, spatial analysis, and cyberinfrastructure, while creating a new subject domain of computational intensity. A novel theoretical approach to computational intensity will enhance spatial analysis methods integrated with cyberinfrastructure and GIS, and the Cyber-GIS framework will be established by developing innovative algorithms and software components based on this approach. The framework will be evaluated using both real and synthetic data within a science application context: discovering geographic patterns of global climate change impacts on large-scale coupled human and natural systems. The project will produce fundamental knowledge for coupling the capabilities of GIS, spatial analysis, and cyberinfrastructure and will thereby guide the development of emerging spatial cyberinfrastructure. The project will also reach out to underrepresented and minority groups as well as the general public through an online Cyber-GIS platform, and it holds great promise to enable widespread scientific breakthroughs that are important to the nation and society.
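
As one highly simplified reading of the computational intensity idea (offered only as an illustrative assumption, not the project's formal definition), the sketch below estimates a per-cell workload surface from the local density of input points and then splits the domain into strips of roughly equal workload, the kind of information that could guide how a spatial analysis is partitioned across cyberinfrastructure resources.

    # Highly simplified sketch of a "computational intensity" surface
    # (an illustrative assumption, not the project's formal definition):
    # approximate per-cell workload by the local density of input points,
    # then split the domain into strips of roughly equal workload.
    import numpy as np

    rng = np.random.default_rng(1)
    points = rng.normal(loc=[50, 50], scale=15, size=(100_000, 2))

    intensity, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                     bins=64, range=[[0, 100], [0, 100]])

    # Partition the columns of the grid into 4 strips with similar workload.
    column_load = intensity.sum(axis=1)
    cuts = np.searchsorted(np.cumsum(column_load),
                           np.linspace(0, column_load.sum(), 5)[1:-1])
    print("Column indices where the domain is split:", cuts)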

 

NSF SI2: SSI: CyberGIS Software Integration for Sustained Geospatial Innovation

Specific project objectives include:

  • Engage multidisciplinary communities through a participatory approach to evolving CyberGIS software requirements;
  • Integrate and sustain a core set of composable, interoperable, manageable, and reusable CyberGIS software elements based on community-driven and open-source strategies;
  • Empower high-performance and scalable CyberGIS by exploiting spatial characteristics of data and analytical operations for achieving unprecedented capabilities for geospatial scientific discoveries;
  • Enhance an online geospatial problem solving environment to allow for the contribution, sharing and learning of CyberGIS software by numerous users, which will foster the development of crosscutting education, outreach and training programs with significant broad impacts;
  • Deploy and test CyberGIS software by linking with national and international CI to achieve scalability to significant sizes of geospatial problems, amounts of CI resources, and numbers of users;
  • Evaluate and improve the CyberGIS framework through domain science applications and vibrant partnerships to gain a better understanding of the complexity of coupled human-natural systems (e.g., for assessing the impacts of climate change and supporting rapid emergency response).

Domain Sciences

  • Advanced cyberinfrastructure
  • Climate change impact assessment
  • Emergency management
  • Geographic information science
  • Geography and spatial sciences
  • Geosciences

User Communities

  • Biologists
  • Geographers
  • Geoscientists
  • Social scientists
  • General public
  • Broad geographic information systems (GIS) users
 

NSF EAGER: CISSDA: Unified Cyberinfrastructure Framework for Scalable Spatiotemporal Data Analytics

This project creates a unified cyberinfrastructure framework by adapting and integrating heterogeneous modalities of computing and information infrastructure (e.g., cloud, high-performance computing, and high-throughput computing) for scalable spatiotemporal data analytics. The framework encompasses two types of novel and complementary capabilities: 1) a suite of methods and algorithms for scalable spatiotemporal data analytics through synthesis of data mining, information network analysis, and parallel and cloud computing; and 2) a geographic information system (GIS) based on advanced cyberinfrastructure (i.e., cyberGIS) to facilitate the use of the methods and algorithms by a large number of users. These novel capabilities help overcome many current limitations in geographic and social science research involving huge amounts of spatiotemporal data, and bring forth useful insights for formulating new policies. The framework is designed to gain new fundamental understanding of individual activity patterns and spaces in the domain of environmental health through scalable analysis of massive space-time trajectory data that depict the movement of individuals over space and time. Given the ubiquitous use of spatiotemporal data, the project will have transformative and broad impacts on almost all disciplines that employ geospatial technologies for scientific problem solving and decision-making support.
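
As a minimal, hypothetical illustration of one building block of such analytics (the input file and column names are assumptions, not part of the project), the sketch below bins individual trajectory points into space-time cells, a common first step toward characterizing activity patterns and activity spaces.

    # Hypothetical sketch: bin space-time trajectory points into cells of
    # roughly 0.01 degrees by one hour. Input file and column names are assumptions.
    import pandas as pd

    traj = pd.read_csv("trajectories.csv", parse_dates=["timestamp"])

    traj["lat_bin"] = (traj["lat"] / 0.01).round().astype(int)
    traj["lon_bin"] = (traj["lon"] / 0.01).round().astype(int)
    traj["hour"] = traj["timestamp"].dt.floor("h")

    activity = (traj.groupby(["person_id", "lat_bin", "lon_bin", "hour"])
                    .size()
                    .rename("visits")
                    .reset_index())

    # Number of distinct space-time cells visited per person: a crude
    # proxy for the extent of each individual's activity space.
    print(activity.groupby("person_id")["visits"].count().describe())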

 

NSF Blue Waters: An Extreme-Scale Computational Approach to Redistricting Optimization

This project exploits the massive computational power provided by the Blue Waters supercomputer for computationally intensive zoning optimization research. Zoning can be formulated as an NP-hard discrete optimization problem and has attracted significant research interest in political science, geographic information science (GIScience), and operations research, with tremendous potential for broader impacts. The project team, led by Shaowen Wang, will develop computational approaches to addressing the fundamental question of how to evaluate bias (racial, partisan, or otherwise) in zoning plans at fine spatial scales. A parallel genetic algorithm (PGA) library will be extended and enhanced on Blue Waters to scale zoning optimization capabilities to hundreds of thousands of problem variables.
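
The toy sketch below shows the basic genetic-algorithm loop underlying such an approach; it is illustrative only, ignores contiguity constraints, and runs at a trivially small scale, whereas the project's parallel genetic algorithm library targets hundreds of thousands of variables. Candidate plans assign units to districts, fitness rewards population balance, and plans evolve through selection, crossover, and mutation.

    # Toy genetic-algorithm sketch for district assignment (illustrative only:
    # it ignores contiguity constraints and runs at a trivially small scale).
    # Fitness rewards population balance across districts.
    import random

    random.seed(0)
    unit_pop = [random.randint(500, 5000) for _ in range(60)]   # 60 units
    n_districts, pop_size, generations = 4, 40, 200
    target = sum(unit_pop) / n_districts

    def fitness(plan):
        totals = [0] * n_districts
        for unit, dist_id in enumerate(plan):
            totals[dist_id] += unit_pop[unit]
        return -sum(abs(t - target) for t in totals)            # higher is better

    population = [[random.randrange(n_districts) for _ in unit_pop]
                  for _ in range(pop_size)]

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(unit_pop))
            child = a[:cut] + b[cut:]                            # one-point crossover
            if random.random() < 0.2:                            # mutation
                child[random.randrange(len(child))] = random.randrange(n_districts)
            children.append(child)
        population = survivors + children

    best = max(population, key=fitness)
    print("Best plan's total population deviation:", -fitness(best))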

 

NSF Data Infrastructure Building Blocks: Scalable Capabilities for Spatial Data Synthesis

Spatial data, often embedded with geographic references, are important to numerous scientific domains (e.g., ecology, geography and spatial sciences, geosciences, and social sciences, to name just a few) and are also beneficial to solving many critical societal problems (e.g., environmental and urban sustainability). In recent years, however, this type of data has exploded to massive size and significant complexity as increasingly sophisticated location-aware sensors, devices, and platforms (e.g., environmental sensors, smart phones, and social networks) are widely deployed and used. The big spatial data collected from numerous sources are extensively used to instrument our natural, human, and social systems at unprecedented scales while providing tremendous opportunities to gain dynamic insight into complex phenomena. However, synthesizing diverse spatial data, a foundational process in many scientific problem-solving practices, has become increasingly difficult and does not scale to the size, complexity, and diversity of the data. Therefore, the overarching goal of this project is to establish fundamental and scalable capabilities for spatial data synthesis, integrated with cyberGIS and novel cloud computing strategies, to enable cutting-edge data-intensive research and education across multiple scientific communities. This project will achieve the following objectives:

  1. Develop a core set of community-driven and scalable capabilities for meeting the requirements of spatial data synthesis in two representative scientific case studies: measuring urban sustainability based on a number of social, environmental, and physical factors and processes, and examining population dynamics at high spatial and temporal resolutions by synthesizing multiple state-of-the-art population data sources with location-based social media data;
  2. Establish a scalable suite of data synthesis capabilities: (1) data integration capabilities that ensure data from different sources can be combined regardless of the original format and type in which they were produced, and (2) data aggregation capabilities that scale to accommodate varying numbers of data sources, user requests, and processing loads while providing response times appropriate to the data being handled (a minimal aggregation sketch follows this list);
  3. Evaluate and improve these capabilities by engaging the broad cyberGIS community, which spans scientists across the biological, computational, engineering, geo-, and social sciences;
  4. Integrate the data synthesis capabilities with the CyberGIS Science Gateway to ensure open and wide access to the capabilities;
  5. Develop novel education and training materials that help a large number of users learn the capabilities and build on them to understand the scientific principles of spatial data synthesis.
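
As a minimal, hypothetical illustration of the aggregation capability described in objective 2 (the file names, column names, and grid cell size are assumptions made for illustration), the sketch below combines point records from two differently structured sources onto a common grid so they can be analyzed together.

    # Hypothetical sketch: aggregate point records from two differently
    # structured sources onto one common grid. File names, column names,
    # and the 0.05-degree cell size are assumptions for illustration.
    import pandas as pd

    social = pd.read_csv("geotagged_posts.csv")          # columns: lat, lon, ...
    sensors = pd.read_json("sensor_readings.json")       # columns: latitude, longitude, ...
    sensors = sensors.rename(columns={"latitude": "lat", "longitude": "lon"})

    combined = pd.concat([social[["lat", "lon"]].assign(source="social"),
                          sensors[["lat", "lon"]].assign(source="sensor")])

    cell = 0.05                                           # grid cell size in degrees
    combined["row"] = (combined["lat"] // cell).astype(int)
    combined["col"] = (combined["lon"] // cell).astype(int)

    # One record per grid cell, with an observation count per source.
    grid_counts = (combined.groupby(["row", "col", "source"])
                           .size()
                           .unstack("source", fill_value=0)
                           .reset_index())
    print(grid_counts.head())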
 

NSF MRI: Acquisition of a National CyberGIS Facility for Computing and Data-Intensive Geospatial Research and Education

The CyberGIS Center has received a Major Research Instrumentation (MRI) grant from the National Science Foundation to build a high-performance computing system optimized to deal with geospatial data.

The earth and environment are facing a changing climate and the accelerated degradation of natural resources, issues that create a host of societal problems. Advancing geographic information science and systems and related applications in fields such as environmental engineering, hydrology and water resources, public health, and urban studies can help address these important geospatial concerns.

Once collected, geospatial data (information that is linked to location and time) can be complex, irregular, and difficult to analyze. With the newly awarded NSF grant of more than $2.5 million, the CyberGIS Center hopes to change that through the creation of the National CyberGIS Facility. A consortium of 11 units across the University of Illinois at Urbana-Champaign campus, along with multiple academic, government, and industry partners, has come together to build a novel CyberGIS instrument capable of solving diverse sets of scientific problems previously considered intractable.

This instrument will be equipped with:

  • more than 7 petabytes of raw disk storage with high input/output (I/O) bandwidth;
  • solid state drives for applications demanding high data-access performance;
  • advanced graphics processing units for exploiting massive parallelism in geospatial computing;
  • and interactive visualization supported with a high-speed network and dynamically provisioned cloud computing resources.

These capabilities will be integrated through leading-edge CyberGIS software and tools, guiding users to data-rich and interactive geospatial problem solving and decision making. The new facility will be made available in spring 2015 and will support pioneering research in geospatial communities, industries, and government entities as well as foster crosscutting education, outreach, and training programs.