Application or Semester Project Ideas

From CMSC734 Spring 2015

Please add ideas to the Project Ideas page.

Develop an error-detecting program for multivariate (tabular) data.

Develop text analysis software that compares two or more corpora/documents, based on keywords, keyphrases, or topics. A simple version of this is to compare two lists of keywords, keyphrases, or rows of data, and show the similarities and dissimilarities.
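As a starting point for the simple version, here is a minimal sketch that compares two keyword lists. The function name `compare_keywords`, the case-insensitive matching, and the use of Jaccard similarity as a summary score are all assumptions for illustration, not part of the project description.

```python
def compare_keywords(a, b):
    """Compare two keyword lists, ignoring case and surrounding whitespace."""
    set_a = {k.strip().lower() for k in a}
    set_b = {k.strip().lower() for k in b}
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": sorted(shared),          # keywords present in both lists
        "only_a": sorted(set_a - set_b),   # keywords unique to the first list
        "only_b": sorted(set_b - set_a),   # keywords unique to the second list
        # Jaccard similarity: |A ∩ B| / |A ∪ B| (assumed summary score)
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    result = compare_keywords(["Graph", "network", "tree"],
                              ["network", "Tree", "map"])
    for key, value in result.items():
        print(key, value)
```

The three lists map naturally onto a Venn-style or side-by-side view; a project would likely replace exact matching with stemming or phrase matching.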

Study the fundraising of UMd vs. other universities and find ways to improve donations from alumni.

Study crime patterns from the UMd logs to help police reduce crime.

Develop the idea of searching networks for subgraphs (a former student project that could be improved).

Study Twitter hashtags over time to see evolution of interests, especially convergence on hashtag terms.

Study YouTube evolution for topics: number of videos, number of views.

Urban Institute - Bureau of Labor Statistics Monthly Employment Report: Jon Schwabish. Provide a web-based visualization of the monthly employment report.

Scholar Citation Statistics: Jimmy Lin. Scholar Scraper web site. Use visualization to find patterns, expose anomalies, etc.

Personal Health Records: Monifa Vaughn-Cooke

Personal Health Records (PHRs) are a critical component of chronic disease treatment and can support patient self-management by empowering patients to be more active in their medical care. PHR systems include features such as a symptom checker, lab results, prescription refills, and messaging, among others. However, current PHR systems are not designed for the largest and highest-risk segments of the chronic disease population (elderly, disabled, minorities, etc.). In an effort to address health disparities and better inform PHR design, survey data (n=160) were collected to evaluate personal characteristics and design preferences in demographic subgroups of the chronic disease patient population. The data fields include demographic information, disease information, health literacy, technological competence, and preferences for PHR features. The goal of the CMSC 734 project is to explore the relationships between personal patient characteristics, chronic disease, and the PHR features intended to support these patients.

Deployment of Security Patches: Tudor Dumitras

We are measuring the deployment of security patches, over time, on millions of hosts worldwide. The main data sets are time series of the vulnerability survival probability, which is the likelihood that a vulnerable host remains vulnerable after t days, and of the hazard rate, which corresponds to the daily patching rate. I attach here a couple of examples. The survival functions (computed using the Kaplan-Meier estimator) are monotonically decreasing, from 1 to 0, and they can be used for comparing the patching speed of different vulnerabilities. The hazard rates have more interesting features: they can increase or decrease over time, and in some cases we can see spikes, which indicate multiple waves of patching. We have the data for about 300 of these patches, for several applications using different patch mechanisms, and our challenge is to group these curves and identify the main patterns of patching.
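For teams unfamiliar with these curves, the sketch below shows one way a Kaplan-Meier survival estimate and a simple discrete hazard could be computed from per-host observations. The input layout (days until patching per host, plus a flag saying whether the patch was actually observed), the function name, and the hazard formula (hosts patched on day t divided by hosts still at risk) are assumptions for illustration; the project's actual data format and hazard computation may differ.

```python
def kaplan_meier(durations, patched):
    """Return [(t, survival, hazard)] for each day t with an observed patch.

    durations: days each host was observed while vulnerable (hypothetical input)
    patched:   1 if the host was patched on that day, 0 if observation ended
               without a patch (censored)
    """
    pairs = sorted(zip(durations, patched))
    at_risk = len(pairs)      # hosts still vulnerable and under observation
    survival = 1.0            # S(0) = 1: every host starts out vulnerable
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events = 0            # hosts patched exactly on day t
        leaving = 0           # hosts leaving the risk set on day t
        while i < len(pairs) and pairs[i][0] == t:
            events += pairs[i][1]
            leaving += 1
            i += 1
        if events:
            hazard = events / at_risk     # assumed discrete hazard estimate
            survival *= 1 - hazard        # Kaplan-Meier product-limit step
            curve.append((t, survival, hazard))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    for t, s, h in kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0]):
        print(f"day {t}: survival={s:.2f} hazard={h:.2f}")
```

Each of the roughly 300 patches would yield one such curve; grouping could then operate on the survival or hazard values sampled at common days.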

Visualizing Web Connections: Michelle Mazurek

The high-level idea is to take a real user's browsing history data (suitably sanitized), containing information about which third-party services are connected to each domain, and visualize how domains are connected in terms of web tracking. That is, visualizing how your Amazon activity can be connected to your Facebook activity, WebMD activity, etc. (To my knowledge, this has been attempted a couple of times but the results are not very satisfactory.) We have sanitized data from 33 people that can be used as examples (and could potentially get more if needed). There are essentially two high-level research questions: 1) Can users comprehend the web connections using the visualization (e.g., measured by speed and accuracy)? 2) How does the visualization affect their attitudes and behaviors? For example, does it change their opinions about web tracking or targeted advertising?
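One possible way to turn such data into a graph is to link two domains whenever they share at least one third-party service. The sketch below assumes a hypothetical input, a dictionary mapping each visited domain to the set of third-party services seen on it; the real sanitized data may be structured differently, and the domain and service names are made up.

```python
from itertools import combinations


def shared_tracker_edges(third_parties):
    """Map each pair of domains to the third-party services they share.

    third_parties: dict of domain -> set of third-party service names
                   (hypothetical layout, not the project's actual format)
    """
    edges = {}
    for a, b in combinations(sorted(third_parties), 2):
        common = third_parties[a] & third_parties[b]
        if common:
            # An edge means activity on the two domains could be linked
            # through any of the shared services.
            edges[(a, b)] = sorted(common)
    return edges


if __name__ == "__main__":
    history = {
        "shop.example": {"adnet", "metrics"},
        "social.example": {"adnet"},
        "health.example": {"metrics", "adnet"},
        "news.example": set(),
    }
    for (a, b), services in shared_tracker_edges(history).items():
        print(f"{a} <-> {b} via {', '.join(services)}")
```

The resulting edge list can feed a node-link or matrix view; an alternative design keeps the third-party services as their own nodes in a bipartite graph.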

University of Maryland Center for Advanced Transportation Technology

Michael L. Pack, Director, J. Kim Engineering Bldg., Suite 3144, College Park, MD 20742. Work: 301-405-0722. Fax: 301-403-4591. He has worked with course teams in the past and hired at least a half dozen students for his projects.

Michael Robert VanDaniker, Visualization Manager. Slides-Research Needs.

Drug Prescription Patterns: Temporal Event Sequences

Catherine Plaisant, Research Scientist at HCIL. She has worked with many course teams and has projects on medical informatics. Here are the slides from her presentation: Slides-Visualization of Drug Prescription Patterns.