Application or Semester Project Ideas

From Cmsc734_f12

Project Ideas (please add ones you would like teammates for)

Develop an error-detecting program for multivariate (tabular) data.

Develop text analysis software that compares two or more corpora/documents, based on keywords, keyphrases, or topics. A simple version of this is to compare two lists of keywords, keyphrases, or rows of data, and show the similarities and dissimilarities.
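
The simple version described above could start from something like the following sketch. It is only an illustration, not a prescribed design: the function name `compare_keyword_lists`, the case-insensitive matching, and the use of Jaccard similarity as a summary score are all assumptions made for this example.

```python
def compare_keyword_lists(a, b):
    """Compare two keyword lists and report what they share and where they differ.

    Assumption: keywords match case-insensitively after trimming whitespace.
    """
    set_a = {k.strip().lower() for k in a}
    set_b = {k.strip().lower() for k in b}
    union = set_a | set_b
    return {
        "shared": sorted(set_a & set_b),   # similarities
        "only_a": sorted(set_a - set_b),   # dissimilarities, first list
        "only_b": sorted(set_b - set_a),   # dissimilarities, second list
        # Jaccard similarity: |intersection| / |union|.
        # Two empty lists are treated as identical (1.0) by convention here.
        "jaccard": len(set_a & set_b) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    result = compare_keyword_lists(
        ["Network", "graph", "layout"],
        ["graph", "Twitter", "network "],
    )
    for key, value in result.items():
        print(key, value)
```

The same set operations apply to keyphrases or to rows of data once each row is reduced to a hashable value such as a tuple. A project would then focus on how to visualize the three groups.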

Study the fundraising of UMd vs. other universities and find ways to improve donations from alumni.

Study crime patterns from the UMd logs to help police reduce crime.

Develop the idea of searching networks for subgraphs (a former student project that could be improved).

Study Twitter hashtags over time to see evolution of interests, especially convergence on hashtag terms.

Study YouTube evolution for topics: number of videos, number of views.


Marc Smith marc@smrfoundation.org http://nodexlgraphgallery.org http://www.smrfoundation.org/

Consider adding features to NodeXL, an open-source project in C# .NET (http://www.codeplex.com/nodexl). Potential project sponsor: CS Grad Student Cody Dunne [cdunne@cs.umd.edu]

  • Compare networks: select two graphs and report what is different between them
  • Integrate NLP features
  • Write importers for new data sources: Google+, RDF, and Wikimedia
  • Better layouts: Cody's designs, Force Atlas 2, "No Overlap" option
  • Aggregate content over time: track URLs, Hashtags, Users, Word Pairs over time.
  • Network compression for megascale and larger networks: in the end, we must reduce ALL networks to no more than N elements so that each element is discretely visible on a current screen. Thus, downsampling networks to fit an N-element maximum is important.
  • Time Series - Jacopo Cirrone prototype - fast BTree DB back end.
  • Multiplex edges: best practices - review the current range of multiple edge management.
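
As a starting point for the "Compare networks" idea in the list above, a graph difference can be sketched with plain set operations on edge lists. This is a minimal illustration in Python, not NodeXL code and not its API: the function name `diff_graphs`, the edge-list input format, and the choice to ignore isolated vertices are all assumptions made for this example.

```python
def diff_graphs(edges_a, edges_b, directed=False):
    """Report which vertices and edges differ between two graphs.

    Assumptions: each graph is given as a list of (source, target) pairs,
    and vertices are inferred from the edges, so isolated vertices are
    not represented.
    """
    def normalize(edges):
        # For undirected graphs, (u, v) and (v, u) are the same edge.
        return {tuple(e) if directed else tuple(sorted(e)) for e in edges}

    def vertices(edge_set):
        return {v for edge in edge_set for v in edge}

    ea, eb = normalize(edges_a), normalize(edges_b)
    va, vb = vertices(ea), vertices(eb)
    return {
        "vertices_only_in_a": sorted(va - vb),
        "vertices_only_in_b": sorted(vb - va),
        "edges_only_in_a": sorted(ea - eb),
        "edges_only_in_b": sorted(eb - ea),
        "edges_in_both": sorted(ea & eb),
    }


if __name__ == "__main__":
    graph_a = [("ann", "bob"), ("bob", "cat")]
    graph_b = [("bob", "ann"), ("cat", "dan")]
    for key, value in diff_graphs(graph_a, graph_b).items():
        print(key, value)
```

A real feature would also need to compare edge weights, vertex attributes, and graph metrics, and to present the differences visually. The sketch covers only the structural part.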


Guest Speakers with Project ideas:

Diane H. Cline, 240-421-3226, dianehcline@gmail.com, Humanities Researcher, GWU

NodeXL and Social Network Analysis for Ancient History

Needed immediately: a talented data visualization student who can take basic NodeXL charts and transform them into clear, informative, and publishable images to accompany articles demonstrating the results of social network analyses of the ancient world. We have been using NodeXL to study and generate social network graphs for ancient history.

We have three data sets for you to work on; you may do all three or choose one. The first is the large social network of Alexander the Great (4th century BC), with 404 nodes and over 1000 edges. The second is a small set for Pericles in 5th-century BC Athens, with 35 nodes and 55 edges. The third is a medium-sized set, with 78 nodes and 100 edges, based on "the Amarna Letters," correspondence written on clay tablets between Egyptian pharaohs and ancient Near Eastern kings of the 14th century BC (see http://en.wikipedia.org/wiki/Amarna_letters).

You will be working with the data sets we provide as Excel spreadsheets in the NodeXL template. You will have the opportunity to consult with two professors of Classics and Ancient History, Diane and Eric Cline, of the George Washington University (http://home.gwu.edu/~ehcline/ and http://gwu.academia.edu/DianeCline). Your renderings may be published with credits in possible articles or books. This is trail-blazing research; no one working in the field of ancient history has yet published social network analysis charts or used network analysis to study ancient history. You will be making (ancient) history!

Clayton Lewis <Clayton.Lewis@ed.gov> and Ken Slavin

The project we will propose is to extract information from data in the Therap Electronic Documentation System for intellectual and developmental disabilities providers (http://www.therapservices.net/) that will allow assessment of individuals’ progress in independence, community integration, and skill development. The Therap system records a wide variety of data, on a fine-grained individual basis, extending over years. It is a major priority for providers to use these data to understand whether people are actually making progress in important life areas. (This is similar in logic to your example, “Study crime patterns from the UMd logs to help police reduce crime.”)

We will obtain consent from a few to several individuals for students to have access to their data; even on this scale this is a fairly “big data” project, because of the volume of material collected for individuals (several records a day for any one datum, over several years). Some of the data of interest are captured in free-text progress notes, so there is an opportunity here for students to explore emerging techniques for dealing with this kind of information at scale, though doing this is not essential.

Students will have access, appropriately constrained by permissions, to the interactive Therap system, as well as the ability to request bulk transfer of data for analysis outside the Therap system. Besides developing ways of extracting meaning from the existing data, students will be able to suggest how data collection and representation might be enhanced to make it easier to extract value as the system evolves.

In the longer term, this project has great strategic potential for students who may be looking for dissertation research. Beyond the meaning extraction problem posed here, there is a large collection of other opportunities associated with these data, including comparing the progress of different individuals, looking for factors in the development of obesity, early-onset Alzheimer disease and other conditions, and much more. These demand more difficult consent procedures, which is why we aren’t proposing them now, but they could be addressed in follow-on research. An associated opportunity for future projects is to develop good ways to convey the sense of some of these data to the people with disabilities themselves. These individuals have a clear right to see the data collected and managed about them, but today we lack good ways to present the data meaningfully to people with intellectual and developmental disabilities.

Zach Hettinger, MedStar (Washington Hospital Center) emergency physician/researcher (zach.hettinger@medicalhfe.org):

  • Medical team allocation
  • Radiation dangers of CT-scan

Michael L. Pack, Director, University of Maryland Center for Advanced Transportation Technology, J. Kim Engineering Bldg., Suite 3144, College Park, MD 20742. Work: 301-405-0722, Fax: 301-403-4591 [packml@umd.edu] (http://www.cattlab.umd.edu). He has worked with course teams in the past and hired at least a half dozen students for his projects.

Catherine Plaisant, Research Scientist [plaisant@cs.umd.edu] at HCIL (http://www.cs.umd.edu/hcil). She has worked with many course teams and has projects on medical informatics.


Cody Dunne (cdunne@cs.umd.edu), Seth Powsner (seth.powsner@yale.edu), Amitabh Varshney (varshney@umiacs.umd.edu), Susan Moeller (smoeller@umd.edu), John Alexis Guerra Gómez (jguerrag@cs.umd.edu), Ronald A. Yaros (ryaros@umd.edu), Joseph JaJa (joseph@sesync.org)