Application or Semester Project Ideas

From Cmsc734_f13

Project Ideas (please add ones you would like teammates for)

Develop an error-detecting program for multivariate (tabular) data.

Develop text analysis software that compares two or more corpora/documents, based on keywords, keyphrases, or topics. A simple version of this is to compare two lists of keywords, keyphrases, or rows of data, and show the similarities and dissimilarities.
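The simple version described above, comparing two keyword lists, could start from plain set operations. The following is a minimal sketch, not a prescribed design: the function name `compare_keywords`, the case-insensitive matching, and the use of Jaccard similarity as a summary score are all assumptions made for illustration.

```python
def compare_keywords(list_a, list_b):
    """Compare two keyword lists, ignoring case and surrounding whitespace.

    Normalizing by lowercasing is an assumption; a real tool might stem
    or lemmatize terms instead.
    """
    set_a = {k.strip().lower() for k in list_a if k.strip()}
    set_b = {k.strip().lower() for k in list_b if k.strip()}
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": sorted(shared),          # similarities
        "only_a": sorted(set_a - set_b),   # terms unique to the first list
        "only_b": sorted(set_b - set_a),   # terms unique to the second list
        # Jaccard similarity: shared terms divided by all distinct terms
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical keyword lists, for illustration only
result = compare_keywords(
    ["network", "Graph", "layout", "twitter"],
    ["graph", "hashtag", "twitter", "diffusion"],
)
print(result["shared"])   # ['graph', 'twitter']
print(result["only_a"])   # ['layout', 'network']
print(result["only_b"])   # ['diffusion', 'hashtag']
```

A visualization could then place the shared terms between the two lists and the unique terms on either side.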

Study the fundraising of UMd vs. other universities and find ways to improve donations from alumni.

Study crime patterns from the UMd logs to help police reduce crime.

Develop the idea of searching networks for subgraphs (a former student project that could be improved).

Study Twitter hashtags over time to see evolution of interests, especially convergence on hashtag terms.

Study YouTube evolution for topics: number of videos, number of views.


Marc Smith marc@smrfoundation.org http://nodexlgraphgallery.org http://www.smrfoundation.org/

Consider adding features to NodeXL, which is open source in C# .NET (http://www.codeplex.com/nodexl). Potential project sponsor: ???

  • Compare networks: select two graphs and report what is different between them
  • Integrate NLP features
  • Write importers for new data sources: Google+, RDF, and Wikimedia
  • Better layouts: Cody's designs, Force Atlas 2, "No Overlap" option
  • Aggregate content over time: track URLs, Hashtags, Users, Word Pairs over time.
  • Network compression for megascale and greater network results: in the end, we must reduce ALL networks to no more than N elements to be discretely visible on a current screen. Thus, downsampling networks to fit an N-element maximum is important.
  • Time Series - Jacopo Cirrone prototype - fast BTree DB back end.
  • Multiplex edges: best practices - review the current range of multiple edge management.
      • NEW IDEA

Many organizations seek to amplify the messages they create in social media. To do so, tools need to be developed that can identify existing topics and the key people who initiate and amplify those messages. Social media managers at many organizations lack the tools to gain insights into social media structures and content flows. They seek a tool to track the diffusion of content through networks. While most network visualizations display a collection of edges, few can convey the flow of content over those edges effectively. Since edges often outnumber vertices and cover longer stretches of screen real estate, the display of edge information has lagged behind vertex displays. Multiple edges are often collapsed into a single arc, because displaying multiple edges between the same pair of vertices is an information visualization challenge. Stacked, or "bowed," edges are a possible direction but still have issues when many parallel edges exist. Labeling edges is possible but quickly leads to illegible displays from odd angles and crossings. Performance and UI issues make detecting edge mouse-over events difficult as well.

Proposed work items:

  • Create a data model for representing the paths of content flow in social media networks
      • Paths are ordered linked lists of connected edges
  • Using the Paths data structure, visually represent the flow of content through the network
  • Provide summary data about paths, allowing users to filter on attributes like speed, length, population, branchiness, duration, and size
  • Add path data to the vertex data worksheet: number of paths, messages per path
  • Allow users to select one or more paths and identify the key people and terms within the path
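One way to picture the proposed path data model is sketched below in Python, purely as an illustration; NodeXL itself is written in C#. The class names `Edge` and `Path` are hypothetical, and the definitions of length (edge count), duration (time between first and last edge), population (distinct people), and speed (edges per second) are assumptions, since the work items do not define them. A Python list stands in for the ordered linked list of edges.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Edge:
    source: str       # account that passed the content on
    target: str       # account that received it
    timestamp: float  # seconds since an arbitrary start (assumed unit)


@dataclass
class Path:
    """An ordered sequence of connected edges carrying one piece of content."""
    edges: List[Edge] = field(default_factory=list)

    def append(self, edge: Edge) -> None:
        # Enforce connectivity: each edge must start where the last one ended.
        if self.edges and self.edges[-1].target != edge.source:
            raise ValueError("edge does not connect to the end of the path")
        self.edges.append(edge)

    @property
    def length(self) -> int:
        return len(self.edges)

    @property
    def duration(self) -> float:
        if not self.edges:
            return 0.0
        return self.edges[-1].timestamp - self.edges[0].timestamp

    @property
    def population(self) -> int:
        people = set()
        for e in self.edges:
            people.update((e.source, e.target))
        return len(people)

    @property
    def speed(self) -> float:
        # Edges traversed per second; 0.0 when the duration is zero.
        return self.length / self.duration if self.duration > 0 else 0.0


# Hypothetical retweet chain, for illustration only
path = Path()
path.append(Edge("alice", "bob", 0.0))
path.append(Edge("bob", "carol", 60.0))
path.append(Edge("carol", "dave", 180.0))

# Filtering on a summary attribute, as the work items suggest
long_paths = [p for p in [path] if p.length >= 3]
```

Branchiness is omitted here because a single linear path has no branches; representing it would likely require a tree or a set of paths sharing a common prefix.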

Practitioners will use this information to identify existing paths with interesting attributes. They will track their own content paths and compare their diffusion to selected paths for contrast.

Related work

http://blogs.hbr.org/cs/2013/08/visualizing_how_online_word-of.html

http://www.nature.com/srep/2013/130828/srep02522/full/srep02522.html

I have tons of data. If they create prototypes, I will get them users.

Additional related materials:

http://blog.socialflow.com/post/5246404319/breaking-bin-laden-visualizing-the-power-of-a-single

http://nytlabs.com/projects/cascade.html

http://www.youtube.com/watch?v=O5DtR5kqSuQ

http://infosthetics.com/archives/2010/10/thruthy_visualizing_the_diffusion_of_memes_on_twitter.html

http://cnets.indiana.edu/groups/nan/truthy

http://live.wsj.com/video/the-truthy-project-ferrets-out-online-deception/219A2EA6-4D22-4F5B-8D96-81AF342104F7.html#!219A2EA6-4D22-4F5B-8D96-81AF342104F7

http://cgcsblog.asc.upenn.edu/2012/10/04/charting-the-spread-of-political-memes-in-social-media/



Guest Speakers with Project ideas:

Paul Hitlin Senior Researcher Pew Research Center phitlin@pewresearch.org 202-419-3653

Pew Research Center is pursuing work that can inform how advanced data visualization techniques can further its research goals. The Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping America and the world. It conducts public opinion polling, demographic research, media content analysis and other empirical social science research. Pew Research does not take policy positions. We are proposing two specific visualization projects, which are described below. The Center is open to considering other proposals, and if students have ideas that align with the Center’s agenda, we would be eager to hear them. Students who successfully complete a project working with Pew Research will likely be able to have a byline on a report issued by the Center and featuring the student’s visualization tool. The two proposals offered by the Center are as follows:

1. Changes in Party Affiliation over Time: This project would involve the use of Pew Research survey data to create a visualization of changes in party affiliation from 1990 to the present. We would provide proprietary survey data that include demographic variables (age, gender, income, education, state) along with party affiliation. Students would create an interactive tool that shows how demographic groups have changed their party affiliations over time. This project would give us a historical map of changes in the electorate that would inform upcoming work regarding partisanship.

2. A Tool to Analyze a Twitter Account’s Network: This project would involve the creation of a new interactive tool that would analyze the network of any specific Twitter account. The tool would collect data about a given account related to who is following that account and who the account follows. Then, using a process that analyzes the characteristics of the people in that account’s network, the tool would deliver a summary of that network. The tool would not only identify demographic information about the people in the network, such as how many of the followers are male and female, but also score them on other characteristics, such as political leaning or level of interaction with “public” or “private” figures.

For example, this tool would be able to tell us what percentage of followers of the NY Times Twitter account were on the liberal side of the political spectrum compared to those on the conservative side. One of the main challenges of this project is to create an automated system of scoring accounts on a liberal/conservative or public/private scale. Existing tools are able to identify the obvious characteristics of an account’s followers, but this tool would create another level of discovery.


Michael L. Pack, Director, University of Maryland Center for Advanced Transportation Technology, J. Kim Engineering Bldg., Suite 3144, College Park, MD 20742. Work: 301-405-0722. Fax: 301-403-4591. packml@umd.edu (http://www.cattlab.umd.edu). He has worked with course teams in the past and hired at least a half dozen students for his projects.
Michael Robert VanDaniker, Visualization Manager, mvandani@umd.edu
Detailed study information can be found in the presentation slides (PPT).


Catherine Plaisant, Research Scientist (plaisant@cs.umd.edu) at HCIL (http://www.cs.umd.edu/hcil). She has worked with many course teams and has projects on medical informatics.


Paul Hitlin Researcher with Pew Internet Center (PHitlin@Journalism.org)


Monifa Vaughn-Cooke, Professor of Mechanical Engineering (mvc@umd.edu), offers a project on diabetes treatment: Facilitating treatment adherence among patients with chronic conditions poses a significant challenge to healthcare providers. Diabetes treatment outcome is largely driven by patient self-management, which includes modifying life activities based on regular feedback about blood glucose control. The study sought to identify when, how and why non-adherence to self-monitoring of blood glucose (SMBG) occurred over a 60-day study period. Detailed study information and data description can be found in the overview document, Description of Diabetes Study Data document, Excel data set, and the presentation slides (PPT).


Amitabh Varshney (varshney@umiacs.umd.edu), Susan Moeller (smoeller@umd.edu), Ronald A. Yaros (ryaros@umd.edu), Joseph JaJa (joseph@sesync.org)