State of the USA

From Cmsc734_08

Deliverables

SUSA Information

The State of the USA (SUSA) is intended as a web portal to access key national indicators from respected non-partisan sources, along with a set of interactive tools to explore, analyze and present the data. The site will cover topics such as health, the economy, demographics, crime and the environment. In addition to accessing the data itself, it will include information about how the data is typically measured and used.

The intended audience for the site is potentially vast. The smallest user group includes professionals such as journalists and policymakers who are familiar with the use of indicators. For this group, it would provide easy access to high quality data as well as ways to explore, analyze and present it. Potentially, the site would also provide social networking capability to support the presentation and discussion of data. SUSA would also like the site to be useful for high school students and the interested general public, which would be a much larger user group.

Supporters and participants include:

  • The William and Flora Hewlett Foundation
  • The Rockefeller Foundation
  • The John D. and Catherine T. MacArthur Foundation
  • The Carnegie Corporation of New York
  • The F.B. Heron Foundation
  • The National Academies

Back to Contents

Links

SUSA Website

Demonstration site user guide includes scenarios for the site's use.

Project Subversion (read-only web browser)

Project Schedule (work in progress). To see a Gantt chart, click the "project" tab. I'll need to add you all to the tool. Not sure about viewing.

This is a link to another indicator project. All of their visualizations are canned, with no user interaction.

Contact

Scott Gilkeson

Email: scott@scottgilkeson.com

Office: 301-920-0145

Cell: 301-520-5106

Scott's website

Project Overview

SUSA Homepage Mockup
Education data (poor Excel visualization)

Many of the indicators are actually multidimensional data spaces, such as census data or Gross Domestic Product for various years broken down by state. A key goal of the SUSA project is to allow users to drill down into the data (at the national, state, or county level) and display data disaggregated into categories like gender or ethnicity.

This brings up at least two HCI issues: 1) how does the user control which data are displayed in an intuitive way, and 2) how can the visualization inform the user of the quality of the data they are exploring? As the data are disaggregated, the sample sizes will get smaller and smaller. Data that are meaningful on the whole may become statistically insignificant when sufficiently disaggregated. For the data to be used responsibly, it is critical that users understand how much they can rely on the integrity of a visualization. This means that data quality should be encoded directly into the visual information display. This is an interesting problem because the site is intended to be able to display information using many different representations: bar charts, pie charts, scatter charts and maps, among others.
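
As a rough illustration of how disaggregation erodes reliability, the following Python sketch computes an approximate 95% margin of error for a proportion at several sample sizes. It assumes simple random sampling and the normal approximation. The function name, the 30% rate, and the sample sizes are hypothetical and are not taken from SUSA data.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from n samples.

    Assumes simple random sampling and the normal approximation,
    which is itself unreliable for very small n.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample sizes at each level of disaggregation.
levels = {
    "national": 50_000,
    "state": 1_000,
    "county": 40,
    "county, by ethnicity": 9,
}

for name, n in levels.items():
    moe = margin_of_error(0.30, n)
    print(f"{name:>22}: n={n:>6}  30% +/- {100 * moe:.1f} points")
```

With these made-up numbers, the margin grows from a fraction of a percentage point at the national level to roughly thirty points for nine respondents, which is the kind of change the display would need to make visible.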

We will explore techniques for guiding user navigation and disaggregation of the SUSA data sets. We will need to find or invent appropriate measures of data quality (taking into account data gaps as well as sample sizes), and integrate these measures into the SUSA visualizations. We would also like to encode other statistical measures of significance into the display as relevant.

Team Goals

  • Develop intuitive controls with fast feedback for data navigation and disaggregation.
  • Incorporate visualizations into sample web pages, striving for excellent HCI design.
  • Intuitively represent data quality on pie charts, scatter plots, bar charts and maps.
  • Develop or find a summary metric for data quality and statistical significance.
  • Allow users to capture visualizations easily for inclusion in offline reports or presentations.

Team Members

  • Abigail Daken
  • Eylul Dogruel
  • Justin Grimes
  • Michael Lam
  • Thomas Lotze

Meetings

Weekly meeting time: Mondays 2:00-3:00 pm at the group study area in EPSL.

Ideas - now including link to sample mock-up and suggestions

work in progress


Put fun and crazy ideas here. Split off into other pages as necessary for cleanliness.

A note (from a friend) on visualizing error/population, based on http://sedac.ciesin.columbia.edu/usgrid/ and the maps under links: "Sure, all twenty-five people on that Indian reservation are poor, but the 30,000 poor people in New York don't pop out at you." (The numbers, as far as I know, are not accurate.)

Census Grid example

Here is an example of a visualization that does a bad job of showing you how significant the data are (or not!). The pie chart appears to show how many more Mac computers are stolen than other brands, but upon further examination you find that they have a sample size of only nine!

Statistical significance example
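
One hedged way to quantify how little a sample of nine supports is a Wilson score interval for a proportion. The sketch below is an assumption about how one might check this, not something from the original chart, and the count of five Macs out of nine thefts is invented for illustration because the chart's actual counts are not given here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    Returns (lower, upper). z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    return center - half_width, center + half_width


# Hypothetical count: 5 of the 9 stolen computers were Macs.
low, high = wilson_interval(5, 9)
print(f"Share of Macs: between {low:.0%} and {high:.0%}")
```

Under that invented count the 95% interval runs from roughly 27% to 81%, so the pie slice could plausibly be a minority or a large majority.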
Attempt at a paper mockup

more sketches and interface discussion

Mike's UI examples

Resources

MySQL database (work in progress) - susa.autumnus.net

FTP access to the rest of the database files (for anyone who needs to look at the data in the meantime) - https://webftp.dreamhost.com/

Statistical notes

About Ajax:

http://java.sun.com/developer/technicalArticles/J2EE/AJAX/

Java Graphing Libraries:

Literature Review

The full lit review has been moved here.

In summary, it looks like many of the techniques that have been suggested have not been tested adequately with users. Given SUSA's audience, it may make more sense to use relatively simple representations of uncertainty, but evaluate them carefully with a realistic group of users. While we can't finish that in a single semester, it would be a valuable contribution to this literature.

Selected and annotated:

Source BibTex Category Reviewer Review
DataPlace - UI Design Mike Link
There's a lot here, perhaps more than first meets the eye when you look around the site. The website presents a lot of ways to look at a ton of data, and the interactive visualizations feel very responsive. Some of the interfaces feel rough, however, and many options are confusing. Things we may want to examine more closely during the course of our project:
  • The indicator selection screen from the rankings window (filtering for indicators)
  • The charts window (why is it so confusing?)
Endeca - UI Design Justin Link
Endeca makes several different types of products; the components most interesting for us are guided navigation and guided summary. It's nothing too amazing, just a very nicely done faceted classification system with a well-polished interface.

An example of an Endeca application is the NCSU library catalog system.

Flamenco - UI Design Eylul Link
Flamenco is a categorization system created at the University of California, Berkeley.

Short critique: very clean, aesthetically pleasant, easy to use. The way to choose multiple categories is very intuitive. The only negative is that it is very costly and hard to make good, intuitive use of keywords; both demos suffered from that issue. I think this is one project we should use as a starting point for our data-selection user interface.

Performance Benefits of Simultaneous Over Sequential Menus as Task Complexity Increases - UI Design Eylul Link
This paper compares sequential and simultaneous menus in a user study; the authors report that the simultaneous menus are faster.

An example of a simultaneous menu can be found in the cancer-by-state visualization (check the viz4all review for details).

The finding is that simultaneous menus are faster, but the only significant data they have are the estimated advanced-user speeds, based on a composite of 3 users, which I do not find very reliable. (The rest of the data has too much error.)

I do think that for our case simultaneous menus are a better choice, not necessarily for speed but for clarity and reduced memory load (a similar argument to Flamenco).

Pang, A.T., C.M. Wittenbrink and S.K. Lodha. Approaches to uncertainty visualization.

The Visual Computer 13: 370-390. 1997. PDF

- Data Quality Abi Link
Many of the techniques are demonstrated for 3D, but a careful reading of the authors' classification may be useful. Instead of presenting user studies in this paper, they refer to actual user studies. For any technique we think we might use, it could be useful to look for the user studies. Is there a more recent overview of uncertainty viz techniques?
MacEachren, A.M. Visualizing uncertain information.

Cartographic Perspectives 13: 10-19. 1992. PDF

- Data Quality Mike Link
This is an early overview of uncertainty viz ideas. It serves as a basic introduction to the concept and might be a nice reference for our related work section, but otherwise there aren't really that many ideas that I think will be useful to us, except perhaps the idea of using Type I and II error rates for evaluation.
Chris Olston and Jock D. Mackinlay. Visualizing Data with Bounded Uncertainty.

IEEE Symposium on Information Visualization (InfoVis'02) October 28 - 29, 2002.

Boston, Massachusetts, USA p. 37. PDF

- Data Quality Mike Link
This paper suggests ambiguation instead of error bars for the case where nothing is known about the distribution of likelihood within the bounds of certainty. Their example is instrumentation precision. The actual viz examples they use are pretty basic, but the distinction they make between Gaussian error and bounded uncertainty is interesting.
Johnson, C.R. and Sanderson, A.R. A Next Step: Visualizing Errors and Uncertainty

Computer Graphics and Applications, IEEE. Sept.-Oct. 2003. Volume: 23, Issue: 5 pp 6-10.

PDF IEEE Ref Review

johnson2003 Data Quality Abi Link
Refers specifically to visualizing uncertainty in 3D. Makes the useful point that the visualization itself (e.g. rendering of 3D objects) can introduce uncertainty through the algorithms used to speed the rendering. Has several potentially useful suggestions of ways to represent uncertainty that could be applied to our cases.
Andrej Cedilnik and Penny Rheingans. Procedural annotation of uncertain information.

IEEE Visualization '00 Conference Proceedings. Salt Lake City, Utah. 77-83. ACM Ref

cedilnik2000 Data Quality Thomas Link
A series of suggestions for using overlay information such as gridlines to represent uncertainty. Each of these three methods could be converted to our visualizations (bar plots, scatterplots, or timeplots) to show uncertainty. Also makes some good points about design principles for uncertainty visualization, for instance that gridlines indicating different uncertainties should have similar visual impact; i.e., if they are dimmer or fuzzier, they also need to be wider.
Griethe, H.; and Schumann, H. The Visualization of Uncertain Data: Methods and Problems.

Proceedings SimVis'06, Magdeburg, Germany, March 2006. PDF

- Data Quality Thomas Link
I think this is a good survey article with links to specific examples of representing uncertainty. It will be most useful to us in categorizing different ways of visualizing uncertainty.
Judi Thomson and Beth Hetzler and Alan MacEachren and Mark Gahegan and Misha Pavel. A Typology for Visualizing Uncertainty.

Conference on Visualization and Data Analysis 2005, 16-20 January 2005, San Jose, CA USA PDF

- Data Quality Thomas Link
I think this paper is most useful for giving us a good language for talking about data quality/uncertainty and helping us focus in on particular kinds. It may also be useful in providing us with a quantitative way of measuring different kinds of uncertainty.
Howard Wainer. Depicting Error.

The American Statistician, Vol. 50, No. 2. (May, 1996), pp. 101-111. JSTOR link

- Data Quality Thomas Link
I really like the point that error bars should not dominate the visualization, as then we focus on the error rather than the data. Toggling between a normal and an advanced mode might be useful: normal mode would show more precise points as larger, while advanced mode would show the error distribution (or provide a radio-button selection of different types of uncertainty representation).

The multiple comparisons issue is definitely something we need to keep in mind (that an error bar for an individual point needs to be wider if all points are being considered together).
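
A minimal sketch of that widening, assuming the Bonferroni correction (one common and conservative adjustment, chosen here for illustration and not necessarily the method the paper recommends) and normally distributed estimates. The function name is hypothetical.

```python
from statistics import NormalDist


def bonferroni_z(num_comparisons, alpha=0.05):
    """Two-sided critical z value after a Bonferroni correction.

    Using this multiplier for every error bar keeps the chance that any
    of the num_comparisons intervals misses its true value at or below alpha.
    """
    if num_comparisons < 1:
        raise ValueError("need at least one comparison")
    per_interval_alpha = alpha / num_comparisons
    return NormalDist().inv_cdf(1.0 - per_interval_alpha / 2.0)


# How much wider each bar gets as more points are shown together.
for k in (1, 10, 50):
    z = bonferroni_z(k)
    print(f"{k:>3} points: bar half-width = {z:.2f} standard errors")
```

For a single point this gives the familiar multiplier of about 1.96; with fifty points shown together it rises to about 3.29, so each bar is roughly two thirds wider.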

I like the use of greying out instead of bars in the bar charts, to show lighter gray as we increase our error bars (with larger alphas). In this way, we could show uncertainty especially when the amount of uncertainty for different points is not equal.

Instead of the annoyance of blinking points, perhaps we could represent less precise points as more transparent (it should have a similar effect of solidity). I suspect that transparency was not easily modified at the time the paper was written.
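
The transparency idea could be sketched as a simple mapping from a point's standard error to an opacity value. This is only one possible encoding: the linear mapping, the function name, and the minimum opacity of 0.15 are all assumptions made for illustration.

```python
def precision_to_alpha(std_error, max_std_error, min_alpha=0.15):
    """Map a standard error to an opacity in [min_alpha, 1.0].

    Zero error is drawn fully opaque; errors at or above max_std_error
    are drawn at min_alpha so the point never disappears entirely.
    """
    if max_std_error <= 0:
        raise ValueError("max_std_error must be positive")
    fraction = min(max(std_error / max_std_error, 0.0), 1.0)
    return 1.0 - fraction * (1.0 - min_alpha)


# Hypothetical points: (label, value, standard error).
points = [("A", 42.0, 0.5), ("B", 37.5, 4.0), ("C", 51.2, 9.0)]
worst = max(se for _, _, se in points)

for label, value, se in points:
    rgba = (0.1, 0.3, 0.8, precision_to_alpha(se, worst))
    print(label, value, rgba)
```

The resulting RGBA tuples can be handed to whatever charting library is used. Keeping a nonzero minimum opacity is a deliberate choice here, so that imprecise points fade but remain visible.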

Denton, W.: How to make a faceted classification and put it on the web (November 2003) [1] denton2008 UI Design Justin Link
An overview of what a faceted classification system is, why we might want to use it, and how to operationalize it.
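
To make the idea concrete for indicator selection, here is a small hedged sketch of faceted filtering: records must match every selected facet, and within one facet any of the selected values is accepted. The indicator records, facet names, and function names are hypothetical and do not reflect the SUSA database schema.

```python
from collections import Counter


def facet_filter(records, selections):
    """Return records that match every facet in selections.

    selections maps a facet name to the set of accepted values;
    an empty selections dict keeps every record.
    """
    return [
        r for r in records
        if all(r.get(facet) in allowed for facet, allowed in selections.items())
    ]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r.get(facet) for r in records)


# Hypothetical indicator catalog.
indicators = [
    {"name": "Unemployment rate", "topic": "economy", "level": "state"},
    {"name": "GDP", "topic": "economy", "level": "national"},
    {"name": "Uninsured rate", "topic": "health", "level": "state"},
    {"name": "Violent crime rate", "topic": "crime", "level": "county"},
]

chosen = facet_filter(indicators, {"topic": {"economy", "health"}, "level": {"state"}})
print([r["name"] for r in chosen])
print(facet_counts(indicators, "topic"))
```

Showing the per-value counts next to each facet lets a user see how many indicators remain before committing to a selection.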
Riveiro, Maria: Evaluation of uncertainty visualization techniques for information fusion

PDF

- Data Quality Justin Link
Uncertainty is a massive unresolved problem in information fusion aka data mining. This paper provides an overview of uncertainty representation techniques while incorporating the general principles of Tufte, Chambers and Bertin and results of user experiments.

"Most of the developed techniques to represent uncertainty do not include a perceptual and cognitive analysis or user evaluations that validate its usefulness."

HCI Bottleneck = 2D computer screens

The Human Decision Maker Loop Framework proposes some other things that would be cool and useful to incorporate along the lines of visualizing uncertainty, such as visualizing negative reasoning enhancement and focus/defocus to assist users. The basic principles from info design, graphic design, etc. are, to me, an essential component and have well-grounded utility. The user studies are particularly interesting because I approach the whole problem scientifically and would like to see how successful uncertainty techniques actually are.

Meredith Skeels and Bongshin Lee and Greg Smith and George Robertson. Revealing Uncertainty for Information Visualization. Unpublished Microsoft Technical Report, 2008 skeels2008 Data Quality Abi Link
This is the most up-to-date typology of uncertainty I've seen. (Ben handed me a hard copy in class.) It's interesting because they had their typology reviewed by scientists and other uncertainty experts. They end up with three levels: measurement precision, completeness, and inferences. In addition, they have two types that span levels: credibility and disagreement (which sometimes results in credibility uncertainty).
