Evaluating Traffic Data

From Cmsc734_f12



For this project we used Spotfire to analyze data supplied by Michael Pack, the Director of the University of Maryland Center for Advanced Transportation Technology. As residents of Northern Virginia and Maryland, we have experienced our fair share of traffic. The data provided by Michael Pack allows for analysis of traffic data by roadway, direction, time, and many other factors. This analysis uncovered insights into causes of collisions, congestion patterns, and overall incident occurrences.


  • Alison Smith - Master's Student, Department of Computer Science
  • Michael Szaszy - Master's Student, Department of Computer Science


Source: Regional Integrated Transportation Information System (RITIS) (https://www.ritis.org/)

RITIS lets a user export traffic incident data from 14 regions for a span of 1 to 3 months, depending on the number of regions selected. For this project, we exported all incident data for MDOT (Maryland Department of Transportation) and VDOT NOVA (Virginia Department of Transportation for Northern Virginia) for the 3-month span from June 29th to September 29th of 2012. The data contains 40,000 incidents spanning 18 unique incident types: Alert, Animal Struck, Collision, Congestion, Disabled Vehicle, Disturbances, Emergency Roadwork, Fatalities Involved, Fire, Incident, Injuries Involved, Obstructions, Road Maintenance Operations, Special Event, Traffic Signal Not Working, Vehicle on Fire, Weather Condition, and Work on Underground Services.

Each incident in the data contains a variety of metadata which can be used for analysis. Table 1 shows the fields for each incident that existed within the data, and Table 2 shows the fields we added to allow for deeper analysis.

AM table1.png

Table 1: The fields for each incident in the table from the data provided by RITIS.

AM table2.png

Table 2: The fields added to the data to allow for further analysis.

Data Errors

Our first step was to remove the glaring data errors that would have significantly skewed our results. Figure 1 shows two ways that we uncovered data errors using visualization in Spotfire. After removing the errors, the average “Congestion” incident duration dropped from 4:30.00 to 2:20.00.

AM dataErrors.png

Figure 1: [left] Removed duration data errors by using a scatter plot with duration on the x-axis. All of the points with duration much greater than the trend were removed. [right] Removed location errors by using a map plot.
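The cleanup in Figure 1 was done visually in Spotfire. As a rough illustration only, the same idea can be sketched in Python. The field names (duration_min, lat, lon), the sample records, the median-based cutoff, and the bounding box are all assumptions made for this sketch; they are not RITIS column names or the criteria we applied by eye.

```python
from statistics import median

# Hypothetical sample records; field names are assumptions, not RITIS columns.
incidents = [
    {"id": 1, "duration_min": 20, "lat": 38.88, "lon": -77.10},
    {"id": 2, "duration_min": 30, "lat": 39.00, "lon": -76.90},
    {"id": 3, "duration_min": 25, "lat": 38.95, "lon": -77.30},
    {"id": 4, "duration_min": 40, "lat": 39.29, "lon": -76.61},
    {"id": 5, "duration_min": 6000, "lat": 38.90, "lon": -77.05},  # implausibly long
    {"id": 6, "duration_min": 35, "lat": 0.0, "lon": 0.0},         # bad coordinates
]

# Rough bounding box around Maryland and Northern Virginia (an assumption).
LAT_RANGE = (37.5, 40.0)
LON_RANGE = (-80.0, -74.5)


def in_region(record):
    """Return True if the record's coordinates fall inside the bounding box."""
    return (LAT_RANGE[0] <= record["lat"] <= LAT_RANGE[1]
            and LON_RANGE[0] <= record["lon"] <= LON_RANGE[1])


def remove_errors(records, k=10.0):
    """Keep records inside the region whose duration is at most k times the median.

    The factor k is an arbitrary illustrative threshold standing in for
    "much greater than the trend" on the scatter plot.
    """
    cutoff = k * median(r["duration_min"] for r in records)
    return [r for r in records if r["duration_min"] <= cutoff and in_region(r)]


cleaned = remove_errors(incidents)
```

A fixed multiple of the median is only one possible stand-in for the visual judgment; any threshold should be checked against the scatter plot before rows are dropped.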


1. Overall Incident Patterns

We created a visualization to provide an overview of all of the incident data (Figure 2). The overview shows the layout of incidents on the major roadways, the breakdown by incident type, and the volume of incidents by road.

AM overview.png

Figure 2: [top left] A pie chart shows the breakdown by incident type. [bottom left] The incidents for the major roadways plotted on a map. [right] A stacked bar chart shows the incidents by roadway and type.

Incidents by Road

The initial analysis of the incident data showed that over 50% of the total incidents reported over all roads occurred on I-95. However, on closer inspection, it appears that more than 50% of the I-95 incidents are “Disabled Vehicle” incidents. Figure 3 shows the overview of all incident data for I-95, I-66, US 50, I-495, I-395, and I-270 before and after removing the “Disabled Vehicle” incidents.

AM incidentByRoad.png

Figure 3: [top] Overview of all incident data for I-95, I-66, US 50, I-495, I-395, and I-270. The orange bar represents “Disabled Vehicle” incidents, and accounts for over 50% of the total incident volume for I-95, where I-95 accounts for 53.8% of the incident volume for all roads. [bottom] After removing the “Disabled Vehicle” incidents, I-95 no longer dominates.
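The before-and-after comparison in Figure 3 amounts to computing each road's share of incidents with and without one incident type. A minimal sketch, assuming hypothetical records with road and type fields (the sample counts below are invented for illustration and are not our data):

```python
from collections import Counter

# Hypothetical sample: (road, incident type) pairs, invented for illustration.
sample = (
    [{"road": "I-95", "type": "Disabled Vehicle"}] * 4
    + [{"road": "I-95", "type": "Collision"}] * 2
    + [{"road": "I-66", "type": "Congestion"}] * 2
    + [{"road": "I-495", "type": "Collision"}] * 2
)


def share_by_road(incidents, exclude_types=()):
    """Return each road's fraction of incidents, ignoring the excluded types."""
    counts = Counter(
        inc["road"] for inc in incidents if inc["type"] not in exclude_types
    )
    total = sum(counts.values())
    return {road: n / total for road, n in counts.items()} if total else {}


all_shares = share_by_road(sample)
without_disabled = share_by_road(sample, exclude_types=("Disabled Vehicle",))
```

In this invented sample, I-95 holds 60% of all incidents but only a third once “Disabled Vehicle” is excluded, which mirrors the shape of the effect seen in Figure 3.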

Road Construction/Congestion Patterns

Road maintenance in VA and MD is scheduled to avoid the commuter rush hours. Construction is scheduled on weekdays in two time blocks, 9 AM to 3 PM and 8 PM to 5 AM the next morning. Little road work is done on days when road crews could work in one uninterrupted time block (Saturday or Sunday). For all roadways, morning congestion typically runs from 6 AM to 9 AM, and evening congestion from 3 PM to 7 PM. Figure 4 shows the construction and congestion patterns by hour of the day.

AM roadWorkPattern.png

Figure 4: Line graphs showing the scheduled road maintenance incidents and the congestion incidents by hours of the day.
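The line graphs in Figure 4 are counts of incidents per hour of the day for a given incident type. A minimal sketch of that aggregation, assuming each record has a type and a start timestamp (both field names and all sample values are hypothetical):

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records, invented for illustration.
sample = [
    {"type": "Road Maintenance Operations", "start": datetime(2012, 7, 10, 10, 5)},
    {"type": "Road Maintenance Operations", "start": datetime(2012, 7, 10, 10, 40)},
    {"type": "Road Maintenance Operations", "start": datetime(2012, 7, 10, 21, 15)},
    {"type": "Congestion", "start": datetime(2012, 7, 10, 7, 30)},
    {"type": "Congestion", "start": datetime(2012, 7, 10, 8, 10)},
    {"type": "Congestion", "start": datetime(2012, 7, 10, 16, 45)},
    {"type": "Congestion", "start": datetime(2012, 7, 10, 17, 20)},
]


def hourly_counts(incidents, incident_type):
    """Return a 24-element list: incidents of the given type per start hour."""
    by_hour = Counter(
        inc["start"].hour for inc in incidents if inc["type"] == incident_type
    )
    return [by_hour.get(hour, 0) for hour in range(24)]


maintenance = hourly_counts(sample, "Road Maintenance Operations")
congestion = hourly_counts(sample, "Congestion")
```

This bins by start hour only; an incident that spans several hours is counted once, which is a simplification compared with counting every hour it was active.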

Another interesting finding is that on I-66, westbound traffic is worse during the afternoon commute and eastbound traffic is worse during the morning commute. This makes some sense: eastbound I-66 heads into the city, so you’d expect higher volume in the morning. However, I-66 Eastbound is HOV only between 6:30 and 9:00 AM, whereas I-66 Westbound at that time is not, so the restriction might have been expected to reduce morning eastbound congestion. Across I-66, more incidents occur during the afternoon commute than during the morning commute.

2. Effect of Road Direction on Collisions

Prior to beginning analysis, we hypothesized that sun position would have an effect on the number of collision incidents. Finding a greater number of collisions for either the eastbound or the westbound commute would have provided evidence for this hypothesis. However, analysis showed that an identical number of collisions (40 to 40) occurred during the afternoon westbound commute and the morning eastbound commute. This result does not support our hypothesis that sun position has an effect on the number of collisions. Instead, it appears that congestion is the main driver. Figure 5 shows the analysis of eastbound and westbound collisions.

AM roadDirectionCollisions.png
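The comparison above is a count of collisions filtered by road, direction, and commute window. A minimal sketch, using the commute windows given earlier (6 AM to 9 AM and 3 PM to 7 PM); the field names and sample records are hypothetical:

```python
from datetime import datetime

MORNING_HOURS = range(6, 9)     # 6 AM up to, but not including, 9 AM
AFTERNOON_HOURS = range(15, 19)  # 3 PM up to, but not including, 7 PM

# Hypothetical sample records, invented for illustration.
sample = [
    {"road": "I-66", "direction": "East", "type": "Collision",
     "start": datetime(2012, 7, 10, 7, 15)},
    {"road": "I-66", "direction": "East", "type": "Collision",
     "start": datetime(2012, 7, 11, 8, 40)},
    {"road": "I-66", "direction": "West", "type": "Collision",
     "start": datetime(2012, 7, 10, 16, 5)},
    {"road": "I-66", "direction": "West", "type": "Collision",
     "start": datetime(2012, 7, 11, 18, 30)},
    {"road": "I-66", "direction": "West", "type": "Congestion",
     "start": datetime(2012, 7, 11, 17, 0)},   # not a collision
    {"road": "I-95", "direction": "North", "type": "Collision",
     "start": datetime(2012, 7, 11, 7, 0)},    # different road
]


def count_collisions(incidents, road, direction, hours):
    """Count collisions on one road and direction that start in the given hours."""
    return sum(
        1
        for inc in incidents
        if inc["type"] == "Collision"
        and inc["road"] == road
        and inc["direction"] == direction
        and inc["start"].hour in hours
    )


morning_east = count_collisions(sample, "I-66", "East", MORNING_HOURS)
afternoon_west = count_collisions(sample, "I-66", "West", AFTERNOON_HOURS)
```

Raw counts do not account for how many vehicles were on the road in each window, so equal counts say little about per-vehicle risk.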

3. You're Safer Driving on the Weekend

There is a noticeable periodicity in the congestion and collision incident data. Each hump spans roughly 5 days, indicating that the number of incidents reported on weekends is drastically lower than the number reported during the week. We also see that congestion incidents and collision incidents rise and fall together, indicating a correlation between the two. Figure 6 shows collision and congestion incidents over the date range of the data.

AM weekends.png

Figure 6: Collision and congestion data for the date range of the collected traffic incident data.
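Both observations from Figure 6 can be quantified: the weekday versus weekend gap as a difference in mean daily counts, and "rise and fall together" as a Pearson correlation between the two daily series. We read these patterns off the chart and did not compute them; the sketch below uses invented daily counts for a single week purely for illustration.

```python
from datetime import date, timedelta
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def weekday_weekend_means(days, counts):
    """Return (mean weekday count, mean weekend count)."""
    weekday = [c for d, c in zip(days, counts) if d.weekday() < 5]
    weekend = [c for d, c in zip(days, counts) if d.weekday() >= 5]
    return mean(weekday), mean(weekend)


# One hypothetical week, Monday 2012-07-02 through Sunday 2012-07-08.
days = [date(2012, 7, 2) + timedelta(days=i) for i in range(7)]
congestion = [50, 55, 60, 58, 52, 10, 8]   # invented daily counts
collisions = [12, 14, 15, 13, 12, 3, 2]    # invented daily counts

congestion_means = weekday_weekend_means(days, congestion)
correlation = pearson(congestion, collisions)
```

A high correlation here would largely reflect the shared weekly cycle, so it does not by itself show that congestion causes collisions.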


Spotfire enabled an in-depth analysis of the traffic data provided by RITIS. The analysis provided evidence for known traffic patterns, such as rush-hour congestion and road work scheduled outside of rush hours. The analysis did not support the hypothesis that sun position has an effect on collision incidents. A somewhat surprising finding was that the congestion incident volume for I-66 East was higher than the volume for I-66 West during the morning commute. This is unexpected because I-66 East is HOV only in the morning, which should drive down congestion incidents. Finally, the analysis found far fewer incidents reported on weekends than during the week, suggesting that driving on the weekend is safer.

Spotfire Critique

Overall, we felt that Spotfire allowed us to visualize our data in a number of different and interesting ways and generated visually appealing graphics. However, we did have a couple of issues that made working within Spotfire challenging. For instance, our data is a collection of events, each characterized by both a time interval and a point in space. We found it very difficult in Spotfire to identify events that co-occur both spatially (within some proximity) and temporally (with overlapping time spans).

We tried using Spotfire’s hierarchical clustering to get some sense of co-occurrence, but it was not very helpful. We also tried using “binning” on the axes of scatter plots to create a 1-level grid index, but this is not the same as looking at the data from an event-centric view (e.g., for each event, give me all events nearby in space with overlapping time spans). It was also unclear how certain Custom Expression functions should be used (expected inputs and outputs).
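For reference, the event-centric query we had in mind can be sketched outside of Spotfire as follows. This is a brute-force illustration, not something we ran on the RITIS export; the field names (lat, lon, start, end), the 1 km radius, and the sample events are all assumptions.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def overlaps(a, b):
    """True if the closed time intervals of events a and b intersect."""
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def co_occurring(event, events, max_km=1.0):
    """Return the other events within max_km of `event` with overlapping time spans."""
    return [
        other for other in events
        if other is not event
        and overlaps(event, other)
        and haversine_km(event["lat"], event["lon"],
                         other["lat"], other["lon"]) <= max_km
    ]


# Hypothetical sample events, invented for illustration.
a = {"id": "A", "lat": 38.880, "lon": -77.100,
     "start": datetime(2012, 7, 10, 8, 0), "end": datetime(2012, 7, 10, 9, 0)}
b = {"id": "B", "lat": 38.881, "lon": -77.101,   # near A, overlaps A in time
     "start": datetime(2012, 7, 10, 8, 30), "end": datetime(2012, 7, 10, 9, 30)}
c = {"id": "C", "lat": 38.881, "lon": -77.101,   # near A, but later in the day
     "start": datetime(2012, 7, 10, 11, 0), "end": datetime(2012, 7, 10, 12, 0)}
d = {"id": "D", "lat": 39.290, "lon": -76.610,   # overlaps in time, but far away
     "start": datetime(2012, 7, 10, 8, 15), "end": datetime(2012, 7, 10, 8, 45)}
events = [a, b, c, d]

neighbors_of_a = co_occurring(a, events)
```

Running this for every event compares all pairs, which is quadratic in the number of events; for tens of thousands of incidents, a spatial grid or a sort by start time would be a reasonable way to cut down the comparisons.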

Once we learned that you could define multiple, independent selection criteria by configuring the “Markings” feature in Spotfire, we were able to explore our data much more easily. We created 2 different pages, where one marking could be used to filter spatially and the second temporally or by different attributes. Subsequent pages were used to investigate the selected data.


Regional Integrated Transportation Information System (RITIS) (https://www.ritis.org/)