Traffic data has grown in volume and diversity
A decade ago, it was hard to access timely, accurate traffic data to support applications and services such as traffic management, mapping and navigation, safety and emergency management, urban planning for smart cities, and location intelligence. Today, traffic data is much more widely available through public datasets and commercial sources. Data on the movement of people and vehicles is fueling emerging machine learning algorithms and commercial analyses. In this blog post, we’ll discuss the pros and cons of the most common sources of traffic data.
Many municipalities have deployed closed-circuit television (CCTV) systems for surveillance purposes. The same cameras can also be used to measure traffic.
- For municipalities that have already deployed video analytics systems, there is no extra data collection effort.
- 24/7/365 coverage
- Cost: This option relies on a small number of very expensive sensors. Municipalities must maintain their camera networks, and analyzing the footage can be labor-intensive.
- Consent: Drivers and passengers have not provided direct consent to being recorded. License plate data is captured by these solutions and may be considered personal information. Driver and passenger faces might also be captured. Today, redaction of this personal information is time- and labor-intensive.
- Limited access: Municipalities generally cannot legally share this data without consent unless costly de-identification techniques are applied. These techniques typically reduce the usability of the data.
- Limited coverage: Information is only available where cameras have been placed.
- Accuracy: Cameras may lose accuracy in heavy storms or other extreme conditions.
- Limited data richness: It’s difficult to get any information beyond speed and location.
In-roadway traffic flow sensors
There are several types of in-roadway sensors that can be placed on roadways or within the pavement. They are another useful traffic data source.
- For municipalities that have already invested in sensors, this is an existing data source upon which they can capitalize.
- Few privacy concerns: Sensors can detect vehicles passing over them but do not collect any information about those vehicles.
- 24/7/365 coverage
- Cost: Sensors are expensive to place within roads and maintain over time.
- Limited access: It may be difficult for private companies to access data owned by municipalities.
- Limited coverage: Information is only available in locations where sensors have been placed.
- Limited data richness: It’s difficult to get any information beyond speed, location, and possibly weight. In addition, sensors can’t recognize the same vehicle across multiple sensor locations, so they don’t support continuous analysis of trips. Trip-level analysis adds a deeper understanding of traffic anomalies and also helps identify errors in the data.
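To make the "detects vehicles but knows nothing about them" point concrete, here is a minimal sketch of what an anonymous detection feed supports. It assumes each detection arrives as a bare Unix timestamp; the function name and the 15-minute interval are our own illustrative choices, not part of any particular sensor product.

```python
from collections import Counter

def flow_per_interval(detections, interval_s=900):
    """Convert anonymous detection timestamps into vehicles per hour.

    detections: iterable of Unix timestamps (seconds), one per vehicle pass.
    interval_s: bin width in seconds (900 s = 15 minutes, an assumed default).
    Returns {interval_start: vehicles_per_hour}.
    """
    # Assign each detection to the start of the interval it falls in.
    counts = Counter((ts // interval_s) * interval_s for ts in detections)
    # Scale the raw count in each bin to an hourly rate.
    return {start: n * 3600 / interval_s for start, n in sorted(counts.items())}

print(flow_per_interval([0, 10, 899, 900, 1000]))
```

Counts and rates like these are about as far as the data goes. Nothing in the feed links a detection at one location to a detection at another.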
The RFID transponder devices that consumers place in their cars have the ability to transmit data that includes a unique identifier (the toll card number) and location (captured by an RFID reader).
- Low-cost solution: In geographic areas with bridges and toll roads, many consumers place these devices in their cars.
- Consumer consent: It’s practical to require that consumers consent to data collection during the device registration process.
- Privacy concerns: Even if consent is in the fine print, consumers do not expect that they’re being tracked when they’re not on toll roads. In the U.S., the American Civil Liberties Union (ACLU) has raised concerns about this data collection practice in New York State.
- Dispersed datasets with limited geographic coverage: In the United States in particular, toll authorities are decentralized state or local agencies with limited geographic reach. Other countries such as the United Kingdom also operate toll authorities at a local level.
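Because each transponder carries a unique identifier, reads from two RFID readers can be matched to estimate travel time between them. The sketch below is one way to do that under our own assumptions: each read is a `(tag_id, unix_seconds)` pair, and the function names and salt value are hypothetical. Note that salted hashing only pseudonymizes the tag number. It does not make the data anonymous.

```python
import hashlib
from statistics import median

def pseudonymize(tag_id, salt):
    """Replace the raw tag number with a salted hash (pseudonymization only)."""
    return hashlib.sha256((salt + tag_id).encode("utf-8")).hexdigest()[:16]

def travel_times(reads_a, reads_b, salt="example-salt"):
    """Match tags seen at upstream reader A and downstream reader B.

    reads_a, reads_b: iterables of (tag_id, unix_seconds) pairs (assumed format).
    Returns travel times in seconds for tags seen at both readers.
    """
    # Keep the earliest time each tag was seen at reader A.
    first_seen_a = {}
    for tag, ts in reads_a:
        key = pseudonymize(tag, salt)
        first_seen_a[key] = min(ts, first_seen_a.get(key, ts))

    times = []
    for tag, ts in reads_b:
        start = first_seen_a.get(pseudonymize(tag, salt))
        # Only count reads at B that happen after the read at A.
        if start is not None and ts > start:
            times.append(ts - start)
    return times

reads_a = [("T1", 0), ("T2", 10), ("T3", 20)]
reads_b = [("T1", 300), ("T2", 370), ("T9", 400)]
print(median(travel_times(reads_a, reads_b)))
```

The same matching that makes this data useful is what raises the privacy concerns listed above, since it reveals where a specific device has been.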
Researchers have long used consumer surveys — including intercept, online, and other methods — to understand movement in a particular area.
- Researchers can record movement while asking qualitative questions that provide context (e.g., commuting versus pleasure) and help them understand demographic profiles.
- Consumers have consented to sharing their movements.
- Cost: Consumer surveys are very expensive to field.
- Memory bias: Consumers must rely on memory to report on their movements.
- Limited sampling: Datasets are very small (rarely over 1,000 responses).
- Limited time series: Surveys represent only one point in time.
Mobile phone data (floating cellular data)
Data from mobile phones with location tracking switched on can be used in aggregate as a traffic probe. This data is also known as floating cellular data.
- Low-cost or no-cost solution: Public datasets are available.
- No practical limits on geographic reach: While other datasets may be limited to specific geographic areas, phones go everywhere.
- “Noise” in data: It can be difficult to distinguish between people walking, cycling, driving, or riding trains, so floating cellular data requires complex algorithms to generate adequate data quality.
- Privacy concerns: Privacy advocates and media organizations, including the New York Times, are raising concerns about the consent processes that have allowed thousands of mobile apps to collect location data. Tech companies have responded. For example, starting with Android 10, apps must obtain explicit user permission to collect location information when the app is not in use. Apple iOS 13 provides users with a map view of how apps have been using their location data in the background. As consumers become more aware of how their data is being used, fewer may consent to the levels of data collection that have occurred in the past.
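To illustrate the "noise" problem described above, here is a deliberately naive sketch that guesses a travel mode from the median speed between consecutive location pings. The speed thresholds and function names are our own assumptions for illustration. Real systems need far more than this, which is exactly the point: a car crawling through congestion would be labeled "walking" here.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Assumed, illustrative thresholds in km/h (not taken from any standard).
WALK_MAX_KMH = 7
CYCLE_MAX_KMH = 25

def guess_mode(pings):
    """pings: list of (unix_seconds, lat, lon) tuples sorted by time."""
    speeds = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(pings, pings[1:]):
        dt = t2 - t1
        if dt <= 0:
            continue  # skip duplicate or out-of-order pings
        speeds.append(haversine_m(la1, lo1, la2, lo2) / dt * 3.6)  # m/s to km/h
    if not speeds:
        return "unknown"
    speeds.sort()
    typical = speeds[len(speeds) // 2]  # (upper) median speed
    if typical <= WALK_MAX_KMH:
        return "walking"
    if typical <= CYCLE_MAX_KMH:
        return "cycling"
    return "motorized"

# One ping per minute, moving north by 0.01 degrees of latitude each time.
print(guess_mode([(0, 40.00, -74.0), (60, 40.01, -74.0), (120, 40.02, -74.0)]))
```

Speed alone also cannot separate a car from a train or a bus, so production pipelines typically bring in additional signals such as the road and rail network.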
Connected car data
As more vehicles become connected via built-in telematics and onboard devices, connected cars have become a very useful and direct source of traffic data. All other data sources rely on assumptions about how vehicles are behaving. With connected cars, data can be collected directly from the vehicle’s electronic control units (ECUs) or Controller Area Networks (CANs).
- Low cost: Essentially, each connected car is operating as a low-cost sensor, with no incremental installation or maintenance costs.
- Data richness: Connected car data offers more predictive value with richer data points beyond speed and location. These may include braking, air bag deployment, ambient temperature, windshield wiper operation, and more. Connected cars can also provide visibility into environmental events (e.g., road signs). Therefore, connected car data provides a more complete view of a geographic area and can support more complex use cases.
- De-identification: As data passes through an OEM’s data centers, it can be stripped of identifiable information, such as the VIN. Location information may also be considered to be personally identifiable, so data must be aggregated as well.
- Consumer consent: OEMs have the opportunity to educate buyers and obtain consent for data collection during the vehicle purchase process. Many commercial fleets already obtain employee consent for vehicle tracking and behavior monitoring.
- Data availability: Today, not every car is connected, and much of the data is not yet available. Select OEMs and fleets have partnered with companies like Otonomo to find ways to utilize connected car data and put business models in place to support these use cases. We expect this data source to grow significantly in 2020 and beyond.
- Data heterogeneity: There is no single standard format for car data. Depending on how you are accessing car data, you may have a hefty amount of work to do around data cleansing and normalization. (This is a significant value-add provided by the Otonomo Platform.)
- Privacy concerns: As with other data sources, consumers may not be paying attention to the policies through which OEMs are collecting connected car data.
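The de-identification point above (strip the VIN, then aggregate locations) can be sketched in a few lines. This is an illustration under our own assumptions, not a description of how any OEM or platform actually does it: the record keys (`vin`, `lat`, `lon`, `speed_kmh`), the grid cell size, and the minimum vehicle count are all hypothetical.

```python
import math
from collections import defaultdict

def deidentify_and_aggregate(records, cell_deg=0.01, min_vehicles=3):
    """Drop VINs and report mean speed per grid cell.

    records: list of dicts with assumed keys 'vin', 'lat', 'lon', 'speed_kmh'.
    cell_deg: grid cell size in degrees (assumed value).
    min_vehicles: cells with fewer distinct vehicles are suppressed.
    """
    cells = defaultdict(lambda: {"vins": set(), "speeds": []})
    for rec in records:
        # Snap the exact position to a coarse grid cell.
        cell = (math.floor(rec["lat"] / cell_deg), math.floor(rec["lon"] / cell_deg))
        cells[cell]["vins"].add(rec["vin"])  # used only for counting, never output
        cells[cell]["speeds"].append(rec["speed_kmh"])

    out = []
    for (i, j), data in sorted(cells.items()):
        if len(data["vins"]) < min_vehicles:
            continue  # too few vehicles: a single car could be singled out
        out.append({
            "cell_lat": round(i * cell_deg, 6),  # south-west corner of the cell
            "cell_lon": round(j * cell_deg, 6),
            "vehicle_count": len(data["vins"]),
            "mean_speed_kmh": sum(data["speeds"]) / len(data["speeds"]),
        })
    return out

sample = [
    {"vin": "VIN-A", "lat": 40.7128, "lon": -74.0060, "speed_kmh": 30},
    {"vin": "VIN-B", "lat": 40.7131, "lon": -74.0055, "speed_kmh": 40},
    {"vin": "VIN-C", "lat": 40.7150, "lon": -74.0070, "speed_kmh": 50},
    {"vin": "VIN-D", "lat": 41.0050, "lon": -73.0050, "speed_kmh": 90},
]
print(deidentify_and_aggregate(sample))
```

The trade-off is the one noted earlier for camera data: coarser cells and higher minimum counts protect privacy better but reduce how useful the output is.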
Traffic data has many potential uses that provide tangible benefits to individuals, municipalities, and the public as a whole. There are upsides and downsides to every potential data source, so it’s important that you do your homework if you’re planning to use traffic data for your application. We’ll be sharing more thoughts on what to look for in the coming months!