Part 1a of 2: Calculating Additional Statistics for Charts

Note: this code is optional. It creates the data needed for Part 2a.

Imports and Constants

Imports

This implementation of the Basic Safety Messages to Synthetic Trajectories algorithm uses:

  • pandas 0.23: to store the Basic Safety Messages in an easily searchable and filterable format
  • geopy 1.13.0: to calculate the distance in meters between each message's location and the Roadside Unit locations
In [1]:
import pandas as pd
import geopy.distance

For this example, the Socrata API is used to query and access Basic Safety Messages from the Advanced Messaging Concept Development Basic Safety Message dataset available on data.transportation.gov.

  • Note: The data source can be any local or online CSV source.
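As a rough sketch of how a query URL of this kind can be assembled, the snippet below percent-encodes a $where clause on time_received and appends a $limit. The helper name build_query_url is hypothetical and not part of the notebook, and the app token is deliberately left out.

```python
from urllib.parse import quote

# Dataset endpoint taken from the notebook's data_source string.
BASE_URL = "https://data.transportation.gov/resource/5b3h-czfm.csv"

def build_query_url(start_ms, end_ms, limit):
    """Hypothetical helper: build a CSV query URL for a time_received window."""
    where = "time_received between {} and {}".format(start_ms, end_ms)
    # quote() turns each space into %20 and leaves letters, digits and "_" untouched.
    return "{}?$where={}&$limit={}".format(BASE_URL, quote(where), limit)

url = build_query_url(1479310905000, 1479326400000, 700000)
print(url)
```

The resulting string can be passed to pd.read_csv in the same way as a local file path.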

Speed is converted from meters per second to feet per second to match the trajectory results. Time Received is divided by 100, which truncates the millisecond timestamps to tenths of a second. Basic Safety Messages are nominally broadcast ten times per second, so messages that share a truncated timestamp are treated as coming from different vehicles.

In [2]:
data_source = ("https://data.transportation.gov/resource/5b3h-czfm.csv?"
               "$where=time_received%20between%201479310905000%20and%201479326400000"
               "&$limit=700000&$$app_token=QL17HswS1IZjgfNJdj9k2ovzG")
col_to_use = ['time_received', 'latitude', 'longitude', 'speed', 'heading', 'elevation']

df_bsms = pd.read_csv(filepath_or_buffer = data_source, header = 0, skipinitialspace = True, usecols = col_to_use)

# Convert speed from meters per second to feet per second
df_bsms['speed'] = df_bsms['speed'] * 3.28084
# Truncate time_received from milliseconds to tenths of a second
df_bsms['time_received'] = df_bsms['time_received'] // 100
df_bsms.head()
Out[2]:
   elevation  heading   latitude  longitude      speed  time_received
0      140.4     0.00  38.924071 -77.237130   0.000000    14793109050
1      140.4     0.00  38.924071 -77.237130   0.000000    14793109050
2      158.3     0.00  38.916107 -77.227700   0.000000    14793109050
3      150.5     0.00  38.915815 -77.226548   0.000000    14793109050
4      149.7   308.85  38.914958 -77.225480  37.795277    14793109050
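The two conversions can be checked by hand on a few values. The sketch below assumes time_received is in milliseconds, as the division by 100 implies. The function names and sample inputs are hypothetical and chosen only for illustration.

```python
MPS_TO_FPS = 3.28084  # feet per meter, the factor used in the notebook

def to_feet_per_second(speed_mps):
    """Convert a speed from meters per second to feet per second."""
    return speed_mps * MPS_TO_FPS

def to_tenths_of_second(time_received_ms):
    """Truncate a millisecond timestamp to tenths of a second."""
    return time_received_ms // 100

# Hypothetical timestamps: the first two fall in the same tenth of a second,
# the third falls in the next one.
a = to_tenths_of_second(1479310905012)
b = to_tenths_of_second(1479310905087)
c = to_tenths_of_second(1479310905100)
print(a, b, c)

# Hypothetical speed of 11.52 m/s expressed in ft/s.
print(to_feet_per_second(11.52))
```

Messages whose timestamps collapse to the same truncated value are the ones counted together in the following steps.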

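The next cell groups the messages by Time Received and counts the messages at each timestamp. Because each message at a given timestamp is treated as a separate vehicle, the count is stored in a new Number of Vehicles column. The result is also written to VehicleCount.csv.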

In [3]:
df_grouped = pd.DataFrame({'Number of Vehicles' : df_bsms.groupby( ['time_received'] ).size()}).reset_index()
df_grouped.to_csv(path_or_buf = "VehicleCount.csv", index = False)
df_grouped.head()
Out[3]:
   time_received  Number of Vehicles
0    14793109050                   5
1    14793109051                   6
2    14793109052                   5
3    14793109053                   5
4    14793109054                   5
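To illustrate what the group-and-count step computes, here is a minimal pandas-free sketch using collections.Counter on a hypothetical list of truncated timestamps. It is not the notebook's method, only the same counting logic in plain Python.

```python
from collections import Counter

# Hypothetical truncated timestamps, one entry per message.
timestamps = [14793109050] * 5 + [14793109051] * 6

# Count how many messages share each timestamp.
counts = Counter(timestamps)

# Sorted (time_received, number_of_vehicles) rows, like the grouped frame.
rows = sorted(counts.items())
for time_received, number_of_vehicles in rows:
    print(time_received, number_of_vehicles)
```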

rsulocations holds the center points of the Roadside Units (RSUs) in the study. These were found in the metadata document attached to the AMCD Basic Safety Message dataset.

inRange uses the geopy library to determine the distance in meters between the message generation location and each RSU location. If the distance to any RSU is 300 meters or less, the message is designated as in range.

In [4]:
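# Center points of the six Roadside Units as (latitude, longitude)
rsulocations = [(38.930045,-77.24315),(38.928128,-77.241327),(38.923859,-77.236135),
                (38.920883,-77.234304),(38.918416,-77.230494),(38.915165,-77.226364)]

def inRange(rsus, bsm):
    # True if bsm (latitude, longitude) lies within 300 meters of any RSU.
    # Note: vincenty exists in geopy 1.13.0 but was removed in geopy 2.0 (use geodesic there).
    for rsu in rsus:
        if geopy.distance.vincenty(rsu, bsm).meters <= 300:
            return True
    return False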
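The range check can also be approximated without geopy. The sketch below swaps in the haversine great-circle formula on a spherical Earth in place of the ellipsoidal Vincenty distance, so results may differ slightly near the 300 meter boundary. The names haversine_m and in_range are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical assumption)

def haversine_m(p1, p2):
    """Great-circle distance in meters between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def in_range(rsus, bsm, max_distance_m=300.0):
    """True if bsm is within max_distance_m of at least one RSU."""
    return any(haversine_m(rsu, bsm) <= max_distance_m for rsu in rsus)

rsu = (38.923859, -77.236135)       # one RSU center point from the notebook
near_msg = (38.924071, -77.237130)  # location of the first message in Out[2]
far_msg = (39.0, -77.0)             # hypothetical point well outside the study area

print(in_range([rsu], near_msg), in_range([rsu], far_msg))
```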

The next cell adds an inrangeofrsu column to each message, then groups the data by time received and by whether the vehicle was in range. A pivot table displays the results with time received as the independent variable, and the flattened table is written to VehicleCountInRSURange.csv.

In [5]:
df_bsms['inrangeofrsu']  = df_bsms.apply(lambda x: inRange(rsulocations,(x['latitude'],x['longitude'])),axis=1)

df_grouped2 = pd.DataFrame({'Number of Vehicles' : df_bsms.groupby(['time_received', 'inrangeofrsu']).size()}).reset_index()
pt = pd.pivot_table(df_grouped2, values = 'Number of Vehicles', index='time_received', columns='inrangeofrsu', fill_value=0.0)
df_flat = pd.DataFrame(pt.to_records())
df_flat.to_csv(path_or_buf="VehicleCountInRSURange.csv", index = False)
df_flat.head()
Out[5]:
   time_received  False  True
0    14793109050      0     5
1    14793109051      1     5
2    14793109052      0     5
3    14793109053      0     5
4    14793109054      0     5
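The pivot step amounts to counting messages per timestamp and per in-range flag, with missing combinations filled with zero. A minimal pandas-free sketch of that logic on hypothetical (time_received, inrangeofrsu) pairs might look like this:

```python
from collections import defaultdict

# Hypothetical (time_received, inrangeofrsu) pairs, one per message.
messages = (
    [(14793109050, True)] * 5
    + [(14793109051, True)] * 5
    + [(14793109051, False)]
)

# Every timestamp starts with a zero count for both flags, which plays
# the role of fill_value in the pivot table.
table = defaultdict(lambda: {False: 0, True: 0})
for time_received, in_range_flag in messages:
    table[time_received][in_range_flag] += 1

# Flatten to (time_received, out_of_range_count, in_range_count) rows.
rows = [(t, table[t][False], table[t][True]) for t in sorted(table)]
for row in rows:
    print(row)
```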