import pandas as pd
import geopy.distance
This example uses the Socrata API to query Basic Safety Messages (BSMs) from the Advanced Messaging Concept Development (AMCD) Basic Safety Message dataset on data.transportation.gov.
Speed is converted from meters per second to feet per second to match the trajectories results. Time Received, recorded in epoch milliseconds, is divided by 100 to truncate it to tenths of a second. On the assumption that a vehicle sends at most one message per tenth of a second, every message with the same truncated timestamp is then known to come from a different vehicle.
data_source = ("https://data.transportation.gov/resource/5b3h-czfm.csv?"
"$where=time_received%20between%201479310905000%20and%201479326400000"
"&$limit=700000&$$app_token=QL17HswS1IZjgfNJdj9k2ovzG")
col_to_use = ['time_received', 'latitude', 'longitude', 'speed', 'heading', 'elevation']
df_bsms = pd.read_csv(data_source, header=0, skipinitialspace=True, usecols=col_to_use)
# Convert speed from meters per second to feet per second.
df_bsms['speed'] = df_bsms['speed'] * 3.28084
# Truncate epoch milliseconds to tenths of a second.
df_bsms['time_received'] = df_bsms['time_received'] // 100
df_bsms.head()
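To make the two conversions concrete, here is a minimal sketch on a hand-made frame. The three rows are invented values, not records from the dataset, and the names `toy`, `speed_ftps`, and `time_tenths` are hypothetical.

```python
import pandas as pd

# Invented sample rows: epoch milliseconds and speed in meters per second.
toy = pd.DataFrame({
    "time_received": [1479310905012, 1479310905087, 1479310905140],
    "speed": [10.0, 12.5, 0.0],
})

M_TO_FT = 3.28084  # feet per meter, the same factor used above

# Vectorized unit conversion: meters per second to feet per second.
toy["speed_ftps"] = toy["speed"] * M_TO_FT

# Integer division by 100 drops the last two millisecond digits,
# leaving whole tenths of a second since the epoch.
toy["time_tenths"] = toy["time_received"] // 100

print(toy)
```

The first two rows fall in the same tenth of a second, so after truncation they share a timestamp and would be counted as two different vehicles.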
Groups the messages by Time Received and counts the messages at each timestamp. The counts are stored in a new Number of Vehicles column and the result is written to VehicleCount.csv.
df_grouped = pd.DataFrame({'Number of Vehicles' : df_bsms.groupby( ['time_received'] ).size()}).reset_index()
df_grouped.to_csv(path_or_buf = "VehicleCount.csv", index = False)
df_grouped.head()
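The counting step can be sketched on a small invented frame. This version uses `reset_index(name=...)` on the grouped sizes, which is one possible way to build the same two-column table; the timestamps are made up for illustration.

```python
import pandas as pd

# Invented truncated timestamps; each row stands for one message.
toy = pd.DataFrame({"time_received": [100, 100, 101, 103, 103, 103]})

# size() counts rows per group; reset_index(name=...) turns the
# resulting Series into a frame and names the count column.
counts = (
    toy.groupby("time_received")
       .size()
       .reset_index(name="Number of Vehicles")
)

print(counts)
```

Note that timestamp 102 does not appear in the result: a timestamp with no messages produces no row rather than a row with a count of zero.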
rsulocations holds the center points of the Roadside Units (RSUs) in the study, given as (latitude, longitude) pairs. These were found in the metadata document attached to the AMCD Basic Safety Message dataset.
inRange uses the geopy library to determine the distance in meters between the location where a message was generated and each RSU location. If the distance to any RSU is 300 meters or less, the message is designated as in range.
rsulocations = [(38.930045,-77.24315),(38.928128,-77.241327),(38.923859,-77.236135),(38.920883,-77.234304),(38.918416,-77.230494),(38.915165,-77.226364)]
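def inRange(rsus, bsm):
    # True if the message location is within 300 m of any RSU.
    # geodesic replaces vincenty, which geopy 2.0 removed.
    for rsu in rsus:
        if geopy.distance.geodesic(rsu, bsm).meters <= 300:
            return True
    return False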
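As a rough cross-check of the range test without geopy, the same idea can be sketched with the haversine formula from the standard library. This is a swapped-in technique, not the notebook's method: it treats the Earth as a sphere with an assumed mean radius, so distances can differ slightly from geopy's ellipsoidal result. The names `haversine_m` and `in_range_haversine` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius for the spherical model

def haversine_m(p1, p2):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def in_range_haversine(rsus, bsm, radius_m=300.0):
    """True if bsm lies within radius_m of at least one RSU."""
    return any(haversine_m(rsu, bsm) <= radius_m for rsu in rsus)

# The first two RSU center points from the list above.
rsu_a = (38.930045, -77.24315)
rsu_b = (38.928128, -77.241327)

print(haversine_m(rsu_a, rsu_b))                   # distance between the two RSUs
print(in_range_haversine([rsu_a], rsu_a))          # a point at the RSU itself
print(in_range_haversine([rsu_a], (39.0, -77.0)))  # a point several kilometers away
```

Using `any` with a generator stops at the first RSU that is close enough, which mirrors the early `return True` in inRange.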
Adds the column inrangeofrsu to each message, then groups the data by time received and by whether the vehicle was in range. A pivot table displays the results with time received as the independent variable and one column for each value of inrangeofrsu.
df_bsms['inrangeofrsu'] = df_bsms.apply(lambda x: inRange(rsulocations,(x['latitude'],x['longitude'])),axis=1)
df_grouped2 = pd.DataFrame({'Number of Vehicles' : df_bsms.groupby(['time_received', 'inrangeofrsu']).size()}).reset_index()
pt = pd.pivot_table(df_grouped2, values = 'Number of Vehicles', index='time_received', columns='inrangeofrsu', fill_value=0.0)
df_flat = pd.DataFrame(pt.to_records())
df_flat.to_csv(path_or_buf="VehicleCountInRSURange.csv", index = False)
df_flat.head()
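The reshaping from grouped counts to one column per in-range flag can be sketched on invented data. The rows below are made up, and `toy`, `counts`, and `wide` are hypothetical names; the point is to show what `fill_value` does when a timestamp has no messages for one of the two flags.

```python
import pandas as pd

# Invented messages: truncated timestamp plus an in-range flag.
toy = pd.DataFrame({
    "time_received": [100, 100, 100, 101, 102, 102],
    "inrangeofrsu": [True, True, False, False, True, True],
})

# Long format: one row per (timestamp, flag) pair that actually occurs.
counts = (
    toy.groupby(["time_received", "inrangeofrsu"])
       .size()
       .reset_index(name="Number of Vehicles")
)

# Wide format: one row per timestamp, one column per flag value.
# Pairs that never occur would be NaN without fill_value.
wide = counts.pivot_table(
    values="Number of Vehicles",
    index="time_received",
    columns="inrangeofrsu",
    fill_value=0,
)

print(wide)
```

Timestamp 101 has no in-range messages and timestamp 102 has no out-of-range messages, so those cells are filled with zero instead of being left missing.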