This model performs dynamic, real-time, cost-based routing by computing the incurred risk cost of each route. Collision risk is quantified through a hierarchical modeling approach built on two successive CatBoost models. The pipeline produces a collision risk value that can be used to select optimal routes.
This project is sub-divided into three major sections:
1. Data Collection
2. Model Training
3. Model Application
All required packages can be installed with the following command.
pip install -r requirements.txt
This model can be run from the following notebook:
Dynamic_Risk_Modeling.ipynb
Every collision was mapped to the road segment on which it occurred. Road network data was obtained from TIGER/Line Shapefiles. This project concentrates on the state of Texas; a visualization of the segment network used for this project is shown below:
Collision data was obtained from the Texas Department of Transportation (TxDOT). The dataset contains over 1.4 million collision records spanning from the beginning of 2017 until the end of 2020.
This project developed a methodology to extract both historical and real-time weather data. Using this method, the collision-segment pairs described above were matched with local weather data recorded within five minutes of the event. The weather station network used for this project reports at five-minute intervals and is dense within the observation area. A sample of the local granularity of weather stations is shown below.
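The five-minute matching described above can be illustrated with a minimal sketch. It assumes the observations for the relevant station are already available as a time-sorted list; the function name `nearest_observation`, the `precip_mm` field, and the sample values are hypothetical and not taken from the project notebook. Choosing which station serves a given segment is not covered here.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_observation(obs_times, obs_values, event_time,
                        tolerance=timedelta(minutes=5)):
    """Return the observation closest in time to event_time.

    obs_times must be sorted ascending. Returns None when the closest
    observation is further away than `tolerance`.
    """
    i = bisect_left(obs_times, event_time)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(obs_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(obs_times[j] - event_time))
    if abs(obs_times[best] - event_time) > tolerance:
        return None
    return obs_values[best]


# Hypothetical station reporting every five minutes from 14:00 to 14:15.
start = datetime(2019, 6, 1, 14, 0)
obs_times = [start + timedelta(minutes=5 * k) for k in range(4)]
obs_values = [{"precip_mm": 0.0}, {"precip_mm": 0.4},
              {"precip_mm": 1.2}, {"precip_mm": 0.8}]

crash_time = datetime(2019, 6, 1, 14, 7)
matched = nearest_observation(obs_times, obs_values, crash_time)
unmatched = nearest_observation(obs_times, obs_values,
                                datetime(2019, 6, 1, 15, 0))
```

Returning `None` for events with no observation inside the window keeps unmatched collisions explicit, so they can be dropped or handled separately rather than silently paired with stale weather.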
Road segments in the non-crash dataset were sampled to match the distribution of total trips throughout the state of Texas. This ensures that segments are sampled with respect to traffic volume. To obtain this representative sample, each segment's probability of selection was weighted by its fraction of total annual average daily traffic (AADT). The figure below depicts the representation of major road classes after proportional sampling.
Dates and times were then paired with the selected segments according to seasonal, monthly, daily, and hourly variation in traffic. As a result, dates and times with higher traffic volume have a proportionally higher probability of selection. The plot below details the historical percentage distribution of traffic per month.
The plots below show the daily and hourly distributions of traffic volume.
After sampled segments were paired with dates and times, weather data for the respective time periods was merged into the dataset.
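As a rough illustration of AADT-weighted selection, the sketch below draws segments with probability proportional to their traffic volume. The segment identifiers and AADT values are invented for the example, and sampling with replacement is an assumption; the project notebook may sample differently.

```python
import random


def sample_segments(aadt_by_segment, n, seed=0):
    """Draw n segment ids with probability proportional to each segment's AADT.

    Sampling is with replacement (an assumption for this sketch).
    """
    rng = random.Random(seed)
    segment_ids = list(aadt_by_segment)
    weights = [aadt_by_segment[s] for s in segment_ids]
    # random.choices normalises the weights internally.
    return rng.choices(segment_ids, weights=weights, k=n)


# Hypothetical AADT values for three road segments.
aadt_by_segment = {"interstate_a": 90_000, "arterial_b": 9_000, "local_c": 1_000}

sample = sample_segments(aadt_by_segment, n=10_000)
shares = {s: sample.count(s) / len(sample) for s in aadt_by_segment}
```

With these weights, roughly 90% of the drawn segments should come from `interstate_a`, mirroring its share of total AADT.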
Model features were pruned using permutation feature importance. The 102 initial features were reduced to the following set of principal features.
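For readers unfamiliar with permutation feature importance, a minimal sketch of the idea follows: shuffle one feature column at a time and record how much the model's accuracy drops. The toy `predict` function, the synthetic data, and the 0.01 pruning threshold are assumptions for illustration and stand in for the trained CatBoost model and the project's actual cutoff.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in X]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column replaced.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data: the label depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(500)]
y = [row[0] > 0.5 for row in X]


def predict(row):
    # Hypothetical stand-in for a trained classifier.
    return row[0] > 0.5


scores = permutation_importance(predict, X, y)
# Keep features whose importance exceeds an illustrative threshold.
keep = [i for i, s in enumerate(scores) if s > 0.01]
```

Features whose shuffling leaves accuracy unchanged contribute nothing to the prediction and are candidates for pruning.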
The β Model was trained to predict crash severity given that a collision occurs. To prevent model bias, the distribution of crash severity classes was balanced.
Model features were pruned using permutation feature importance. The 102 features were reduced to the following set of principal features.
This model applies the Highway Safety Manual's comprehensive crash costs within our model pipeline to convert route options into incurred risk cost.
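The text does not say how the severity classes were balanced. One common option is to downsample every class to the size of the rarest one, sketched below; the function name, the `severity` key, and the class labels are hypothetical, and upsampling or class weights would be equally plausible alternatives.

```python
import random
from collections import Counter, defaultdict


def balance_by_downsampling(records, label_key, seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[label_key]].append(rec)
    target = min(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    return balanced


# Hypothetical, heavily imbalanced severity labels.
records = (
    [{"severity": "property_damage_only"}] * 80
    + [{"severity": "injury"}] * 15
    + [{"severity": "fatal"}] * 5
)

balanced = balance_by_downsampling(records, "severity")
counts = Counter(r["severity"] for r in balanced)
```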
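One way to read the conversion to incurred risk cost is as an expected value: the probability of a crash on a segment, times the severity-weighted crash cost given that a crash occurs, summed along the route. The sketch below follows that reading. It assumes the first model yields a per-segment crash probability and that segment costs add along a route; the dollar figures are placeholders, not the Highway Safety Manual values.

```python
def segment_risk_cost(p_crash, severity_probs, severity_costs):
    """Expected crash cost for one segment.

    p_crash: probability of a crash on the segment.
    severity_probs: P(severity | crash) for each severity class.
    severity_costs: cost assigned to each severity class.
    """
    cost_given_crash = sum(p * severity_costs[s] for s, p in severity_probs.items())
    return p_crash * cost_given_crash


def route_risk_cost(segments, severity_costs):
    """Sum expected crash cost over every segment of a route."""
    return sum(
        segment_risk_cost(seg["p_crash"], seg["severity_probs"], severity_costs)
        for seg in segments
    )


# Placeholder costs in dollars; NOT the Highway Safety Manual figures.
severity_costs = {"fatal": 1_000_000, "injury": 100_000, "pdo": 10_000}

# Two hypothetical route options between the same origin and destination.
routes = {
    "route_a": [
        {"p_crash": 0.001, "severity_probs": {"fatal": 0.0, "injury": 0.5, "pdo": 0.5}},
        {"p_crash": 0.002, "severity_probs": {"fatal": 0.0, "injury": 0.0, "pdo": 1.0}},
    ],
    "route_b": [
        {"p_crash": 0.0005, "severity_probs": {"fatal": 0.1, "injury": 0.4, "pdo": 0.5}},
    ],
}

costs = {name: route_risk_cost(segs, severity_costs) for name, segs in routes.items()}
lowest_risk_route = min(costs, key=costs.get)
```

In this toy case the single-segment route has the lower crash probability but the more severe outcomes, which shows why both model outputs are needed before routes can be compared.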
This model can be applied to any route within the state of Texas. For demonstration, the small city of Fredericksburg, Texas was selected for application.
For demonstration, each trip's incurred risk cost is calculated for randomly generated trips within the city of Fredericksburg, Texas.
The collision risk produced by these models can also be viewed at the network level.
The plots below detail the variation in risk across various time periods. The entire state of Texas and Dallas, Texas are plotted below, in that order.
Holding all other features constant allows the model to isolate the effect of a single feature. The plot below shows the variation in risk with the level of precipitation.
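The single-feature analysis described above amounts to sweeping one input while freezing the rest. A minimal sketch follows; `toy_risk` is a made-up stand-in for the trained pipeline, and the feature names `precip_mm` and `hour` are hypothetical.

```python
def sweep_feature(predict, baseline, feature, values):
    """Evaluate predict at each value of one feature, all others held fixed."""
    return [(v, predict({**baseline, feature: v})) for v in values]


def toy_risk(features):
    # Hypothetical stand-in for the trained risk model.
    night_factor = 1.2 if features["hour"] >= 22 else 1.0
    return 0.001 * (1 + 0.5 * features["precip_mm"]) * night_factor


# Fixed values for every feature except the one being swept.
baseline = {"precip_mm": 0.0, "hour": 14}

curve = sweep_feature(toy_risk, baseline, "precip_mm", [0.0, 1.0, 2.0, 4.0])
for precip, risk in curve:
    print(f"precip_mm={precip}: risk={risk:.4f}")
```

Because each call receives a fresh copy of the baseline, the fixed feature values are never mutated between evaluations.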