Statistical Analysis of Fire Incidents in the City of Greensboro, NC

Abstract

The research in this report examines twelve years of reported fire incidents in Greensboro, NC to further analyze how fire service response time is affected as well as how fire service response time affects the rate of total loss in household fires. Additionally, this report will provide a variety of forecasting models in order to predict the future in regards to reported fire incidents. To produce sufficient results a variety of statistical analysis approaches have been accomplished. We have visualized the data (univariate , bivariate, time series), forecasted the data (TBATS, ARIMA, Holtwinters), and built linear regression and random forest models in order to fully conquer the research questions at task. In result, we were able to identify what factors hold most significance in relation to response time as well as depict to what extent response time affects total loss. In addition we produced multiple forecasting models in order to accurately predict the future of reported fire incidents in Greensboro, NC.

Introduction

Many calls are made to fire departments all the time. Whether it is a false alarm or not, it is important for first responders to take care of fire incidents as soon as possible. Therefore it is important to look at data about different fire incidents and see how the response time can be improved for all kinds of situations.

There are all kinds of fire incidents, some not even about fires. For example someone could be having a stroke and the local fire department was called to take action on the situation. If the incident is a fire, it could be at all kinds of different properties. Some different properties are small houses, apartment buildings, and highways. The fire could vary in size, like the wildfires in California.

No matter the type of fire incident. It is vital that first responders attend to the situation quickly. Now, there can be some factors that can affect the response time of the first responders. Some of those factors are the location, the type of property, and the type of the incident. From this project we want to explore which factors from the data have a significant affect on the response time of the fire department. By learning about these factors, the fire department could use this information to work on decreasing their response time.

For more than ten years, the city of Greensboro, NC has been collecting data on fire incidents in their area. Some of this data includes the date of the incident, the response time, the location, and the type of property. Yet, take note when looking at the data that it goes from July 1, 2010, to May 31, 2022. So for the years 2010 and 2022, only data for about half of those years was collected.

For this project we will be visualizing the fire incidents in Greensboro, NC. Our research questions are 1) What factors affect the response time to fire incidents? 2) How to improve the response time? 3) How does response time by fire emergency services affect the rate of total loss and civilian injuries? 4)Can we build models to forecast the number of fire incidents in certain areas of Greensboro based on the data from 2010 to 2022? 5) What Dynamic models can we make for forecasting fire incidents?

Data, Methods, and Analysis

Data

For this analysis, we utilized public data from the Greensboro Fire Department. This data set accounts for all reported fire incidents in the city of Greensboro and updates itself continuously. Although, we noticed there were some missing values in the data. So we did a map to check the percentage of missing values and it ended being about 14 percent for the whole data set. To further cater towards our research we filtered the data set in two areas; the date (from July 1, 2010 to May 31st, 2022) and the Property Use (“1-2 family dwelling”). In addition to this we converted ResponseTime to seconds. The initial data set can be found at this website link:https://data.greensboro-nc.gov/datasets/greensboro-fire-incidents/explore

Methods

Exploratory Data Analysis

The first step in our research project was exploratory data analysis. During this process we investigated the data looking for patterns and outliers. This was carried out by making various graphical representations of the data.

Univariate Explorations

Data Visualizations consisting of one variable

For the histogram above, we learned that the typical response time falls between about 100 seconds to 400 seconds and that the distribution is normal.

From the histogram above, we can visually see the distribution of the amount of staff members reported towards a given incident. It is important to note that the Y-axis on this graph has been scaled. Overall, the great majority of total stall on incident is 4.

For the graph above, we can identify that there are not many occasions in which civilians have reported injured. There has only been two incidents in which two or more civilian injuries occurred. It is important to note that the Y-axis on this graph has been scaled. Overall, the great majority of civilian injuries is 0.

For the graph above, we can identify that there has only been a couple incidents in which fatalities have been reported in regards to civilians. It is important to note that the Y-axis on this graph has been scaled. Overall, the great majority of civilian fatalities is 0.

For the graph above, we can identify that there has only been a handful of incidents in which injuries have been reported in regards to fire service respondents. It is important to note that the Y-axis on this graph has been scaled. Overall, the great majority of fire service injuries is 0.

The histogram above displays the amount of property loss across the amount of incidents reported.

The histogram above displays the amount of total loss (property + content) across the amount of incidents reported.

From this bar graph, it can be seen that the frequency of calls to the fire stations are less during the middle of the night and increase as people start waking up.

For this graph, we wanted to find out which fire stations got the most calls. Note that the calls counted are from July 1, 2010 to Jun 1, 2022. From the results we can tell that the top five fire stations that got the most calls over ten years in Greensboro NC, was 7, 8, 14, 4, and 56.

For this bar plot, we wanted to find out which fire districts got the most calls in Greensboro, NC over ten years. The fire districts consist of a few fire stations. The most popular fire districts from this bar plot from greatest to least were Battalion II, Battalion I and Battalion III.

From this graph, we wanted to learn if there were some years that had a significant difference in number of phone calls to the fire stations in Greensboro, NC. By looking at those differences we could explore how response time varied between years that were busier than others. Note from this graph, that only half of the data was collected for the years 2010 and 2022, so those bars will be obviously lower than the bars of other years displayed in the graph.

This visual was created to look at if the number of calls to the fire station would vary depending on which shift was currently working. For this data, there were three shifts, A, B, and C. From the results, it can be seen that there is no obvious difference in the frequency of the calls to the fire station on different work shifts.

From this bar graph, it can be seen that the frequency of calls to the fire station over ten years ended up being about the same between each day of the week with no obvious differences.

From the bar plot above, it is shown that the first week and the last week of the year typically had fewer calls to the fire station compared to other weeks in the year. This graph had a little more variability in frequency in calls. For example, the dip in the middle of the bar plot. This plot helps us to look at the busier times in the data and compare the response times.

This visual was used to look further into if there was a higher frequency of calls to fire stations in certain parts of the year. From the results, June and February seem to have the lowest frequency of calls, while January and December have the highest frequency of calls. With this knowledge we can look at how response time varied with those months.

Bivariate Explorations

Data Visualizations of a given variable vs response time

From this box plot, we wanted to see if certain weeks would have notable differences in response time. Looking at the results, there does not seem to be much variation. Therefore, the weeks in the year might not have a significant impact on response time.

The box plot above shows that there is a lot of variability in response time depending on which station responded to the call. By looking at these results, we can explore and question why different fire stations have such big contrasts in response time.

For this graph above, we looked at nature code and how it affects response time. Nature code represents the nature of the fire incident, whether it was a fire or someone having a stroke. As shown by the graph, there is a lot of variability in response time depending on the nature of the fire incident.

From this scatter plot, we wanted to see if the number of staff on incident would affect the response time of the fire responders. The outcome of the graph shows that there is no significant change in response time when having more people on shift. The response time still falls between 100 and 600 seconds, no matter the number of responders.

From the scatter plot above, we wanted to visualize the relationship between response time and property loss. Through this we were able to identify a very slight correlation between the two. The majority of the higher property loss values fall between 100 and 400 seconds

From the scatter plot above, we wanted to visualize the relationship between response time and total loss. Through this we were able to identify a very slight correlation between the two. Similar to property loss, the majority of the higher total loss values fall between 100 and 400 seconds.

Below, the box and violin plots had similar results to the bar graphs presented earlier. Each graph will summarize briefly what have seen before.

For this violin and box plot, it can be seen that there is very little difference in response time between the days of the week.

From the plot above there is a slight increase in the average response time after the year 2013 and it continues to stay the same or slightly increase more.

For this graph it can be seen that different work shifts did not affect the response time.

As shown by the graph, the response time varies depending on which fire district responded to the fire incident.

For this box and violin plot, we learned that the time of month in the year has very little effect on the response time.

From this visual, we learned that alarm hour has a slight effect on response time, as shown by the dips in the plot.

For the graph above, there would sometimes be a couple alarms set off for a fire incident, but it did not affect the response time whether the alarm was once or twice.

For the graph above, we can see the how each level of civilian injury various in relations to response time.

For the graph above, we can see the how each level of civilian fatality various in relations to response time.

For the graph above, we can see the how each level of fire service injury various in relations to response time.

Time Series Visualization

This section contains the visualization of all the data of the fire incidents in Greensboro, NC from July 1, 2010 to May 31, 2022.

Daily Time series

Above is a time series plot of the daily number of fire incidents from July 1, 2010 to June 1st 2022. As shown by the plot, there were big spikes around 2014, 2018, and 2021.

To take a deeper look into the time series below is the daily time series graphs for the years 2010 to 2022.

Above is the time series of the daily fire incidents broken into plots by year. From these graphs we have a clearer view of peaks for each year. So from the results, we can see that there were major peeks around April 2011, March 2014, October 2018, and February 2021.

Monthly Time Series

Above is a time series graph depicting the number of monthly fire incidents from July 1, 2010 to May 31 2022. The graph shows an upward trend but with drastic drops occurring at dates like November of 2019 and November 2021.

Results

Time Series Forecasting

Daily number of fire incidents from 2010 to 2022

In order to forecast the data we had to convert it into time series objects with the ts function in R studio.

Before forecasting we checked the data for stationarity. Meaning that we are checking that the time series has a constant mean and variance over time.

To check if the data needs any first difference calculations then we used the ndiffs function. Also to check if there needs to be any seasonal difference calculations then we used the nsdiffs function.

#determine the number of differences required for time series to be made stationary
ndiffs(gso.fire.ts.2) #1

## [1] 1

nsdiffs(gso.fire.ts.2) #0

## [1] 0

From the results, the data needs one difference, so we used the diff function to do that.

Using the acf function we check that the data is stationary after we had done the first difference for the data.

#attempt at differencing
gso.fire.ts.2.dif = diff(gso.fire.ts.2)
#checking stationarity of first difference
acf(gso.fire.ts.2.dif)

Using the acf function we made sure that it did not take many lags to hit 0. From the results, the first difference made our data stationary.

The plot above shows the daily time series graph before and after the first difference was calculated.

After checking for stationarity we could begin forecasting.

##                     ME     RMSE      MAE MPE MAPE      MASE        ACF1
## Training set -0.112468 6.553429 4.981169 NaN  Inf 0.4296495 0.001337923

We used the Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components (TBATS model) because it is a popular model that tends to perform well at forecasting time series.

For the plot above, we used the whole data set, from July 1, 2010 to May 31, 2022. We planned to forecast the fire incidents from June 1, 2010 to December 31, 2022. From the plot we can see that the TBATS was doing well, it seemed to predict a slow big dip that would eventually start going up again before 2022 ended. The dip though seems to go below the average at one point.

##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set 0.07291525 6.507242 4.929365 -4.066674 16.8887 0.6931919
##                       ACF1
## Training set -0.0002287735

We also chose the Autoregressive Integrated Moving Average (ARIMA) model because it has been known to perform competently. Compared to the TBATS model, the ARIMA automatically calculates the differences, so those don’t have to be done manually.

For the plot above, there was a slight increase briefly and then after that the forecast created a horizontal line. The horizontal line appears to be close to the average line of the data.

##                      ME     RMSE      MAE MPE MAPE      MASE        ACF1
## Training set -0.1338178 6.518173 4.943409 NaN  Inf 0.4280165 0.001781049

In the plot above we can see that the TBATS model(light red) did not follow the true number of fire incidents(light blue). This appears to be because the data end three months earlier at March 1, 2022 to see how the predictions would appear. So around that time period there were some drastic drops in number of fire incidents which led to the TBATS model forecasting such a severe drop in incidents.

##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.0767143 6.502043 4.919923 -4.048972 16.86875 0.6907802
##                       ACF1
## Training set -0.0001927383

From the plot above, there was an increase briefly and then the forecast created a horizontal line. Plus, the horizontal line appears to be close to the average line of the data. So when ended the data set three months earlier the ARIMA model did better than the TBATS for predicting number of future fire incidents.

Monthly number of fire incidents from 2010 to 2022

We again converted the data into a time series object us the ts function in order to use the functions for forecasting the time series data.

Then we checked the monthly data for stationarity.

We used the ndiffs and nsdiffs functions for checking if a first difference or seasonal difference had to be calculated

ndiffs(gso.fire.ts.2) #1

## [1] 1

nsdiffs(gso.fire.ts.2) #0

## [1] 0

So, our data needs one first difference and that is done with the diff function. After that, we used the acf function to make sure that the difference transformation made our data stationary.

Using the acf function again we made sure that it did not take many lags to hit 0. From the results, the first difference made our data stationary.

The plot above shows the monthly time series graph before and after the first difference was calculated.

Now that the monthly data is stationary we can being forecasting.

##                     ME     RMSE      MAE  MPE MAPE      MASE        ACF1
## Training set -4.171551 56.04023 43.91867 -Inf  Inf 0.3824204 -0.02345482

From the time series plot above, the TBATS model (red line) shows an increase for the future number of fire incidents. The line appears to fit fairly well with the trend of the data.

##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 6.008386 66.80605 49.59427 0.2636763 5.173699 0.6704695 -0.0142654

In the plot above, it can be seen that the ARIMA model was producing similar spikes compared to the TBATS model and had done a fairly good job.

##                     ME     RMSE      MAE  MPE MAPE      MASE       ACF1
## Training set -5.423912 81.16663 60.75395 -Inf  Inf 0.7440428 -0.5099868

As seen above, the HoltWinters model predicted much larger spikes upwards compared to the TBATS model. As a result, with such a drastic increase it is likely it would be less accurate than the TBATS model for this data.

Multi-Step Forecasting

##                     ME    RMSE      MAE  MPE MAPE      MASE         ACF1
## Training set -0.593745 7.71444 5.982271 -Inf  Inf 0.4312064 -0.005924354

For the plot above, the data was split into a training set and a test set. The training set was from July 1, 2010 to December 31, 2022. The test set was from January 1, 2022 to May 31, 2022. The results show that the TBATS model did fairly well, but it did not follow the big dip in the true data.

From the accuracy function, we can see that the errors of the TBATS model were small. The root mean square error was only 7.71.

##                         ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -2.836533e-13 7.793247 5.969387 -6.113375 19.90425 0.7813334
##                   ACF1
## Training set 0.2085914

From the visual above, the data was again split into a training set and a test set. The training set was from July 1, 2010 to December 31, 2022. The test set was from January 1, 2022 to May 31, 2022. With the results of th plot and the accuracy function, the ARIMA model did fairly and had a root mean square error of only 7.79.

##       ARIMA       TBATS Combination 
##    8.853813   19.038254    9.176881

As can be seen above, the data was split into training and test sets with the same dates as the two previous plots. From the results it can be seen that the TBATS model did not do that well and predicted a bigger decrease than what actually happened with the actual data.

From the accuracy function, we can see that the ARIMA and Combination models did better

##                     ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
## Training set 0.7417519 170.9214 133.6995 123.8174 123.8174 0.5040507 -0.4291488

For this plot, it can be seen that the TBATS model did not expect such a big dip in the true data but it still predicted values that appear to fit well with the plot.

From the accuracy function, we had a root mean squared error of 170.92, which is pretty high but that is because of the big dip in the true data that the TBATS model was not able to predict.

##                        ME     RMSE   MAE       MPE     MAPE      MASE
## Training set 1.364256e-13 114.6902 93.28 -1.371908 9.592816 0.6410997
##                    ACF1
## Training set -0.3627437

As you can see, the ARIMA model had the same issue as the TBATS model with the big dip in the true data.

The root mean squared error for this model ended up being 114.69, which is pretty high.

##                 ME     RMSE      MAE       MPE     MAPE       ACF1 Theil's U
## Test set -27.91337 134.5553 99.61927 -4.403427 10.79837 -0.4022097 0.6941563

For the model above it can been seen that the HoltWinters method was also not able to predict the big dip in the true data, but what it did predict followed the trend of the data nicely.

From the accuracy function, the root mean squared error was 134.55, which is significantly high.

##       ARIMA       TBATS          HW Combination 
##   110.67193    79.13765    55.57372    76.41376

As shown by the plot above, the models were unable to predict the big dip in the true data that happened in 2020. But all the models followed the trend of the data fairly well.

From the accuracy function the root mean squared error for ARIMA was 110.67, TBATS’ was 79.14, HoltWinters’ was 55.57, and the Combination model’s was 76.41. All were fairly high because of the huge drop in the true data.

Modeling of Response Time

In this section the results of the predictive modeling of fire incidents is displayed.

There were two types of modeling done, linear regression and random forest. With this, we can evaluate the main factors that affect response time for fire incidents.

Linear Regression

In order to successfully depict the affect of each given variable in relation to response time the first model we sought was a Linear Regression model. With this approach we are able to identify which surrounding variables from our original data set directly impact our independent variable which in this case is fire service response time. First we filtered our cutoff point in regards to response time and ran a linear regression model with all given variables that we identified to have a direct impact on response time. Then we used two different variable selection methods to finalize the surrounding dependent variables being used in the Linear Regression model Once we completed the variable selection process we were able to identify that the variables holding significance in correlation to response time were TotalStaffOnIncident, FireDistrict, DayOfWeek, shift, Month, AlarmHour, and NatureCode. When originally running this test a couple of our dependent variables had processed as one big character component not giving the model its true ability to compare all given components within that given variable. To solve this we mutated these variables as factors in order to unravel the different contents within each variables allowing the model to accurately compare each given variable to fire service response time. In return we were able to mathematically see the significance of each variable and its impact on response time as a whole.

First we created a linear regression model on all given variables that directly impact response time.

## 
## Call:
## lm(formula = response_time_seconds ~ TotalStaffOnIncident + FireDistrict + 
##     DayOfWeek + shift + Month + AlarmTime + NatureCode, data = gso.fire.filtered)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -340.43  -47.60   -6.53   40.29  383.81 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           408.9655    45.3873   9.011  < 2e-16 ***
## TotalStaffOnIncident                    0.9030     0.1409   6.410 1.46e-10 ***
## FireDistrictBattalion I              -147.1994    14.6499 -10.048  < 2e-16 ***
## FireDistrictBattalion II             -168.3688    14.6493 -11.493  < 2e-16 ***
## FireDistrictBattalion III            -136.0269    14.6590  -9.279  < 2e-16 ***
## FireDistrictBattalion IV             -135.2603    14.6599  -9.227  < 2e-16 ***
## FireDistrictBattalion V              -140.2406    14.6571  -9.568  < 2e-16 ***
## FireDistrictColfax                     43.1908    36.3430   1.188 0.234670    
## FireDistrictFire District 13          -75.2848    15.9148  -4.731 2.24e-06 ***
## FireDistrictFire District 14          -72.6721    14.7392  -4.931 8.21e-07 ***
## FireDistrictGuilford  College        -158.4621    15.6616 -10.118  < 2e-16 ***
## FireDistrictHigh Point Fire District -163.8343    42.3830  -3.866 0.000111 ***
## FireDistrictMcLeansville             -116.1506    17.8435  -6.509 7.57e-11 ***
## FireDistrictNortheast Fire            197.9903    75.8371   2.611 0.009036 ** 
## FireDistrictOak Ridge Fire Dept       -48.9577    33.7024  -1.453 0.146324    
## FireDistrictOther Protection Agency    82.9478    75.8323   1.094 0.274031    
## FireDistrictPinecroft Sedgefield      -88.8137    17.3627  -5.115 3.14e-07 ***
## FireDistrictPleasant Garden            -3.0358    22.8157  -0.133 0.894149    
## FireDistrictPTI Airport               -70.8423    75.8393  -0.934 0.350248    
## FireDistrictSummerfield                27.2998    21.2993   1.282 0.199942    
## FireDistrictWhitsett Fire District     62.2344    75.8355   0.821 0.411847    
## DayOfWeekMonday                        -0.2482     0.7727  -0.321 0.748055    
## DayOfWeekSaturday                      -2.3148     0.7737  -2.992 0.002774 ** 
## DayOfWeekSunday                        -4.0342     0.7723  -5.224 1.76e-07 ***
## DayOfWeekThursday                      -0.2812     0.7799  -0.361 0.718390    
## DayOfWeekTuesday                        0.1089     0.7763   0.140 0.888393    
## DayOfWeekWednesday                     -0.1706     0.7817  -0.218 0.827270    
## shiftB                                  2.5005     0.5063   4.939 7.86e-07 ***
## shiftC                                 -0.6240     0.5058  -1.234 0.217346    
## Month02                                -5.6342     1.0059  -5.601 2.13e-08 ***
## Month03                                -7.6323     0.9891  -7.717 1.20e-14 ***
## Month04                                -6.3495     0.9981  -6.361 2.00e-10 ***
## Month05                                -7.5836     0.9854  -7.696 1.41e-14 ***
## Month06                                -7.5963     1.0216  -7.436 1.05e-13 ***
## Month07                                -8.5329     0.9951  -8.575  < 2e-16 ***
## Month08                                -7.2737     0.9915  -7.336 2.21e-13 ***
## Month09                                -7.4497     0.9959  -7.481 7.44e-14 ***
## Month10                                -5.7232     0.9860  -5.805 6.46e-09 ***
## Month11                                -5.1842     0.9944  -5.214 1.86e-07 ***
## Month12                                -5.2021     0.9749  -5.336 9.52e-08 ***
## AlarmTime12-17                        -49.9845     0.6514 -76.738  < 2e-16 ***
## AlarmTime18-23                        -40.1459     0.6528 -61.502  < 2e-16 ***
## AlarmTime6-11                         -41.2591     0.6701 -61.572  < 2e-16 ***
## NatureCode50MAJ                        -7.7294    56.8312  -0.136 0.891816    
## NatureCode50MCI1                       27.0040    85.9155   0.314 0.753287    
## NatureCode50PD                         45.0858    44.2582   1.019 0.308347    
## NatureCode50PI                         11.7468    43.1613   0.272 0.785498    
## NatureCode50PIN                         6.9150    50.3770   0.137 0.890822    
## NatureCode50VOT                        43.8551    67.9249   0.646 0.518513    
## NatureCodeABDOM                        26.3874    42.9878   0.614 0.539326    
## NatureCodeACHRPD                       60.0472    46.2310   1.299 0.193998    
## NatureCodeACHRPI                       49.6325    46.2311   1.074 0.283015    
## NatureCodeALLERG                       23.6859    43.0192   0.551 0.581917    
## NatureCodeANBITE                       11.7936    43.4519   0.271 0.786071    
## NatureCodeASSLT                        22.2219    43.0910   0.516 0.606066    
## NatureCodeBCKPN                        34.2506    43.0908   0.795 0.426703    
## NatureCodeBLEED                        22.0207    42.9813   0.512 0.608419    
## NatureCodeBREATH                       20.7730    42.9697   0.483 0.628789    
## NatureCodeBRUSH                        48.3069    43.6321   1.107 0.268235    
## NatureCodeBURN                         24.7300    43.6683   0.566 0.571181    
## NatureCodeCARDIA                       11.0232    42.9896   0.256 0.797631    
## NatureCodeCHEST                        21.0846    42.9715   0.491 0.623664    
## NatureCodeCHOKE                        16.0340    43.0508   0.372 0.709562    
## NatureCodeCOALRM                      119.6694    43.0211   2.782 0.005409 ** 
## NatureCodeCODVIO                       73.9665    67.9347   1.089 0.276250    
## NatureCodeCONFIN                       29.0330    43.6706   0.665 0.506169    
## NatureCodeDIABET                       16.4890    42.9818   0.384 0.701256    
## NatureCodeDROWN                        10.8063    60.7568   0.178 0.858832    
## NatureCodeEHAZ                         46.7907    43.0476   1.087 0.277060    
## NatureCodeELEC                        -12.2493    45.5716  -0.269 0.788089    
## NatureCodeELEV                        -48.5650    48.4698  -1.002 0.316363    
## NatureCodeEXPLO                        15.8532    44.0853   0.360 0.719145    
## NatureCodeEXPOS                        14.7874    44.0274   0.336 0.736971    
## NatureCodeEXTRI                        -7.2951    46.2297  -0.158 0.874614    
## NatureCodeEYE                          28.6906    48.9859   0.586 0.558084    
## NatureCodeFALL                         31.8106    42.9782   0.740 0.459206    
## NatureCodeFALLFO                       75.4685    43.2777   1.744 0.081192 .  
## NatureCodeFIRAL                        38.2288    42.9653   0.890 0.373596    
## NatureCodeFIRAST                       46.0743    43.0613   1.070 0.284635    
## NatureCodeFIRINS                     -189.2691    85.9246  -2.203 0.027615 *  
## NatureCodeFIRINV                     -107.5676    43.9085  -2.450 0.014294 *  
## NatureCodeFSERV                        88.0140    85.9199   1.024 0.305661    
## NatureCodeFUELSP                       36.5523    43.6190   0.838 0.402038    
## NatureCodeGAS                          36.8301    43.0052   0.856 0.391773    
## NatureCodeHAR                           5.3311    50.3750   0.106 0.915719    
## NatureCodeHAZ                          41.3331    44.4666   0.930 0.352615    
## NatureCodeHAZMAT                       43.4820    43.9721   0.989 0.322736    
## NatureCodeHEAD                         22.4047    43.0229   0.521 0.602534    
## NatureCodeHEART                        21.1062    42.9918   0.491 0.623472    
## NatureCodeINACC                        26.2219    45.6769   0.574 0.565920    
## NatureCodeINVES                      -120.4648    45.8054  -2.630 0.008541 ** 
## NatureCodeLIGHT                       108.4958    45.6771   2.375 0.017537 *  
## NatureCodeMED                          16.8593    43.1383   0.391 0.695931    
## NatureCodeMEDALM                       57.5180    43.4639   1.323 0.185721    
## NatureCodeMEDEM                        14.2228    43.0287   0.331 0.740990    
## NatureCodeMEDNE                        33.8170    85.9202   0.394 0.693887    
## NatureCodeMUTUAL                      -17.8019    70.7769  -0.252 0.801411    
## NatureCodeNORESP                     -103.4574    67.9310  -1.523 0.127767    
## NatureCodeODOR                         16.5666    43.1954   0.384 0.701331    
## NatureCodeOSFIR                        29.2011    43.0088   0.679 0.497167    
## NatureCodeOVDOSE                       23.8240    42.9877   0.554 0.579439    
## NatureCodePREG                         20.6101    43.0197   0.479 0.631880    
## NatureCodePSYCH                        31.0813    43.1266   0.721 0.471096    
## NatureCodeSEIZUR                       18.4210    42.9777   0.429 0.668202    
## NatureCodeSERV                         29.4488    42.9892   0.685 0.493328    
## NatureCodeSICK                         21.5386    42.9721   0.501 0.616215    
## NatureCodeSMOKE                        31.7279    43.1991   0.734 0.462671    
## NatureCodeSTAB                         25.4310    43.0629   0.591 0.554820    
## NatureCodeSTRINJ                       -2.1027    49.6544  -0.042 0.966223    
## NatureCodeSTROKE                       20.8259    42.9779   0.485 0.627981    
## NatureCodeSTRTRP                        4.8910    45.2669   0.108 0.913958    
## NatureCodeSTRUC                        -2.0314    43.0133  -0.047 0.962332    
## NatureCodeTRAUM                        26.2160    43.1103   0.608 0.543112    
## NatureCodeTSUI                         14.0629    43.7729   0.321 0.748007    
## NatureCodeUNCON                        20.7339    42.9724   0.482 0.629456    
## NatureCodeUNKNFO                       66.8186    49.6103   1.347 0.178025    
## NatureCodeUNKNWN                       16.4417    43.0290   0.382 0.702383    
## NatureCodeVFIRE                        20.0373    44.0782   0.455 0.649409    
## NatureCodeWATER                       -86.4962    67.9272  -1.273 0.202890    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 74.4 on 130053 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.1166, Adjusted R-squared:  0.1158 
## F-statistic: 145.5 on 118 and 130053 DF,  p-value: < 2.2e-16

From the model above we are able to identify how each variable holds significance in terms of their impact on response time. We are also able to see which components inside each given variable hold impact and how much using the P-Value produced from the model.

Then we used the stepwise regression for our first method of variable selection.

## Start:  AIC=1122051
## response_time_seconds ~ TotalStaffOnIncident + FireDistrict + 
##     DayOfWeek + shift + Month + AlarmTime + NatureCode
## 
##                        Df Sum of Sq       RSS     AIC
## <none>                              719839931 1122051
## - DayOfWeek             6    281067 720120999 1122090
## - shift                 2    237672 720077603 1122090
## - TotalStaffOnIncident  1    227388 720067319 1122090
## - Month                11    627583 720467514 1122142
## - NatureCode           76  21137791 740977722 1125666
## - AlarmTime             3  34467031 754306963 1128133
## - FireDistrict         19  37785762 757625693 1128673

## 
## Call:
## lm(formula = response_time_seconds ~ TotalStaffOnIncident + FireDistrict + 
##     DayOfWeek + shift + Month + AlarmTime + NatureCode, data = gso.fire.filtered)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -340.43  -47.60   -6.53   40.29  383.81 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           408.9655    45.3873   9.011  < 2e-16 ***
## TotalStaffOnIncident                    0.9030     0.1409   6.410 1.46e-10 ***
## FireDistrictBattalion I              -147.1994    14.6499 -10.048  < 2e-16 ***
## FireDistrictBattalion II             -168.3688    14.6493 -11.493  < 2e-16 ***
## FireDistrictBattalion III            -136.0269    14.6590  -9.279  < 2e-16 ***
## FireDistrictBattalion IV             -135.2603    14.6599  -9.227  < 2e-16 ***
## FireDistrictBattalion V              -140.2406    14.6571  -9.568  < 2e-16 ***
## FireDistrictColfax                     43.1908    36.3430   1.188 0.234670    
## FireDistrictFire District 13          -75.2848    15.9148  -4.731 2.24e-06 ***
## FireDistrictFire District 14          -72.6721    14.7392  -4.931 8.21e-07 ***
## FireDistrictGuilford  College        -158.4621    15.6616 -10.118  < 2e-16 ***
## FireDistrictHigh Point Fire District -163.8343    42.3830  -3.866 0.000111 ***
## FireDistrictMcLeansville             -116.1506    17.8435  -6.509 7.57e-11 ***
## FireDistrictNortheast Fire            197.9903    75.8371   2.611 0.009036 ** 
## FireDistrictOak Ridge Fire Dept       -48.9577    33.7024  -1.453 0.146324    
## FireDistrictOther Protection Agency    82.9478    75.8323   1.094 0.274031    
## FireDistrictPinecroft Sedgefield      -88.8137    17.3627  -5.115 3.14e-07 ***
## FireDistrictPleasant Garden            -3.0358    22.8157  -0.133 0.894149    
## FireDistrictPTI Airport               -70.8423    75.8393  -0.934 0.350248    
## FireDistrictSummerfield                27.2998    21.2993   1.282 0.199942    
## FireDistrictWhitsett Fire District     62.2344    75.8355   0.821 0.411847    
## DayOfWeekMonday                        -0.2482     0.7727  -0.321 0.748055    
## DayOfWeekSaturday                      -2.3148     0.7737  -2.992 0.002774 ** 
## DayOfWeekSunday                        -4.0342     0.7723  -5.224 1.76e-07 ***
## DayOfWeekThursday                      -0.2812     0.7799  -0.361 0.718390    
## DayOfWeekTuesday                        0.1089     0.7763   0.140 0.888393    
## DayOfWeekWednesday                     -0.1706     0.7817  -0.218 0.827270    
## shiftB                                  2.5005     0.5063   4.939 7.86e-07 ***
## shiftC                                 -0.6240     0.5058  -1.234 0.217346    
## Month02                                -5.6342     1.0059  -5.601 2.13e-08 ***
## Month03                                -7.6323     0.9891  -7.717 1.20e-14 ***
## Month04                                -6.3495     0.9981  -6.361 2.00e-10 ***
## Month05                                -7.5836     0.9854  -7.696 1.41e-14 ***
## Month06                                -7.5963     1.0216  -7.436 1.05e-13 ***
## Month07                                -8.5329     0.9951  -8.575  < 2e-16 ***
## Month08                                -7.2737     0.9915  -7.336 2.21e-13 ***
## Month09                                -7.4497     0.9959  -7.481 7.44e-14 ***
## Month10                                -5.7232     0.9860  -5.805 6.46e-09 ***
## Month11                                -5.1842     0.9944  -5.214 1.86e-07 ***
## Month12                                -5.2021     0.9749  -5.336 9.52e-08 ***
## AlarmTime12-17                        -49.9845     0.6514 -76.738  < 2e-16 ***
## AlarmTime18-23                        -40.1459     0.6528 -61.502  < 2e-16 ***
## AlarmTime6-11                         -41.2591     0.6701 -61.572  < 2e-16 ***
## NatureCode50MAJ                        -7.7294    56.8312  -0.136 0.891816    
## NatureCode50MCI1                       27.0040    85.9155   0.314 0.753287    
## NatureCode50PD                         45.0858    44.2582   1.019 0.308347    
## NatureCode50PI                         11.7468    43.1613   0.272 0.785498    
## NatureCode50PIN                         6.9150    50.3770   0.137 0.890822    
## NatureCode50VOT                        43.8551    67.9249   0.646 0.518513    
## NatureCodeABDOM                        26.3874    42.9878   0.614 0.539326    
## NatureCodeACHRPD                       60.0472    46.2310   1.299 0.193998    
## NatureCodeACHRPI                       49.6325    46.2311   1.074 0.283015    
## NatureCodeALLERG                       23.6859    43.0192   0.551 0.581917    
## NatureCodeANBITE                       11.7936    43.4519   0.271 0.786071    
## NatureCodeASSLT                        22.2219    43.0910   0.516 0.606066    
## NatureCodeBCKPN                        34.2506    43.0908   0.795 0.426703    
## NatureCodeBLEED                        22.0207    42.9813   0.512 0.608419    
## NatureCodeBREATH                       20.7730    42.9697   0.483 0.628789    
## NatureCodeBRUSH                        48.3069    43.6321   1.107 0.268235    
## NatureCodeBURN                         24.7300    43.6683   0.566 0.571181    
## NatureCodeCARDIA                       11.0232    42.9896   0.256 0.797631    
## NatureCodeCHEST                        21.0846    42.9715   0.491 0.623664    
## NatureCodeCHOKE                        16.0340    43.0508   0.372 0.709562    
## NatureCodeCOALRM                      119.6694    43.0211   2.782 0.005409 ** 
## NatureCodeCODVIO                       73.9665    67.9347   1.089 0.276250    
## NatureCodeCONFIN                       29.0330    43.6706   0.665 0.506169    
## NatureCodeDIABET                       16.4890    42.9818   0.384 0.701256    
## NatureCodeDROWN                        10.8063    60.7568   0.178 0.858832    
## NatureCodeEHAZ                         46.7907    43.0476   1.087 0.277060    
## NatureCodeELEC                        -12.2493    45.5716  -0.269 0.788089    
## NatureCodeELEV                        -48.5650    48.4698  -1.002 0.316363    
## NatureCodeEXPLO                        15.8532    44.0853   0.360 0.719145    
## NatureCodeEXPOS                        14.7874    44.0274   0.336 0.736971    
## NatureCodeEXTRI                        -7.2951    46.2297  -0.158 0.874614    
## NatureCodeEYE                          28.6906    48.9859   0.586 0.558084    
## NatureCodeFALL                         31.8106    42.9782   0.740 0.459206    
## NatureCodeFALLFO                       75.4685    43.2777   1.744 0.081192 .  
## NatureCodeFIRAL                        38.2288    42.9653   0.890 0.373596    
## NatureCodeFIRAST                       46.0743    43.0613   1.070 0.284635    
## NatureCodeFIRINS                     -189.2691    85.9246  -2.203 0.027615 *  
## NatureCodeFIRINV                     -107.5676    43.9085  -2.450 0.014294 *  
## NatureCodeFSERV                        88.0140    85.9199   1.024 0.305661    
## NatureCodeFUELSP                       36.5523    43.6190   0.838 0.402038    
## NatureCodeGAS                          36.8301    43.0052   0.856 0.391773    
## NatureCodeHAR                           5.3311    50.3750   0.106 0.915719    
## NatureCodeHAZ                          41.3331    44.4666   0.930 0.352615    
## NatureCodeHAZMAT                       43.4820    43.9721   0.989 0.322736    
## NatureCodeHEAD                         22.4047    43.0229   0.521 0.602534    
## NatureCodeHEART                        21.1062    42.9918   0.491 0.623472    
## NatureCodeINACC                        26.2219    45.6769   0.574 0.565920    
## NatureCodeINVES                      -120.4648    45.8054  -2.630 0.008541 ** 
## NatureCodeLIGHT                       108.4958    45.6771   2.375 0.017537 *  
## NatureCodeMED                          16.8593    43.1383   0.391 0.695931    
## NatureCodeMEDALM                       57.5180    43.4639   1.323 0.185721    
## NatureCodeMEDEM                        14.2228    43.0287   0.331 0.740990    
## NatureCodeMEDNE                        33.8170    85.9202   0.394 0.693887    
## NatureCodeMUTUAL                      -17.8019    70.7769  -0.252 0.801411    
## NatureCodeNORESP                     -103.4574    67.9310  -1.523 0.127767    
## NatureCodeODOR                         16.5666    43.1954   0.384 0.701331    
## NatureCodeOSFIR                        29.2011    43.0088   0.679 0.497167    
## NatureCodeOVDOSE                       23.8240    42.9877   0.554 0.579439    
## NatureCodePREG                         20.6101    43.0197   0.479 0.631880    
## NatureCodePSYCH                        31.0813    43.1266   0.721 0.471096    
## NatureCodeSEIZUR                       18.4210    42.9777   0.429 0.668202    
## NatureCodeSERV                         29.4488    42.9892   0.685 0.493328    
## NatureCodeSICK                         21.5386    42.9721   0.501 0.616215    
## NatureCodeSMOKE                        31.7279    43.1991   0.734 0.462671    
## NatureCodeSTAB                         25.4310    43.0629   0.591 0.554820    
## NatureCodeSTRINJ                       -2.1027    49.6544  -0.042 0.966223    
## NatureCodeSTROKE                       20.8259    42.9779   0.485 0.627981    
## NatureCodeSTRTRP                        4.8910    45.2669   0.108 0.913958    
## NatureCodeSTRUC                        -2.0314    43.0133  -0.047 0.962332    
## NatureCodeTRAUM                        26.2160    43.1103   0.608 0.543112    
## NatureCodeTSUI                         14.0629    43.7729   0.321 0.748007    
## NatureCodeUNCON                        20.7339    42.9724   0.482 0.629456    
## NatureCodeUNKNFO                       66.8186    49.6103   1.347 0.178025    
## NatureCodeUNKNWN                       16.4417    43.0290   0.382 0.702383    
## NatureCodeVFIRE                        20.0373    44.0782   0.455 0.649409    
## NatureCodeWATER                       -86.4962    67.9272  -1.273 0.202890    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 74.4 on 130053 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.1166, Adjusted R-squared:  0.1158 
## F-statistic: 145.5 on 118 and 130053 DF,  p-value: < 2.2e-16

From this result we are able to see in further in detail which components affect response time and at what level due to the stepwise regression model eliminating the components that have little to no impact on response time.

After that we also tried LASSO variable selection using the glmnet function.

## Warning in storage.mode(xd) <- "double": NAs introduced by coercion

##           Length Class     Mode   
## a0          79   -none-    numeric
## beta      5214   dgCMatrix S4     
## df          79   -none-    numeric
## dim          2   -none-    numeric
## lambda      79   -none-    numeric
## dev.ratio   79   -none-    numeric
## nulldev      1   -none-    numeric
## npasses      1   -none-    numeric
## jerr         1   -none-    numeric
## offset       1   -none-    logical
## call         4   -none-    call   
## nobs         1   -none-    numeric

This result allows us to view the level of significance in terms of each coefficient due to the shrinkage that this model produces.

Random Forest

As an alternative to our linear regression models we ran a tree based random forest model. For this model we produced a bar graph to display the significance in each variable from a model we created using the Ranger() command. In this graph we are able to visually identify how much the models accuracy would be affected by each variable.

After producing the bar plot we are able to see that Nature Code holds the most importance in relation to the model followed by Fire District and Alarm Time.

Conclusion

Our extensive research has provided us with the ability to provide sufficient results for each of our research questions. Throughout this report, we have worked on the public data set provided by the Greensboro Fire Department which holds all reported fire incidents in Greensboro, NC from July 1, 2010, to May 31, 2022. Firstly, to attack our research questions on the topics of analyzing response time and forecasting future fire incidents we explored our data set to a great extent making multiple univariate and bivariate visualizations. We then followed this by using a variety of different forecasting and regression models to produce high-accuracy results.

While identifying what factors affect fire service response time and how to improve fire service response time, we first visualized the relationship between response time and every other tested variable. Through this, we were able to decide which variables to apply to a linear regression model to produce the level of significance that each variable holds with response time. In our initial linear regression model, we tested the variables; TotalStaffonIncident, FireDistrict, DayofWeek, shift, Month, AlarmTime, and NatureCode. We then used two different variable selection methods (Stepwise and LASSO) and analyzed the results between the three models. Following the results from our linear regression models, we ran a random forest model to display the significance that each given variable obtains. With our results from this process, we were able to conclude that NatureCode followed by FireDistrict and AlarmTime held the most significance in relation to response time.

In regards to the effect response time has on the total loss we were able to quickly identify this through our bivariate visualizations. In this process, we compared the variables PropertyLoss, TotalLosses, CivilianInjuries, CivilianFatalities, FireServiceInjuries, and FireServiceFatalities to response time and produced scatter plots and a combination of box and violin plots to visualize the relationship between the two. After making our property and total loss visualizations we were able to see a very slight relationship between them and response time. Throughout the different distributions of response time, both variables seemed to linger around the middle betwen 100 and 400 seconds. Contrary to belief, it’s safe to conclude that most of the time there is not much effect response time has on total and property loss. Given the situation, there are definitely occasions where total and property loss are affected massively depending on the speed at which the fire service responds to the call. In regards to Civilian and Fire Service injuries and fatalities, there were very few situations where either occurred across the data set. From our visualizations, we can see that across the twelve years there have only been ten total civilian fatalities, two occasions where more than two civilians were reported injured, six occasions where more than one fire service injury occurred, and absolutely no fatalities within the fire department ever. Through our results, we were able to conclude not much significance in the relationship between fire service response time and total loss.

Furthermore, used many models to forecast the daily and monthly frequencies of fire incidents in Greensboro, NC. The models that we used were TBATS, ARIMA, Holt-Winters, and Combination. All the models tended to do fairly well except when there was a drastic drop in the frequency of fire incidents in 2020. But in general, the models followed the trend of the plots nicely. The reason for that drop could have been covid and when people stayed home a lot more. From our research, we found that the models did fairly well, and sometimes some models did better than others.

Discussion

In this report we worked intensively in order to analyze the key factors for reported fire incidents in Greensboro, NC using all reported data from June 1, 2010 to May 31, 2022. When further analyzing the data set given we were able to identify through our exploratory data analysis as well as regression and forecasting models how each variable holds significance in relations to fire service response time and successfully forecast the future in regards to reported fire incidents in Greensboro, NC. We also were able to further visualize the effects of response time on total loss.

Although our extensive data set equips us with a variety of different variables over a long time period, it is not perfect due to missing values within the data set and human error from the importation of the reported data. With us having to cater a bit to an imperfect data set we were still able to produce highly accurate results with the amount of information given. With this further exploration we were then able to forecast an accurate prediction in terms of reported fire incidents for the rest of the year.

With our research results there are now many safe assumptions that can be made behind what holds affect within the different components that we studied in reported fire incidents. Through our exploratory data analysis we can see the affects of all given variables in relation to response time. Our regression models accurately depict to what extent the importance of each given variable has on the speed of response time. And lastly, our forecasting model equips the Greensboro Fire Department with an accurate prediction to continue through the rest of the year under maximum efficiency.

Looking through our findings there are many different observations that can be made in order to minimize response time thus leading towards safer situations. We were able to identify the difference in response time on any given day, week, year and even looked at how different response time varies in terms of the working shift for the fire service workers. Our findings also highlighted how much response time effects losses in terms of civilian injuries and fatalities, fire service injuries and fatalities, and total property loss. All of our discoveries from this research help equip the Greensboro Fire Department with high level information in order to perform they’re job to maximum efficiency as well as minimizing response time in order to assure the safest outcome possible.

References

1 Aleisa, E., & Savsar, M. (2014). Response Time Analysis of firefighting operations using discrete event simulation. IEEE Xplore. Retrieved July 26, 2022, from https://ieeexplore.ieee.org/abstract/document/7046087

2 Buffington, T., & Ezekoye, O. A. (2019). Statistical Analysis of Fire Department Response Times … - researchgate. Retrieved July 26, 2022, from https://www.researchgate.net/publication/333158843_Statistical_Analysis_of_Fire_Department_Response_Times_and_Effects_on_Fire_Outcomes_in_the_United_States

3 Challands, N. (n.d.). The relationships between fire service response time and … - researchgate. Retrieved July 26, 2022, from https://www.researchgate.net/publication/226273683_The_Relationships_Between_Fire_Service_Response_Time_and_Fire_Outcomes

4 KC, K. and Corcoran, J., 2017. Modelling residential fire incident response times: A spatial analytic approach. Applied Geography, 84, pp.64-74. https://doi.org/10.1016/j.apgeog.2017.03.004

5 Lian, X., Melancon, S., Presta, J.-R., Reevesman, A., Spiering, B. J., & Woodbridge, D. M.-kyung. (2019). Scalable real-time prediction and analysis of San Francisco Fire Department Response Times. IEEE Xplore. Retrieved July 26, 2022, from https://ieeexplore.ieee.org/document/9060320

6 Mukhopadhyay, A., Vorobeychik, Y., Dubey, A., & Biswas, G. (2017, May 1). Prioritized allocation of emergency responders based on a continuous-time incident prediction model: Proceedings of the 16th conference on autonomous agents and multiagent systems. ACM Other conferences. Retrieved July 26, 2022, from https://dl.acm.org/doi/10.5555/3091125.3091154

7 Yuanwei Li, Sikui Zhang, and Guoyi Fu. 2022. Forest fire modeling and analysis based on K-means clustering algorithm and time series forecasting. In 2022 2nd International Conference on Bioinformatics and Intelligent Computing (BIC 2022), January 21–23, 2022, Harbin, China. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3523286.3524560

Acknowledgements

The authors acknowledge financial support by NSF Grant HRD#1719498 and NSA Grant #H98230-22-1-0017.

Appendix

This section contains the code to reproduce the analysis described in this report.

Below is a list of the libraries that we used throughout the report.

Loading necessary packages

library(readr)
library(lubridate)
library(ggplot2)
library(dslabs)
library(dplyr)
library(ggrepel)
library(ggthemes)
library(broom)
library(glmnet)
library(MASS)
library(tidyverse)
library(Amelia)
library(mice)
library(caret)
library(forecast)
library(urca)
library(randomForest)
library(ranger)
library(astsa)

Data Preprocessing

gso.fire <- read_csv("Greensboro_Fire_Incidents.csv")

Filtering for incidents prior to June 1st 2022

gso.fire = 
  gso.fire %>%
  mutate(AlarmDate2 = make_date(Year, Month, Day)) %>%
  filter(AlarmDate2 < '2022-06-01')

Filter for property type 1-2 family dwellings

gso.fire = gso.fire %>%
  filter(PropertyUse == '419 - 1 or 2 family dwelling')

Convert to call process time to seconds and add to gso.fire

gso.fire = gso.fire %>%
  mutate(call_process_period = lubridate::hms(CallProcessingTime), 
         call_process_seconds = period_to_seconds(call_process_period))

Convert response time to seconds and add to gso.fire

gso.fire = gso.fire %>%
  mutate(response_time_period = lubridate::hms(ResponseTime),
         response_time_seconds = period_to_seconds(response_time_period))

Convert total response time to seconds and add to gso.fire

gso.fire = gso.fire %>%
  mutate(total_response_period = lubridate::hms(TotalResponseTime),
         total_response_seconds = period_to_seconds(total_response_period))

Cutoff point for response time

cutoff.response_time = gso.fire %>% 
  dplyr::select(response_time_seconds) %>%
  filter(!is.na(response_time_seconds)) %>%
  summarize(c = quantile(response_time_seconds, .75) + 3 * IQR(response_time_seconds)) %>%
  pull(c)

Univariate graphs

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = response_time_seconds, fill = "red")) +
  geom_histogram(show.legend = FALSE, color = "black") +
  xlab("Response Time (Seconds)") +
  ylab("Count") +
  ggtitle("Distribution of Response Time") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = TotalStaffOnIncident)) +
  geom_histogram(bins = 25,show.legend = FALSE, color = "black", fill = "pink") +
  scale_y_log10() +
  xlab("Total Staff on Incident") +
  ylab("Count (Log10 scale)") +
  ggtitle("Distribution of Total Staff on Incident") +
  theme_economist() +
  xlim(0,25)

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  count(CivilianInjuries) %>%
  mutate(nlog = log10(n)) %>%
  ggplot(aes(x = CivilianInjuries, y = nlog)) +
  geom_col(color = "black", fill = "pink") +
  xlab("Civilian Injuries") +
  ylab("Count (Log10 scale)") +
  ggtitle("Distribution of Civilian Injuries") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  count(CivilianFatalities) %>%
  mutate(nlog = log10(n)) %>%
  ggplot(aes(x = CivilianFatalities, y = nlog)) +
  geom_col(color = "black", fill = "pink") +
  xlab("Civilian Fatalities") +
  ylab("Count (Log10 scale)") +
  ggtitle("Distribution of Civilian Fatalities") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  count(FireServiceInjuries) %>%
  mutate(nlog = log10(n)) %>%
  ggplot(aes(x = FireServiceInjuries, y = nlog)) +
  geom_col(color = "black", fill = "pink") +
  xlab("Fire Service Injuries") +
  ylab("Count (Log10 scale)") +
  ggtitle("Distribution of Fire Service Injuries") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = PropertyLoss)) +
  geom_histogram(bins = 20,show.legend = FALSE, color = "black", fill = "pink") +
  scale_x_log10() +
  xlab("Property Loss (Log10 scale)") +
  ylab("Count") +
  ggtitle("Distribution of Property Loss") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = TotalLosses)) +
  geom_histogram(bins = 20,show.legend = FALSE, color = "black", fill = "pink") +
  scale_x_log10() +
  xlab("Total Loss (Log10 scale)") +
  ylab("Count") +
  ggtitle("Distribution of Total Loss") +
  theme_economist()

gso.fire %>%
  ggplot(aes(x = AlarmHour)) +
  geom_bar(color = "black", fill = "pink") +
  scale_x_continuous(breaks = seq(0,23,1)) +
  theme(axis.text.x = element_text(angle = 45)) +
  xlab("Alarm Hour") +
  ylab("Count") +
  ggtitle("Frequency of Fire Alarms by Alarm Hour")

gso.fire %>%
  count(station) %>%
  mutate(station = reorder(station, n)) %>%
  ggplot(aes(x = station, y = n)) +
  geom_bar(stat = "identity", color = "black", fill = "pink") +
  theme(axis.text.y = element_text(size = 6)) +
  xlab("Fire Station (Station Number)") +
  ylab("Count") +
  ggtitle("Frequency of Calls by Fire Station") +
  coord_flip()

gso.fire %>%
  count(FireDistrict) %>%
  mutate(FireDistrict = reorder(FireDistrict, n)) %>%
  ggplot(aes(x = FireDistrict, y = n)) +
  geom_bar(stat = "identity", color = "black", fill = "pink") +
  theme(axis.text.y = element_text(size = 6)) +
  xlab("Fire Station (Station Number)") +
  ylab("Count") +
  ggtitle("Frequency of Calls by Fire District") +
  coord_flip()

gso.fire %>%
  count(Year) %>%
  ggplot(aes(x = Year, y = n)) +
  geom_bar(stat = "identity", fill = "pink", color = "black") +
  scale_x_continuous(breaks = seq(2010,2022,1)) +
  xlab("Year") +
  ylab("Count") +
  ggtitle("Frequency of Calls by Year")

gso.fire %>%
  ggplot(aes(x = shift)) +
  geom_bar(show.legend = FALSE, fill = "pink", color = "black") +
  xlab("Shift") +
  ylab("Count") +
  ggtitle("Frequency of Calls by Shift")

gso.fire %>%
  ggplot(aes(x = DayOfWeek)) +
  geom_bar(fill = "pink", color = "black") +
  xlab("Days of the Week") +
  ylab("Count") +
  ggtitle("Frequency of Calls by Days of the Week")

gso.fire %>%
  ggplot(aes(x = Week)) +
  geom_bar(fill = "pink", color = "black") +
  scale_x_continuous(breaks = seq(1,53,1)) +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Week") +
  ylab("Count") +
  ggtitle("Frequency of Calls by Week")

gso.fire %>%
  ggplot(aes(x = Month)) +
  geom_bar(fill = "pink", color = "black") +
  xlab("Month") +
  ylab("Count") +
  ggtitle("Frequency of Calls by Month")

Bivariate Graphs

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = Week, y = response_time_seconds, group = Week)) +
  geom_boxplot() +
  scale_x_continuous(breaks = seq(1,53,1)) +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Week") +
  ylab("Response Time (Seconds)") +
  ggtitle("Week vs Response Time")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = station, y = response_time_seconds)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Station") +
  ylab("Response Time (Seconds)") +
  ggtitle("Station vs Response Time")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = AlarmHour, y = response_time_seconds, group = AlarmHour)) +
  geom_boxplot() +
  scale_x_continuous(breaks = seq(0,23,1)) +
  xlab("Alarm Hour") +
  ylab("Response Time (Seconds)") +
  ggtitle("Alarm Hour vs Response Time")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = NumberOfAlarms, y = response_time_seconds, group = NumberOfAlarms)) +
  geom_boxplot(show.legend = FALSE) +
  xlab("Number of Alarms") +
  ylab("Response Time (Seconds)") +
  ggtitle("Number of Alarms vs Response Time")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = NatureCode, y = response_time_seconds)) +
  theme(axis.text.x = element_text(angle = 90)) +
  geom_boxplot() +
  xlab("Nature Code") +
  ylab("Response Time (Seconds)") +
  ggtitle("Nature Code vs Response Time")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(y = TotalLosses, x = as.factor(AlarmHour))) +
  geom_boxplot(show.legend = FALSE) +
  scale_y_log10() +
  ylab("Total Loss (Log10 scale)") +
  xlab("Alarm Hour") +
  ggtitle("Total Loss by Alarm Hour")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(y = TotalLosses, x = shift)) +
  geom_boxplot(show.legend = FALSE) +
  scale_y_log10() +
  ylab("Total Loss (Log10 scale)") +
  xlab("Shift") +
  ggtitle("Total Loss by Shift")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(y = PropertyLoss, x = as.factor(AlarmHour))) +
  geom_boxplot(show.legend = FALSE) +
  scale_y_log10() +
  ylab("Property Loss (Log10 scale)") +
  xlab("Alarm Hour") +
  ggtitle("Property Loss by Alarm Hour")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(y = PropertyLoss, x = shift)) +
  geom_boxplot(show.legend = FALSE) +
  scale_y_log10() +
  ylab("Property Loss (Log10 scale)") +
  xlab("Shift") +
  ggtitle("Property Loss by Shift") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = response_time_seconds, y = CivilianInjuries)) +
  geom_point(stat = "identity", show.legend = FALSE) +
  ylab("Civilian Injuries") +
  xlab("Response Time (Seconds)") +
  ggtitle("Civilian Injuries vs Response Time") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = TotalStaffOnIncident, y = response_time_seconds)) +
  geom_point(stat = "identity", show.legend = FALSE) +
  xlab("Total Staff On Incident") +
  ylab("Response Time (Seconds)") +
  ggtitle("Total Staff on Incident vs Response Time") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(y = PropertyLoss, x = response_time_seconds)) +
  geom_point(stat = "identity", show.legend = FALSE) +
  scale_y_log10() +
  ylab("Property Loss (Log10 scale)") +
  xlab("Response Time (Seconds)") +
  ggtitle("Property Loss vs Response Time") +
  theme_economist()

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(y = TotalLosses, x = response_time_seconds)) +
  geom_point(stat = "identity", show.legend = FALSE) +
  scale_y_log10() +
  ylab("Total Loss (Log10 scale)") +
  xlab("Response Time (Seconds)") +
  ggtitle("Total Loss vs Response Time") +
  theme_economist()

gso.fire %>%
  filter(!is.na(response_time_seconds)) %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = DayOfWeek, y = response_time_seconds)) +
  geom_violin(trim = FALSE, fill = "light blue") +
  geom_boxplot(width = 0.3, fill = "pink") +
  xlab("Days of the Week") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin & Box Plot of Response Time Densities by Days of the Week") +
  theme(axis.text.x = element_text(angle = 90), legend.position = "hide")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = Year, y = response_time_seconds, group = Year)) +
  geom_violin(trim = FALSE, fill = "light blue") +
  geom_boxplot(show.legend = FALSE, width = .3, fill = "pink") +
  scale_x_continuous(breaks = seq(2010,2022,1)) +
  xlab("Year") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin & Box Plot of Response Time Densities by Year")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = shift, y = response_time_seconds)) +
  geom_violin(show.legend = FALSE, fill = "light blue") +
  geom_boxplot(width = .5, show.legend = FALSE, fill = "pink") +
  xlab("Work shift") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin and Box Plot of Response Time Densities by Work Shift")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = FireDistrict, y = response_time_seconds)) +
  geom_violin(fill = "light blue") +
  geom_boxplot(show.legend = FALSE, width = .2, fill = "pink") +
  theme(axis.text.x = element_text(angle = 90), legend.position = "hide") +
  xlab("Fire District") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin and Box Plot of Response Time Densities by Fire District")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = Month, y = response_time_seconds)) +
  geom_violin(show.legend = FALSE, fill = "light blue") +
  geom_boxplot(show.legend = FALSE, width = .3, fill = "pink") +
  xlab("Month") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin and Box Plot of Response Time Densities by Month")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = AlarmHour, y = response_time_seconds, group = AlarmHour)) +
  geom_violin(fill = "light blue") +
  geom_boxplot(width = .3, fill = "pink") +
  scale_x_continuous(breaks = seq(0,23,1)) +
  xlab("Alarm Hour") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin and Box Plot of Response Time Densities by Alarm Hour")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = NumberOfAlarms, y = response_time_seconds, group = NumberOfAlarms)) +
  geom_violin(fill = "light blue") +
  geom_boxplot(width = .3, fill = "pink") +
  xlab("Number of Alarms") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin and Box Plot of Response Time Densities by Number of Alarms")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = CivilianInjuries, y = response_time_seconds, group = CivilianInjuries)) +
  geom_violin(show.legend = FALSE, fill = "light blue") +
  geom_boxplot(width = .5, show.legend = FALSE, fill = "pink") +
  xlab("Civilian Injuries") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin and Box Plot of Response Time Densities by Civilian Injuries")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = CivilianFatalities, y = response_time_seconds, group = CivilianFatalities)) +
  geom_violin(show.legend = FALSE, fill = "light blue") +
  geom_boxplot(width = .5, show.legend = FALSE, fill = "pink") +
  xlab("Civilian Fatalities") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin and Box Plot of Response Time Densities by Civilian Fatalities")

gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>%
  ggplot(aes(x = FireServiceInjuries, y = response_time_seconds, group = FireServiceInjuries)) +
  geom_violin(show.legend = FALSE, fill = "light blue") +
  geom_boxplot(width = .5, show.legend = FALSE, fill = "pink") +
  xlab("Fire Service Injuries") +
  ylab("Response Time (Seconds)") +
  ggtitle("Violin and Box Plot of Response Time Densities by Fire Service Injuries")

Time Series Visualization

Visualization of fire incidents in Greensboro, NC from July 1, 2010 to May 31, 2022.

gso.fire.ts = gso.fire %>%
  count(AlarmDate2)

gso.fire.ts %>%
  ggplot(aes(x = AlarmDate2, y = n)) +
  geom_line() +
  scale_x_date(date_labels = "%m-%Y", date_breaks = "6 month") + 
  xlab("Year") +
  ylab("Number of Fire Incidents") +
  ggtitle("Time series of daily number of fire incidents") +
  theme(axis.text.x = element_text(angle = 90))

incidents_annual = gso.fire.ts %>%
  separate(AlarmDate2, sep="-", into = c("Year", "Month", "Day"))

incidents_annual = cbind(incidents_annual, gso.fire.ts$AlarmDate2)
colnames(incidents_annual) = c("Year","Month","Day","n","AlarmDate2")

incidents_annual %>%
  ggplot(aes(x = AlarmDate2, y = n)) + 
  ggtitle("Time Series Plots for Frequency of Daily Fire Incidents Organized by Year") + 
  geom_line() + 
  scale_x_date(date_labels = "%m-%Y", date_breaks = "1 month") + 
  xlab("Date") + ylab("Count") + 
  theme(axis.text.x = element_text(angle = 90, size = 14)) +
  theme(axis.text.y = element_text(size = 14)) +
  theme(plot.title = element_text(size=22)) +
  theme(axis.title=element_text(size=18)) +
  facet_wrap(~Year, scales = "free_x")

Monthly time series visualizations

gso.fire.ts.month = gso.fire %>%
  mutate(Monthly = make_date(Year, Month)) %>%
  group_by(Monthly, Month, Year) %>%
  count()

gso.fire.ts.month %>%
  ggplot(aes(x = Monthly, y = n)) +
  geom_line() +
  scale_x_date(date_labels = "%m-%y", date_breaks = "6 month") + 
  xlab("Month") +
  ylab("Number of Fire Incidents") +
  ggtitle("Time series of monthly number of fire incidents for 2010-2022") +
  theme(axis.text.x = element_text(angle = 90))

Time Series Forecasting

#forecasting for daily number of fire incidents
gso.fire.ts.2 = ts(gso.fire.ts$n, start = c(2010, 182), end = c(2022, 153),
                 frequency = 365)
ndiffs(gso.fire.ts.2)
ndiffs(gso.fire.ts.2)
acf(gso.fire.ts.2)
summary(ur.kpss(gso.fire.ts.2))

#attempt at differencing
gso.fire.ts.2.dif = diff(gso.fire.ts.2)
acf(gso.fire.ts.2.dif)

cbind("Fire Incidents" = gso.fire.ts.2,
      "First differenced" = diff(gso.fire.ts.2)) %>%
  autoplot(facets=TRUE) +
  xlab("Date") + ylab("") +
  ggtitle("Stationarity Transformations of Daily Fire Incidents")

y.dif = msts(gso.fire.ts.2.dif, seasonal.periods=c(7,365.25))
daily.dif.tbats = tbats(y.dif)
daily.dif.tbats.fc = forecast::forecast(daily.dif.tbats, h = 214)

#point forecast values
dif.Yhat = daily.dif.tbats.fc$mean
dif.Yhat.upper = daily.dif.tbats.fc$upper
dif.Yhat.lower = daily.dif.tbats.fc$lower

Yhat = cumsum(c(gso.fire.ts.2[length(gso.fire.ts.2)],dif.Yhat))
Yhat = ts(Yhat, start = c(2022, 152), frequency=365)
Yhat_upper = cumsum(c(gso.fire.ts.2[length(gso.fire.ts.2)],dif.Yhat.upper))
Yhat_upper = ts(Yhat_upper, start = c(2022, 152), end = c(2023,1), frequency=365)
Yhat_lower = cumsum(c(gso.fire.ts.2[length(gso.fire.ts.2)],dif.Yhat.lower))
Yhat_lower = ts(Yhat_lower, start = c(2022, 152), end = c(2023,1), frequency=365)

fire.ts = ts(gso.fire.ts.2[3992:4352],start = c(2021, 25), end = c(2022, 153),
              frequency = 360) # A 360 day trim to see forecast better

#TBAT forecasting
autoplot(fire.ts) + 
  autolayer(Yhat, series="Point Forecasts") + 
  ggtitle("TBATS Forecasting Model for Daily Number of Fire Incidents") +
  xlab("Date") + ylab("Fire Incidents") + 
  theme(title = element_text(size = 10), legend.position = "bottom")

accuracy(daily.dif.tbats.3)

#ARIMA forecasting
daily.arima = auto.arima(gso.fire.ts.2)
daily.arima.fc = forecast(daily.arima, h=214)
autoplot(daily.arima.fc) +
  ggtitle("ARIMA Forecasting Model for Daily Number of Fire Incidents") +
  xlab("Date") + 
  coord_cartesian(xlim = c(2020, 2023)) +
  ylab("Fire Incidents") + 
  theme(title = element_text(size = 10))

accuracy(daily.arima.3)

gso.fire.ts.3 = ts(gso.fire.ts$n, start = c(2010, 182), end = c(2022, 59),
                                 frequency = 365)
#attempt at differencing
gso.fire.ts.3.dif = diff(gso.fire.ts.3)

y.dif.3 = msts(gso.fire.ts.3.dif, seasonal.periods=c(7,365.25))
daily.dif.tbats.3 = tbats(y.dif.3)
daily.dif.tbats.fc.3 = forecast::forecast(daily.dif.tbats.3, h=671)
dif.Yhat.3 = daily.dif.tbats.fc.3$mean #point forecast values 

Yhat.3 = cumsum(c(gso.fire.ts.3[length(gso.fire.ts.3)],dif.Yhat.3))
Yhat.3 = ts(Yhat.3, start = c(2022, 60), frequency=365)

gso.fire.ts.true = ts(gso.fire.ts$n, start = c(2010, 182), end = c(2022, 153),
                    frequency = 365)
#TBAT forecasting
autoplot(gso.fire.ts.3) + 
  autolayer(gso.fire.ts.true, alpha=0.7, series="True Count") +
  autolayer(Yhat.3, alpha=0.6, series="Point Forecasts") + 
  ggtitle("TBATS Forecasting Model for Daily Number of Fire Incidents") +
  xlab("Date") +
  ylab("Fire Incidents") + 
  theme(title = element_text(size = 10), legend.position = "bottom")

accuracy(daily.dif.tbats.3)

#ARIMA forecasting
daily.arima.3 = auto.arima(gso.fire.ts.3)
daily.arima.fc.3 = forecast(daily.arima.3, h=671)
autoplot(daily.arima.fc.3) + 
  ggtitle("ARIMA Forecasting Model for Daily Number of Fire Incidents") +
  xlab("Date") +
  coord_cartesian(xlim = c(2020, 2023)) +
  ylab("Fire Incidents") + 
  theme(title = element_text(size = 10))

accuracy(daily.arima.3)

Forecasting for monthly number of fire incidents.

#convert gso.fire.ts.month to official ts object
gso.fire.ts.month.2 = ts(gso.fire.ts.month$n, start = c(2010, 07, 01), end = c(2022, 05, 01),
                          frequency = 12)

#looking at stationarity
acf(gso.fire.ts.month.2)
summary(ur.kpss(gso.fire.ts.month.2))

ndiffs(gso.fire.ts.month.2) #1
nsdiffs(gso.fire.ts.month.2) #0

#attempt at differencing
gso.fire.ts.month.2.dif = diff(gso.fire.ts.month.2)

#looking at stationarity for difference
acf(gso.fire.ts.month.2.dif)

summary(ur.kpss(gso.fire.ts.month.2.dif))
ndiffs(gso.fire.ts.month.2.dif) #0
nsdiffs(gso.fire.ts.month.2.dif) #0

cbind("Fire Incidents" = gso.fire.ts.month.2,
      "First differenced" = diff(gso.fire.ts.month.2)) %>%
  autoplot(facets=TRUE) +
  xlab("Date") + ylab("") +
  ggtitle("Stationarity Transformation of Monthly Fire Incidents")

y.dif.month = msts(gso.fire.ts.month.2.dif, seasonal.periods=12)
dif.tbats.month = tbats(y.dif.month)
dif.tbats.month.fc = forecast::forecast(dif.tbats.month, h = 8)

#point forecast values
dif.Yhat.month = dif.tbats.month.fc$mean

Yhat.month = cumsum(c(gso.fire.ts.month.2[length(gso.fire.ts.month.2)],dif.Yhat.month))
Yhat.month = ts(Yhat.month, start = c(2022, 05), frequency=12)

#TBATS forecasting
autoplot(gso.fire.ts.month.2) + 
  autolayer(Yhat.month, series="Point Forecasts") + 
  ggtitle("TBATS Forecasting Model for Monthly Frequency of Fire Incidents") +
  xlab("Date") +
  ylab("Frequency of Fire Incidents") + 
  theme(axis.text.x = element_text(angle = 90), title = element_text(size=10), 
        legend.position = "bottom") 

#checking accuracy
accuracy(dif.tbats.month)

#ARIMA forecasting
monthly.arima = auto.arima(gso.fire.ts.month.2)
monthly.arima.fc = forecast(monthly.arima, h=8)
autoplot(monthly.arima.fc) +
  ggtitle("ARIMA Forecasting Model for Monthly Frequency of Fire Incidents") +
  xlab("Date") + 
  coord_cartesian(xlim = c(2020, 2023)) +
  ylab("Frequency of Fire Incidents") + 
  theme(title = element_text(size = 10))

#checking accuracy
accuracy(monthly.arima)

#HoltWinters forecasting
dif.monthly.holt = HoltWinters(gso.fire.ts.month.2.dif)
dif.fcast.monthly = forecast::forecast(dif.monthly.holt, h=8, level=c(80,95))
dYhat.Holt = dif.fcast.monthly$mean

Yhat.Holt = cumsum(c(gso.fire.ts.month.2[length(gso.fire.ts.month.2)],dYhat.Holt))
Yhat.Holt = ts(Yhat.Holt, start = c(2022, 05), frequency=12)

autoplot(gso.fire.ts.month.2) +
  autolayer(Yhat.Holt, series = "Point Forecasts") +
  ggtitle("Forecasts for Monthly Frequency of Fire Incidents using HoltWinters") +
  xlab("Date") +
  ylab("Frequency of Fire Incidents") +
  theme(axis.text.x = element_text(angle = 90), title = element_text(size=9), 
        legend.position = "bottom")

#checking accuracy
accuracy(dif.fcast.monthly)

Multi-step forecasting for daily data

#getting differenced training and test sets 
y.dif.2 = msts(gso.fire.ts.2.dif, seasonal.periods=c(7,365.25))
dif.training.tbats.2 = subset(y.dif.2, end=length(y.dif.2)-151)
dif.test.tbats.2 = subset(y.dif.2, start=length(y.dif.2)-150)
dif.daily.train.tbats.2 = tbats(dif.training.tbats.2)
dif.train.fc = forecast(dif.daily.train.tbats.2, h=151)

#getting point forecasts 
dYhat.tbats = dif.train.fc$mean

#setting up normal ts training and test sets 
y.2 = msts(gso.fire.ts.2, seasonal.periods=c(7,365.25))
training.tbats.2 = subset(y.2, end=length(y.2)-151)
test.tbats.2 = subset(y.2, start=length(y.2)-150)

#reverting back to original points   
Yhat.tbats = cumsum(c(training.tbats.2[length(training.tbats.2)], dYhat.tbats))
Yhat.tbats = ts(Yhat.tbats, start = c(2022, 1), frequency = 365)

autoplot(training.tbats.2) + 
  autolayer(Yhat.tbats, series="Point Forecasts") + 
  autolayer(test.tbats.2, series="Test Set", alpha = 0.7) + 
  ggtitle("Multi-Step TBATS Daily Forecasts of Frequency of Fire Incidents") + 
  xlab("Date") +
  ylab("Frequency of Fire Incidents") +
  coord_cartesian(xlim = c(2020, 2023)) +
  theme(legend.position = "bottom")

dif.daily.test.tbats.2 = tbats(dif.test.tbats.2)
accuracy(dif.daily.test.tbats.2)

#multi-step ARIMA forecast for daily data
training.arima = subset(gso.fire.ts.2, end=length(gso.fire.ts.2)-151)
test.arima = subset(gso.fire.ts.2, start=length(gso.fire.ts.2)-150)
daily.train.arima = auto.arima(training.arima)
fc.train.arima = forecast(daily.train.arima, h=151)

autoplot(fc.train.arima) + 
  autolayer(test.arima, series="Test Set", alpha = 0.7) + 
  ggtitle("Multi-Step ARIMA Daily Forecasts of Frequency of Fire Incidents") + 
  xlab("Date") +
  ylab("Frequency of Fire Incidents") + 
  coord_cartesian(xlim = c(2020, 2023)) +
  theme(legend.position = "bottom")

daily.test.arima = arima(test.arima)
accuracy(daily.test.arima)

#time series combination
train = window(gso.fire.ts.2, end=c(2019, 12))
h = length(gso.fire.ts.2) - length(train)

#ARIMA model 
ARIMA = forecast::forecast(auto.arima(train, lambda=0, biasadj=TRUE), h=h)

#TBATS model 
train.tbats = msts(gso.fire.ts.2.dif, end=c(2019,12), seasonal.periods = c(7,365.25))
TBATS = forecast::forecast(tbats(train.tbats, biasadj = TRUE), h=h)
dYhat.combo = TBATS$mean

Yhat.combo = cumsum(c(train[length(train)], dYhat.combo))
Yhat.combo = ts(Yhat.combo, start = c(2019, 12), frequency = 365)
Combination = (ARIMA[["mean"]] + Yhat.combo)/2
true.daily = gso.fire.ts %>%
  filter(AlarmDate2 >= as.Date("2019/12/01") & AlarmDate2 <= as.Date("2021/05/31"))
true.daily.ts = ts(true.daily$n, 
                            start = c(2019,12), end = c(2021,153), 
                            frequency = 365)

autoplot(train) +
  autolayer(true.daily.ts) +
  autolayer(Combination, series="Combination", alpha=0.8) +
  autolayer(ARIMA, series="ARIMA", PI=F) +
  autolayer(Yhat.combo, series="TBATS", alpha=0.7) +
  xlab("Date") +
  ylab("Frequency of Fire Incidents") +
  ggtitle("Time Series Combination for Daily Frequency of Fire Incidents") + 
  theme(legend.position = "bottom")

c(ARIMA = accuracy(ARIMA, gso.fire.ts.2)["Test set", "RMSE"],
  TBATS = accuracy(Yhat.combo, gso.fire.ts.2)["Test set", "RMSE"],
  Combination =
    accuracy(Combination, gso.fire.ts.2)["Test set", "RMSE"])

Multi-step forecasting for monthly data

#multi-step TBATS
y2.dif.monthly = msts(gso.fire.ts.month.2.dif, seasonal.periods=12)
dif.training.tbats.2.monthly = subset(y2.dif.monthly, end=length(y2.dif.monthly)-5)
dif.test.tbats.2.monthly = subset(y2.dif.monthly, start=length(y2.dif.monthly)-4)
dif.train.tbats.2.monthly = tbats(dif.training.tbats.2.monthly)
dif.fc.train.tbats.2.monthly = forecast(dif.train.tbats.2.monthly, h=5)
dYhat.ms.tbats.monthly = dif.fc.train.tbats.2.monthly$mean

y2.monthly = msts(gso.fire.ts.month.2, seasonal.periods=12)
training.tbats.monthly.2 = subset(y2.monthly, end=length(y2.monthly)-5)
test.tbats.monthly.2 = subset(y2.monthly, start=length(y2.monthly)-4)

Yhat.ms.tbats.monthly = cumsum(c(training.tbats.monthly.2[length(training.tbats.monthly.2)],
                                 dYhat.ms.tbats.monthly))
Yhat.ms.tbats.monthly = ts(Yhat.ms.tbats.monthly, start = c(2022,01), frequency=12)

autoplot(training.tbats.monthly.2) + 
  autolayer(Yhat.ms.tbats.monthly, series="Point Forecasts") + 
  autolayer(test.tbats.monthly.2, series="Test Set", alpha = 0.7) + 
  ggtitle("Multi-Step TBATS Monthly Forecasts of Frequency of Fire Incidents") + 
  xlab("Date") +
  ylab("Frequency of Fire Incidents") + 
  theme(legend.position = "bottom", title=element_text(size=10)) + 
  theme(axis.text.x = element_text(angle = 90))

dif.test.tbats.2.monthly = tbats(dif.test.tbats.2.monthly)
accuracy(dif.test.tbats.2.monthly)

#multi-step ARIMA
training.arima.monthly = subset(gso.fire.ts.month.2, end=length(gso.fire.ts.month.2)-5)
test.arima.monthly = subset(gso.fire.ts.month.2, start=length(gso.fire.ts.month.2)-4)
train.arima.monthly.3 = auto.arima(training.arima.monthly)
fc.train.arima.monthly = forecast(train.arima.monthly.3, h=5)

autoplot(fc.train.arima.monthly) + 
  autolayer(test.arima.monthly, series="Test Set", alpha = 0.7) + 
  ggtitle("Multi-Step ARIMA Monthly Forecasts of Frequency of Fire Incidents") + 
  xlab("Date") +
  ylab("Frequency of Fire Incidents") + 
  coord_cartesian(xlim = c(2018, 2023)) +
  theme(legend.position = "bottom") + 
  theme(axis.text.x = element_text(angle = 90), title = element_text(size=9), 
        legend.position = "bottom")

test.arima.monthly.3 = arima(test.arima.monthly)
accuracy(test.arima.monthly.3)

#multi-step HoltWinters
dif.training.HW.monthly = subset(gso.fire.ts.month.2.dif, end=length(gso.fire.ts.month.2.dif)-5)
dif.test.HW.monthly = subset(gso.fire.ts.month.2.dif, start=length(gso.fire.ts.month.2.dif)-4)
dif.train.HW.monthly = HoltWinters(dif.training.HW.monthly)
dif.fc.HW = forecast::forecast(dif.train.HW.monthly, h=5)

dYhat.ms.HW = dif.fc.HW$mean
Yhat.ms.HW = cumsum(c(gso.fire.ts.month.2[length(gso.fire.ts.month.2)],dYhat.ms.HW))
Yhat.ms.HW = ts(Yhat.ms.HW, start = c(2022,01,01), frequency=12)

training.HW.monthly = subset(gso.fire.ts.month.2, end=length(gso.fire.ts.month.2)-5)
test.HW.monthly = subset(gso.fire.ts.month.2, start=length(gso.fire.ts.month.2)-4)
true.count.monthly = ts(gso.fire.ts.month.2, start=c(2010,07,01), end=c(2021,12,31), frequency=12)

autoplot(training.HW.monthly) + 
  autolayer(Yhat.ms.HW, series = "Point Forecasts") + 
  autolayer(test.HW.monthly, series = "Test Set") + 
  ggtitle("Multi-Step HoltWinters Monthly Forecasts of Frequency of Fire Incidents") + 
  xlab("Date") +
  ylab("Frequency of Fire Incidents") +
  theme(axis.text.x = element_text(angle = 90), legend.position = "bottom", 
        title = element_text(size = 9))

accuracy(Yhat.ms.HW, gso.fire.ts.month.2)

#multi-step HoltWinters
dif.training.HW.monthly = subset(gso.fire.ts.month.2.dif, end=length(gso.fire.ts.month.2.dif)-5)
dif.test.HW.monthly = subset(gso.fire.ts.month.2.dif, start=length(gso.fire.ts.month.2.dif)-4)
dif.train.HW.monthly = HoltWinters(dif.training.HW.monthly)
dif.fc.HW = forecast::forecast(dif.train.HW.monthly, h=5)

dYhat.ms.HW = dif.fc.HW$mean
Yhat.ms.HW = cumsum(c(gso.fire.ts.month.2[length(gso.fire.ts.month.2)],dYhat.ms.HW))
Yhat.ms.HW = ts(Yhat.ms.HW, start = c(2022,01,01), frequency=12)

training.HW.monthly = subset(gso.fire.ts.month.2, end=length(gso.fire.ts.month.2)-5)
test.HW.monthly = subset(gso.fire.ts.month.2, start=length(gso.fire.ts.month.2)-4)
true.count.monthly = ts(gso.fire.ts.month.2, start=c(2010,07,01), end=c(2021,12,31), frequency=12)

autoplot(training.HW.monthly) + 
  autolayer(Yhat.ms.HW, series = "Point Forecasts") + 
  autolayer(test.HW.monthly, series = "Test Set") + 
  ggtitle("Multi-Step HoltWinters Monthly Forecasts of Frequency of Fire Incidents") + 
  xlab("Date") +
  ylab("Frequency of Fire Incidents") +
  theme(axis.text.x = element_text(angle = 90), legend.position = "bottom", 
        title = element_text(size = 9))

accuracy(Yhat.ms.HW, gso.fire.ts.month.2)

#Time series combination for monthly
train.monthly.2 = window(gso.fire.ts.month.2, end=c(2021, 05))
h.monthly.2 = length(gso.fire.ts.month.2) - length(train.monthly.2)

#ARIMA model 
ARIMA.monthly.2 = forecast::forecast(auto.arima(train.monthly.2, lambda=0, biasadj=TRUE),
                                   h=h.monthly.2)

#TBATS model 
train.tbats.monthly.2 = msts(gso.fire.ts.month.2.dif, end=c(2021,05), seasonal.periods = 12)
tbats.monthly.2 = forecast::forecast(tbats(train.tbats.monthly.2, biasadj = TRUE),
                                     h=h.monthly.2)
dYhat.combo.monthly.2 = tbats.monthly.2$mean

Yhat.combo.monthly.2 = cumsum(c(train.monthly.2[length(train.monthly.2)], dYhat.combo.monthly.2))
Yhat.combo.monthly.2 = ts(Yhat.combo.monthly.2, start = c(2021,05), frequency = 12)

#HoltWinters Model 
HW.monthly.2 = forecast::forecast(HoltWinters(gso.fire.ts.month.2.dif), h=h.monthly.2)

dYhat.HW.monthly.2 = HW.monthly.2$mean

Yhat.HW.monthly.2 = cumsum(c(train.monthly.2[length(train.monthly.2)],dYhat.HW.monthly.2))
Yhat.HW.monthly.2 = ts(Yhat.HW.monthly.2, start = c(2021,05), frequency=12)
Combination.monthly.2 = (ARIMA.monthly.2[["mean"]] + Yhat.combo.monthly.2 + 
                           Yhat.HW.monthly.2)/3

y.monthly = msts(gso.fire.ts.month.2, seasonal.periods=12)
test.monthly = subset(y.monthly, start=length(y.monthly)-12)

autoplot(train.monthly.2) +
  autolayer(test.monthly, series="True Count") +
  autolayer(Combination.monthly.2, series="Combination", alpha=0.8) +
  autolayer(ARIMA.monthly.2, series="ARIMA", PI=F) +
  autolayer(Yhat.combo.monthly.2, series="TBATS", alpha=0.7) +
  autolayer(Yhat.HW.monthly.2, series="HoltWinters", alpha=0.7) + 
  xlab("Date") +
  ylab("Frequency of Fire Incidents") +
  ggtitle("Time Series Combination for Monthly Frequency of Fire Incidents") + 
  theme(axis.text.x = element_text(angle = 90), legend.position = "bottom", 
        title = element_text(size = 9)) 

c(ARIMA = accuracy(ARIMA.monthly.2, gso.fire.ts.month.2)["Test set", "RMSE"],
  TBATS = accuracy(Yhat.combo.monthly.2, gso.fire.ts.month.2)["Test set", "RMSE"],
  HW = accuracy(Yhat.HW.monthly.2, gso.fire.ts.month.2)["Test set", "RMSE"],
  Combination =
    accuracy(Combination.monthly.2, gso.fire.ts.month.2)["Test set", "RMSE"])

Modeling of Response Time

gso.fire.filtered = gso.fire %>%
  filter(response_time_seconds < cutoff.response_time) %>% 
  dplyr::select(response_time_seconds, everything())

gso.fire.filtered = gso.fire.filtered %>%
  mutate(AlarmTime = case_when(AlarmHour >= 0 & AlarmHour < 6 ~ "0-5", 
                               AlarmHour >= 6 & AlarmHour < 12 ~ "6-11",
                               AlarmHour >= 12 & AlarmHour < 18 ~ "12-17",
                               TRUE ~ "18-23"))

Linear Regression Model

#manual variable selection
lm.res.time = lm(response_time_seconds ~ TotalStaffOnIncident + FireDistrict + DayOfWeek + 
                   shift + Month + AlarmTime + NatureCode, data = gso.fire.filtered)
summary(lm.res.time)
round(coef(lm.res.time), 4)

lm.res.time.select = stepAIC(lm.res.time)
summary(lm.res.time.select)

#Lasso variable selection
las.Mod <- glmnet(as.matrix(gso.fire.filtered[,-1]), gso.fire.filtered$response_time_seconds, family = "gaussian")
plot(las.Mod)

Random Forest

#Random Forrest with ranger function
gso.fire.filtered.rf = gso.fire.filtered %>%
  dplyr::select(response_time_seconds,TotalStaffOnIncident , FireDistrict , DayOfWeek , 
           shift , Month , AlarmTime , NatureCode)

RF.Mod.Ranger <- ranger(response_time_seconds~TotalStaffOnIncident + FireDistrict + DayOfWeek + 
                         shift + Month + AlarmTime + NatureCode,
                       data = gso.fire.filtered.rf[complete.cases(gso.fire.filtered.rf),], importance = "impurity", num.trees = 500)

variable = names(RF.Mod.Ranger$variable.importance)
rf.vIMP.df = data.frame(variable = variable, vIMP = RF.Mod.Ranger$variable.importance)

ggplot(rf.vIMP.df, aes(x=reorder(variable,vIMP), y=vIMP,fill=variable))+ 
  geom_bar(stat="identity", position="dodge", show.legend = FALSE)+ coord_flip()+
  ylab("Variable Importance")+
  xlab("")+
  ggtitle("Information Value Summary")

Statistical Analysis of Fire Incidents in the City of Greensboro, NC

Danielle Mirzaeian, Nahom Fisseha

Faculty Research Mentor Dr. Sayed Mostafa

Graduate Research Assistant Mentor Chinedu Jude Nzekwe

2022-06-22

Abstract

Introduction

Related Literature

Data, Methods, and Analysis

Data

Methods

Exploratory Data Analysis

Univariate Explorations

Bivariate Explorations

Time Series Visualization

Daily Time series

Monthly Time Series

Results

Time Series Forecasting

Daily number of fire incidents from 2010 to 2022

Monthly number of fire incidents from 2010 to 2022

Multi-Step Forecasting

Modeling of Response Time

Linear Regression

Random Forest

Conclusion

Discussion

References

Acknowledgements

Appendix

Loading necessary packages

Data Preprocessing

Filtering for incidents prior to June 1st 2022

Filter for property type 1-2 family dwellings

Convert to call process time to seconds and add to gso.fire

Convert response time to seconds and add to gso.fire

Convert total response time to seconds and add to gso.fire

Cutoff point for response time

Univariate graphs

Bivariate Graphs

Time Series Visualization

Time Series Forecasting

Modeling of Response Time

Linear Regression Model

Random Forest