The research in this report examines twelve years of reported fire incidents in Greensboro, NC to further analyze how fire service response time is affected as well as how fire service response time affects the rate of total loss in household fires. Additionally, this report will provide a variety of forecasting models in order to predict the future in regards to reported fire incidents. To produce sufficient results a variety of statistical analysis approaches have been accomplished. We have visualized the data (univariate , bivariate, time series), forecasted the data (TBATS, ARIMA, Holtwinters), and built linear regression and random forest models in order to fully conquer the research questions at task. In result, we were able to identify what factors hold most significance in relation to response time as well as depict to what extent response time affects total loss. In addition we produced multiple forecasting models in order to accurately predict the future of reported fire incidents in Greensboro, NC.
Many calls are made to fire departments all the time. Whether it is a false alarm or not, it is important for first responders to take care of fire incidents as soon as possible. Therefore it is important to look at data about different fire incidents and see how the response time can be improved for all kinds of situations.
There are all kinds of fire incidents, some not even about fires. For example someone could be having a stroke and the local fire department was called to take action on the situation. If the incident is a fire, it could be at all kinds of different properties. Some different properties are small houses, apartment buildings, and highways. The fire could vary in size, like the wildfires in California.
No matter the type of fire incident. It is vital that first responders attend to the situation quickly. Now, there can be some factors that can affect the response time of the first responders. Some of those factors are the location, the type of property, and the type of the incident. From this project we want to explore which factors from the data have a significant affect on the response time of the fire department. By learning about these factors, the fire department could use this information to work on decreasing their response time.
For more than ten years, the city of Greensboro, NC has been collecting data on fire incidents in their area. Some of this data includes the date of the incident, the response time, the location, and the type of property. Yet, take note when looking at the data that it goes from July 1, 2010, to May 31, 2022. So for the years 2010 and 2022, only data for about half of those years was collected.
For this project we will be visualizing the fire incidents in Greensboro, NC. Our research questions are 1) What factors affect the response time to fire incidents? 2) How to improve the response time? 3) How does response time by fire emergency services affect the rate of total loss and civilian injuries? 4)Can we build models to forecast the number of fire incidents in certain areas of Greensboro based on the data from 2010 to 2022? 5) What Dynamic models can we make for forecasting fire incidents?
For this analysis, we utilized public data from the Greensboro Fire Department. This data set accounts for all reported fire incidents in the city of Greensboro and updates itself continuously. Although, we noticed there were some missing values in the data. So we did a map to check the percentage of missing values and it ended being about 14 percent for the whole data set. To further cater towards our research we filtered the data set in two areas; the date (from July 1, 2010 to May 31st, 2022) and the Property Use (“1-2 family dwelling”). In addition to this we converted ResponseTime to seconds. The initial data set can be found at this website link:https://data.greensboro-nc.gov/datasets/greensboro-fire-incidents/explore
The first step in our research project was exploratory data analysis. During this process we investigated the data looking for patterns and outliers. This was carried out by making various graphical representations of the data.
Data Visualizations consisting of one variable
For the histogram above, we learned that the typical response time falls between about 100 seconds to 400 seconds and that the distribution is normal.
From the histogram above, we can visually see the distribution of the amount of staff members reported towards a given incident. It is important to note that the Y-axis on this graph has been scaled. Overall, the great majority of total stall on incident is 4.
For the graph above, we can identify that there are not many occasions in which civilians have reported injured. There has only been two incidents in which two or more civilian injuries occurred. It is important to note that the Y-axis on this graph has been scaled. Overall, the great majority of civilian injuries is 0.
For the graph above, we can identify that there has only been a couple incidents in which fatalities have been reported in regards to civilians. It is important to note that the Y-axis on this graph has been scaled. Overall, the great majority of civilian fatalities is 0.
For the graph above, we can identify that there has only been a handful of incidents in which injuries have been reported in regards to fire service respondents. It is important to note that the Y-axis on this graph has been scaled. Overall, the great majority of fire service injuries is 0.
The histogram above displays the amount of property loss across the amount of incidents reported.
The histogram above displays the amount of total loss (property + content) across the amount of incidents reported.
From this bar graph, it can be seen that the frequency of calls to the fire stations are less during the middle of the night and increase as people start waking up.
For this graph, we wanted to find out which fire stations got the most calls. Note that the calls counted are from July 1, 2010 to Jun 1, 2022. From the results we can tell that the top five fire stations that got the most calls over ten years in Greensboro NC, was 7, 8, 14, 4, and 56.
For this bar plot, we wanted to find out which fire districts got the most calls in Greensboro, NC over ten years. The fire districts consist of a few fire stations. The most popular fire districts from this bar plot from greatest to least were Battalion II, Battalion I and Battalion III.
From this graph, we wanted to learn if there were some years that had a significant difference in number of phone calls to the fire stations in Greensboro, NC. By looking at those differences we could explore how response time varied between years that were busier than others. Note from this graph, that only half of the data was collected for the years 2010 and 2022, so those bars will be obviously lower than the bars of other years displayed in the graph.
This visual was created to look at if the number of calls to the fire station would vary depending on which shift was currently working. For this data, there were three shifts, A, B, and C. From the results, it can be seen that there is no obvious difference in the frequency of the calls to the fire station on different work shifts.
From this bar graph, it can be seen that the frequency of calls to the fire station over ten years ended up being about the same between each day of the week with no obvious differences.
From the bar plot above, it is shown that the first week and the last week of the year typically had fewer calls to the fire station compared to other weeks in the year. This graph had a little more variability in frequency in calls. For example, the dip in the middle of the bar plot. This plot helps us to look at the busier times in the data and compare the response times.
This visual was used to look further into if there was a higher frequency of calls to fire stations in certain parts of the year. From the results, June and February seem to have the lowest frequency of calls, while January and December have the highest frequency of calls. With this knowledge we can look at how response time varied with those months.
Data Visualizations of a given variable vs response time
From this box plot, we wanted to see if certain weeks would have notable differences in response time. Looking at the results, there does not seem to be much variation. Therefore, the weeks in the year might not have a significant impact on response time.
The box plot above shows that there is a lot of variability in response time depending on which station responded to the call. By looking at these results, we can explore and question why different fire stations have such big contrasts in response time.
For this graph above, we looked at nature code and how it affects response time. Nature code represents the nature of the fire incident, whether it was a fire or someone having a stroke. As shown by the graph, there is a lot of variability in response time depending on the nature of the fire incident.
From this scatter plot, we wanted to see if the number of staff on incident would affect the response time of the fire responders. The outcome of the graph shows that there is no significant change in response time when having more people on shift. The response time still falls between 100 and 600 seconds, no matter the number of responders.
From the scatter plot above, we wanted to visualize the relationship between response time and property loss. Through this we were able to identify a very slight correlation between the two. The majority of the higher property loss values fall between 100 and 400 seconds
From the scatter plot above, we wanted to visualize the relationship between response time and total loss. Through this we were able to identify a very slight correlation between the two. Similar to property loss, the majority of the higher total loss values fall between 100 and 400 seconds.
Below, the box and violin plots had similar results to the bar graphs presented earlier. Each graph will summarize briefly what have seen before.
For this violin and box plot, it can be seen that there is very little difference in response time between the days of the week.
From the plot above there is a slight increase in the average response time after the year 2013 and it continues to stay the same or slightly increase more.
For this graph it can be seen that different work shifts did not affect the response time.
As shown by the graph, the response time varies depending on which fire district responded to the fire incident.
For this box and violin plot, we learned that the time of month in the year has very little effect on the response time.
From this visual, we learned that alarm hour has a slight effect on response time, as shown by the dips in the plot.
For the graph above, there would sometimes be a couple alarms set off for a fire incident, but it did not affect the response time whether the alarm was once or twice.
For the graph above, we can see the how each level of civilian injury various in relations to response time.
For the graph above, we can see the how each level of civilian fatality various in relations to response time.
For the graph above, we can see the how each level of fire service injury various in relations to response time.
This section contains the visualization of all the data of the fire incidents in Greensboro, NC from July 1, 2010 to May 31, 2022.
Above is a time series plot of the daily number of fire incidents from July 1, 2010 to June 1st 2022. As shown by the plot, there were big spikes around 2014, 2018, and 2021.
To take a deeper look into the time series below is the daily time series graphs for the years 2010 to 2022.
Above is the time series of the daily fire incidents broken into plots by year. From these graphs we have a clearer view of peaks for each year. So from the results, we can see that there were major peeks around April 2011, March 2014, October 2018, and February 2021.
Above is a time series graph depicting the number of monthly fire incidents from July 1, 2010 to May 31 2022. The graph shows an upward trend but with drastic drops occurring at dates like November of 2019 and November 2021.
In order to forecast the data we had to convert it into time series objects with the ts function in R studio.
Before forecasting we checked the data for stationarity. Meaning that we are checking that the time series has a constant mean and variance over time.
To check if the data needs any first difference calculations then we used the ndiffs function. Also to check if there needs to be any seasonal difference calculations then we used the nsdiffs function.
#determine the number of differences required for time series to be made stationary
ndiffs(gso.fire.ts.2) #1
## [1] 1
nsdiffs(gso.fire.ts.2) #0
## [1] 0
From the results, the data needs one difference, so we used the diff function to do that.
Using the acf function we check that the data is stationary after we had done the first difference for the data.
#attempt at differencing
gso.fire.ts.2.dif = diff(gso.fire.ts.2)
#checking stationarity of first difference
acf(gso.fire.ts.2.dif)
Using the acf function we made sure that it did not take many lags to hit 0. From the results, the first difference made our data stationary.
The plot above shows the daily time series graph before and after the first difference was calculated.
After checking for stationarity we could begin forecasting.
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.112468 6.553429 4.981169 NaN Inf 0.4296495 0.001337923
We used the Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components (TBATS model) because it is a popular model that tends to perform well at forecasting time series.
For the plot above, we used the whole data set, from July 1, 2010 to May 31, 2022. We planned to forecast the fire incidents from June 1, 2010 to December 31, 2022. From the plot we can see that the TBATS was doing well, it seemed to predict a slow big dip that would eventually start going up again before 2022 ended. The dip though seems to go below the average at one point.
## ME RMSE MAE MPE MAPE MASE
## Training set 0.07291525 6.507242 4.929365 -4.066674 16.8887 0.6931919
## ACF1
## Training set -0.0002287735
We also chose the Autoregressive Integrated Moving Average (ARIMA) model because it has been known to perform competently. Compared to the TBATS model, the ARIMA automatically calculates the differences, so those don’t have to be done manually.
For the plot above, there was a slight increase briefly and then after that the forecast created a horizontal line. The horizontal line appears to be close to the average line of the data.
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.1338178 6.518173 4.943409 NaN Inf 0.4280165 0.001781049
In the plot above we can see that the TBATS model(light red) did not follow the true number of fire incidents(light blue). This appears to be because the data end three months earlier at March 1, 2022 to see how the predictions would appear. So around that time period there were some drastic drops in number of fire incidents which led to the TBATS model forecasting such a severe drop in incidents.
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0767143 6.502043 4.919923 -4.048972 16.86875 0.6907802
## ACF1
## Training set -0.0001927383
From the plot above, there was an increase briefly and then the forecast created a horizontal line. Plus, the horizontal line appears to be close to the average line of the data. So when ended the data set three months earlier the ARIMA model did better than the TBATS for predicting number of future fire incidents.
We again converted the data into a time series object us the ts function in order to use the functions for forecasting the time series data.
Then we checked the monthly data for stationarity.
We used the ndiffs and nsdiffs functions for checking if a first difference or seasonal difference had to be calculated
ndiffs(gso.fire.ts.2) #1
## [1] 1
nsdiffs(gso.fire.ts.2) #0
## [1] 0
So, our data needs one first difference and that is done with the diff function. After that, we used the acf function to make sure that the difference transformation made our data stationary.
Using the acf function again we made sure that it did not take many lags to hit 0. From the results, the first difference made our data stationary.
The plot above shows the monthly time series graph before and after the first difference was calculated.
Now that the monthly data is stationary we can being forecasting.
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -4.171551 56.04023 43.91867 -Inf Inf 0.3824204 -0.02345482
From the time series plot above, the TBATS model (red line) shows an increase for the future number of fire incidents. The line appears to fit fairly well with the trend of the data.
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 6.008386 66.80605 49.59427 0.2636763 5.173699 0.6704695 -0.0142654
In the plot above, it can be seen that the ARIMA model was producing similar spikes compared to the TBATS model and had done a fairly good job.
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -5.423912 81.16663 60.75395 -Inf Inf 0.7440428 -0.5099868
As seen above, the HoltWinters model predicted much larger spikes upwards compared to the TBATS model. As a result, with such a drastic increase it is likely it would be less accurate than the TBATS model for this data.
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.593745 7.71444 5.982271 -Inf Inf 0.4312064 -0.005924354
For the plot above, the data was split into a training set and a test set. The training set was from July 1, 2010 to December 31, 2022. The test set was from January 1, 2022 to May 31, 2022. The results show that the TBATS model did fairly well, but it did not follow the big dip in the true data.
From the accuracy function, we can see that the errors of the TBATS model were small. The root mean square error was only 7.71.
## ME RMSE MAE MPE MAPE MASE
## Training set -2.836533e-13 7.793247 5.969387 -6.113375 19.90425 0.7813334
## ACF1
## Training set 0.2085914
From the visual above, the data was again split into a training set and a test set. The training set was from July 1, 2010 to December 31, 2022. The test set was from January 1, 2022 to May 31, 2022. With the results of th plot and the accuracy function, the ARIMA model did fairly and had a root mean square error of only 7.79.
## ARIMA TBATS Combination
## 8.853813 19.038254 9.176881
As can be seen above, the data was split into training and test sets with the same dates as the two previous plots. From the results it can be seen that the TBATS model did not do that well and predicted a bigger decrease than what actually happened with the actual data.
From the accuracy function, we can see that the ARIMA and Combination models did better
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.7417519 170.9214 133.6995 123.8174 123.8174 0.5040507 -0.4291488
For this plot, it can be seen that the TBATS model did not expect such a big dip in the true data but it still predicted values that appear to fit well with the plot.
From the accuracy function, we had a root mean squared error of 170.92, which is pretty high but that is because of the big dip in the true data that the TBATS model was not able to predict.
## ME RMSE MAE MPE MAPE MASE
## Training set 1.364256e-13 114.6902 93.28 -1.371908 9.592816 0.6410997
## ACF1
## Training set -0.3627437
As you can see, the ARIMA model had the same issue as the TBATS model with the big dip in the true data.
The root mean squared error for this model ended up being 114.69, which is pretty high.
## ME RMSE MAE MPE MAPE ACF1 Theil's U
## Test set -27.91337 134.5553 99.61927 -4.403427 10.79837 -0.4022097 0.6941563
For the model above it can been seen that the HoltWinters method was also not able to predict the big dip in the true data, but what it did predict followed the trend of the data nicely.
From the accuracy function, the root mean squared error was 134.55, which is significantly high.
## ARIMA TBATS HW Combination
## 110.67193 79.13765 55.57372 76.41376
As shown by the plot above, the models were unable to predict the big dip in the true data that happened in 2020. But all the models followed the trend of the data fairly well.
From the accuracy function the root mean squared error for ARIMA was 110.67, TBATS’ was 79.14, HoltWinters’ was 55.57, and the Combination model’s was 76.41. All were fairly high because of the huge drop in the true data.
In this section the results of the predictive modeling of fire incidents is displayed.
There were two types of modeling done, linear regression and random forest. With this, we can evaluate the main factors that affect response time for fire incidents.
In order to successfully depict the affect of each given variable in relation to response time the first model we sought was a Linear Regression model. With this approach we are able to identify which surrounding variables from our original data set directly impact our independent variable which in this case is fire service response time. First we filtered our cutoff point in regards to response time and ran a linear regression model with all given variables that we identified to have a direct impact on response time. Then we used two different variable selection methods to finalize the surrounding dependent variables being used in the Linear Regression model Once we completed the variable selection process we were able to identify that the variables holding significance in correlation to response time were TotalStaffOnIncident, FireDistrict, DayOfWeek, shift, Month, AlarmHour, and NatureCode. When originally running this test a couple of our dependent variables had processed as one big character component not giving the model its true ability to compare all given components within that given variable. To solve this we mutated these variables as factors in order to unravel the different contents within each variables allowing the model to accurately compare each given variable to fire service response time. In return we were able to mathematically see the significance of each variable and its impact on response time as a whole.
First we created a linear regression model on all given variables that directly impact response time.
##
## Call:
## lm(formula = response_time_seconds ~ TotalStaffOnIncident + FireDistrict +
## DayOfWeek + shift + Month + AlarmTime + NatureCode, data = gso.fire.filtered)
##
## Residuals:
## Min 1Q Median 3Q Max
## -340.43 -47.60 -6.53 40.29 383.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 408.9655 45.3873 9.011 < 2e-16 ***
## TotalStaffOnIncident 0.9030 0.1409 6.410 1.46e-10 ***
## FireDistrictBattalion I -147.1994 14.6499 -10.048 < 2e-16 ***
## FireDistrictBattalion II -168.3688 14.6493 -11.493 < 2e-16 ***
## FireDistrictBattalion III -136.0269 14.6590 -9.279 < 2e-16 ***
## FireDistrictBattalion IV -135.2603 14.6599 -9.227 < 2e-16 ***
## FireDistrictBattalion V -140.2406 14.6571 -9.568 < 2e-16 ***
## FireDistrictColfax 43.1908 36.3430 1.188 0.234670
## FireDistrictFire District 13 -75.2848 15.9148 -4.731 2.24e-06 ***
## FireDistrictFire District 14 -72.6721 14.7392 -4.931 8.21e-07 ***
## FireDistrictGuilford College -158.4621 15.6616 -10.118 < 2e-16 ***
## FireDistrictHigh Point Fire District -163.8343 42.3830 -3.866 0.000111 ***
## FireDistrictMcLeansville -116.1506 17.8435 -6.509 7.57e-11 ***
## FireDistrictNortheast Fire 197.9903 75.8371 2.611 0.009036 **
## FireDistrictOak Ridge Fire Dept -48.9577 33.7024 -1.453 0.146324
## FireDistrictOther Protection Agency 82.9478 75.8323 1.094 0.274031
## FireDistrictPinecroft Sedgefield -88.8137 17.3627 -5.115 3.14e-07 ***
## FireDistrictPleasant Garden -3.0358 22.8157 -0.133 0.894149
## FireDistrictPTI Airport -70.8423 75.8393 -0.934 0.350248
## FireDistrictSummerfield 27.2998 21.2993 1.282 0.199942
## FireDistrictWhitsett Fire District 62.2344 75.8355 0.821 0.411847
## DayOfWeekMonday -0.2482 0.7727 -0.321 0.748055
## DayOfWeekSaturday -2.3148 0.7737 -2.992 0.002774 **
## DayOfWeekSunday -4.0342 0.7723 -5.224 1.76e-07 ***
## DayOfWeekThursday -0.2812 0.7799 -0.361 0.718390
## DayOfWeekTuesday 0.1089 0.7763 0.140 0.888393
## DayOfWeekWednesday -0.1706 0.7817 -0.218 0.827270
## shiftB 2.5005 0.5063 4.939 7.86e-07 ***
## shiftC -0.6240 0.5058 -1.234 0.217346
## Month02 -5.6342 1.0059 -5.601 2.13e-08 ***
## Month03 -7.6323 0.9891 -7.717 1.20e-14 ***
## Month04 -6.3495 0.9981 -6.361 2.00e-10 ***
## Month05 -7.5836 0.9854 -7.696 1.41e-14 ***
## Month06 -7.5963 1.0216 -7.436 1.05e-13 ***
## Month07 -8.5329 0.9951 -8.575 < 2e-16 ***
## Month08 -7.2737 0.9915 -7.336 2.21e-13 ***
## Month09 -7.4497 0.9959 -7.481 7.44e-14 ***
## Month10 -5.7232 0.9860 -5.805 6.46e-09 ***
## Month11 -5.1842 0.9944 -5.214 1.86e-07 ***
## Month12 -5.2021 0.9749 -5.336 9.52e-08 ***
## AlarmTime12-17 -49.9845 0.6514 -76.738 < 2e-16 ***
## AlarmTime18-23 -40.1459 0.6528 -61.502 < 2e-16 ***
## AlarmTime6-11 -41.2591 0.6701 -61.572 < 2e-16 ***
## NatureCode50MAJ -7.7294 56.8312 -0.136 0.891816
## NatureCode50MCI1 27.0040 85.9155 0.314 0.753287
## NatureCode50PD 45.0858 44.2582 1.019 0.308347
## NatureCode50PI 11.7468 43.1613 0.272 0.785498
## NatureCode50PIN 6.9150 50.3770 0.137 0.890822
## NatureCode50VOT 43.8551 67.9249 0.646 0.518513
## NatureCodeABDOM 26.3874 42.9878 0.614 0.539326
## NatureCodeACHRPD 60.0472 46.2310 1.299 0.193998
## NatureCodeACHRPI 49.6325 46.2311 1.074 0.283015
## NatureCodeALLERG 23.6859 43.0192 0.551 0.581917
## NatureCodeANBITE 11.7936 43.4519 0.271 0.786071
## NatureCodeASSLT 22.2219 43.0910 0.516 0.606066
## NatureCodeBCKPN 34.2506 43.0908 0.795 0.426703
## NatureCodeBLEED 22.0207 42.9813 0.512 0.608419
## NatureCodeBREATH 20.7730 42.9697 0.483 0.628789
## NatureCodeBRUSH 48.3069 43.6321 1.107 0.268235
## NatureCodeBURN 24.7300 43.6683 0.566 0.571181
## NatureCodeCARDIA 11.0232 42.9896 0.256 0.797631
## NatureCodeCHEST 21.0846 42.9715 0.491 0.623664
## NatureCodeCHOKE 16.0340 43.0508 0.372 0.709562
## NatureCodeCOALRM 119.6694 43.0211 2.782 0.005409 **
## NatureCodeCODVIO 73.9665 67.9347 1.089 0.276250
## NatureCodeCONFIN 29.0330 43.6706 0.665 0.506169
## NatureCodeDIABET 16.4890 42.9818 0.384 0.701256
## NatureCodeDROWN 10.8063 60.7568 0.178 0.858832
## NatureCodeEHAZ 46.7907 43.0476 1.087 0.277060
## NatureCodeELEC -12.2493 45.5716 -0.269 0.788089
## NatureCodeELEV -48.5650 48.4698 -1.002 0.316363
## NatureCodeEXPLO 15.8532 44.0853 0.360 0.719145
## NatureCodeEXPOS 14.7874 44.0274 0.336 0.736971
## NatureCodeEXTRI -7.2951 46.2297 -0.158 0.874614
## NatureCodeEYE 28.6906 48.9859 0.586 0.558084
## NatureCodeFALL 31.8106 42.9782 0.740 0.459206
## NatureCodeFALLFO 75.4685 43.2777 1.744 0.081192 .
## NatureCodeFIRAL 38.2288 42.9653 0.890 0.373596
## NatureCodeFIRAST 46.0743 43.0613 1.070 0.284635
## NatureCodeFIRINS -189.2691 85.9246 -2.203 0.027615 *
## NatureCodeFIRINV -107.5676 43.9085 -2.450 0.014294 *
## NatureCodeFSERV 88.0140 85.9199 1.024 0.305661
## NatureCodeFUELSP 36.5523 43.6190 0.838 0.402038
## NatureCodeGAS 36.8301 43.0052 0.856 0.391773
## NatureCodeHAR 5.3311 50.3750 0.106 0.915719
## NatureCodeHAZ 41.3331 44.4666 0.930 0.352615
## NatureCodeHAZMAT 43.4820 43.9721 0.989 0.322736
## NatureCodeHEAD 22.4047 43.0229 0.521 0.602534
## NatureCodeHEART 21.1062 42.9918 0.491 0.623472
## NatureCodeINACC 26.2219 45.6769 0.574 0.565920
## NatureCodeINVES -120.4648 45.8054 -2.630 0.008541 **
## NatureCodeLIGHT 108.4958 45.6771 2.375 0.017537 *
## NatureCodeMED 16.8593 43.1383 0.391 0.695931
## NatureCodeMEDALM 57.5180 43.4639 1.323 0.185721
## NatureCodeMEDEM 14.2228 43.0287 0.331 0.740990
## NatureCodeMEDNE 33.8170 85.9202 0.394 0.693887
## NatureCodeMUTUAL -17.8019 70.7769 -0.252 0.801411
## NatureCodeNORESP -103.4574 67.9310 -1.523 0.127767
## NatureCodeODOR 16.5666 43.1954 0.384 0.701331
## NatureCodeOSFIR 29.2011 43.0088 0.679 0.497167
## NatureCodeOVDOSE 23.8240 42.9877 0.554 0.579439
## NatureCodePREG 20.6101 43.0197 0.479 0.631880
## NatureCodePSYCH 31.0813 43.1266 0.721 0.471096
## NatureCodeSEIZUR 18.4210 42.9777 0.429 0.668202
## NatureCodeSERV 29.4488 42.9892 0.685 0.493328
## NatureCodeSICK 21.5386 42.9721 0.501 0.616215
## NatureCodeSMOKE 31.7279 43.1991 0.734 0.462671
## NatureCodeSTAB 25.4310 43.0629 0.591 0.554820
## NatureCodeSTRINJ -2.1027 49.6544 -0.042 0.966223
## NatureCodeSTROKE 20.8259 42.9779 0.485 0.627981
## NatureCodeSTRTRP 4.8910 45.2669 0.108 0.913958
## NatureCodeSTRUC -2.0314 43.0133 -0.047 0.962332
## NatureCodeTRAUM 26.2160 43.1103 0.608 0.543112
## NatureCodeTSUI 14.0629 43.7729 0.321 0.748007
## NatureCodeUNCON 20.7339 42.9724 0.482 0.629456
## NatureCodeUNKNFO 66.8186 49.6103 1.347 0.178025
## NatureCodeUNKNWN 16.4417 43.0290 0.382 0.702383
## NatureCodeVFIRE 20.0373 44.0782 0.455 0.649409
## NatureCodeWATER -86.4962 67.9272 -1.273 0.202890
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 74.4 on 130053 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.1166, Adjusted R-squared: 0.1158
## F-statistic: 145.5 on 118 and 130053 DF, p-value: < 2.2e-16
From the model above we are able to identify how each variable holds significance in terms of their impact on response time. We are also able to see which components inside each given variable hold impact and how much using the P-Value produced from the model.
Then we used the stepwise regression for our first method of variable selection.
## Start: AIC=1122051
## response_time_seconds ~ TotalStaffOnIncident + FireDistrict +
## DayOfWeek + shift + Month + AlarmTime + NatureCode
##
## Df Sum of Sq RSS AIC
## <none> 719839931 1122051
## - DayOfWeek 6 281067 720120999 1122090
## - shift 2 237672 720077603 1122090
## - TotalStaffOnIncident 1 227388 720067319 1122090
## - Month 11 627583 720467514 1122142
## - NatureCode 76 21137791 740977722 1125666
## - AlarmTime 3 34467031 754306963 1128133
## - FireDistrict 19 37785762 757625693 1128673
##
## Call:
## lm(formula = response_time_seconds ~ TotalStaffOnIncident + FireDistrict +
## DayOfWeek + shift + Month + AlarmTime + NatureCode, data = gso.fire.filtered)
##
## Residuals:
## Min 1Q Median 3Q Max
## -340.43 -47.60 -6.53 40.29 383.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 408.9655 45.3873 9.011 < 2e-16 ***
## TotalStaffOnIncident 0.9030 0.1409 6.410 1.46e-10 ***
## FireDistrictBattalion I -147.1994 14.6499 -10.048 < 2e-16 ***
## FireDistrictBattalion II -168.3688 14.6493 -11.493 < 2e-16 ***
## FireDistrictBattalion III -136.0269 14.6590 -9.279 < 2e-16 ***
## FireDistrictBattalion IV -135.2603 14.6599 -9.227 < 2e-16 ***
## FireDistrictBattalion V -140.2406 14.6571 -9.568 < 2e-16 ***
## FireDistrictColfax 43.1908 36.3430 1.188 0.234670
## FireDistrictFire District 13 -75.2848 15.9148 -4.731 2.24e-06 ***
## FireDistrictFire District 14 -72.6721 14.7392 -4.931 8.21e-07 ***
## FireDistrictGuilford College -158.4621 15.6616 -10.118 < 2e-16 ***
## FireDistrictHigh Point Fire District -163.8343 42.3830 -3.866 0.000111 ***
## FireDistrictMcLeansville -116.1506 17.8435 -6.509 7.57e-11 ***
## FireDistrictNortheast Fire 197.9903 75.8371 2.611 0.009036 **
## FireDistrictOak Ridge Fire Dept -48.9577 33.7024 -1.453 0.146324
## FireDistrictOther Protection Agency 82.9478 75.8323 1.094 0.274031
## FireDistrictPinecroft Sedgefield -88.8137 17.3627 -5.115 3.14e-07 ***
## FireDistrictPleasant Garden -3.0358 22.8157 -0.133 0.894149
## FireDistrictPTI Airport -70.8423 75.8393 -0.934 0.350248
## FireDistrictSummerfield 27.2998 21.2993 1.282 0.199942
## FireDistrictWhitsett Fire District 62.2344 75.8355 0.821 0.411847
## DayOfWeekMonday -0.2482 0.7727 -0.321 0.748055
## DayOfWeekSaturday -2.3148 0.7737 -2.992 0.002774 **
## DayOfWeekSunday -4.0342 0.7723 -5.224 1.76e-07 ***
## DayOfWeekThursday -0.2812 0.7799 -0.361 0.718390
## DayOfWeekTuesday 0.1089 0.7763 0.140 0.888393
## DayOfWeekWednesday -0.1706 0.7817 -0.218 0.827270
## shiftB 2.5005 0.5063 4.939 7.86e-07 ***
## shiftC -0.6240 0.5058 -1.234 0.217346
## Month02 -5.6342 1.0059 -5.601 2.13e-08 ***
## Month03 -7.6323 0.9891 -7.717 1.20e-14 ***
## Month04 -6.3495 0.9981 -6.361 2.00e-10 ***
## Month05 -7.5836 0.9854 -7.696 1.41e-14 ***
## Month06 -7.5963 1.0216 -7.436 1.05e-13 ***
## Month07 -8.5329 0.9951 -8.575 < 2e-16 ***
## Month08 -7.2737 0.9915 -7.336 2.21e-13 ***
## Month09 -7.4497 0.9959 -7.481 7.44e-14 ***
## Month10 -5.7232 0.9860 -5.805 6.46e-09 ***
## Month11 -5.1842 0.9944 -5.214 1.86e-07 ***
## Month12 -5.2021 0.9749 -5.336 9.52e-08 ***
## AlarmTime12-17 -49.9845 0.6514 -76.738 < 2e-16 ***
## AlarmTime18-23 -40.1459 0.6528 -61.502 < 2e-16 ***
## AlarmTime6-11 -41.2591 0.6701 -61.572 < 2e-16 ***
## NatureCode50MAJ -7.7294 56.8312 -0.136 0.891816
## NatureCode50MCI1 27.0040 85.9155 0.314 0.753287
## NatureCode50PD 45.0858 44.2582 1.019 0.308347
## NatureCode50PI 11.7468 43.1613 0.272 0.785498
## NatureCode50PIN 6.9150 50.3770 0.137 0.890822
## NatureCode50VOT 43.8551 67.9249 0.646 0.518513
## NatureCodeABDOM 26.3874 42.9878 0.614 0.539326
## NatureCodeACHRPD 60.0472 46.2310 1.299 0.193998
## NatureCodeACHRPI 49.6325 46.2311 1.074 0.283015
## NatureCodeALLERG 23.6859 43.0192 0.551 0.581917
## NatureCodeANBITE 11.7936 43.4519 0.271 0.786071
## NatureCodeASSLT 22.2219 43.0910 0.516 0.606066
## NatureCodeBCKPN 34.2506 43.0908 0.795 0.426703
## NatureCodeBLEED 22.0207 42.9813 0.512 0.608419
## NatureCodeBREATH 20.7730 42.9697 0.483 0.628789
## NatureCodeBRUSH 48.3069 43.6321 1.107 0.268235
## NatureCodeBURN 24.7300 43.6683 0.566 0.571181
## NatureCodeCARDIA 11.0232 42.9896 0.256 0.797631
## NatureCodeCHEST 21.0846 42.9715 0.491 0.623664
## NatureCodeCHOKE 16.0340 43.0508 0.372 0.709562
## NatureCodeCOALRM 119.6694 43.0211 2.782 0.005409 **
## NatureCodeCODVIO 73.9665 67.9347 1.089 0.276250
## NatureCodeCONFIN 29.0330 43.6706 0.665 0.506169
## NatureCodeDIABET 16.4890 42.9818 0.384 0.701256
## NatureCodeDROWN 10.8063 60.7568 0.178 0.858832
## NatureCodeEHAZ 46.7907 43.0476 1.087 0.277060
## NatureCodeELEC -12.2493 45.5716 -0.269 0.788089
## NatureCodeELEV -48.5650 48.4698 -1.002 0.316363
## NatureCodeEXPLO 15.8532 44.0853 0.360 0.719145
## NatureCodeEXPOS 14.7874 44.0274 0.336 0.736971
## NatureCodeEXTRI -7.2951 46.2297 -0.158 0.874614
## NatureCodeEYE 28.6906 48.9859 0.586 0.558084
## NatureCodeFALL 31.8106 42.9782 0.740 0.459206
## NatureCodeFALLFO 75.4685 43.2777 1.744 0.081192 .
## NatureCodeFIRAL 38.2288 42.9653 0.890 0.373596
## NatureCodeFIRAST 46.0743 43.0613 1.070 0.284635
## NatureCodeFIRINS -189.2691 85.9246 -2.203 0.027615 *
## NatureCodeFIRINV -107.5676 43.9085 -2.450 0.014294 *
## NatureCodeFSERV 88.0140 85.9199 1.024 0.305661
## NatureCodeFUELSP 36.5523 43.6190 0.838 0.402038
## NatureCodeGAS 36.8301 43.0052 0.856 0.391773
## NatureCodeHAR 5.3311 50.3750 0.106 0.915719
## NatureCodeHAZ 41.3331 44.4666 0.930 0.352615
## NatureCodeHAZMAT 43.4820 43.9721 0.989 0.322736
## NatureCodeHEAD 22.4047 43.0229 0.521 0.602534
## NatureCodeHEART 21.1062 42.9918 0.491 0.623472
## NatureCodeINACC 26.2219 45.6769 0.574 0.565920
## NatureCodeINVES -120.4648 45.8054 -2.630 0.008541 **
## NatureCodeLIGHT 108.4958 45.6771 2.375 0.017537 *
## NatureCodeMED 16.8593 43.1383 0.391 0.695931
## NatureCodeMEDALM 57.5180 43.4639 1.323 0.185721
## NatureCodeMEDEM 14.2228 43.0287 0.331 0.740990
## NatureCodeMEDNE 33.8170 85.9202 0.394 0.693887
## NatureCodeMUTUAL -17.8019 70.7769 -0.252 0.801411
## NatureCodeNORESP -103.4574 67.9310 -1.523 0.127767
## NatureCodeODOR 16.5666 43.1954 0.384 0.701331
## NatureCodeOSFIR 29.2011 43.0088 0.679 0.497167
## NatureCodeOVDOSE 23.8240 42.9877 0.554 0.579439
## NatureCodePREG 20.6101 43.0197 0.479 0.631880
## NatureCodePSYCH 31.0813 43.1266 0.721 0.471096
## NatureCodeSEIZUR 18.4210 42.9777 0.429 0.668202
## NatureCodeSERV 29.4488 42.9892 0.685 0.493328
## NatureCodeSICK 21.5386 42.9721 0.501 0.616215
## NatureCodeSMOKE 31.7279 43.1991 0.734 0.462671
## NatureCodeSTAB 25.4310 43.0629 0.591 0.554820
## NatureCodeSTRINJ -2.1027 49.6544 -0.042 0.966223
## NatureCodeSTROKE 20.8259 42.9779 0.485 0.627981
## NatureCodeSTRTRP 4.8910 45.2669 0.108 0.913958
## NatureCodeSTRUC -2.0314 43.0133 -0.047 0.962332
## NatureCodeTRAUM 26.2160 43.1103 0.608 0.543112
## NatureCodeTSUI 14.0629 43.7729 0.321 0.748007
## NatureCodeUNCON 20.7339 42.9724 0.482 0.629456
## NatureCodeUNKNFO 66.8186 49.6103 1.347 0.178025
## NatureCodeUNKNWN 16.4417 43.0290 0.382 0.702383
## NatureCodeVFIRE 20.0373 44.0782 0.455 0.649409
## NatureCodeWATER -86.4962 67.9272 -1.273 0.202890
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 74.4 on 130053 degrees of freedom
## (4 observations deleted due to missingness)
## Multiple R-squared: 0.1166, Adjusted R-squared: 0.1158
## F-statistic: 145.5 on 118 and 130053 DF, p-value: < 2.2e-16
From this result we are able to see in further in detail which components affect response time and at what level due to the stepwise regression model eliminating the components that have little to no impact on response time.
After that we also tried LASSO variable selection using the glmnet function.
## Warning in storage.mode(xd) <- "double": NAs introduced by coercion
## Length Class Mode
## a0 79 -none- numeric
## beta 5214 dgCMatrix S4
## df 79 -none- numeric
## dim 2 -none- numeric
## lambda 79 -none- numeric
## dev.ratio 79 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## call 4 -none- call
## nobs 1 -none- numeric
This result allows us to view the level of significance in terms of each coefficient due to the shrinkage that this model produces.
As an alternative to our linear regression models we ran a tree based random forest model. For this model we produced a bar graph to display the significance in each variable from a model we created using the Ranger() command. In this graph we are able to visually identify how much the models accuracy would be affected by each variable.
After producing the bar plot we are able to see that Nature Code holds the most importance in relation to the model followed by Fire District and Alarm Time.
Our extensive research has provided us with the ability to provide sufficient results for each of our research questions. Throughout this report, we have worked on the public data set provided by the Greensboro Fire Department which holds all reported fire incidents in Greensboro, NC from July 1, 2010, to May 31, 2022. Firstly, to attack our research questions on the topics of analyzing response time and forecasting future fire incidents we explored our data set to a great extent making multiple univariate and bivariate visualizations. We then followed this by using a variety of different forecasting and regression models to produce high-accuracy results.
While identifying what factors affect fire service response time and how to improve fire service response time, we first visualized the relationship between response time and every other tested variable. Through this, we were able to decide which variables to apply to a linear regression model to produce the level of significance that each variable holds with response time. In our initial linear regression model, we tested the variables; TotalStaffonIncident, FireDistrict, DayofWeek, shift, Month, AlarmTime, and NatureCode. We then used two different variable selection methods (Stepwise and LASSO) and analyzed the results between the three models. Following the results from our linear regression models, we ran a random forest model to display the significance that each given variable obtains. With our results from this process, we were able to conclude that NatureCode followed by FireDistrict and AlarmTime held the most significance in relation to response time.
In regards to the effect response time has on the total loss we were able to quickly identify this through our bivariate visualizations. In this process, we compared the variables PropertyLoss, TotalLosses, CivilianInjuries, CivilianFatalities, FireServiceInjuries, and FireServiceFatalities to response time and produced scatter plots and a combination of box and violin plots to visualize the relationship between the two. After making our property and total loss visualizations we were able to see a very slight relationship between them and response time. Throughout the different distributions of response time, both variables seemed to linger around the middle betwen 100 and 400 seconds. Contrary to belief, it’s safe to conclude that most of the time there is not much effect response time has on total and property loss. Given the situation, there are definitely occasions where total and property loss are affected massively depending on the speed at which the fire service responds to the call. In regards to Civilian and Fire Service injuries and fatalities, there were very few situations where either occurred across the data set. From our visualizations, we can see that across the twelve years there have only been ten total civilian fatalities, two occasions where more than two civilians were reported injured, six occasions where more than one fire service injury occurred, and absolutely no fatalities within the fire department ever. Through our results, we were able to conclude not much significance in the relationship between fire service response time and total loss.
Furthermore, used many models to forecast the daily and monthly frequencies of fire incidents in Greensboro, NC. The models that we used were TBATS, ARIMA, Holt-Winters, and Combination. All the models tended to do fairly well except when there was a drastic drop in the frequency of fire incidents in 2020. But in general, the models followed the trend of the plots nicely. The reason for that drop could have been covid and when people stayed home a lot more. From our research, we found that the models did fairly well, and sometimes some models did better than others.
In this report we worked intensively in order to analyze the key factors for reported fire incidents in Greensboro, NC using all reported data from June 1, 2010 to May 31, 2022. When further analyzing the data set given we were able to identify through our exploratory data analysis as well as regression and forecasting models how each variable holds significance in relations to fire service response time and successfully forecast the future in regards to reported fire incidents in Greensboro, NC. We also were able to further visualize the effects of response time on total loss.
Although our extensive data set equips us with a variety of different variables over a long time period, it is not perfect due to missing values within the data set and human error from the importation of the reported data. With us having to cater a bit to an imperfect data set we were still able to produce highly accurate results with the amount of information given. With this further exploration we were then able to forecast an accurate prediction in terms of reported fire incidents for the rest of the year.
With our research results there are now many safe assumptions that can be made behind what holds affect within the different components that we studied in reported fire incidents. Through our exploratory data analysis we can see the affects of all given variables in relation to response time. Our regression models accurately depict to what extent the importance of each given variable has on the speed of response time. And lastly, our forecasting model equips the Greensboro Fire Department with an accurate prediction to continue through the rest of the year under maximum efficiency.
Looking through our findings there are many different observations that can be made in order to minimize response time thus leading towards safer situations. We were able to identify the difference in response time on any given day, week, year and even looked at how different response time varies in terms of the working shift for the fire service workers. Our findings also highlighted how much response time effects losses in terms of civilian injuries and fatalities, fire service injuries and fatalities, and total property loss. All of our discoveries from this research help equip the Greensboro Fire Department with high level information in order to perform they’re job to maximum efficiency as well as minimizing response time in order to assure the safest outcome possible.
1 Aleisa, E., & Savsar, M. (2014). Response Time Analysis of firefighting operations using discrete event simulation. IEEE Xplore. Retrieved July 26, 2022, from https://ieeexplore.ieee.org/abstract/document/7046087
2 Buffington, T., & Ezekoye, O. A. (2019). Statistical Analysis of Fire Department Response Times … - researchgate. Retrieved July 26, 2022, from https://www.researchgate.net/publication/333158843_Statistical_Analysis_of_Fire_Department_Response_Times_and_Effects_on_Fire_Outcomes_in_the_United_States
3 Challands, N. (n.d.). The relationships between fire service response time and … - researchgate. Retrieved July 26, 2022, from https://www.researchgate.net/publication/226273683_The_Relationships_Between_Fire_Service_Response_Time_and_Fire_Outcomes
4 KC, K. and Corcoran, J., 2017. Modelling residential fire incident response times: A spatial analytic approach. Applied Geography, 84, pp.64-74. https://doi.org/10.1016/j.apgeog.2017.03.004
5 Lian, X., Melancon, S., Presta, J.-R., Reevesman, A., Spiering, B. J., & Woodbridge, D. M.-kyung. (2019). Scalable real-time prediction and analysis of San Francisco Fire Department Response Times. IEEE Xplore. Retrieved July 26, 2022, from https://ieeexplore.ieee.org/document/9060320
6 Mukhopadhyay, A., Vorobeychik, Y., Dubey, A., & Biswas, G. (2017, May 1). Prioritized allocation of emergency responders based on a continuous-time incident prediction model: Proceedings of the 16th conference on autonomous agents and multiagent systems. ACM Other conferences. Retrieved July 26, 2022, from https://dl.acm.org/doi/10.5555/3091125.3091154
7 Yuanwei Li, Sikui Zhang, and Guoyi Fu. 2022. Forest fire modeling and analysis based on K-means clustering algorithm and time series forecasting. In 2022 2nd International Conference on Bioinformatics and Intelligent Computing (BIC 2022), January 21–23, 2022, Harbin, China. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3523286.3524560
The authors acknowledge financial support by NSF Grant HRD#1719498 and NSA Grant #H98230-22-1-0017.
This section contains the code to reproduce the analysis described in this report.
Below is a list of the libraries that we used throughout the report.
library(readr)
library(lubridate)
library(ggplot2)
library(dslabs)
library(dplyr)
library(ggrepel)
library(ggthemes)
library(broom)
library(glmnet)
library(MASS)
library(tidyverse)
library(Amelia)
library(mice)
library(caret)
library(forecast)
library(urca)
library(randomForest)
library(ranger)
library(astsa)
gso.fire <- read_csv("Greensboro_Fire_Incidents.csv")
gso.fire =
gso.fire %>%
mutate(AlarmDate2 = make_date(Year, Month, Day)) %>%
filter(AlarmDate2 < '2022-06-01')
gso.fire = gso.fire %>%
filter(PropertyUse == '419 - 1 or 2 family dwelling')
gso.fire = gso.fire %>%
mutate(call_process_period = lubridate::hms(CallProcessingTime),
call_process_seconds = period_to_seconds(call_process_period))
gso.fire = gso.fire %>%
mutate(response_time_period = lubridate::hms(ResponseTime),
response_time_seconds = period_to_seconds(response_time_period))
gso.fire = gso.fire %>%
mutate(total_response_period = lubridate::hms(TotalResponseTime),
total_response_seconds = period_to_seconds(total_response_period))
cutoff.response_time = gso.fire %>%
dplyr::select(response_time_seconds) %>%
filter(!is.na(response_time_seconds)) %>%
summarize(c = quantile(response_time_seconds, .75) + 3 * IQR(response_time_seconds)) %>%
pull(c)
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = response_time_seconds, fill = "red")) +
geom_histogram(show.legend = FALSE, color = "black") +
xlab("Response Time (Seconds)") +
ylab("Count") +
ggtitle("Distribution of Response Time") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = TotalStaffOnIncident)) +
geom_histogram(bins = 25,show.legend = FALSE, color = "black", fill = "pink") +
scale_y_log10() +
xlab("Total Staff on Incident") +
ylab("Count (Log10 scale)") +
ggtitle("Distribution of Total Staff on Incident") +
theme_economist() +
xlim(0,25)
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
count(CivilianInjuries) %>%
mutate(nlog = log10(n)) %>%
ggplot(aes(x = CivilianInjuries, y = nlog)) +
geom_col(color = "black", fill = "pink") +
xlab("Civilian Injuries") +
ylab("Count (Log10 scale)") +
ggtitle("Distribution of Civilian Injuries") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
count(CivilianFatalities) %>%
mutate(nlog = log10(n)) %>%
ggplot(aes(x = CivilianFatalities, y = nlog)) +
geom_col(color = "black", fill = "pink") +
xlab("Civilian Fatalities") +
ylab("Count (Log10 scale)") +
ggtitle("Distribution of Civilian Fatalities") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
count(FireServiceInjuries) %>%
mutate(nlog = log10(n)) %>%
ggplot(aes(x = FireServiceInjuries, y = nlog)) +
geom_col(color = "black", fill = "pink") +
xlab("Fire Service Injuries") +
ylab("Count (Log10 scale)") +
ggtitle("Distribution of Fire Service Injuries") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = PropertyLoss)) +
geom_histogram(bins = 20,show.legend = FALSE, color = "black", fill = "pink") +
scale_x_log10() +
xlab("Property Loss (Log10 scale)") +
ylab("Count") +
ggtitle("Distribution of Property Loss") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = TotalLosses)) +
geom_histogram(bins = 20,show.legend = FALSE, color = "black", fill = "pink") +
scale_x_log10() +
xlab("Total Loss (Log10 scale)") +
ylab("Count") +
ggtitle("Distribution of Total Loss") +
theme_economist()
gso.fire %>%
ggplot(aes(x = AlarmHour)) +
geom_bar(color = "black", fill = "pink") +
scale_x_continuous(breaks = seq(0,23,1)) +
theme(axis.text.x = element_text(angle = 45)) +
xlab("Alarm Hour") +
ylab("Count") +
ggtitle("Frequency of Fire Alarms by Alarm Hour")
gso.fire %>%
count(station) %>%
mutate(station = reorder(station, n)) %>%
ggplot(aes(x = station, y = n)) +
geom_bar(stat = "identity", color = "black", fill = "pink") +
theme(axis.text.y = element_text(size = 6)) +
xlab("Fire Station (Station Number)") +
ylab("Count") +
ggtitle("Frequency of Calls by Fire Station") +
coord_flip()
gso.fire %>%
count(FireDistrict) %>%
mutate(FireDistrict = reorder(FireDistrict, n)) %>%
ggplot(aes(x = FireDistrict, y = n)) +
geom_bar(stat = "identity", color = "black", fill = "pink") +
theme(axis.text.y = element_text(size = 6)) +
xlab("Fire Station (Station Number)") +
ylab("Count") +
ggtitle("Frequency of Calls by Fire District") +
coord_flip()
gso.fire %>%
count(Year) %>%
ggplot(aes(x = Year, y = n)) +
geom_bar(stat = "identity", fill = "pink", color = "black") +
scale_x_continuous(breaks = seq(2010,2022,1)) +
xlab("Year") +
ylab("Count") +
ggtitle("Frequency of Calls by Year")
gso.fire %>%
ggplot(aes(x = shift)) +
geom_bar(show.legend = FALSE, fill = "pink", color = "black") +
xlab("Shift") +
ylab("Count") +
ggtitle("Frequency of Calls by Shift")
gso.fire %>%
ggplot(aes(x = DayOfWeek)) +
geom_bar(fill = "pink", color = "black") +
xlab("Days of the Week") +
ylab("Count") +
ggtitle("Frequency of Calls by Days of the Week")
gso.fire %>%
ggplot(aes(x = Week)) +
geom_bar(fill = "pink", color = "black") +
scale_x_continuous(breaks = seq(1,53,1)) +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Week") +
ylab("Count") +
ggtitle("Frequency of Calls by Week")
gso.fire %>%
ggplot(aes(x = Month)) +
geom_bar(fill = "pink", color = "black") +
xlab("Month") +
ylab("Count") +
ggtitle("Frequency of Calls by Month")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = Week, y = response_time_seconds, group = Week)) +
geom_boxplot() +
scale_x_continuous(breaks = seq(1,53,1)) +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Week") +
ylab("Response Time (Seconds)") +
ggtitle("Week vs Response Time")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = station, y = response_time_seconds)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90)) +
xlab("Station") +
ylab("Response Time (Seconds)") +
ggtitle("Station vs Response Time")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = AlarmHour, y = response_time_seconds, group = AlarmHour)) +
geom_boxplot() +
scale_x_continuous(breaks = seq(0,23,1)) +
xlab("Alarm Hour") +
ylab("Response Time (Seconds)") +
ggtitle("Alarm Hour vs Response Time")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = NumberOfAlarms, y = response_time_seconds, group = NumberOfAlarms)) +
geom_boxplot(show.legend = FALSE) +
xlab("Number of Alarms") +
ylab("Response Time (Seconds)") +
ggtitle("Number of Alarms vs Response Time")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = NatureCode, y = response_time_seconds)) +
theme(axis.text.x = element_text(angle = 90)) +
geom_boxplot() +
xlab("Nature Code") +
ylab("Response Time (Seconds)") +
ggtitle("Nature Code vs Response Time")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(y = TotalLosses, x = as.factor(AlarmHour))) +
geom_boxplot(show.legend = FALSE) +
scale_y_log10() +
ylab("Total Loss (Log10 scale)") +
xlab("Alarm Hour") +
ggtitle("Total Loss by Alarm Hour")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(y = TotalLosses, x = shift)) +
geom_boxplot(show.legend = FALSE) +
scale_y_log10() +
ylab("Total Loss (Log10 scale)") +
xlab("Shift") +
ggtitle("Total Loss by Shift")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(y = PropertyLoss, x = as.factor(AlarmHour))) +
geom_boxplot(show.legend = FALSE) +
scale_y_log10() +
ylab("Property Loss (Log10 scale)") +
xlab("Alarm Hour") +
ggtitle("Property Loss by Alarm Hour")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(y = PropertyLoss, x = shift)) +
geom_boxplot(show.legend = FALSE) +
scale_y_log10() +
ylab("Property Loss (Log10 scale)") +
xlab("Shift") +
ggtitle("Property Loss by Shift") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = response_time_seconds, y = CivilianInjuries)) +
geom_point(stat = "identity", show.legend = FALSE) +
ylab("Civilian Injuries") +
xlab("Response Time (Seconds)") +
ggtitle("Civilian Injuries vs Response Time") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = TotalStaffOnIncident, y = response_time_seconds)) +
geom_point(stat = "identity", show.legend = FALSE) +
xlab("Total Staff On Incident") +
ylab("Response Time (Seconds)") +
ggtitle("Total Staff on Incident vs Response Time") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(y = PropertyLoss, x = response_time_seconds)) +
geom_point(stat = "identity", show.legend = FALSE) +
scale_y_log10() +
ylab("Property Loss (Log10 scale)") +
xlab("Response Time (Seconds)") +
ggtitle("Property Loss vs Response Time") +
theme_economist()
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(y = TotalLosses, x = response_time_seconds)) +
geom_point(stat = "identity", show.legend = FALSE) +
scale_y_log10() +
ylab("Total Loss (Log10 scale)") +
xlab("Response Time (Seconds)") +
ggtitle("Total Loss vs Response Time") +
theme_economist()
gso.fire %>%
filter(!is.na(response_time_seconds)) %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = DayOfWeek, y = response_time_seconds)) +
geom_violin(trim = FALSE, fill = "light blue") +
geom_boxplot(width = 0.3, fill = "pink") +
xlab("Days of the Week") +
ylab("Response Time (Seconds)") +
ggtitle("Violin & Box Plot of Response Time Densities by Days of the Week") +
theme(axis.text.x = element_text(angle = 90), legend.position = "hide")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = Year, y = response_time_seconds, group = Year)) +
geom_violin(trim = FALSE, fill = "light blue") +
geom_boxplot(show.legend = FALSE, width = .3, fill = "pink") +
scale_x_continuous(breaks = seq(2010,2022,1)) +
xlab("Year") +
ylab("Response Time (Seconds)") +
ggtitle("Violin & Box Plot of Response Time Densities by Year")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = shift, y = response_time_seconds)) +
geom_violin(show.legend = FALSE, fill = "light blue") +
geom_boxplot(width = .5, show.legend = FALSE, fill = "pink") +
xlab("Work shift") +
ylab("Response Time (Seconds)") +
ggtitle("Violin and Box Plot of Response Time Densities by Work Shift")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = FireDistrict, y = response_time_seconds)) +
geom_violin(fill = "light blue") +
geom_boxplot(show.legend = FALSE, width = .2, fill = "pink") +
theme(axis.text.x = element_text(angle = 90), legend.position = "hide") +
xlab("Fire District") +
ylab("Response Time (Seconds)") +
ggtitle("Violin and Box Plot of Response Time Densities by Fire District")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = Month, y = response_time_seconds)) +
geom_violin(show.legend = FALSE, fill = "light blue") +
geom_boxplot(show.legend = FALSE, width = .3, fill = "pink") +
xlab("Month") +
ylab("Response Time (Seconds)") +
ggtitle("Violin and Box Plot of Response Time Densities by Month")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = AlarmHour, y = response_time_seconds, group = AlarmHour)) +
geom_violin(fill = "light blue") +
geom_boxplot(width = .3, fill = "pink") +
scale_x_continuous(breaks = seq(0,23,1)) +
xlab("Alarm Hour") +
ylab("Response Time (Seconds)") +
ggtitle("Violin and Box Plot of Response Time Densities by Alarm Hour")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = NumberOfAlarms, y = response_time_seconds, group = NumberOfAlarms)) +
geom_violin(fill = "light blue") +
geom_boxplot(width = .3, fill = "pink") +
xlab("Number of Alarms") +
ylab("Response Time (Seconds)") +
ggtitle("Violin and Box Plot of Response Time Densities by Number of Alarms")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = CivilianInjuries, y = response_time_seconds, group = CivilianInjuries)) +
geom_violin(show.legend = FALSE, fill = "light blue") +
geom_boxplot(width = .5, show.legend = FALSE, fill = "pink") +
xlab("Civilian Injuries") +
ylab("Response Time (Seconds)") +
ggtitle("Violin and Box Plot of Response Time Densities by Civilian Injuries")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = CivilianFatalities, y = response_time_seconds, group = CivilianFatalities)) +
geom_violin(show.legend = FALSE, fill = "light blue") +
geom_boxplot(width = .5, show.legend = FALSE, fill = "pink") +
xlab("Civilian Fatalities") +
ylab("Response Time (Seconds)") +
ggtitle("Violin and Box Plot of Response Time Densities by Civilian Fatalities")
gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
ggplot(aes(x = FireServiceInjuries, y = response_time_seconds, group = FireServiceInjuries)) +
geom_violin(show.legend = FALSE, fill = "light blue") +
geom_boxplot(width = .5, show.legend = FALSE, fill = "pink") +
xlab("Fire Service Injuries") +
ylab("Response Time (Seconds)") +
ggtitle("Violin and Box Plot of Response Time Densities by Fire Service Injuries")
Visualization of fire incidents in Greensboro, NC from July 1, 2010 to May 31, 2022.
gso.fire.ts = gso.fire %>%
count(AlarmDate2)
gso.fire.ts %>%
ggplot(aes(x = AlarmDate2, y = n)) +
geom_line() +
scale_x_date(date_labels = "%m-%Y", date_breaks = "6 month") +
xlab("Year") +
ylab("Number of Fire Incidents") +
ggtitle("Time series of daily number of fire incidents") +
theme(axis.text.x = element_text(angle = 90))
incidents_annual = gso.fire.ts %>%
separate(AlarmDate2, sep="-", into = c("Year", "Month", "Day"))
incidents_annual = cbind(incidents_annual, gso.fire.ts$AlarmDate2)
colnames(incidents_annual) = c("Year","Month","Day","n","AlarmDate2")
incidents_annual %>%
ggplot(aes(x = AlarmDate2, y = n)) +
ggtitle("Time Series Plots for Frequency of Daily Fire Incidents Organized by Year") +
geom_line() +
scale_x_date(date_labels = "%m-%Y", date_breaks = "1 month") +
xlab("Date") + ylab("Count") +
theme(axis.text.x = element_text(angle = 90, size = 14)) +
theme(axis.text.y = element_text(size = 14)) +
theme(plot.title = element_text(size=22)) +
theme(axis.title=element_text(size=18)) +
facet_wrap(~Year, scales = "free_x")
Monthly time series visualizations
gso.fire.ts.month = gso.fire %>%
mutate(Monthly = make_date(Year, Month)) %>%
group_by(Monthly, Month, Year) %>%
count()
gso.fire.ts.month %>%
ggplot(aes(x = Monthly, y = n)) +
geom_line() +
scale_x_date(date_labels = "%m-%y", date_breaks = "6 month") +
xlab("Month") +
ylab("Number of Fire Incidents") +
ggtitle("Time series of monthly number of fire incidents for 2010-2022") +
theme(axis.text.x = element_text(angle = 90))
#forecasting for daily number of fire incidents
gso.fire.ts.2 = ts(gso.fire.ts$n, start = c(2010, 182), end = c(2022, 153),
frequency = 365)
ndiffs(gso.fire.ts.2)
ndiffs(gso.fire.ts.2)
acf(gso.fire.ts.2)
summary(ur.kpss(gso.fire.ts.2))
#attempt at differencing
gso.fire.ts.2.dif = diff(gso.fire.ts.2)
acf(gso.fire.ts.2.dif)
cbind("Fire Incidents" = gso.fire.ts.2,
"First differenced" = diff(gso.fire.ts.2)) %>%
autoplot(facets=TRUE) +
xlab("Date") + ylab("") +
ggtitle("Stationarity Transformations of Daily Fire Incidents")
y.dif = msts(gso.fire.ts.2.dif, seasonal.periods=c(7,365.25))
daily.dif.tbats = tbats(y.dif)
daily.dif.tbats.fc = forecast::forecast(daily.dif.tbats, h = 214)
#point forecast values
dif.Yhat = daily.dif.tbats.fc$mean
dif.Yhat.upper = daily.dif.tbats.fc$upper
dif.Yhat.lower = daily.dif.tbats.fc$lower
Yhat = cumsum(c(gso.fire.ts.2[length(gso.fire.ts.2)],dif.Yhat))
Yhat = ts(Yhat, start = c(2022, 152), frequency=365)
Yhat_upper = cumsum(c(gso.fire.ts.2[length(gso.fire.ts.2)],dif.Yhat.upper))
Yhat_upper = ts(Yhat_upper, start = c(2022, 152), end = c(2023,1), frequency=365)
Yhat_lower = cumsum(c(gso.fire.ts.2[length(gso.fire.ts.2)],dif.Yhat.lower))
Yhat_lower = ts(Yhat_lower, start = c(2022, 152), end = c(2023,1), frequency=365)
fire.ts = ts(gso.fire.ts.2[3992:4352],start = c(2021, 25), end = c(2022, 153),
frequency = 360) # A 360 day trim to see forecast better
#TBAT forecasting
autoplot(fire.ts) +
autolayer(Yhat, series="Point Forecasts") +
ggtitle("TBATS Forecasting Model for Daily Number of Fire Incidents") +
xlab("Date") + ylab("Fire Incidents") +
theme(title = element_text(size = 10), legend.position = "bottom")
accuracy(daily.dif.tbats.3)
#ARIMA forecasting
daily.arima = auto.arima(gso.fire.ts.2)
daily.arima.fc = forecast(daily.arima, h=214)
autoplot(daily.arima.fc) +
ggtitle("ARIMA Forecasting Model for Daily Number of Fire Incidents") +
xlab("Date") +
coord_cartesian(xlim = c(2020, 2023)) +
ylab("Fire Incidents") +
theme(title = element_text(size = 10))
accuracy(daily.arima.3)
gso.fire.ts.3 = ts(gso.fire.ts$n, start = c(2010, 182), end = c(2022, 59),
frequency = 365)
#attempt at differencing
gso.fire.ts.3.dif = diff(gso.fire.ts.3)
y.dif.3 = msts(gso.fire.ts.3.dif, seasonal.periods=c(7,365.25))
daily.dif.tbats.3 = tbats(y.dif.3)
daily.dif.tbats.fc.3 = forecast::forecast(daily.dif.tbats.3, h=671)
dif.Yhat.3 = daily.dif.tbats.fc.3$mean #point forecast values
Yhat.3 = cumsum(c(gso.fire.ts.3[length(gso.fire.ts.3)],dif.Yhat.3))
Yhat.3 = ts(Yhat.3, start = c(2022, 60), frequency=365)
gso.fire.ts.true = ts(gso.fire.ts$n, start = c(2010, 182), end = c(2022, 153),
frequency = 365)
#TBAT forecasting
autoplot(gso.fire.ts.3) +
autolayer(gso.fire.ts.true, alpha=0.7, series="True Count") +
autolayer(Yhat.3, alpha=0.6, series="Point Forecasts") +
ggtitle("TBATS Forecasting Model for Daily Number of Fire Incidents") +
xlab("Date") +
ylab("Fire Incidents") +
theme(title = element_text(size = 10), legend.position = "bottom")
accuracy(daily.dif.tbats.3)
#ARIMA forecasting
daily.arima.3 = auto.arima(gso.fire.ts.3)
daily.arima.fc.3 = forecast(daily.arima.3, h=671)
autoplot(daily.arima.fc.3) +
ggtitle("ARIMA Forecasting Model for Daily Number of Fire Incidents") +
xlab("Date") +
coord_cartesian(xlim = c(2020, 2023)) +
ylab("Fire Incidents") +
theme(title = element_text(size = 10))
accuracy(daily.arima.3)
Forecasting for monthly number of fire incidents.
#convert gso.fire.ts.month to official ts object
gso.fire.ts.month.2 = ts(gso.fire.ts.month$n, start = c(2010, 07, 01), end = c(2022, 05, 01),
frequency = 12)
#looking at stationarity
acf(gso.fire.ts.month.2)
summary(ur.kpss(gso.fire.ts.month.2))
ndiffs(gso.fire.ts.month.2) #1
nsdiffs(gso.fire.ts.month.2) #0
#attempt at differencing
gso.fire.ts.month.2.dif = diff(gso.fire.ts.month.2)
#looking at stationarity for difference
acf(gso.fire.ts.month.2.dif)
summary(ur.kpss(gso.fire.ts.month.2.dif))
ndiffs(gso.fire.ts.month.2.dif) #0
nsdiffs(gso.fire.ts.month.2.dif) #0
cbind("Fire Incidents" = gso.fire.ts.month.2,
"First differenced" = diff(gso.fire.ts.month.2)) %>%
autoplot(facets=TRUE) +
xlab("Date") + ylab("") +
ggtitle("Stationarity Transformation of Monthly Fire Incidents")
y.dif.month = msts(gso.fire.ts.month.2.dif, seasonal.periods=12)
dif.tbats.month = tbats(y.dif.month)
dif.tbats.month.fc = forecast::forecast(dif.tbats.month, h = 8)
#point forecast values
dif.Yhat.month = dif.tbats.month.fc$mean
Yhat.month = cumsum(c(gso.fire.ts.month.2[length(gso.fire.ts.month.2)],dif.Yhat.month))
Yhat.month = ts(Yhat.month, start = c(2022, 05), frequency=12)
#TBATS forecasting
autoplot(gso.fire.ts.month.2) +
autolayer(Yhat.month, series="Point Forecasts") +
ggtitle("TBATS Forecasting Model for Monthly Frequency of Fire Incidents") +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
theme(axis.text.x = element_text(angle = 90), title = element_text(size=10),
legend.position = "bottom")
#checking accuracy
accuracy(dif.tbats.month)
#ARIMA forecasting
monthly.arima = auto.arima(gso.fire.ts.month.2)
monthly.arima.fc = forecast(monthly.arima, h=8)
autoplot(monthly.arima.fc) +
ggtitle("ARIMA Forecasting Model for Monthly Frequency of Fire Incidents") +
xlab("Date") +
coord_cartesian(xlim = c(2020, 2023)) +
ylab("Frequency of Fire Incidents") +
theme(title = element_text(size = 10))
#checking accuracy
accuracy(monthly.arima)
#HoltWinters forecasting
dif.monthly.holt = HoltWinters(gso.fire.ts.month.2.dif)
dif.fcast.monthly = forecast::forecast(dif.monthly.holt, h=8, level=c(80,95))
dYhat.Holt = dif.fcast.monthly$mean
Yhat.Holt = cumsum(c(gso.fire.ts.month.2[length(gso.fire.ts.month.2)],dYhat.Holt))
Yhat.Holt = ts(Yhat.Holt, start = c(2022, 05), frequency=12)
autoplot(gso.fire.ts.month.2) +
autolayer(Yhat.Holt, series = "Point Forecasts") +
ggtitle("Forecasts for Monthly Frequency of Fire Incidents using HoltWinters") +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
theme(axis.text.x = element_text(angle = 90), title = element_text(size=9),
legend.position = "bottom")
#checking accuracy
accuracy(dif.fcast.monthly)
Multi-step forecasting for daily data
#getting differenced training and test sets
y.dif.2 = msts(gso.fire.ts.2.dif, seasonal.periods=c(7,365.25))
dif.training.tbats.2 = subset(y.dif.2, end=length(y.dif.2)-151)
dif.test.tbats.2 = subset(y.dif.2, start=length(y.dif.2)-150)
dif.daily.train.tbats.2 = tbats(dif.training.tbats.2)
dif.train.fc = forecast(dif.daily.train.tbats.2, h=151)
#getting point forecasts
dYhat.tbats = dif.train.fc$mean
#setting up normal ts training and test sets
y.2 = msts(gso.fire.ts.2, seasonal.periods=c(7,365.25))
training.tbats.2 = subset(y.2, end=length(y.2)-151)
test.tbats.2 = subset(y.2, start=length(y.2)-150)
#reverting back to original points
Yhat.tbats = cumsum(c(training.tbats.2[length(training.tbats.2)], dYhat.tbats))
Yhat.tbats = ts(Yhat.tbats, start = c(2022, 1), frequency = 365)
autoplot(training.tbats.2) +
autolayer(Yhat.tbats, series="Point Forecasts") +
autolayer(test.tbats.2, series="Test Set", alpha = 0.7) +
ggtitle("Multi-Step TBATS Daily Forecasts of Frequency of Fire Incidents") +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
coord_cartesian(xlim = c(2020, 2023)) +
theme(legend.position = "bottom")
dif.daily.test.tbats.2 = tbats(dif.test.tbats.2)
accuracy(dif.daily.test.tbats.2)
#multi-step ARIMA forecast for daily data
training.arima = subset(gso.fire.ts.2, end=length(gso.fire.ts.2)-151)
test.arima = subset(gso.fire.ts.2, start=length(gso.fire.ts.2)-150)
daily.train.arima = auto.arima(training.arima)
fc.train.arima = forecast(daily.train.arima, h=151)
autoplot(fc.train.arima) +
autolayer(test.arima, series="Test Set", alpha = 0.7) +
ggtitle("Multi-Step ARIMA Daily Forecasts of Frequency of Fire Incidents") +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
coord_cartesian(xlim = c(2020, 2023)) +
theme(legend.position = "bottom")
daily.test.arima = arima(test.arima)
accuracy(daily.test.arima)
#time series combination
train = window(gso.fire.ts.2, end=c(2019, 12))
h = length(gso.fire.ts.2) - length(train)
#ARIMA model
ARIMA = forecast::forecast(auto.arima(train, lambda=0, biasadj=TRUE), h=h)
#TBATS model
train.tbats = msts(gso.fire.ts.2.dif, end=c(2019,12), seasonal.periods = c(7,365.25))
TBATS = forecast::forecast(tbats(train.tbats, biasadj = TRUE), h=h)
dYhat.combo = TBATS$mean
Yhat.combo = cumsum(c(train[length(train)], dYhat.combo))
Yhat.combo = ts(Yhat.combo, start = c(2019, 12), frequency = 365)
Combination = (ARIMA[["mean"]] + Yhat.combo)/2
true.daily = gso.fire.ts %>%
filter(AlarmDate2 >= as.Date("2019/12/01") & AlarmDate2 <= as.Date("2021/05/31"))
true.daily.ts = ts(true.daily$n,
start = c(2019,12), end = c(2021,153),
frequency = 365)
autoplot(train) +
autolayer(true.daily.ts) +
autolayer(Combination, series="Combination", alpha=0.8) +
autolayer(ARIMA, series="ARIMA", PI=F) +
autolayer(Yhat.combo, series="TBATS", alpha=0.7) +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
ggtitle("Time Series Combination for Daily Frequency of Fire Incidents") +
theme(legend.position = "bottom")
c(ARIMA = accuracy(ARIMA, gso.fire.ts.2)["Test set", "RMSE"],
TBATS = accuracy(Yhat.combo, gso.fire.ts.2)["Test set", "RMSE"],
Combination =
accuracy(Combination, gso.fire.ts.2)["Test set", "RMSE"])
Multi-step forecasting for monthly data
#multi-step TBATS
y2.dif.monthly = msts(gso.fire.ts.month.2.dif, seasonal.periods=12)
dif.training.tbats.2.monthly = subset(y2.dif.monthly, end=length(y2.dif.monthly)-5)
dif.test.tbats.2.monthly = subset(y2.dif.monthly, start=length(y2.dif.monthly)-4)
dif.train.tbats.2.monthly = tbats(dif.training.tbats.2.monthly)
dif.fc.train.tbats.2.monthly = forecast(dif.train.tbats.2.monthly, h=5)
dYhat.ms.tbats.monthly = dif.fc.train.tbats.2.monthly$mean
y2.monthly = msts(gso.fire.ts.month.2, seasonal.periods=12)
training.tbats.monthly.2 = subset(y2.monthly, end=length(y2.monthly)-5)
test.tbats.monthly.2 = subset(y2.monthly, start=length(y2.monthly)-4)
Yhat.ms.tbats.monthly = cumsum(c(training.tbats.monthly.2[length(training.tbats.monthly.2)],
dYhat.ms.tbats.monthly))
Yhat.ms.tbats.monthly = ts(Yhat.ms.tbats.monthly, start = c(2022,01), frequency=12)
autoplot(training.tbats.monthly.2) +
autolayer(Yhat.ms.tbats.monthly, series="Point Forecasts") +
autolayer(test.tbats.monthly.2, series="Test Set", alpha = 0.7) +
ggtitle("Multi-Step TBATS Monthly Forecasts of Frequency of Fire Incidents") +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
theme(legend.position = "bottom", title=element_text(size=10)) +
theme(axis.text.x = element_text(angle = 90))
dif.test.tbats.2.monthly = tbats(dif.test.tbats.2.monthly)
accuracy(dif.test.tbats.2.monthly)
#multi-step ARIMA
training.arima.monthly = subset(gso.fire.ts.month.2, end=length(gso.fire.ts.month.2)-5)
test.arima.monthly = subset(gso.fire.ts.month.2, start=length(gso.fire.ts.month.2)-4)
train.arima.monthly.3 = auto.arima(training.arima.monthly)
fc.train.arima.monthly = forecast(train.arima.monthly.3, h=5)
autoplot(fc.train.arima.monthly) +
autolayer(test.arima.monthly, series="Test Set", alpha = 0.7) +
ggtitle("Multi-Step ARIMA Monthly Forecasts of Frequency of Fire Incidents") +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
coord_cartesian(xlim = c(2018, 2023)) +
theme(legend.position = "bottom") +
theme(axis.text.x = element_text(angle = 90), title = element_text(size=9),
legend.position = "bottom")
test.arima.monthly.3 = arima(test.arima.monthly)
accuracy(test.arima.monthly.3)
#multi-step HoltWinters
dif.training.HW.monthly = subset(gso.fire.ts.month.2.dif, end=length(gso.fire.ts.month.2.dif)-5)
dif.test.HW.monthly = subset(gso.fire.ts.month.2.dif, start=length(gso.fire.ts.month.2.dif)-4)
dif.train.HW.monthly = HoltWinters(dif.training.HW.monthly)
dif.fc.HW = forecast::forecast(dif.train.HW.monthly, h=5)
dYhat.ms.HW = dif.fc.HW$mean
Yhat.ms.HW = cumsum(c(gso.fire.ts.month.2[length(gso.fire.ts.month.2)],dYhat.ms.HW))
Yhat.ms.HW = ts(Yhat.ms.HW, start = c(2022,01,01), frequency=12)
training.HW.monthly = subset(gso.fire.ts.month.2, end=length(gso.fire.ts.month.2)-5)
test.HW.monthly = subset(gso.fire.ts.month.2, start=length(gso.fire.ts.month.2)-4)
true.count.monthly = ts(gso.fire.ts.month.2, start=c(2010,07,01), end=c(2021,12,31), frequency=12)
autoplot(training.HW.monthly) +
autolayer(Yhat.ms.HW, series = "Point Forecasts") +
autolayer(test.HW.monthly, series = "Test Set") +
ggtitle("Multi-Step HoltWinters Monthly Forecasts of Frequency of Fire Incidents") +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
theme(axis.text.x = element_text(angle = 90), legend.position = "bottom",
title = element_text(size = 9))
accuracy(Yhat.ms.HW, gso.fire.ts.month.2)
#multi-step HoltWinters
dif.training.HW.monthly = subset(gso.fire.ts.month.2.dif, end=length(gso.fire.ts.month.2.dif)-5)
dif.test.HW.monthly = subset(gso.fire.ts.month.2.dif, start=length(gso.fire.ts.month.2.dif)-4)
dif.train.HW.monthly = HoltWinters(dif.training.HW.monthly)
dif.fc.HW = forecast::forecast(dif.train.HW.monthly, h=5)
dYhat.ms.HW = dif.fc.HW$mean
Yhat.ms.HW = cumsum(c(gso.fire.ts.month.2[length(gso.fire.ts.month.2)],dYhat.ms.HW))
Yhat.ms.HW = ts(Yhat.ms.HW, start = c(2022,01,01), frequency=12)
training.HW.monthly = subset(gso.fire.ts.month.2, end=length(gso.fire.ts.month.2)-5)
test.HW.monthly = subset(gso.fire.ts.month.2, start=length(gso.fire.ts.month.2)-4)
true.count.monthly = ts(gso.fire.ts.month.2, start=c(2010,07,01), end=c(2021,12,31), frequency=12)
autoplot(training.HW.monthly) +
autolayer(Yhat.ms.HW, series = "Point Forecasts") +
autolayer(test.HW.monthly, series = "Test Set") +
ggtitle("Multi-Step HoltWinters Monthly Forecasts of Frequency of Fire Incidents") +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
theme(axis.text.x = element_text(angle = 90), legend.position = "bottom",
title = element_text(size = 9))
accuracy(Yhat.ms.HW, gso.fire.ts.month.2)
#Time series combination for monthly
train.monthly.2 = window(gso.fire.ts.month.2, end=c(2021, 05))
h.monthly.2 = length(gso.fire.ts.month.2) - length(train.monthly.2)
#ARIMA model
ARIMA.monthly.2 = forecast::forecast(auto.arima(train.monthly.2, lambda=0, biasadj=TRUE),
h=h.monthly.2)
#TBATS model
train.tbats.monthly.2 = msts(gso.fire.ts.month.2.dif, end=c(2021,05), seasonal.periods = 12)
tbats.monthly.2 = forecast::forecast(tbats(train.tbats.monthly.2, biasadj = TRUE),
h=h.monthly.2)
dYhat.combo.monthly.2 = tbats.monthly.2$mean
Yhat.combo.monthly.2 = cumsum(c(train.monthly.2[length(train.monthly.2)], dYhat.combo.monthly.2))
Yhat.combo.monthly.2 = ts(Yhat.combo.monthly.2, start = c(2021,05), frequency = 12)
#HoltWinters Model
HW.monthly.2 = forecast::forecast(HoltWinters(gso.fire.ts.month.2.dif), h=h.monthly.2)
dYhat.HW.monthly.2 = HW.monthly.2$mean
Yhat.HW.monthly.2 = cumsum(c(train.monthly.2[length(train.monthly.2)],dYhat.HW.monthly.2))
Yhat.HW.monthly.2 = ts(Yhat.HW.monthly.2, start = c(2021,05), frequency=12)
Combination.monthly.2 = (ARIMA.monthly.2[["mean"]] + Yhat.combo.monthly.2 +
Yhat.HW.monthly.2)/3
y.monthly = msts(gso.fire.ts.month.2, seasonal.periods=12)
test.monthly = subset(y.monthly, start=length(y.monthly)-12)
autoplot(train.monthly.2) +
autolayer(test.monthly, series="True Count") +
autolayer(Combination.monthly.2, series="Combination", alpha=0.8) +
autolayer(ARIMA.monthly.2, series="ARIMA", PI=F) +
autolayer(Yhat.combo.monthly.2, series="TBATS", alpha=0.7) +
autolayer(Yhat.HW.monthly.2, series="HoltWinters", alpha=0.7) +
xlab("Date") +
ylab("Frequency of Fire Incidents") +
ggtitle("Time Series Combination for Monthly Frequency of Fire Incidents") +
theme(axis.text.x = element_text(angle = 90), legend.position = "bottom",
title = element_text(size = 9))
c(ARIMA = accuracy(ARIMA.monthly.2, gso.fire.ts.month.2)["Test set", "RMSE"],
TBATS = accuracy(Yhat.combo.monthly.2, gso.fire.ts.month.2)["Test set", "RMSE"],
HW = accuracy(Yhat.HW.monthly.2, gso.fire.ts.month.2)["Test set", "RMSE"],
Combination =
accuracy(Combination.monthly.2, gso.fire.ts.month.2)["Test set", "RMSE"])
gso.fire.filtered = gso.fire %>%
filter(response_time_seconds < cutoff.response_time) %>%
dplyr::select(response_time_seconds, everything())
gso.fire.filtered = gso.fire.filtered %>%
mutate(AlarmTime = case_when(AlarmHour >= 0 & AlarmHour < 6 ~ "0-5",
AlarmHour >= 6 & AlarmHour < 12 ~ "6-11",
AlarmHour >= 12 & AlarmHour < 18 ~ "12-17",
TRUE ~ "18-23"))
#manual variable selection
lm.res.time = lm(response_time_seconds ~ TotalStaffOnIncident + FireDistrict + DayOfWeek +
shift + Month + AlarmTime + NatureCode, data = gso.fire.filtered)
summary(lm.res.time)
round(coef(lm.res.time), 4)
lm.res.time.select = stepAIC(lm.res.time)
summary(lm.res.time.select)
#Lasso variable selection
las.Mod <- glmnet(as.matrix(gso.fire.filtered[,-1]), gso.fire.filtered$response_time_seconds, family = "gaussian")
plot(las.Mod)
#Random Forrest with ranger function
gso.fire.filtered.rf = gso.fire.filtered %>%
dplyr::select(response_time_seconds,TotalStaffOnIncident , FireDistrict , DayOfWeek ,
shift , Month , AlarmTime , NatureCode)
RF.Mod.Ranger <- ranger(response_time_seconds~TotalStaffOnIncident + FireDistrict + DayOfWeek +
shift + Month + AlarmTime + NatureCode,
data = gso.fire.filtered.rf[complete.cases(gso.fire.filtered.rf),], importance = "impurity", num.trees = 500)
variable = names(RF.Mod.Ranger$variable.importance)
rf.vIMP.df = data.frame(variable = variable, vIMP = RF.Mod.Ranger$variable.importance)
ggplot(rf.vIMP.df, aes(x=reorder(variable,vIMP), y=vIMP,fill=variable))+
geom_bar(stat="identity", position="dodge", show.legend = FALSE)+ coord_flip()+
ylab("Variable Importance")+
xlab("")+
ggtitle("Information Value Summary")