NFIRS 2015 is Vastly Improved: Just Ask FEMA

Introduction

Data from the National Fire Incident Reporting System (NFIRS) for 2015 is now available from the U.S. Fire Administration (USFA). The agency collects data reported by the majority of fire departments across the country. In my current role working with fire departments, I’ve observed that most reporting agencies provide detailed information about every incident, along with the times associated with each of the incident’s responding units. Additional details accompany serious fires, mutual aid responses, casualties (injuries and deaths), and assorted special circumstances (such as hazardous materials and wildfires).

In the past (see my prior post), the USFA’s public data release (PDR) contained only fire incidents, which amount to a small but significant fraction (8.7% for 2015) of all incidents involving fire departments. Now you can request information about all incidents; however, unit-level information remains unavailable. Last May, I received a full data set for 2015 and a matching data set for 2014. You can request your own data set here.

I’m excited about the update because the expanded data set places NFIRS on par with similar national data sets, such as the National Highway Traffic Safety Administration’s (NHTSA) National Emergency Medical Services Information System (NEMSIS), which records emergency medical incidents involving public or private ambulance services throughout the country. Moreover, it gives a researcher a fuller picture of any single agency as well as of the activities of the fire service as a whole. The 2015 data includes approximately 24.8 million records, while 2014’s original “fires only” data release contained approximately 2.1 million records. Focusing on the country’s largest fire department, the New York City Fire Department (FDNY), NFIRS 2015 data includes approximately 557,000 incidents, of which only 80,000 are fire-specific incidents.

I’d like to take a few moments to discuss the new data set. NFIRS 2015 data presents the challenges that arise whenever one deals with larger data sets. In addition, I recently noticed that many agencies’ data submissions are incomplete. Nevertheless, I’d like to provide some interesting summary information. Some of it will leverage the newly included information, highlighting medical incidents. Some will emphasize casualty data, which is an important focus of the USFA’s statistical analysis, but I’ll expand the scope to include fire injuries as well as fire deaths. At the end, I will include some food for thought, links to my own analysis for the technically savvy, and hints at further analysis to come.

Difficulties Associated With Data Size

I usually conduct the majority of my data analysis using the R statistical programming language on a single workstation. This works fine for smaller data sets that can be loaded comfortably into a computer’s RAM. However, the NFIRS 2015 data release includes more than 10 files and contains more than 7 gigabytes of information. Even looking at files individually, the primary data file alone exceeds 4 gigabytes. Below are some reasonable approaches to this problem that I plan to explore going forward, along with others that I’ve already discarded.

  • Ignoring the problem: I’m certainly not loading all the data into memory and storing the information as one RData file. It’s just too big for my system.
  • Standard Relational Database: It seems clear that USFA is using a standard database to manage its own NFIRS data. Here’s why I won’t be reproducing that method.
    • I’ve explored using MySQL and/or MariaDB for this purpose. While useful for me in general, the data loading process for files of this size is still a bit involved.
    • For a database of this size, with large individual tables, running queries on my home computer is still quite slow.
    • Cloud hosting is possible but adds an extra layer of effort with limited benefit as I’m not sharing the data with many users.
  • Limiting Data: To accomplish the analysis in later sections, I’ve chosen to simply read in a very limited portion of the data set. Careful decisions about what is absolutely necessary for an analysis can alleviate problems with data size. At this point, I’m just using read_csv from the readr package. It allows me to selectively load 11 out of 40 columns from the NFIRS 2015 basic incident table. The resulting RDS file, which is a standard internal R format for a single data set (or “data frame”), is less than 90 megabytes, compared with more than 4 gigabytes for the original file. Nevertheless, the file still took over 4 minutes to load. Another alternative is using read.csv.sql from the sqldf package.
  • Drill: For the next NFIRS-related post, I will likely use Apache Drill for basic queries. It can also convert the current text-delimited data files into a more efficient columnar Parquet format. I’ve been experimenting with Drill recently and it’s worked quite well for me.
  • Athena: I’m a current Amazon Web Services user, and Athena would give me a cost-effective way to leverage cloud computing and my Simple Storage Service (S3) storage to perform a complicated analysis in a reasonable amount of time. I haven’t explored this option adequately to offer an informed opinion on it.
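The "Limiting Data" approach above can be sketched outside of R as well. The following is a minimal Python illustration, not the workflow used for this post: it streams a delimited text file row by row and keeps only a chosen subset of columns, so the full table never has to sit in memory. The caret delimiter, the column names, and the two sample rows are assumptions made up for the example.

```python
import csv
import io


def read_selected_columns(lines, wanted, delimiter="^"):
    """Stream a delimited text source and yield dicts holding only the wanted columns.

    The "^" default delimiter is an assumption about the file layout.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in wanted if name not in header]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    for row in reader:
        # Keep only the requested columns and drop everything else.
        yield {name: row[name] for name in wanted}


# Tiny in-memory stand-in for the multi-gigabyte basic incident file.
# Column names and values are invented for illustration.
sample = io.StringIO(
    "STATE^FDID^INC_DATE^INC_NO^INC_TYPE^ALARM\n"
    "NY^24001^1012015^0000001^111^010120150030\n"
    "CA^38005^1012015^0000002^321^010120150045\n"
)

rows = list(read_selected_columns(sample, ["STATE", "FDID", "INC_TYPE"]))
print(rows)
```

With a real file, the same function could be given an open file handle instead of the `io.StringIO` stand-in. Because it is a generator, rows can be aggregated as they arrive rather than collected into a list.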

Basic Information

The 2015 NFIRS data set includes approximately 24,991,000 incidents.

In the past, the public data set was limited to only 71 fire incident types. In 2015, these fire incident types represented approximately 2,166,000 calls. The following table shows these types sorted from most to least frequent.

It’s reasonable to expect that the departments with the largest number of calls also manage areas with the largest populations. While 23,031 agencies reported at least some of their calls to the national system, below is a list of the 100 agencies with the most reported calls. These 100 agencies are concentrated in just 31 of the 50 U.S. states. For example, no agencies are included from either Michigan or New Jersey.
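A ranking like this one can be sketched as a simple count per agency. The Python snippet below is an illustration under stated assumptions rather than the code behind the list: it assumes each incident row carries a state code and a department identifier (named `STATE` and `FDID` here), and that the pair identifies an agency. The sample rows are invented.

```python
from collections import Counter


def top_agencies(incidents, n):
    """Return the n busiest agencies with their call counts, plus the set of states they span."""
    # An agency is keyed by (state, department id); this keying is an assumption.
    counts = Counter((row["STATE"], row["FDID"]) for row in incidents)
    top = counts.most_common(n)
    states = {state for (state, _fdid), _count in top}
    return top, states


# Invented rows standing in for the basic incident table.
incidents = [
    {"STATE": "NY", "FDID": "A"},
    {"STATE": "NY", "FDID": "A"},
    {"STATE": "NY", "FDID": "A"},
    {"STATE": "CA", "FDID": "B"},
    {"STATE": "CA", "FDID": "B"},
    {"STATE": "NC", "FDID": "C"},
]

top, states = top_agencies(incidents, 2)
print(top)
print(sorted(states))
```

Counting the distinct states among the top agencies is what yields a figure such as "31 states" for the top 100.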

Medical Calls and Percentage

Without examining populations in detail, it should be clear that the number of calls is not always proportional to an area’s population. For example, Fayetteville Fire and EMS reported a similar number of calls to the San Francisco Fire Department. However, San Francisco’s population is approximately 865,000 people, while the population of Fayetteville, NC, is approximately 204,000 people. The main difference is that Fayetteville’s fire department reported more medical calls to NFIRS than San Francisco’s fire department did. [1] One way to compare reporting across agencies is to classify each call as a fire call, a medical call, or something else. Then, we calculate the percentage of reported calls that were fire calls (as in prior years’ releases) or medical calls.
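The classification step described above might look like the following Python sketch. It is a minimal illustration, not the code used for this post: the two sets of incident type codes are short hypothetical placeholders, not the actual lists of fire and medical types used in the analysis.

```python
def classify(inc_type, fire_types, medical_types):
    """Label one incident type code as 'fire', 'medical', or 'other'."""
    if inc_type in fire_types:
        return "fire"
    if inc_type in medical_types:
        return "medical"
    return "other"


def call_shares(inc_types, fire_types, medical_types):
    """Return the percentage of calls falling in each of the three categories."""
    counts = {"fire": 0, "medical": 0, "other": 0}
    for code in inc_types:
        counts[classify(code, fire_types, medical_types)] += 1
    total = sum(counts.values())
    return {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}


# Hypothetical placeholder code sets, for illustration only.
FIRE_TYPES = {"111", "131"}
MEDICAL_TYPES = {"311", "321"}

# Five invented calls: one fire, three medical, one other.
shares = call_shares(["111", "321", "321", "311", "700"], FIRE_TYPES, MEDICAL_TYPES)
print(shares)
```

Passing the code sets in as parameters keeps the judgment call about what counts as "medical" separate from the counting logic, so the lists can be revised without touching the rest.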

Here are the NFIRS incident types that I considered to be medical in nature ordered from most frequent to least frequent.

Overall, there were approximately 14,390,000 medical calls, which was 57.6% of total calls. The next table shows calls and associated percentages for all agencies that reported at least 1,000 calls in 2015. [2]
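As a rough sketch of how such a table could be assembled, the Python snippet below filters agencies by a minimum call count and computes each agency's fire and medical percentages. The agency names and counts are invented for illustration. The last two lines simply recompute the national percentages from the totals quoted in this post.

```python
def agency_table(agency_counts, min_calls=1000):
    """Build rows of call totals and percentages for agencies meeting the threshold."""
    rows = []
    for agency, c in agency_counts.items():
        total = c["fire"] + c["medical"] + c["other"]
        if total < min_calls:
            continue  # skip agencies below the reporting threshold
        rows.append({
            "agency": agency,
            "calls": total,
            "pct_fire": round(100 * c["fire"] / total, 1),
            "pct_medical": round(100 * c["medical"] / total, 1),
        })
    # Busiest agencies first.
    rows.sort(key=lambda r: r["calls"], reverse=True)
    return rows


# Invented per-agency counts, already split into the three categories.
agency_counts = {
    "Agency A": {"fire": 100, "medical": 700, "other": 200},
    "Agency B": {"fire": 50, "medical": 0, "other": 450},
}
rows = agency_table(agency_counts)
print(rows)

# National figures quoted in this post: medical and fire calls over total incidents.
overall_medical = round(100 * 14_390_000 / 24_991_000, 1)
overall_fire = round(100 * 2_166_000 / 24_991_000, 1)
print(overall_medical, overall_fire)
```

The recomputed values agree with the 57.6% medical share and the 8.7% fire share reported above.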

I will just say two things about medical calls and NFIRS:
- Some agencies may not report their medical calls despite responding to them. These agencies may feel that NEMSIS is the proper way to report medical calls and that it is redundant to report the information to FEMA as part of NFIRS as well.
- When reported to NFIRS, medical calls are usually a large proportion of total calls. So, the information is more illustrative when medical calls are reported than in situations when they are absent.

Food for Thought

In upcoming posts, I’ll examine:

  • Problems associated with the NFIRS 2015 data set,
  • Components of NFIRS associated with casualties (death and injury), and
  • Merging NFIRS data with census information to determine associated rates

  1. Keep in mind that counts of “reported calls” may differ from actual calls. This will be explored in a future post. This actually shows how San Francisco distinguishes calls for service from their (non-medical) incidents. You can see both data sets at San Francisco’s Open Data Portal.

  2. The table’s fire department names rely on information supplied by NFIRS which may contain an occasional typo.
