The Commercial Fishing Incident Database (CFID) “is a surveillance system managed by [the National Institute for Occupational Safety and Health (NIOSH)] to collect information on fatalities and vessel disasters that occur in the US fishing industry.” The database enables the government to ask questions such as, “Where are the most hazardous fisheries?” “What are the worst problems?” “What causes fatalities in hazardous fisheries?” and “Where will prevention efforts be most effective?”
In January 2023, the Data Liberation Project filed a FOIA request with the Centers for Disease Control and Prevention (CDC), NIOSH’s parent agency, seeking a copy of the database (minus fields containing personally identifiable information) and all relevant database documentation.
In subsequent communications with the Data Liberation Project, the CDC indicated that it could not provide the database in its original structure (a series of interrelated tables) but rather as “a linked dataset (a single Excel sheet with all seven data tables already linked but without the linking identifier) that excludes PII and incorporates strategies for reducing disclosure risk, such as the removal, redaction, or modification of variables[.]” The Data Liberation Project agreed to modify the scope of its request to allow for that.
On May 17, 2024, the CDC provided several data and documentation files in response to the request:
- The “linked dataset”, containing 3,559 rows (each representing a person affected by an incident, as well as characteristics about the incident) and 157 columns.
- A data dictionary, which describes the meaning and formatting of the database fields.
- An 80-page PDF titled, “NIOSH Commercial Fishing Incident Database (CFID) v2.0: Standard Operating Procedures and Guidelines for Data Management.” The CDC has redacted some portions of the document.
- A spreadsheet translating various numeric codes to their descriptions.
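
As a rough illustration of how a code-translation spreadsheet like the one above could be applied to rows of the linked dataset, here is a minimal Python sketch. The field name `vessel_type`, the field `miles_from_shore`, and the codes and descriptions shown are all hypothetical placeholders, not values taken from the CFID files; the real names would come from the data dictionary.

```python
# Hypothetical lookup: field name -> {numeric code -> description}.
# These names and codes are invented for illustration, not taken from CFID.
CODE_LOOKUP = {
    "vessel_type": {1: "Trawler", 2: "Longliner"},
}


def decode_row(row, lookup):
    """Return a copy of `row` with coded fields replaced by descriptions.

    Codes that are missing from the lookup are left unchanged, so no
    information is silently dropped.
    """
    decoded = dict(row)
    for field, codes in lookup.items():
        if field in decoded:
            decoded[field] = codes.get(decoded[field], decoded[field])
    return decoded


# A made-up row standing in for one line of the linked dataset.
sample_row = {"vessel_type": 2, "miles_from_shore": 12}
print(decode_row(sample_row, CODE_LOOKUP))
```

Leaving unknown codes untouched, rather than raising an error, is one possible design choice; it makes gaps in the lookup easy to spot in the decoded output.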
Unfortunately, some characteristics are represented only very coarsely. For example, incident dates are not provided, only year ranges: 2000-2002, 2003-2007, 2008-2012, 2013-2017, or 2018-2022. Similarly, incident locations are provided only as Pacific/Atlantic and the number of miles from shore.
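
One practical consequence of the year ranges is that they are not all the same length: the first covers three years, while the others cover five. The sketch below, in Python, shows one way to account for that when comparing periods, by dividing row counts by the number of years in each range. The field name `year_range` is an assumption made for illustration, and the function expects rows as plain dictionaries.

```python
from collections import Counter

# The five year ranges that appear in the linked dataset.
YEAR_RANGES = ["2000-2002", "2003-2007", "2008-2012", "2013-2017", "2018-2022"]


def span_years(label):
    """Number of calendar years covered by a label such as '2003-2007'."""
    start, end = (int(part) for part in label.split("-"))
    return end - start + 1


def rows_per_year(rows, field="year_range"):
    """Average number of rows per calendar year within each year range.

    `field` is a hypothetical column name; substitute the real one
    from the data dictionary.
    """
    counts = Counter(row[field] for row in rows)
    return {label: counts[label] / span_years(label) for label in counts}


# Length of each range, in years.
for label in YEAR_RANGES:
    print(label, span_years(label))
```

Because each row represents a person affected by an incident, a per-year figure computed this way describes affected people, not distinct incidents.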