A year ago, I began a reporting project to track COVID statistics in New York. At first, the data was just for internal use, so the reporters at Gothamist and WNYC would have an easy way to check the major statistics each day and see how they were evolving. After a few weeks, we decided that making our stats public would complement the other reporting we were doing, and I put up the first version of an article entitled “Coronavirus Statistics: Tracking the Epidemic in New York.” We’ve updated this article every day since then, and it has gone through several iterations, each time offering more data on the epidemic and using better graphing tools to present the information.
During the early days of the first wave, back in March and April of 2020, collecting the data was a laborious process. The numbers came mainly from the daily press conferences the mayor and governor held, and later, from simple dashboards assembled by these officials. But on March 26th, the city began to offer its data in a machine-readable format on GitHub, which allowed us to access each day’s data conveniently, and later, to automate the daily update.
The city’s data has not been perfect. In the beginning, the city offered less information, especially about geography and race. But it has released a new version of its feeds every few months, and today, the only information missing is about vaccine administration, which the city makes available through a dashboard at its COVID site, but not in a feed format on GitHub.
It’s important to emphasize why feeds are valuable. Without them, a journalist or researcher must go through a laborious process of manually copying data from the dashboard each day or coding scripts to crawl pages. These extra hurdles subtract from the time available to do actual reporting or analysis. They often mean that data not provided in machine-readable feeds gets examined less frequently, to the detriment of the public’s understanding of what’s going on.
However, the flaws in the city’s data are nothing compared to the problems we have with New York State’s numbers. Currently, the state offers only 13 data feeds for its COVID statistics, and 10 of those are information about the State Parks’ response to the epidemic. Contrast that with the city’s data repository, which offers almost 50 data feeds. They cover just about any statistic a journalist or researcher might need, from daily case counts to hospitalizations to death counts, organized by geography, demography, or time.
The state makes several dashboards available to the public, including its main one, which tracks cases and fatalities, as well as others that detail hospitalizations, regional statistics, and vaccine administration. Yet none of these is offered as a data feed that journalists or researchers can easily use.
To fill this gap, third parties such as the COVID Tracking Project stepped in and did the daily work of transcribing the state’s data into a feed that sites like ours could access and turn into graphs and charts. For months, we’ve been using this project’s New York State feed to get statistics for that portion of our data tracker, especially hospitalization and death information. Sadly, the project shuttered on March 7th, as its organizers now feel the Biden administration is providing adequate data via the Centers for Disease Control and Prevention and other federal agencies. This transition meant we had to do some reprogramming to replace COVID Tracking Project feeds with the new federal sources. While doing that, I checked again to see if the state had improved its data access and was disappointed to find that Governor Cuomo’s administration had still not done so, even after a year of the epidemic.
It’s unclear why the state and city have taken such different approaches to COVID-19 data access. Cuomo has recently been embroiled in a scandal around access to nursing home records, which his administration withheld for months in an effort to slow down investigations into deaths that might have been connected to his policy of transferring COVID patients from hospitals to nursing homes. But the basic data listed above has not been hidden. It has simply been made difficult to use without manually transcribing it, or filing a Freedom of Information Law request to receive it in a spreadsheet (typically after a delay of several months).
Instead, the governor has established a system whereby the essential data is often entwined with his public image and appearances. Remember the governor’s hundred-plus press conferences during the first wave of the epidemic: The practice limited the flow of information, which was withheld from the public, even on the dashboards, until he announced it each day. It made those events into must-watch appearances. Providing unlimited access to easily machine-readable data might have produced news articles or research papers that conflicted with the narrative Cuomo was crafting or revealed discrepancies in nursing home death counts sooner.
Gothamist will continue updating our COVID tracker until the pandemic is over. For now, the state data is pulled directly from the CDC. This does introduce a delay into the reporting, which consistently affects a handful of our charts. Our team will continue to press the State Department of Health for machine-readable feeds, and file Freedom of Information Law requests to get the data they’re still not offering. Hopefully, an epidemic on par with COVID won’t happen again, but if it does, perhaps this work will make state data slightly more accessible, so that future journalists and researchers can do their jobs and keep the public informed.