Data rescue movement takes root in Boston

Rowena Lindsay photo 5
Hackers in the harvesters track wrote code in three different languages to scrape data covering topics ranging from water quality and snow cover to crime and grain phenotype and genotype.

On March 24, Data Rescue Boston brought scientists, academics, programmers, librarians, journalists, activists and concerned citizens together in Northeastern University’s Snell Library to work on preserving at-risk federal data.

Friday’s hack-a-thon was one of several such events that have taken place across the country in the last few months as public interest in protecting scientific data increases along with the risks posed by President Donald Trump’s anti-science rhetoric. Since Trump took office, government agency websites have removed references to climate change, Environmental Protection Agency (EPA) employees have been forbidden from talking to the press or publishing new research, and harsh budget cuts have been proposed for the EPA and National Oceanic and Atmospheric Administration (NOAA).

To counter the fear and despair many are feeling at the idea of the government sabotaging its own scientific data, a community dedicated to data rescue has risen up in resistance.

“For participants, it is something really concrete that you can do that is not just feeling that you’re unsettled by the way this administration approaches science,” said Sarah Wylie, assistant professor of sociology and health science at Northeastern and one of the organizers of the event. “It is a really good tangible activity for people to become a part of where they meet people of like minds. That is a really important outcome, the community building part of it.”

Born of a collaboration between the Environmental Data & Governance Initiative (EDGI), DataRefuge, Boston Civic Media, The Engagement Lab at Emerson College, and Northeastern’s Social Science Environmental Health Research Institute, the event was held to archive data from the U.S. Fish and Wildlife Service.

Hack-a-thon attendees split into three groups: harvesters, seeders and storytellers.

The harvesters wrote code in three different languages to scrape data on topics ranging from water quality and snow cover to crime and grain phenotype and genotype. The seeders collected 1,100 URLs from the U.S. Fish & Wildlife Service pages and nominated them to the Internet Archive – a non-profit digital library that has archived more than 286 billion web pages. The storytellers created signs for the March For Science that took place on April 22, made a visualization of #MyEPA tweets over time and worked on redesigning the EDGI website.


Click here to watch my video about the Boston March for Science.

The event was primarily the work of EDGI, which formed in December with the goal of demonstrating that there is a public interest in the existence and the preservation of federal data. In addition to putting on data rescue events, EDGI is monitoring 25,000 federal websites for any change to data and conducting interviews with people leaving federal agencies to record the human experience of this historical transition.

“The term data rescue has been uniquely picked up for this moment, but the concept of preserving data is certainly not new,” said Wylie, who is also a founding member of EDGI.

There has always been some concern about preserving data when the White House transitions from one president to another. In 2008, the End of Term Harvest Project was formed to harvest federal government domains in order to create a portrait of the George W. Bush administration and track changes as Barack Obama took the White House. But its goal was to protect against more benign threats than those federal agencies are currently facing.

“We’re used to hearing the occasional story of lost research data, but those are generally situations that happen either accidentally or through benign neglect, when data hardware or formats become obsolete and can no longer be accessed,” Jen Ferguson, the research data management librarian at Northeastern and organizer of the event, said. She cited the famous incident where NASA accidentally taped over footage of the moon landing.

“But this is on a much larger scale, the idea that data could be disappeared en masse to serve a point of view – in other words, to remove or obscure evidence of climate change,” said Ferguson.

According to Ferguson, this is the first time in American history that federal data and the scientific research it supports have been so at risk from the government itself. However, one only has to look as far as Canada to see historical similarities.

In 2006, Stephen Harper became the Prime Minister of Canada and implemented broad policies forbidding scientists from speaking to the press about their work, cutting government funding for scientific research and slashing the size of agencies. With Prime Minister Harper – as with President Trump – anti-intellectual sentiment was used to discredit the research that scientists were doing for government agencies.

“There was a feeling that the government was not interested in expert opinion, and I think it’s the same kind of thing that you are probably going to see with the new [Trump] administration,” David Tarasick, a senior research scientist at Environment and Climate Change Canada (the equivalent of the U.S. EPA), told Scientific American in December.

The parallels between what is happening now and what happened under the Harper administration are one of the reasons that the data rescue movement caught on so quickly in America, according to Ferguson. And because they have experienced it before, Canadian scientists have been quick to join the data rescue effort.

It is illegal to destroy government data, but access can always be denied in other ways.

“Because the data that we’re talking about is accessible over the internet, no one would even necessarily have to destroy the data. Just breaking links to it would serve the same purpose,” said Ferguson. “And while other copies of a given data set no doubt exist in the world – it’s been backed up and downloaded by people – in today’s political climate, can you imagine the argument that would ensue over the legitimacy of a data set that was in the hands of a scientist studying climate change?”

Lost scientific data has the potential to set scientific research back very far, very quickly – and in some cases permanently. It would also change the lives of the millions of Americans who work with and rely on data.

“All of science depends on data if you think about it,” said Wylie. “Climate science completely depends on our knowledge of past records of temperature or past records of water depth. There is harm to any kind of scientific enterprise if there is a loss of a data set that is key to being able to follow patterns in the world.”

Ultimately, the data rescue movement is not about fear, but about civic engagement with public data (data funded by tax dollars) and building a community that will last well beyond the current threats to science.

“[Data rescue] is certainly something that was started because of the election, but I think the ideas of it are hopefully much longer lasting than just a few years,” said EDGI member and Harvard doctoral candidate Maya Anjur-Dietrich. “There is also the question of federal data being something that people are aware of and interested in and taking ownership of because this is your data, this is public data.”

“And so while this idea that a Trump presidency and an anti-science perspective has scared people, I think this kind of thing should be happening all the time because I think having civic engagement with public knowledge is something that ideally we would have all the time.”


Scott Pruitt suggests abandoning Paris Climate Agreement

Scott Pruitt, the Environmental Protection Agency administrator, has said that the U.S. should exit the Paris Climate Agreement, a move that could cause the deal to collapse.

“Paris is something we need to look at closely. It’s something we need to exit in my opinion,” Pruitt said in an interview with the Fox & Friends morning news program. “It’s a bad deal for America,” he said. “It’s an ‘America second, third or fourth’ kind of approach.”

This is a stronger stance than anyone else in the Trump administration has taken thus far. Even Secretary of State Rex Tillerson has said that staying in the Paris Climate Agreement will allow the U.S. to “maintain its seat at the table” – a position shared by many coal companies who see it as an opportunity to push for the future of coal on the global stage.

However, just days previously, Pruitt too expressed this opinion: “Engagement internationally is very important,” he told Fox’s Chris Wallace. “To demonstrate the leadership that we have shown on this issue with China and India and other nations is very important. Those discussions should ensue.”

But today, Pruitt expressed the idea that cutting emissions to 26-28 percent below 2005 levels by 2025, as the U.S. is obligated to do under the Paris Agreement, would hurt the United States economically while helping India and China, who do not need to start cutting emissions until 2030.

In response, Nathaniel Keohane, vice president on global climate at the Environmental Defense Fund, told Inside Climate News that this move would only hurt America, economically as well as environmentally.

“Pulling out of the Paris climate accord would damage the U.S. more than it damages the Paris Agreement or climate action globally,” he said. “American leadership on climate is the key to attracting jobs and investment in the industries and sectors that will define the 21st century.”

The past, present and future of FOIA

I have been working on an analysis of a data set containing 622,493 federal Freedom of Information Act (FOIA) requests over the last 12 years. The data set covered who requested the data, the agency and department the request was sent to, the dates the cases were opened and closed, what data the request was for and what the agency response was. Unfortunately the data set was very large and fairly messy, which made it challenging to work with. There are some queries that I would have liked to run on the data to see how certain factors influenced each other, but in some cases there were simply too many holes in the data. However, I was able to come up with three visualizations of the data that shed some light on the landscape of FOIA requests. (Links to the interactive version of these graphics coming as soon as I can solve some technical issues.)

Table 1: Yearly Federal FOIA Requests. Table 2: FOIA Data Requesters. Table 3: Most FOIAed Federal Agencies.
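
For anyone who wants to try something similar, here is a minimal sketch of how yearly counts like those in Table 1 could be tallied from a messy file. It is not the code I used: the column name `date_opened`, the `YYYY-MM-DD` date format and the sample rows are all assumptions made up for illustration. Rows with missing or unreadable dates are counted separately rather than guessed at.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def requests_per_year(csv_file, date_column="date_opened"):
    """Count requests by the year they were opened.

    Rows whose date is missing or unparseable are skipped and counted,
    so the holes in the data stay visible instead of being hidden.
    The column name and date format are hypothetical.
    """
    counts = Counter()
    skipped = 0
    for row in csv.DictReader(csv_file):
        raw = (row.get(date_column) or "").strip()
        try:
            year = datetime.strptime(raw, "%Y-%m-%d").year
        except ValueError:
            skipped += 1
            continue
        counts[year] += 1
    return counts, skipped


# Invented sample rows standing in for the real data set.
SAMPLE = """requester,agency,date_opened
Acme Capital,EPA,2009-03-14
Jane Doe,NOAA,2009-11-02
Smith & Lee LLP,EPA,
Daily Gazette,FWS,2012-06-30
John Roe,EPA,not recorded
"""

if __name__ == "__main__":
    counts, skipped = requests_per_year(io.StringIO(SAMPLE))
    for year in sorted(counts):
        print(year, counts[year])
    print("rows skipped:", skipped)
```

Reporting the skipped rows alongside the totals is a deliberate choice: with a data set this patchy, a reader should be able to see how much of it never made it into the chart.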

To gain some insight and context for my findings, I sat down and talked with Northeastern University journalism professor Laurel Leff to learn more about FOIA, how it has changed over the years and why it is so important now.

In the data I found a huge increase in the number of FOIA requests, starting in 2003 and continuing over the years with noticeable growth in 2009 and 2012 (Table 1). According to Professor Leff, there were two main forces that affected FOIA in the past few decades.

“One is the very obvious. Now we have the possibility of making this stuff available digitally, so that means that even though there might not be a philosophical change there needs to be a change in the way statutes are written,” Leff said.

“Being able to get access to [data] online makes a difference and being able to get the data in [digital] form makes an enormous difference so then you can do the kind of calculations that people do with data. Even though the government was collecting data way before you could get an electronic version, it was obviously a lot harder to manipulate it and use it in that sense. There has been a really profound transition as a result of that.”

The other change that Professor Leff discussed had the opposite effect, making data less available and the government less transparent.

“In the wake of 9/11 there was clearly a crackdown on the part of the Bush administration to take information that would have been available and say that it now presents some sort of national security threat,” Leff said. “Certainly the courts that heard these requests … were very willing to say ‘there is nothing we can do about it, it is classified information and the requester isn’t going to be able to get it.’ There are a whole slew of those kinds of cases that occur in the post-9/11 period.”


As a journalist, I have always learned about FOIA in the context of news organizations filing FOIA requests, but unsurprisingly it is investment managers and lawyers who actually file the most FOIA requests. The difference, Leff said, is that when journalists file requests they often turn into big stories that the public learns about, whereas when other types of organizations request the information it is typically kept for internal use.

“It is generally assumed that the biggest users of FOIA are businesses, not news organizations and not the public. The government is a huge repository of data about business practices, and so they do so to get a competitive edge, not using trade secrets but using any kind of data you can get that could illustrate some aspect of the economy.

“For law firms and for public interest law firms it is not competitive in the sense of getting a business advantage, but if you’re filing a lawsuit and you’re trying to determine what the status of a group of people is – either for a class action or just to bolster a claim – the government in general is the best storehouse of information about pretty much anything.”


However, in the uncertain times we are living in, the future of open government, and by extension the Freedom of Information Act, hangs in the balance. There is a lot of concern about federal data, particularly scientific data, being classified, altered or even deleted, causing huge setbacks in research and government accountability around the world, not just in America.

“The intent of FOIA is to provide openness in government, but as I think we’ve seen in so many other areas a lot of the actual strength of our institutions is that we do it and we believe in it, not just in the fine letter of the law,” Leff said. “Certainly someone who is skeptical of the press and of public scrutiny could do a great deal to try and keep information, documents from getting to the public. And certainly that could be done fairly easily in the national security context, and probably in other contexts if you turn your attention to it. And if nothing else you can muck things up.”

But for once the size of the federal bureaucracy may benefit the people.

“The federal government is a vast enterprise and it’s hard to imagine doing that for everything,” Leff said. “There are just so many requests and so much information to be able to have control over all of it, to necessarily know what is more important is pretty hard to do. I think to some extent we can count on the size of the federal bureaucracy to mean that even with someone who might have the intent to keep information from getting to the public it would be pretty hard to do.”

Urban Tensions Hackathon at Northeastern

This past weekend I attended the Urban Tensions Hackathon at Northeastern University, the goal of which was to use urban data to tell the story of conflict in Boston. The event started with a round of lightning talks. Christine Dixon of Project Hope discussed getting Boston public housing data from the courts, and Ben Green, who works for the City of Boston, discussed the Analyze Boston data portal that launched last week.

With that inspiration, we were let loose to form groups and do our own data analysis. My group, which included Elle Williams, Rowan Walrath and Paxtyn Merten, worked with Twine – an interactive, non-linear storytelling tool – to create an interactive game called Broken Bootstraps designed to build empathy for people being evicted from their homes. Elle got the idea from Depression Quest, a similar empathy-building game, also built with Twine, in which the user plays a character with depression and has to make a series of life choices, which are affected by their depression.

What we made in four hours is very much a draft. There were a lot of things we wanted to include but did not have time to fully research, including public v. subsidized housing and the effect of race on the eviction process. That being said, I was very impressed with what we were able to accomplish from idea conception to completion in such a short time. (And others seemed to agree since we won 4th place at the hackathon!)


“Conversations” journalism conference held at Northeastern University

On Friday, Northeastern’s School of Journalism and The Center for the Arts and Social Impact co-hosted a conference called Conversations: New Frameworks for Public Discourse. I attended the first event of the conference, a panel called “True listening beyond the data: making sure we hear and understand.”

Director of the School of Journalism Jonathan Kaufman opened the conference with a welcome address about the goals of the conference and how productive conversations are even more important in this time of extreme polarization. “It is not just about political divides anymore,” Kaufman said. “Many of us can’t even agree on what it means to be an American.”

Kaufman went on to discuss the role that news media has historically played in overcoming highly partisan circumstances.

“We believed that the internet would bring people together and expose people to different views. In fact it has had the opposite effect. What we discovered in the past year is that we all live in bubbles. We live in bubbles that we create.”

The crux of the panel that I attended was data journalism. And data may provide a way out of those bubbles.

Like all journalistic endeavours, data journalism presents a variety of ethical and technical challenges and opportunities, and the three panelists – Brooke Foucault Wells, an assistant professor of communications in the Emergent Media Program at Northeastern; Andrew Heyward, former president of CBS News and researcher at the MIT Media Lab; and David Lazer, professor of political science and computer and information science at Northeastern – addressed these issues:

Data journalism is often the combined effort of interdisciplinary teams, but using data responsibly can be difficult. “I am worried about the over reliance on data without explaining what the data is,” Wells said. “For example I don’t think that the average person understands algorithmic biases. The data is only as good as the biases that got baked into them. The news does a good job of training people to be critical about sources, we need to start training people to be critical about data algorithms.”

Heyward said that data, when used responsibly, could be used to portray the world more realistically: “Data can be used to move from the one-size-fits-all model of media to a much more nuanced model that reflects what society is like rather than trying to squeeze it into a generic box that everyone will like.”

Wells said that data gives the media a tool to better monitor their own practices and keep themselves in check: “I would like if media organizations used data to analyze their own representation practices,” Wells said. “The media have enormous power to affect not what to think about issues, but which issues to think about. They have a lot of power to dictate what the public is caring about. Data can help look at who the voices being featured in stories are. Are we having black people speak on black issues? Are we having women speak on women issues? People need the space to speak on issues they care about.”


Coffee in Boston: Wired Puppy


Price for a medium cup of coffee: $3.00

Wired Puppy originated in Provincetown (379 Commercial St.) but opened a second location at 250 Newbury St. in 2011. It is close to the Prudential Center T stop and is open from 6:30 a.m. – 7:30 p.m. on weekdays and 7:30 a.m. – 7:30 p.m. on weekends.

Wired Puppy offers a variety of hot, cold and frozen coffee and espresso beverages as well as brewed tea and tea lattes. The café even sells dog treats and collars for its customers’ furry friends.

Tucked away below street level, the café has a cozy vibe and seats about 15 customers.

“I often come here to study. It is busy, but never too loud, so I can get work done,” said Rachel Borne, a student at Suffolk University, who was trying the café’s signature lavender latte for the first time.

The café serves coffee from Revelator Coffee Company, a coffee roasting company based in Alabama. Revelator is committed to sustainable and eco-friendly business and food practices, and all of its products are certified organic, direct trade, and shade grown.

Customers at Wired Puppy can try a wide variety of Revelator’s coffees at the Single Cup Brew Bar, which offers patrons their choice of coffee as well as brewing method, including pour over, siphon, aeropress, chemex, or French press.

While the café’s offerings of baked goods are limited, they are made fresh in the café’s kitchen each day.

Wired Puppy can be reached by phone at 857-366-4655 and online at

Not handicapped accessible.

Final Project Proposal

For my final project I will be reporting on what Boston-area scientists are doing to protect their research and data for future generations.

The article and the video will focus on organizations and people who are working to protect data, to come up with new policies, and to educate the public about science in this new climate-denying era (for example: Data Rescue Boston, Boston universities, and other scientific institutions such as the Museum of Science and Boston Scientific). While I am interested in the organizations, I plan to focus more on the work that is being done and what progress can be made despite the administration. The video will take a similar approach to the story. I already have plans to attend a Data Rescue Boston meeting this Thursday night and I have reached out to the other organizations and universities to set up interviews in the next few weeks.

For my photo story I will be covering an event Data Rescue Boston is holding at Northeastern called #DataRescue for Climate Learning and Action on Friday, March 24. The event will bring scientists, data analysts, journalists, programmers, and anyone who is an advocate for science together to work on archiving federal environmental data and making media campaigns for scientist marches occurring in April. In my photo story I will be focusing on the work being done as well as the individuals at the event and what brought them there.