By Thea Atwood, Data Services Librarian
During my training as an undergraduate research assistant investigating the neural underpinnings of writing, I encountered a formative challenge: data analysis (and reanalysis) took an exorbitant amount of time. We collected brain scans from twenty subjects using functional magnetic resonance imaging (fMRI), a tool that tracks blood flow in the brain and treats increased flow to a region as a sign that the region is in use. Fully analyzing those scans took at least 40 hours. Because I was trained to regard misanalysis of data as a breach of trust and confidence in science, I ended up reanalyzing my data four times: I couldn’t remember whether I had ticked exactly the same boxes in the analysis software for every subject.
Imagine my surprise when I learned of another lab that used a programming script instead of clicking through analysis software. They ran the same script on every subject, and they could see each step of the analysis along the way, from cleaning to normalizing to statistics. They knew exactly which steps had been taken and could retrace any of them. Their work was reproducible, consistent, and, to me, profoundly reassuring.
That was my first exposure to coding, and I often credit it as the moment that opened the door to my career in librarianship, data management, and data stewardship.
Since 2018, I’ve held a role I have wanted since entering grad school in 2010: Data Services Librarian. I help scholars and researchers take better care of the data they collect and generate. I am deeply invested in how researchers manage their data, including the steps they can take to make that data more reproducible and trustworthy.
It is a delight to be involved in an initiative started by the Association of American Universities and the Association of Public and Land-grant Universities (AAU-APLU): the Guide to Accelerate Public Access to Research Data. The crux of the Guide is to help researchers share the data and evidence they generate and collect. “Sharing” doesn’t necessarily mean “anyone can access anything.” Researchers working with human subjects are still bound by the restrictions set by our Institutional Review Board (IRB), for instance. But it does mean that data does not have to sit on a hard drive in someone’s closet, vulnerable to mechanical failure or format obsolescence. Data is an asset to be cared for, and to be placed in stable, well-funded data repositories for future access and reuse. How can we stand on the shoulders of giants if our giants do not have shoulders?
Public access to research data hinges on our researchers’ ability to be good stewards of their data from the very beginning. However, few researchers receive training in data management and stewardship. Even those who do find guidance rarely receive it systematically: either they are lucky enough to have a mentor to guide them, or they learn through costly setbacks, such as massive data loss from a hard drive failure.
The Guide is about transforming data from a liability in a closet into an asset in a repository, which means making the resources to care for research products more widely available. That requires infrastructure and services, workflows for compliance, and, critically, experts to provide guidance and training. This is where libraries play an important role. As experts in ensuring long-term access to information such as books, electronic resources, and archival materials, we are a highly collaborative field of professionals invested in developing and sustaining the infrastructure that helps researchers both safeguard what they have and find what they need.
We capture our data stewardship aspirations in the UMass Amherst Research Data Management Strategic Plan, which I co-authored with a variety of stakeholders across campus in response to the Guide.
This is an exciting time to work in data services: making data accessible to the public requires input and collaboration from all areas of campus. At every stage of the research lifecycle, I work closely with colleagues in compliance, information technology, information security, and research development, as well as with faculty ambassadors.
These efforts will have a profound impact on our world and our ability to innovate. Anyone with an internet connection will be able to find the research they need to improve their community and to make exciting new discoveries. This is the democratization of science and scholarship.
Images: Thea Atwood, Data Services Librarian. Top illustration by Jørgen Stamp, www.digitalbevaring.dk, CC BY 2.5 Denmark license.