Biomedical Representation: Web Archive Collecting for “All of Us”
By Annabelle Smith, Susan Speaker, Christie Moffatt, and Margaret Long ~
In January 2015, President Barack Obama announced the Precision Medicine Initiative—a federally funded opportunity to advance the study of precision medicine and address the lack of diversity in biomedical research. Individuals from historically underrepresented groups, such as people of color and members of the LGBTQ+ community, have often had inadequate access to reliable health information—due in part to the lack of research that includes diverse populations. To begin addressing these disparities, the National Institutes of Health (NIH) formed the Precision Medicine Initiative Working Group of the Advisory Committee to the Director in March 2015, laying the foundation for what would become the All of Us Research Program.
In late 2021, the National Library of Medicine’s (NLM) Web Collecting and Archiving Working Group, a team of archivists, librarians, and historians, initiated a new web archive collecting effort to document the formation, growth, and impact of the All of Us Research Program for future historical research. This work is supported by the Collection Development Guidelines of the NLM, which considers websites, blogs, social media, and other web content to play an increasingly important role in documenting the scholarly biomedical record and illustrating a diversity of cultural perspectives in health and medicine. Together with NLM’s Office of Engagement and Training, members of the All of Us team met periodically with the Working Group to provide essential feedback on topical content and suggest potential leads for additional sources.
The All of Us Research Program web archive includes a wide variety of born-digital web content documenting engagement, participant recruitment, research projects using collected data, ethics and data management, public response, and impact of COVID-19, as well as general sources about the program. The web collection documents several significant milestones contributing to a more diverse collection of medical information, including initial beta testing for participant enrollment, expanding consultation with Tribal Nations, and ongoing program developments. In March 2022, All of Us released nearly 100,000 whole genome sequences from participant DNA data into the All of Us Researcher Workbench to allow broad researcher use. As of January 2023, the program has enrolled over 590,000 participants. Over 50 percent of the program’s participants identify with a racial and ethnic minority group and over 80 percent come from communities historically underrepresented in biomedical research—including people with disabilities, those who live in rural communities, and people with lower income and educational attainment.
All of Us acknowledges historical transgressions that have excluded some communities from participating in biomedical research or were not transparent about how their information would be used. The All of Us Research Program actively works to foster trust with all participants through transparency and has developed rigorous privacy standards for the storage and use of genetic data, which can be viewed through the program’s data browser. In addition, All of Us continuously works alongside members of each community to best understand their relationship to medicine. Program materials are offered in Spanish as well as English to accommodate members of the Spanish-speaking community who wish to participate, and the program hopes to add materials in more languages in the future.

The data collected from the All of Us Research Program will be used to help providers identify the best treatment plans for each individual, instead of relying on a “one size fits all” approach to medical care. The participants’ data will be used to answer questions such as what makes certain people more likely to develop a disease, how an individual’s environment, lifestyle, and genes can affect their health, and what types of tools can be developed to detect health conditions and encourage healthy habits.
The All of Us Research Program web archive documents many of the initiatives and accomplishments of All of Us and its partners, including a traveling health exhibit, educational materials, information on data literacy, and the Community Engagement Network, created by NLM and the NIH All of Us Research Program to improve the public’s access to health information and increase awareness of the program among historically underrepresented communities. The collection includes a wide variety of born-digital web content such as official program protocols, social media campaigns, local news reports, blogs, public discourse, participant testimonials, and articles on the health and technology industry.

The Working Group also collected content on the role of All of Us in supporting COVID-19 research during the pandemic, when in-person events and outreach opportunities were affected by safety guidelines. While building community support, All of Us captured essential information on participants’ experiences during the COVID-19 pandemic through data surveys, antibody studies, and monitoring of the pandemic’s long-term effects on participants.
Identifying and reviewing content for inclusion in this ongoing effort is a crucial first step in the collection process; however, a web collection requires a significant amount of work before archived web content is ready for future researchers. The Working Group conducts crawls of recommended and approved content using the Internet Archive’s Archive-It service, Conifer, and Webrecorder tools, reviews archived content for quality, and addresses capture challenges throughout the process. The team conducts comprehensive reviews of the collection and adds detailed source descriptions to support future discovery and research.
Much like historians and archivists use primary sources to illuminate a more diverse representation of the past, the All of Us team aims to use their diverse collection of biomedical data to construct a more representative view of health in our diverse nation. The All of Us web archive collection furthers NLM efforts to enrich the public understanding of the importance of history and diversity in medical research and supplements existing collections that emphasize the lack, or mistreatment, of underrepresented populations in medical research across time. As All of Us continues to develop, future access to the web and social media resources documenting its growth, ancillary studies, engagement strategies, and milestones depends on actively identifying and collecting material as it happens. This effort by the NLM Working Group will persist as All of Us continues to release new information and furthers its efforts to engage 1 million participants. We hope that this web collection will serve to document this historic effort in precision medicine.

All of Us is a registered service mark of the U.S. Department of Health & Human Services (HHS).
The NLM Web Collecting and Archiving Working Group includes Delia Golden, Marielle Gage (History Associates Incorporated), Christie Moffatt, Katie Platt (HAI), John Rees, Shirleon Sharron (HAI), Susan Speaker, Caitlin Sullivan, Erica Williams (HAI), and Kristina Womack. Explore NLM web archive collections, including ongoing collecting to document the COVID-19 Pandemic, at Archive-It and in Circulating Now.
Annabelle Smith is Contract Research Historian with History Associates Incorporated.
Susan Speaker, PhD, is Historian for the Digital Manuscripts Program of the History of Medicine Division at the National Library of Medicine.
Christie Moffatt is Manager of the Digital Manuscripts Program in the History of Medicine Division at the National Library of Medicine and Chair of NLM’s Web Collecting and Archiving Working Group.
Margaret Long is Contract Archivist with History Associates Incorporated.