January 9, 2018
Digital HPS

In late 2012, Biology and Society graduate students Erick Peirson and Julia Damerow started a project under the guidance of their advisor, Manfred Laubichler. The project started out by hiring two Master’s students from the computer science department and three undergraduate students from Biology and Society to support the development of software and its application to research projects of the Laubichler lab. This project was later named the Digital Innovation Group (DigInG).

Five years later, the Digital Innovation Group has developed multiple software tools that support researchers in processing and analyzing their data. DigInG has also supported several research projects of the Laubichler lab, as well as projects external to the lab. Among the developed tools are software packages for large-scale text extraction and OCR of documents, annotation of documents, and a library for bibliographic metadata analysis. Since mid-2016, DigInG has been collaborating with Department I of the Max Planck Institute for the History of Science in Berlin, Germany. The main goal of this collaboration has been to combine software development efforts for the computational history of science. Since mid-2017, the Digital Innovation Group has been one of the founding members of DHTech, a group of software developers and scholars with programming skills that aims to support the development and reuse of software in the Digital Humanities.

DigInG brings together students from computer science and the humanities in order to create new and innovative tools, infrastructure, and methods for computational history and philosophy of science research and training. In so doing, it also creates new educational resources, opportunities, and experiences for its students. Students working for DigInG acquire hands-on skills in software engineering or computational research methods that give them an advantage when moving on to careers in academia or industry. Over 80% of former DigInG students are now working for companies such as Amazon, Microsoft, and PayPal, or are pursuing graduate degrees at ASU and other universities. Over the years, the Digital Innovation Group has offered several digital humanities courses and workshops focused on computational methods for the humanities as well as software development for digital humanities projects.

The Data Mining and Informatics Team (Data Team) at the Laubichler Lab builds unique and innovative data systems that capture in unprecedented detail the processes that drive important scientific innovation. Combining expertise in data wrangling, network science, and advanced statistical modeling, we push at the interdisciplinary boundaries of the life sciences, medicine, clinical research, data science, and digital humanities.

In the past year, the Data Team has collected, cleaned, and wrangled over 80 gigabytes of data. This data represents a large diachronic cross section of multiple institutional, social, and knowledge domains. For instance, in order to understand the emergence of the microbiome concept, we have gathered the full text and accompanying metadata of every scientific paper ever published containing the word "microbiome" and every US-funded microbiome research project. Also, to better understand how scientific fields are influenced by language and social systems, we analyzed and mapped how the content of more than 50 evolution journals changed from 1900 to 2015. These complete and carefully curated datasets allow us to approach questions about scientific innovation that have never before been answerable: How do scientific innovations spread from obscure corners of science into the mainstream? What hidden (or not so hidden) variables influence the likelihood of funding for innovative science?

In addition to data collection and cleaning, the Data Team has been honing data analysis and communication skills through Data Competitions.  Students get hands-on experience, feedback, and mentoring by analyzing real-world datasets and presenting their findings to other staff and students.

