bioRxiv. 2023 Aug 07. pii: 2023.07.31.551384. [Epub ahead of print]
Nitesh Kumar Sharma,
Ram Ayyala,
Dhrithi Deshpande,
Yesha M Patel,
Viorel Munteanu,
Dumitru Ciorba,
Andrada Fiscutean,
Mohammad Vahed,
Aditya Sarkar,
Ruiwei Guo,
Andrew Moore,
Nicholas Darci-Maher,
Nicole A Nogoy,
Malak S Abedalthagafi,
Serghei Mangul.
Data-driven computational analysis is becoming increasingly important in biomedical research as the amount of data being generated continues to grow. However, research outputs such as data, source code, and methods are often not shared, which undermines the transparency and reproducibility of studies, both of which are critical to the advancement of science. Many published studies are not reproducible because too little documentation, code, and data are shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them failed to share their analytical code. Even among those that did disclose their code, the vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We found a significant association between the presence of code availability statements and increased code availability (p = 2.71 × 10⁻⁹). Additionally, a greater proportion of studies conducting secondary analyses shared their code than of studies conducting primary analyses (p = 1.15 × 10⁻⁷). In light of these findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability in order to improve reproducibility in biomedical research. Increasing transparency and reproducibility can promote scientific rigor, encourage collaboration, and accelerate scientific discovery. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.