Made with ❤️ by Anton Vasetenkov
COVID-19 resources for bioinformaticians and data scientists
Apr 4, 2020

Resources about coronavirus disease 2019 and the 2019–20 coronavirus pandemic.

COVID-19 case data

Coronavirus case data is reported by the following sources:

SARS-CoV-2 genome analysis

The first genome sequence of the virus was published on February 3 and is available in GenBank (MN908947).
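As a minimal sketch, the GenBank record can be downloaded as FASTA through NCBI's E-utilities `efetch` endpoint. The helper names (`efetch_fasta_url`, `parse_fasta`, `fetch_genome`) are illustrative and not from any library, and the sketch assumes the endpoint is reachable without an API key, which is subject to NCBI's rate limits.

```python
from typing import Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str) -> str:
    """Build an efetch URL that returns a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text: str) -> Tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # Drop the leading ">" and join the wrapped sequence lines.
    return lines[0][1:], "".join(lines[1:])


def fetch_genome(accession: str = "MN908947") -> Tuple[str, str]:
    """Download and parse the record (hypothetical helper; needs network access)."""
    with urlopen(efetch_fasta_url(accession), timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_genome()` returns the FASTA header and the full nucleotide sequence as a single string, which can then be passed to whatever analysis tool is in use.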

This has been followed by a number of papers that analyze the genomic data further. In "The proximal origin of SARS-CoV-2", Andersen et al. show that the new virus arose naturally and was neither engineered in a lab nor deliberately released.

In NCBI's Taxonomy Database, the virus is assigned the ID 2697049 (NCBI:txid2697049). The taxonomy page contains links to all NCBI data related to the virus.

There are currently about 500 SARS-CoV-2 sequences available in GenBank and about 300 in the Sequence Read Archive (SRA). NCBI publishes the live list here, and it is also available in YAML format here.
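The taxonomy ID can also be used to get current record counts through the E-utilities `esearch` endpoint. The following is a rough sketch under a few assumptions: the helper names are made up for illustration, the `nuccore` database is used as a stand-in for GenBank (it also includes records from other sources, so the count will not match the GenBank figure exactly), and the `txid…[Organism]` search term is assumed to be the appropriate filter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
SARS_COV_2_TAXID = 2697049  # NCBI:txid2697049


def esearch_url(db: str, taxid: int = SARS_COV_2_TAXID) -> str:
    """Build an esearch URL that asks only for the number of matching records."""
    params = {
        "db": db,
        "term": f"txid{taxid}[Organism]",
        "retmode": "json",
        "retmax": 0,  # we only need the count, not the record IDs
    }
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(payload: str) -> int:
    """Extract the record count from an esearch JSON response."""
    return int(json.loads(payload)["esearchresult"]["count"])


def count_records(db: str) -> int:
    """Query NCBI for the record count (hypothetical helper; needs network access)."""
    with urlopen(esearch_url(db), timeout=30) as response:
        return parse_count(response.read().decode("utf-8"))
```

For example, `count_records("nuccore")` and `count_records("sra")` would return the number of nucleotide records and SRA entries assigned to the virus at the time of the call.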

Structural biology of SARS-CoV-2

The structure of the spike protein through which the virus binds to human cells was published on March 13. This glycoprotein is a key target for vaccines, therapeutic antibodies, and diagnostics.

COVID-19 clinical trials data

ClinicalTrials.gov lists over 300 COVID-19 clinical trials under way around the world.

The first clinical trial (NCT04280705) in the U.S. aims to evaluate the safety and efficacy of the drug remdesivir.
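A trial record can be looked up programmatically by its NCT identifier. The sketch below assumes the ClinicalTrials.gov REST API (version 2) and its `protocolSection` JSON layout; the endpoint path and field names are assumptions that should be checked against the current API documentation, and the helper names are illustrative only.

```python
import json
import re
from urllib.request import urlopen

# Assumed base URL of the ClinicalTrials.gov v2 API.
API_ROOT = "https://clinicaltrials.gov/api/v2/studies"
# NCT identifiers are "NCT" followed by eight digits.
NCT_PATTERN = re.compile(r"NCT\d{8}")


def study_url(nct_id: str) -> str:
    """Build the URL for a single study, rejecting malformed identifiers."""
    if not NCT_PATTERN.fullmatch(nct_id):
        raise ValueError(f"not an NCT identifier: {nct_id!r}")
    return f"{API_ROOT}/{nct_id}"


def summarize(record: dict) -> dict:
    """Pull the title and status out of a study record (assumed field names)."""
    protocol = record.get("protocolSection", {})
    return {
        "title": protocol.get("identificationModule", {}).get("briefTitle"),
        "status": protocol.get("statusModule", {}).get("overallStatus"),
    }


def fetch_study(nct_id: str = "NCT04280705") -> dict:
    """Download and summarize a study (hypothetical helper; needs network access)."""
    with urlopen(study_url(nct_id), timeout=30) as response:
        return summarize(json.load(response))
```

`summarize` uses `dict.get` throughout so that a record missing any of the assumed fields yields `None` values instead of raising an error.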

Additional resources

See also
Public sequence data sources
What are the public sources of protein and nucleic acid sequence data?