By Eleanor Lin
The novel coronavirus has often been spoken of as if it were a living, breathing enemy. Yet SARS-CoV-2, the virus which causes COVID-19, is not considered fully "alive" by scientists, since (like all viruses) it cannot reproduce on its own. Instead, it must hijack human cells in order to produce copies of itself, damaging the host cells and causing illness in the process.
The virus's genome (its full RNA sequence) holds the key to this hijacking mechanism. Since the first complete SARS-CoV-2 genome (from a patient in Wuhan, China) was sequenced in January 2020, scientists have been competing and collaborating with one another to understand what each part of the genome does. This quest to crack the virus's genetic code is laying the groundwork for better COVID-19 detection, prevention, and treatment.
A genome, found in every cell of every living organism, is the instruction manual for building and maintaining life. Each type of virus also possesses its own unique genome: the instructions, but not the machinery, for copying itself. (It is this lack of reproductive machinery that leads scientists to classify viruses as non-living.) Unlike the genomes of living organisms, which are made of DNA, the genome of SARS-CoV-2 consists of ribonucleic acid (RNA). Like DNA (the molecule that makes up the human genome), RNA is built from four distinct nucleotides, chemical building blocks arranged in unique orders to encode information. When scientists “sequence” the SARS-CoV-2 genome, they determine the precise order of the nucleotides in a particular sample of viral RNA. To date, roughly 20,000 SARS-CoV-2 genomes have been sequenced in their entirety and published in the National Center for Biotechnology Information's NCBI Virus database.
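To make the idea of a sequence concrete, here is a minimal Python sketch that treats a stretch of RNA as a string over the four nucleotide letters (A, C, G, U) and counts how often each appears. The 12-letter fragment is invented purely for illustration and is not taken from any real genome; the function name is likewise hypothetical.

```python
from collections import Counter

# The four RNA nucleotides: adenine, cytosine, guanine, uracil.
RNA_NUCLEOTIDES = "ACGU"


def nucleotide_counts(sequence: str) -> dict:
    """Count each RNA nucleotide in a sequence, rejecting any other letter."""
    seq = sequence.upper()
    invalid = set(seq) - set(RNA_NUCLEOTIDES)
    if invalid:
        raise ValueError(f"not RNA nucleotides: {sorted(invalid)}")
    counts = Counter(seq)
    # Report all four letters, including any that never appear.
    return {n: counts[n] for n in RNA_NUCLEOTIDES}


# Invented toy fragment (an assumption for illustration, not real viral RNA).
toy_fragment = "AUGGCUAACGUU"
print(len(toy_fragment), nucleotide_counts(toy_fragment))
```

A real SARS-CoV-2 sequence is the same kind of string, only almost 30,000 letters long.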
Placed in the context of the relatively short history of genome sequencing, the speed at which so much SARS-CoV-2 data has been generated is astonishing. When molecular biologist Walter Fiers and his colleagues completed the genomic sequence of bacteriophage MS2 in 1976, it was a major breakthrough: for the first time, a genome had been sequenced in its entirety. The MS2 genome is 3,569 nucleotides long and took years to sequence. The SARS-CoV-2 genome is almost 30,000 nucleotides long, more than eight times the length of MS2's, yet scientists were able to sequence and publish it on a timescale of months. This speed reflects the huge advances that have been made in genome sequencing technology.
Why are scientists so eager to produce and interpret new sequences? Firstly, controlling the spread of COVID-19 requires knowing who is infected, so that they can be isolated and prevented from infecting others. The most accurate diagnostic tests for COVID-19 rely on detecting viral RNA, and only by knowing the exact sequence of the SARS-CoV-2 genome can scientists design good tests: ones that detect even minuscule amounts of viral RNA in a sample such as a nose swab (sensitivity), and that respond only to SARS-CoV-2, not to other viruses with similar genomes (specificity).
The virus’s genome also suggests where it emerged and how to avoid future outbreaks. Random changes called mutations occur in the genome over time, and are useful as a molecular barcode for scientists to track the global spread of the virus. For instance, from the SARS-CoV-2 genome sequences of 84 New Yorkers, the authors of a study published in Science concluded that the virus was introduced to New York City multiple times, "mainly from Europe and other parts of the United States."
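The barcode idea can be sketched in a few lines of Python. The example below compares each sample with a reference sequence and lists the positions where they differ; samples that share a mutation plausibly share an ancestor. It is a simplified illustration, not the method used in the study above: it assumes the sequences are already aligned and of equal length, it handles only single-letter substitutions, and every sequence and name in it is invented.

```python
def mutations(reference: str, sample: str) -> list:
    """List substitutions as (1-based position, reference letter, sample letter).

    Assumes the two sequences are already aligned and equally long.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref, alt)
        for position, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Invented toy sequences (assumptions for illustration, not real viral RNA).
reference = "AUGGCUAACGUU"
samples = {
    "sample_a": "AUGGCUAACGUU",  # identical to the reference
    "sample_b": "AUGGUUAACGUU",  # one change at position 5
    "sample_c": "AUGGUUAACGAU",  # same change at 5, plus one at 11
}

for name, sequence in samples.items():
    print(name, mutations(reference, sequence))
```

Here sample_b and sample_c both carry the change at position 5, which hints that they descend from a common source, while the extra change in sample_c suggests it arose later along that lineage.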
Comparing the SARS-CoV-2 genome with other viral genomes, and across times and locations, allows scientists to make conjectures about the evolutionary history of the virus. Based on its genetic similarity to other coronaviruses known to have zoonotic (i.e., animal) origins, SARS-CoV-2 most likely first circulated among bats ("reservoir hosts"), then among other mammalian species ("intermediate hosts"), gradually accumulating genetic mutations that enhanced its ability to infect humans. Reducing human contact with potential animal hosts, including by stopping wildlife trafficking and deforestation, can therefore reduce the risk of future pandemics.
Developing safe and effective vaccines and treatments for COVID-19 requires a detailed understanding of SARS-CoV-2 structure and function, which are dictated by the virus’s genome.
A common concern is that mutations could change the virus enough that the vaccines currently under development (many of which are based on the first SARS-CoV-2 genome ever sequenced, from January 2020) would no longer be effective against all the new, mutated versions of SARS-CoV-2. However, a retrospective study of over 18,000 SARS-CoV-2 sequences, published in the Proceedings of the National Academy of Sciences, concluded that the genetic profile of the virus is actually becoming more homogeneous and stable, rather than more varied. As a result, the authors concluded, a single vaccine should be effective against all the current versions of SARS-CoV-2.
There is still much to be uncovered about the coronavirus, and the unknowns can be daunting. But genomic sequencing data will continue to help scientists fill the gaps in our knowledge, one nucleotide at a time.