The omicron variant quickly took over the global coronavirus landscape after it was first reported in South Africa in late November 2021. The U.S. became the 24th country to report a case of omicron infection when health officials announced on Dec. 1, 2021, that the new strain had been identified in a patient in California.
How do scientists know what versions of the coronavirus are present? How quickly can they see which viral variants are making inroads in a population?
Alexander Sundermann and Lee Harrison are epidemiologists who study novel approaches for outbreak detection. Here they explain how the genomic surveillance system works in the U.S. and why it’s important to know which virus variants are circulating.
What is genomic surveillance?
Genomic surveillance provides an early warning system for SARS-CoV-2. The same way a smoke alarm helps firefighters know where a fire is breaking out, genomic surveillance helps public health officials see which coronavirus variants are popping up where.
Labs sequence the genome in coronavirus samples taken from patients’ COVID-19 tests. These are diagnostic PCR tests that have come back positive for SARS-CoV-2. Then scientists are able to tell from the virus’s genome which coronavirus variant infected the patient.
By sequencing enough coronavirus genomes, scientists are able to build up a representative picture of which variants are circulating in the population overall. Some variants have genetic mutations that have implications for prevention and treatment of COVID-19. So genomic surveillance can inform decisions about the right countermeasures – helping to control and put out the fire before it spreads.
For example, the omicron variant has mutations that diminish how well existing COVID-19 vaccines work. In response, officials recommended booster shots to enhance protection. Similarly, mutations in omicron reduce the effectiveness of some monoclonal antibodies, which are used both to prevent and treat COVID-19 in high-risk patients. Knowing which variants are circulating is therefore crucial for determining which monoclonal antibodies are likely to be effective.
How does genomic surveillance work in the US?
The U.S. Centers for Disease Control and Prevention leads a consortium called the National SARS-CoV-2 Strain Surveillance (NS3) system. It gathers around 750 SARS-CoV-2-positive samples per week from state public health labs across the U.S. Independent of CDC efforts, commercial, university and health department laboratories sequence additional specimens.
Each type of lab has its own strengths in genomic surveillance. Commercial laboratories can sequence a high number of tests, rapidly. Academic partners can provide research expertise. And public health laboratories can supply insight into local transmission dynamics and outbreaks.
Regardless of the source, the sequence data is generally made publicly available and therefore contributes to genomic surveillance.
What data gets tracked?
When a lab sequences a SARS-CoV-2 genome, it uploads the results to a public database that includes when and where the coronavirus specimen was collected.
The open-access Global Initiative on Sharing Avian Influenza Data (GISAID) is one such database. Scientists launched GISAID in 2008 to provide a quick and easy way to see which influenza strains were circulating across the globe. Since then, GISAID has expanded to provide access to SARS-CoV-2 genomic sequences as well.
The database compares a sample’s genetic information to all the other samples collected and shows how that particular strain has evolved. To date, over 6.7 million SARS-CoV-2 sequences from 241 countries and territories have been uploaded to GISAID.
Taken together, this patchwork of genomic surveillance data provides a picture of the current variants spreading in the U.S. For example, on Dec. 4, 2021, the CDC projected that omicron accounted for 0.6% of the COVID-19 cases in the U.S. The estimated proportion rose to 95% by Jan. 1, 2022. Surveillance gave a stark warning of how quickly this variant was becoming predominant, allowing researchers to study which countermeasures would work best.
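To get a feel for how fast that takeover was, the two CDC estimates quoted above can be turned into a rough growth rate. The sketch below assumes that omicron's share followed a simple logistic curve between the two dates, which is a simplification for illustration and not a description of the CDC's own statistical model. The variable names are ours.

```python
import math
from datetime import date

def logit(p):
    """Log-odds of a proportion p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

# CDC-projected omicron shares quoted in the article
p_start, p_end = 0.006, 0.95
days = (date(2022, 1, 1) - date(2021, 12, 4)).days  # 28 days

# Assumption: logistic growth, so the log-odds rise linearly over time
rate_per_day = (logit(p_end) - logit(p_start)) / days
odds_doubling_days = math.log(2) / rate_per_day

print(f"Growth in log-odds per day: {rate_per_day:.2f}")
print(f"Days for omicron's odds to double: {odds_doubling_days:.1f}")
```

Under that assumption, the odds that a given case was omicron doubled roughly every two to three days over this period.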
It’s important to note, however, that genomic surveillance data is often dated. The time between a patient taking a COVID-19 test and the viral genome sequence getting uploaded to GISAID can be many days or even weeks. Because of the multiple steps in the process, the median time from collection to GISAID in the U.S. ranges from seven days (Kansas) to 27 days (Alaska). The CDC uses statistical methods to estimate variant proportions for the most recent past until the official data has come in.
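The median lag described above is simple to compute once each sequence has a collection date and an upload date. Here is a minimal sketch using made-up dates for five specimens; the records are hypothetical and are not drawn from GISAID.

```python
from datetime import date
from statistics import median

# Hypothetical (collection date, upload date) pairs for five specimens
records = [
    (date(2021, 12, 1), date(2021, 12, 8)),
    (date(2021, 12, 2), date(2021, 12, 16)),
    (date(2021, 12, 3), date(2021, 12, 30)),
    (date(2021, 12, 5), date(2021, 12, 15)),
    (date(2021, 12, 6), date(2021, 12, 27)),
]

# Days between taking the test and the sequence becoming available
lags = [(uploaded - collected).days for collected, uploaded in records]
median_lag = median(lags)

print(f"Lags in days: {sorted(lags)}")
print(f"Median lag: {median_lag} days")
```

The median is used instead of the average so that a handful of very late uploads do not distort the picture.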
How many COVID-19 samples get sequenced?
Earlier in 2021, the CDC and other public health laboratories were sequencing about 10,000 COVID-19 specimens per week in total. Given that hundreds of thousands of cases have been diagnosed weekly during most of the pandemic, epidemiologists considered that number too small a proportion to provide a complete picture of circulating strains. More recently, the CDC and public health labs have been sequencing closer to 60,000 cases per week.
Despite this improvement, there is still a wide gap in the percentages of COVID-19 cases sequenced from state to state, ranging from a low of 0.19% in Oklahoma to a high of 10.0% in North Dakota within the past 30 days.
Moreover, the U.S. overall sequences a much smaller percentage of COVID-19 cases than some other countries: 2.3% in the U.S., compared with 7.0% in the U.K., 14.8% in New Zealand and 17% in Israel.
Which COVID-19 tests get sequenced?
Imagine if researchers collected COVID-19 tests from only one neighborhood in an entire state. The surveillance data would be biased toward the variant circulating in that neighborhood, since people are likely transmitting the same strain locally. The system might not even register another variant that is gaining steam in a different city.
That’s why scientists aim to gather a diverse sample from across a region. Random sampling that is geographically and demographically representative gives researchers a good sense of the big picture in terms of which variants are predominant or diminishing.
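A small simulation illustrates the point. The sketch below invents a state with two cities, one where only delta is circulating and one where omicron already makes up half of cases, and compares sampling from a single city with sampling at random across the state. All of the numbers are hypothetical.

```python
import random

def variant_share(samples, variant):
    """Fraction of sampled cases carrying the given variant."""
    return sum(1 for s in samples if s == variant) / len(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical state: 8,000 cases in city A, all delta;
# 2,000 cases in city B, half delta and half omicron
city_a = ["delta"] * 8000
city_b = ["delta"] * 1000 + ["omicron"] * 1000
all_cases = city_a + city_b

true_share = variant_share(all_cases, "omicron")

# Biased surveillance: 500 specimens, all from city A
biased_share = variant_share(rng.sample(city_a, 500), "omicron")

# Representative surveillance: 500 specimens drawn at random statewide
rep_share = variant_share(rng.sample(all_cases, 500), "omicron")

print(f"True omicron share:      {true_share:.1%}")
print(f"City A only estimate:    {biased_share:.1%}")
print(f"Random sample estimate:  {rep_share:.1%}")
```

The single-city sample misses omicron entirely, while the random statewide sample lands close to the true 10% share, even though both sequence the same number of specimens.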
Why don’t patients in the US get variant results?
There are two main reasons patients are generally not told the results when their specimen gets sequenced.
First, the time lag from specimen collection to sequence results is often too long to make the information clinically useful. Many patients will have progressed far into their illness by the time their variant is identified.
Second, the information is often not relevant for patient care. Treatment options are largely the same regardless of what variant has caused a COVID-19 infection. In some cases, a doctor might select the most appropriate monoclonal antibodies for treatment based on which variant a patient has, but this information can often be gleaned from faster laboratory methods.
As we begin 2022, it is more important than ever to have a robust genomic surveillance program that can capture whatever the next new coronavirus variant is. A system that provides a representative picture of current variants and fast turnaround is ideal. Proper investment in genomic surveillance for SARS-CoV-2 and other pathogens and data infrastructure will aid the U.S. in fighting future waves of COVID-19 and other infectious diseases.