By decoding the complete genetic code, or genome, of more than 1,000 people whose homelands stretch from Africa and Asia to Europe and the Americas, scientists have compiled the largest and most detailed catalog yet of human genetic variation. This massive resource will help medical researchers find the genetic roots of rare and common diseases in populations worldwide.
“With this resource, researchers have a roadmap to search for the genetic origins of diseases in populations around the globe,” says Elaine Mardis, PhD, one of the study’s co-principal investigators and the co-director of The Genome Institute at Washington University. “We estimate that each person carries up to several hundred rare DNA variants that could potentially contribute to disease. Now, scientists can investigate how detrimental particular rare variants are in different ethnic groups.”
Importance of rare variants
At the genetic level, any two people are more than 99 percent alike. But rare variants, those that occur with a frequency of 1 percent or less in a population, are thought to contribute to rare diseases as well as common conditions like cancer, heart disease and diabetes. Rare variants may also explain why some medications are not effective in certain people or cause side effects.
Identifying rare variants across different populations is a major goal of the project. During the pilot phase of the effort, researchers found that most rare variants differed from one population to another. These variants developed recently in human evolutionary history, after populations in Europe, Africa, Asia and the Americas diverged from a single group.
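To make the "1 percent or less" definition concrete, here is a minimal Python sketch that labels a variant as absent, rare or common within each population. The function names, population labels and counts are all made up for illustration; they are not taken from the project's data or tools.

```python
RARE_THRESHOLD = 0.01  # "1 percent or less", the definition used in the article


def allele_frequency(copies_seen, chromosomes_sampled):
    """Fraction of sampled chromosomes that carry the variant."""
    if chromosomes_sampled <= 0:
        raise ValueError("chromosomes_sampled must be positive")
    return copies_seen / chromosomes_sampled


def classify(copies_seen, chromosomes_sampled, threshold=RARE_THRESHOLD):
    """Label a variant as 'absent', 'rare' or 'common' in one population."""
    if copies_seen == 0:
        return "absent"
    frequency = allele_frequency(copies_seen, chromosomes_sampled)
    return "rare" if frequency <= threshold else "common"


# Hypothetical counts for a single variant: (copies seen, chromosomes sampled).
example_counts = {
    "population_a": (1, 200),
    "population_b": (0, 180),
    "population_c": (12, 190),
}

labels = {name: classify(c, n) for name, (c, n) in example_counts.items()}
```

With these invented numbers, the same variant is rare in one population, absent in another and common in a third, which is the kind of population-to-population difference the pilot phase reported.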
“This information is crucial and will improve our interpretation of individual genomes,” says another of the study’s co-principal investigators, Richard K. Wilson, PhD, director of The Genome Institute and a pioneer in cancer genome sequencing. “Now, if we want to study cancer in Mexican Americans or Japanese Americans, for example, we can do so in the context of their diverse geographic or ancestry-based genetic backgrounds.”
Results of the new study are based on DNA sequencing of the following populations:
- Yorubas in Nigeria
- Han Chinese in Beijing
- Japanese in Tokyo
- Utah residents with ancestry from northern and western Europe
- Luhyas in Kenya
- People of African ancestry in the southwestern United States
- Toscani in Italy
- People of Mexican ancestry in Los Angeles
- Southern Han Chinese in China
- Iberians from Spain
- British in England and Scotland
- Finnish from Finland
- Colombians in Colombia
- Puerto Ricans in Puerto Rico
All study participants submitted anonymous DNA samples and agreed to have their genetic data included in an online database. To catalog the variants, the researchers first sequenced the entire genome—all the DNA, or genetic information—of each individual in the study multiple times. The process yields the precise order of DNA’s molecular building blocks, called nucleotides. Surveying the genome in this way finds common DNA changes but misses many rare variants.
Then, to find rare variants, they sequenced the small portion of the genome that contains genes, reading it about 80 times for each participant to ensure accuracy. They then looked closely for changes in the DNA sequence involving a single nucleotide, called SNPs (for single-nucleotide polymorphisms).
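The idea behind reading the same stretch of DNA many times can be illustrated with a toy Python sketch: take a majority vote at each position across repeated reads, then compare the result with a reference sequence to spot single-nucleotide differences. This assumes short reads that are already aligned and equal in length. The sequences and function names are invented, and the project's actual analysis tools are far more sophisticated than this.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote base at each position across aligned, equal-length reads."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be non-empty and all the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def find_single_base_changes(reference, sample):
    """List (position, reference_base, sample_base) wherever the two differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Invented example: four repeated reads of one region, one with a read error.
reference = "ATGCCGTA"
reads = ["ATGACGTA", "ATGACGTA", "ATGACGTT", "ATGACGTA"]

called = consensus(reads)
changes = find_single_base_changes(reference, called)
```

Here the stray `T` at the end of the third read is outvoted by the other reads, while the `C` to `A` change at position 3 (counting from 0) appears in every read and is reported.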
Using special tools developed to analyze and integrate the data, researchers discovered a total of 38 million SNPs. They also found more than one million structural variants—sections of extra or missing DNA.
SNPs and structural variants can help explain an individual’s susceptibility to disease, response to drugs, or reaction to environmental factors such as air pollution or stress. Other studies have found an association between structural variants and diseases such as autism and schizophrenia.
Massive amounts of data
The 1,000 Genomes Project has generated massive amounts of genomic data. Simply recording the raw information took up some 180 terabytes of hard-drive space, enough to fill more than 40,000 DVDs. All of the information is freely available on the Internet through public databases.
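The DVD comparison is a matter of simple division, sketched below in Python. The article does not state a disc capacity, so the 4.7-gigabyte single-layer figure is an assumption, as is the choice between decimal and binary terabytes.

```python
import math

DVD_BYTES = 4.7e9  # assumed capacity of a single-layer DVD


def dvds_needed(total_bytes, dvd_bytes=DVD_BYTES):
    """Smallest whole number of DVDs that can hold total_bytes."""
    return math.ceil(total_bytes / dvd_bytes)


# 180 terabytes read as decimal units (10**12 bytes each).
decimal_estimate = dvds_needed(180 * 10**12)

# 180 terabytes read as binary units (2**40 bytes each).
binary_estimate = dvds_needed(180 * 2**40)
```

Under these assumptions the decimal reading comes to roughly 38,300 discs and the binary reading to roughly 42,100, so the "more than 40,000" figure fits the binary reading.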
“This tremendous resource builds on the knowledge of the Human Genome Project,” says co-author George Weinstock, PhD, associate director of The Genome Institute. “Scientists and, ultimately, patients worldwide will benefit from the extensive effort to understand the shared features and geographic diversity of the human genome.”
The 1,000 Genomes Project involved some 200 scientists at Washington University and other institutions. Results detailing the DNA variations of individuals from 14 ethnic groups were published in the journal Nature. Eventually, the initiative will involve 2,500 individuals from 26 populations.
In addition to The Genome Institute, other research centers involved in the project include the Human Genome Sequencing Center at the Baylor College of Medicine, Houston; The Broad Institute of MIT and Harvard University in Cambridge, Mass.; the Wellcome Trust Sanger Institute in England; BGI Shenzhen in China; the Max Planck Institute for Molecular Genetics in Berlin; and Illumina, Inc., in San Diego.
Q & A: The ABCs of DNA
What is DNA?
Deoxyribonucleic acid, or DNA, is the genetic blueprint for life. DNA carries the instructions for an organism—be it a flower, a dog or a person—to develop, survive and reproduce. DNA is passed down from parents to their offspring, and it is what makes each of us unique. Most DNA is located in the nucleus (or brain) of the cell, where it is packed tightly into 23 pairs of chromosomes. If unwound and tied together, the DNA in just one cell would stretch 6 feet.
What’s the difference between a gene and DNA?
Genes are the stretches of DNA that code for proteins, the workhorses of cells. Humans have about 20,000 genes, and together they make up only 1 to 2 percent of a person’s DNA. The rest of the DNA is thought to influence the activity of the genes.
What is a genome, and why is it studied?
A genome is the complete DNA sequence of an organism. In humans, that sequence is made up of 3 billion chemical units represented by the letters A, T, G and C. Spelling out the entire DNA sequence of a person would fill an estimated 200 New York City phone books. At the genetic level, any two people are more than 99 percent alike. By studying the genome, scientists can identify variations in the DNA sequence that may contribute to good health or increase the risk of disease.