New Study Makes Whole-Genome Association Studies
Possible
San Diego and Berkeley, CA, February 17, 2005 - Computer
scientists at ICSI and Calit2, research centers affiliated with the
University of California, have teamed with biologists from Perlegen
Sciences, Inc., to map key genetic signposts across three human
populations. Their study - published in the Feb. 18 issue of
Science - could make whole-genome analysis of human genetic variation
widely accessible and speed efforts to pinpoint DNA variations that are
associated with disease or with differences in how patients respond to
drugs.
"This project sets a new milestone in the search for genetic
elements linked to complex genetic diseases such as Alzheimer's, cancer
and multiple sclerosis," said co-author David R. Cox, Chief Scientific
Officer at Mountain View, CA-based Perlegen. "Genome-wide analysis may
soon become a standard methodology in the search for more effective,
individualized treatments."
Researchers at Perlegen genotyped single-letter variations
(called single-nucleotide polymorphisms, or SNPs) in the DNA of 71
individuals of European American, African American, and Han Chinese
American ancestry. Scientists at the California Institute
for Telecommunications and Information Technology (Calit2) at the
University of California, San Diego, and the UC Berkeley-affiliated
International Computer Science Institute (ICSI) then helped analyze the
resulting set of more than 100 million genotypes, covering more than
1.5 million SNPs typed by Perlegen in each sample.
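For scale, here is a back-of-the-envelope check. It assumes that all 1.58 million high-quality SNPs reported in the paper were called in all 71 individuals. That is a simplifying assumption, because some calls are typically missing, so the result is an upper bound:

```latex
71 \times \left(1.58 \times 10^{6}\right) \approx 1.12 \times 10^{8}\ \text{genotypes}
```

This is consistent with the figure of more than 100 million genotypes.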
"This is the first time that a SNP data set of that scale is being
sequenced," said Eran Halperin, a research scientist at Berkeley-based
ICSI. "For each of the 23 pairs of chromosomes in human DNA, the
resulting data set consisted of 71 genotypes, which mix together the
information from both copies of the chromosome. To see a clearer picture
of a variation, we really want to know the variation on each chromosome,
and we can do that by inferring haplotypes - the sequences of nucleotide
bases in each copy of the chromosome."
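Why haplotypes must be inferred rather than read off directly can be illustrated with a small sketch. The release does not describe HAP's algorithm, and this brute-force enumeration is not it. The sketch only shows the ambiguity HAP has to resolve: a genotype with k heterozygous SNPs is consistent with 2^(k-1) different haplotype pairs. The 0/1/2 genotype encoding and the function name `compatible_haplotype_pairs` are assumptions made for illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List every unordered pair of haplotypes consistent with a genotype.

    genotype: sequence of 0, 1 or 2 giving, at each SNP, how many of the two
    chromosome copies carry the alternate allele (an assumed encoding).
    Each haplotype is returned as a tuple of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are identical on both copies: 0 -> 0, 2 -> 1.
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        # At a heterozygous site one copy carries 0 and the other carries 1;
        # which copy gets which allele is the ambiguity phasing must resolve.
        for site, allele in zip(het_sites, choice):
            h1[site] = allele
            h2[site] = 1 - allele
        # Sort the pair so (h1, h2) and (h2, h1) are counted only once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


# A genotype with two heterozygous sites has two possible phasings.
for hap_a, hap_b in compatible_haplotype_pairs([0, 1, 2, 1]):
    print(hap_a, hap_b)
# prints:
# (0, 0, 1, 0) (0, 1, 1, 1)
# (0, 0, 1, 1) (0, 1, 1, 0)
```

The number of candidate pairs doubles with every additional heterozygous site. This is why exhaustive enumeration is hopeless at genome scale, and why tools such as HAP rely on population-level models instead.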
Halperin and Calit2 researcher Eleazar Eskin, who co-authored the
study with Perlegen scientists, have pioneered a method for translating
genotypes into haplotypes, using the HAP software tool they
co-developed. For this study, the bioinformatics researchers had to
process more than 190 million data points. "Using other programs,
haplotyping would require at least a few months of CPU time," said
Eskin, an assistant professor in Computer Science and Engineering at UC
San Diego's Jacobs School of Engineering. "Using HAP on a regular
laptop, this work would take only 200 CPU hours. But we were able to use
a cluster of computers from Calit2's OptIPuter project, and that allowed
us to perform our final entire analysis in less than 12 hours."
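Eskin's figures also imply a rough lower bound on how many processors the cluster run used. This estimate makes two simplifying assumptions: that the 200 CPU hours quoted for a laptop carry over unchanged to the cluster nodes, and that the work splits perfectly across them. Under those assumptions, finishing in under 12 hours requires

```latex
N_{\text{CPU}} > \frac{200\ \text{CPU hours}}{12\ \text{h}} \approx 16.7
\quad\Longrightarrow\quad N_{\text{CPU}} \ge 17
```

The actual number of OptIPuter nodes used is not stated in this release.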
Until now, because of the high cost of sequencing technology, disease
association studies have typically been confined to short genomic
regions. The Science study indicates that genome-wide association
studies will now be possible on a considerably smaller budget, as
scientists build on the publicly available data and tools released
by Perlegen, ICSI and Calit2.
The researchers in San Diego and Berkeley also used the HAP tool
to partition the human genome into 'blocks', or regions of limited
diversity. These are regions where only a few common patterns account
for the majority of the variation in the population. The resulting
haplotype 'maps' across the three populations appeared qualitatively
similar to the maps compiled by Perlegen using a different technique
called 'linkage disequilibrium' (LD). LD is the correlation between DNA
variants that lie close together on a chromosome, and it results from a
combination of processes including mutation, natural selection, and
genetic drift. Linkage disequilibrium is complex and varies from one
region of the genome to another, as well as between different
populations. According to the study, "LD maps and haplotype maps
represent somewhat different aspects of the local structure of genetic
variation."
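The release does not spell out HAP's exact block criterion. The sketch below is a hypothetical greedy rule that follows the stated idea: a block is a region where only a few common patterns account for most of the variation. It extends a block along the chromosome for as long as its `max_patterns` most common haplotype patterns still cover at least a `coverage` fraction of the haplotypes. Both thresholds, the function name, and the string-of-0/1 input format are illustrative assumptions.

```python
from collections import Counter


def partition_into_blocks(haplotypes, coverage=0.8, max_patterns=4):
    """Greedily split SNP positions into contiguous blocks of limited diversity.

    haplotypes: non-empty list of equal-length strings of '0'/'1', one per
    chromosome copy (i.e. already phased). A block keeps growing while its
    `max_patterns` most common patterns cover >= `coverage` of the haplotypes.
    Returns a list of (start, end) half-open SNP index ranges.
    """
    n_haps = len(haplotypes)
    n_snps = len(haplotypes[0])
    blocks = []
    start = 0
    while start < n_snps:
        end = start + 1  # a single SNP always forms a valid block
        while end < n_snps:
            # Count the distinct patterns if SNP `end` were added to the block.
            patterns = Counter(h[start:end + 1] for h in haplotypes)
            covered = sum(count for _, count in patterns.most_common(max_patterns))
            if covered / n_haps < coverage:
                break  # too diverse: close the block before SNP `end`
            end += 1
        blocks.append((start, end))
        start = end
    return blocks


# With strict thresholds, the first two SNPs form one block of two patterns.
print(partition_into_blocks(["0000", "1100", "0001", "1110"], coverage=1.0, max_patterns=2))
# prints: [(0, 2), (2, 3), (3, 4)]
```

Loosening `coverage` or raising `max_patterns` yields fewer, longer blocks. A more diverse sample needs more blocks under the same thresholds, which is the pattern the study reports for the African American map.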
"The partitioning of genomes into highly correlated regions may be
extremely useful for geneticists worldwide," added ICSI's Halperin.
"They could choose to sequence a small subset of SNPs in each region,
and use the high correlations between the different SNPs in order to
predict the SNPs that were not sequenced."
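Halperin does not specify how the prediction would be made. One simple approach is sketched below under our own assumptions. It requires a reference panel of phased haplotypes in which every SNP is known. For each untyped SNP, it picks the typed SNP with the highest LD r^2, using the standard definition r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)) with D = p_AB - p_A p_B. It then predicts the allele most often seen alongside the observed proxy allele. All function names are hypothetical.

```python
def r_squared(x, y):
    """LD r^2 between two biallelic SNPs, each a list of 0/1 alleles
    observed on the same set of haplotypes."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(a * b for a, b in zip(x, y)) / n  # frequency of the 1-1 haplotype
    d = p_xy - p_x * p_y  # the LD coefficient D
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    return d * d / denom if denom > 0 else 0.0  # monomorphic SNPs carry no LD


def best_proxy(reference, target, typed):
    """Among the typed SNP indices, return the one with the highest r^2 to
    SNP `target` in a reference panel of fully known haplotypes."""
    def column(j):
        return [h[j] for h in reference]
    return max(typed, key=lambda j: r_squared(column(j), column(target)))


def predict_allele(reference, target, proxy, proxy_allele):
    """Predict the allele at `target` as the one most often found alongside
    `proxy_allele` at SNP `proxy` in the reference panel (ties -> 0)."""
    matches = [h[target] for h in reference if h[proxy] == proxy_allele]
    return 1 if 2 * sum(matches) > len(matches) else 0


# Reference haplotypes over three SNPs; SNP 1 is perfectly correlated with SNP 0.
reference = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1]]
proxy = best_proxy(reference, target=1, typed=[0, 2])
print(proxy, predict_allele(reference, target=1, proxy=proxy, proxy_allele=1))
# prints: 0 1
```

Real imputation methods combine several neighboring SNPs and model uncertainty. The single-proxy rule is kept here only because it shows most directly why high correlation makes unsequenced SNPs predictable.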
The HAP study found substantially more blocks in the African
American map than in the European American and Han Chinese maps,
indicating that genetic diversity was greatest in the African American
samples (a finding consistent with previous studies).
Other findings in the Science paper, titled "Whole Genome
Patterns of Common DNA Variation in Three Diverse Human Populations,"
include:
- Most functional human genetic variation is not
population-specific;
- The majority of the 1.58 million SNPs with high-quality genotypes
were common in all three populations; and
- "Private SNPs" - those SNPs segregating in only one population
sample - accounted for only 18% of the total.
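The "private SNP" figure rests on a simple classification, sketched below. Some details are our assumptions rather than the paper's stated procedure. A SNP is treated as segregating in a population sample when both alleles are observed there. The 18%-style fraction is taken over SNPs that segregate in at least one sample. The dictionary-of-haplotypes input format and the function names are also illustrative.

```python
def segregates(alleles):
    """True if both alleles (0 and 1) are observed in this sample."""
    return 0 < sum(alleles) < len(alleles)


def private_fraction(samples):
    """samples: dict mapping population name -> list of haplotypes
    (lists of 0/1 alleles) covering the same SNPs.

    Among SNPs that segregate in at least one population, return the
    fraction that segregate in exactly one (the 'private' SNPs).
    """
    n_snps = len(next(iter(samples.values()))[0])
    polymorphic = private = 0
    for j in range(n_snps):
        # Number of population samples in which SNP j segregates.
        count = sum(segregates([h[j] for h in haps]) for haps in samples.values())
        if count:
            polymorphic += 1
            if count == 1:
                private += 1
    return private / polymorphic if polymorphic else 0.0


samples = {
    "A": [[0, 1, 0], [1, 1, 0]],
    "B": [[0, 0, 1], [1, 0, 0]],
    "C": [[0, 1, 0], [1, 0, 0]],
}
# SNP 0 segregates everywhere; SNPs 1 and 2 each segregate in one sample only.
print(private_fraction(samples))
```

With small samples, a SNP can look private simply because its rarer allele was not drawn elsewhere. This is one reason such fractions depend on sample size.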
Maps of the haplotype structure and the variants that are common
in each region can be downloaded from the Calit2 HAP site, which is
hosted by the National Biomedical Computation Resource at UCSD (see
Related Links below). "We hope that researchers interested in specific
regions of the genome will use this site to obtain information on the
human variation in those regions," said Calit2 director Larry Smarr.
"This is a great example of the revolution in computational biology and
its potential benefits to society in the study of cardiovascular
disease, mental illness and other conditions thought to result from a
complex interplay of multiple genetic and environmental factors."
The SNPs analyzed in the Science study represent only a
fraction of the more than 10 million common SNPs expected to exist in
the human genome. But researchers at Perlegen developed a mathematical
algorithm to identify so-called 'tag SNPs', a reduced set of SNPs whose
genotypes act as proxies for the common variants around them. "This study and software
tools mean that you no longer have to wait to do whole-genome
association studies," said Perlegen scientist David A. Hinds, lead
author on the study. "We've effectively figured out how to reduce the
genotyping burden by identifying a reduced set of tag SNPs, thus
decreasing the difficulty and cost of association studies. That said,
even when reducing to tag SNPs, we still need to be able to genotype at
least several hundred thousand SNPs to have a comprehensive whole-genome
association study."
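The release does not describe Perlegen's tag-selection algorithm, so the sketch below should not be read as that method. It shows one widely used idea instead: greedy selection on pairwise LD r^2. A SNP counts as captured when it is a chosen tag or has r^2 at or above a threshold with one, and each round adds the SNP that captures the most SNPs not yet captured. The 0.8 threshold, the matrix input, and the function name are assumptions.

```python
def select_tag_snps(r2, threshold=0.8):
    """Greedily choose tag SNPs from a square matrix of pairwise LD r^2 values.

    r2[i][j] is the r^2 between SNPs i and j. A SNP is 'captured' by a tag
    when it is the tag itself or has r^2 >= threshold with it. Each round
    picks the SNP that captures the most SNPs not yet captured, so every
    round captures at least one new SNP and the loop terminates.
    """
    n = len(r2)

    def captures(i, j):
        return i == j or r2[i][j] >= threshold

    uncovered = set(range(n))
    tags = []
    while uncovered:
        best = max(range(n), key=lambda i: sum(captures(i, j) for j in uncovered))
        tags.append(best)
        uncovered = {j for j in uncovered if not captures(best, j)}
    return tags


# Four SNPs: 0, 1 and 2 are in strong LD through SNP 1; SNP 3 is independent.
r2 = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.85, 0.0],
    [0.1, 0.85, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(select_tag_snps(r2))  # prints: [1, 3]
```

Two tags stand in for four SNPs in this toy case. Hinds' caveat still applies at genome scale: even after this kind of reduction, several hundred thousand SNPs remain to be genotyped.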
"This research provides a tool for exploring many questions
remaining regarding the causal role of common human DNA variation in
complex human traits and for investigating the nature of genetic
variation within and between human populations," the Science
paper concludes.
Perlegen is also cooperating with the public-sector International
HapMap Project, which is expected to release more detailed descriptions
of genetic variations later this year. "We see these two efforts as
complementary," said Perlegen's Hinds. "The HapMap project will yield a
denser map, with more SNPs across a deeper set of individuals." HapMap
will describe variation across individuals of Japanese, Chinese,
Nigerian and European ancestry.
About ICSI
The International Computer Science Institute (ICSI) is an
independent, nonprofit research center affiliated with the University of
California campus in Berkeley, California. Founded in 1986, ICSI
provides a vibrant, international environment for approximately eighty
scientists pursuing leading-edge research in networking, algorithms,
bioinformatics, artificial intelligence, computational linguistics and
spoken language processing. ICSI research is sponsored by a mix of
government contracts, commercial partnerships and international visitor
programs. www.icsi.berkeley.edu
About Calit2
The California Institute for Telecommunications and Information
Technology (Calit2) is one of four California Institutes for Science and
Innovation created in late 2000 by the State of California to ensure that
the state maintains its leadership in cutting-edge technologies and industries. Its
mission: to extend the reach of the Internet throughout the physical
world - enabling anywhere/anytime access to the Web. More than 200
faculty members from UC San Diego and UC Irvine are collaborating on
interdisciplinary projects, with funding and other support from more
than 50 industry partners. www.calit2.net
Related Links
Perlegen Sciences www.perlegen.com
American Association for the Advancement of Science www.aaas.org
Science Magazine www.sciencemag.org
HAP Website http://research.calit2.net/hap/
HAP Webserver http://research.calit2.net/hap/WebServer.htm
International HapMap Project www.hapmap.org
National Biomedical Computation Resource http://nbcr.sdsc.edu/