Finding What’s Important in Our Genome: New Integrated Tool to Predict the Function of Non-Coding Variants
Hinxton, UK – Researchers at the Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute have developed software that predicts how likely variants in non-coding regions (the poorly understood stretches of DNA that make up 98 per cent of the genome) are to have a functional role.
The software, called GWAVA, integrates an enormous amount of information about the way genes are regulated and uses it to prioritise non-coding variants in the human genome. This helps researchers focus on the most promising candidates, potentially saving considerable time and resources.
In recent years scientists have found many links between our genes and susceptibility to disease, but there is still a long way to go before we fully understand how DNA variation underlies disease. While much is known about the way protein-coding genes work, our three-billion-base-pair-long genome is bursting with other types of information. One of the big challenges in genomics is figuring out how non-coding regions of the human genome are involved in disease.
“The information provided by the ENCODE consortium, the 1000 Genomes Project and the NIH’s Roadmap Epigenomics project are extremely useful resources for understanding non-coding variants,” said Paul Flicek, co-lead author from EMBL-EBI. “But ranking that information is no small task. There is a lot of benign variation in our genome, so we needed a way to narrow down which regions play a role in disease.”
The team investigated whether combining information about genes, genetic regions associated with regulation and genome-wide properties could identify the non-coding variants most likely to contribute to disease.
“GWAVA uses a classifier to discriminate apparently harmless non-coding variants from those that are likely to be involved in disease,” said Graham Ritchie, first author from EMBL-EBI and the Sanger Institute. “We tested it out using several scenarios and found that it consistently prioritises the regions known to be associated with disease. This could be really useful for people who need to decide which mutations to look at as cancer drivers, for example.”
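To give a rough feel for what "using a classifier to prioritise variants" means, here is a minimal sketch. It is not GWAVA's method: it swaps in a deliberately simple nearest-centroid classifier, and the three features per variant (overlap with a regulatory region, a conservation score, distance to the nearest gene start, all pre-scaled to the range 0 to 1), the training examples and the variant names are hypothetical. The idea it illustrates is the one Ritchie describes: learn what known benign and known disease-associated variants look like, score new variants by which group they resemble, then rank them.

```python
from dataclasses import dataclass
from math import dist

# Hypothetical per-variant features, pre-scaled to [0, 1]:
# (overlaps regulatory region, conservation score, distance to nearest gene start).
# These are illustrative only, not GWAVA's actual annotations.
Features = tuple[float, ...]


@dataclass
class Variant:
    name: str
    features: Features


def centroid(rows: list[Features]) -> Features:
    """Mean value of each feature across the training examples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def train(benign: list[Features], disease: list[Features]) -> tuple[Features, Features]:
    """Nearest-centroid 'training': summarise each class by its mean feature vector."""
    return centroid(benign), centroid(disease)


def score(model: tuple[Features, Features], x: Features) -> float:
    """Positive when x is closer to the disease-associated centroid than to the benign one."""
    benign_c, disease_c = model
    return dist(x, benign_c) - dist(x, disease_c)


def prioritise(model: tuple[Features, Features], variants: list[Variant]) -> list[Variant]:
    """Rank candidate variants from most to least disease-like."""
    return sorted(variants, key=lambda v: score(model, v.features), reverse=True)


# Toy training data (made up for illustration).
benign = [(0.0, 0.1, 0.9), (0.1, 0.2, 0.8), (0.0, 0.3, 0.7)]
disease = [(1.0, 0.9, 0.1), (0.9, 0.8, 0.2), (1.0, 0.7, 0.1)]
model = train(benign, disease)

candidates = [
    Variant("rsA", (0.1, 0.2, 0.9)),
    Variant("rsB", (1.0, 0.8, 0.1)),
    Variant("rsC", (0.5, 0.5, 0.5)),
]
print([v.name for v in prioritise(model, candidates)])  # → ['rsB', 'rsC', 'rsA']
```

A real tool would use many more annotations and a more capable learner, but the output has the same shape: an ordered shortlist that tells researchers which variants to examine first.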
The authors hope that using GWAVA predictions for non-coding variants in disease association studies will substantially improve the chances of finding genetic variants that are involved in human disease.
“We’ve combined freely available data to predict the impact of these variants in the non-coding region of the genome,” said Professor Eleftheria Zeggini, co-lead author from the Sanger Institute. “Most disease-associated variants discovered to date fall outside genes. This tool can help us start to understand how they work.”
Publication: Functional annotation of noncoding sequence variants. Graham R S Ritchie, Ian Dunham, Eleftheria Zeggini & Paul Flicek. Nature Methods (2014): http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2832.html