Palindromes in the Human Genome
A palindrome or inverted repeat in a genome sequence is a pair of complementary sequences that appear in reverse order, so that one half is the reverse complement of the other. If the two complementary halves are directly adjacent, the sequence is referred to as a palindrome; if the two halves are separated by intervening base pairs, it is referred to as an inverted repeat. The location and length of palindromic sequences in the genome may alter DNA replication and gene expression, which may in turn lead to genomic instability. Palindromes are linked with diseases including several cancers, mental retardation, X-linked recessive diseases and many physical abnormalities. While evidence supports the role of palindromes in human disease, the prevalence and function of palindromes in the human genome and their variability across individuals are not fully understood. We developed an extension to our suite of tools for genome sequence analysis, the Biological Language Modeling Toolkit, to identify all palindromic sequences in the human genome, and applied it to study palindromes in whole genomes. We will present our findings on the distribution of palindromes in the human reference genome, as well as some preliminary results from our ongoing analysis of personal genomes.
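
To make the definition above concrete, the following is a minimal Python sketch of a brute-force scan for palindromes and inverted repeats. It is an illustration only and is not the method used in the Biological Language Modeling Toolkit; the function names `reverse_complement` and `find_inverted_repeats` and the parameters `arm_len` and `max_gap` are hypothetical, chosen for this example. It assumes an exact match between the two halves and a sequence over the alphabet A, C, G, T.

```python
# Illustrative sketch only: names and parameters are assumptions,
# not part of any existing toolkit.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_len, max_gap=0):
    """Return (start, arm_len, gap) for every exact inverted repeat.

    A hit with gap == 0 is a palindrome in the sense used above;
    a hit with gap > 0 is an inverted repeat with a spacer.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n - 2 * arm_len + 1):
        left = seq[start:start + arm_len]
        # Skip arms containing ambiguous bases such as N.
        if set(left) - set("ACGT"):
            continue
        target = reverse_complement(left)
        for gap in range(max_gap + 1):
            right_start = start + arm_len + gap
            if right_start + arm_len > n:
                break
            if seq[right_start:right_start + arm_len] == target:
                hits.append((start, arm_len, gap))
    return hits


palindrome_hits = find_inverted_repeats("GAATTC", arm_len=3)
spaced_hits = find_inverted_repeats("GAACCTTC", arm_len=3, max_gap=2)
```

Here `GAATTC` is reported as a palindrome (gap 0) because `TTC` is the reverse complement of `GAA`, while `GAACCTTC` is reported as an inverted repeat whose halves are separated by two bases. A scan of this kind runs in time proportional to the sequence length times `arm_len` times `max_gap`, so a whole-genome analysis would likely need a more efficient strategy.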