In principle, locating genes should be easy. DNA sequences that code for proteins begin with the three bases ATG, which code for the amino acid methionine, and they end with one or more stop codons ...
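To make the "in principle" rule concrete, here is a minimal Python sketch of the naive approach: scan each of the three forward reading frames for ATG, then read three bases at a time until a stop codon appears. The function name `find_orfs` and the `min_codons` parameter are illustrative choices, not anything from the model described here, and the stop codons assumed are those of the standard genetic code (TAA, TAG, TGA). The sketch ignores the reverse strand and any interruptions inside a gene, so it shows only the simple rule, not a working gene finder.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end) index pairs of naive open reading frames.

    Only the forward strand is scanned. `end` is exclusive and includes
    the stop codon. `min_codons` counts codons before the stop, ATG included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # three possible reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate region
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


example = "CCATGAAATAGGG"
print(find_orfs(example))  # [(2, 11)], i.e. the stretch ATGAAATAG
```

A candidate that starts with ATG but never reaches a stop codon in its frame is discarded, which is one reason even this simple rule misses or mislabels regions in real sequences.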
The latest model is based on 128,000 genomes, including those of humans and other animals, plants and other eukaryotic organisms. These genomes encompass a total of 9.3 trillion DNA letters.