If you have CDS, then it is not really transcription evidence; it was derived from protein annotations, and the protein data should be used instead. The alt-EST option is only for datasets where no protein models exist but you do have experimental evidence of transcription (i.e. not transcript predictions but actual mRNA experimentation).
Nucleotide sequence diverges more rapidly than amino acid sequence because of codon redundancy and amino acid similarity. As a result, mRNA-derived sequence can be compared in nucleotide space to the genomic sequence of a related organism only across an evolutionary distance of something like human to gorilla; human to mouse would require amino-acid-level alignment to find the homology. So with alt-EST we use TBLASTX and Exonerate cdna2genome to perform six-frame translations of both the genome and the transcript assembly and align them in amino acid space. This is of course very expensive computationally and more likely to generate spurious alignments. So for CDS sequence, alt-EST would be both redundant with and less accurate than using the protein data directly.
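To make the codon redundancy point concrete, here is a minimal Python sketch. It is only an illustration, not how TBLASTX or Exonerate are implemented; the two toy sequences and the helper names (`translate`, `six_frame`, `identity`) are made up for this example. It translates a sequence in all six frames and shows two coding sequences that differ at many nucleotide positions while encoding the same peptide.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def translate(seq):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame(seq):
    """Translate both strands in all three reading frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def identity(a, b):
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy sequences: same peptide, different synonymous codons.
seq_a = "ATGGCTAGCCTGAAACGT"
seq_b = "ATGGCCTCTTTAAAGAGA"

if __name__ == "__main__":
    print("nucleotide identity:", identity(seq_a, seq_b))
    print("protein identity:   ", identity(translate(seq_a), translate(seq_b)))
    for frame, peptide in six_frame(seq_a).items():
        print(frame, peptide)
```

The two sequences look quite different at the nucleotide level but are identical once translated, which is why distant homology is searched for in amino acid space. The six frames also show where the cost comes from: each sequence on each side of the comparison yields six candidate translations, and most of them are not real coding frames.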
> On Apr 25, 2017, at 5:35 AM, Miche Zacharias <[hidden email]> wrote:
> Would there be any benefit to passing CDS instead of cDNA sequences as alternative EST evidence, if both are available? Or when would you use one or the other?
> I've read in previous discussions how it's expected that the organism, from which we use sequences as alternative EST and protein evidence, is not closely related to the genome we are annotating.
> In what terms is "closely related" defined? It's now unclear to me when you would supply EST/protein evidence as alternative evidence and when you would supply it as direct EST/protein evidence.
> Would a species from the same family classify as alternative or direct evidence? Is ~30mya divergence too much?
> What about species of the same genus?
> Many thanks!