Export GFF3 file with coordinates based on the edited reference?

Celine Prakash
I have exported our annotations in GFF3 format, as well as the genomic FASTA sequence of one of our chromosomes, from Web Apollo. I then used the tool 'gffread' to extract the genes' peptide sequences from the exported GFF3 file and the new genomic FASTA. Looking through its output, I noticed that every gene model downstream of the first (i.e. leftmost) position where we made a deletion in the genome sequence has multiple stop codons in its extracted peptide and does not start with methionine. From this I assume that the coordinates of the annotations in the GFF3 file are based on the original reference sequence. However, if I use the original genomic FASTA instead, the gene models where we added insertions or deletions to the reference do not incorporate those genomic changes, or the resulting frameshifts, into the extracted peptide sequences, so these sequences also have multiple stop codons. I was wondering: is it possible to export a GFF3 file with coordinates based on the edited reference sequence?
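
A minimal Python sketch of the coordinate lift-over being asked about, assuming the sequence alterations are available as a list of insertions and deletions on the original 1-based coordinates. The Edit class and its fields are hypothetical (not an Apollo API or export format). Each position is shifted by the net length of the indels to its left, and a GFF3 start or end that falls inside a deleted region is flagged instead of guessed. The phase column (column 8) is not recomputed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Edit:
    """Hypothetical sequence alteration on ORIGINAL 1-based coordinates.

    kind == "deletion":  `length` bases removed starting at `position`.
    kind == "insertion": `length` bases added immediately after `position`.
    """
    kind: str
    position: int
    length: int


def lift_position(pos: int, edits: List[Edit]) -> Optional[int]:
    """Map an original 1-based position onto the edited sequence.

    Returns None when the base at `pos` was itself deleted.
    """
    shift = 0
    for e in edits:
        if e.kind == "insertion":
            if e.position < pos:  # inserted bases lie to the left of pos
                shift += e.length
        elif e.kind == "deletion":
            del_end = e.position + e.length - 1
            if e.position <= pos <= del_end:
                return None  # this base no longer exists
            if del_end < pos:  # whole deletion lies to the left of pos
                shift -= e.length
        else:
            raise ValueError(f"unknown edit kind: {e.kind}")
    return pos + shift


def lift_gff3_line(line: str, edits: List[Edit]) -> Optional[str]:
    """Rewrite the start/end columns (4 and 5) of one GFF3 line.

    `edits` must all belong to this line's seqid (column 1 is not checked).
    Comment, directive and blank lines are returned unchanged. Returns None
    when a feature boundary falls inside a deletion, so it can be reviewed.
    """
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.split("\t")
    start = lift_position(int(cols[3]), edits)
    end = lift_position(int(cols[4]), edits)
    if start is None or end is None:
        return None
    cols[3], cols[4] = str(start), str(end)
    return "\t".join(cols)


if __name__ == "__main__":
    edits = [Edit("deletion", 10, 3), Edit("insertion", 50, 2)]
    gene = "chr1\tapollo\tgene\t20\t60\t.\t+\t.\tID=g1"
    print(lift_gff3_line(gene, edits))
```

For example, with a 3 bp deletion at position 10 and a 2 bp insertion after position 50, a gene spanning 20..60 on the original reference maps to 17..59 on the edited one.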

--
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].

Re: Export GFF3 file with coordinates based on the edited reference?

nathandunn

Currently this does not happen. It would be up to the user to re-assemble the FASTA file from the genome assembly correction annotations included in the export.
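
The re-assembly step described above could look roughly like this sketch. It assumes the correction annotations have already been parsed into non-overlapping deletions, insertions and substitutions on original 1-based coordinates; the Alteration class is hypothetical and does not reflect Apollo's actual export format. Applying the edits from right to left keeps every not-yet-applied edit's original coordinates valid, so no offset bookkeeping is needed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alteration:
    """Hypothetical correction on ORIGINAL 1-based coordinates.

    kind == "deletion":     remove `length` bases starting at `position`.
    kind == "insertion":    insert `bases` immediately after `position`.
    kind == "substitution": overwrite bases starting at `position` with `bases`.
    """
    kind: str
    position: int
    length: int = 0
    bases: str = ""


def apply_alterations(seq: str, alterations: List[Alteration]) -> str:
    """Return the edited sequence. Assumes the alterations do not overlap."""
    chars = list(seq)
    # Right-to-left: edits further left are unaffected by earlier steps.
    for alt in sorted(alterations, key=lambda a: a.position, reverse=True):
        i = alt.position - 1  # convert to a 0-based index
        if alt.kind == "deletion":
            del chars[i:i + alt.length]
        elif alt.kind == "insertion":
            chars[i + 1:i + 1] = list(alt.bases)
        elif alt.kind == "substitution":
            chars[i:i + len(alt.bases)] = list(alt.bases)
        else:
            raise ValueError(f"unknown alteration kind: {alt.kind}")
    return "".join(chars)


if __name__ == "__main__":
    alts = [
        Alteration("deletion", 3, length=2),
        Alteration("insertion", 8, bases="GG"),
        Alteration("substitution", 1, bases="T"),
    ]
    print(apply_alterations("ACGTACGTAC", alts))  # TCACGTGGAC
```

Writing the result back out as a FASTA record, and lifting the GFF3 coordinates to match, are separate steps not shown here.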

That said, is there any reason not to offer automated re-assembly as an option (other than the time it would take to implement)?

Please continue the discussion in the ticket below or reply here.


Thanks,

Nathan


In reply to Celine Prakash's message of Jan 16, 2020, at 12:37 PM.