Names of genes/transcripts changed after bulk upload


Robin A. Ohm
Hello,

I'm trying to bulk upload genes from a GFF3 file into the user annotation
track using the add_transcripts_from_gff3_to_annotations.pl script.
I notice that the names of the genes and transcripts are changed after
uploading. For example, after I upload the GFF3 file below, the
resulting gene name is "SA|3829a" and the transcript name is
"SA|3829a-00001". I would prefer to keep the original names from the GFF3
file. Is that possible?

Thanks, best regards, Robin

##gff-version 3
##sequence-region scaffold1 1 232163
scaffold1    FGDB    gene    752    2086    .    -    .    ID=SA|3829|gene
scaffold1    FGDB    mRNA    752    2086    .    -    .    ID=SA|3829;Parent=SA|3829|gene;proteinId=SA|3829;Name=SA|3829
scaffold1    FGDB    exon    752    2086    .    -    .    ID=SA|3829|exon1;Parent=SA|3829
scaffold1    FGDB    CDS    752    2086    .    -    0    ID=SA|3829|CDS;Parent=SA|3829
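
To check which names a GFF3 file actually carries before uploading, something like the following Python sketch may help. It assumes tab-separated columns with `key=value` pairs separated by `;` in column 9. The function names `parse_attributes` and `feature_names` are made up for illustration and are not part of Apollo or its scripts. When a feature has no Name attribute (as on the gene line in this file), the sketch falls back to the ID.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def feature_names(gff3_text, feature_types=("gene", "mRNA")):
    """Return (type, name) tuples, using Name if present, otherwise ID."""
    names = []
    for line in gff3_text.splitlines():
        # Skip blank lines, directives (##...) and comments.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9 or columns[2] not in feature_types:
            continue
        attributes = parse_attributes(columns[8])
        names.append((columns[2], attributes.get("Name", attributes.get("ID"))))
    return names


# Sample data copied from the post, with real tab separators (assumed).
SAMPLE = "\n".join([
    "##gff-version 3",
    "scaffold1\tFGDB\tgene\t752\t2086\t.\t-\t.\tID=SA|3829|gene",
    "scaffold1\tFGDB\tmRNA\t752\t2086\t.\t-\t.\t"
    "ID=SA|3829;Parent=SA|3829|gene;proteinId=SA|3829;Name=SA|3829",
    "scaffold1\tFGDB\texon\t752\t2086\t.\t-\t.\tID=SA|3829|exon1;Parent=SA|3829",
])

for feature_type, name in feature_names(SAMPLE):
    print(feature_type, name)
```

Running the same function on a GFF3 export taken after the upload would give a list to compare against.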

--
Robin A. Ohm, PhD | Assistant Professor | Microbiology | Utrecht University
Kruyt Building | Room W402 | Padualaan 8 | 3584 CH | Utrecht | The Netherlands | +31 (0) 30 2533016





This list is for the Apollo Annotation Editing Tool. Info at http://genomearchitect.org/
If you wish to unsubscribe from the Apollo List: 1. From the address with which you subscribed to the list, send a message to [hidden email] | 2. In the subject line of your email type: unsubscribe apollo | 3. Leave the message body blank.


Re: Names of genes/transcripts changed after bulk upload

Monica Munoz-Torres
Hi Robin, 

The short answer is that this is configurable. 

The reason for this behavior is that Apollo assumes each genomic element in the User-created Annotations area (the 'annotation track') is a transcript of a given gene model (in the case of coding genes). Starting in v2.0.x, Apollo automatically assigns a [configurable] number to the first transcript; additional transcripts of the same gene keep the root name and get an incremented number as you create more.

In this case, the gene model in the evidence track would be "SA|3829a" and the first transcript in the User-created Annotations area is "SA|3829a-00001". If there is more than one splice form for this gene SA|3829a, the next one will be labeled SA|3829a-00002, etc. You can customize it to be -RA, -RB (per FlyBase naming conventions, followed also by NCBI).
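
To make the naming scheme concrete, here is a small Python sketch of the idea. It is not Apollo's actual code: the function names are hypothetical, the five-digit width is inferred only from the "-00001" example above, and the letter variant is limited to 26 isoforms for simplicity.

```python
import string


def numeric_suffix(index, width=5):
    """Zero-padded counter: 1 -> '00001' (width assumed from the example)."""
    return str(index).zfill(width)


def flybase_suffix(index):
    """FlyBase-style letter suffix: 1 -> 'RA', 2 -> 'RB' (up to 'RZ' here)."""
    if not 1 <= index <= 26:
        raise ValueError("this sketch only covers 26 isoforms")
    return "R" + string.ascii_uppercase[index - 1]


def isoform_names(gene_name, count, suffix=numeric_suffix):
    """Name `count` transcripts of one gene: root name plus an incrementing suffix."""
    return [f"{gene_name}-{suffix(i)}" for i in range(1, count + 1)]


print(isoform_names("SA|3829a", 2))
print(isoform_names("SA|3829a", 2, suffix=flybase_suffix))
```

The suffix is passed in as a function so that the counter style can be swapped without touching the root name, which mirrors the "configurable" part described above.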

The goal of this feature is to help curators name and keep track of isoforms appropriately. Otherwise they could end up, for example, with three identically labeled isoforms even though these represent different transcripts of the same gene (in the evidence track), as was the case in v1.0.x:
 
Inline image 2

In the case you describe, if you are sure there will be no conflicts because your users will never encounter more than one splice form of their gene of interest, I am quite sure you can customize your configuration to omit this count.

I'll let Nathan share with you where this lives in the code. 

I would also like to learn more about your use case. How are you loading those gene models into your User-created Annotations area directly from the GFF3 file? Are you doing this for all scaffolds, or only for a few gene models at a time?

cheers, 
~moni.


On Fri, Feb 12, 2016 at 8:36 AM, Robin A. Ohm <[hidden email]> wrote:

--
Mentorship Matters!
--
Monica Munoz-Torres, PhD.
Berkeley Bioinformatics Open-source Projects (BBOP)
Environmental Genomics and Systems Biology Division
Lawrence Berkeley National Laboratory

Mailing Address:
Lawrence Berkeley National Laboratory
1 Cyclotron Road Mailstop 977
Berkeley, CA 94720






Re: Names of genes/transcripts changed after bulk upload

nathandunn
In reply to this post by Robin A. Ohm

Robin,

The short answer is that it **should** work by default the way you want (keeping the original names), but I am getting the same results you are.

I think I know what’s wrong and I’ll try to get a fix out soon.
 

tools/data/add_transcripts_from_gff3_to_annotations.pl \
        -U localhost:8080/apollo -u "[hidden email]" -p "password" -o "Honeybee" \
        -i Annotations56.gff3 -t mRNA -d CDS -g gene -e exon

This should properly import a GFF3 file of the type Apollo already exports, e.g.:

Group1.10 . gene 1290368 1293149 . + . Name=GB40862-RA;date_creation=2016-02-04;owner=[hidden email];ID=70cfdae4-1950-41f1-b56c-bc2f105624b4;date_last_modified=2016-02-04
Group1.10 . mRNA 1290368 1293149 . + . Name=GB40862-RA-00001;date_creation=2016-02-04;Parent=70cfdae4-1950-41f1-b56c-bc2f105624b4;owner=[hidden email];ID=90b0c2fe-45dd-47c8-b27d-b9929e098895;date_last_modified=2016-02-04
Group1.10 . non_canonical_three_prime_splice_site 1291824 1291824 . + . Name=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_three_prive_splice_site-1291823;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_three_prive_splice_site-1291823
Group1.10 . non_canonical_three_prime_splice_site 1292399 1292399 . + . Name=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_three_prive_splice_site-1292398;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_three_prive_splice_site-1292398
Group1.10 . non_canonical_five_prime_splice_site 1292317 1292317 . + . Name=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_five_prime_splice_site-1292316;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_five_prime_splice_site-1292316
Group1.10 . exon 1291824 1292314 . + . Name=e5ce94a7-36a3-4cfa-bc2e-53652cbf1953-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=e5ce94a7-36a3-4cfa-bc2e-53652cbf1953
Group1.10 . exon 1290368 1290636 . + . Name=256d4f9a-31dc-45ab-970a-596777c90f17-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=256d4f9a-31dc-45ab-970a-596777c90f17
Group1.10 . exon 1290765 1290929 . + . Name=b4fb60f7-e382-458b-a990-ae749640b1f3-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=b4fb60f7-e382-458b-a990-ae749640b1f3
Group1.10 . exon 1293140 1293149 . + . Name=abd2c3a9-5f65-4fb1-8bd6-14adca5c57f9-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=abd2c3a9-5f65-4fb1-8bd6-14adca5c57f9
Group1.10 . exon 1292399 1292764 . + . Name=6100efd1-ca74-438f-92a0-88fb2b6bfdf8-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=6100efd1-ca74-438f-92a0-88fb2b6bfdf8
Group1.10 . CDS 1290577 1290636 . + 0 Name=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9-CDS;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9
Group1.10 . CDS 1290765 1290929 . + 0 Name=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9-CDS;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9
Group1.10 . CDS 1291824 1291841 . + 0 Name=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9-CDS;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9
###
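
As an aside, when comparing an export like the one above with the source file, one option is to strip the trailing counter from each transcript name first. The Python sketch below assumes the counter is always a hyphen followed by exactly five digits, which is inferred only from the two examples in this thread ("SA|3829a-00001" and "GB40862-RA-00001"); `root_name` is a hypothetical helper, not an Apollo function.

```python
import re

# Assumed pattern: a hyphen plus exactly five digits at the end of the name.
_COUNTER = re.compile(r"-\d{5}$")


def root_name(transcript_name):
    """Drop a trailing '-NNNNN' counter; leave other names unchanged."""
    return _COUNTER.sub("", transcript_name)


for name in ("GB40862-RA-00001", "SA|3829a-00001", "SA|3829"):
    print(name, "->", root_name(name))
```

This only undoes the numeric counter; it does not remove the "a" that was appended to the gene name in "SA|3829a".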


Nathan Dunn, PhD
Berkeley Bioinformatics Open-source Projects (BBOP)
Genomics Division, Lawrence Berkeley National Laboratory
[hidden email]

