How to keep gene name&ID and mRNA name stable when importing and exporting to/from Apollo

Wim S
Hi,

We hope to feed updates made to gene models in Apollo back into internal (or even external) releases of an updated gene model that is used in other applications, for example variant effect prediction, expression quantification, or simply viewing in other genome browsers (e.g. IGV).

We noticed a few things after importing a full gene model into the user-created content track, making some updates, and then exporting the full gene model back to GFF3.
Some of these things might be improvable; for others I just wonder what other people think about them and what logic Apollo uses.

I can share the following imported and exported gene and mRNA as an example.

Original GFF3
Chr_01  maker   gene    4937        7818        .       +       .       ID=XXXX05X018630;Name=maker-XXXX_0004-snap-gene-6.642
Chr_01  maker   mRNA    4937        7818        .       +       .       ID=XXXX05X018630.1;Parent=XXXX05X018630;Name=maker-XXXX_0004-snap-gene-6.642-mRNA-1;



GFF3 exported from Apollo
Chr_01 . gene 4868 7927 . + . ID=3fbefd64-492d-4d24-895f-052a51144781;date_last_modified=2019-06-07;Name=XXXX05X018630;date_creation=2017-10-23
Chr_01 . mRNA 4937 7818 . + . Parent=3fbefd64-492d-4d24-895f-052a51144781;ID=4fb14a2f-4778-4d6a-a4cf-c1e62d8dd887;date_last_modified=2017-10-23;Name=XXXX05X018630-00001;date_creation=2017-10-23
Chr_01 . mRNA 4868 7927 . + . Parent=3fbefd64-492d-4d24-895f-052a51144781;ID=78cee0bb-8291-46e0-a62f-08c55bd7d9d0;date_last_modified=2019-06-07;Name=XXXX05X018630-00002_USER_ID;date_creation=2019-06-07
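
The two listings can also be compared programmatically. The following is a minimal Python sketch, not part of Apollo; the helper name `parse_gff3_line` is made up for illustration. It parses the gene lines and the first mRNA line from the listings and checks how the exported names relate to the original IDs.

```python
def parse_gff3_line(line):
    """Parse one GFF3 feature line into a small dict.

    Real GFF3 is tab-separated; the listings in this thread use spaces,
    so this sketch splits on any whitespace (safe here because no
    attribute value contains a space).
    """
    cols = line.split(None, 8)
    attrs = {}
    for field in cols[8].strip().split(";"):
        if field:  # skip the empty field left by a trailing ';'
            key, _, value = field.partition("=")
            attrs[key] = value
    return {"type": cols[2], "start": int(cols[3]),
            "end": int(cols[4]), "attrs": attrs}


original_gene = parse_gff3_line(
    "Chr_01  maker   gene    4937        7818        .       +       .       "
    "ID=XXXX05X018630;Name=maker-XXXX_0004-snap-gene-6.642"
)
exported_gene = parse_gff3_line(
    "Chr_01 . gene 4868 7927 . + . "
    "ID=3fbefd64-492d-4d24-895f-052a51144781;date_last_modified=2019-06-07;"
    "Name=XXXX05X018630;date_creation=2017-10-23"
)
exported_mrna = parse_gff3_line(
    "Chr_01 . mRNA 4937 7818 . + . "
    "Parent=3fbefd64-492d-4d24-895f-052a51144781;"
    "ID=4fb14a2f-4778-4d6a-a4cf-c1e62d8dd887;date_last_modified=2017-10-23;"
    "Name=XXXX05X018630-00001;date_creation=2017-10-23"
)

# The exported gene Name equals the original gene ID.
gene_name_is_old_id = exported_gene["attrs"]["Name"] == original_gene["attrs"]["ID"]

# The exported mRNA Name is the original gene ID plus a numeric suffix.
mrna_name_from_gene_id = (
    exported_mrna["attrs"]["Name"] == original_gene["attrs"]["ID"] + "-00001"
)

print(gene_name_is_old_id, mrna_name_from_gene_id)
```

A check like this could be run over a whole file after each import/export round to spot which names and IDs changed.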


The above example shows:


1) The gene-ID becomes the gene-name, but only during the first import/export round.
For some species we have original GFF3 files where the gene-ID and gene-name are not identical. Importing into and exporting from Apollo overwrites the original gene-name with the original gene-ID, but only in the first import/export round. Can you share the logic that is being used? So far this has worked correctly for us, but I can imagine GFF3 files where the original gene-ID is the important variable, not the original gene-name. It is also strange that a second export does not overwrite the new gene-name given by Apollo with the new gene-ID given by Apollo.

2) The mRNA-name and mRNA-ID are overwritten during import (and sorted by mRNA length?).
The original mRNA name is not imported; instead an auto-generated mRNA-name is set, based on the original gene-ID plus a 00001 postfix?
If the user creates an mRNA with mRNA-name Name=XXXX05X018630-00002_USER_ID, this is exported to GFF3. But after importing this GFF3 file into a new Apollo instance (because we migrated to a new Apollo version/VM), the user-created mRNA name is gone and the mRNAs are auto-named based on length? Is there a way to keep original or (in Apollo) user-created mRNA names?

3) The exported gene start and stop are updated with the lowest start and highest stop found over all the mRNAs?
This makes sense, but I would like to confirm it.
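
The hypothesis in point 3 can be checked against the example with a few lines of Python. This is only a sketch of the assumed rule (gene span = lowest mRNA start to highest mRNA stop), not Apollo's actual implementation; the function name `gene_span` is made up for illustration.

```python
def gene_span(mrna_coords):
    """Return (lowest start, highest end) over a list of (start, end) pairs."""
    starts = [start for start, _ in mrna_coords]
    ends = [end for _, end in mrna_coords]
    return min(starts), max(ends)


# Coordinates of the two mRNAs in the GFF3 exported from Apollo.
exported_mrnas = [(4937, 7818), (4868, 7927)]

span = gene_span(exported_mrnas)
print(span)  # (4868, 7927), the same as the exported gene line
```

For this one gene the result matches the exported gene coordinates (4868 to 7927), which is consistent with the assumption but does not prove it in general.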


I wonder what your thoughts are on the above, whether you can explain the logic used in Apollo, and/or whether you have tips on how to keep the gene-ID/Name and mRNA-name stable.

Thank you.

Wim

--
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
Re: How to keep gene name&ID and mRNA name stable when importing and exporting to/from Apollo

nathandunn



More details on #2. This is correct: the names are autogenerated by default. We do store the “orig_id”, though I realize this is insufficient. If you are using “add_features_from_gff3_to_annotation.pl” to bulk load annotations, there is a “use_name_for_feature” option that should use the provided name. I am not sure whether this will help with #1 at all.

#3: you are correct. By default it sets the start and stop as the longest translated peptide. There is a “disable_cds_recalculation” option if you are using the Perl script above.

So, as a further addendum, we are also addressing #1 and #2 in the next major version of Apollo (working on that now) by introducing the concept of an “Official GFF3” model. This will allow you to preserve IDs and other metadata (comments, notes, GO annotations, etc.) from previously published annotations as you refine your annotations. I think this provides the flexibility and features that are actually being requested here.



Nathan


(In reply to Wim S's message of Jun 3, 2020, 7:06 AM.)

Re: How to keep gene name&ID and mRNA name stable when importing and exporting to/from Apollo

Wim S
Hi Nathan,

Thank you for the information. Good to know that mRNA names can be preserved; we did use “add_features_from_gff3_to_annotation.pl” but did not know about the argument.

We look forward to an upcoming Apollo release where the gene model annotations (names, IDs, comments, notes, etc.) are preserved (as much as is possible/makes sense).


(In reply to Nathan Dunn's message of Wednesday, 3 June 2020, 20:57:56 UTC+2.)