loading proteins


Moseley, Robert C.
Hi,

I’ve loaded a FASTA file containing protein sequences and synced them. Everything seems to have worked fine: the right number of polypeptide features shows up in the feature composition graph, the polypeptide is linked to the appropriate mRNA feature, etc. However, the feature name and unique name are shown as the same for the polypeptide even though I didn’t specify that. The description line for each protein sequence looks like this:

>Kalax.1326s0009.1.p pacid=32530537 transcript=Kalax.1326s0009.1 locus=Kalax.1326s0009 ID=Kalax.1326s0009.1.v1.1 annot-version=v1.1


My advanced options were filled out like this:

-------------------------------------------------------

Regular expression for the name:  >(.*?)\s     "Kalax.1326s0009.1.p"

Regular expression for the unique name:  >.*?ID=(.*?)\s     "Kalax.1326s0009.1.v1.1"

Relationship type:  produced by

Regular expression for the parent:  >.*?ID=(.*?)\s     "Kalax.1326s0009.1.v1.1"

Parent type:  mRNA

-------------------------------------------------------


The mRNA name would be Kalax.1326s0009.1 and the unique name would be Kalax.1326s0009.1.v1.1.

The polypeptide name should be Kalax.1326s0009.1.p and its unique name should be Kalax.1326s0009.1.v1.1. Instead, both the name and the unique name are Kalax.1326s0009.1.v1.1.
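
As a sanity check on the patterns themselves, here is a minimal sketch that runs the three regular expressions from the advanced options against the sample description line. It assumes Python's `re` module behaves like the loader's own regex engine for these simple patterns, which may not hold in every detail; the helper name `first_group` is made up for illustration and is not part of Tripal.

```python
import re

# Sample FASTA description line quoted in the post.
header = (
    ">Kalax.1326s0009.1.p pacid=32530537 transcript=Kalax.1326s0009.1 "
    "locus=Kalax.1326s0009 ID=Kalax.1326s0009.1.v1.1 annot-version=v1.1"
)

# Patterns as entered in the loader's advanced options.
NAME_RE = r">(.*?)\s"
UNIQUENAME_RE = r">.*?ID=(.*?)\s"
PARENT_RE = r">.*?ID=(.*?)\s"


def first_group(pattern, text):
    """Hypothetical helper: first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


name = first_group(NAME_RE, header)
uniquename = first_group(UNIQUENAME_RE, header)
parent = first_group(PARENT_RE, header)

print("name:       ", name)
print("unique name:", uniquename)
print("parent:     ", parent)
```

Under these assumptions the name pattern captures Kalax.1326s0009.1.p and the other two capture Kalax.1326s0009.1.v1.1, so the patterns themselves appear to extract the intended values. One caveat: the match is case-sensitive here. If the patterns were applied case-insensitively, `ID=` would match inside `pacid=` first and capture 32530537 instead.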


I looked into this with another Tripal user, and we read through the code of the FASTA loader but couldn’t find anything that would cause this issue. Any help would be greatly appreciated!


Regards,


Rob Moseley


_______________________________________________
Gmod-tripal mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal
Re: loading proteins

Stephen Ficklin-2
Hi Robert,

Did you use the Tripal FASTA loader to load your protein sequences? Did those protein sequences already exist in the database before you loaded the FASTA file? If you loaded the transcripts using the GFF importer, it will automatically create protein features in Chado for you if they are not in the GFF. I just looked at the GFF loader, and this behavior is not clearly stated. I'm wondering if this might be the source of the problem...

If you did load your transcripts with the GFF loader and you want to change the unique name, can you tell the FASTA loader to match on the 'name' field rather than the 'unique name'? If you can uniquely match on the 'name' field, then the loader should update the unique names to match your regular expression.

Stephen

On 12/2/2015 1:12 PM, Moseley, Robert C. wrote: