loading kegg with names


Sofia Robb
Hi,

My KEGG results use my feature names in the output. When I load them, I get this error:

syslog() expects parameter 1 to be long, string given syslog.module:118 [warning]
WD trp_kegg: Failed (Ambiguous): 'mk5-SmedSxl-v31.030480-0.2-1' matches more than one feature and is being skipped. [error]


When I search for this feature name through the web interface, only one record is found. I can find and replace all the names with the unique IDs in my KEGG output, but I would rather have the names appear in the KEGG report. Is there something I can do?

Thanks,
Sofia

_______________________________________________
Gmod-tripal mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal
Re: loading kegg with names

Stephen Ficklin-2
Hi Sofia,

When loading the KEGG file, are you setting the 'Query Type' and 'Organism' fields on the analysis setup page? Chado has a unique constraint on the feature table that covers the uniquename, organism_id and type_id fields. So you can have two features with the same uniquename for the same organism as long as they are of different types. This is sometimes the case for mRNA and their corresponding protein features, which will have the same name (and perhaps the same uniquename) but different types.

The KEGG loader does provide some help. The 'Query Type' and 'Organism' fields let you tell the loader which feature the annotations should be associated with (e.g. the mRNA rather than the polypeptide). So, if your issue is that proteins and mRNA have the same names, you can just set the 'Query Type' field to mRNA, and the loader will attach the annotations to the mRNA.
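
To illustrate the constraint described above, here is a minimal sketch in Python. It uses an in-memory SQLite table as a simplified stand-in for Chado's feature table (real Chado runs on PostgreSQL and has more columns). The feature names, the type_id values and the find_matches helper are hypothetical and are not part of Tripal or the KEGG loader; the point is only to show why a name can match two features and how narrowing by organism and type resolves it.

```python
import sqlite3

# Simplified stand-in for Chado's feature table (assumption: only the
# columns discussed in this thread are kept).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE feature (
        feature_id  INTEGER PRIMARY KEY,
        name        TEXT,
        uniquename  TEXT NOT NULL,
        organism_id INTEGER NOT NULL,
        type_id     INTEGER NOT NULL,
        UNIQUE (organism_id, uniquename, type_id)
    )
""")

MRNA, POLYPEPTIDE = 1, 2  # hypothetical type_id values

# An mRNA and its polypeptide share a name and uniquename; this is allowed
# because their types differ.
rows = [
    ("gene0001-1", "gene0001-1", 1, MRNA),
    ("gene0001-1", "gene0001-1", 1, POLYPEPTIDE),
    ("gene0002-1", "gene0002-1", 1, MRNA),
]
conn.executemany(
    "INSERT INTO feature (name, uniquename, organism_id, type_id) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

def find_matches(conn, name, organism_id=None, type_id=None):
    """Return feature_ids matching a name, optionally narrowed by organism and type."""
    sql = "SELECT feature_id FROM feature WHERE name = ?"
    params = [name]
    if organism_id is not None:
        sql += " AND organism_id = ?"
        params.append(organism_id)
    if type_id is not None:
        sql += " AND type_id = ?"
        params.append(type_id)
    sql += " ORDER BY feature_id"
    return [row[0] for row in conn.execute(sql, params)]

# Name alone is ambiguous: it matches both the mRNA and the polypeptide.
ambiguous = find_matches(conn, "gene0001-1")
# Narrowing by organism and type leaves a single feature.
resolved = find_matches(conn, "gene0001-1", organism_id=1, type_id=MRNA)
```

Here `ambiguous` holds two feature IDs and `resolved` holds one, which mirrors what setting 'Organism' and 'Query Type' is meant to achieve.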

However, if you have two features for the same organism, of the same type and with the same name, but with different uniquenames, then you'll have a problem: there's no way to distinguish between them by name. If this is the case and you can't separate them with the 'Organism' and 'Query Type' settings, then the only way around it is to either re-run the analysis with the unique names or write a script to convert the names. I've made this mistake before, unfortunately, and re-running the analysis is what I ended up doing.
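
A script to convert the names could look roughly like the sketch below. It assumes the KEGG results are tab-delimited with the query name in the first column, and that you have already exported (name, uniquename) pairs from your database; both are assumptions, and the function names are made up for illustration. Names that map to zero or several uniquenames are left unchanged and reported so they can be fixed by hand.

```python
from collections import defaultdict

def build_name_map(features):
    """Map each feature name to the set of uniquenames that share it."""
    name_map = defaultdict(set)
    for name, uniquename in features:
        name_map[name].add(uniquename)
    return name_map

def convert_names(lines, name_map):
    """Swap the first tab-delimited column for the uniquename when unambiguous.

    Returns (converted_lines, skipped_names). Lines whose name maps to zero
    or to more than one uniquename are passed through unchanged.
    """
    converted, skipped = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        query, sep, rest = line.partition("\t")
        candidates = name_map.get(query, set())
        if len(candidates) == 1:
            converted.append(next(iter(candidates)) + sep + rest)
        else:
            skipped.append(query)
            converted.append(line)
    return converted, skipped

# Hypothetical example data: geneB is shared by two uniquenames.
features = [("geneA", "geneA.v1"), ("geneB", "geneB.v1"), ("geneB", "geneB.v2")]
lines = ["geneA\tK00001", "geneB\tK00002"]
converted, skipped = convert_names(lines, build_name_map(features))
```

In this example the geneA line is rewritten to use geneA.v1, while geneB is left as-is and listed in `skipped` because it cannot be resolved automatically.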

Stephen

Re: loading kegg with names

Stephen Ficklin-2
Great. I'm glad it looks like it may be resolved.

On 3/22/2016 2:25 PM, Sofia Robb wrote:
I think I figured it out. I believe I had an error when loading the first time but didn't notice it until I saw that I did not have any KEGG data loaded, although some features were loaded. When I tried to edit the analysis and submit the job again, I had non-unique features. I have deleted the analysis and am trying again; I think it is working.
