Load annotations

Tony Power
Hi all,

I would like to load annotations produced by Blast2GO. However, I have a problem:
The annotations were made on contigs whose names differ from those already in the Tripal database, but I have an Excel table that maps between the two sets of names.

What would you suggest? Should I, for instance, rename all the contigs in the Tripal database using that table and then load the annotations? If so, which tables would I need to change?

I see that contigs have two labels, the Name and the Unique Name. Could I change only one of them and then load the annotations by matching on the changed one?



Thank you very much for your help,

------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_sfd2d_oct
_______________________________________________
Gmod-tripal mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal
Reply | Threaded
Open this post in threaded view
|

Re: Load annotations

Stephen Ficklin-2
Hi Tony,

Yes, the name in the input file should match either the 'name' or 'uniquename' of the feature, and what you suggest should work.   

So, you should either change the names in your input files to match what's in your database, or change the names in your database. Deciding whether to change the name or the uniquename is important. Usually, the 'name' is more meaningful to a human reader, and the 'uniquename' distinguishes between two features that may share the same name (e.g. an mRNA and a protein on a genome assembly). When loading, you should only match on the feature 'name' if you are certain your feature names are unique for the organism and feature type (e.g. contig). Otherwise, if two features share the same name, use the 'uniquename'.

As you have contigs from a Unigene, I imagine the names and uniquenames of your features are probably the same and both unique? If so, it probably doesn't matter which one you use; just keep in mind that you will want the human-recognizable name in the 'name' field.
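The rule above for choosing which column to match on can be sketched in Python. This is a minimal illustration, assuming the relevant feature rows have already been fetched from the database; the `Feature` tuple and both function names are hypothetical and are not part of Tripal or Chado.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Feature(NamedTuple):
    """Hypothetical stand-in for the feature columns discussed here."""
    feature_id: int
    organism_id: int
    type_id: int
    name: str
    uniquename: str


def choose_match_column(features: Iterable[Feature]) -> str:
    """Return 'name' only when no two features of the same organism and
    type share a name; otherwise fall back to 'uniquename'."""
    counts = Counter((f.organism_id, f.type_id, f.name) for f in features)
    if all(n == 1 for n in counts.values()):
        return "name"
    return "uniquename"


def names_equal_uniquenames(features: Iterable[Feature]) -> bool:
    """True when every feature's name is identical to its uniquename."""
    return all(f.name == f.uniquename for f in features)


if __name__ == "__main__":
    contigs = [
        Feature(1, 1, 10, "Contig1", "Contig1"),
        Feature(2, 1, 10, "Contig2", "Contig2"),
    ]
    print(choose_match_column(contigs))      # prints: name
    print(names_equal_uniquenames(contigs))  # prints: True
```

Counting per (organism_id, type_id) follows the wording "unique for the organism and feature type"; an mRNA and a protein with the same name count as distinct here because their types differ.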

If you are using the development version and encounter any problems, please let us know on the gmod-tripal-devel mailing list. For the upcoming Tripal v1.0 release, we have rewritten almost all of the loaders to make them faster.

Stephen

(In reply to Tony Power, 10/31/2012 6:28 AM)

Re: Load annotations

Tony Power
Thanks Stephen,

I hope the hurricane didn't get to you.

The 'name' and 'uniquename' are the same.
I assume that the 'name' is what is shown to users on the website, right?
In that case, I'd have to change all the 'name' values according to my external Excel table and then load the annotations.

Could you please let me know in which tables the 'name' records should be changed? Then I'll write a script that connects to the DB and renames around 30,000 contigs. Presumably either 'name' or 'uniquename' is used as a key, right?
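A renaming script like the one described above will first need the Excel mapping in a machine-readable form. A minimal sketch, assuming the spreadsheet has been exported to CSV with a header row and two hypothetical columns, `old_name` and `new_name`; it rejects blank cells and duplicates on either side, since either would make a bulk rename of around 30,000 contigs ambiguous.

```python
import csv
import io
from typing import Dict, Set, TextIO


def load_name_map(stream: TextIO) -> Dict[str, str]:
    """Read an old_name,new_name CSV into a dict of old -> new names.

    Raises ValueError on a blank cell, a repeated old name, or a repeated
    new name, any of which would make the rename ambiguous.
    """
    mapping: Dict[str, str] = {}
    seen_new: Set[str] = set()
    for row_no, row in enumerate(csv.DictReader(stream), start=1):
        old = (row.get("old_name") or "").strip()
        new = (row.get("new_name") or "").strip()
        if not old or not new:
            raise ValueError(f"data row {row_no}: blank name")
        if old in mapping:
            raise ValueError(f"data row {row_no}: duplicate old name {old!r}")
        if new in seen_new:
            raise ValueError(f"data row {row_no}: duplicate new name {new!r}")
        mapping[old] = new
        seen_new.add(new)
    return mapping


if __name__ == "__main__":
    # In practice: open("contig_names.csv", newline="") (file name hypothetical)
    sample = io.StringIO("old_name,new_name\nContig1,Bla_c00001\nContig2,Bla_c00002\n")
    print(load_name_map(sample))  # prints: {'Contig1': 'Bla_c00001', 'Contig2': 'Bla_c00002'}
```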

Thank you very much,
Tony


(In reply to Stephen Ficklin, Wed, Oct 31, 2012 at 1:39 PM)

Re: Load annotations

Stephen Ficklin-2
Hi Tony,

Oh, just a bit of wind here the day after but not too bad for us.  Much worse further north :-(

In your Drupal database you should see a 'public' schema and a 'chado' schema. You'll want to edit the 'feature' table in the 'chado' schema. The feature table has the columns 'name' and 'uniquename', a primary key 'feature_id', and a unique constraint on the combination of organism_id (a foreign key to the organism table), type_id (a foreign key to the cvterm table), and uniquename. So, as long as all of the features in your database belong to the same organism and are of the same type (e.g. contig), you should be fine matching on either 'name' or 'uniquename' in your script.

Here's a description of that table:  http://gmod.org/wiki/Chado_Sequence_Module#Table:_feature
And a diagram:  http://gmod.org/wiki/Chado_Sequence_Module#Names_of_Features
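Given that schema, the rename can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` with a trimmed, hypothetical copy of the `feature` table rather than the real PostgreSQL `chado.feature`, and `rename_features` is an invented helper. It updates `name` and `uniquename` together (they are identical for these contigs), restricts the update to one organism and type, and runs all renames in a single transaction so that a unique-constraint violation rolls back the whole batch.

```python
import sqlite3
from typing import Dict

# Simplified, hypothetical stand-in for chado.feature: only the columns
# discussed in this thread plus the (organism_id, type_id, uniquename)
# unique constraint. Real Chado lives in PostgreSQL.
SCHEMA = """
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL,
    type_id     INTEGER NOT NULL,
    name        TEXT,
    uniquename  TEXT NOT NULL,
    UNIQUE (organism_id, type_id, uniquename)
)
"""


def rename_features(conn: sqlite3.Connection, mapping: Dict[str, str],
                    organism_id: int, type_id: int) -> int:
    """Set name and uniquename to the new value for each feature whose
    current uniquename is a key in mapping; return the rows updated."""
    with conn:  # commits on success, rolls back if any UPDATE fails
        cur = conn.executemany(
            "UPDATE feature SET name = ?, uniquename = ? "
            "WHERE uniquename = ? AND organism_id = ? AND type_id = ?",
            [(new, new, old, organism_id, type_id) for old, new in mapping.items()],
        )
    return cur.rowcount  # summed over all executemany statements


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO feature (organism_id, type_id, name, uniquename) VALUES (?, ?, ?, ?)",
        [(1, 10, "Contig1", "Contig1"), (1, 10, "Contig2", "Contig2")],
    )
    conn.commit()
    print(rename_features(conn, {"Contig1": "Bla_c00001"}, 1, 10))  # prints: 1
```

Against a real Tripal site you would connect with a PostgreSQL driver such as psycopg2, target `chado.feature`, and use that driver's `%s` placeholders instead of `?`. Because each row is updated by its own statement, a mapping whose new names overlap its old names (for example, swapping two names) can trip the unique constraint partway through.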

Stephen

(In reply to Tony Power, 10/31/2012 12:02 PM)

Re: Load annotations

Tony Power
Hi Stephen,

Glad to hear that.

So, according to what you're saying, I can even change both (name and uniquename), since they only appear in the feature table and are not referenced by any other table, right?
The only important thing is that they get the same names that were used in the annotation file, so that the annotations match when I import them.

Thank you very much,

(In reply to Stephen Ficklin, Wed, Oct 31, 2012 at 4:37 PM)

Re: Load annotations

Stephen Ficklin-2
Hi Tony,

Yes, you can change both names if you like. Those names are only used in the 'feature' table; any relationships with those features go through the primary key (feature_id). You may find that the Drupal contig page titles don't change and keep the old name, because the page title is part of the Drupal node. If that happens, let me know.
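The point that relationships go through `feature_id` can be checked with a toy example. This sketch uses Python's built-in `sqlite3` and simplified, hypothetical versions of `feature` and `featureprop` (the real Chado `featureprop` has more columns, such as a type and a rank); it shows that a property row still joins to its feature after both names change. It does not address the Drupal node titles mentioned above.

```python
import sqlite3

# Hypothetical, trimmed-down feature and featureprop tables; featureprop
# points at feature through feature_id, never through a name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT,
    uniquename TEXT NOT NULL
);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id     INTEGER NOT NULL REFERENCES feature (feature_id),
    value          TEXT
);
INSERT INTO feature (feature_id, name, uniquename) VALUES (1, 'Contig1', 'Contig1');
INSERT INTO featureprop (feature_id, value) VALUES (1, 'example annotation');
""")

# Rename both columns; the key that featureprop relies on is untouched.
with conn:
    conn.execute(
        "UPDATE feature SET name = ?, uniquename = ? WHERE feature_id = ?",
        ("Bla_c00001", "Bla_c00001", 1),
    )

row = conn.execute(
    "SELECT f.uniquename, p.value FROM featureprop p "
    "JOIN feature f ON f.feature_id = p.feature_id"
).fetchone()
print(row)  # prints: ('Bla_c00001', 'example annotation')
```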

Stephen

(In reply to Tony Power, 11/2/2012 11:23 AM)