Problem loading NCBI taxonomy

Nicolas Joannin
Hello,

I have followed the instructions for loading the NCBI taxonomy into CHADO (using the load_ncbi_taxonomy.pl script), but I am getting the following errors in the command-line output:

$ perl load_ncbi_taxonomy.pl -H localhost -D test_chado -d Pg -p ***** -u Nicolas -v -t
H= localhost, D= test_chado, v=1, t=1, i=  
Created a new phylotee with id 6


No parent id found for  species Norovirus river water/GII.4/RS2Aug09/ARG/2009 (id = 1267486) !! This means your species is the root node, or there is an error in yout input file 
No parent id found for  species Tellervini (id = 127219) !! This means your species is the root node, or there is an error in yout input file 
No parent id found for  species Hirondellea (id = 58017) !! This means your species is the root node, or there is an error in yout input file 
No parent id found for  species Halococcus sp. KeC-02 (id = 695965) !! This means your species is the root node, or there is an error in yout input file 
No parent id found for  species Halococcus sp. KeC-02 (id = 695965) !! Check your input file !!

However, when I check the nodes.dmp file, these species do appear to have parent ids, and their parent ids are present as well (see the relevant lines from the nodes file below), so I'm not sure what the problem really is.

Does anyone have an idea what might be wrong?

Best regards,
Nicolas

Relevant lines from the nodes.dmp file:

1267486 | 489821 | no rank | | 9 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | |
489821 | 122929 | no rank | | 9 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | |

127219 | 127218 | tribe | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |
127218 | 33415 | subfamily | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |

58017 | 134545 | genus | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |
134545 | 44330 | family | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |

695965 | 2249 | species | HS | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
2249 | 2236 | genus | | 0 | 1 | 11 | 1 | 0 | 1 | 0 | 0 | |
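
The manual check described above can also be automated. The following is a minimal Python sketch and is not part of load_ncbi_taxonomy.pl. It assumes that nodes.dmp fields are separated by '|', with the tax id first and the parent tax id second (as in the lines quoted above). The function names parse_nodes and missing_parents are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def parse_nodes(lines: Iterable[str]) -> Dict[int, Tuple[int, str]]:
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp-style lines."""
    nodes: Dict[int, Tuple[int, str]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Fields are separated by '|'; surrounding tabs/spaces are stripped.
        fields = [f.strip() for f in line.split("|")]
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def missing_parents(nodes: Dict[int, Tuple[int, str]]) -> List[int]:
    """Return tax_ids whose parent id is not itself a node in the input."""
    return sorted(t for t, (p, _) in nodes.items() if p not in nodes)


# Two of the lines quoted in this message, used as a tiny sample.
sample = [
    "695965 | 2249 | species | HS | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |",
    "2249 | 2236 | genus | | 0 | 1 | 11 | 1 | 0 | 1 | 0 | 0 | |",
]
nodes = parse_nodes(sample)
print(nodes[695965])           # (2249, 'species')
print(missing_parents(nodes))  # [2249], because 2236 is not in this two-line sample
```

To run it against the real file, pass an open file handle instead of the sample list, for example parse_nodes(open("nodes.dmp")). If missing_parents then returns an empty list, every parent id in the file resolves to a node, which would point at the loader rather than the input file.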




Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan


_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Re: Problem loading NCBI taxonomy

Naama Menda-3
Hi Nicolas,

How did you generate the nodes.dmp file? Are you trying to load the entire NCBI taxonomy or a subset?

Thanks,
-Naama



Naama Menda
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca NY 14853
USA

(607) 254 3569
Sol Genomics Network
http://solgenomics.net/
[hidden email]


Re: Problem loading NCBI taxonomy

Nicolas Joannin
Hi Naama,

Thanks for the quick reply!

I downloaded the nodes.dmp file (and the names.dmp) from the NCBI ftp site...
More specifically, I retrieved taxdump.tar.gz, unpacked it, and moved the nodes.dmp and names.dmp files into the same directory as load_ncbi_taxonomy.pl.

I am just testing, so I thought I'd try loading everything.
Should I have generated the files? If yes, in which way?
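
For reference, the unpacking step described above could look like the following Python sketch. The helper name extract_dumps, the archive path and the destination directory are assumptions made for illustration. The sketch also assumes that nodes.dmp and names.dmp are stored at the top level of the archive under exactly those names.

```python
import shutil
import tarfile
from pathlib import Path
from typing import List

# The two dump files mentioned in this message.
WANTED = ("nodes.dmp", "names.dmp")


def extract_dumps(archive: str, dest_dir: str) -> List[Path]:
    """Copy only nodes.dmp and names.dmp out of a gzipped tar archive."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with tarfile.open(archive, "r:gz") as tar:
        for name in WANTED:
            # extractfile raises KeyError if the member is not in the archive.
            src = tar.extractfile(name)
            if src is None:
                raise ValueError(f"{name} is not a regular file in {archive}")
            target = dest / name
            with src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

A call such as extract_dumps("taxdump.tar.gz", ".") would then place both files in the current directory, next to the loader script.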

Nicolas



Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan



Re: Problem loading NCBI taxonomy

Nicolas Joannin
Hi Naama,

Thanks for your quick reply!

I just realized that your reply was only addressed to me, so I've added the gmod-schema mailing list :)
Updating the script from svn seems to fix the problem: at least I could launch the program, but it is still running, so I don't know yet whether it will complete.
Crossing my fingers (it's such a long process to include all NCBI Taxonomy ;)!

I now have a new problem though, but I will start a new thread for it...

Cheers,
Nicolas



Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan



On Thu, May 30, 2013 at 1:38 AM, Naama Menda <[hidden email]> wrote:
Hi Nicolas,

Please update the script from svn.
The problem was that I had never tried to load the entire NCBI taxonomy before; I always used a filtering file (option -i).
I did not run the entire loader all the way through, so please let me know if you get any more errors.

Good luck,
-Naama
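
When working with a filtered subset, it can help to know which ancestors a set of taxa needs in order to form a connected tree. The sketch below only computes that set of ids in Python. The format of the file expected by option -i is not shown in this thread, and whether the loader adds ancestors on its own is not stated either, so neither is assumed here. The helper name with_ancestors is hypothetical, and the parent ids in the sample come from the nodes.dmp lines quoted in the first message.

```python
from typing import Dict, Iterable, Set


def with_ancestors(parent_of: Dict[int, int], wanted: Iterable[int]) -> Set[int]:
    """Return the wanted tax_ids plus every ancestor found in parent_of."""
    keep: Set[int] = set()
    for tax_id in wanted:
        # Walk upward until we hit a node already kept (this also stops at a
        # node that is its own parent) or an id that is not in parent_of.
        while tax_id in parent_of and tax_id not in keep:
            keep.add(tax_id)
            tax_id = parent_of[tax_id]
    return keep


# tax_id -> parent tax_id, taken from the lines quoted earlier in the thread.
parent_of = {127219: 127218, 127218: 33415, 58017: 134545, 134545: 44330}
print(sorted(with_ancestors(parent_of, [127219])))  # [127218, 127219]
```

With a parent map built from the complete nodes.dmp, the walk would continue all the way up to the top of the tree.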



Re: Problem loading NCBI taxonomy

Nicolas Joannin
Hi Naama,

The script has finally finished running on my laptop: it took forever :)
Unfortunately, it failed :(
See the last few lines of the script's output below.

I do not need to load all of the NCBI Taxonomy, so there is no need to debug this for my sake.
I hadn't anticipated that it would take so long, and thought it would be fun to try...

If you do debug it, I can test it on my desktop machine at work, just let me know!

Best regards,
Nicolas

Output (end thereof):
New organism 1380665 (species=Blackburnia costata)
New organism 1380666 (species=Arenaria montana)
Arenaria montana LEVEL=( species) Synonym is Arenaria montana L. 
New organism 1380667 (species=Influenza A virus (A/reassortant/NYMC X-197(Brisbane/11/2010 x Puerto Rico/8/1934)(H3N2)))
New organism 1380668 (species=Blumea cf. canalensis Bernardi 9667)
New organism 1380669 (species=Nematanthus brasiliensis)
New organism 1380670 (species=Nostoc sp. HKAR-2)
the root_id is 
An error occured! Rolling back! 
 Context::Preserve::preserve_context(): No organism id found for root node!  at load_ncbi_taxonomy.pl line 558
 
 Resetting database sequences...






Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan



Re: Problem loading NCBI taxonomy

Naama Menda-3
Yes, it takes forever because the NCBI taxonomy is big. It will run faster on a more powerful machine, but it will still take some time.
I've never tried to load the entire taxonomy, since we don't really need to store the entire thing.
From the error, I think the full taxonomy does not have a root node. If that is the case, I'll have to add a 'dummy' root.
I've also seen NCBI dumps that have a glitch in one of the nodes, but I don't think that's the case here (and I have to say that the NCBI helpdesk is very responsive; they have always fixed those issues quickly when I contacted them).

Have you tried loading a subset of the taxonomy? Did that finish?

Thanks for helping debug this!
-Naama
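
One way to check the "no root node" hypothesis is to look at how the top of the tree is encoded in the input. In NCBI's nodes.dmp, the root (tax id 1) is recorded as its own parent rather than with an empty parent field. Whether that is what trips up the loader is not established in this thread. The Python sketch below uses a made-up helper name (top_of_tree) and two tiny samples: the first few rows of a full dump, and the two Halococcus lines quoted in the first message.

```python
from typing import Dict, List, Tuple


def top_of_tree(parent_of: Dict[int, int]) -> Tuple[List[int], List[int]]:
    """Return (self_parented, parent_absent) tax_ids.

    self_parented: nodes listed as their own parent (how NCBI marks the root).
    parent_absent: nodes whose parent id does not occur as a node at all,
                   which is what the top of a filtered subset looks like.
    """
    self_parented = sorted(t for t, p in parent_of.items() if p == t)
    parent_absent = sorted(t for t, p in parent_of.items() if p not in parent_of)
    return self_parented, parent_absent


# tax_id -> parent tax_id
full = {1: 1, 131567: 1, 2: 131567}   # root, cellular organisms, Bacteria
subset = {695965: 2249, 2249: 2236}   # lines quoted in the first message
print(top_of_tree(full))    # ([1], [])
print(top_of_tree(subset))  # ([], [2249])
```

A loader that identifies the root as "the node without a parent id" would find no such node in the first sample, because tax id 1 does have a parent id: itself.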




Naama Menda
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca NY 14853
USA

(607) 254 3569
Sol Genomics Network
http://solgenomics.net/
[hidden email]


Re: Problem loading NCBI taxonomy

Nicolas Joannin
Hello Naama,

Thanks for the quick reply :)
I have tried a subset of the taxonomy and that works, as long as there is only one branch (as we discussed in another thread).
I guess your thought about the taxonomy not having a root node makes sense... 
Let me know when you have a fix and I'll test it!

Cheers,
Nicolas



Nicolas Joannin, Ph.D.
Bioinformatics Center
Kyoto University, Uji campus, Japan



On Thu, Jun 6, 2013 at 11:39 AM, Naama Menda <[hidden email]> wrote:
Yes, it takes forever because the NCBI taxonomy is big. If you run it on a machine with a faster CPU it will be quicker, but it will still take some time to run.
I've never tried to load the entire taxonomy, since we don't really need to store the entire thing. 
From the error, I think the full taxonomy does not have a root node. If that is the case, I'll have to add a 'dummy' root.
I've also seen NCBI dumps that have some glitch in one of the nodes, but I don't think that's the case here (and I have to say that the NCBI helpdesk is very responsive, and they've always fixed those issues quickly when I contacted them!). 
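
One way to test the missing-root hypothesis is to scan nodes.dmp for root candidates. The sketch below is in Python rather than the loader's Perl and is not part of load_ncbi_taxonomy.pl; the helper names parse_nodes and find_roots and the three-line sample are made up for illustration. It assumes fields are separated by '|' with the tax_id first and the parent tax_id second, as in the lines quoted in this thread. A node counts as a root candidate if it is its own parent (which is how the root record, tax_id 1, appears in NCBI's taxdump) or if its parent has no record at all.

```python
from typing import Dict, Iterable, List, Tuple


def parse_nodes(lines: Iterable[str]) -> Dict[int, Tuple[int, str]]:
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp-style lines."""
    nodes: Dict[int, Tuple[int, str]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("|")]
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def find_roots(nodes: Dict[int, Tuple[int, str]]) -> List[int]:
    """Return tax_ids that are their own parent or whose parent is absent."""
    return sorted(
        tax_id
        for tax_id, (parent_id, _rank) in nodes.items()
        if parent_id == tax_id or parent_id not in nodes
    )


# Made-up sample in nodes.dmp layout (not real NCBI records),
# with a self-parented root at tax_id 1.
sample = [
    "1 | 1 | no rank | |",
    "10 | 1 | genus | |",
    "11 | 10 | species | |",
]
print(find_roots(parse_nodes(sample)))  # → [1]
```

On a real dump, the open file can be passed in place of the sample, for example find_roots(parse_nodes(open("nodes.dmp"))). More than one entry in the result would point at orphaned nodes rather than a single root.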

Have you tried loading a subset of the taxonomy? Did that finish?

Thanks for helping to debug this!
-Naama




Naama Menda
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca NY 14853
USA

(607) 254 3569
Sol Genomics Network
http://solgenomics.net/
[hidden email]


On Wed, Jun 5, 2013 at 9:34 PM, Nicolas Joannin <[hidden email]> wrote:
Hi Naama,

The script has finally finished running on my laptop: it took forever :)
Unfortunately, it failed :(
See the last few lines of the script's output below.

I do not need to load all of the NCBI Taxonomy, so there is no need to debug this for my sake.
I hadn't anticipated that it would take so long, and therefore thought it would be fun to try...

If you do debug it, I can test it on my desktop machine at work, just let me know!

Best regards,
Nicolas

Output (end thereof):
New organism 1380665 (species=Blackburnia costata)
New organism 1380666 (species=Arenaria montana)
Arenaria montana LEVEL=( species) Synonym is Arenaria montana L. 
New organism 1380667 (species=Influenza A virus (A/reassortant/NYMC X-197(Brisbane/11/2010 x Puerto Rico/8/1934)(H3N2)))
New organism 1380668 (species=Blumea cf. canalensis Bernardi 9667)
New organism 1380669 (species=Nematanthus brasiliensis)
New organism 1380670 (species=Nostoc sp. HKAR-2)
the root_id is 
An error occured! Rolling back! 
 Context::Preserve::preserve_context(): No organism id found for root node!  at load_ncbi_taxonomy.pl line 558
 
 Resetting database sequences...






On Mon, Jun 3, 2013 at 2:34 PM, Nicolas Joannin <[hidden email]> wrote:
Hi Naama,

Thanks for your quick reply!

I just realized that your reply was only addressed to me, so I've added the gmod-schema mailing list :)
This seems to fix the problem. At least I could launch the program, but it's still running now so I don't know if it'll complete or not.
Crossing my fingers (it's such a long process to include all NCBI Taxonomy ;)!

I now have a new problem though, but I will start a new thread for it...

Cheers,
Nicolas



On Thu, May 30, 2013 at 1:38 AM, Naama Menda <[hidden email]> wrote:
hi Nicolas,

please update the script from svn.
The problem was that I had never tried to load the entire NCBI taxonomy before, so I always used a filtering file (option -i).
I did not run the entire loader all the way through, so please let me know if you get any more errors.

good luck 
-Naama



On Wed, May 29, 2013 at 8:36 AM, Nicolas Joannin <[hidden email]> wrote:
Hi Naama,

Thanks for the quick reply!

I downloaded the nodes.dmp file (and the names.dmp) from the NCBI ftp site...
More specifically, I retrieved taxdump.tar.gz, unpacked it, and moved the nodes and names files to the directory where load_ncbi_taxonomy.pl is located.
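
For reference, the unpacking step can be scripted. This is a minimal Python sketch, assuming the archive is a gzip-compressed tar named taxdump.tar.gz that contains nodes.dmp and names.dmp as described above. extract_dumps is a hypothetical helper, not part of the GMOD tools, and it copies only those two files into the target directory.

```python
import shutil
import tarfile
from pathlib import Path

# The two dump files mentioned in this thread.
WANTED = {"nodes.dmp", "names.dmp"}


def extract_dumps(archive, dest):
    """Copy nodes.dmp and names.dmp from a taxdump archive into dest.

    Returns the sorted list of file names that were written.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            # Use only the base name so nothing is written outside dest.
            name = Path(member.name).name
            if member.isfile() and name in WANTED:
                with tar.extractfile(member) as src, open(dest / name, "wb") as out:
                    shutil.copyfileobj(src, out)
                written.append(name)
    return sorted(written)
```

Calling extract_dumps("taxdump.tar.gz", ".") from the directory that holds load_ncbi_taxonomy.pl would reproduce the layout described above.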

I am just testing, so I thought I'd try loading everything.
Should I have generated the files? If yes, in which way?

Nicolas



On Wed, May 29, 2013 at 9:29 PM, Naama Menda <[hidden email]> wrote:
hi Nicolas,

How did you generate the nodes.dmp file? Are you trying to load the entire NCBI taxonomy or a subset?

thanks
-Naama



On Wed, May 29, 2013 at 1:58 AM, Nicolas Joannin <[hidden email]> wrote:
Hello,

I have followed the instructions for loading the NCBI taxonomy into CHADO (using the load_ncbi_taxonomy.pl script), but I am getting the following errors in the command-line output:

$ perl load_ncbi_taxonomy.pl -H localhost -D test_chado -d Pg -p ***** -u Nicolas -v -t
H= localhost, D= test_chado, v=1, t=1, i=  
Created a new phylotee with id 6


No parent id found for  species Norovirus river water/GII.4/RS2Aug09/ARG/2009 (id = 1267486) !! This means your species is the root node, or there is an error in yout input file 
No parent id found for  species Tellervini (id = 127219) !! This means your species is the root node, or there is an error in yout input file 
No parent id found for  species Hirondellea (id = 58017) !! This means your species is the root node, or there is an error in yout input file 
No parent id found for  species Halococcus sp. KeC-02 (id = 695965) !! This means your species is the root node, or there is an error in yout input file 
No parent id found for  species Halococcus sp. KeC-02 (id = 695965) !! Check your input file !!

However, when I check the nodes.dmp file, I believe these species do have parent ids, and their parent ids are present as well (see the relevant lines from the nodes file below)... So I'm not sure what the problem really is.

Does anyone have an idea what might be wrong?

Best regards,
Nicolas

Relevant lines from the nodes.dmp file:

1267486 | 489821 | no rank | | 9 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | |
489821 | 122929 | no rank | | 9 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | |

127219 | 127218 | tribe | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |
127218 | 33415 | subfamily | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |

58017 | 134545 | genus | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |
134545 | 44330 | family | | 1 | 1 | 1 | 1 | 5 | 1 | 0 | 0 | |

695965 | 2249 | species | HS | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |
2249 | 2236 | genus | | 0 | 1 | 11 | 1 | 0 | 1 | 0 | 0 | |
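
The manual check described above (each reported id has a parent, and that parent has a record too) can be automated by walking the parent links. This is only a sketch in Python, not code from load_ncbi_taxonomy.pl; parent_map and lineage are hypothetical helper names, and the input is just the two records quoted above for Halococcus sp. KeC-02.

```python
def parent_map(lines):
    """Map tax_id -> parent tax_id from '|'-separated nodes.dmp-style lines."""
    parents = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        # Ignore blank or malformed lines that lack two numeric leading fields.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            parents[int(fields[0])] = int(fields[1])
    return parents


def lineage(tax_id, parents):
    """Follow parent links upward from tax_id.

    Stops when the current node has no record, or when the next parent
    was already visited (a self-parented root or a cycle).
    """
    chain = [tax_id]
    seen = {tax_id}
    while chain[-1] in parents:
        parent = parents[chain[-1]]
        if parent in seen:
            break
        chain.append(parent)
        seen.add(parent)
    return chain


quoted = [
    "695965 | 2249 | species | HS | 0 | 1 | 11 | 1 | 0 | 1 | 1 | 0 | |",
    "2249 | 2236 | genus | | 0 | 1 | 11 | 1 | 0 | 1 | 0 | 0 | |",
]
print(lineage(695965, parent_map(quoted)))  # → [695965, 2249, 2236]
```

With only the excerpt as input, the chain ends at 2236 because that id has no record here. Run against the full nodes.dmp, a chain that ends anywhere other than the root would identify the id whose record is missing.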



