Fwd: about running MAKER

girlwithglasses
From the GMOD helpdesk; please cc Lin, [hidden email].

---------- Forwarded message ----------
From: Yunxi Lin <[hidden email]>
Date: Sun, Jun 23, 2013 at 4:14 PM
Subject: about running MAKER
To: "[hidden email]" <[hidden email]>


Hi 

I'm running a eukaryote project on our server. Our server does not have a GUI; will MAKER still work on it? Also, our command has already been running for more than a month, trying to generate the models used to train SNAP and Augustus. Is that normal? I'm running on a 256G memory 64 Linux server.

Thank you. 

Sincerely,
Lin



--
Amelia Ireland
GMOD Community Support
http://gmod.org || @gmodproject


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Fwd: about running MAKER

Carson Holt-2
Run time depends on the size of your evidence dataset, the genome size, and the number of processors you use. If you have a large genome (Gb size) and you are running on a single CPU, then it could take a long time. This is especially true if you use the alt_est option for evidence, as these are aligned via tblastx, which is 3-4 times slower than protein alignments and 10-20 times slower than standard EST alignments. 95% of MAKER's runtime is BLAST alignment, so your evidence dataset is the major factor.

Also, you do not need results from the entire genome to train SNAP. Results from ~10Mb of the genome are usually sufficient. Also make sure you are taking advantage of parallelization: launch via MPI to get maximum performance. I commonly launch on 16- and 32-CPU Linux servers, which can annotate most fungal genomes in a few hours and larger genomes in a few days.
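
One possible way to act on the ~10Mb point is to run MAKER on a subset of the assembly first. The sketch below is only an illustration, not something prescribed in this thread: the function name subset_fasta is hypothetical, and it assumes that the first records of the FASTA file are reasonably representative of the genome. It copies whole records until the requested number of bases has been reached.

```shell
#!/bin/sh
# Sketch: print whole FASTA records from the top of a file until at least
# the target number of bases has been emitted.
# subset_fasta is a hypothetical helper name, not part of MAKER.
#   $1 = input FASTA file
#   $2 = target number of bases (e.g. 10000000 for ~10 Mb)
subset_fasta() {
    awk -v target="$2" '
        # At each header, stop if enough sequence has already been printed.
        /^>/ { if (total >= target) exit; print; next }
        # Sequence line: count its bases and pass it through.
        { total += length($0); print }
    ' "$1"
}

# Example (file names are placeholders):
# subset_fasta genome.fasta 10000000 > genome_10Mb.fasta
```

Because records are never split, the output can exceed the target by up to one sequence length.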

--Carson


Re: Fwd: about running MAKER

Yunxi Lin
Hi Carson

Thank you for your help.

The estimated size of my genome is 250M base pairs. I ran it on 16 CPUs, but we don't have MPI, so I cannot use it. I don't think I'm using the alt_est option. I was following the tutorial. I used TopHat and Cufflinks to generate the ESTs from the assembly sequence based on RNA-seq, and I used those ESTs to run MAKER.

I think I already have more than 10Mb of data. The information you mentioned is very helpful. I may go ahead and use it to try to train SNAP and Augustus.

Because this is my first time using MAKER and it has already been running for a month, I was wondering whether I used the command in the wrong way.

Sincerely,
Yunxi



Re: Fwd: about running MAKER

Carson Holt-3
You are most likely only getting 1 cpu of performance.

You should just install MPICH2.  It's easy just to let MAKER do it for you:
Go to the …/maker/src/ directory
Run './Build mpich2'
Once it finishes installing, it will be in the …/maker/exe/mpich2/bin/ directory.

Set up MAKER again to use MPICH2:
Go to the …/maker/src/ directory
Run 'perl Build.PL'
Say yes to the "use MPI" question
Run './Build install'

Now run MAKER via 'mpiexec'.
Example --> …/maker/exe/mpich2/bin/mpiexec -n 16 maker

The -n flag specifies how many CPUs to use. mpiexec handles process communication either on the same machine or across machines. You will get much better performance.
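
The steps above could be collected into a small script along these lines. This is a hedged sketch rather than an official installer: MAKER_HOME is a hypothetical variable standing in for the elided …/maker path, and NPROC and DRY_RUN are names introduced here for illustration. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch of the MPICH2 setup and launch steps from this thread.
# MAKER_HOME is a hypothetical stand-in for the elided ".../maker" path.
MAKER_HOME="${MAKER_HOME:-$HOME/maker}"
NPROC="${NPROC:-16}"      # number of CPUs passed to mpiexec -n
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands; 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Build the MAKER-local MPICH2, then reconfigure and reinstall MAKER.
# 'perl Build.PL' is interactive: answer yes to the MPI question.
# The body is a subshell, so the cd does not leak into the caller.
build_maker_with_mpi() (
    run cd "$MAKER_HOME/src" &&
    run ./Build mpich2 &&
    run perl Build.PL &&
    run ./Build install
)

# Launch MAKER through the bundled mpiexec from the current directory.
launch_maker() {
    run "$MAKER_HOME/exe/mpich2/bin/mpiexec" -n "$NPROC" maker
}

build_maker_with_mpi
launch_maker
```

Setting DRY_RUN=0 executes the commands instead of printing them; launch_maker should then be called from the directory that holds the MAKER control files.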

Thanks,
Carson



Re: Fwd: about running MAKER

Daniel Ence
In reply to this post by Yunxi Lin
Hi Yunxi, 

During the maker installation, there is an option to automatically install MPICH2, which would let you run maker parallelized. Try rerunning the perl Build.PL script in the "maker/src" directory, and when the option to install MPICH2 comes up, tell it yes. This will start an automated download and install onto your server.

You can also start more than one maker process. They will work on annotating the genome together. You can start as many as ten or more processes like this, but MPI is a better parallelizing option.  
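
As a rough sketch of the multi-process approach, the helper below starts several copies of a command in the background from the current directory and waits for all of them. The function name launch_many and the log file names are assumptions made for illustration; the thread only says that several maker processes started this way will share the work.

```shell
#!/bin/sh
# Sketch: start N copies of a command in the background and wait for them.
# launch_many and the maker_<i>.log names are hypothetical, for illustration.
#   $1 = number of processes, remaining arguments = command to run
launch_many() {
    n="$1"; shift
    i=1
    while [ "$i" -le "$n" ]; do
        # Each copy gets its own log so output does not interleave.
        "$@" > "maker_$i.log" 2>&1 &
        i=$((i + 1))
    done
    wait   # block until every background process has exited
}

# Example (run from the directory with the MAKER control files):
# launch_many 4 maker
```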

Hope that helps,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

Re: Fwd: about running MAKER

Carson Holt-3
In reply to this post by girlwithglasses
You can get blast to use more than 1 cpu via the cpus= option, but that is still significantly limiting MAKER's performance.
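
For illustration, the cpus= value could be changed with a small helper like the one below. The thread does not say which file holds the option; this sketch assumes it is a line starting with "cpus=" in the maker_opts.ctl control file, so check your own control file first. The function name set_cpus is hypothetical.

```shell
#!/bin/sh
# Sketch: rewrite the value on the cpus= line of a MAKER control file.
# Assumption: the option is a line beginning with "cpus=" (e.g. in
# maker_opts.ctl). set_cpus is a hypothetical helper name.
#   $1 = control file, $2 = number of CPUs
set_cpus() {
    ctl="$1"; ncpu="$2"
    new_ctl="$ctl.tmp.$$"
    # Replace only the value, keeping any trailing comment on the line.
    sed "s/^cpus=[^ #]*/cpus=$ncpu/" "$ctl" > "$new_ctl" && mv "$new_ctl" "$ctl"
}

# Example:
# set_cpus maker_opts.ctl 16
```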

When you let MAKER install MPICH2, it will be local to the MAKER installation (MAKER only).  It will be in …/maker/exe/mpich2.  This was purposely done for people who have limited access and install MAKER themselves, so they can run via MPI without having to get upgraded privileges.  So I don't know if you installed MAKER yourself, but if you did, then this is an option that will let you run.

--Carson


From: csusm <[hidden email]>
Date: Tuesday, 25 June, 2013 11:40 AM
To: Carson Holt <[hidden email]>
Subject: Re: [maker-devel] Fwd: about running MAKER

Hi Carson

Thank you for your suggestion. Do you mean that if I don't use MPI, I can only run it on one CPU? Because my school owns the server, I only have limited authorization.

Yunxi Lin
