Pregenerating Blast/RepeatMasker output

Pregenerating Blast/RepeatMasker output

Khan, Anar
Hello, again
 
I’d like to pregenerate BLAST and RepeatMasker results on a compute farm and feed them into MAKER to reduce processing time. I ran a small test on one contig, comparing the MAKER output obtained when protein BLAST results (NCBI BLASTX against SwissProt) were generated in situ (protein=) versus externally (protein_gff=). To generate the GFF for the latter, I ran blastx with the command-line options that were written to stderr during the “in situ” run, so that they stayed constant between runs:
 
-d uniprot_sprot.fasta -p blastx -b 100000 -v 100000 -e 1e-06 -z 300 -Y 500000000 -a 3 -U -F T -I T
 
I then parsed all HSPs into GFF format.
 
The number of “maker” predictions obtained using the two methods differed:
 
In situ blastx = 189
External blastx = 293
 
Should I expect the results to be the same? If one specifies externally generated BLAST (or RepeatMasker) results, is it possible to mimic the behaviour of a standard run?
 
Thanks!
Anar
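
As an illustration of the HSP-to-GFF step mentioned in this message, here is a minimal Python sketch. It is not the script used in the thread. It assumes the search was rerun with 12-column tabular output (-m 8 in legacy blastall), and the function name, the match_part feature type, and the ID/Name/Target attributes are illustrative choices; what MAKER actually expects in a protein_gff file (for example, parent match features grouping the parts) should be checked against the MAKER documentation.

```python
def tabular_hsp_to_gff3(line, hsp_id, source="blastx", feature_type="match_part"):
    """Convert one 12-column tabular BLAST line into one GFF3 line.

    Assumed column order: query id, subject id, % identity, alignment
    length, mismatches, gap openings, q.start, q.end, s.start, s.end,
    e-value, bit score. hsp_id is a hypothetical, caller-supplied ID.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated columns")
    qseqid, sseqid = fields[0], fields[1]
    qstart, qend = int(fields[6]), int(fields[7])
    sstart, send = int(fields[8]), int(fields[9])
    bitscore = fields[11].strip()

    # For a translated search, a reverse-strand HSP is reported with
    # q.start > q.end; GFF3 wants start <= end plus an explicit strand.
    strand = "+" if qstart <= qend else "-"
    start, end = min(qstart, qend), max(qstart, qend)

    attributes = "ID={};Name={};Target={} {} {}".format(
        hsp_id, sseqid, sseqid, sstart, send)
    return "\t".join([qseqid, source, feature_type, str(start), str(end),
                      bitscore, strand, ".", attributes])


if __name__ == "__main__":
    example = "contig1\tsp|P1|X\t55.0\t100\t45\t0\t900\t601\t1\t100\t1e-20\t95.5"
    print(tabular_hsp_to_gff3(example, "hsp1"))
```

The sketch only covers coordinate and strand handling for a single HSP; grouping HSPs from the same protein into one hit is left out.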
 
 
 
 

 


Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.


 


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Pregenerating Blast/RepeatMasker output

Carson Hinton Holt
MAKER does some extra filtering downstream of BLAST. It pre-masks the input file before running BLAST, then examines % identity and % coverage, and filters for overlapping low-complexity hits, so the results of a raw BLAST run and of MAKER can differ.

MAKER creates a query.masked.fasta file in the theVoid directory for each contig. Pre-computing BLAST on this file would bring the number somewhat closer to what MAKER produces internally. Also, if you have a compute farm, you can run mpi_maker, which would let you speed up everything internally rather than running BLAST as a separate process.

Thanks,
Carson
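
To make the identity and coverage filtering described above concrete, here is a rough Python sketch of that kind of post-filter applied to externally generated HSPs. It is not MAKER's code: the function names, the dictionary keys, the way coverage is computed (fraction of the protein covered by the merged HSPs), and the default thresholds of 0.4 and 0.5 are all assumptions for illustration. MAKER's own thresholds are set in its control files and should be checked there. Masking and the low-complexity overlap filter are not reproduced.

```python
from collections import defaultdict


def merged_length(intervals):
    """Number of positions covered by inclusive (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block before starting a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def filter_hits(hsps, subject_lengths, min_pid=0.4, min_pcov=0.5):
    """Keep HSPs whose protein hit passes identity and coverage cutoffs.

    hsps: dicts with keys sseqid, pident (0-100), length, sstart, send.
    subject_lengths: dict mapping sseqid to protein length.
    """
    by_subject = defaultdict(list)
    for hsp in hsps:
        by_subject[hsp["sseqid"]].append(hsp)

    kept = []
    for sseqid, group in by_subject.items():
        aligned = sum(h["length"] for h in group)
        # Length-weighted identity across all HSPs of this hit.
        pid = sum(h["pident"] / 100.0 * h["length"] for h in group) / aligned
        spans = [(min(h["sstart"], h["send"]), max(h["sstart"], h["send"]))
                 for h in group]
        pcov = merged_length(spans) / subject_lengths[sseqid]
        if pid >= min_pid and pcov >= min_pcov:
            kept.extend(group)
    return kept
```

Running a filter like this on the external HSPs before writing the GFF may narrow the gap between the two counts, but an exact match with the in situ run should not be expected.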



Re: Pregenerating Blast/RepeatMasker output

Khan, Anar
Hi Carson

Thanks for the information; that makes sense. The compute farm scheduler we currently use is Condor, and our sysadmin isn't thrilled by the idea of hooking it up to MPI. Perhaps some bribery of the chocolate form is in order.

Cheers
Anar

Re: Pregenerating Blast/RepeatMasker output

Carson Hinton Holt
MPI is the de facto standard for parallel computation on non-shared-memory systems and computer clusters, so Condor should be heavily optimized to support it, just like other scheduling systems. More important than scheduler support is the system architecture, for example whether there is a shared NFS file system. If you have questions about setting it up, just let me know.

--Carson
