problem with dsindex

kdelmore
I am having some trouble with the dsindex tool. I used fasta_tool to
split my original multifasta file and ran maker with the -base and -g
flags. I then used the dsindex tool to summarize the results from each fasta.
The tool finished without an error message and pointed me to where the
files should be, but when I went to that directory there was no datastore,
and the index.log said that each of the fastas had been started but not
finished. I got around this problem for gff3_merge by using the -o
option and providing paths to the gff files, but this does not work with
the fasta_merge tool. I don’t want to just cat the files together, because
I want to be sure the merged gff and protein fasta files are consistent with
each other for downstream annotation steps. I’ve included examples of the
commands I used below, along with the output from dsindex. Note that the
individual fastas finished without errors and produced datastores.

I would really appreciate any input you might have on this problem, and
THANK YOU for developing such a user-friendly pipeline.

/maker/bin/fasta_tool --split placed.fasta

mpiexec -n 4 /maker/bin/maker -base 1 -g 1.fasta -fix_nucleotides

maker/bin/maker -dsindex -fix_nucleotides
STATUS: Parsing control files...
STATUS: Processing and indexing input FASTA files...
STATUS: Setting up database for any GFF3 input...
A data structure will be created for you at:
/placed.maker.output/placed_datastore ##this directory was not generated
To access files for individual sequences use the datastore index:
/placed.maker.output/placed_master_datastore_index.log

/maker/bin/gff3_merge -o placed.gff *

/maker/bin/fasta_merge -o placed.all 1.maker.proteins.fasta 2.maker.proteins.fasta ##this did not work



_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: problem with dsindex

Carson Hinton Holt
I suspect that either not all of your contigs finished, or you did not
supply the -base option when running -dsindex. If the index says STARTED
rather than FINISHED for a contig, then the output files for that contig
are missing from the directory it is looking at.
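The STARTED/FINISHED check can be scripted. This is a minimal sketch, assuming each line of the master datastore index log is tab-separated as sequence ID, datastore directory, status, and that a later line for the same sequence supersedes an earlier one; check the layout of your own log before relying on it. The function name unfinished_contigs is hypothetical and not part of MAKER.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MAKER): list sequences whose most recent
# status in a master datastore index log is anything other than FINISHED.
# Assumes tab-separated lines: <seq_id> <datastore_dir> <status>
unfinished_contigs() {
  local index=$1
  awk -F'\t' '
    NF >= 3 { last[$1] = $3 }   # later lines overwrite earlier ones
    END {
      for (id in last)
        if (last[id] != "FINISHED") print id "\t" last[id]
    }' "$index" | sort
}
```

For example, unfinished_contigs placed.maker.output/placed_master_datastore_index.log prints one line per contig still recorded as STARTED (or another non-FINISHED status); under the assumptions above, an empty result means every contig in the index reached FINISHED.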

For example, this is how you should be running everything -->
/maker/bin/fasta_tool --split placed.fasta

mpiexec -n 4 /maker/bin/maker -base placed -g 1.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 2.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 3.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 4.fasta -fix_nucleotides
mpiexec -n 4 /maker/bin/maker -base placed -g 5.fasta -fix_nucleotides

Now all runs will write to placed.maker.output.

Then you need to do this-->
maker/bin/maker -dsindex -base placed -g placed.fasta


This will rebuild the index at
placed.maker.output/placed_master_datastore_index.log.

Thanks,
Carson
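Carson's run-then-index sequence, generalized to any number of split files as a bash sketch. MAKER_BIN, MPI_PROCS and annotate_split are hypothetical names; the maker flags are only the ones used in this thread. The split files are passed explicitly rather than globbed, because a pattern such as *.fasta would also match placed.fasta itself.

```shell
#!/usr/bin/env bash
# Sketch only: MAKER_BIN and MPI_PROCS are hypothetical knobs whose
# defaults mirror the commands in this thread.
MAKER_BIN=${MAKER_BIN:-/maker/bin/maker}
MPI_PROCS=${MPI_PROCS:-4}

# Usage: annotate_split <base> <full_genome.fasta> <split1.fasta> [split2.fasta ...]
annotate_split() {
  local base=$1 genome=$2
  shift 2                      # remaining arguments are the split FASTA files
  local f
  for f in "$@"; do
    # Same -base on every run, so every run writes into "$base.maker.output"
    mpiexec -n "$MPI_PROCS" "$MAKER_BIN" -base "$base" -g "$f" -fix_nucleotides || return 1
  done
  # Rebuild the master datastore index against the full, unsplit genome
  "$MAKER_BIN" -dsindex -base "$base" -g "$genome"
}
```

For the example above this would be called as annotate_split placed placed.fasta 1.fasta 2.fasta 3.fasta 4.fasta 5.fasta. The loop stops at the first failing run instead of indexing a partial set of results.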



On 4/22/14, 10:48 PM, "[hidden email]" <[hidden email]>
wrote:

>[...]


Re: problem with dsindex

Carson Holt-2
In reply to this post by kdelmore
Also, fasta_merge works differently from gff3_merge. It requires the
datastore index because it uses it to find the output directories and then
'type' and 'group' the fasta files in those directories.
Without the datastore index, it is the equivalent of 'cat file1.fa
file2.fa > file3.fa'. It also requires the '-i' flag when specifying
individual fasta files.

--Carson
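To illustrate why the index matters: it tells the merger which datastore directories belong to the run, so per-contig files can be grouped by type. The documented route is to give fasta_merge the index itself (fasta_merge -d placed.maker.output/placed_master_datastore_index.log). The function below is a hypothetical illustration of the idea, not fasta_merge's code. It assumes tab-separated index lines of sequence ID, directory relative to the index file, status; per-sequence files named <seq_id>.<type>.fasta inside each directory; and, as a choice of this sketch, it merges only FINISHED entries.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of index-driven merging; NOT fasta_merge itself.
# Assumes index lines "<seq_id>\t<dir relative to the index>\t<status>" and
# per-sequence files named "<seq_id>.<type>.fasta" inside each directory.
# Output files are appended to, so remove old outputs before re-running.
merge_fasta_by_type() {
  local index=$1 prefix=$2
  local root id dir f name type
  root=$(dirname "$index")
  # Keep only sequences that reached FINISHED; dedupe repeated entries
  awk -F'\t' 'NF >= 3 && $3 == "FINISHED" { print $1 "\t" $2 }' "$index" | sort -u |
  while IFS=$'\t' read -r id dir; do
    for f in "$root/$dir"/*.fasta; do
      [ -e "$f" ] || continue          # glob matched nothing in this directory
      name=$(basename "$f")
      type=${name#"$id".}              # e.g. maker.proteins.fasta
      cat "$f" >> "$prefix.$type"      # one merged output file per type
    done
  done
}
```

Called as merge_fasta_by_type placed.maker.output/placed_master_datastore_index.log placed.all, this would produce files such as placed.all.maker.proteins.fasta. Given only a list of loose fasta files, there is no directory or status information to work with, which is the 'cat' case Carson describes.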