Display Broad fungal data in gbrowse


Johnny Quest
Hello,

I've been trying to display the genome for N. crassa OR74A (http://www.broadinstitute.org/annotation/genome/neurospora/MultiHome.html) in gbrowse 1.69 (genes, annotations, proteins, etc.). So far I have not found a good way to display the available data (.fasta, .agp and .gtf are the formats offered) in gbrowse, since they don't appear to provide a GFF file or anything similar. Does anyone have experience displaying data from the Broad Institute in gbrowse 1.xx? If so, I'd like to hear suggestions on how to get this to work. I use the MySQL back-end, and normally use bp_bulk_load_gff.pl, specifying my database, FASTA file(s) and the GFF file.


P.S. When performing a feature search on the Broad site, there is an option to export the result in GFF format. Unfortunately, this results in an error every time.



Best Regards,


R.S.W.


_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: Display Broad fungal data in gbrowse

Jason Stajich-2
Seems like your best bet is to ask the Broad Institute... However, some of us have converted the data for use in GBrowse.

AGP is just the assembly info; I don't know why you would want that.

You can convert GTF to GFF3 with this script, which works on the Broad and JGI GTF outputs (which are different). It is geared towards GBrowse2 and Bio::DB::SeqFeature usage, though.
http://github.com/hyphaltip/genome-scripts/blob/master/data_format/gtf2gff3_3level.pl
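
To illustrate the kind of rewriting a GTF-to-GFF3 conversion involves, here is a minimal Python sketch. It is not the linked Perl script and does not build the full gene/mRNA/exon hierarchy; it only shows how the GTF attribute column (`gene_id "..."; transcript_id "...";`) can be parsed and turned into a GFF3 `Parent=` attribute for an exon or CDS row. The function names are hypothetical, and using the raw transcript_id as the parent ID is an assumption made for illustration.

```python
import re

def parse_gtf_attributes(column9):
    """Parse a GTF attribute column like 'gene_id "X"; transcript_id "Y";' into a dict."""
    return dict(re.findall(r'(\S+)\s+"([^"]*)"', column9))

def gtf_exon_to_gff3(line):
    """Rewrite one tab-separated GTF exon/CDS row as a GFF3 row.

    Assumption: the parent feature's ID is simply the transcript_id.
    A real converter would also emit the gene and mRNA rows.
    """
    cols = line.rstrip("\n").split("\t")
    attrs = parse_gtf_attributes(cols[8])
    cols[8] = "Parent=" + attrs["transcript_id"]
    return "\t".join(cols)
```

A full converter additionally has to compute gene and mRNA spans from the exons, which is what the three-level script above is for.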

I have older GFF3 for Neurospora and other fungi here and am working on updating these.

http://fungalgenomes.org/data/GFF/
http://fungalgenomes.org/data/NT/

You may also try microbesonline.org

-jason

Re: Display Broad fungal data in gbrowse

Lincoln Stein
In reply to this post by Johnny Quest
Hi Johnny,

Jason Stajich at Berkeley has done a lot of work displaying the Broad fungal data on GBrowse (and I just now noticed that he has sent a message to you directly, so I will finish off now).

Lincoln

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>

Re: Display Broad fungal data in gbrowse

Johnny Quest
In reply to this post by Jason Stajich-2
Hi Jason,

Thanks for the links you provided, I will check them out. 

Now as far as the Broad data goes: what would be the logical step-by-step process for displaying the data in gbrowse? Broad provides plenty of FASTA files for gene and sequence data, but only the single transcript.gtf file (to be converted to gff3). Don't I need more GFF files, or one GFF encompassing the entire genome? I converted the GTF file to gff3 just for testing purposes, but I get the 'landmark not found' issue in gbrowse (using Bio::DB::SeqFeature::Store). Here's a snippet of the converted GTF file (my e-mail formatting might not show the tabs correctly):

##gff-version 3
##date-created Mon Jun  7 15:30:53 2010
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 gene 1167 2603 . - . ID=gene000000;Name=NCU10129
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 mRNA 1167 2603 . - . ID=mRNA000000;Parent=gene000000;Name=NCU10129T0
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 three_prime_utr 1167 1435 . - . ID=utr3000000;Parent=mRNA000000
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 exon 1167 1449 . - . ID=exon000001;Parent=mRNA000000
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 CDS 1436 1449 . - 2 ID=cds000001;Parent=mRNA000000

Here's a chunk from the corresponding .conf:

[GENERAL]
description = _Ncrassa - (Broad)
db_adaptor  = Bio::DB::SeqFeature::Store
db_args     = -adaptor DBI::mysql
              -dsn     dbi:mysql:database=NcrGTF;host=localhost
user        =nobody
passwd      =


In my first e-mail I mentioned that it's possible to export feature search results in GFF format from Broad. I was able to complete such an export and validated the gff3 (http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online); however, gbrowse still complains about landmarks not found. Here's a chunk from one of these:

##gff-version 3
CONTIG_200 NC10_CALLGENES_FINAL_2 gene 232569 235124 . + . ID=7000004871121765;Name=conserved hypothetical protein;annotations=CA-6086;featureClass=Gene;genomic=CCCATCTTCAAGACGACAATTGCTGACACGACGCGACAAACACAGAATCGCCAATCTCAGCTTTGATGCCGCGTGGCCGTCGCCATAAACACAATGATCCCGTCGGTATAGGGTTGTGCTCGCGTCCCAAACAAACGTACCTCGGCAAGACCTGAGCTTCAACTTCAAGGGGCTAGATGCCGATGGCCGAACTGGAAAAGTTGACTCGCAGTATTTCGCAAGTAGAAATATCTCCGAGTATCCGGGTTGAACGGGTTTCCACTTCCCAACTCCCAATCCCATCAATCACTGCGGGGCAACGGCTCAACGGCTTCCCACTCCGACCGCTAATACCCTGGGTGTTCGCATCGTCCAATATAGGCCGCACAACCTCCACTGACGGTTACCGTTCCCGATGAACTCTCGCCCTGTGAAGGCGATGGCTCCAATATTCTGACCATTAGGTATCTGTGCCCACTCTGTAAGGGGCCGCCGAAGTCAATGAGCGTTCGCAGACAAGCCTCGAAGGTCTTGGTTTGGGCTAAGGCCGGGGAAAAATCGGGGCCGCCTCAAGCTTTCACGACACGGAAGGGGAACAACCTGGAAAGAGGAGCTATCAGGGACTTAAGGCTCCACAGCATGCAGACGCGATTCTTCGGACCGACATTTTAATTGGTCTAGCATGTTGACCCCCTGATCATTGACAACCACATCCGTTGTTTGTTCCGGTGATATCGTGCGGTCGGGCATTATGTACCAGTTGCATGGTCTTAGACCCAAAACAGGAAGGCATCCCAGAGAATCCTTGTCTCTTGCGCACCTTCCACAGTGGACTGAACACACTGACTGTATCACACCAAGTTGTGCGTGCCAAATGATGGCCTCTTTTGCGGTGTTAAGTACAATGCTTCCCCCGAGTCCCCTCCTTAAATTCCACCCTGCCATGCTTCGGGCACAACGACCTTGACACAGACATCTGTCTACCTCTGATTACTTTCCCTACCCGGCGTCGCGAGCATGGCGTCCAGTCCCAAGAGCCCCAAGAGCGACCAAGCGTCCAACCCACCAGCACCCGTCGCGCCTGGTCACGGCCCAACACCGCTCACGGCCGAGGAACAAAACGCAGCTGGTCTTCTTCCAGCTTCGCATTGGGCAGCTCAACCTGTAAGTCACAGATGTAATGCATCTTCCATGACAAGCATTAACACTCTCGTTCGTAGCTCGAAGAAGACGATACAGTAGATGATGGCGCCTCTTCACTCGGCTCCTTCATTTCGAGCTCCGCTTCCTTAAGTTCGACTATCTTTCAGTACCGCACTATCCATGGAAGGACTTACCACGGTGATGTTGGCAATGCCGAGTCATATGAGCCCAATGACCAACGTCACGTCGAGGCCATGGAAATCTTGTAAGTAACAACTGTACCCATGGCTTTGCAGGATTGGAGTTGACCCTAGGCTGACGTTATACATGCATCAAAAAGCCACCACGCCATGTTGGTTCAGCTGGATGGAAAGCTCTACCTTTCACCACTTGATAAGAAGAGGATTCATAAAGTGCTGGACGTTGGGACAGGCAGTGGCCTGTGGGCCATGTAAGTTTTCCTGCACATTCTGTTCATTACACCCGAGCCGCCTGAAGCGACCATGCGATCGGTCCCCGTCCCCGCAACCGTGGATTACGAGCTTGGCATCCGTCACAGATCACAGAAGTCATCATGTCGAGCCACAACAACCGAGGGGACTTGCACCTTTGTCAAACTTGTCTTTACGCATTCGCTGTGACCGGCCACACAGTGCGGTGTCACCATCTTGATTACGACCGTACTAACCGACGCTTGCTATTTGTTGTAGTGATTTT
GCCGACGAATACCCCAACACGGAAGTCATTGGCACCGACGTTTCCCCCATCCAGCCTTCGTGGGTTCCTCCCAATGTGAAGTTCGAAATCGACGATTGCAATCTAGACTGGACATATGCCGAGAATAGCTTCGATTTCATCCACATGCGCATGTTGGCAGGCGTTGTTAACGACTGGGATAAGCTGTTCCGTAACGCGTTCCGGTGTTGTAAGCCGGGCGGTTATGTAGAAAGCATTGGCAGTAGTATCCATTTCTTGAGTGATGATGGATCGGTTAAGGAAGGTACTGCTATGCATCAATGGGGCAAGGTTTTGGGCGAGGCTGGCAAGAAATTGGGAAGACCGTTCAATGTGTATGAGGACGATTTGCAACGCAAGGGTATGGAAGCGGCCGGATTTGTTGACATTGAGTTCAAGGACATTCAATGTCCCCTGGGGGTCTGGCATCCTGAGAAGAAAGCGGCAGAAAGGGGGCTGTGGTATAAGTTGGCAATCGAGGAAGATCTTGAGGGTAAGTTTCTTTCATCGAGACCAGATGTTGAGTTTATACTAACGAGTGGGGCAGGGTACCTCAACTACCTTCTCAATGTCGTCATGGGCTGGACTCCAGAGGAGACCAAGAGGTTTGCTGCCCACGCCAAGAAGGAGTGGAACAATCCCAAGATTCACGGCTATTTCTGGCTGCGTGTGATGTACGGTCGCAAACCAGAATAAAGT;locus=NCU04904.4;ontologyTerm=GO:0016036;organismName=Neurospora crassa OR74A (finished)
CONTIG_200 NC10_CALLGENES_FINAL_2 transcript 232569 235124 . + . ID=7000004871121773;Name=conserved hypothetical protein;Parent=7000004871121765;featureClass=Transcript;genomic=CCCATCTTCAAGACGACAATTGCTGACACGACGCGACAAACACAGAATCGCCAATCTCAGCTTTGATGCCGCGTGGCCGTCGCCATAAACACAATGATCCCGTCGGTATAGGGTTGTGCTCGCGTCCCAAACAAACGTACCTCGGCAAGACCTGAGCTTCAACTTCAAGGGGCTAGATGCCGATGGCCGAACTGGAAAAGTTGACTCGCAGTATTTCGCAAGTAGAAATATCTCCGAGTATCCGGGTTGAACGGGTTTCCACTTCCCAACTCCCAATCCCATCAATCACTGCGGGGCAACGGCTCAACGGCTTCCCACTCCGACCGCTAATACCCTGGGTGTTCGCATCGTCCAATATAGGCCGCACAACCTCCACTGACGGTTACCGTTCCCGATGAACTCTCGCCCTGTGAAGGCGATGGCTCCAATATTCTGACCATTAGGTATCTGTGCCCACTCTGTAAGGGGCCGCCGAAGTCAATGAGCGTTCGCAGACAAGCCTCGAAGGTCTTGGTTTGGGCTAAGGCCGGGGAAAAATCGGGGCCGCCTCAAGCTTTCACGACACGGAAGGGGAACAACCTGGAAAGAGGAGCTATCAGGGACTTAAGGCTCCACAGCATGCAGACGCGATTCTTCGGACCGACATTTTAATTGGTCTAGCATGTTGACCCCCTGATCATTGACAACCACATCCGTTGTTTGTTCCGGTGATATCGTGCGGTCGGGCATTATGTACCAGTTGCATGGTCTTAGACCCAAAACAGGAAGGCATCCCAGAGAATCCTTGTCTCTTGCGCACCTTCCACAGTGGACTGAACACACTGACTGTATCACACCAAGTTGTGCGTGCCAAATGATGGCCTCTTTTGCGGTGTTAAGTACAATGCTTCCCCCGAGTCCCCTCCTTAAATTCCACCCTGCCATGCTTCGGGCACAACGACCTTGACACAGACATCTGTCTACCTCTGATTACTTTCCCTACCCGGCGTCGCGAGCATGGCGTCCAGTCCCAAGAGCCCCAAGAGCGACCAAGCGTCCAACCCACCAGCACCCGTCGCGCCTGGTCACGGCCCAACACCGCTCACGGCCGAGGAACAAAACGCAGCTGGTCTTCTTCCAGCTTCGCATTGGGCAGCTCAACCTGTAAGTCACAGATGTAATGCATCTTCCATGACAAGCATTAACACTCTCGTTCGTAGCTCGAAGAAGACGATACAGTAGATGATGGCGCCTCTTCACTCGGCTCCTTCATTTCGAGCTCCGCTTCCTTAAGTTCGACTATCTTTCAGTACCGCACTATCCATGGAAGGACTTACCACGGTGATGTTGGCAATGCCGAGTCATATGAGCCCAATGACCAACGTCACGTCGAGGCCATGGAAATCTTGTAAGTAACAACTGTACCCATGGCTTTGCAGGATTGGAGTTGACCCTAGGCTGACGTTATACATGCATCAAAAAGCCACCACGCCATGTTGGTTCAGCTGGATGGAAAGCTCTACCTTTCACCACTTGATAAGAAGAGGATTCATAAAGTGCTGGACGTTGGGACAGGCAGTGGCCTGTGGGCCATGTAAGTTTTCCTGCACATTCTGTTCATTACACCCGAGCCGCCTGAAGCGACCATGCGATCGGTCCCCGTCCCCGCAACCGTGGATTACGAGCTTGGCATCCGTCACAGATCACAGAAGTCATCATGTCGAGCCACAACAACCGAGGGGACTTGCACCTTTGTCAAACTTGTCTTTACGCATTCGCTGTGACCGGCCACACAGTGCGGTGTCACCATCTTGATTACGACCGTACTAACCGACGCTTGCTAT
TTGTTGTAGTGATTTTGCCGACGAATACCCCAACACGGAAGTCATTGGCACCGACGTTTCCCCCATCCAGCCTTCGTGGGTTCCTCCCAATGTGAAGTTCGAAATCGACGATTGCAATCTAGACTGGACATATGCCGAGAATAGCTTCGATTTCATCCACATGCGCATGTTGGCAGGCGTTGTTAACGACTGGGATAAGCTGTTCCGTAACGCGTTCCGGTGTTGTAAGCCGGGCGGTTATGTAGAAAGCATTGGCAGTAGTATCCATTTCTTGAGTGATGATGGATCGGTTAAGGAAGGTACTGCTATGCATCAATGGGGCAAGGTTTTGGGCGAGGCTGGCAAGAAATTGGGAAGACCGTTCAATGTGTATGAGGACGATTTGCAACGCAAGGGTATGGAAGCGGCCGGATTTGTTGACATTGAGTTCAAGGACATTCAATGTCCCCTGGGGGTCTGGCATCCTGAGAAGAAAGCGGCAGAAAGGGGGCTGTGGTATAAGTTGGCAATCGAGGAAGATCTTGAGGGTAAGTTTCTTTCATCGAGACCAGATGTTGAGTTTATACTAACGAGTGGGGCAGGGTACCTCAACTACCTTCTCAATGTCGTCATGGGCTGGACTCCAGAGGAGACCAAGAGGTTTGCTGCCCACGCCAAGAAGGAGTGGAACAATCCCAAGATTCACGGCTATTTCTGGCTGCGTGTGATGTACGGTCGCAAACCAGAATAAAGT;organismName=Neurospora crassa OR74A (finished)
CONTIG_200 NC10_CALLGENES_FINAL_2 exon 234973 235124 . + . ID=7000004871121778;Parent=7000004871121773;featureClass=Exon;genomic=GGTACCTCAACTACCTTCTCAATGTCGTCATGGGCTGGACTCCAGAGGAGACCAAGAGGTTTGCTGCCCACGCCAAGAAGGAGTGGAACAATCCCAAGATTCACGGCTATTTCTGGCTGCGTGTGATGTACGGTCGCAAACCAGAATAAAGT;organismName=Neurospora crassa OR74A (finished)
CONTIG_200 NC10_CALLGENES_FINAL_2 exon 234401 234918 . + . ID=7000004871121777;Parent=7000004871121773;featureClass=Exon;genomic=TGATTTTGCCGACGAATACCCCAACACGGAAGTCATTGGCACCGACGTTTCCCCCATCCAGCCTTCGTGGGTTCCTCCCAATGTGAAGTTCGAAATCGACGATTGCAATCTAGACTGGACATATGCCGAGAATAGCTTCGATTTCATCCACATGCGCATGTTGGCAGGCGTTGTTAACGACTGGGATAAGCTGTTCCGTAACGCGTTCCGGTGTTGTAAGCCGGGCGGTTATGTAGAAAGCATTGGCAGTAGTATCCATTTCTTGAGTGATGATGGATCGGTTAAGGAAGGTACTGCTATGCATCAATGGGGCAAGGTTTTGGGCGAGGCTGGCAAGAAATTGGGAAGACCGTTCAATGTGTATGAGGACGATTTGCAACGCAAGGGTATGGAAGCGGCCGGATTTGTTGACATTGAGTTCAAGGACATTCAATGTCCCCTGGGGGTCTGGCATCCTGAGAAGAAAGCGGCAGAAAGGGGGCTGTGGTATAAGTTGGCAATCGAGGAAGATCTTGAGG;organismName=Neurospora crassa OR74A (finished)


Please excuse the e-mail formatting. As you can see, the file included sequence data as well. I've attached a screenshot.
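
One possible cause of a 'landmark not found' error is that the names in column 1 of the GFF3 do not match the sequence IDs in the FASTA file. The following Python sketch lists seqids that appear in a GFF3 but not in a FASTA. The function names are hypothetical, and it assumes the common convention that a FASTA ID is the first whitespace-delimited token after the `>`. It is only a partial diagnostic, since it does not check what has actually been loaded into the database.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids

def gff3_seqids(gff_lines):
    """Collect column-1 seqids, skipping blank lines, comments and directives."""
    ids = set()
    for line in gff_lines:
        if line.strip() and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids

def missing_landmarks(gff_lines, fasta_lines):
    """Return the sorted seqids used in the GFF3 that have no FASTA record."""
    return sorted(gff3_seqids(gff_lines) - fasta_ids(fasta_lines))
```

For example, `missing_landmarks(open("features.gff3"), open("genome.fasta"))` would list the mismatched names, where both file names are placeholders.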

Do you guys think I would be better off trying to display this data in gbrowse 2, or can it be done in 1.xx?


Best Regards,


R.S.W.


Attachment: Ncrassa-GFF.png (1M)

Re: Display Broad fungal data in gbrowse

Jason Stajich-2
You also need to generate the scaffolds file.

FYI, I rename the contigs in the transcript GTF/GFF3 to just be
supercontig10.1 ...
instead of the way they name them in the GFF file, which I think is wrong (they don't use just the first part of the name after the > in the FASTA file, like most people do):
"supercontig10.1 of Neurospora crassa OR74A"

So I fix the GTF with a simple Perl one-liner (perl -i -p -e 's/%20of%20Neurospora%20crassa%20%28OR74A%29//' transcripts.gtf) to remove that extra bit, and then run the GTF-to-GFF3 script I mentioned before.
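
For readers who prefer a more general form of the same cleanup, here is a hedged Python sketch that does not hard-code the organism name. It decodes the percent-encoding in column 1 and keeps only the first whitespace-delimited token, mirroring the usual FASTA ID convention. The function names are hypothetical and this is not part of the scripts mentioned above.

```python
from urllib.parse import unquote

def clean_seqid(seqid):
    """Decode %20-style escapes and keep only the first token of the name."""
    return unquote(seqid).split()[0]

def clean_gtf_line(line):
    """Rewrite column 1 of a tab-separated GTF/GFF row; pass other lines through."""
    if line.startswith("#") or not line.strip():
        return line
    cols = line.split("\t")
    cols[0] = clean_seqid(cols[0])
    return "\t".join(cols)
```

Applied to the seqid from the sample earlier in the thread, `clean_seqid` reduces `supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29` to `supercont10.1`.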

I also generate the scaffold GFF3 file with this script, which takes the FASTA genome as input:
http://github.com/hyphaltip/genome-scripts/blob/master/gbrowse_tools/fasta_to_gbrowse_scaffold.pl
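
As a rough idea of what generating a scaffold file involves, the Python sketch below reads FASTA lines and emits one GFF3 row per sequence, spanning position 1 to the sequence length. It is not the linked Perl script. The function name, the `assembly` source value and the `scaffold` feature type are assumptions chosen for illustration, and whatever type you use must match what your GBrowse configuration expects.

```python
def scaffold_gff3(fasta_lines, source="assembly", feature_type="scaffold"):
    """Build GFF3 rows (one per FASTA record) covering each full sequence."""
    lengths = {}
    order = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            current = line[1:].split()[0]
            order.append(current)
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    rows = ["##gff-version 3"]
    for seqid in order:
        attrs = "ID={0};Name={0}".format(seqid)
        rows.append("\t".join([seqid, source, feature_type, "1",
                               str(lengths[seqid]), ".", ".", ".", attrs]))
    return rows
```

Writing `"\n".join(scaffold_gff3(open("genome.fasta")))` to a file gives something that can be loaded alongside the transcript GFF3; the file name is a placeholder.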

I think if you are starting out, you might as well go to GFF3 and Bio::DB::SeqFeature, since that is really the future; although loading is slower, performance is faster at run time.

If you are still stuck I can post my transformed GFF3 files, but you should be able to build this from the original data.

The only features the Broad site makes available for download in a single file are the transcripts, so there isn't information about the repeats, protein domains, etc. I just map and run these analyses myself to have the most up-to-date results for our work. JGI data is in the same boat, though their GTF is slightly different. I am still standardizing the approaches, but that will be happening in the near future.

-jason

Re: Display Broad fungal data in gbrowse

Johnny Quest
Hello Jason,

Thanks a lot for this information. It was exactly what I'd been looking for! I actually noticed the naming issue in the first column myself, and I went through and added the scaffold ranges for each supercontig manually, so it's good to have a script for that now! I loaded the appropriate FASTA sequences as well, and everything appears to be working.


Thanks again,


R.S.W.


On Tue, Jun 8, 2010 at 2:09 PM, Jason Stajich <[hidden email]> wrote:
You also need to generate the scaffolds file -

FYI I rename the contigs in the transcript gtf/gff3 to just be
supercontig10.1 ...
instead of what I think is wrongly done the way they name them in the GFF file (not using the 1st part of the name after the > in the FASTA file like most people do)
"supercontig10.1 of Neurospora crassa OR74A"

So I fix the gtf with a simple perl one liner (perl -i -p -e 's/%20of%20Neurospora%20crassa%20%28OR74A%29//' transcripts.gtf)  to remove that extra bit and run the gtf to gff3 script I mentioned before.

I also generate the scaffold gff3 file with this script, which takes the FASTA genome as input:
http://github.com/hyphaltip/genome-scripts/blob/master/gbrowse_tools/fasta_to_gbrowse_scaffold.pl
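
As a rough idea of what a scaffold file involves, here is a minimal Python sketch that writes one full-length GFF3 feature per FASTA record. It is not the linked script, and its exact output has not been checked against it; the source value "hypothetical_source" and the feature type "scaffold" are placeholders, and the type in particular has to match whatever the gbrowse configuration treats as a reference landmark.

```python
from typing import Iterable, Iterator, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence ID, length) for each record in FASTA text lines."""
    seqid, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seqid is not None:
                yield seqid, length
            # The ID is the first word after '>'; the rest is description.
            header = line[1:].split()
            seqid = header[0] if header else ""
            length = 0
        elif seqid is not None:
            length += len(line)
    if seqid is not None:
        yield seqid, length


def scaffold_gff3(lines: Iterable[str],
                  source: str = "hypothetical_source",
                  ftype: str = "scaffold") -> List[str]:
    """Build GFF3 lines with one feature spanning each whole sequence."""
    out = ["##gff-version 3"]
    for seqid, length in fasta_lengths(lines):
        attrs = f"ID={seqid};Name={seqid}"
        out.append("\t".join(
            [seqid, source, ftype, "1", str(length), ".", ".", ".", attrs]))
    return out


if __name__ == "__main__":
    fasta = [">supercont10.1 of Neurospora crassa OR74A", "ACGT", "AC"]
    print("\n".join(scaffold_gff3(fasta)))
```

Each line gives gbrowse a named region with a start and end, which is what the gene features in the converted transcript file are placed on.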

If you are starting out, you might as well go to GFF3 and Bio::DB::SeqFeature, since that is really the future; loading is slower, but performance is faster on the running side.

If you are still stuck I can post my transformed GFF3 files but you should be able to build this from the original data.

The only features the Broad site makes available for download in one single file are the transcripts - so there isn't information about the repeats, protein domains, etc - but I just map and run these analyses myself to have the most up to date for our work. For JGI data it is the same boat, though their GTF is slightly different. I am still standardizing the approaches but that will be happening in the near future.

-jason
Johnny Quest wrote, On 6/8/10 8:51 AM:
Hi Jason,

Thanks for the links you provided, I will check them out.

Now as far as the Broad data -- what would be the logical step-wise process
for displaying the data in gbrowse? Broad provides plenty of FASTA files for
gene and sequence data, but only the single transcript.gtf file (to be
converted to gff3). Don't I need more GFF files, or one GFF encompassing the
entire genome? -- I converted the GTF file to gff3 just for testing
purposes, but I get the 'landmark not found' issue in gbrowse (Using
Bio::DB::SeqFeature::Store). Here's a tidbit of the converted GTF file (My
e-mail formatting might not show the tabs correctly):

##gff-version 3
##date-created Mon Jun  7 15:30:53 2010
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 gene 1167 2603 . - . ID=gene000000;Name=NCU10129
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 mRNA 1167 2603 . - . ID=mRNA000000;Parent=gene000000;Name=NCU10129T0
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 three_prime_utr 1167 1435 . - . ID=utr3000000;Parent=mRNA000000
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 exon 1167 1449 . - . ID=exon000001;Parent=mRNA000000
supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29 NC10_CALLGENES_FINAL_2 CDS 1436 1449 . - 2 ID=cds000001;Parent=mRNA000000
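
One thing worth checking with a 'landmark not found' message is whether every name in column 1 of the GFF3 corresponds to a sequence that has actually been loaded. The Python sketch below is only a diagnostic under that assumption; the function name missing_landmarks is made up, and the list of known IDs would have to come from the FASTA headers or scaffold features of your own database.

```python
from typing import Iterable, List


def missing_landmarks(gff_lines: Iterable[str],
                      reference_ids: Iterable[str]) -> List[str]:
    """Return column-1 names in the GFF3 lines that are not known reference IDs."""
    known = set(reference_ids)
    missing = set()
    for line in gff_lines:
        # Skip blank lines, directives (##...) and comments (#...).
        if not line.strip() or line.startswith("#"):
            continue
        seqid = line.split("\t", 1)[0]
        if seqid not in known:
            missing.add(seqid)
    return sorted(missing)


if __name__ == "__main__":
    gff = [
        "##gff-version 3",
        "supercont10.1%20of%20Neurospora%20crassa%20%28OR74A%29"
        "\tNC10_CALLGENES_FINAL_2\tgene\t1167\t2603\t.\t-\t.\tID=gene000000",
    ]
    # Prints the long escaped name, since only the short ID is known.
    print(missing_landmarks(gff, ["supercont10.1"]))
```

An empty result does not guarantee that gbrowse will find its landmarks, but a non-empty one points at names that need to be fixed first.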

Here's a chunk from the corresponding .conf:

[GENERAL]
description = _Ncrassa - (Broad)
db_adaptor  = Bio::DB::SeqFeature::Store
db_args     = -adaptor DBI::mysql
              -dsn     dbi:mysql:database=NcrGTF;host=localhost
user        =nobody
passwd      =


In my first e-mail I mentioned that it's possible to export feature search
results in GFF format from Broad -- I was able to complete such an export,
and validated the gff3 (
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online), however, gbrowse
still complains about landmarks not found. Here's a chunk from one of these:

##gff-version 3
CONTIG_200 NC10_CALLGENES_FINAL_2 gene 232569 235124 . + .
ID=7000004871121765;Name=conserved
hypothetical
protein;annotations=CA-6086;featureClass=Gene;genomic=CCCATCTTCAAGACGACAATTGCTGACACGACGCGACAAACACAGAATCGCCAATCTCAGCTTTGATGCCGCGTGGCCGTCGCCATAAACACAATGATCCCGTCGGTATAGGGTTGTGCTCGCGTCCCAAACAAACGTACCTCGGCAAGACCTGAGCTTCAACTTCAAGGGGCTAGATGCCGATGGCCGAACTGGAAAAGTTGACTCGCAGTATTTCGCAAGTAGAAATATCTCCGAGTATCCGGGTTGAACGGGTTTCCACTTCCCAACTCCCAATCCCATCAATCACTGCGGGGCAACGGCTCAACGGCTTCCCACTCCGACCGCTAATACCCTGGGTGTTCGCATCGTCCAATATAGGCCGCACAACCTCCACTGACGGTTACCGTTCCCGATGAACTCTCGCCCTGTGAAGGCGATGGCTCCAATATTCTGACCATTAGGTATCTGTGCCCACTCTGTAAGGGGCCGCCGAAGTCAATGAGCGTTCGCAGACAAGCCTCGAAGGTCTTGGTTTGGGCTAAGGCCGGGGAAAAATCGGGGCCGCCTCAAGCTTTCACGACACGGAAGGGGAACAACCTGGAAAGAGGAGCTATCAGGGACTTAAGGCTCCACAGCATGCAGACGCGATTCTTCGGACCGACATTTTAATTGGTCTAGCATGTTGACCCCCTGATCATTGACAACCACATCCGTTGTTTGTTCCGGTGATATCGTGCGGTCGGGCATTATGTACCAGTTGCATGGTCTTAGACCCAAAACAGGAAGGCATCCCAGAGAATCCTTGTCTCTTGCGCACCTTCCACAGTGGACTGAACACACTGACTGTATCACACCAAGTTGTGCGTGCCAAATGATGGCCTCTTTTGCGGTGTTAAGTACAATGCTTCCCCCGAGTCCCCTCCTTAAATTCCACCCTGCCATGCTTCGGGCACA
ACGACCTTGACACAGACATCTGTCTACCTCTGATTACTTTCCCTACCCGGCGTCGCGAGCATGGCGTCCAGTCCCAAGAGCCCCAAGAGCGACCAAGCGTCCAACCCACCAGCACCCGTCGCGCCTGGTCACGGCCCAACACCGCTCACGGCCGAGGAACAAAACGCAGCTGGTCTTCTTCCAGCTTCGCATTGGGCAGCTCAACCTGTAAGTCACAGATGTAATGCATCTTCCATGACAAGCATTAACACTCTCGTTCGTAGCTCGAAGAAGACGATACAGTAGATGATGGCGCCTCTTCACTCGGCTCCTTCATTTCGAGCTCCGCTTCCTTAAGTTCGACTATCTTTCAGTACCGCACTATCCATGGAAGGACTTACCACGGTGATGTTGGCAATGCCGAGTCATATGAGCCCAATGACCAACGTCACGTCGAGGCCATGGAAATCTTGTAAGTAACAACTGTACCCATGGCTTTGCAGGATTGGAGTTGACCCTAGGCTGACGTTATACATGCATCAAAAAGCCACCACGCCATGTTGGTTCAGCTGGATGGAAAGCTCTACCTTTCACCACTTGATAAGAAGAGGATTCATAAAGTGCTGGACGTTGGGACAGGCAGTGGCCTGTGGGCCATGTAAGTTTTCCTGCACATTCTGTTCATTACACCCGAGCCGCCTGAAGCGACCATGCGATCGGTCCCCGTCCCCGCAACCGTGGATTACGAGCTTGGCATCCGTCACAGATCACAGAAGTCATCATGTCGAGCCACAACAACCGAGGGGACTTGCACCTTTGTCAAACTTGTCTTTACGCATTCGCTGTGACCGGCCACACAGTGCGGTGTCACCATCTTGATTACGACCGTACTAACCGACGCTTGCTATTTGTTGTAGTGATTTTGCCGACGAATACCCCAACACGGAAGTCATTGGCACCGACGTTTCCCCCATCCAGCCTTCGTGGGTTCCTCCCAATGTGAAGTTCGAA
ATCGACGATTGCAATCTAGACTGGACATATGCCGAGAATAGCTTCGATTTCATCCACATGCGCATGTTGGCAGGCGTTGTTAACGACTGGGATAAGCTGTTCCGTAACGCGTTCCGGTGTTGTAAGCCGGGCGGTTATGTAGAAAGCATTGGCAGTAGTATCCATTTCTTGAGTGATGATGGATCGGTTAAGGAAGGTACTGCTATGCATCAATGGGGCAAGGTTTTGGGCGAGGCTGGCAAGAAATTGGGAAGACCGTTCAATGTGTATGAGGACGATTTGCAACGCAAGGGTATGGAAGCGGCCGGATTTGTTGACATTGAGTTCAAGGACATTCAATGTCCCCTGGGGGTCTGGCATCCTGAGAAGAAAGCGGCAGAAAGGGGGCTGTGGTATAAGTTGGCAATCGAGGAAGATCTTGAGGGTAAGTTTCTTTCATCGAGACCAGATGTTGAGTTTATACTAACGAGTGGGGCAGGGTACCTCAACTACCTTCTCAATGTCGTCATGGGCTGGACTCCAGAGGAGACCAAGAGGTTTGCTGCCCACGCCAAGAAGGAGTGGAACAATCCCAAGATTCACGGCTATTTCTGGCTGCGTGTGATGTACGGTCGCAAACCAGAATAAAGT;locus=NCU04904.4;ontologyTerm=GO:0016036;organismName=Neurospora
crassa OR74A (finished)
CONTIG_200 NC10_CALLGENES_FINAL_2 transcript 232569 235124 . + .
ID=7000004871121773;Name=conserved
hypothetical
protein;Parent=7000004871121765;featureClass=Transcript;genomic=CCCATCTTCAAGACGACAATTGCTGACACGACGCGACAAACACAGAATCGCCAATCTCAGCTTTGATGCCGCGTGGCCGTCGCCATAAACACAATGATCCCGTCGGTATAGGGTTGTGCTCGCGTCCCAAACAAACGTACCTCGGCAAGACCTGAGCTTCAACTTCAAGGGGCTAGATGCCGATGGCCGAACTGGAAAAGTTGACTCGCAGTATTTCGCAAGTAGAAATATCTCCGAGTATCCGGGTTGAACGGGTTTCCACTTCCCAACTCCCAATCCCATCAATCACTGCGGGGCAACGGCTCAACGGCTTCCCACTCCGACCGCTAATACCCTGGGTGTTCGCATCGTCCAATATAGGCCGCACAACCTCCACTGACGGTTACCGTTCCCGATGAACTCTCGCCCTGTGAAGGCGATGGCTCCAATATTCTGACCATTAGGTATCTGTGCCCACTCTGTAAGGGGCCGCCGAAGTCAATGAGCGTTCGCAGACAAGCCTCGAAGGTCTTGGTTTGGGCTAAGGCCGGGGAAAAATCGGGGCCGCCTCAAGCTTTCACGACACGGAAGGGGAACAACCTGGAAAGAGGAGCTATCAGGGACTTAAGGCTCCACAGCATGCAGACGCGATTCTTCGGACCGACATTTTAATTGGTCTAGCATGTTGACCCCCTGATCATTGACAACCACATCCGTTGTTTGTTCCGGTGATATCGTGCGGTCGGGCATTATGTACCAGTTGCATGGTCTTAGACCCAAAACAGGAAGGCATCCCAGAGAATCCTTGTCTCTTGCGCACCTTCCACAGTGGACTGAACACACTGACTGTATCACACCAAGTTGTGCGTGCCAAATGATGGCCTCTTTTGCGGTGTTAAGTACAATGCTTCCCCCGAGTCCCCTCCTTAAATTCCACCCTGCCATGC
TTCGGGCACAACGACCTTGACACAGACATCTGTCTACCTCTGATTACTTTCCCTACCCGGCGTCGCGAGCATGGCGTCCAGTCCCAAGAGCCCCAAGAGCGACCAAGCGTCCAACCCACCAGCACCCGTCGCGCCTGGTCACGGCCCAACACCGCTCACGGCCGAGGAACAAAACGCAGCTGGTCTTCTTCCAGCTTCGCATTGGGCAGCTCAACCTGTAAGTCACAGATGTAATGCATCTTCCATGACAAGCATTAACACTCTCGTTCGTAGCTCGAAGAAGACGATACAGTAGATGATGGCGCCTCTTCACTCGGCTCCTTCATTTCGAGCTCCGCTTCCTTAAGTTCGACTATCTTTCAGTACCGCACTATCCATGGAAGGACTTACCACGGTGATGTTGGCAATGCCGAGTCATATGAGCCCAATGACCAACGTCACGTCGAGGCCATGGAAATCTTGTAAGTAACAACTGTACCCATGGCTTTGCAGGATTGGAGTTGACCCTAGGCTGACGTTATACATGCATCAAAAAGCCACCACGCCATGTTGGTTCAGCTGGATGGAAAGCTCTACCTTTCACCACTTGATAAGAAGAGGATTCATAAAGTGCTGGACGTTGGGACAGGCAGTGGCCTGTGGGCCATGTAAGTTTTCCTGCACATTCTGTTCATTACACCCGAGCCGCCTGAAGCGACCATGCGATCGGTCCCCGTCCCCGCAACCGTGGATTACGAGCTTGGCATCCGTCACAGATCACAGAAGTCATCATGTCGAGCCACAACAACCGAGGGGACTTGCACCTTTGTCAAACTTGTCTTTACGCATTCGCTGTGACCGGCCACACAGTGCGGTGTCACCATCTTGATTACGACCGTACTAACCGACGCTTGCTATTTGTTGTAGTGATTTTGCCGACGAATACCCCAACACGGAAGTCATTGGCACCGACGTTTCCCCCATCCAGCCTTCGTGGGTTCCTCCCAATGT
GAAGTTCGAAATCGACGATTGCAATCTAGACTGGACATATGCCGAGAATAGCTTCGATTTCATCCACATGCGCATGTTGGCAGGCGTTGTTAACGACTGGGATAAGCTGTTCCGTAACGCGTTCCGGTGTTGTAAGCCGGGCGGTTATGTAGAAAGCATTGGCAGTAGTATCCATTTCTTGAGTGATGATGGATCGGTTAAGGAAGGTACTGCTATGCATCAATGGGGCAAGGTTTTGGGCGAGGCTGGCAAGAAATTGGGAAGACCGTTCAATGTGTATGAGGACGATTTGCAACGCAAGGGTATGGAAGCGGCCGGATTTGTTGACATTGAGTTCAAGGACATTCAATGTCCCCTGGGGGTCTGGCATCCTGAGAAGAAAGCGGCAGAAAGGGGGCTGTGGTATAAGTTGGCAATCGAGGAAGATCTTGAGGGTAAGTTTCTTTCATCGAGACCAGATGTTGAGTTTATACTAACGAGTGGGGCAGGGTACCTCAACTACCTTCTCAATGTCGTCATGGGCTGGACTCCAGAGGAGACCAAGAGGTTTGCTGCCCACGCCAAGAAGGAGTGGAACAATCCCAAGATTCACGGCTATTTCTGGCTGCGTGTGATGTACGGTCGCAAACCAGAATAAAGT;organismName=Neurospora
crassa OR74A (finished)
CONTIG_200 NC10_CALLGENES_FINAL_2 exon 234973 235124 . + .
ID=7000004871121778;Parent=7000004871121773;featureClass=Exon;genomic=GGTACCTCAACTACCTTCTCAATGTCGTCATGGGCTGGACTCCAGAGGAGACCAAGAGGTTTGCTGCCCACGCCAAGAAGGAGTGGAACAATCCCAAGATTCACGGCTATTTCTGGCTGCGTGTGATGTACGGTCGCAAACCAGAATAAAGT;organismName=Neurospora
crassa OR74A (finished)
CONTIG_200 NC10_CALLGENES_FINAL_2 exon 234401 234918 . + .
ID=7000004871121777;Parent=7000004871121773;featureClass=Exon;genomic=TGATTTTGCCGACGAATACCCCAACACGGAAGTCATTGGCACCGACGTTTCCCCCATCCAGCCTTCGTGGGTTCCTCCCAATGTGAAGTTCGAAATCGACGATTGCAATCTAGACTGGACATATGCCGAGAATAGCTTCGATTTCATCCACATGCGCATGTTGGCAGGCGTTGTTAACGACTGGGATAAGCTGTTCCGTAACGCGTTCCGGTGTTGTAAGCCGGGCGGTTATGTAGAAAGCATTGGCAGTAGTATCCATTTCTTGAGTGATGATGGATCGGTTAAGGAAGGTACTGCTATGCATCAATGGGGCAAGGTTTTGGGCGAGGCTGGCAAGAAATTGGGAAGACCGTTCAATGTGTATGAGGACGATTTGCAACGCAAGGGTATGGAAGCGGCCGGATTTGTTGACATTGAGTTCAAGGACATTCAATGTCCCCTGGGGGTCTGGCATCCTGAGAAGAAAGCGGCAGAAAGGGGGCTGTGGTATAAGTTGGCAATCGAGGAAGATCTTGAGG;organismName=Neurospora
crassa OR74A (finished)


Please excuse the e-mail formatting, as you can see - the file included
sequence data as well. I've attached a screenshot.

Do you guys think I would be better off trying to display this data in
gbrowse 2, or can it be done in 1.xx?


Best Regards,


R.S.W.


