Issues with data visualisation on Apollo

Shamika Mohanan
Hello,

I am trying to load GFF/GTF/BED/BigBED/BAM/UCSC SQL files using the command line scripts available in Apollo-2.6.1/bin. 

Most files load properly and are available on Apollo, as seen in attachment_1. A few files do load but do not display the data properly, as seen in attachment_2.

I have listed the files that show this problem here: https://docs.google.com/spreadsheets/d/1l_O-GYGqyU6Sk9hBSWFwp-ehjr0RAQndpylnYOrzmzA/edit?usp=sharing

Should I set some option when running the scripts?

Regards,
Shamika

--
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].

Attachments: attachment_2.png (46K), attachment_1.png (110K)
Re: Issues with data visualisation on Apollo

nathandunn

I’d have to see the individual data and commands you were using for each.


That being said, it LOOKS like the filter you are running is for genes, where typically the top-level should be mRNA (for your working attachment), but that is just a guess. 

If you provide a snippet, command, and output I can provide some more direct feedback. 
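
One way to check which feature types a GFF file actually contains before picking a top-level type is to tally column 3. The following is only a sketch: the function name `count_feature_types` is hypothetical, and it assumes a tab-delimited GFF3 file with nine columns per feature line.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally the values in GFF3 column 3 (feature type).

    Comment lines, blank lines, and lines without nine tab-separated
    fields (for example an embedded FASTA section) are skipped.
    """
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a well-formed feature line
        counts[fields[2]] += 1
    return counts
```

Calling it on an open file handle, e.g. `count_feature_types(open("GRCh38_latest_genomic.gff"))`, and printing `.most_common()` shows whether the file is dominated by `gene`, `mRNA`, or other types.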

Nathan




Re: Issues with data visualisation on Apollo

Shamika Mohanan
Hello,

These are three example commands that I have run. None of them shows any error, but the visualization on Apollo is not ideal.

1. GFF
~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --gff GRCh38_latest_genomic.gff --trackLabel RefSeq_GRCh38 --out dest_loc

2. BigBED to BED
./bigBedToBed mane.0.9.bb mane.0.9.bed
~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --bed mane.0.9.bed --trackLabel UCSC_MANE --out dest_loc

3. SQL
~/web-apollo-test-server/Apollo-2.6.1/bin/ucsc-to-json.pl --in source_loc --track gtexGene --out dest_loc

Please let me know if you require any other information. The source for each file is available here:
https://docs.google.com/spreadsheets/d/1l_O-GYGqyU6Sk9hBSWFwp-ehjr0RAQndpylnYOrzmzA/edit?usp=sharing

Regards,
Shamika


Re: Issues with data visualisation on Apollo

nathandunn

Shamika,

I don’t know the exact answer to your question, but usually you have to supply a --type argument to select the top-level feature type you want to display, so you could potentially load multiple tracks, one per type.

e.g., for just coding genes:

~/web-apollo-test-server/Apollo-2.6.1/bin/flatfile-to-json.pl --gff GRCh38_latest_genomic.gff --type mRNA --trackLabel RefSeq_GRCh38_mRNA --out dest_loc

For multiple types, from the script itself you get:

=item --type <feature types to process>

Only process features of the given type.  Can take either single type
names, e.g. "mRNA", or type names qualified by "source" name, for
whatever definition of "source" your data file might have.  For
example, "mRNA:exonerate" will filter for only mRNA features that have
a source of "exonerate".

Multiple type names can be specified by separating the type names with
commas, e.g. C<--type mRNA:exonerate,ncRNA>.

Might be easier to play with a small scaffold at first until you get what you want. 
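
To try options on a small scaffold first, the GFF can be cut down to a single sequence before loading. This is a minimal sketch, not part of Apollo or JBrowse: the name `subset_gff_by_seqid` is hypothetical, and it assumes a tab-delimited GFF3 file where column 1 is the scaffold name.

```python
def subset_gff_by_seqid(gff_lines, seqid):
    """Yield comment/directive lines plus features located on one scaffold.

    Reading stops at a ##FASTA directive so that any embedded sequence
    section is not copied into the subset.
    """
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        if line.startswith("#"):
            yield line  # keep header directives as they are
            continue
        # Column 1 of a feature line is the scaffold (seqid).
        if line.split("\t", 1)[0] == seqid:
            yield line
```

Writing the yielded lines to a new file with `writelines` gives a small GFF that can be passed to flatfile-to-json.pl --gff with different --type values until the track looks right.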

Someone on the gmod-ajax panel will likely know more than me about the UCSC piece. 

Nathan



Re: Issues with data visualisation on Apollo

Colin
Hi there,

This is an interesting thread. Basically, it appears you want to set up an hg38-type instance with JBrowse data. It would be nice if this were already available so that it didn't require a lot of legwork, but I'm certainly happy to help here. (We do have one at http://hg38.jbrowse.org/, but I think it could be better organized.)

Here is a short summary of some options based on the files you listed

>Capture_seq gencode_pasa_Captureseq_2.pasa_assemblies.gtf /nfs/production/panda/ensembl/havana/warehouse_th_group/jmg/long_read_pipeline/Capture_seq_2/gencode_pasa_Captureseq_2.pasa_assemblies.gtf.gz GTF

Convert the gtf to gff, and load with flatfile-to-json (or use gff3tabix, but I'd suggest flatfile-to-json probably)

Here is one option for how to convert gtf to gff https://jbrowse.org/docs/faq.html#how-do-i-convert-gtf-to-gff

Load with flatfile-to-json --gff


Load with flatfile-to-json --gff

Probably convert gtf to gff, and then load with flatfile-to-json

Note that another option is to convert to bigBed: first run gtfToGenePred, then follow the steps in https://gist.github.com/gireeshkbogu/f478ad8495dca56545746cd391615b93

If you want searchable gene names, though, I suggest using flatfile-to-json

This one is tricky because it uses NCBI refnames so you'll have to convert them to chr1, chr2, etc. from NC_000001.11 etc.
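
A sketch of that renaming step follows. The function names are hypothetical, and the mapping is an assumption covering only the human primary chromosomes (NC_000001 to NC_000022 as chr1 to chr22, NC_000023 as chrX, NC_000024 as chrY, NC_012920 as chrM). Other accessions such as unplaced scaffolds are left unchanged, as are `##sequence-region` directives; the NCBI assembly report would be the authoritative source for a complete mapping.

```python
import re

# Assumed accession layout for human chromosomes: NC_0000NN.version
_NC_PATTERN = re.compile(r"^NC_0000(\d{2})\.\d+$")


def refseq_to_ucsc(name):
    """Map a RefSeq chromosome accession to a UCSC-style name, if known."""
    if name.startswith("NC_012920."):
        return "chrM"  # mitochondrial genome
    match = _NC_PATTERN.match(name)
    if not match:
        return name  # unknown accession: leave as is
    number = int(match.group(1))
    if 1 <= number <= 22:
        return f"chr{number}"
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    return name


def rename_gff_line(line):
    """Rewrite column 1 of a GFF feature line; pass other lines through."""
    if line.startswith("#") or "\t" not in line:
        return line
    seqid, rest = line.split("\t", 1)
    return refseq_to_ucsc(seqid) + "\t" + rest
```

Applying `rename_gff_line` to every line of the RefSeq GFF and writing the result to a new file gives refnames that match a chr1, chr2, ... reference.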

>SLRseq_2 SLRseq.GRCh38.gtf /nfs/production/panda/ensembl/havana/warehouse_th_group/jmg/long_read_pipeline/SLRseq_2/SLRseq.GRCh38.gtf.gz GTF

Probably convert to gff3, load with flatfile-to-json

Manually download the BAM and BAI files into your data folder and add the track to tracks.conf with a text editor; we don't have a great add-track workflow for BAM files

[tracks.SLRseq_merged_bam]
urlTemplate=SLRseq_merged.bam

That is all that is needed for your config


For the MANE bigBed (mane.0.9.bb), you can do two things

1) Load this as-is in bigBed format. Manually add it to your tracks.conf file with a text editor; bigBed is natively supported

[tracks.MANE]
key=MANE 0.9 BigBed
urlTemplate=mane.0.9.bb

2) Convert to GFF and load with flatfile-to-json

Note that the trix index files (ix and ixx), which would allow searching for genes in the bigBed files, cannot currently be used by JBrowse, so if you want searchable gene names, convert to GFF and use flatfile-to-json

>UCSC_ncbiRefSeqOther ncbiRefSeqOther.bed https://hgdownload.soe.ucsc.edu/gbdb/hg38/ncbiRefSeq/ncbiRefSeqOther.bb BigBed -> BED

Download this file and manually add it to your tracks.conf file with a text editor; bigBed is natively supported

[tracks.ncbiRefSeqOther]
key=NCBI RefSeq (other) BigBed
urlTemplate=ncbiRefSeqOther.bb


>UNIPROT UP000005640_9606_proteome.bed ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/genome_annotation_tracks/UP000005640_9606_beds/UP000005640_9606_proteome.bed BED

Convert to bigBed or GFF. The extra columns are not loaded if using flatfile-to-json.pl --bed
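
To see whether a BED file carries extra columns at all, the widest row can be measured first. This is only a sketch: the name `max_bed_columns` is hypothetical, it assumes a tab-delimited file, and it assumes "extra" means anything beyond the 12 standard BED fields.

```python
def max_bed_columns(bed_lines):
    """Return the largest number of tab-separated columns on any data line.

    Blank lines, comments, and "track"/"browser" header lines are ignored.
    """
    widest = 0
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        widest = max(widest, len(line.split("\t")))
    return widest
```

A result above 12 suggests the file has additional fields that would be dropped by flatfile-to-json.pl --bed, so converting to bigBed or GFF is the safer route for that file.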


Use ucsc-to-json.pl with these "database" files

>unipAliSwissprot unipAliSwissprot.bed

Convert to bigBed or GFF, extra columns will not be loaded



-Colin


