[Gmod-ajax] Problems with generate-names.pl with multiple datasets

Hans Vasquez-Gross
Hi All,

    I got a multiple-dataset configuration working over the past few days.  However, after choosing a specific dataset that has a VCF file loaded, I tried to go to a specific contig using the URL &loc= parameter, and JBrowse loads the default first-listed contig instead.  I checked the refSeqs.json file for the dataset to make sure the sequence name was present (it was).

http://169.237.215.34/jbrowse/?data=data%2Fjson%2Fiwgsc-1AS

http://169.237.215.34/jbrowse/?data=data%2Fjson%2Fiwgsc-1AS&loc=IWGSC_CSS_1AS_scaff_108201

Excerpt from refSeqs.json:
{"length":8996,"name":"IWGSC_CSS_1AS_scaff_108201","seqChunkSize":20000,"end":8996,"start":0},
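
The same check can be scripted. A minimal sketch in Python, assuming refSeqs.json is a JSON array of objects shaped like the excerpt above; find_refseq is a hypothetical helper name, and the sample data is inlined rather than read from the dataset's seq/ folder:

```python
import json

def find_refseq(refseqs, name):
    """Return the refSeqs.json entry whose "name" equals `name`, or None."""
    for entry in refseqs:
        if entry.get("name") == name:
            return entry
    return None

# Inline sample mirroring the excerpt above. A real check would load the
# dataset's seq/refSeqs.json from disk instead (location assumed).
sample = json.loads(
    '[{"length":8996,"name":"IWGSC_CSS_1AS_scaff_108201",'
    '"seqChunkSize":20000,"end":8996,"start":0}]'
)

entry = find_refseq(sample, "IWGSC_CSS_1AS_scaff_108201")
print(entry["length"] if entry else "not found")  # prints 8996
```

An exact string comparison is used on purpose, so a difference in case or a stray space in the name would show up as "not found".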


My next thought was that I might need to run generate-names.pl, to see whether typing the contig name into the auto-complete box would work.  But when I run generate-names.pl from the top level of the JBrowse directory, I get the following error:

jbrowse$ bin/generate-names.pl --mem 8560000000 --verbose
No reference sequences defined in configuration, nothing to do.

Is this because I have a multiple dataset configuration?  My jbrowse.conf does define the datasets.

[datasets.tilling]
url  = ?data=data/json/tilling
name = CaptureDesign TILLING

[datasets.iwgsc-1AL]
url  = ?data=data/json/iwgsc-1AL
name = IWGSC-1AL

[datasets.iwgsc-1AS]
url  = ?data=data/json/iwgsc-1AS
name = IWGSC-1AS

[datasets.iwgsc-5BL]
url  = ?data=data/json/iwgsc-5BL
name = IWGSC-5BL
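
For scripting against these datasets, the data directories can be read out of the [datasets.*] sections. This is only a sketch: it assumes the file contains nothing but plain "key = value" lines like the ones above (JBrowse's configuration syntax has features that Python's configparser does not understand), and dataset_dirs is a hypothetical helper name:

```python
import configparser
from urllib.parse import parse_qs

# Two of the sections from the jbrowse.conf above, inlined for the example.
CONF_TEXT = """
[datasets.tilling]
url  = ?data=data/json/tilling
name = CaptureDesign TILLING

[datasets.iwgsc-1AS]
url  = ?data=data/json/iwgsc-1AS
name = IWGSC-1AS
"""

def dataset_dirs(conf_text):
    """Map each dataset id to the directory named by its url's data= parameter."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    dirs = {}
    for section in parser.sections():
        if not section.startswith("datasets."):
            continue
        # "?data=data/json/tilling" -> {"data": ["data/json/tilling"]}
        query = parser[section]["url"].lstrip("?")
        dirs[section[len("datasets."):]] = parse_qs(query)["data"][0]
    return dirs

print(dataset_dirs(CONF_TEXT))
```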

Any suggestions or input would be greatly appreciated.

Cheers,
-Hans

_______________________________________________
Gmod-ajax mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-ajax

Re: Problems with generate-names.pl with multiple datasets

Richard Hayes
Hi,

On Fri, Feb 21, 2014 at 4:17 PM, Hans Vasquez-Gross <[hidden email]> wrote:

> Is this because I have a multiple dataset configuration?  My jbrowse.conf does define the datasets.

No, generate-names.pl must be run for each dataset separately. Running generate-names.pl without an --out parameter assumes that a dataset root is the current working directory. Since there is no seq/ folder in the top level directory, you see this error.

But that is not the source of your problem. I am pretty sure that data should display even in the absence of a name index.

What did you use as the original input to prepare-refseqs.pl for the IWGSC-1AS dataset? Did you run this separately for each dataset?

Is there data for this specific contig in your VCF file?
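
Going by the explanation above, a quick way to see which directories would produce the "No reference sequences defined" message is to look for seq/refSeqs.json in each one. A sketch, assuming the layout described in this thread (every dataset directory holding its own seq/ folder); has_refseqs is a hypothetical helper, and the directory tree is created in a temporary folder purely for demonstration:

```python
import tempfile
from pathlib import Path

def has_refseqs(dataset_dir):
    """True if dataset_dir contains seq/refSeqs.json."""
    return (Path(dataset_dir) / "seq" / "refSeqs.json").is_file()

# Demonstration on a throwaway tree: the top level has no seq/ folder,
# while the dataset directory underneath it does.
with tempfile.TemporaryDirectory() as top:
    dataset = Path(top) / "data" / "json" / "iwgsc-1AS"
    (dataset / "seq").mkdir(parents=True)
    (dataset / "seq" / "refSeqs.json").write_text("[]")
    results = (has_refseqs(top), has_refseqs(dataset))

print(results)  # (False, True)
```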
 


Re: Problems with generate-names.pl with multiple datasets

Hans Vasquez-Gross
Hi Richard,

   Thank you for your response.  I'll try running generate-names.pl in the subdirectories after I resolve this other issue.

As for the datasets, I prepared an individual FASTA file for each reference and ran each one separately through prepare-refseqs.pl.  After each run, I moved the data/seq folder to data/json/{PROJECTDIR}/seq.

Yes, IWGSC_CSS_1AS_scaff_108201 has VCF data which is why I was interested in visualizing the contig.  Just in case, here is a list of other contigs that have VCF data associated:
"1AL Examples"
IWGSC_CSS_1AL_scaff_1027303
IWGSC_CSS_1AL_scaff_105625
IWGSC_CSS_1AL_scaff_1059868
IWGSC_CSS_1AL_scaff_1088280
IWGSC_CSS_1AL_scaff_1090285
IWGSC_CSS_1AL_scaff_1091068

"1AS Examples"
IWGSC_CSS_1AS_scaff_1007522
IWGSC_CSS_1AS_scaff_1052864
IWGSC_CSS_1AS_scaff_1053056
IWGSC_CSS_1AS_scaff_1079109
IWGSC_CSS_1AS_scaff_108201
IWGSC_CSS_1AS_scaff_1082555
IWGSC_CSS_1AS_scaff_1082732
IWGSC_CSS_1AS_scaff_1128350
IWGSC_CSS_1AS_scaff_1136897

No matter which I try, it always loads the default top contig for the given dataset.

Any input for troubleshooting this issue would be greatly appreciated.

Cheers,
-Hans



--
Hans Vasquez-Gross
Programmer
Dubcovsky and Neale Lab
Department of Plant Science
University of California at Davis
Phone: (530) 752-0609
Skype: hansvg.ucd


Re: Problems with generate-names.pl with multiple datasets

Hans Vasquez-Gross
I've finished running generate-names.pl for the iwgsc-1AL dataset.  Now I can put the contig name in the &loc= parameter of the URL query string and JBrowse displays the correct contig.  I think that with larger references, generate-names.pl may need to be run.

Maybe Robert could confirm this?

I've tried loading another contig using the URL query string for iwgsc-1AS (which hasn't been run through generate-names.pl), and it shows the same problem: the default top-listed contig in the dropdown is displayed instead.

I'm going to try generate-names.pl on the iwgsc-5BL dataset to see if it fixes the problem there too.  If so, I think that was the missing component.  I'll report back on whether it resolves the issue.
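
For testing several contigs this way, the URLs can be built programmatically so the data= path is percent-encoded consistently. A small sketch: jbrowse_url is a hypothetical helper, and the host name is a placeholder rather than a real server:

```python
from urllib.parse import urlencode

def jbrowse_url(base, data_dir, loc):
    """Build a JBrowse URL selecting a dataset (data=) and a location (loc=)."""
    return base + "?" + urlencode({"data": data_dir, "loc": loc})

# Placeholder host; substitute the address of your own JBrowse install.
url = jbrowse_url("http://example.org/jbrowse/",
                  "data/json/iwgsc-1AS",
                  "IWGSC_CSS_1AS_scaff_108201")
print(url)
```

urlencode escapes the slashes in the data directory as %2F, which is the same form as the URLs posted earlier in this thread.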

Cheers,
-Hans



Re: Problems with generate-names.pl with multiple datasets

Robert Buels-2
I think that probably all you need to do is run generate-names.pl
separately for each of your data directories.  generate-names.pl does
not really have any notion of a multi-dataset configuration, it just
runs on a single dataset directory which is set using its --out option
(with a default value of data/, i.e. a single dataset that is located in
data/).

So if you just run:

bin/generate-names.pl --out data/json/tilling
bin/generate-names.pl --out data/json/iwgsc-1AL
bin/generate-names.pl --out data/json/iwgsc-1AS
bin/generate-names.pl --out data/json/iwgsc-5BL

I bet everything will work.
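
With more datasets, the same four commands can be generated in a loop. A minimal sketch, using only the --out option shown above; generate_names_commands and run_all are hypothetical names, the dataset list is copied from the jbrowse.conf earlier in the thread, and the default is a dry run that only prints:

```python
import subprocess

DATASETS = ["tilling", "iwgsc-1AL", "iwgsc-1AS", "iwgsc-5BL"]

def generate_names_commands(datasets, root="data/json"):
    """Return one generate-names.pl argument list per dataset directory."""
    return [["bin/generate-names.pl", "--out", f"{root}/{name}"]
            for name in datasets]

def run_all(datasets, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in generate_names_commands(datasets):
        print(" ".join(cmd))
        if not dry_run:
            # Must be started from the top level of the JBrowse directory.
            subprocess.run(cmd, check=True)

run_all(DATASETS)  # dry run: only prints the four commands
```

Passing dry_run=False runs the commands one after another and stops at the first failure, because check=True raises on a non-zero exit status.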

The behavior of the ?loc parameter that you saw is probably due to a
slightly odd behavior of the 1.11.2 release that requires that a name
index be present in order to navigate to reference sequences by name.
That issue is going to be smoothed over in the 1.11.3 release.

Does this help?

Robert Buels
Lead Developer
JBrowse - http://jbrowse.org


Re: Problems with generate-names.pl with multiple datasets

Hans Vasquez-Gross
Thank you Robert.  Last night, I wrote a script to run generate-names.pl on the subfolder for each dataset.  It fixed the issue of not being able to use ?loc to query different contigs.  This explains why the feature stopped working after I upgraded to 1.11.2; I had never previously had to run generate-names.pl.

I have one more question, regarding efficiency.  I split a larger genome file into logically divided chromosomal sections, because the original prepare-refseqs.pl run produced a refSeqs.json file that was 650MB, which would be too large to transfer over the network.  As a result, the annotation and analysis for the full reference mean that the datasets have many tracks in common.  Currently, I have each iwgsc-### dataset configured to use the same 3 VCF tracks and the same GFF annotation track.  Every time I run generate-names.pl, the same 3 VCF tracks and the GFF track get indexed again, along with the new DNA track for that dataset.  Is there a better way to organize this?

Could I potentially have one names/ folder for all the common tracks (3x VCF + GFF3) and generate names separately only for the DNA track?  That could save space.  But the names directory for each full index is only about 50-100MB, so the savings would be negligible.

Thank you,
-Hans



Re: Problems with generate-names.pl with multiple datasets

Robert Buels-2
On 02/26/2014 02:06 PM, Hans Vasquez-Gross wrote:

> I have 1 more question regarding efficiency.  I split a larger genome
> file into logically divided sections of chromosomes because the original
> prepare-refseqs.pl run produced a refSeqs.json file that was 650MB and
> would be too large to transfer over the network.  Therefore, the
> annotation and analysis on this full reference has many overlapping
> tracks.  Currently, I have each iwgsc-### dataset configured to use the
> same 3 VCF tracks and the same GFF annotation track.  Every time I run
> generate-names.pl, the same 3 VCF tracks and the GFF track get indexed,
> and the new DNA track for each dataset gets indexed.  Is there a better
> way to organize this?

Yes, the configuration system supports "include"ing one configuration
file in another.  So if you have a common track configuration for all of
the datasets, you can just add something like:

include += ../common_tracks.conf

in each dataset's tracks.conf, or in a trackList.json it would be
"include": ["../common_tracks.json"].


> Could I potentially have a names/ folder for all the common tracks (3x
> VCF + GFF3) and just generate the names of DNA track?  This could
> potentially save on space.  But looking at the size of names directory
> for each full index is only about 50-100MB, so the savings are negligible.

Yes.  In each of your dataset trackList.json files, you should see a
section that looks like:

    "names" : {
       "url" : "names/",
       "type" : "Hash"
    }

which is put there by generate-names.pl.  If you take that section out
and move it into your common configuration file (mentioned above), and
change the URL to point to your common names directory, you can make all
of your sets share a names directory.

Or, you could just wait for JBrowse 1.11.3, which is going to make the
location lookups work again if the names index is not present, which it
sounds like is basically what you need anyway.
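
The shared-configuration approach described above could be sketched as a small transformation on one dataset's trackList.json structure. This is an illustration under assumptions, not a tested JBrowse recipe: share_names_index is a hypothetical helper, "../common_tracks/names/" is a made-up location for the shared index, and how relative paths are resolved should be checked against the JBrowse configuration documentation.

```python
import json

def share_names_index(track_list, common_path, shared_names_url):
    """Split one dataset's trackList.json structure into (dataset_conf, common_conf).

    dataset_conf loses its "names" section and gains an "include" entry;
    common_conf holds the "names" section with its url pointing at the
    shared index.
    """
    dataset_conf = {k: v for k, v in track_list.items() if k != "names"}
    current = dataset_conf.get("include", [])
    includes = [current] if isinstance(current, str) else list(current)
    if common_path not in includes:
        includes.append(common_path)
    dataset_conf["include"] = includes

    names = dict(track_list["names"])
    names["url"] = shared_names_url
    return dataset_conf, {"names": names}

# Stand-in for a dataset's trackList.json after generate-names.pl has run.
track_list = {"tracks": [], "names": {"url": "names/", "type": "Hash"}}
dataset_conf, common_conf = share_names_index(
    track_list, "../common_tracks.json", "../common_tracks/names/")
print(json.dumps(dataset_conf, indent=2))
print(json.dumps(common_conf, indent=2))
```

The input structure is left unmodified, so the original trackList.json contents can be kept as a backup until the shared setup has been confirmed to work.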

Rob



Re: Problems with generate-names.pl with multiple datasets

Hans Vasquez-Gross
Thank you Rob.  I'll wait until 1.11.3 to see if the more advanced configuration is necessary.

Cheers,
-Hans


Re: Problems with generate-names.pl with multiple datasets

Hans Vasquez-Gross
I went ahead and tried creating a tracks.conf for the common VCF and GFF3 annotation tracks.

However, when running the generate-names script, I get an error saying that I did not provide a reference sequence when passing in these exact track names.

dubcovsky-bio:jbrowse havasquezgross$ time bin/generate-names.pl --tracks snpsgenomeHC,snpsgenomeMC,snpsgenomeLC,mipsannot,tgacannot --verbose --out data/json/common_tracks/
No reference sequences defined in configuration, nothing to do.

Is it necessary to provide generate-names.pl with a reference sequence?  My model is that the VCF and GFF3 files apply to the full reference file.  But since my reference sequence is too large, I had to split it into separate JBrowse datasets by chromosomal section.

Cheers,
-Hans


