GBrowse Custom Track Upload Timeout

GBrowse Custom Track Upload Timeout

Cruncher
Hi,

I'm developing an add-on for GBrowse that calculates expression ratios comparing experimental and control datasets and then uploads the results to GBrowse as a custom track. These files have 440,000 lines in them. The add-on can also BLAST data against a particular reference database and upload the results in the same way; the BLAST result files can have as many as 2,000,000 lines. I have a solution right now that nearly works, but I'm running into problems.

Here's the current procedure:
  1. store the client cookie (session and authority)
  2. perform the calculations
  3. format the results
  4. upload the results to GBrowse by POSTing a URL with the upload_file action and the client's cookie


GBrowse starts the upload and counts through the uploaded file line by line, but eventually fails with the following error in the Apache error log:

"Timeout waiting for output from CGI script /usr/lib/cgi-bin/gb2/gbrowse"

I've also tried the upload using id and eurl, but I get similar results.


Here is the code that does the upload: http://pastebin.com/zY0RhAWh

$temp is a string containing the Dumped version of the cookie and $content is a string containing the calculation results.
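
In outline, the POST looks something like this (simplified from the pastebin script; the cookie string and the form field names are approximate):

    use LWP::UserAgent;
    use HTTP::Request::Common qw(POST);

    my $ua = LWP::UserAgent->new(timeout => 3600);
    $ua->default_header(Cookie => $cookie_string);   # session + authority cookie from step 1

    my $response = $ua->request(POST
        'http://localhost/cgi-bin/gb2/gbrowse',
        Content_Type => 'form-data',
        Content      => [
            action => 'upload_file',
            file   => [ undef, 'results.gff3', Content => $content ],
        ],
    );
    die $response->status_line unless $response->is_success;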


Do you know of an easier way to programmatically upload files or a way of getting around that timeout?


Thanks,

Brad


Re: GBrowse Custom Track Upload Timeout

Timothy Parnell
Hi Brad,

Sounds interesting. I noticed from the error message that you are using ordinary CGI. Have you tried FastCGI? There are specific timeout parameters for mod_fcgid and mod_fastcgi in the GBrowse-specific conf file for Apache. There is also a global_timeout variable in the main GBrowse.conf. You may want to play with those and see if that helps.
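
For example, something along these lines (directive names are for recent mod_fcgid; older versions use IPCCommTimeout and BusyTimeout, so check the names for your setup):

    # Apache gbrowse conf -- mod_fcgid
    FcgidIOTimeout   600    # seconds to wait for I/O from the FCGI app
    FcgidBusyTimeout 600    # seconds a request may run before being killed

    # GBrowse.conf, [GENERAL] section
    global_timeout = 600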

I think you may be running into issues where Apache is conservatively shutting down processes that are taking too long. Generally that is a good thing, except when you don't want it.

Hope that helps.
Tim


Re: GBrowse Custom Track Upload Timeout

Cruncher
Hey Tim,

Thanks for the quick reply. Good catch on the FastCGI; I had forgotten about it when setting up the browser instance.

I tried playing with GBrowse's global_timeout a little, but I still saw the same error after roughly the same amount of time. This leads me to believe that it is indeed an Apache timeout I'm wrestling with. I've been told that it isn't a good idea to get rid of Apache's timeout for security reasons, which leaves me looking for another solution.

The software producing the output files is on the same machine as the GBrowse instance. Back at the start of this project I looked at manually adding the tracks to the custom tracks SQLite database, but I was confused about how the custom tracks were being stored and couldn't find any documentation about it, so I decided to feed the files to GBrowse over HTTP instead. Do you have any idea how GBrowse stores data in that database? Is there something like bp_seqfeature_load that I could use to load into the custom tracks database?

While I'm waiting for an answer I'll dig into GBrowse's source and see if I can find anything.

Thanks again,
Brad



Re: GBrowse Custom Track Upload Timeout

Timothy Parnell
Hi Brad,

Yes: files with seqfeatures (BED, GFF, etc.) are loaded into a Bio::DB::SeqFeature::Store database, usually with a SQLite backend. Other files (WIG, etc.) are handled by the appropriate adaptor. A track configuration is written for the new data and stored in the user's upload directory, under the data source and unique user ID.
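
If you do decide to write to the store directly, the basic pattern is the same one bp_seqfeature_load uses (an untested sketch; the SQLite path below is hypothetical, and you would have to dig the real one out of the user's upload directory):

    use Bio::DB::SeqFeature::Store;
    use Bio::DB::SeqFeature::Store::GFF3Loader;

    # hypothetical path -- locate the real index under the user's upload dir
    my $store = Bio::DB::SeqFeature::Store->new(
        -adaptor => 'DBI::SQLite',
        -dsn     => '/var/lib/gbrowse2/userdata/.../index.SQLite',
        -write   => 1,
    );
    my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(
        -store   => $store,
        -verbose => 1,
    );
    $loader->load('/path/to/results.gff3');   # same loading engine bp_seqfeature_load drives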

You can take a look at Bio::Graphics::Browser2::DataLoader to see how it works. Be prepared to parse the code in your head; it's poorly documented, but reasonably well thought out.

It sounds like you may want to do your heavy processing offline (not through the CGI script) and just upload the results to GBrowse through normal channels.

You could look at the GBrowse REST API, which controls GBrowse through URL commands. I don't know whether there is a call to upload files; there probably is, I just don't know it. It would be best to use that mechanism, or whatever mechanism the plugin uses to add data tracks.


Re: GBrowse Custom Track Upload Timeout

Cruncher
What I'm actually doing is pretty similar to what you describe: I have a daemon running in the background that waits for requests, processes them, and then uploads the data. I'm currently using the REST interface to perform the upload, but I'm hitting that Apache timeout because processing the uploaded files takes too long. I'll take a look at Bio::Graphics::Browser2::DataLoader; thanks for pointing me in the right direction.

Thanks again,
Brad



Re: GBrowse Custom Track Upload Timeout

Cruncher
Hey Tim,

I've made some progress on this. I figured out that I can get the globals object for the current user, use it to create a render object, and use that to create a UserTracks object that can perform my upload. That is all working well for smaller files, but I'm still running into problems with larger files.

Here's my code for doing the upload: http://pastebin.com/4wHGbQJc

I get the parameters from the user and then fork. The parent returns a reassuring message, and the child disassociates itself and redirects its STDOUT and STDIN (to fully separate itself from CGI and avoid Apache timeouts). Then the child calls the upload_file subroutine on the UserTracks object created earlier. Where I'm running into trouble is that with larger files the upload will run for a while and then fail with the error 'Cancelled by user'. It always seems to run for approximately the same amount of time before failing.
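
The fork/detach step looks roughly like this (trimmed down; $usertracks and @upload_args stand in for the real objects and arguments):

    use POSIX qw(setsid);

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        print "Upload started in the background.\n";   # parent reassures the caller
    } else {
        setsid() or die "setsid failed: $!";           # drop the controlling session
        open STDIN,  '<', '/dev/null' or die $!;
        open STDOUT, '>', '/dev/null' or die $!;       # detach from the CGI stream
        $usertracks->upload_file(@upload_args);        # then do the slow work
        exit 0;
    }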

I thought maybe this was being caused by Apache timeouts, so I tried to catch SIGTERMs within my uploading script, but it never catches anything. Instead the SIGTERM is caught inside UserTracks.pm on line 535.
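
For reference, the trap I set (which never fires in my script) is just:

    use Carp qw(cluck);
    $SIG{TERM} = sub {
        cluck "child $$ caught SIGTERM";   # stack trace goes to the Apache error log
        exit 1;
    };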

Does anyone have any thoughts about what might be causing this error?


Thanks,
Brad



Re: GBrowse Custom Track Upload Timeout

Scott Cain
Hi Brad,

Is it this line:

                croak "This server does not support $type uploads"
                    if $type =~ /bigwig|bigbed|useq|archive/ && !$self->has_bigwig;

I'm guessing you do have bigwig support installed, so the question is: why does GBrowse not think so?

Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: GBrowse Custom Track Upload Timeout

Timothy Parnell
Hi Brad,

It's probably failing because, as Scott pointed out, it thinks it needs the bigWig adaptor: the file $type matches what appears to be a bigFile type (bigWig or bigBed). What kind of file are you uploading?

The guess_upload_type() method tries to determine the type of the uploaded file based on a handful of criteria, including the file extension and the first few "magic" bytes for binary files. If the magic code matches a BAM, bigWig, or bigBed file, it sets the appropriate file type. The useq and archive types are ZIP files, and tar archives are recognized by extension (because there are too many different magic bytes for tar!). The useq, zip, and tar types presume the files are (or soon will be) bigWig or bigBed files, hence the requirement for Bio::DB::BigFile.
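
Roughly, the magic-byte check boils down to something like this (an illustrative sketch, not the actual GBrowse code; the constants are from the UCSC and ZIP specs, so double-check them before relying on this):

    my $type = 'unknown';
    open my $fh, '<:raw', $file or die "open: $!";
    read($fh, my $buf, 4) == 4 or die "file too short";

    my $word = unpack 'V', $buf;    # little-endian uint32 (big-endian files flip this)
    if    ($word == 0x888FFC26)              { $type = 'bigwig' }
    elsif ($word == 0x8789F2EB)              { $type = 'bigbed' }
    elsif (substr($buf, 0, 2) eq "\x1f\x8b") { $type = 'bam' }   # BGZF shares the gzip magic
    elsif ($buf eq "PK\x03\x04")             { $type = 'zip' }   # useq/archive containers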

On a side note, I've rewritten this section to handle useq files directly using a native Bio::DB::USeq adaptor. It's in my GitHub fork, and I keep meaning to submit it as a pull request.

I hope that makes sense.
Tim



Re: GBrowse Custom Track Upload Timeout

Cruncher
Hi guys,

At one point I angered the CPAN gods by trying to install the wrong version of a module, and CPAN set up a whole new version of Perl in a different location. I've gotten that more or less straightened out, but that's probably why GBrowse doesn't think I have bigwig installed: it may be looking in the wrong location. I'll get that sorted out, but I don't think it's the root of my problem. The files I'm uploading are GFF3 and featurefile, and I see the "cancelled by user" error in both cases.
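
A quick way I've been sanity-checking the module path (run as the user, and with the perl binary, that Apache actually uses):

    perl -MBio::DB::BigFile -e 'print "BigFile loads OK\n"'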

I tested the type being returned by guess_upload_type() and it is right; the subroutine is correctly guessing the file type. I added a print statement to DataLoader.pm to see how many lines were being processed, and it looks like the loader gets through roughly 5,000 lines of the file being uploaded before it fails with that "cancelled by user" error.

This led me to wonder if Apache was still somehow killing my uploader, so I tried tripling the Apache timeout (to 15 minutes); nothing changed.

I also wondered if I had a formatting problem in the file being uploaded, but I checked the formatting on the line where the uploader failed and everything looks normal.

Is there some kind of upload timeout within GBrowse that I'm missing?


Thanks for all the help,
Brad


On Tue, Mar 18, 2014 at 6:34 PM, Timothy Parnell <[hidden email]> wrote:
Hi Brad,

It’s probably failing because, as Scott pointed out, it thinks it needs the bigwig adaptor because it thinks the file $type matches what appears to be a bigFile type (bigWig or bigBed). What kind of file are you uploading?

The guess_upload_type() method tries to determine the file type of the uploaded file based on a handful of criteria, including the file extension and the first few “magic” bytes for binary files. If the magic code matches a bam, bigWig, or bigBed file type, then it will set the appropriate file type. The useq and archive types are zip files, and tar archives are recognized by extension (because there are too many different magic bytes for tar!). The useq, zip, and tar presumes the files are (or soon will be) bigWig or bigBed files, hence the requirement for Bio::DB::BigFile.

On a side note, I’ve rewritten this section to handle useq files directly using a native Bio::DB::USeq adaptor. It’s in my github fork, and I keep meaning to submit as pull request.

I hope that makes sense.
Tim


On Mar 18, 2014, at 1:36 PM, Scott Cain <[hidden email]<mailto:[hidden email]>> wrote:

Hi Brad,

Is it this line:

                croak "This server does not support $type uploads"
                    if $type =~ /bigwig|bigbed|useq|archive/ && !$self->has_bigwig;

I'm guessing you do have bigwig installed, so I guess the question is why does GBrowse not think so?

Scott




On Tue, Mar 18, 2014 at 8:56 AM, Brad Covey <[hidden email]<mailto:[hidden email]>> wrote:
>
> Hey Tim,
>
> I've made some progress on this. I figured out that I can get the globals object for the current user and use it to create a render object which I can use to create a UserTracks object which I can use to do my upload. That is all working well for smaller files but I'm still running into problems with larger files.
>
> Here's my code for doing the upload: http://pastebin.com/4wHGbQJc
>
> I get the parameters from the user and then fork, the parent returns a reassuring message and the child disassociates itself and redirects it's STDOUT and STDIN(to fully separate itself from CGI and avoid Apache timeouts). Then the child calls the upload_file subroutine on the UserTracks object created earlier. Where I'm running into trouble is that with larger files the upload will run for a time and then fail with the error: 'Cancelled by user', it always seems to run for approximately the same amount of time before failing.
>
> I thought maybe this was being caused by Apache timeouts so I tried to catch sigterms within my uploading script, but it never catches anything. Instead the sigterm is caught inside UserTracks.pm on line 535.
>
> Does anyone have any thoughts about what might be causing this error?
>
>
> Thanks,
> Brad
>
>
> On Wed, Mar 12, 2014 at 5:13 PM, Brad Covey <[hidden email]<mailto:[hidden email]>> wrote:
>>
>> What I'm actually doing is pretty similar to what you describe. I have a daemon running in the background that waits for requests, processes them, and then uploads the data. I'm currently using the REST interface to perform the upload but I'm getting that apache timeout because it's taking too long to process the uploaded files. I'll take a look at Bio::Graphics::Browser2::DataLoader, thanks for pointing me in the right direction.
>>
>> Thanks again,
>> Brad
>>
>>
>> On Wed, Mar 12, 2014 at 4:53 PM, Timothy Parnell <[hidden email]<mailto:[hidden email]>> wrote:
>>>
>>> Hi Brad,
>>>
>>> Yes, for files with seqfeatures (bed, gff, etc) they are loaded into a Bio::DB::Seqfeature::Store database, usually using a SQLite backend. Other files (wig, etc) will get handled with the appropriate adaptor. A track configuration is written for the new data, and stored in the users upload directory under datasource and unique user id.
>>>
>>> You can take a look at Bio::Graphics::Browser2::DataLoader to see how it works. Be prepared to mentally parse all the code in your head; it’s poorly documented, but at least reasonably thought out.
>>>
>>> It sounds like you may want to do your heavy processing offline (not through CGI script), and just upload results to GBrowse through normal channels.
>>>
>>> You could look at the GBrowse REST API, which controls GBrowse through URL commands. I don’t know if there is one to upload files; there probably is, just don’t know it. It would be best to use that mechanism or whatever mechanism the plugin uses to add data tracks.
>>>
>>> On Mar 12, 2014, at 12:36 PM, Brad Covey <[hidden email]<mailto:[hidden email]><mailto:[hidden email]<mailto:[hidden email]>>> wrote:
>>>
>>> Hey Tim,
>>>
>>> Thanks for the quick reply. Good catch on the FastCGI, I had forgotten about it when I was setting up the browser instance.
>>>
>>> I tried playing with gbrowse's global timeout a little bit but I still saw the same error after roughly the same amount of time. This leads me to believe that it is indeed an Apache timeout I'm wrestling with. I've been told that it isn't a good idea to get rid of Apache's timeout for security reasons. This leaves me looking for another solution.
>>>
>>> The software that's producing the output files that are being uploaded is located on the same machine as the GBrowse instance. Back at the start of this project I had looked at manually adding the tracks to the custom tracks sqlite database. I was a bit confused about how the custom tracks were being stored and I couldn't find any documentation talking about it so I decided to feed the files to GBrowse through http instead. Do you have any idea how GBrowse stores data in that database? Is there something like bp_seqfeature_load that I could use to upload to the custom tracks database?
>>>
>>> While I'm waiting for an answer I'll dig into GBrowse's source and see if I can find anything.
>>>
>>> Thanks again,
>>> Brad



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



Reply | Threaded
Open this post in threaded view
|

Re: GBrowse Custom Track Upload Timeout

Timothy Parnell
The only GBrowse timeouts I am aware of are the ones listed in the GBrowse.conf file. The general one, global_timeout, is set to 60 seconds by default, but I don't know how it relates to file uploads; I thought it applied more to image rendering.

I've had better success doing uploads under FastCGI, possibly because it lets me raise the limits on upload size and the Apache timeouts. Perhaps Apache is killing the process not because of time, but because of the size of the file being uploaded.

Tim
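A rough illustration of the knobs mentioned above, with illustrative values rather than recommendations. The GBrowse side lives in GBrowse.conf; the FastCGI side lives in the Apache configuration that runs gbrowse under mod_fcgid:

    # GBrowse.conf ([GENERAL] section)
    global_timeout = 600             # seconds GBrowse waits before giving up

    # Apache configuration for gbrowse run under mod_fcgid
    <IfModule mod_fcgid.c>
        FcgidIOTimeout     600         # seconds to wait for script input/output
        FcgidBusyTimeout   600         # seconds a handler may stay busy
        FcgidMaxRequestLen 209715200   # largest accepted request body (200 MB)
    </IfModule>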

On Mar 19, 2014, at 9:29 AM, Brad Covey <[hidden email]> wrote:

Hi guys,

At one point I angered the CPAN gods by trying to install the wrong version of a module, and CPAN set up a whole new version of Perl in a different location. I've gotten that more or less straightened out, but it's probably why GBrowse doesn't think I have bigwig installed: it may be looking in the wrong location. I'll get that sorted out, but I don't think it's the root of my problem. The files I'm uploading are GFF3 and featurefile, and I see the 'Cancelled by user' error in both cases.

I tested the type being returned by guess_upload_type() and it is right; the subroutine is correctly guessing the file type. I added a print statement to DataLoader.pm to see how many lines were being processed. It looks like the loader gets through roughly 5,000 lines of the uploaded file before failing with that 'Cancelled by user' error.

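The print statement Brad added isn't shown; a hypothetical stand-alone version of that kind of progress trace looks like this:

    #!/usr/bin/perl
    # Count lines the way a per-line loader would, logging every 1000th
    # line to STDERR (which ends up in the Apache error log under CGI).
    use strict;
    use warnings;

    open my $fh, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!";
    my $lines = 0;
    while (my $line = <$fh>) {
        warn "processed $lines lines\n" if ++$lines % 1000 == 0;
        # ... the loader's real per-line processing would happen here ...
    }
    warn "done: $lines lines total\n";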
This led me to wonder whether Apache was still somehow killing my uploader, but I tried tripling the Apache timeout (to 15 minutes) and nothing changed.

I also wondered whether I had a formatting problem in the file being uploaded, but I checked the formatting on the line where the uploader failed and everything looks normal.

Is there some kind of upload timeout within GBrowse that I'm missing?


Thanks for all the help,
Brad


On Tue, Mar 18, 2014 at 6:34 PM, Timothy Parnell <[hidden email]> wrote:
Hi Brad,

It's probably failing because, as Scott pointed out, it thinks it needs the BigWig adaptor: the file $type matches what appears to be a bigFile type (bigWig or bigBed). What kind of file are you uploading?

The guess_upload_type() method tries to determine the type of an uploaded file from a handful of criteria, including the file extension and, for binary files, the first few "magic" bytes. If the magic code matches a BAM, bigWig, or bigBed file, it sets the appropriate type. The useq and archive types are ZIP files, and tar archives are recognized by extension (because there are too many different magic bytes for tar!). The useq, zip, and tar types presume the files are (or soon will be) bigWig or bigBed files, hence the requirement for Bio::DB::BigFile.

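A minimal sketch of the magic-byte sniffing described above. The magic constants are the published ones for these formats, but the helper itself is hypothetical and far simpler than GBrowse's guess_upload_type(); for one thing, it ignores byte-swapped bigWig/bigBed files:

    #!/usr/bin/perl
    # Guess a binary track file's type from its leading "magic" bytes.
    use strict;
    use warnings;

    sub guess_binary_type {
        my $file = shift;
        open my $fh, '<:raw', $file or die "cannot open $file: $!";
        my $got = read $fh, my $magic, 4;
        return unless $got && $got == 4;                   # too short to classify
        return 'bam'    if $magic eq "\x1f\x8b\x08\x04";   # BGZF container (BAM)
        return 'zip'    if substr($magic, 0, 2) eq 'PK';   # ZIP (useq/archive)
        my $code = unpack 'V', $magic;                     # little-endian 32-bit word
        return 'bigwig' if $code == 0x888FFC26;
        return 'bigbed' if $code == 0x8789F2EB;
        return;                                            # not a recognized binary type
    }

    print +(guess_binary_type($ARGV[0]) || 'unknown'), "\n" if @ARGV;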
On a side note, I've rewritten this section to handle useq files directly using a native Bio::DB::USeq adaptor. It's in my GitHub fork, and I keep meaning to submit it as a pull request.

I hope that makes sense.
Tim


On Mar 18, 2014, at 1:36 PM, Scott Cain <[hidden email]> wrote:

Hi Brad,

Is it this line:

                croak "This server does not support $type uploads"
                    if $type =~ /bigwig|bigbed|useq|archive/ && !$self->has_bigwig;

I'm guessing you do have bigwig installed, so the question is: why doesn't GBrowse think so?

Scott
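GBrowse's has_bigwig test isn't quoted here, but the usual Perl idiom for probing an optional module, and the reason a CPAN install into the wrong perl would defeat it, is a runtime require:

    #!/usr/bin/perl
    # Probe for an optional module at runtime (hypothetical; GBrowse's own
    # test may differ). If CPAN installed Bio::DB::BigFile into a different
    # perl's library tree, it won't be in this perl's @INC and the eval fails.
    use strict;
    use warnings;

    my $has_bigwig = eval { require Bio::DB::BigFile; 1 } ? 1 : 0;
    print $has_bigwig ? "BigFile support available\n"
                      : "no Bio::DB::BigFile in this perl's \@INC\n";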




On Tue, Mar 18, 2014 at 8:56 AM, Brad Covey <[hidden email]> wrote:

>
> Hey Tim,
>
> I've made some progress on this. I figured out that I can get the globals object for the current user and use it to create a render object, which I can use to create a UserTracks object, which in turn can perform my upload. That all works well for smaller files, but I'm still running into problems with larger ones.
>
> Here's my code for doing the upload: http://pastebin.com/4wHGbQJc
>
> I get the parameters from the user and then fork; the parent returns a reassuring message, and the child disassociates itself and redirects its STDOUT and STDIN (to fully separate itself from CGI and avoid Apache timeouts), as sketched below. The child then calls the upload_file subroutine on the UserTracks object created earlier. Where I'm running into trouble is that with larger files the upload runs for a while and then fails with the error 'Cancelled by user'; it always seems to run for approximately the same amount of time before failing.
>
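A minimal sketch of that fork-and-detach pattern, including the SIGTERM trap mentioned just below. This is hypothetical scaffolding rather than the pastebin code, and the final upload call is left as a comment because upload_file()'s exact signature isn't shown in the thread:

    #!/usr/bin/perl
    # Answer the CGI request immediately, then finish the upload in a
    # detached child so Apache's CGI timeout no longer applies to it.
    use strict;
    use warnings;
    use POSIX 'setsid';

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        print "Content-type: text/plain\r\n\r\nUpload started.\n";  # reassure the client
        exit 0;
    }

    # Child: detach from the web server entirely.
    setsid or die "setsid failed: $!";
    open STDIN,  '<',  '/dev/null'         or die $!;
    open STDOUT, '>',  '/dev/null'         or die $!;
    open STDERR, '>>', '/tmp/uploader.log' or die $!;   # hypothetical log file

    $SIG{TERM} = sub { warn "uploader caught SIGTERM\n"; exit 1 };

    # ... the child would now call upload_file() on the UserTracks object ...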
> I thought maybe this was caused by Apache timeouts, so I tried to catch SIGTERMs within my uploading script, but it never catches anything. Instead, the SIGTERM is caught inside UserTracks.pm, on line 535.
>
> Does anyone have any thoughts about what might be causing this error?
>
>
> Thanks,
> Brad
>
>
> On Wed, Mar 12, 2014 at 5:13 PM, Brad Covey <[hidden email]> wrote:
>>
>> What I'm actually doing is pretty similar to what you describe. I have a daemon running in the background that waits for requests, processes them, and then uploads the data. I'm currently using the REST interface to perform the upload, but I'm hitting that Apache timeout because processing the uploaded files takes too long. I'll take a look at Bio::Graphics::Browser2::DataLoader; thanks for pointing me in the right direction.
>>
>> Thanks again,
>> Brad