[BioMart Users] Transfer very, very slow from martservice on large-ish requests

Kevin C. Dorff
Hi,

I periodically grab annotation files in TSV format from martservice via XML queries. One of the three files I transfer is relatively large (>500MB). It starts transferring at a normal speed, but not far into the file (10MB or so?) the transfer speed bottoms out; after that it periodically bursts a little data and then stops transferring for a while again. A transfer that previously took maybe a couple of hours can now take 10-20 hours, it seems, or worse, the connection just times out and after 10+ hours of transferring data I end up with an incomplete file.

I am using Curl to download the file. An example command line I would use that exhibits the problem is

curl -o var-annotations-unsorted.tsv.body --tr-encoding --verbose -d @query.xml http://www.biomart.org/biomart/martservice

where query.xml contains the following data (in the actual file the XML portion is URL-encoded, as curl's documentation directs):

query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6" ><Dataset name="mmusculus_snp" interface = "default" ><Attribute name="chr_name"/><Attribute name="chrom_start"/><Attribute name="refsnp_id"/><Attribute name="ensembl_gene_stable_id"/><Attribute name="consequence_type_tv"/></Dataset></Query>
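The URL-encoded form body that curl posts can also be generated programmatically instead of by hand. The following is a minimal sketch in Python, assuming the same query as in this message (with the spacing inside the tags normalized); the names QUERY_XML and build_form_body are illustrative and not part of BioMart or curl.

```python
from urllib.parse import urlencode, parse_qs

# The query from this message, written as one string.
QUERY_XML = (
    '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    '<Query virtualSchemaName="default" formatter="TSV" header="0" '
    'uniqueRows="0" count="" datasetConfigVersion="0.6">'
    '<Dataset name="mmusculus_snp" interface="default">'
    '<Attribute name="chr_name"/>'
    '<Attribute name="chrom_start"/>'
    '<Attribute name="refsnp_id"/>'
    '<Attribute name="ensembl_gene_stable_id"/>'
    '<Attribute name="consequence_type_tv"/>'
    '</Dataset></Query>'
)


def build_form_body(query_xml: str) -> str:
    """Return the URL-encoded 'query=...' body for a form POST."""
    return urlencode({"query": query_xml})


body = build_form_body(QUERY_XML)

# Round-trip check: decoding the body gives back the original XML.
assert parse_qs(body)["query"] == [QUERY_XML]
print(body[:40])
```

Writing the returned string to query.xml would give a file usable with the `-d @query.xml` form of the command above. curl's `--data-urlencode` option is another way to have the encoding done on the command line.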

I've tried this from both my work network and my home network to verify it wasn't an issue with our work network, and the same throttling behavior is exhibited in both places. I'd be somewhat less concerned about it taking 10+ hours to complete if the transfers didn't sometimes time out after many hours of transfer.

Secondarily, I was hoping to speed up the transfer by passing the "--tr-encoding" or "--compressed" option to curl, which would allow the server to send the file over the wire as gzip. It seems your server doesn't support this, which is too bad, because that could easily cut the number of bytes transferred by a factor of 10 or more. I've tried both options and neither seems to do anything with the martservice servers. Is there some other option I could specify that would compress the data over the wire or before transfer? I can handle nearly any file format on my side and would do nearly anything you offer to speed up these transfers.
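Whether a server honored a compression request can be checked from the response headers: with HTTP content coding, the client sends `Accept-Encoding: gzip` and a server that compresses answers with `Content-Encoding: gzip`. The following is a minimal sketch in Python of handling both cases. It assumes nothing about martservice itself; decode_body is a hypothetical helper and the sample rows are synthetic, not real BioMart output.

```python
import gzip


def decode_body(headers: dict, raw: bytes) -> bytes:
    """Undo gzip content coding if the response headers declare it."""
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    return raw  # the server ignored Accept-Encoding; body is already plain


# Synthetic TSV rows standing in for real output (illustration only).
sample = b"".join(b"1\t%d\trs%d\n" % (3000000 + i, i) for i in range(1000))
compressed = gzip.compress(sample)

# A compressed response is decoded; an uncompressed one passes through.
assert decode_body({"Content-Encoding": "gzip"}, compressed) == sample
assert decode_body({}, sample) == sample
print(len(sample), len(compressed))
```

The printed sizes only show that repetitive TSV text compresses well; the actual ratio for a given download depends on the data.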

Any suggestions?
Kevin


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Arek Kasprzyk
Hi Kevin
There seem to have been some problems with the service recently, and biomart.org is now down. The OICR team is working to restore the service. Once it is restored, please try again and let us know if you are still experiencing these problems; we will then be able to look into it in more detail.

a

On Wed, Nov 9, 2011 at 11:59 AM, Kevin C. Dorff <[hidden email]> wrote:

Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Kevin C. Dorff
Hi Arek,

Thanks for responding. I saw that biomart.org was back up, so I tried again. I started my script and I am seeing exactly the same effect as before: it starts normally, then after 10MB or so it only periodically bursts a bit of data across. In 34 minutes I am up to 113M (now, at 37 minutes, I am at 118M), but most of the time is spent sending no data at all. It has been like this for a little over a week, but I cannot say when it started, because it had been several months since I last did these transfers from martservice (this is something I only do a few times a year). Looking back over my logs, these larger files have timed out nearly every time, some at around 5 hours, some after more than 10.
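The figures in this message translate into average transfer rates as follows. This is a small worked calculation in Python, assuming "M" means megabytes of 10^6 bytes and using the roughly 500 MB file size mentioned at the start of the thread; the function name is illustrative.

```python
def rate_kb_per_s(megabytes: float, minutes: float) -> float:
    """Average transfer rate in kB/s (1 MB = 1000 kB)."""
    return megabytes * 1000 / (minutes * 60)


overall = rate_kb_per_s(113, 34)            # first 34 minutes
recent = rate_kb_per_s(118 - 113, 37 - 34)  # minutes 34 to 37

# Time a 500 MB file would need if all of it moved at the recent rate.
hours_for_500mb = 500 * 1000 / recent / 3600

print(f"overall: {overall:.1f} kB/s")
print(f"recent:  {recent:.1f} kB/s")
print(f"500 MB at the recent rate: {hours_for_500mb:.1f} hours")
```

This works out to about 55.4 kB/s over the first 34 minutes and about 27.8 kB/s over the last three, at which rate a 500 MB file would take about 5 hours.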

If you could look into this, it would be greatly appreciated.

Thanks,
Kevin

On Thu, Nov 10, 2011 at 7:37 AM, Arek Kasprzyk <[hidden email]> wrote:

Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Junjun Zhang
Hi Kevin,

BioMart 0.7 does not handle large or long-running queries well (the snp marts are large), and the recent high server load may have made things worse. There are two options you can use to alleviate the situation.

  1. Break the query down into multiple queries, say one query per chromosome. To do this, add a filter like <Filter name = "chr_name" value = "1"/> to your query. This way you can track each query more easily and rerun any failed query separately.
  2. Use the email notification option in the MartView web GUI (this is not available for script-driven queries).
Hope this helps, let us know how it goes.
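Option 1 could be scripted along the following lines. This is a minimal sketch in Python, assuming the Filter element sits inside the Dataset element next to the attributes, and that the chromosome names are 1 to 19 plus X and Y; the function name, the chromosome list, and the placement of the filter are assumptions made for illustration, not details confirmed in this thread.

```python
from xml.sax.saxutils import quoteattr

ATTRIBUTES = ["chr_name", "chrom_start", "refsnp_id",
              "ensembl_gene_stable_id", "consequence_type_tv"]

# Assumed chromosome names; check which values the dataset really uses.
CHROMOSOMES = [str(n) for n in range(1, 20)] + ["X", "Y"]


def build_query(chromosome: str) -> str:
    """Build the thread's SNP query restricted to a single chromosome."""
    attrs = "".join(f"<Attribute name={quoteattr(a)}/>" for a in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.6">'
        '<Dataset name="mmusculus_snp" interface="default">'
        f'<Filter name="chr_name" value={quoteattr(chromosome)}/>'
        f'{attrs}</Dataset></Query>'
    )


# One query per chromosome, keyed by chromosome name.
queries = {c: build_query(c) for c in CHROMOSOMES}
print(len(queries))
```

Keeping the queries keyed by chromosome makes it straightforward to write each result to its own file and to resubmit only the chromosomes whose downloads failed.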

Best regards,
Junjun


From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 12:09:20 -0500

Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Kevin C. Dorff
I've modified my script to fetch these one chromosome at a time, as you mentioned. 

My fear is that given that the splits are very unbalanced (some chromosomes are clearly going to be much larger files than others) I'll still get timeouts. For instance, I am currently transferring chromosome "X" and it is exhibiting the same stalling / bursting behavior. 18 minutes so far and it's at 56.9MB (now at 22 minutes it is at 62.7MB) and just occasionally adding new data to the file but most of the time just sitting there transferring nothing. This feels to me like there is some flaw in the transfer system unless you've designed it to really throttle any transfers over a certain size and are throttling very, very aggressively. I'll review the transfer output tomorrow morning for timeouts, etc.
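Because a single chromosome can still stall, a per-chromosome script needs to notice failed downloads and rerun only those. The following is a minimal sketch in Python, assuming a caller-supplied fetch function that raises an OSError (which includes timeouts) when a transfer fails; fetch_with_retries and fake_fetch are hypothetical names, and no real network call is made.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_with_retries(
    chromosomes: Iterable[str],
    fetch: Callable[[str], bytes],
    max_attempts: int = 3,
    pause_seconds: float = 0.0,
) -> Tuple[Dict[str, bytes], List[str]]:
    """Fetch each chromosome, retrying failures; return (results, failed)."""
    results: Dict[str, bytes] = {}
    failed: List[str] = []
    for chrom in chromosomes:
        for attempt in range(1, max_attempts + 1):
            try:
                results[chrom] = fetch(chrom)
                break  # success, move on to the next chromosome
            except OSError:  # covers socket timeouts and connection resets
                if attempt == max_attempts:
                    failed.append(chrom)  # give up on this chromosome
                else:
                    time.sleep(pause_seconds)
    return results, failed


# Stand-in fetcher: chromosome "X" fails twice, then succeeds.
calls = {"X": 0}


def fake_fetch(chrom: str) -> bytes:
    if chrom == "X":
        calls["X"] += 1
        if calls["X"] < 3:
            raise TimeoutError("simulated stall")
    return f"{chrom}\t1\trs1\n".encode()


results, failed = fetch_with_retries(["1", "X"], fake_fetch)
print(sorted(results), failed)
```

A sketch like this only catches transfers that end with an error. A download that is cut short without one would need a separate completeness check.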

Kevin

On Thu, Nov 10, 2011 at 1:03 PM, Junjun Zhang <[hidden email]> wrote:

Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Syed Haider-2
Kevin,

The best way forward is to try the email option from MartView. That will
do all the database retrieval and storage on the server side and send you
a link to download the results when they are ready.

Syed


On 10/11/2011 21:50, Kevin C. Dorff wrote:


Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Junjun Zhang
In reply to this post by Kevin C. Dorff
I was watching the MySQL query log for a little while, and it isn't all that bad. I see a few queries on the snp mart for different chromosomes. Each batch usually finishes in around 30 seconds, and one batch contains 5000 records. Some chromosomes should have finished already. Are you still seeing slowness now?
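The batch figures above give a rough way to estimate how long the database side of a query would take. This is a small worked calculation in Python, assuming every batch takes the full 30 seconds; the row count used in the example is a made-up number for illustration and does not come from this thread.

```python
BATCH_ROWS = 5000      # records per batch, from the message above
BATCH_SECONDS = 30.0   # approximate time per batch, from the message above


def estimated_minutes(total_rows: int) -> float:
    """Estimated query time if every batch takes BATCH_SECONDS."""
    batches = -(-total_rows // BATCH_ROWS)  # ceiling division
    return batches * BATCH_SECONDS / 60


rows_per_second = BATCH_ROWS / BATCH_SECONDS

# 1,000,000 is a hypothetical row count chosen only for illustration.
print(f"{rows_per_second:.0f} rows/s, {estimated_minutes(1_000_000):.0f} min")
```

At about 167 rows per second, a hypothetical one-million-row chromosome would take about 100 minutes of query time.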
Junjun

From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 16:50:58 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

I've modified my script to fetch these one chromosome at a time, as you mentioned. 

My fear is that given that the splits are very unbalanced (some chromosomes are clearly going to be much larger files than others) I'll still get timeouts. For instance, I am currently transferring chromosome "X" and it is exhibiting the same stalling / bursting behavior. 18 minutes so far and it's at 56.9MB (now at 22 minutes it is at 62.7MB) and just occasionally adding new data to the file but most of the time just sitting there transferring nothing. This feels to me like there is some flaw in the transfer system unless you've designed it to really throttle any transfers over a certain size and are throttling very, very aggressively. I'll review the transfer output tomorrow morning for timeouts, etc.

Kevin

On Thu, Nov 10, 2011 at 1:03 PM, Junjun Zhang <[hidden email]> wrote:
Hi Kevin,

BioMart 0.7 does not work well for handling large/long running queries (snp marts are large), recent high server load may have made things worse. There are two options you can use to alleviate to situation.

  1. Break the query down into multiple queries, say, one query per chromosome, you need to add a filter like: <Filter name = "chr_name" value = "1"/> to you query. This way, you can track the query more easily, and rerun the failed query separately.
  2. Use the email notification option at martview web GUI (this is not available for script driven queries).
Hope this helps, let us know how it goes.

Best regards,
Junjun


From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 12:09:20 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Hi Arek,

Thanks for responding. I saw that biomart.org was back up so I tried again. I started my script and I am seeing the exact same effect as before. It starts normally then after 10mb or so it will only periodically burst a bit of data across. In 34 minutes I am up to 113M (now at 37 minutes I am at 118M but most of the time is sent sending no data at all). This has been happening like this for a little over a week, but I cannot speak to when it might have started because it has been several months since I did these data transfers from martservice (this is something I only do a few times a year). Looking back over my logs, these larger files have timed out nearly every time, some around 5 hours, some at around more than 10.

If you could look into this for it, it would be greatly appreciated.

Thanks,
Kevin

On Thu, Nov 10, 2011 at 7:37 AM, Arek Kasprzyk <[hidden email]> wrote:
Hi Kevin
there seem to be some problems with the service recently and now biomart.org is down. The OICR team are working to restore the service. Once is restored please try again and let us know if you still are experiencing those problems and we'll be able to look into it in more details

a



_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users






Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Junjun Zhang
Just received an alert from our monitoring system: both BioMart servers have become unresponsive. I will have to restart them, and that will interrupt your running queries.

Junjun


From: jzhang <[hidden email]>
Date: Thu, 10 Nov 2011 17:22:11 -0500
To: "Kevin C. Dorff" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

I was watching the MySQL query log for a little while, and it isn't all that bad. I see a few queries on the snp mart for different chromosomes. The queries usually finish in around 30 seconds per batch, and one batch contains 5000 records. Some chromosomes should be done already. Are you still seeing slowness now?
Junjun

From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 16:50:58 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

I've modified my script to fetch these one chromosome at a time, as you mentioned. 

My fear is that given that the splits are very unbalanced (some chromosomes are clearly going to be much larger files than others) I'll still get timeouts. For instance, I am currently transferring chromosome "X" and it is exhibiting the same stalling / bursting behavior. 18 minutes so far and it's at 56.9MB (now at 22 minutes it is at 62.7MB) and just occasionally adding new data to the file but most of the time just sitting there transferring nothing. This feels to me like there is some flaw in the transfer system unless you've designed it to really throttle any transfers over a certain size and are throttling very, very aggressively. I'll review the transfer output tomorrow morning for timeouts, etc.

Kevin

On Thu, Nov 10, 2011 at 1:03 PM, Junjun Zhang <[hidden email]> wrote:
Hi Kevin,

BioMart 0.7 does not handle large or long-running queries well (snp marts are large), and the recent high server load may have made things worse. There are two options you can use to alleviate the situation.

  1. Break the query down into multiple queries, say, one query per chromosome. To do this, add a filter like <Filter name = "chr_name" value = "1"/> to your query. This way, you can track each query more easily and rerun a failed query separately.
  2. Use the email notification option in the martview web GUI (this is not available for script-driven queries).
Hope this helps, let us know how it goes.
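
A minimal sketch of option 1, assuming the query from the original message in this thread. It only builds the per-chromosome XML and the URL-encoded POST body; it does not contact the server. The chromosome list (1 to 19, X, Y) is an assumption about mmusculus_snp, and the function names build_query and build_post_body are made up for illustration.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Attribute names are copied from the query earlier in this thread.
ATTRIBUTES = ["chr_name", "chrom_start", "refsnp_id",
              "ensembl_gene_stable_id", "consequence_type_tv"]

# Assumed chromosome names for mmusculus_snp; check the dataset for the real list.
CHROMOSOMES = [str(n) for n in range(1, 20)] + ["X", "Y"]


def build_query(chromosome):
    """Return the martservice query XML restricted to one chromosome."""
    # quoteattr() escapes the value and wraps it in double quotes.
    attrs = "".join("<Attribute name=%s/>" % quoteattr(a) for a in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.6">'
        '<Dataset name="mmusculus_snp" interface="default">'
        '<Filter name="chr_name" value=%s/>%s'
        '</Dataset></Query>' % (quoteattr(chromosome), attrs)
    )


def build_post_body(chromosome):
    """URL-encode the XML as the 'query' form field sent in the POST body."""
    return urlencode({"query": build_query(chromosome)})


if __name__ == "__main__":
    for chrom in CHROMOSOMES:
        # Print the size of each request body rather than sending it.
        print(chrom, len(build_post_body(chrom)))
```

Each body could be written to its own file and passed to curl with -d @file, in the same way as the single query.xml in the original command.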

Best regards,
Junjun


From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 12:09:20 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Hi Arek,

Thanks for responding. I saw that biomart.org was back up so I tried again. I started my script and I am seeing the exact same effect as before. It starts normally, then after 10MB or so it will only periodically burst a bit of data across. In 34 minutes I am up to 113M (now at 37 minutes I am at 118M, but most of the time is spent sending no data at all). This has been happening like this for a little over a week, but I cannot speak to when it might have started because it has been several months since I did these data transfers from martservice (this is something I only do a few times a year). Looking back over my logs, these larger files have timed out nearly every time, some at around 5 hours, some at more than 10.

If you could look into this for me, it would be greatly appreciated.

Thanks,
Kevin


Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Junjun Zhang
In a previous email, I mentioned that things seemed OK on the DB server side. Now that the BioMart web servers have become unresponsive, it seems the problem is on the BioMart server side.

As I mentioned earlier, BioMart 0.7 is not able to handle a high query load well. Queries are not trackable, and when the server gets rebooted, all running queries are terminated.

Junjun


See the output of top below, just to give you some idea of how heavy the server load was at the time the servers stopped responding to requests.

Server 1:

top - 17:41:08 up 13 days,  5:30,  2 users,  load average: 17.57, 17.57, 15.89
Tasks: 148 total,   1 running, 147 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.4%us,  2.3%sy,  0.0%ni, 69.8%id, 23.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  16472372k total, 16381820k used,    90552k free,    16880k buffers
Swap:        0k total,        0k used,        0k free,  6082432k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                  
17367 biomart   20   0 2705m 2.4g 5228 D    2 15.6   1:06.73 apache2                                                                                                                                                                   
17926 biomart   20   0 2771m 2.5g 4704 D    2 16.0   0:40.29 apache2                                                                                                                                                                   
17932 biomart   20   0 2696m 2.4g 4856 D    2 15.6   0:30.10 apache2                                                                                                                                                                   
18514 biomart   20   0 2696m 2.4g 4744 D    1 15.6   0:28.30 apache2                                                                                                                                                                   
18522 biomart   20   0 2760m 2.4g 4780 D    1 15.6   0:26.61 apache2                                                                                                                                                                   
18808 biomart   20   0 2696m 2.4g 4788 D    1 15.6   0:19.36 apache2                                                                                                                                                                   
19091 biomart   20   0 2703m 2.4g 5076 D    1 15.6   0:10.87 apache2                                                                                                                                                                   
19099 biomart   20   0 2696m 2.4g 4748 D    1 15.6   0:08.55 apache2                                                                                                                                                                   
19103 biomart   20   0 2696m 2.4g 4768 S    1 15.6   0:08.78 apache2                                                                                                                                                                   
19366 biomart   20   0 2699m 2.4g 4728 D    1 15.6   0:06.72 apache2                                                                                                                                                                   
 5812 biomart   20   0 2699m 2.4g 4752 D    1 15.6 190:12.44 apache2                                                                                                                                                                   
17911 biomart   20   0 2705m 2.4g 5096 D    1 15.6   0:55.96 apache2                                                                                                                                                                   
18524 biomart   20   0 2696m 2.4g 4716 D    1 15.6   0:27.75 apache2                                                                                                                                                                   
18544 biomart   20   0 2696m 2.4g 4736 D    1 15.6   0:19.10 apache2                                                                                                                                                                   
18822 biomart   20   0 2705m 2.4g 5140 D    1 15.6   0:15.19 apache2                                                                                                                                                                   
19105 biomart   20   0 2771m 2.4g 5192 D    1 15.6   0:07.97 apache2                                                                                                                                                                   
11513 biomart   20   0 2699m 2.4g 4936 S    1 15.6   2:12.54 apache2                                                                                                                                                                   
18810 biomart   20   0 2703m 2.4g 4960 D    1 15.6   0:13.70 apache2                                                                                                                                                                   
19107 biomart   20   0 2705m 2.4g 5152 D    1 15.6   0:09.45 apache2                                                                                                                                                                   
 6742 biomart   20   0 2699m 2.4g 5004 D    0 15.6   3:41.05 apache2                                                                                                                                                                   
 6917 biomart   20   0 6427m 6.1g 5116 S    0 38.7  66:20.79 apache2                                                                                                                                                                   
 7713 biomart   20   0 2784m 2.5g 4724 S    0 15.7  31:36.10 apache2                                                                                                                                                                   
 8997 biomart   20   0 2696m 2.4g 4640 S    0 15.6   5:29.66 apache2                                                                                                                                                                   
 9019 biomart   20   0 2705m 2.4g 5032 S    0 15.6   5:30.70 apache2                                                                                                                                                                   
18193 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:31.84 apache2                                                                                                                                                                   
18195 biomart   20   0 2705m 2.4g 5068 S    0 15.6   0:32.33 apache2                                                                                                                                                                   
18203 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:29.34 apache2                                                                                                                                                                   
18207 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:29.27 apache2                                                                                                                                                                   
18209 biomart   20   0 2696m 2.4g 4776 S    0 15.6   0:29.63 apache2                                                                                                                                                                   
18245 biomart   20   0 2696m 2.4g 4848 S    0 15.6   0:29.72 apache2                                                                                                                                                                   

Server 2:

top - 17:41:33 up 9 days, 10:23,  2 users,  load average: 14.76, 15.20, 14.32
Tasks: 146 total,   2 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.5%us,  3.3%sy,  0.0%ni, 71.8%id, 22.2%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  16472372k total, 12159968k used,  4312404k free,    19040k buffers
Swap:        0k total,        0k used,        0k free,  2054872k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                 
29830 biomart   20   0 2705m 2.4g 5212 D    2 15.6   0:16.99 apache2                                                                                                                                                                   
29832 biomart   20   0 2696m 2.4g 4724 D    2 15.6   0:16.37 apache2                                                                                                                                                                   
29213 biomart   20   0 2697m 2.4g 4708 D    2 15.6   0:43.90 apache2                                                                                                                                                                   
29506 biomart   20   0 2699m 2.4g 4824 D    2 15.6   0:34.44 apache2                                                                                                                                                                   
29819 biomart   20   0 2705m 2.4g 5068 D    2 15.6   0:18.09 apache2                                                                                                                                                                   
29209 biomart   20   0 2761m 2.4g 4776 D    1 15.6   0:46.84 apache2                                                                                                                                                                   
29803 biomart   20   0 2703m 2.4g 4940 D    1 15.6   0:39.59 apache2                                                                                                                                                                   
30098 biomart   20   0 2705m 2.4g 5104 D    1 15.6   0:13.27 apache2                                                                                                                                                                   
23587 biomart   20   0 2763m 2.4g 4996 D    1 15.6   1:35.93 apache2                                                                                                                                                                   
29185 biomart   20   0 2697m 2.4g 4708 D    1 15.6   0:48.51 apache2                                                                                                                                                                   
29524 biomart   20   0 2696m 2.4g 4708 D    1 15.6   0:19.06 apache2                                                                                                                                                                   
30108 biomart   20   0 2696m 2.4g 4752 D    1 15.6   0:10.83 apache2                                                                                                                                                                   
30110 biomart   20   0 2696m 2.4g 4752 D    1 15.6   0:10.61 apache2                                                                                                                                                                   
28916 biomart   20   0 2709m 2.4g 4984 R    1 15.6   7:33.73 apache2                                                                                                                                                                   
29813 biomart   20   0 2696m 2.4g 4704 D    1 15.6   0:21.68 apache2                                                                                                                                                                   
18129 biomart   20   0 6455m 6.1g 5000 S    0 38.9  65:06.56 apache2                                                                                                                                                                  
18717 biomart   20   0 2740m 2.5g 4756 S    0 15.8  37:22.20 apache2                                                                                                                                                                  
20272 biomart   20   0 2696m 2.4g 4644 S    0 15.6   5:13.81 apache2                                                                                                                                                                  
21959 biomart   20   0 2802m 2.5g 5224 S    0 16.2   4:42.07 apache2                                                                                                                                                                  
22244 biomart   20   0 2696m 2.4g 4668 S    0 15.6   4:22.48 apache2                                                                                                                                                                  
26621 biomart   20   0 2709m 2.4g 5172 S    0 15.6   2:44.31 apache2                                                                                                                                                                  
29221 biomart   20   0 2707m 2.4g 5264 S    0 15.6   0:17.17 apache2                                                                                                                                                                  
29223 biomart   20   0 2696m 2.4g 4756 S    0 15.6   0:27.52 apache2                                                                                                                                                                  
29225 biomart   20   0 2696m 2.4g 4760 S    0 15.6   0:27.84 apache2                                                                                                                                                                  
29230 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:28.46 apache2                                                                                                                                                                  
29234 biomart   20   0 2696m 2.4g 4656 S    0 15.6   0:27.45 apache2                                                                                                                                                                  
29236 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:27.99 apache2                                                                                                                                                                  
29237 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:28.00 apache2                                                                                                                                                                  
29493 biomart   20   0 2696m 2.4g 4708 S    0 15.6   0:29.50 apache2                                                                                                                                                                  
29495 biomart   20   0 2696m 2.4g 4748 S    0 15.6   1:50.94 apache2                                                                                                                                                                  










Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Kevin C. Dorff
Hi, Yes, this morning it is still very bursty and slow. Two of the chromosomes didn't transfer last night, possibly due to you restarting the server. I was able to transfer the two missing ones this morning. I can potentially re-write my script to keep trying missing pieces until all pieces transfer correctly. I've seen mention of a BioMart 0.8 - will this still have the same issues with large transfers?
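
A minimal sketch of the "keep trying missing pieces" idea, under stated assumptions: fetch_one is a hypothetical caller-supplied function that downloads one piece (for example one chromosome) and returns True only when the transfer is complete. How it decides that is left to the caller. The name fetch_all and its parameters are also made up for illustration.

```python
import time


def fetch_all(pieces, fetch_one, max_rounds=5, wait_seconds=60, sleep=time.sleep):
    """Call fetch_one(piece) for every piece, retrying failures in later rounds.

    fetch_one is a hypothetical, caller-supplied download function that
    returns True for a complete transfer and False otherwise.
    Returns the set of pieces that still failed after max_rounds.
    """
    missing = list(pieces)
    for round_number in range(1, max_rounds + 1):
        still_missing = []
        for piece in missing:
            try:
                ok = fetch_one(piece)
            except OSError:
                # Network errors (timeouts, resets) count as a failed piece.
                ok = False
            if not ok:
                still_missing.append(piece)
        missing = still_missing
        if not missing:
            break
        if round_number < max_rounds:
            # Give the server some time before the next round of retries.
            sleep(wait_seconds)
    return set(missing)


if __name__ == "__main__":
    attempts = {}

    def flaky(piece):
        # Simulated download: chromosome X only succeeds on the third try.
        attempts[piece] = attempts.get(piece, 0) + 1
        return piece != "X" or attempts[piece] >= 3

    print(fetch_all(["1", "2", "X"], flaky, sleep=lambda s: None))
```

Only the failed pieces are retried, so a server restart costs the chromosomes that were in flight rather than the whole run.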

Regarding the email notification, if this requires navigating the GUI, then receiving an email, and then going to download, this really isn't a great option for me. I need specific annotations and already have it scripted to grab the TSV files I need from martservice; my script knows the organism, the fields necessary, etc. and doesn't require a person who knows how to navigate the GUI. You just run the script with a few parameters and it fetches the files.

I know you don't ALREADY implement it, but it seems like a third option would be great: I submit a query with an additional parameter that says I will download the query results once the file is created (similar to your email option in the GUI, but via martservice). The return at that point isn't a whole file but just a query-id. Your system then creates the file for download. At some point in the future my script can connect to martservice and ask if the result for the query-id is complete. If it is complete, I can then download it. If it is not yet complete, I can wait and ask again after a while. This would have the added advantage that when you create the file for download you could even gzip the result, if requested, to make the transfer faster.
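
To make the proposed flow concrete, here is a rough client-side sketch. Everything in it is hypothetical: martservice does not offer such an interface, and FakeJobService is only an in-memory stand-in with invented method names (submit, is_complete, download) so the submit / poll / download loop can be run end to end.

```python
import time


class FakeJobService:
    """In-memory stand-in for the proposed (not existing) martservice job API."""

    def __init__(self, polls_until_ready=2):
        self._jobs = {}
        self._polls_until_ready = polls_until_ready

    def submit(self, query_xml):
        # Hand back a query-id immediately instead of the result file.
        query_id = "job-%d" % (len(self._jobs) + 1)
        self._jobs[query_id] = {"query": query_xml, "polls": 0}
        return query_id

    def is_complete(self, query_id):
        # Pretend the file becomes ready after a fixed number of polls.
        job = self._jobs[query_id]
        job["polls"] += 1
        return job["polls"] > self._polls_until_ready

    def download(self, query_id):
        return "result for " + self._jobs[query_id]["query"]


def run_query(service, query_xml, poll_seconds=300, max_polls=100, sleep=time.sleep):
    """Submit a query, poll until the result is ready, then download it."""
    query_id = service.submit(query_xml)
    for _ in range(max_polls):
        if service.is_complete(query_id):
            return service.download(query_id)
        sleep(poll_seconds)
    raise TimeoutError("query %s not ready after %d polls" % (query_id, max_polls))


if __name__ == "__main__":
    service = FakeJobService(polls_until_ready=2)
    print(run_query(service, "<Query/>", sleep=lambda s: None))
```

With this shape the long-lived HTTP connection goes away: each request is short, and a server restart would only require asking again for the same query-id, provided the server kept its job state.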

Kevin

On Thu, Nov 10, 2011 at 5:50 PM, Junjun Zhang <[hidden email]> wrote:
In a previous email, I mentioned that things seemed OK on the db server side. Now that the biomart web servers have gone unresponsive, it seems the problem is on the biomart server side.

As I mentioned earlier, BioMart 0.7 is not able to handle a high query load well. Queries are not trackable, and when the server gets rebooted, all running queries are terminated.

Junjun


See the output of top below, to give you some idea of how heavy the server load was at the time the servers stopped responding to requests.

Server 1:

top - 17:41:08 up 13 days,  5:30,  2 users,  load average: 17.57, 17.57, 15.89
Tasks: 148 total,   1 running, 147 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.4%us,  2.3%sy,  0.0%ni, 69.8%id, 23.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  16472372k total, 16381820k used,    90552k free,    16880k buffers
Swap:        0k total,        0k used,        0k free,  6082432k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                  
17367 biomart   20   0 2705m 2.4g 5228 D    2 15.6   1:06.73 apache2                                                                                                                                                                   
17926 biomart   20   0 2771m 2.5g 4704 D    2 16.0   0:40.29 apache2                                                                                                                                                                   
17932 biomart   20   0 2696m 2.4g 4856 D    2 15.6   0:30.10 apache2                                                                                                                                                                   
18514 biomart   20   0 2696m 2.4g 4744 D    1 15.6   0:28.30 apache2                                                                                                                                                                   
18522 biomart   20   0 2760m 2.4g 4780 D    1 15.6   0:26.61 apache2                                                                                                                                                                   
18808 biomart   20   0 2696m 2.4g 4788 D    1 15.6   0:19.36 apache2                                                                                                                                                                   
19091 biomart   20   0 2703m 2.4g 5076 D    1 15.6   0:10.87 apache2                                                                                                                                                                   
19099 biomart   20   0 2696m 2.4g 4748 D    1 15.6   0:08.55 apache2                                                                                                                                                                   
19103 biomart   20   0 2696m 2.4g 4768 S    1 15.6   0:08.78 apache2                                                                                                                                                                   
19366 biomart   20   0 2699m 2.4g 4728 D    1 15.6   0:06.72 apache2                                                                                                                                                                   
 5812 biomart   20   0 2699m 2.4g 4752 D    1 15.6 190:12.44 apache2                                                                                                                                                                   
17911 biomart   20   0 2705m 2.4g 5096 D    1 15.6   0:55.96 apache2                                                                                                                                                                   
18524 biomart   20   0 2696m 2.4g 4716 D    1 15.6   0:27.75 apache2                                                                                                                                                                   
18544 biomart   20   0 2696m 2.4g 4736 D    1 15.6   0:19.10 apache2                                                                                                                                                                   
18822 biomart   20   0 2705m 2.4g 5140 D    1 15.6   0:15.19 apache2                                                                                                                                                                   
19105 biomart   20   0 2771m 2.4g 5192 D    1 15.6   0:07.97 apache2                                                                                                                                                                   
11513 biomart   20   0 2699m 2.4g 4936 S    1 15.6   2:12.54 apache2                                                                                                                                                                   
18810 biomart   20   0 2703m 2.4g 4960 D    1 15.6   0:13.70 apache2                                                                                                                                                                   
19107 biomart   20   0 2705m 2.4g 5152 D    1 15.6   0:09.45 apache2                                                                                                                                                                   
 6742 biomart   20   0 2699m 2.4g 5004 D    0 15.6   3:41.05 apache2                                                                                                                                                                   
 6917 biomart   20   0 6427m 6.1g 5116 S    0 38.7  66:20.79 apache2                                                                                                                                                                   
 7713 biomart   20   0 2784m 2.5g 4724 S    0 15.7  31:36.10 apache2                                                                                                                                                                   
 8997 biomart   20   0 2696m 2.4g 4640 S    0 15.6   5:29.66 apache2                                                                                                                                                                   
 9019 biomart   20   0 2705m 2.4g 5032 S    0 15.6   5:30.70 apache2                                                                                                                                                                   
18193 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:31.84 apache2                                                                                                                                                                   
18195 biomart   20   0 2705m 2.4g 5068 S    0 15.6   0:32.33 apache2                                                                                                                                                                   
18203 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:29.34 apache2                                                                                                                                                                   
18207 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:29.27 apache2                                                                                                                                                                   
18209 biomart   20   0 2696m 2.4g 4776 S    0 15.6   0:29.63 apache2                                                                                                                                                                   
18245 biomart   20   0 2696m 2.4g 4848 S    0 15.6   0:29.72 apache2                                                                                                                                                                   

Server 2:

top - 17:41:33 up 9 days, 10:23,  2 users,  load average: 14.76, 15.20, 14.32
Tasks: 146 total,   2 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.5%us,  3.3%sy,  0.0%ni, 71.8%id, 22.2%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  16472372k total, 12159968k used,  4312404k free,    19040k buffers
Swap:        0k total,        0k used,        0k free,  2054872k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                 
29830 biomart   20   0 2705m 2.4g 5212 D    2 15.6   0:16.99 apache2                                                                                                                                                                   
29832 biomart   20   0 2696m 2.4g 4724 D    2 15.6   0:16.37 apache2                                                                                                                                                                   
29213 biomart   20   0 2697m 2.4g 4708 D    2 15.6   0:43.90 apache2                                                                                                                                                                   
29506 biomart   20   0 2699m 2.4g 4824 D    2 15.6   0:34.44 apache2                                                                                                                                                                   
29819 biomart   20   0 2705m 2.4g 5068 D    2 15.6   0:18.09 apache2                                                                                                                                                                   
29209 biomart   20   0 2761m 2.4g 4776 D    1 15.6   0:46.84 apache2                                                                                                                                                                   
29803 biomart   20   0 2703m 2.4g 4940 D    1 15.6   0:39.59 apache2                                                                                                                                                                   
30098 biomart   20   0 2705m 2.4g 5104 D    1 15.6   0:13.27 apache2                                                                                                                                                                   
23587 biomart   20   0 2763m 2.4g 4996 D    1 15.6   1:35.93 apache2                                                                                                                                                                   
29185 biomart   20   0 2697m 2.4g 4708 D    1 15.6   0:48.51 apache2                                                                                                                                                                   
29524 biomart   20   0 2696m 2.4g 4708 D    1 15.6   0:19.06 apache2                                                                                                                                                                   
30108 biomart   20   0 2696m 2.4g 4752 D    1 15.6   0:10.83 apache2                                                                                                                                                                   
30110 biomart   20   0 2696m 2.4g 4752 D    1 15.6   0:10.61 apache2                                                                                                                                                                   
28916 biomart   20   0 2709m 2.4g 4984 R    1 15.6   7:33.73 apache2                                                                                                                                                                   
29813 biomart   20   0 2696m 2.4g 4704 D    1 15.6   0:21.68 apache2                                                                                                                                                                   
18129 biomart   20   0 6455m 6.1g 5000 S    0 38.9  65:06.56 apache2                                                                                                                                                                  
18717 biomart   20   0 2740m 2.5g 4756 S    0 15.8  37:22.20 apache2                                                                                                                                                                  
20272 biomart   20   0 2696m 2.4g 4644 S    0 15.6   5:13.81 apache2                                                                                                                                                                  
21959 biomart   20   0 2802m 2.5g 5224 S    0 16.2   4:42.07 apache2                                                                                                                                                                  
22244 biomart   20   0 2696m 2.4g 4668 S    0 15.6   4:22.48 apache2                                                                                                                                                                  
26621 biomart   20   0 2709m 2.4g 5172 S    0 15.6   2:44.31 apache2                                                                                                                                                                  
29221 biomart   20   0 2707m 2.4g 5264 S    0 15.6   0:17.17 apache2                                                                                                                                                                  
29223 biomart   20   0 2696m 2.4g 4756 S    0 15.6   0:27.52 apache2                                                                                                                                                                  
29225 biomart   20   0 2696m 2.4g 4760 S    0 15.6   0:27.84 apache2                                                                                                                                                                  
29230 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:28.46 apache2                                                                                                                                                                  
29234 biomart   20   0 2696m 2.4g 4656 S    0 15.6   0:27.45 apache2                                                                                                                                                                  
29236 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:27.99 apache2                                                                                                                                                                  
29237 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:28.00 apache2                                                                                                                                                                  
29493 biomart   20   0 2696m 2.4g 4708 S    0 15.6   0:29.50 apache2                                                                                                                                                                  
29495 biomart   20   0 2696m 2.4g 4748 S    0 15.6   1:50.94 apache2                                                                                                                                                                  









From: jzhang <[hidden email]>
Date: Thu, 10 Nov 2011 17:40:37 -0500
To: jzhang <[hidden email]>, "Kevin C. Dorff" <[hidden email]>, "[hidden email]" <[hidden email]>

Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Just received an alert from our monitoring system: both biomart servers became unresponsive. I will have to restart them and that will interrupt your running queries.

Junjun


From: jzhang <[hidden email]>
Date: Thu, 10 Nov 2011 17:22:11 -0500
To: "Kevin C. Dorff" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

I was watching the mysql query log for a little while, and it isn't all that bad. I see a few queries on the snp mart for different chromosomes. The queries usually finish in around 30 seconds per batch, where one batch contains 5000 records. Some chromosomes should have finished already. Are you still seeing slowness now?
Junjun

From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 16:50:58 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

I've modified my script to fetch these one chromosome at a time, as you mentioned. 

My fear is that given that the splits are very unbalanced (some chromosomes are clearly going to be much larger files than others) I'll still get timeouts. For instance, I am currently transferring chromosome "X" and it is exhibiting the same stalling / bursting behavior. 18 minutes so far and it's at 56.9MB (now at 22 minutes it is at 62.7MB) and just occasionally adding new data to the file but most of the time just sitting there transferring nothing. This feels to me like there is some flaw in the transfer system unless you've designed it to really throttle any transfers over a certain size and are throttling very, very aggressively. I'll review the transfer output tomorrow morning for timeouts, etc.
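To put the figures above in perspective, the transfer rates they imply can be worked out as below. This is a back-of-the-envelope sketch that assumes decimal units (1 MB = 1000 kB); the helper names are illustrative only, and the 500 MB figure is the size quoted elsewhere in the thread for the full file.

```python
def rate_kb_per_s(mb_start, mb_end, minutes):
    """Average rate in kB/s while the file grew from mb_start to mb_end."""
    return (mb_end - mb_start) * 1000.0 / (minutes * 60.0)


def hours_for(total_mb, kb_per_s):
    """Hours needed to move total_mb at a constant rate of kb_per_s."""
    return total_mb * 1000.0 / kb_per_s / 3600.0


# Figures quoted in the message: 56.9 MB after 18 minutes, 62.7 MB after 22.
average = rate_kb_per_s(0.0, 56.9, 18)
recent = rate_kb_per_s(56.9, 62.7, 4)

print(f"average over the first 18 minutes: {average:.1f} kB/s")
print(f"minutes 18 to 22:                  {recent:.1f} kB/s")
print(f"500 MB at the recent rate:         {hours_for(500, recent):.1f} hours")
```

That is roughly 53 kB/s on average and about 24 kB/s over the last four minutes, so a 500 MB file at the recent rate would need several hours even without a timeout.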

Kevin

On Thu, Nov 10, 2011 at 1:03 PM, Junjun Zhang <[hidden email]> wrote:
Hi Kevin,

BioMart 0.7 does not work well for handling large/long-running queries (snp marts are large), and the recent high server load may have made things worse. There are two options you can use to alleviate the situation.

  1. Break the query down into multiple queries, say, one query per chromosome. You need to add a filter like <Filter name = "chr_name" value = "1"/> to your query. This way, you can track the queries more easily, and rerun a failed query separately.
  2. Use the email notification option at martview web GUI (this is not available for script driven queries).
Hope this helps, let us know how it goes.
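Option 1 could be scripted along these lines. This is a sketch rather than a tested client: the attribute list and Query settings are copied from the query quoted in this thread, while `build_query`, `build_post_body`, and the `CHROMOSOMES` list (1 to 19, X, Y) are assumptions that should be checked against the chr_name values the mart actually accepts.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Attributes copied from the query quoted in this thread.
ATTRIBUTES = [
    "chr_name",
    "chrom_start",
    "refsnp_id",
    "ensembl_gene_stable_id",
    "consequence_type_tv",
]

# Assumption: the mouse chromosome names accepted by the chr_name filter.
CHROMOSOMES = [str(n) for n in range(1, 20)] + ["X", "Y"]


def build_query(chromosome, dataset="mmusculus_snp"):
    """Return the query XML restricted to a single chromosome."""
    attrs = "".join(f"<Attribute name={quoteattr(a)}/>" for a in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f'<Filter name="chr_name" value={quoteattr(chromosome)}/>'
        f"{attrs}</Dataset></Query>"
    )


def build_post_body(chromosome):
    """URL-encode the XML as the 'query' form field of the POST body."""
    return urlencode({"query": build_query(chromosome)})


if __name__ == "__main__":
    for chrom in CHROMOSOMES:
        # Each body can be saved to a file and sent with: curl -d @file <martservice URL>
        print(chrom, len(build_post_body(chrom)), "bytes")
```

Keeping one request per chromosome means a failed or timed-out piece can be rerun on its own.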

Best regards,
Junjun


From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 12:09:20 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Hi Arek,

Thanks for responding. I saw that biomart.org was back up so I tried again. I started my script and I am seeing the exact same effect as before. It starts normally, then after 10MB or so it will only periodically burst a bit of data across. In 34 minutes I am up to 113M (now at 37 minutes I am at 118M, but most of the time is spent sending no data at all). This has been happening like this for a little over a week, but I cannot speak to when it might have started because it has been several months since I did these data transfers from martservice (this is something I only do a few times a year). Looking back over my logs, these larger files have timed out nearly every time, some at around 5 hours, some at more than 10.

If you could look into this for me, it would be greatly appreciated.

Thanks,
Kevin

On Thu, Nov 10, 2011 at 7:37 AM, Arek Kasprzyk <[hidden email]> wrote:
Hi Kevin
there seem to be some problems with the service recently and now biomart.org is down. The OICR team are working to restore the service. Once it is restored, please try again and let us know if you are still experiencing those problems, and we'll be able to look into it in more detail

a

On Wed, Nov 9, 2011 at 11:59 AM, Kevin C. Dorff <[hidden email]> wrote:
Hi,

I periodically grab annotations files in TSV format using martservice via XML. One of the three files I transfer is relatively large (>500MB). It starts transferring at a normal speed but before too far into the file (10MB or so?) the transfer speed just bottoms out and then periodically bursts a little bit of data at a time before stopping transfer for a while again. The transfer that previously took maybe a couple hours now can take 10-20 hours, it seems, or worse, the connection just times out and after 10+ hours of transferring data I end up with an incomplete file.

I am using Curl to download the file. An example command line I would use that exhibits the problem is

curl -o var-annotations-unsorted.tsv.body --tr-encoding --verbose -d @query.xml http://www.biomart.org/biomart/martservice

Where query.xml contains the data (but the XML portion is URLEncoded per the directions by Curl)

query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6" ><Dataset name="mmusculus_snp" interface = "default" ><Attribute name="chr_name"/><Attribute name="chrom_start"/><Attribute name="refsnp_id"/><Attribute name="ensembl_gene_stable_id"/><Attribute name="consequence_type_tv"/></Dataset></Query>

I've tried this from both my work network and my home network to verify it wasn't an issue with our work network, and the same throttling behavior is exhibited. I'd be somewhat less concerned about it taking 10+ hours to complete if the transfers didn't sometimes time out after many hours of transfer.

Secondarily, I was hoping to speed up the transfer by providing the options "--tr-encoding" or "--compressed" options in Curl, which would allow the server to send the file over the wire as gzip, but it seems your server doesn't support this, which is too bad because that could easily cut down the number of bytes transferred by a factor of 10 or more. I've tried both options and neither seem to do anything with the martservice servers. Is there some other option I could specify that would compress the data over the wire or before transfer? I can handle nearly any file format on my side and would do nearly anything you offer to speed up these transfers.

Any suggestions?
Kevin



Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Junjun Zhang
Dear Kevin,

Thanks for your valuable feedback.

From: "Kevin C. Dorff" <[hidden email]>
Date: Fri, 11 Nov 2011 12:05:52 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Hi, Yes, this morning it is still very bursty and slow. Two of the chromosomes didn't transfer last night, possibly due to you restarting the server. I was able to transfer the two missing ones this morning. I can potentially re-write my script to keep trying missing pieces until all pieces transfer correctly. I've seen mention of a BioMart 0.8 - will this still have the same issues with large transfers?

Regarding the email notification, if this requires navigating the GUI, then receiving an email, and then going to download, this really isn't a great option for me. I need specific annotations and already have it scripted to grab the TSV files I need from martservice; my script knows the organism, the fields necessary, etc. and doesn't require a person who knows how to navigate the GUI: you just run the script with a few parameters and it fetches the files.

Agreed!


I don't believe you ALREADY implement it, but it seems like a third option would be great: I submit a query with an additional parameter that says I will download the query results when the file is created (similar to your email option in the GUI, but via martservice). The return at that point isn't a whole file but just a query-id. Your system then creates the file for download. At some point in the future my script can connect to martservice and ask if the result for the query-id is complete. If it is complete, I can then download it. If it is not yet complete, I can wait and ask again after a while. This would have the added advantage that when you create the file for download you could even gzip the result, if requested, to make the transfer faster.

Yes, this is exactly what we are doing. All queries will be trackable, and their status and results can be checked by the client at any time.

Best regards,
Junjun



Kevin

On Thu, Nov 10, 2011 at 5:50 PM, Junjun Zhang <[hidden email]> wrote:
In a previous email, I mentioned that things seemed OK on the db server side. Now that the biomart web servers have gone unresponsive, it seems the problem is on the biomart server side.

As I mentioned earlier, BioMart 0.7 is not able to handle a high query load well. Queries are not trackable, and when the server gets rebooted, all running queries are terminated.

Junjun


See the output of top below, to give you some idea of how heavy the server load was at the time the servers stopped responding to requests.

Server 1:

top - 17:41:08 up 13 days,  5:30,  2 users,  load average: 17.57, 17.57, 15.89
Tasks: 148 total,   1 running, 147 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.4%us,  2.3%sy,  0.0%ni, 69.8%id, 23.1%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  16472372k total, 16381820k used,    90552k free,    16880k buffers
Swap:        0k total,        0k used,        0k free,  6082432k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                  
17367 biomart   20   0 2705m 2.4g 5228 D    2 15.6   1:06.73 apache2                                                                                                                                                                   
17926 biomart   20   0 2771m 2.5g 4704 D    2 16.0   0:40.29 apache2                                                                                                                                                                   
17932 biomart   20   0 2696m 2.4g 4856 D    2 15.6   0:30.10 apache2                                                                                                                                                                   
18514 biomart   20   0 2696m 2.4g 4744 D    1 15.6   0:28.30 apache2                                                                                                                                                                   
18522 biomart   20   0 2760m 2.4g 4780 D    1 15.6   0:26.61 apache2                                                                                                                                                                   
18808 biomart   20   0 2696m 2.4g 4788 D    1 15.6   0:19.36 apache2                                                                                                                                                                   
19091 biomart   20   0 2703m 2.4g 5076 D    1 15.6   0:10.87 apache2                                                                                                                                                                   
19099 biomart   20   0 2696m 2.4g 4748 D    1 15.6   0:08.55 apache2                                                                                                                                                                   
19103 biomart   20   0 2696m 2.4g 4768 S    1 15.6   0:08.78 apache2                                                                                                                                                                   
19366 biomart   20   0 2699m 2.4g 4728 D    1 15.6   0:06.72 apache2                                                                                                                                                                   
 5812 biomart   20   0 2699m 2.4g 4752 D    1 15.6 190:12.44 apache2                                                                                                                                                                   
17911 biomart   20   0 2705m 2.4g 5096 D    1 15.6   0:55.96 apache2                                                                                                                                                                   
18524 biomart   20   0 2696m 2.4g 4716 D    1 15.6   0:27.75 apache2                                                                                                                                                                   
18544 biomart   20   0 2696m 2.4g 4736 D    1 15.6   0:19.10 apache2                                                                                                                                                                   
18822 biomart   20   0 2705m 2.4g 5140 D    1 15.6   0:15.19 apache2                                                                                                                                                                   
19105 biomart   20   0 2771m 2.4g 5192 D    1 15.6   0:07.97 apache2                                                                                                                                                                   
11513 biomart   20   0 2699m 2.4g 4936 S    1 15.6   2:12.54 apache2                                                                                                                                                                   
18810 biomart   20   0 2703m 2.4g 4960 D    1 15.6   0:13.70 apache2                                                                                                                                                                   
19107 biomart   20   0 2705m 2.4g 5152 D    1 15.6   0:09.45 apache2                                                                                                                                                                   
 6742 biomart   20   0 2699m 2.4g 5004 D    0 15.6   3:41.05 apache2                                                                                                                                                                   
 6917 biomart   20   0 6427m 6.1g 5116 S    0 38.7  66:20.79 apache2                                                                                                                                                                   
 7713 biomart   20   0 2784m 2.5g 4724 S    0 15.7  31:36.10 apache2                                                                                                                                                                   
 8997 biomart   20   0 2696m 2.4g 4640 S    0 15.6   5:29.66 apache2                                                                                                                                                                   
 9019 biomart   20   0 2705m 2.4g 5032 S    0 15.6   5:30.70 apache2                                                                                                                                                                   
18193 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:31.84 apache2                                                                                                                                                                   
18195 biomart   20   0 2705m 2.4g 5068 S    0 15.6   0:32.33 apache2                                                                                                                                                                   
18203 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:29.34 apache2                                                                                                                                                                   
18207 biomart   20   0 2696m 2.4g 4700 S    0 15.6   0:29.27 apache2                                                                                                                                                                   
18209 biomart   20   0 2696m 2.4g 4776 S    0 15.6   0:29.63 apache2                                                                                                                                                                   
18245 biomart   20   0 2696m 2.4g 4848 S    0 15.6   0:29.72 apache2                                                                                                                                                                   

Server 2:

top - 17:41:33 up 9 days, 10:23,  2 users,  load average: 14.76, 15.20, 14.32
Tasks: 146 total,   2 running, 144 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.5%us,  3.3%sy,  0.0%ni, 71.8%id, 22.2%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  16472372k total, 12159968k used,  4312404k free,    19040k buffers
Swap:        0k total,        0k used,        0k free,  2054872k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                 
29830 biomart   20   0 2705m 2.4g 5212 D    2 15.6   0:16.99 apache2                                                                                                                                                                   
29832 biomart   20   0 2696m 2.4g 4724 D    2 15.6   0:16.37 apache2                                                                                                                                                                   
29213 biomart   20   0 2697m 2.4g 4708 D    2 15.6   0:43.90 apache2                                                                                                                                                                   
29506 biomart   20   0 2699m 2.4g 4824 D    2 15.6   0:34.44 apache2                                                                                                                                                                   
29819 biomart   20   0 2705m 2.4g 5068 D    2 15.6   0:18.09 apache2                                                                                                                                                                   
29209 biomart   20   0 2761m 2.4g 4776 D    1 15.6   0:46.84 apache2                                                                                                                                                                   
29803 biomart   20   0 2703m 2.4g 4940 D    1 15.6   0:39.59 apache2                                                                                                                                                                   
30098 biomart   20   0 2705m 2.4g 5104 D    1 15.6   0:13.27 apache2                                                                                                                                                                   
23587 biomart   20   0 2763m 2.4g 4996 D    1 15.6   1:35.93 apache2                                                                                                                                                                   
29185 biomart   20   0 2697m 2.4g 4708 D    1 15.6   0:48.51 apache2                                                                                                                                                                   
29524 biomart   20   0 2696m 2.4g 4708 D    1 15.6   0:19.06 apache2                                                                                                                                                                   
30108 biomart   20   0 2696m 2.4g 4752 D    1 15.6   0:10.83 apache2                                                                                                                                                                   
30110 biomart   20   0 2696m 2.4g 4752 D    1 15.6   0:10.61 apache2                                                                                                                                                                   
28916 biomart   20   0 2709m 2.4g 4984 R    1 15.6   7:33.73 apache2                                                                                                                                                                   
29813 biomart   20   0 2696m 2.4g 4704 D    1 15.6   0:21.68 apache2                                                                                                                                                                   
18129 biomart   20   0 6455m 6.1g 5000 S    0 38.9  65:06.56 apache2                                                                                                                                                                  
18717 biomart   20   0 2740m 2.5g 4756 S    0 15.8  37:22.20 apache2                                                                                                                                                                  
20272 biomart   20   0 2696m 2.4g 4644 S    0 15.6   5:13.81 apache2                                                                                                                                                                  
21959 biomart   20   0 2802m 2.5g 5224 S    0 16.2   4:42.07 apache2                                                                                                                                                                  
22244 biomart   20   0 2696m 2.4g 4668 S    0 15.6   4:22.48 apache2                                                                                                                                                                  
26621 biomart   20   0 2709m 2.4g 5172 S    0 15.6   2:44.31 apache2                                                                                                                                                                  
29221 biomart   20   0 2707m 2.4g 5264 S    0 15.6   0:17.17 apache2                                                                                                                                                                  
29223 biomart   20   0 2696m 2.4g 4756 S    0 15.6   0:27.52 apache2                                                                                                                                                                  
29225 biomart   20   0 2696m 2.4g 4760 S    0 15.6   0:27.84 apache2                                                                                                                                                                  
29230 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:28.46 apache2                                                                                                                                                                  
29234 biomart   20   0 2696m 2.4g 4656 S    0 15.6   0:27.45 apache2                                                                                                                                                                  
29236 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:27.99 apache2                                                                                                                                                                  
29237 biomart   20   0 2696m 2.4g 4844 S    0 15.6   0:28.00 apache2                                                                                                                                                                  
29493 biomart   20   0 2696m 2.4g 4708 S    0 15.6   0:29.50 apache2                                                                                                                                                                  
29495 biomart   20   0 2696m 2.4g 4748 S    0 15.6   1:50.94 apache2                                                                                                                                                                  
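
As a rough aid to reading the listings above, the process states can be tallied with a short script. This is only a sketch: the function name and the three-row sample are illustrative, and it assumes the default `top` column order shown in the header row (state in the eighth column).

```python
from collections import Counter

# Three rows copied from the Server 2 listing above (trailing padding removed).
SAMPLE = """\
29830 biomart   20   0 2705m 2.4g 5212 D    2 15.6   0:16.99 apache2
28916 biomart   20   0 2709m 2.4g 4984 R    1 15.6   7:33.73 apache2
18129 biomart   20   0 6455m 6.1g 5000 S    0 38.9  65:06.56 apache2
"""

# Column order assumed: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
STATE_COLUMN = 7


def count_states(top_text):
    """Count processes per state letter, skipping summary and header lines."""
    counts = Counter()
    for line in top_text.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID; everything else is ignored.
        if len(fields) > STATE_COLUMN and fields[0].isdigit():
            counts[fields[STATE_COLUMN]] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_states(SAMPLE)))
```

Applied to the 30 rows listed for Server 2, this would count 14 processes in state D (uninterruptible sleep, usually waiting on I/O), 1 in R and 15 in S. On Linux, tasks in D and R both count toward the load average, which seems consistent with the reported load of about 15 and the 22.2% wa figure.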









From: jzhang <[hidden email]>
Date: Thu, 10 Nov 2011 17:40:37 -0500
To: jzhang <[hidden email]>, "Kevin C. Dorff" <[hidden email]>, "[hidden email]" <[hidden email]>

Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Just received an alert from our monitoring system: both BioMart servers have become unresponsive. I will have to restart them, and that will interrupt your running queries.

Junjun


From: jzhang <[hidden email]>
Date: Thu, 10 Nov 2011 17:22:11 -0500
To: "Kevin C. Dorff" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

I was watching the MySQL query log for a little while, and it isn't all that bad. I see a few queries on the snp mart for different chromosomes. The queries usually finish in around 30 seconds per batch, and one batch contains 5000 records. Some chromosomes should be done already. Are you still seeing slowness now?
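
To put the batch figures above in perspective, here is a back-of-the-envelope sketch. It assumes batches run one after another and that the database is the only bottleneck; the row count in the example is purely hypothetical, since the thread does not state how many rows the query returns.

```python
import math

BATCH_SIZE = 5000        # records per batch, from the message above
SECONDS_PER_BATCH = 30   # typical time per batch, from the message above


def estimated_hours(total_rows):
    """Estimate total query time in hours if batches run sequentially."""
    batches = math.ceil(total_rows / BATCH_SIZE)
    return batches * SECONDS_PER_BATCH / 3600


if __name__ == "__main__":
    # Hypothetical row count, chosen only to illustrate the arithmetic.
    print(f"{estimated_hours(10_000_000):.1f} hours")
```

At about 167 records per second, even a query that looks healthy in the log could take many hours once the result runs to millions of rows.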
Junjun

From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 16:50:58 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

I've modified my script to fetch these one chromosome at a time, as you mentioned. 

My fear is that given that the splits are very unbalanced (some chromosomes are clearly going to be much larger files than others) I'll still get timeouts. For instance, I am currently transferring chromosome "X" and it is exhibiting the same stalling / bursting behavior. 18 minutes so far and it's at 56.9MB (now at 22 minutes it is at 62.7MB) and just occasionally adding new data to the file but most of the time just sitting there transferring nothing. This feels to me like there is some flaw in the transfer system unless you've designed it to really throttle any transfers over a certain size and are throttling very, very aggressively. I'll review the transfer output tomorrow morning for timeouts, etc.
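
The figures quoted above can be turned into a rate with a few lines of arithmetic. This is just a sketch: the function names are made up for illustration, and the 500 MB size comes from the original post about the full file, not from the chromosome X transfer.

```python
def rate_mb_per_min(size_start_mb, size_end_mb, minutes_start, minutes_end):
    """Average transfer rate between two (size, elapsed time) samples."""
    return (size_end_mb - size_start_mb) / (minutes_end - minutes_start)


def hours_to_transfer(size_mb, mb_per_min):
    """Time needed to move size_mb at a constant rate, in hours."""
    return size_mb / mb_per_min / 60


if __name__ == "__main__":
    # 56.9MB at 18 minutes and 62.7MB at 22 minutes, as reported above.
    x_rate = rate_mb_per_min(56.9, 62.7, 18, 22)
    print(f"{x_rate:.2f} MB/min")  # 1.45 MB/min

    # Illustrative only: a 500 MB file at that sustained rate.
    print(f"{hours_to_transfer(500, x_rate):.1f} hours")  # 5.7 hours
```

That is roughly 24 kB/s during the stalled phase, far below what either network should be able to sustain.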

Kevin

On Thu, Nov 10, 2011 at 1:03 PM, Junjun Zhang <[hidden email]> wrote:
Hi Kevin,

BioMart 0.7 does not handle large or long-running queries well (the snp marts are large), and the recent high server load may have made things worse. There are two options you can use to alleviate the situation.

  1. Break the query down into multiple queries, say, one query per chromosome. You need to add a filter like <Filter name = "chr_name" value = "1"/> to your query. This way you can track each query more easily and rerun a failed query separately.
  2. Use the email notification option in the martview web GUI (this is not available for script-driven queries).
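
The first option could be scripted along these lines. This is a minimal sketch, not an official client: the attribute names and dataset come from the query in the original post, the Filter element is the one shown above, and the function names and the chromosome list are assumptions made for illustration.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Attribute names copied from the query in the original post.
ATTRIBUTES = [
    "chr_name",
    "chrom_start",
    "refsnp_id",
    "ensembl_gene_stable_id",
    "consequence_type_tv",
]

# Assumed chromosome names for mouse; check them against the dataset's chr_name values.
CHROMOSOMES = [str(n) for n in range(1, 20)] + ["X", "Y"]


def build_query(chromosome, dataset="mmusculus_snp"):
    """Return the martservice XML query restricted to a single chromosome."""
    attributes = "".join(f"<Attribute name={quoteattr(a)}/>" for a in ATTRIBUTES)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f'<Filter name="chr_name" value={quoteattr(str(chromosome))}/>'
        f"{attributes}</Dataset></Query>"
    )


def build_post_body(chromosome):
    """URL-encode the XML as the 'query' form field of the POST body."""
    return urlencode({"query": build_query(chromosome)})


if __name__ == "__main__":
    # Print one encoded body; a real script would POST one body per chromosome.
    print(build_post_body(CHROMOSOMES[0]))
```

Each body can be saved to a file and passed to curl with -d @file, one output file per chromosome, so that only the chromosomes that fail need to be fetched again.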
Hope this helps, let us know how it goes.

Best regards,
Junjun


From: "Kevin C. Dorff" <[hidden email]>
Date: Thu, 10 Nov 2011 12:09:20 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Transfer very, very slow from martservice on large-ish requests

Hi Arek,

Thanks for responding. I saw that biomart.org was back up so I tried again. I started my script and I am seeing the exact same effect as before. It starts normally, then after 10MB or so it will only periodically burst a bit of data across. In 34 minutes I am up to 113M (now at 37 minutes I am at 118M, but most of the time is spent sending no data at all). This has been happening for a little over a week, but I cannot speak to when it might have started because it has been several months since I did these data transfers from martservice (this is something I only do a few times a year). Looking back over my logs, these larger files have timed out nearly every time, some at around 5 hours, some after more than 10.

If you could look into this for me, it would be greatly appreciated.

Thanks,
Kevin

On Thu, Nov 10, 2011 at 7:37 AM, Arek Kasprzyk <[hidden email]> wrote:
Hi Kevin
there seem to be some problems with the service recently and now biomart.org is down. The OICR team are working to restore the service. Once it is restored, please try again and let us know if you are still experiencing those problems, and we'll be able to look into it in more detail

a

On Wed, Nov 9, 2011 at 11:59 AM, Kevin C. Dorff <[hidden email]> wrote:
Hi,

I periodically grab annotations files in TSV format using martservice via XML. One of the three files I transfer is relatively large (>500MB). It starts transferring at a normal speed but before too far into the file (10MB or so?) the transfer speed just bottoms out and then periodically bursts a little bit of data at a time before stopping transfer for a while again. The transfer that previously took maybe a couple hours now can take 10-20 hours, it seems, or worse, the connection just times out and after 10+ hours of transferring data I end up with an incomplete file.

I am using Curl to download the file. An example command line I would use that exhibits the problem is

curl -o var-annotations-unsorted.tsv.body --tr-encoding --verbose -d @query.xml http://www.biomart.org/biomart/martservice

Where query.xml contains the data (but the XML portion is URLEncoded per the directions by Curl)

query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6" ><Dataset name="mmusculus_snp" interface = "default" ><Attribute name="chr_name"/><Attribute name="chrom_start"/><Attribute name="refsnp_id"/><Attribute name="ensembl_gene_stable_id"/><Attribute name="consequence_type_tv"/></Dataset></Query>

I've tried this from both my work network and my home network to verify it wasn't an issue with our work network, and the same throttling behavior is exhibited. I'd be somewhat less concerned about it taking 10+ hours to complete if the transfers didn't sometimes time out after many hours of transfer.

Secondarily, I was hoping to speed up the transfer by providing the "--tr-encoding" or "--compressed" options in Curl, which would allow the server to send the file over the wire as gzip, but it seems your server doesn't support this, which is too bad because that could easily cut down the number of bytes transferred by a factor of 10 or more. I've tried both options and neither seems to do anything with the martservice servers. Is there some other option I could specify that would compress the data over the wire or before transfer? I can handle nearly any file format on my side and would do nearly anything you offer to speed up these transfers.

Any suggestions?
Kevin


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users