biomaRt package in Bioconductor does not close remote connections

mxlaakso
Hello,

I have been using the biomaRt package from Bioconductor to fetch some
biological annotations. What I have noticed this week is that getBM() calls
leak TCP connections (probably via Curl). I have a loop that makes calls
such as:

annotations <- getBM(attributes=attributes,
                     filters   =filter.types,
                     values    =filter.value,
                     mart      =mart)

I can see each request creating a new open connection when I execute this
loop and monitor the open connections using the 'lsof' program. The whole loop
crashes after 1000 iterations because that exceeds the limit of simultaneously
open connections. Loops with fewer than 1000 iterations complete
with correct results, although the connections are left open.
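
For anyone who wants to reproduce the measurement from inside R rather than with lsof, here is a minimal sketch. It assumes Linux (it reads /proc/self/fd), and the helper names count_open_fds and fd_growth are made up for this example; they are not part of biomaRt or RCurl.

```r
# Count the file descriptors currently open in this R process.
# Linux-specific: relies on /proc/self/fd and returns NA elsewhere.
count_open_fds <- function(fd_dir = "/proc/self/fd") {
  if (!dir.exists(fd_dir)) return(NA_integer_)
  length(list.files(fd_dir))
}

# Call `step` n times and report the average number of descriptors
# gained per call. A value near 0 suggests that nothing is left open.
fd_growth <- function(step, n = 10) {
  before <- count_open_fds()
  for (i in seq_len(n)) step()
  after <- count_open_fds()
  (after - before) / n
}

# Possible use with the loop body from this post (not run here):
# fd_growth(function() getBM(attributes = attributes, filters = filter.types,
#                            values = filter.value, mart = mart))
```

A result close to 1 for the getBM() call would be consistent with one connection being left open per request, but descriptors other than sockets are counted too, so lsof remains the more precise check.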

I have also tried the curl parameter: I first call
curlHandle <- getCurlHandle()
and then pass this handle to the getBM() call, but that does not change
anything. Should I apply some kind of close call after each iteration?

Package:        biomaRt
Version:        2.4.0
Packaged:       2010-04-22 22:52:44 UTC; biocbuild
Built:          R 2.11.0; ; 2010-04-27 12:27:46 UTC; unix


Package:              RCurl
Version:              1.4-3
Date/Publication:     2010-07-25 12:15:39
Built:                R 2.11.1; x86_64-pc-linux-gnu; 2010-09-23
                      10:54:07 UTC; unix



Example output for the COSMIC Biomart:
  MART_NAME = "CosmicMart"
  MART_HOST = "www.sanger.ac.uk"
  MART_PATH = "/genetics/CGP/cosmic/biomart/martservice"
  MART_DSET = "COSMIC48"

$ lsof | grep sanger.ac
.
.
.
R         19974       myname  259u     IPv4             137937      0t0 TCP myhost:48226->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  260u     IPv4             137971      0t0 TCP myhost:48228->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  261u     IPv4             137984      0t0 TCP myhost:48230->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  262u     IPv4             138004      0t0 TCP myhost:48233->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  263u     IPv4             138016      0t0 TCP myhost:48235->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  264u     IPv4             138032      0t0 TCP myhost:48239->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  265u     IPv4             138077      0t0 TCP myhost:45214->ssl-slb11b.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  266u     IPv4             138102      0t0 TCP myhost:45228->ssl-slb11b.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  267u     IPv4             138116      0t0 TCP myhost:45230->ssl-slb11b.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  268u     IPv4             138123      0t0 TCP myhost:48263->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  269u     IPv4             138135      0t0 TCP myhost:48265->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  270u     IPv4             138147      0t0 TCP myhost:48267->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  271u     IPv4             138185      0t0 TCP myhost:48272->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  272u     IPv4             138198      0t0 TCP myhost:48274->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  273u     IPv4             138210      0t0 TCP myhost:48276->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  274u     IPv4             138226      0t0 TCP myhost:48282->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  275u     IPv4             138246      0t0 TCP myhost:48284->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R         19974       myname  276u     IPv4             138258      0t0 TCP myhost:48286->ssl-slb11a.sanger.ac.uk:www (ESTABLISHED)
R         19974       myname  277u     IPv4             138272      0t0 TCP myhost:48288->ssl-slb11a.sanger.ac.uk:www (ESTABLISHED)
R         19974       myname  278u     IPv4             138527      0t0 TCP myhost:48290->ssl-slb11a.sanger.ac.uk:www (ESTABLISHED)
R         19974       myname  279u     IPv4             138533      0t0 TCP myhost:45259->ssl-slb11b.sanger.ac.uk:www (ESTABLISHED)
R         19974       myname  280u     IPv4             138545      0t0 TCP myhost:45261->ssl-slb11b.sanger.ac.uk:www (ESTABLISHED)
R         19974       myname  281u     IPv4             138557      0t0 TCP myhost:45263->ssl-slb11b.sanger.ac.uk:www (ESTABLISHED)
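
A note on reading this output: CLOSE_WAIT means the remote end has already closed its side of the connection and the socket is waiting for the local process (here R) to close it. To summarise a long listing, something like the following sketch could be used. count_tcp_states is a hypothetical helper written for this post, and it only assumes that the state appears in parentheses at the end of a line.

```r
# Tally TCP states such as CLOSE_WAIT or ESTABLISHED in lsof output lines.
count_tcp_states <- function(lsof_lines) {
  # Keep only lines that end in a parenthesised state, e.g. "(CLOSE_WAIT)".
  pattern <- "^.*\\(([A-Z_0-9]+)\\)[[:space:]]*$"
  with_state <- grepl(pattern, lsof_lines)
  states <- sub(pattern, "\\1", lsof_lines[with_state])
  counts <- table(states)
  # Return a plain named integer vector, one entry per state.
  setNames(as.integer(counts), names(counts))
}

# Possible use on the current R process (not run here):
# count_tcp_states(system(paste("lsof -p", Sys.getpid()), intern = TRUE))
```

A steadily growing CLOSE_WAIT count during the loop would point at sockets that the client side has not closed yet.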


The final error message that I get after 1000 open connections is:
[STDERR] Error in value[[3L]](cond) :
[STDERR]   Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.
[STDERR] Calls: main ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
[STDERR] Error during wrapup: cannot open the connection
[STDERR] Execution halted

My R version is 2.11.1 (2010-05-31).

Best regards,
Marko Laakso

Re: biomaRt package in Bioconductor does not close remote connections

Steffen Durinck
Hi Marko,

Thank you for reporting this issue. It is definitely something that has to be fixed ASAP.
That said, it is usually recommended to use biomaRt for batch queries rather than in loops.
Is there any chance you can query for all the annotations you need at once and then loop over the result in R?
That way you would need only a few biomaRt queries, which would avoid the TCP connection leak.

Let me know if you need help converting your 1000+ queries into one batch query.
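
As a rough sketch of what such a conversion might look like: the helper below sends the IDs in a few large chunks and then groups the combined result by ID, so the per-ID loop runs locally in R. The name batch_annotations, the fetch argument and the chunk size of 500 are assumptions made for illustration; fetch is expected to return a data.frame whose first column is the filter value of each row.

```r
# Hypothetical helper: a few large queries instead of one query per ID.
batch_annotations <- function(ids, fetch, chunk_size = 500) {
  ids <- unique(ids)
  if (length(ids) == 0) return(list())
  # Split the IDs into chunks of at most `chunk_size` elements.
  chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
  # One remote call per chunk, then stack the results.
  results <- do.call(rbind, unname(lapply(chunks, fetch)))
  rownames(results) <- NULL
  # Group rows by the ID in the first column for local looping.
  split(results, results[[1]])
}

# With biomaRt, `fetch` could wrap getBM() like this (not run here;
# the attribute and filter names are placeholders):
# fetch <- function(values) {
#   getBM(attributes = c("id_attribute", "other_attribute"),
#         filters    = "id_filter",
#         values     = values,
#         mart       = mart)
# }
# per_id <- batch_annotations(all_ids, fetch)
```

The ID attribute has to be requested as the first attribute so that each result row can be traced back to the value that produced it.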

Cheers,
Steffen

On Tue, Sep 28, 2010 at 11:59 PM, mxlaakso <[hidden email]> wrote:
[...]


Re: biomaRt package in Bioconductor does not close remote connections

mxlaakso
Hi Steffen,

Thank you for your response!

I usually use batch queries, but I cannot do that in this case. I
would need attributes from two different pages in order to match the
filters to the attributes uniquely.
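
One possible way around the page restriction, assuming both attribute pages expose a shared key, would be to run one batch query per page and join the results locally with merge(). The data frames below are made-up stand-ins for two getBM() results, and the column names id, feature and structure are placeholders, so this only sketches the join step.

```r
# Stand-ins for the results of two separate batch queries, one per page.
page_a <- data.frame(id = c("g1", "g2", "g3"),
                     feature = c("x", "y", "z"),
                     stringsAsFactors = FALSE)
page_b <- data.frame(id = c("g1", "g1", "g3"),
                     structure = c("s1", "s2", "s3"),
                     stringsAsFactors = FALSE)

# Inner join on the shared key; all = TRUE would keep unmatched rows too.
combined <- merge(page_a, page_b, by = "id")
print(combined)
```

This only helps if the shared key is enough to pair the rows unambiguously, which may not hold for my data.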

Do you think this is a server-side issue (that is, the servers not closing
their streams)?

Br,
Marko

On Wed, 29 Sep 2010 09:07:17 -0700, Steffen Durinck <[hidden email]> wrote:
> [...]