[biomart-users] local server install

[biomart-users] local server install

lynchjosh02

Good morning, I’ve been using R and the Bioconductor biomaRt package to query long gene-ID lists for 3’ UTRs using getBM(). Until recently this worked fine. Unfortunately, during the past week the response time per query has been around 15-30 s. Here is the setup:

> library("biomaRt")
> ensembl <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
+                    host = "www.ensembl.org",
+                    path = "/biomart/martservice",
+                    dataset = "mmusculus_gene_ensembl")
> attrib = c("ensembl_gene_id", "ensembl_transcript_id", "3utr")
> filts = "ensembl_gene_id"

And the response:

> system.time(getBM(attrib, filters = filts,
+                   values = "ENSMUSG00000066475", mart = ensembl))
   user  system elapsed 
   0.03    0.00   29.89 

Our network connection seems fine, and I think the bottleneck is on the database end. I’m therefore interested in installing BioMart locally. My goal is to run the same command against a local database to save time. Are BioMart 0.8 and the instructions at http://www.biomart.org/other/rc6_documentation.pdf what I am looking for? Thanks for your help.

-Josh

--
You received this message because you are subscribed to the Google Groups "biomart-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
Visit this group at http://groups.google.com/group/biomart-users.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [biomart-users] local server install

Syed Haider-5
Josh,

Try changing the host to www.biomart.org instead and see whether that is any faster. Installing mart databases just for this might be an over-investment, unless you are generating these queries in huge numbers per day.

Syed


--
Please consider environment before you print this!


[biomart-users] Re: local server install

lynchjosh02
In reply to this post by lynchjosh02
A similar query took about 20 s, so about the same. The lists I'm working with are in the ballpark of 23k lines long. I've been breaking them up into smaller pieces, but that is still about 6 days' worth of processing time, so a few of these will add up to weeks of waiting. That said, I'm not doing this analysis regularly, and I'm the only person who will be accessing the database.
       So I guess the question becomes: how much expense and time would deploying a server take? I'm not interested in the entire database, just the species I use frequently. I do not have a sense of how much of a headache this will be or whether it is worth the effort, so your expertise and insight are greatly appreciated!
-Josh
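
The "about 6 days" figure can be sanity-checked with a little arithmetic. The following is a minimal sketch in R, assuming one request per gene ID and a constant latency per request (both simplifications); `estimate_days` is a hypothetical helper name, not part of biomaRt.

```r
# Rough wall-clock estimate for querying one gene ID per request.
# Assumes a constant latency per request, which is a simplification.
estimate_days <- function(n_ids, seconds_per_query) {
  n_ids * seconds_per_query / (60 * 60 * 24)
}

estimate_days(23000, 20)  # a little over 5 days
estimate_days(23000, 30)  # close to 8 days
```

At the 15-30 s per query reported in the first post, a 23k-ID list therefore lands in the range of several days when sent one ID at a time.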



Re: [biomart-users] Re: local server install

Arek Kasprzyk
Hi Josh,
Another suggestion:
If your 'query pattern' is fairly repetitive, you may consider downloading two data tables from the Ensembl FTP site and processing them locally.
If this is something that interests you, I'll point you to the right tables.

a








--


"In prosperity, our friends know us; in adversity, we know our friends"

― John Churton Collins 




[biomart-users] Re: local server install

lynchjosh02
In reply to this post by lynchjosh02
OK. I might be interested in the tables. However, I reread (if you can believe that) the user notes and some other online posts. I saw one post (I think from you guys) suggesting that people submit their queries in batches rather than one at a time in a for loop. I ran one batch and the timing was no different, but another was considerably faster. So here is a much easier question: can you suggest a batch size, or tell me the upper limit? I'm guessing 23,000 is too big. I'll try whatever you suggest and circle back if it still takes too much time. Thanks again!

-josh 




Re: [biomart-users] Re: local server install

Arek Kasprzyk
Hi Josh,
I somehow missed the 23K from your emails. It would certainly help if you could send the IDs in smaller chunks. Perhaps 1K at a time?

a
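
One way to send the IDs "1K at a time" is sketched below in R. This is only an illustration under stated assumptions: `chunk_ids` and `fetch_in_chunks` are hypothetical helper names, `gene_ids` stands for the full ID vector, and `fetch` is any function that takes a character vector of IDs and returns a data frame (for example a thin wrapper around getBM()).

```r
# Split a vector of IDs into consecutive chunks of at most `size` elements.
chunk_ids <- function(ids, size = 1000) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Query each chunk in turn and stack the results into one data frame.
# `fetch` takes a character vector of IDs and returns a data frame.
fetch_in_chunks <- function(ids, fetch, size = 1000) {
  do.call(rbind, unname(lapply(chunk_ids(ids, size), fetch)))
}

# With the objects from the first post (not run here; needs network access,
# and `gene_ids` is a hypothetical vector holding the full ID list):
# utrs <- fetch_in_chunks(gene_ids, function(x)
#   getBM(attrib, filters = filts, values = x, mart = ensembl))
```

Passing the query function in as an argument keeps the chunking logic independent of the server, so the chunk size can be tuned without touching the query itself.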



[biomart-users] Re: local server install

lynchjosh02
In reply to this post by lynchjosh02
Well, I took a leap of faith after a batch of 100 and jumped to a batch of 10,000. Success! I think this took only minutes, so I'm really impressed. I do have one follow-up question, though. The final list (gene ID, transcript ID and 3' UTR) is not in the same order as the batch query. One of the first 10 IDs on my list is now at around position 14,000. Does the system return these in the order in which it is able to cross-reference them? Thanks for your help.

-Josh


Re: [biomart-users] Re: local server install

Arek Kasprzyk
Hi Josh,
I am glad it is working for you.

Re: the order of the final list.
BioMart delegates query processing to its relational backend. The order of the returned rows depends on decisions made by the RDBMS query optimizer, so it is not guaranteed to match the input order.

a
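
Since the returned order is not guaranteed, the rows can be put back into input order on the client side. The following is a minimal sketch in base R, not a biomaRt feature; `reorder_like_input` is a hypothetical helper, and the toy data frame only stands in for a getBM() result (its values are made up).

```r
# Reorder query results so rows follow the order of the input IDs.
# Genes with several transcripts keep all their rows, in returned order.
reorder_like_input <- function(result, input_ids, id_col = "ensembl_gene_id") {
  result[order(match(result[[id_col]], input_ids)), , drop = FALSE]
}

# Toy data standing in for a getBM() result (values are made up).
input_ids <- c("geneC", "geneA", "geneB")
result <- data.frame(
  ensembl_gene_id       = c("geneA", "geneB", "geneC", "geneA"),
  ensembl_transcript_id = c("tA1", "tB1", "tC1", "tA2"),
  stringsAsFactors = FALSE
)
reorder_like_input(result, input_ids)
```

`match()` gives each row the position of its gene ID in the input vector, and `order()` sorts on that position while leaving tied rows (multiple transcripts of one gene) in their original relative order.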

