[BioMart Users] bug when gene ID contains an apostrophe

Timothée Flutre (Fri, 15 Jul 2011)
Hello,

I am using the R package "biomaRt" to find Ensembl IDs from a list of HGNC gene symbols:

source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt",lib="~/src/Rlibs/")
library(biomaRt)
mart.ens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
getBM(attributes=c("ensembl_gene_id", "hgnc_symbol"), filters="hgnc_symbol", values="IPO8", mart=mart.ens)
  ensembl_gene_id hgnc_symbol
1 ENSG00000133704        IPO8

This works well until an HGNC symbol contains an apostrophe (here "2'-PDE" is an alias for the gene "PDE12", see here):

getBM(attributes=c("ensembl_gene_id", "hgnc_symbol"), filters="hgnc_symbol", values="2'-PDE", mart=mart.ens)
Error in getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"), filters = "hgnc_symbol",  :
  Query ERROR: caught BioMart::Exception: non-BioMart die():
not well-formed (invalid token) at line 1, column 322, byte 322 at /usr/lib/perl5/XML/Parser.pm line 187

I know how to work around this for my own case, but would it be possible to fix this for a future release?
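A minimal sketch of one possible workaround (Tim does not say which one he used): set aside values that contain characters with special meaning in XML before calling getBM(). The helper `split_xml_unsafe` is hypothetical and not part of biomaRt.

```r
# Hypothetical helper: separate values containing XML special characters
# (' " < > &) from values that are safe to send unchanged.
split_xml_unsafe <- function(values) {
  unsafe <- grepl("['\"<>&]", values)
  list(safe = values[!unsafe], unsafe = values[unsafe])
}

parts <- split_xml_unsafe(c("IPO8", "2'-PDE", "PDE12"))
parts$safe
parts$unsafe
```

The safe subset can then be passed as `values = parts$safe` in the getBM() call above; the unsafe values still need separate handling.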

Best regards,
Tim


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Re: [BioMart Users] bug when gene ID contains an apostrophe

Marie Wong-Erasmus (Fri, 15 Jul 2011)
Hi Tim,

You might want to post this to the Bioconductor mailing list so that they can handle single quotes in the value field.

Either way, aliases should just return an empty set: only HGNC symbols that are not synonyms have an associated ENSG ID.
For example, IMP8 is an alias for IPO8, so querying IMP8 returns an empty set, which is also what querying 2'-PDE should return.
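To see which queried symbols came back empty (for example aliases such as IMP8), compare the query against the returned `hgnc_symbol` column. This is a sketch: `res` stands in for the data.frame returned by getBM() and is built by hand from the IPO8 result shown earlier; `queried` is an illustrative input.

```r
# 'res' mimics a getBM() result; it reuses the IPO8 row shown in this thread.
res <- data.frame(ensembl_gene_id = "ENSG00000133704", hgnc_symbol = "IPO8",
                  stringsAsFactors = FALSE)
queried <- c("IPO8", "IMP8")

# Queried symbols with no returned row, i.e. no associated Ensembl gene ID
unmatched <- setdiff(queried, res$hgnc_symbol)
unmatched
```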

Marie

Re: [BioMart Users] bug when gene ID contains an apostrophe

Steffen Durinck-2 (Fri, 15 Jul 2011)
The error is thrown on the BioMart side; I don't think it likes quotes in gene symbols.

In addition, if you want a mapping between all gene symbols and Ensembl gene IDs, you can get it with a single query in R:

library(biomaRt)
mart.ens <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
map = getBM(attributes=c("ensembl_gene_id", "hgnc_symbol"), filters="with_hgnc", values=TRUE, mart=mart.ens)

head(map)

  ensembl_gene_id hgnc_symbol
1 ENSG00000249567       MIMT1
2 ENSG00000246493       SNHG8
3 ENSG00000187667     WHAMML1
4 ENSG00000248334     WHAMML2
5 ENSG00000225273    UBE2Q2P2
6 ENSG00000186615    C14orf33
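A sketch of how such a table can then be used: look symbols up locally with base R's `match()`, so no filter values (and therefore no apostrophes) are sent to the server. It assumes `map` has the two columns shown above; only two of its rows are reproduced here for illustration, and `symbols` is a hypothetical input.

```r
# 'map' stands in for the full table returned by the filters="with_hgnc"
# query; only two rows from the output above are reproduced here.
map <- data.frame(
  ensembl_gene_id = c("ENSG00000249567", "ENSG00000246493"),
  hgnc_symbol     = c("MIMT1", "SNHG8"),
  stringsAsFactors = FALSE
)

symbols <- c("SNHG8", "2'-PDE", "MIMT1")

# The lookup happens locally in R, so quotes in symbols never reach the
# query XML; symbols without a mapping (e.g. aliases) yield NA.
ids <- map$ensembl_gene_id[match(symbols, map$hgnc_symbol)]
names(ids) <- symbols
ids
```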


Steffen
 

Re: [BioMart Users] bug when gene ID contains an apostrophe

Junjun Zhang (Thu, 4 Aug 2011)
Hi Steffen,

Yes, the error is thrown on the BioMart server side, but that is because the query XML the server received is not well-formed (as the error message shows).

The original query XML probably looks something like this:
<Query>
	<Dataset name = 'hsapiens_gene_ensembl'>
		<Filter name = 'hgnc_symbol' value = '2'-PDE'/>
		<Attribute name = 'ensembl_gene_id' />
		<Attribute name = 'hgnc_symbol' />
	</Dataset>
</Query>
If so, the solution is simple: we just need to escape the apostrophe as "&apos;":
<Query>
	<Dataset name = 'hsapiens_gene_ensembl'>
		<Filter name = 'hgnc_symbol' value = '2&apos;-PDE'/>
		<Attribute name = 'ensembl_gene_id' />
		<Attribute name = 'hgnc_symbol' />
	</Dataset>
</Query>
Would it be possible for biomaRt to escape special characters (e.g., '<', '>', '&') in the query XML before sending it to the BioMart server?
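Junjun's proposal can be sketched in R as a small escaping helper applied to each value before it is written into the query XML. `escape_xml` is a hypothetical name; this is not biomaRt's actual implementation.

```r
# Hypothetical helper: replace the five XML special characters with their
# predefined entities. '&' goes first so the entities added afterwards are
# not escaped a second time.
escape_xml <- function(x) {
  x <- gsub("&",  "&amp;",  x, fixed = TRUE)
  x <- gsub("<",  "&lt;",   x, fixed = TRUE)
  x <- gsub(">",  "&gt;",   x, fixed = TRUE)
  x <- gsub("'",  "&apos;", x, fixed = TRUE)
  x <- gsub("\"", "&quot;", x, fixed = TRUE)
  x
}

# Build the escaped <Filter> line from the example query above
filter_tag <- sprintf("<Filter name = '%s' value = '%s'/>",
                      "hgnc_symbol", escape_xml("2'-PDE"))
filter_tag
```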
Thanks,
Junjun

Re: [BioMart Users] bug when gene ID contains an apostrophe

Steffen Durinck-2
Hi Junjun,

This should be possible, I'll look into it.

Cheers,
Steffen

Re: [BioMart Users] bug when gene ID contains an apostrophe

Christian Pérez-Llamas (Fri, 5 Aug 2011)
In reply to this post by Junjun Zhang
Hi Junjun,

I had a similar problem with filter values that contain commas (',').

Example:

Values for the filter:
- carcinoma, nos
- adenocarcinoma

Query filter tag:

<Filter name="icdo_morphology" value="carcinoma, nos, adenocarcinoma" />

This is interpreted as three values:
- carcinoma
- nos
- adenocarcinoma
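The behaviour described above can be mimicked in R, assuming the server simply splits the value attribute on commas and trims surrounding spaces (an assumption for illustration; this is not BioMart's actual parser):

```r
values <- c("carcinoma, nos", "adenocarcinoma")

# Joining the values into one comma-separated attribute loses the boundary
# between "carcinoma" and "nos" ...
joined <- paste(values, collapse = ", ")

# ... so splitting on commas on the receiving side yields three values.
received <- trimws(strsplit(joined, ",", fixed = TRUE)[[1]])
received
```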

How can I work around that?

These problems would disappear if each value could be given as its own element:

<Filter name="icdo_morphology">
  <Value>carcinoma, nos</Value>
  <Value>adenocarcinoma</Value>
</Filter>
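A sketch of generating the proposed per-value form from an R character vector. The `<Value>` element is only a proposal in this thread, not a documented part of the BioMart query format, and `filter_with_values` is a hypothetical helper:

```r
# Hypothetical helper emitting the proposed <Filter>/<Value> form.
filter_with_values <- function(name, values) {
  # In element content only '&', '<' and '>' need escaping ('&' first).
  esc <- gsub("&", "&amp;", values, fixed = TRUE)
  esc <- gsub("<", "&lt;", esc, fixed = TRUE)
  esc <- gsub(">", "&gt;", esc, fixed = TRUE)
  inner <- paste0("  <Value>", esc, "</Value>", collapse = "\n")
  paste0('<Filter name="', name, '">\n', inner, '\n</Filter>')
}

xml <- filter_with_values("icdo_morphology", c("carcinoma, nos", "adenocarcinoma"))
cat(xml, sep = "\n")
```

Because each value sits in its own element, a comma inside a value is no longer ambiguous, and an apostrophe needs no escaping in element content.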

Do you think this would be possible for the 0.8 API?

You could keep backward compatibility by supporting both forms: the value attribute and Value elements.

Best regards,
Christian Perez-Llamas

Re: [BioMart Users] bug when gene ID contains an apostrophe

Junjun Zhang
Hi Christian,

Thanks for bringing this up. Yes, we are aware of this issue and are considering various solutions, including the one you suggested.

Cheers,
Junjun

