[BioMart Users] Fwd: Returned mail: see transcript for details


[BioMart Users] Fwd: Returned mail: see transcript for details

Henri-Jean GARCHON


-------- Original Message --------
Subject: Returned mail: see transcript for details
Date: Fri, 24 Jun 2011 09:48:20 +0100
From: Mail Delivery Subsystem [hidden email]
To: [hidden email]


The original message was received at Fri, 24 Jun 2011 09:48:20 +0100
from mx1.ebi.ac.uk [193.62.197.214]

   ----- The following addresses had permanent fatal errors -----
[hidden email]
    (reason: 550 Host unknown)
    (expanded from: [hidden email])

   ----- Transcript of session follows -----
550 5.1.2 [hidden email]... Host unknown (Name server: biomart.org.redirect: host not found)


Hi,

I am not sure whether there is a mailing list to share my (lack of) experience!
However, I am contacting you because I found the following issue with queries for Entrez Gene IDs.
I first noticed it with the biomaRt Bioconductor package, but it is clearly not a biomaRt issue.
When I request Entrez Gene IDs, duplicated rows are returned: one with the correct Entrez ID, the others empty.

For example:
Input:
ENSMUSG00000026073
ENSMUSG00000026180
ENSMUSG00000053846
ENSMUSG00000035208
ENSMUSG00000069830
ENSMUSG00000078853
ENSMUSG00000020826
ENSMUSG00000030077
ENSMUSG00000050075
ENSMUSG00000070501

Normal output:
Ensembl Gene ID	Associated Gene Name	MGI symbol	Chromosome Name	Gene Start (bp)	Gene End (bp)
ENSMUSG00000020826	Nos2	Nos2	11	78734289	78773756
ENSMUSG00000026073	Il1r2	Il1r2	1	40141613	40182064
ENSMUSG00000026180	Cxcr2	Cxcr2	1	74200563	74207812
ENSMUSG00000030077	Chl1	Chl1	6	103460870	103699671
ENSMUSG00000035208	Slfn8	Slfn8	11	82815660	82834312
ENSMUSG00000050075	Gpr171	Gpr171	3	58900370	58905743
ENSMUSG00000053846	Lipg	Lipg	18	75098976	75120917
ENSMUSG00000069830	Nlrp1a	Nlrp1a	11	70904699	70958290
ENSMUSG00000070501	BC094916	BC094916	1	175449848	175466088
ENSMUSG00000078853	Igtp	Igtp	11	58013058	58021093

Output with Entrez Gene ID:
Ensembl Gene ID	Associated Gene Name	MGI symbol	Chromosome Name	Gene Start (bp)	Gene End (bp)	EntrezGene ID
ENSMUSG00000020826	Nos2	Nos2	11	78734289	78773756	18126
ENSMUSG00000026073	Il1r2	Il1r2	1	40141613	40182064	16178
ENSMUSG00000026073	Il1r2	Il1r2	1	40141613	40182064	
ENSMUSG00000026180	Cxcr2	Cxcr2	1	74200563	74207812	12765
ENSMUSG00000026180	Cxcr2	Cxcr2	1	74200563	74207812	12765
ENSMUSG00000030077	Chl1	Chl1	6	103460870	103699671	
ENSMUSG00000030077	Chl1	Chl1	6	103460870	103699671	12661
ENSMUSG00000030077	Chl1	Chl1	6	103460870	103699671	
ENSMUSG00000035208	Slfn8	Slfn8	11	82815660	82834312	276950
ENSMUSG00000035208	Slfn8	Slfn8	11	82815660	82834312	100505399
Ten rows are returned, but they cover only 5 distinct genes, so 5 of the input genes are missing.
Of course, the issue might be on the Entrez side.
Thank you in advance for your help.
Kind regards

Dr Henri-Jean Garchon

Henri-Jean Garchon, MD, PhD
Director of Research
Cochin Institute and Inserm U1016
Department of Immunology
Laboratory of Chronic Inflammation
Pavillon Hardy A 1st floor
27 rue du Faubourg Saint-Jacques
75014 Paris
France
phone +33 1 40 51 66 06
fax:+33 1 40 51 66 41
mail: [hidden email]


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Re: [BioMart Users] Fwd: Returned mail: see transcript for details

Elena Rivkin
Dr. Henri-Jean Garchon, 

You see an EntrezGene ID on only a subset of rows because some transcripts do not have an EntrezGene ID associated with them. If you select Ensembl Transcript ID as an attribute, you will see which transcripts correspond to which EntrezGene ID.

For example:
ENSMUSG00000026073 (Il1r2) - only one of its transcripts (ENSMUST00000027243) has an EntrezGene ID.

And

ENSMUSG00000035208 (Slfn8) - has two different EntrezGene IDs, although only one transcript (ENSMUST00000038141).
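
To illustrate the point about transcripts, here is a minimal Python sketch (plain Python, not BioMart code) of how transcript-level records can collapse into gene-level rows, one of which has an empty EntrezGene ID. The gene ID, the transcript ENSMUST00000027243 and the Entrez ID 16178 come from this thread; the second transcript name is a placeholder invented for illustration, and the real internal layout of the mart may differ.

```python
# Transcript-level records: (gene ID, transcript ID, EntrezGene ID or "").
# "TRANSCRIPT_WITHOUT_ENTREZ" is a hypothetical placeholder, not a real ID.
records = [
    ("ENSMUSG00000026073", "ENSMUST00000027243", "16178"),
    ("ENSMUSG00000026073", "TRANSCRIPT_WITHOUT_ENTREZ", ""),
]


def project(rows, keep_transcript):
    """Return the rows with or without the transcript column."""
    if keep_transcript:
        return list(rows)
    return [(gene, entrez) for gene, _transcript, entrez in rows]


# Without the transcript column, the same gene appears twice:
# once with its EntrezGene ID and once with an empty field.
gene_level = project(records, keep_transcript=False)
for row in gene_level:
    print("\t".join(row))
```

Keeping the transcript column (`keep_transcript=True`) makes it visible which transcript each row belongs to.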

I hope it helps. 

Elena

From: Henri-Jean GARCHON <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Fri, 24 Jun 2011 05:00:03 -0400
To: "[hidden email]" <[hidden email]>
Subject: [BioMart Users] Fwd: Returned mail: see transcript for details




Re: [BioMart Users] Fwd: Returned mail: see transcript for details

Junjun Zhang
In reply to this post by Henri-Jean GARCHON
Dear Henri-Jean,

To get the full query result in the BioMart web query interface (martview), choose “All” rows on the results page and select the “Unique results only” checkbox. Below is the result I got after doing so:

Ensembl Gene ID MGI ID MGI symbol EntrezGene ID
ENSMUSG00000020826 MGI:97361 Nos2 18126
ENSMUSG00000026073 MGI:96546 Il1r2 16178
ENSMUSG00000026073 MGI:96546 Il1r2 
ENSMUSG00000026180 MGI:105303 Cxcr2 12765
ENSMUSG00000030077 MGI:1098266 Chl1 
ENSMUSG00000030077 MGI:1098266 Chl1 12661
ENSMUSG00000035208 MGI:2672859 Slfn8 276950
ENSMUSG00000035208 MGI:2672859 Slfn8 100505399
ENSMUSG00000035208 MGI:2672859 Slfn8 
ENSMUSG00000050075 MGI:2442043 Gpr171 229323
ENSMUSG00000053846 MGI:1341803 Lipg 16891
ENSMUSG00000069830 MGI:2684861 Nlrp1a 195046
ENSMUSG00000069830 MGI:2684861 Nlrp1a 
ENSMUSG00000070501 MGI:3584522 BC094916 545384
ENSMUSG00000070501 MGI:3584522 BC094916 
ENSMUSG00000078853 MGI:107729 Igtp 16145
ENSMUSG00000078853 MGI:107729 Igtp 

The result does not miss any of your 10 input mouse genes, and each of them has at least one corresponding EntrezGene ID. You can safely ignore the rows with a missing EntrezGene ID. Such duplicates arise from how the data is organized internally.

If you use biomaRt, there should be a way to remove the size limit on the returned result so that you do not see just the first 10 rows.
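
For anyone post-processing an exported table, the two steps described above (unique rows only, then ignoring rows with a missing EntrezGene ID) could be sketched in Python roughly as follows. This is an illustration under assumptions, not part of BioMart or biomaRt: the function name `clean` is made up, and the sample rows are taken from the tables in this thread, reduced to two columns.

```python
def clean(rows):
    """Remove exact duplicate rows, then drop blank-Entrez rows for genes
    that have at least one non-blank EntrezGene ID. Input order is kept.

    rows: list of (ensembl_gene_id, entrez_gene_id) tuples, "" if missing.
    """
    unique = list(dict.fromkeys(rows))  # order-preserving de-duplication
    genes_with_id = {gene for gene, entrez in unique if entrez}
    # A gene that only ever has blank rows is kept, so no input gene is lost.
    return [(g, e) for g, e in unique if e or g not in genes_with_id]


rows = [
    ("ENSMUSG00000026073", "16178"),
    ("ENSMUSG00000026073", ""),
    ("ENSMUSG00000026180", "12765"),
    ("ENSMUSG00000026180", "12765"),
    ("ENSMUSG00000035208", "276950"),
    ("ENSMUSG00000035208", "100505399"),
    ("ENSMUSG00000035208", ""),
]
cleaned = clean(rows)
for gene, entrez in cleaned:
    print(gene, entrez, sep="\t")
```

Genes that map to two EntrezGene IDs, such as ENSMUSG00000035208, deliberately keep both rows.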

Let us know if you still have any questions.

Best regards,

Junjun



On 11-06-24 5:00 AM, "Henri-Jean GARCHON" <henri-jean.garchon@...> wrote:


From: Henri-Jean GARCHON <henri-jean.garchon@...>
Reply-To: <henri-jean.garchon@...>
Date: Fri, 24 Jun 2011 04:48:24 -0400
To: <mart-dev@...>
Subject: biomaRt and ENTREZ issue


Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Henri-Jean GARCHON
In reply to this post by Elena Rivkin
Dear Elena, Dear JunJun,

Many thanks to both of you for having taken the time to address my request a month ago.
I agree that using the viewer with the limit set to 10 to illustrate my issue was not very bright! Apologies.

I must say that things have changed substantially since then and look much better today.
The output file generated after retrieving "EntrezGene.ID" is a lot more consistent than a month ago. There are fewer duplicates (actually no duplicate rows in the output table, as there used to be) and far fewer "NA" entries from Entrez (although I checked that these null entries have an associated gene name). I guess these are issues with the Entrez database. Perhaps most importantly, all input Ensembl.Gene.IDs are present in the output table.

My concern now is with the retrieval of Ensembl.Transcript.ID, a default attribute of BioMart:

I am working with a list of 22308 Ensembl gene IDs mapped on the Affymetrix Mouse Gene 1.0 ST microarray.
I uploaded this list on the Biomart.org website to filter my query.
The database is Ensembl build 63, the dataset is NCBIM37.
I retrieve the output as a TSV file ("export all results to",
not checking "unique results only").
I then go to R to check this output file.

The output table has 75966 rows, of which 59458 are distinct. In other words, 42950 rows occur only once and 16508 occur more than once. Why some rows are duplicated and others are not might perhaps be explained.
My main concern is that 6467 input Ensembl.Gene.IDs are not retrieved and are missing from the output table. These are bona fide genes with regular associated gene names. If I upload the list of these missing genes on its own, I do get the corresponding transcripts. All of them are retrieved and there are no duplicate rows!
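
The checks described above were done in R; the same bookkeeping could be sketched in Python as below. This is only an illustration: the function names, the assumption that the first TSV column holds the Ensembl gene ID, and the toy IDs (`G1`, `T1`, ...) are all placeholders, not taken from the actual export.

```python
import csv
from collections import Counter


def load_tsv(path):
    """Read a tab-separated export into a list of tuples, skipping the header."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # header line
        return [tuple(row) for row in reader]


def summarize(rows, input_ids):
    """Count total, distinct and repeated rows, and list input IDs never returned.

    Assumes the first field of each row is the Ensembl gene ID.
    """
    counts = Counter(rows)
    returned = {row[0] for row in rows}
    return {
        "total_rows": len(rows),
        "distinct_rows": len(counts),
        "rows_seen_once": sum(1 for n in counts.values() if n == 1),
        "rows_seen_repeatedly": sum(1 for n in counts.values() if n > 1),
        "missing_input_ids": sorted(set(input_ids) - returned),
    }


# Toy data with placeholder IDs: one repeated row and one missing gene.
rows = [("G1", "T1"), ("G1", "T1"), ("G1", "T2"), ("G2", "T3")]
summary = summarize(rows, ["G1", "G2", "G3"])
print(summary)
```

With the figures reported in this message, the two categories add up as expected: 42950 + 16508 = 59458 distinct rows.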

In anticipation I thank you very much for your valuable help and comments

Best regards

Henri-Jean


On 24/06/2011 15:24, Elena Rivkin wrote:

Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Junjun Zhang
Dear Henri-Jean,

Thanks for sending us the lists of IDs for testing. Based on Elena's test here, we can confirm that the problem exists. However, we have not yet had a chance to look into it closely enough to figure out exactly what causes it. It very likely has something to do with BioMart 0.7's query batching (which I described a few days ago in another thread), which may result in missing or duplicated rows in the result. As I mentioned earlier, this will not happen in 0.8, where batching is implemented in a different way.

We will continue the investigation and get back to you when we find something concrete.

Best regards,
Junjun


From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thu, 4 Aug 2011 09:49:43 -0400
To: Elena Rivkin <[hidden email]>, jzhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval


Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Junjun Zhang
Dear Henri-Jean,

We executed the same query directly against the database using an SQL SELECT statement, and also tested it on a BioMart 0.8 server. Neither shows any missing or duplicated results, so the problem is caused by BioMart 0.7 query batching.
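
Until a fixed server is available, one possible client-side precaution is to submit the ID list in smaller chunks, de-duplicate the rows, and report which IDs never came back so they can be resubmitted (resubmitting the missing IDs on their own did retrieve them, as reported earlier in this thread). The sketch below is an assumption-laden illustration: this thread does not describe how 0.7 batching works internally, `fetch` is a hypothetical callable standing in for whatever client is used (it is not a BioMart API), and the chunk size of 500 is arbitrary.

```python
def query_in_chunks(ids, fetch, chunk_size=500):
    """Query ids chunk by chunk and return (rows, missing_ids).

    fetch: hypothetical callable taking a list of gene IDs and returning
    a list of (gene_id, transcript_id) tuples.
    """
    rows = []
    seen = set()
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        for row in fetch(chunk):
            if row not in seen:  # drop rows repeated within or across chunks
                seen.add(row)
                rows.append(row)
    returned = {gene for gene, _transcript in rows}
    missing = [gene for gene in ids if gene not in returned]
    return rows, missing


def fake_fetch(chunk):
    """Stand-in data source with placeholder IDs, for demonstration only."""
    table = {"G1": ["T1", "T2"], "G2": ["T3"]}
    return [(g, t) for g in chunk for t in table.get(g, [])]


rows, missing = query_in_chunks(["G1", "G2", "G3"], fake_fetch, chunk_size=2)
print(rows)
print(missing)
```

IDs listed in `missing` may be genuinely absent from the dataset or may have been dropped by the server, so they are worth resubmitting separately before drawing conclusions.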

Please try this if you'd like to test your gene IDs:

Cheers,
Junjun


From: jzhang <[hidden email]>
Date: Fri, 5 Aug 2011 01:33:34 -0400
To: "[hidden email]" <[hidden email]>, Elena Rivkin <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval


Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Henri-Jean GARCHON
Dear Junjun,

Thanks very much for your prompt feedback.
I agree that it's not worth spending time fixing the issue if the 0.8 release is coming up soon.
I tried the link.
There is one issue in the Filters dialog box: when I try to click on any "upload file" button, the window jumps back to the top.
Otherwise, the interface looks really great.
Best wishes

Henri-Jean

On 05/08/2011 22:45, Junjun Zhang wrote:

Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Henri-Jean GARCHON
In reply to this post by Junjun Zhang
Dear Junjun,

Following up on my previous mail: the issue was with the browser. I was using Opera (my default).
With Internet Explorer, the "upload file" button works fine.

There are no duplicates in the output transcript list (93805 transcripts), which is good.
The issue, however, is that all the Ensembl genes (n = 36814) are retrieved, not just those in my list.
I get an identical result if I upload a 5000-gene list.
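
A quick way to confirm that an uploaded list is being ignored as a filter is to compare the set of returned gene IDs with the set of uploaded ones. The Python sketch below is illustrative only; the function name and the toy IDs are placeholders.

```python
def filter_report(input_ids, returned_gene_ids):
    """Compare uploaded gene IDs with the gene IDs found in the output."""
    wanted = set(input_ids)
    returned = set(returned_gene_ids)
    return {
        "unexpected": sorted(returned - wanted),  # returned but never requested
        "missing": sorted(wanted - returned),     # requested but not returned
        "filter_respected": returned <= wanted,
    }


# Placeholder IDs: two genes uploaded, four genes returned.
report = filter_report(["G1", "G2"], ["G1", "G2", "G3", "G4"])
print(report)
```

If the output covers every gene in the dataset whatever list is uploaded, `unexpected` will be large and `filter_respected` will be `False`.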

Best wishes

Henri-Jean


Le 05/08/2011 22:45, Junjun Zhang a écrit :
Dear Henri-Jean,

After executing the same query directly using SQL SELECT statement against the database and testing the same query on a BioMart 0.8 server. It is confirmed that they both do not have any problem of missing or duplicating results. So problem is caused by BioMart 0.7 query batching.

Please try this if you'd like to test your gene IDs:

Cheers,
Junjun


From: jzhang <[hidden email]>
Date: Fri, 5 Aug 2011 01:33:34 -0400
To: "[hidden email]" <[hidden email]>, Elena Rivkin <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Dear Henri-Jean,

Thanks for sending the lists of IDs to us for testing. Based on Elena's test here, we can confirm the problem exists, however, we have not had a chance to look into it closely enough to figure out what exactly causes the problem. It very likely has something to do with BioMart 0.7's query batching (that I described a few days ago in another thread), which may result in missing/duplicating rows in the result. As I mentioned earlier, this will not happen in 0.8 where batching is implemented in a different way.

We will continue the investigation and get you back when we found something concrete.

Best regards,
Junjun


From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thu, 4 Aug 2011 09:49:43 -0400
To: Elena Rivkin <[hidden email]>, jzhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Dear Elena, Dear JunJun,

Many thanks to both of you for having taken the time to address my request a month ago.
I agree that using the viewer with limit set to 10 to illustrate my issue was not very bright! Apologies.

I must say that things have changed substantially since then and look much better today.
The output file  generated after retrivieving "EntrezGene.ID" is a lot more consistent than a month ago. There are fewer duplicates (and actually no duplicate rows in the output table as there used to be), much fewer "NA" entries from Entrez (although I checked these null entries have an associated gene name). I guess these are issues with the Entrez database. Perhaps, what is most important: all input  Ensembl.Gene.ID are present in the outpout table.

My concern now is an issue with the retrieval of Ensembl.Transcript.ID, the default attributes of Biomart:

Actually I am working with a list of 22308 Ensembl gene ID mapped on the Affymetrix Mouse Gene 1.0 ST microarray.
I uploaded this list on the Biomart.org website to filter my query.
The database is Ensembl build 63, the dataset is NCBIM37.
I retrieve the output  as a TSV file ("export all results to",
not checking "unique results only").
I  then go to R to check this output file.

The output table has 75966 row, of which 59458 are unique. In other words, 42950 rows are unique and 16508 are duplicated. Why some rows are duplicated and others not perhaps might be explained.
My main concern is that 6467 input Ensembl.Gene.IDs are not retrieved and are missing from the output table.  These are bona fide genes with regular associated gene names. If I upload the list of these missing guys, I now get the corresponding transcripts. All of them are retrieved and there are no duplicate rows!

In anticipation I thank you very much for your valuable help and comments

Best regards

Henri-Jean


On 24/06/2011 15:24, Elena Rivkin wrote:
Dr. Henri-Jean Garchon,

The reason you see an EntrezGene ID on only a subset of rows is that only some transcripts have an EntrezGene ID associated with them. If you select Ensembl Transcript ID as an attribute, you will see which transcripts correspond to which EntrezGene ID.

For example:
ENSMUSG00000026073 (Il1r2): only one of its transcripts (ENSMUST00000027243) has an EntrezGene ID.

And

ENSMUSG00000035208 (Slfn8): has two different EntrezGene IDs, although only one transcript (ENSMUST00000038141).

I hope it helps. 

Elena

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Elena Rivkin
Hi Henri-Jean, 

Do you get 93805 transcripts from the same list of IDs that you sent us? Can you please describe how you get all the Ensembl genes (n = 36814)?

In addition to the method described by Junjun, there is another quick way to get the transcript IDs for a list of genes. In central.biomart.org, under Tools, select ID Converter. Select the Mus musculus dataset and upload your list of genes. Select Transcript ID in the TO box. Using this method I get 75,966 rows, as before.

Best wishes, 

Elena Rivkin
Outreach and Training Coordinator, Informatics and Bio-computing
Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel: 647-258-4316
Toll-free: 1-866-678-6427
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.

From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Sun, 7 Aug 2011 06:28:57 -0400
To: Junjun Zhang <[hidden email]>
Cc: Microsoft Office User <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Dear Junjun,

Following my previous mail, the issue was with the browser: I was using Opera (my default).
With Internet Explorer, the "upload file" button works fine.

There are no duplicates in the output transcript list (93805 transcripts). So this is good.
The issue, however, is that all the Ensembl genes (n = 36814) are retrieved.
The result is identical if I upload a 5000-gene list.

Best wishes

Henri-Jean


On 05/08/2011 22:45, Junjun Zhang wrote:
Dear Henri-Jean,

After executing the same query directly as a SQL SELECT statement against the database and testing it on a BioMart 0.8 server, we can confirm that neither shows any missing or duplicated results. So the problem is caused by BioMart 0.7 query batching.

Please try this if you'd like to test your gene IDs:
http://central.biomart.org/martwizard/#!/Genome?mart=Ensembl+Genes+63+(WTSI%2C+UK)&step=1&datasets=mmusculus_gene_ensembl

Cheers,
Junjun


From: jzhang <[hidden email]>
Date: Fri, 5 Aug 2011 01:33:34 -0400
To: "[hidden email]" <[hidden email]>, Elena Rivkin <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Dear Henri-Jean,

Thanks for sending the lists of IDs to us for testing. Based on Elena's test here, we can confirm that the problem exists; however, we have not yet had a chance to look into it closely enough to figure out exactly what causes it. It very likely has something to do with BioMart 0.7's query batching (which I described a few days ago in another thread), which may result in missing or duplicated rows in the result. As I mentioned earlier, this will not happen in 0.8, where batching is implemented in a different way.

We will continue the investigation and get back to you when we have found something concrete.
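
In the meantime, since a shorter ID list was reported earlier in this thread to come back complete and without duplicates, one possible client-side workaround is to submit the IDs in smaller chunks and merge the results. The sketch below is only an illustration in Python and says nothing about how BioMart batches queries internally; `query_in_chunks`, the `fetch` callable (standing in for whatever actually sends a query), and the chunk size of 500 are all hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Row = Tuple[str, ...]


def query_in_chunks(
    ids: Sequence[str],
    fetch: Callable[[Sequence[str]], Iterable[Row]],
    chunk_size: int = 500,
) -> List[Row]:
    """Query `ids` in fixed-size chunks and return the de-duplicated rows.

    `fetch` is a placeholder: it receives one chunk of IDs and yields result rows.
    """
    seen = set()
    merged: List[Row] = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        for row in fetch(chunk):
            if row not in seen:  # drop rows already returned once
                seen.add(row)
                merged.append(row)
    return merged
```

Comparing the set of gene IDs in the merged rows against the input list then shows directly whether any IDs are still missing.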

Best regards,
Junjun



Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Henri-Jean GARCHON
Hi Elena,

Do you get 93805 transcripts from the same list of IDs that you sent us? Can you please describe how you get all the Ensembl genes (n = 36814)?

This was partly a mistake of mine. As I mentioned yesterday to Junjun (I realize I didn't cc the Biomart Users list, sorry about it):

"I am speaking only of the 0.8 version Biomart server, using Internet Explorer (IE9).
I was puzzled by the fact that when I uploaded the 22308 Ensembl Gene ID list I am working with, a list of 36814 gene IDs and 93805 transcript IDs was returned, which actually corresponds to the unfiltered database. When I uploaded a subset of 5000 gene IDs, the output was the same.
But I realized that in the "Filters" section, although I had checked the "ID list limit" box, I had not selected the "Ensembl Gene IDs" option in the drop-down menu (which doesn't exist in the 0.7 version). So there was no filtering, even though I had uploaded my gene list, which makes complete sense.
Indeed, if I select "Ensembl Gene IDs" in the drop-down menu under the "ID list limit" checkbox and upload the list, I can now see the proper gene ID list appear in the "Filters" section of the summary panel on the left (nicely done!).
However:
my new problem is that I cannot switch to the next step to select attributes: the "Output" and "Next" buttons on the top are not working."
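
For scripted queries, the filter that was left unselected in the web form has to be stated explicitly as well. As an illustration only, here is a Python sketch that builds a query in the XML form used by the BioMart 0.7 web service. The filter and attribute names (`ensembl_gene_id`, `ensembl_transcript_id`) and the `datasetConfigVersion` value are assumptions that should be checked against the target mart, and `build_query` is a hypothetical helper. The dataset name `mmusculus_gene_ensembl` is taken from the link posted earlier in this thread.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, gene_ids, attributes, unique_rows=True):
    """Return a BioMart-style XML query string filtered on a list of gene IDs.

    The filter name and datasetConfigVersion are assumed, not verified.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1" if unique_rows else "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Without this Filter element the query would return the whole dataset.
    ET.SubElement(ds, "Filter", {"name": "ensembl_gene_id", "value": ",".join(gene_ids)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


xml_text = build_query(
    "mmusculus_gene_ensembl",
    ["ENSMUSG00000026073", "ENSMUSG00000035208"],
    ["ensembl_gene_id", "ensembl_transcript_id"],
)
```

The sketch only produces the query text; how it is submitted depends on the server and is left out here.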

In addition to the method described by Junjun, there is another quick way to get the transcript IDs for a list of genes. In central.biomart.org, under Tools, select ID Converter. Select the Mus musculus dataset and upload your list of genes. Select Transcript ID in the TO box. Using this method I get 75,966 rows, as before.

I can't find this tool.

Best regards

Henri-Jean


Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Elena Rivkin
Hi Henri-Jean, 

  1. Regarding "I cannot switch to the next step to select attributes: the "Output" and "Next" buttons on the top are not working": can you please describe what is not working? What happens when you click on these buttons?
  2. The ID Converter can be accessed here: http://central.biomart.org/converter/#!/ID_converter/gene_ensembl_config_2

Elena Rivkin


Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Henri-Jean GARCHON
Hi Elena, 
  1. Regarding: "I cannot switch to the next step to select attributes: the "Output" and "Next" buttons on the top are not working." Can you please describe what is not working? What happens when you click on these buttons?
When I click on these buttons, nothing happens: I would expect to switch to another window or dialog box to select attributes.
This happens if I don't select the Ensembl Gene ID option in the drop-down menu under the ID list limit checkbox (but then, the whole database is returned).

I just tried the link. I get "no data" returned. Here are the first few rows of the "Conversion results" window:

Ensembl Gene ID	Ensembl Transcript ID
ENSMUSG00000000001	no data
ENSMUSG00000000003	no data
ENSMUSG00000000028	no data
ENSMUSG00000000031	no data
ENSMUSG00000000037	no data
ENSMUSG00000000049	no data
ENSMUSG00000000056	no data
ENSMUSG00000000058	no data
ENSMUSG00000000078	no data


That is puzzling.
Best wishes

Henri-Jean


From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Mon, 8 Aug 2011 10:44:35 -0400
To: Microsoft Office User <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, Junjun Zhang <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Hi Elena,

Do you get 93805 transcript from the same list of Ids that you sent us? Can you please describe how you get all the Ensembl genes  (n = 36814)?

This was partly a mistake of mine. As I mentioned yesterday to Junjun (I realize I didn't cc the Biomart Users list, sorry about it):

"I am speaking only of the 0.8 version Biomart server, using Internet Explorer (IE9).
I was puzzled by the fact that when I had uploaded the 22308 Ensembl Gene ID list I am working with, a list of 36814 genes ID and 93805 transcripts ID was returned, actually corresponding to the unfiltered database. Uploading a subset of 5000 gene IDs, the output was the same.
But I realized that in the "Filters" section, altough I had checked the "ID list limit" box, I had not selected "Ensembl Gene IDs" option in the popdown menu (that doesn't exist in the 0.7 version). So there was no filtering, even though I had uploaded my gene list, making complete sense.
Indeed, if I select "Ensembl Gene IDs"  in the popdown menu under the 'ID list limit" checkbox and l upload the list, I can now see the proper gene ID list appear in the "Filters" section of the summary panel on the left (nicely done!).
However:
my new problem is that I cannot switch to the next step to select attributes: the "Output" and "Next" buttons on the top are not working."

In addition to the method described by Junjun, there is another quick way to get the transcript ID for a list of genes. In central.biomart.org, under Tools – select ID Converter. Select Mus musculus dataset, and upload your list of genes. Select Transcript ID in the TO box. Using this method I get 75,966 rows, as before. 

I can't find this tool

Best regards

Henri-Jean

From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Sun, 7 Aug 2011 06:28:57 -0400
To: Junjun Zhang <[hidden email]>
Cc: Microsoft Office User <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Dear Junjun,

Following my previous mail, the issue was with the browser: I  was using Opera (my default).
With Internet Explorer, the "upload file" button works fine.

There are no duplicates in the output transcript list (93805 transcripts). So this is good.
The issue however is that all the Ensembl genes  (n = 36814) are retrieved.
Identical result if I upload a 5000 gene list.

Best wishes

Henri-Jean


Le 05/08/2011 22:45, Junjun Zhang a écrit :
Dear Henri-Jean,

After executing the same query directly using SQL SELECT statement against the database and testing the same query on a BioMart 0.8 server. It is confirmed that they both do not have any problem of missing or duplicating results. So problem is caused by BioMart 0.7 query batching.

Please try this if you'd like to test your gene IDs:

Cheers,
Junjun


From: jzhang <[hidden email]>
Date: Fri, 5 Aug 2011 01:33:34 -0400
To: "[hidden email]" <[hidden email]>, Elena Rivkin <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Dear Henri-Jean,

Thanks for sending the lists of IDs to us for testing. Based on Elena's test here, we can confirm the problem exists, however, we have not had a chance to look into it closely enough to figure out what exactly causes the problem. It very likely has something to do with BioMart 0.7's query batching (that I described a few days ago in another thread), which may result in missing/duplicating rows in the result. As I mentioned earlier, this will not happen in 0.8 where batching is implemented in a different way.

We will continue the investigation and get you back when we found something concrete.

Best regards,
Junjun


From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thu, 4 Aug 2011 09:49:43 -0400
To: Elena Rivkin <[hidden email]>, jzhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Dear Elena, Dear JunJun,

Many thanks to both of you for having taken the time to address my request a month ago.
I agree that using the viewer with limit set to 10 to illustrate my issue was not very bright! Apologies.

I must say that things have changed substantially since then and look much better today.
The output file  generated after retrivieving "EntrezGene.ID" is a lot more consistent than a month ago. There are fewer duplicates (and actually no duplicate rows in the output table as there used to be), much fewer "NA" entries from Entrez (although I checked these null entries have an associated gene name). I guess these are issues with the Entrez database. Perhaps, what is most important: all input  Ensembl.Gene.ID are present in the outpout table.

My concern now is with the retrieval of Ensembl.Transcript.ID, one of the default attributes of BioMart:

I am working with a list of 22308 Ensembl gene IDs mapped on the Affymetrix Mouse Gene 1.0 ST microarray.
I uploaded this list on the Biomart.org website to filter my query.
The database is Ensembl build 63, the dataset is NCBIM37.
I retrieve the output as a TSV file ("export all results to",
not checking "unique results only").
I then go to R to check this output file.

The output table has 75966 rows, of which 59458 are distinct. In other words, 42950 rows occur only once and 16508 are duplicated. Why some rows are duplicated and others are not can perhaps be explained.
My main concern is that 6467 input Ensembl.Gene.IDs are not retrieved and are missing from the output table. These are bona fide genes with regular associated gene names. If I upload the list of these missing genes on its own, I now get the corresponding transcripts: all of them are retrieved and there are no duplicate rows!
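
A check of this kind can be scripted. The following is only a minimal Python sketch, not the R code actually used for the figures above; the sample rows, the placeholder IDs and the `summarize` helper are all hypothetical and stand in for the real export file and input list.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature of a BioMart TSV export (gene ID, transcript ID).
# The IDs are placeholders, not real query output.
EXPORT_TSV = (
    "Ensembl Gene ID\tEnsembl Transcript ID\n"
    "GENE_A\tTX_A1\n"
    "GENE_A\tTX_A2\n"
    "GENE_A\tTX_A2\n"  # duplicated row
    "GENE_B\tTX_B1\n"
)
INPUT_IDS = ["GENE_A", "GENE_B", "GENE_C"]  # GENE_C is absent from the export


def summarize(tsv_text, input_ids):
    """Count total, distinct and duplicated rows, and list missing input IDs."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header line
    rows = [tuple(r) for r in reader]
    counts = Counter(rows)  # distinct row -> number of occurrences
    seen_genes = {r[0] for r in rows}
    return {
        "total": len(rows),
        "distinct": len(counts),
        "occurring_once": sum(1 for n in counts.values() if n == 1),
        "occurring_more_than_once": sum(1 for n in counts.values() if n > 1),
        "missing": [g for g in input_ids if g not in seen_genes],
    }


summary = summarize(EXPORT_TSV, INPUT_IDS)
print(summary)
```

With a real export, the text passed to `summarize` would come from the downloaded TSV file and `input_ids` from the uploaded gene list; the column order (gene ID first) is an assumption of this sketch.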

In anticipation I thank you very much for your valuable help and comments

Best regards

Henri-Jean


On 24/06/2011 15:24, Elena Rivkin wrote:
Dr. Henri-Jean Garchon, 

The reason you see an EntrezGene ID on only a subset of rows is that some transcripts do not have an EntrezGene ID associated with them. If you select Ensembl Transcript ID as an attribute, you will see which transcripts correspond to which EntrezGene ID.

For example:
ENSMUSG00000026073 (Il1r2) - only one of its transcripts (ENSMUST00000027243) has an EntrezGene ID

And

ENSMUSG00000035208 (Slfn8) - has two different EntrezGene IDs, although only one transcript (ENSMUST00000038141).
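
To make the two cases above concrete, here is a small Python sketch of how per-transcript rows could be collapsed to one entry per gene. It is an illustration under assumptions, not BioMart's own behaviour: every identifier is an invented placeholder, an empty string stands for a transcript without an EntrezGene cross-reference, and `entrez_ids_per_gene` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder rows in the shape (gene ID, transcript ID, EntrezGene ID).
# All identifiers are invented for illustration; an empty string stands for
# a transcript with no EntrezGene cross-reference.
ROWS = [
    ("GENE_A", "TX_A1", "1001"),  # one transcript carries the Entrez ID
    ("GENE_A", "TX_A2", ""),      # the others come back empty
    ("GENE_A", "TX_A3", ""),
    ("GENE_B", "TX_B1", "2001"),  # one transcript, two Entrez IDs
    ("GENE_B", "TX_B1", "2002"),
]


def entrez_ids_per_gene(rows):
    """Map each gene ID to the sorted list of its non-empty EntrezGene IDs."""
    by_gene = defaultdict(set)
    for gene_id, _transcript_id, entrez_id in rows:
        by_gene[gene_id]  # register the gene even if it has no Entrez ID
        if entrez_id:
            by_gene[gene_id].add(entrez_id)
    return {gene: sorted(ids) for gene, ids in by_gene.items()}


per_gene = entrez_ids_per_gene(ROWS)
print(per_gene)
```

Grouping by gene in this way removes the empty rows while keeping genes that map to more than one EntrezGene ID.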

I hope it helps. 

Elena

From: Henri-Jean GARCHON <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Fri, 24 Jun 2011 05:00:03 -0400
To: "[hidden email]" <[hidden email]>
Subject: [BioMart Users] Fwd: Returned mail: see transcript for details





_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Elena Rivkin
Hi Henri-Jean, 
  1. Thank you for the information. We will take a look at it. So it ONLY happens if you don't select the Ensembl Gene ID option in the popdown menu under the ID list limit checkbox?
  2. Is it possible that you did not select the Mus musculus dataset but used the default Homo sapiens dataset instead? This is the output I got when I entered the gene list:

Elena Rivkin
Outreach and Training Coordinator, Informatics and Bio-computing
Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel: 647-258-4316
Toll-free: 1-866-678-6427
www.oicr.on.ca



From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Mon, 8 Aug 2011 12:15:48 -0400
To: Microsoft Office User <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Hi Elena, 
  1. Regarding: " I cannot switch to the next step to select attributes: the "Output" and "Next" buttons on the top are not working." Can you please describe what is not working? What happens when you click on these buttons?
When I click on these buttons, nothing happens: I would expect to switch to another window/dialog box to select attributes.
This happens if I don't select the Ensembl Gene ID options in the popdown menu under the ID list limit checkbox (but then, the whole database is returned).
 

 

I just tried the link. I get "no data" returned. Here are the first few rows of the "Conversion results" window:

Ensembl Gene ID	Ensembl Transcript ID
ENSMUSG00000000001	no data
ENSMUSG00000000003	no data
ENSMUSG00000000028	no data
ENSMUSG00000000031	no data
ENSMUSG00000000037	no data
ENSMUSG00000000049	no data
ENSMUSG00000000056	no data
ENSMUSG00000000058	no data
ENSMUSG00000000078	no data


That is puzzling.
Best wishes

Henri-Jean


From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Mon, 8 Aug 2011 10:44:35 -0400
To: Microsoft Office User <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, Junjun Zhang <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Hi Elena,

Do you get 93805 transcripts from the same list of IDs that you sent us? Can you please describe how you get all the Ensembl genes (n = 36814)?

This was partly a mistake of mine. As I mentioned yesterday to Junjun (I realize I didn't cc the Biomart Users list, sorry about it):

"I am speaking only of the version 0.8 BioMart server, using Internet Explorer (IE9).
I was puzzled by the fact that when I uploaded the list of 22308 Ensembl gene IDs I am working with, a list of 36814 gene IDs and 93805 transcript IDs was returned, which actually corresponds to the unfiltered database. When I uploaded a subset of 5000 gene IDs, the output was the same.
But I realized that in the "Filters" section, although I had checked the "ID list limit" box, I had not selected the "Ensembl Gene IDs" option in the popdown menu (which doesn't exist in the 0.7 version). So there was no filtering, even though I had uploaded my gene list, which makes complete sense.
Indeed, if I select "Ensembl Gene IDs" in the popdown menu under the "ID list limit" checkbox and upload the list, I can now see the proper gene ID list appear in the "Filters" section of the summary panel on the left (nicely done!).
However:
my new problem is that I cannot switch to the next step to select attributes: the "Output" and "Next" buttons on the top are not working."

In addition to the method described by Junjun, there is another quick way to get the transcript IDs for a list of genes. In central.biomart.org, under Tools, select ID Converter. Select the Mus musculus dataset and upload your list of genes. Select Transcript ID in the TO box. Using this method I get 75,966 rows, as before.

I can't find this tool

Best regards

Henri-Jean

From: Henri-Jean Garchon <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Sun, 7 Aug 2011 06:28:57 -0400
To: Junjun Zhang <[hidden email]>
Cc: Microsoft Office User <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Dear Junjun,

Following my previous mail, the issue was with the browser: I  was using Opera (my default).
With Internet Explorer, the "upload file" button works fine.

There are no duplicates in the output transcript list (93805 transcripts). So this is good.
The issue however is that all the Ensembl genes  (n = 36814) are retrieved.
Identical result if I upload a 5000 gene list.

Best wishes

Henri-Jean


On 05/08/2011 22:45, Junjun Zhang wrote:
Dear Henri-Jean,

After executing the same query directly with an SQL SELECT statement against the database and testing the same query on a BioMart 0.8 server, we can confirm that neither shows any missing or duplicated results. So the problem is caused by BioMart 0.7 query batching.

Please try this if you'd like to test your gene IDs:
http://central.biomart.org/martwizard/#!/Genome?mart=Ensembl+Genes+63+(WTSI%2C+UK)&step=1&datasets=mmusculus_gene_ensembl

Cheers,
Junjun



Re: [BioMart Users] Mouse EnsemblTranscriptID retrieval

Henri-Jean GARCHON
Hi Elena, 
  1. Thank you for the information. We will take a look at it. So it ONLY happens if you don't select the Ensembl Gene ID option in the popdown menu under the ID list limit checkbox?
The block does happen if I select the "Ensembl Gene ID" option in the popdown menu under the "ID list limit" checkbox.
It does not happen if I do not select this option (I have not tried selecting anything else), in which case I can go on and select attributes, but then the whole database is returned.

  2. Is it possible that you did not select the Mus musculus dataset but used the default Homo sapiens dataset instead? This is the output I got when I entered the gene list:
My mistake. Apologies. It works perfectly! Exactly as you said, I had not selected the right dataset.
In addition, it works with Opera, and the output actually looks prettier than with IE9!
Thank you
Best wishes

Henri-Jean
