[BioMart Users] ENSG00000167380 biomart record question

[BioMart Users] ENSG00000167380 biomart record question

Yuan Hao
Dear Biomart team,

I used biomaRt in R/Bioconductor to map Ensembl gene IDs to gene
symbols. I found that a single gene ID is sometimes associated with
multiple gene symbols, for example ENSG00000167380, which is associated
with both 'ZNF234' and 'ZNF226'.
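
[Editor's illustration] A minimal sketch of how such one-to-many mappings could be spotted in a table of (gene ID, symbol) pairs once it has been exported from BioMart. It is written in plain Python rather than R. The first two rows come from the message above; the third row, the function name and the variable names are hypothetical.

```python
from collections import defaultdict

# (gene ID, symbol) pairs as they might appear in an exported result table.
rows = [
    ("ENSG00000167380", "ZNF234"),   # from the message above
    ("ENSG00000167380", "ZNF226"),   # from the message above
    ("ENSG_EXAMPLE_1", "GENE_A"),    # hypothetical row, for contrast only
]


def multi_symbol_genes(pairs):
    """Return {gene_id: sorted symbols} for IDs with more than one symbol."""
    symbols = defaultdict(set)
    for gene_id, symbol in pairs:
        if symbol:  # ignore empty symbols
            symbols[gene_id].add(symbol)
    return {g: sorted(s) for g, s in symbols.items() if len(s) > 1}


ambiguous = multi_symbol_genes(rows)
for gene_id, names in ambiguous.items():
    print(gene_id, ", ".join(names))
```

Listing the ambiguous IDs first makes it easier to decide case by case how to handle them, instead of silently keeping whichever symbol happens to come first.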

When I searched for 'ZNF234' in the Ensembl browser, 'ZNF226' (chr19:
44,645,710 - 44,681,836) was returned instead, which suggests that the
latter took over both. Does this mean the gene record in this region has
been updated, but not yet in R/Bioconductor? I also looked in the UCSC
browser, where two separate records are still kept for the two gene
symbols: ZNF234 (chr19: 44,645,710 - 44,664,460) and ZNF226
(chr19: 44,669,249 - 44,681,836). It would be very much appreciated if
you could help clarify this. Thank you very much in advance!

Cheers,
Yuan

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Re: [BioMart Users] ENSG00000167380 biomart record question

Li, Yongjin
duplicate gene?

Re: [BioMart Users] ENSG00000167380 biomart record question

Arek Kasprzyk
Hi Rhoda
do you have any more insight on what's going on there?

a


Re: [BioMart Users] ENSG00000167380 biomart record question

Rhoda Kinsella
Hi Yuan and Arek,
This is the feedback from our helpdesk team:

The Ensembl databases backing the genome browser web site and BioMart are always in sync. BioMart is rebuilt with every Ensembl release.

The underlying problem here is that human ZNF226 and ZNF234 have been accidentally merged on the basis of transcript ZNF226-206 (ENST00000536276). We require overlap of coding regions in the genome to merge transcripts into gene clusters. Since transcript ZNF226-206 (ENST00000536276) overlaps the coding region of ZNF226-204 (ENST00000426739), which really is a representative of the ZNF234 gene, we have merged both genes under ZNF226.
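
[Editor's illustration] A rough sketch of how an overlap-based clustering rule can merge two otherwise separate genes through a single bridging transcript. This is not Ensembl's actual pipeline: each transcript is reduced to one coding span, the two gene spans reuse the UCSC coordinates quoted earlier in the thread as stand-ins, and the bridging span and all names are hypothetical.

```python
def cluster_by_overlap(transcripts):
    """Group (name, start, end) spans into clusters of overlapping spans.

    Coordinates are treated as inclusive; spans are processed in order of
    start position and join the current cluster if they overlap it.
    """
    clusters = []
    for name, start, end in sorted(transcripts, key=lambda t: t[1]):
        if clusters and start <= clusters[-1]["end"]:
            clusters[-1]["members"].append(name)
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
        else:
            clusters.append({"start": start, "end": end, "members": [name]})
    return clusters


# Stand-in spans taken from the UCSC coordinates quoted in the thread.
znf234_like = ("ZNF234_like", 44645710, 44664460)
znf226_like = ("ZNF226_like", 44669249, 44681836)
# Hypothetical span that overlaps both; the real coordinates are not given.
bridging = ("bridging_transcript", 44660000, 44672000)

separate = cluster_by_overlap([znf234_like, znf226_like])
merged = cluster_by_overlap([znf234_like, znf226_like, bridging])

print(len(separate), "clusters without the bridging transcript")
print(len(merged), "cluster(s) with it:", merged[0]["start"], merged[0]["end"])
```

With the bridging span included, the single cluster covers 44,645,710 to 44,681,836, the same range the Ensembl browser reports for the merged gene. Removing that one span is enough to separate the two genes again under this simplified rule.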

http://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000167380;r=19:44645710-44681836

Selecting transcript ZNF226-204 (ENST00000426739) and following the "General Identifiers" link in the navigation column shows that this transcript clearly maps to the external UniProtKB/Swiss-Prot and NCBI RefSeq records for ZNF234.

http://www.ensembl.org/Homo_sapiens/Transcript/Similarity?db=core;g=ENSG00000167380;r=19:44645710-44681836;t=ENST00000426739

However, even on the Ensembl web site the merged gene is associated with both gene symbols. This can be seen by following the "External identifiers" link in the navigation column of the Gene page, where this gene is associated with HGNC symbols ZNF226 and ZNF234.

http://www.ensembl.org/Homo_sapiens/Gene/Matches?g=ENSG00000167380;r=19:44645710-44681836

Now, we realise that this accidental gene merge is far from ideal and really confusing to our users. Transcript ZNF226-206 (ENST00000536276), which causes the merge, has been annotated on the basis of UniProtKB/TrEMBL record O14859, which is only a very short fragment of a zinc finger protein (68 amino acid residues).

http://www.ensembl.org/Homo_sapiens/Transcript/SupportingEvidence?db=core;g=ENSG00000167380;r=19:44645710-44681836;t=ENST00000536276

http://www.uniprot.org/uniprot/O14859

Since it is based on weak supporting evidence, we will delete this transcript in one of the upcoming releases and may also ask UniProtKB to remove this rather weak record.


Regards
Rhoda



Rhoda Kinsella Ph.D.
Ensembl Production Project Leader,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus, 
Hinxton
Cambridge CB10 1SD,
UK.

