[BioMart Users] Bug or User error with filtering?

pip pipster
We are seeing strange behavior with the protein ID filter.  For example, transcript ENST00000345514 is filtered out by the search below, even though it does have a protein ID, as shown here:  http://useast.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000127152;r=14:99635624-99737861;t=ENST00000345514 .  Any idea why it is being filtered out?  Could this be a bug in BioMart or its data, or is it user error?

Manual (non-Perl)
    Homo sapiens genes (GRCh37.p3)
    Filters
        with protein ID(s): Only
    Attributes
        Ensembl Gene ID
        Ensembl Transcript ID

Same problem occurs using Perl filter as well
    $query->addFilter("with_protein_id", ["Only"]);
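For readers who want to reproduce this query outside the web interface, here is a minimal Python sketch of the same request written as a BioMart-style XML query. Only the filter name `with_protein_id` comes from this message. The dataset name `hsapiens_gene_ensembl`, the attribute names `ensembl_gene_id` and `ensembl_transcript_id`, and the `excluded="0"` / `excluded="1"` encoding of Only / Excluded are assumptions, and `build_query` is a hypothetical helper, not part of any BioMart library.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, boolean_filters):
    """Return a BioMart-style XML query string.

    boolean_filters maps a filter name to True ("Only") or False ("Excluded").
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, only in boolean_filters.items():
        # Assumption: boolean filters are encoded with excluded="0" for "Only"
        # and excluded="1" for "Excluded".
        ET.SubElement(ds, "Filter", {"name": name, "excluded": "0" if only else "1"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


ATTRS = ["ensembl_gene_id", "ensembl_transcript_id"]  # assumed internal names
only_xml = build_query("hsapiens_gene_ensembl", ATTRS, {"with_protein_id": True})
excluded_xml = build_query("hsapiens_gene_ensembl", ATTRS, {"with_protein_id": False})
print(only_xml)
```

Running the Only and Excluded variants and comparing the two result sets is one way to see on which side of the filter a given transcript falls.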

Thank you,
Phillipe

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Re: [BioMart Users] Bug or User error with filtering?

Rhoda Kinsella
Hi Phillipe
You are filtering on the protein ID (GenBank protein accession). Because this Ensembl protein ID has no corresponding GenBank protein accession, the filter will not return this ENSP. Please use the Gene type filter and select protein_coding instead; that way you will get the ENSP data you require.
Regards
Rhoda


On 21 Aug 2011, at 22:54, pip pipster wrote:

[...]

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus, 
Hinxton
Cambridge CB10 1SD,
UK.


Re: [BioMart Users] Bug or User error with filtering?

pip pipster
Rhoda,
Thank you for the feedback, very helpful.  The Gene type filter with 'protein_coding' will likely work; however, it doesn't allow me to do an 'exclude' type filter (i.e. give me everything except the non-protein-coding genes).  Do you know whether an exclude is still possible with the method you described?
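If the Gene type filter can only select, one possible workaround is to request the biotype as an extra attribute and split the rows on the client side. The following is a minimal Python sketch, assuming tab-separated results with the biotype in the third column. The column layout and the function `split_by_biotype` are assumptions for illustration; the first sample row uses IDs from this thread, and the second is a made-up placeholder, not real Ensembl data.

```python
import csv
import io


def split_by_biotype(tsv_text, biotype="protein_coding"):
    """Split TSV rows (gene ID, transcript ID, biotype) into matching and non-matching."""
    matching, others = [], []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        (matching if row[2] == biotype else others).append(row)
    return matching, others


# The second row is a made-up placeholder, not real Ensembl data.
sample = (
    "ENSG00000127152\tENST00000345514\tprotein_coding\n"
    "GENE_PLACEHOLDER\tTRANSCRIPT_PLACEHOLDER\tother_biotype\n"
)
coding, non_coding = split_by_biotype(sample)
print(len(coding), len(non_coding))
```

Either list can then be used, depending on whether the goal is to keep or to exclude the chosen biotype.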

Thank you!
Phillipe


From: Rhoda Kinsella <[hidden email]>
To: pip pipster <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Sent: Monday, August 22, 2011 5:04 AM
Subject: Re: [BioMart Users] Bug or User error with filtering?

[...]

Re: [BioMart Users] Bug or User error with filtering?

pip pipster
After more investigation, something definitely isn't adding up.  As it turns out, filtering by GenBank protein accession is what we want, and we need the ability to exclude.  The two transcripts below are examples (they show up as protein coding in GenBank as well as in Ensembl), but there are thousands more like them.  The filter below removes them despite their having a GenBank protein accession.  What might be causing this?

ENST00000169293
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=ENST00000169293
http://www.ncbi.nlm.nih.gov/nuccore/D28593?
http://useast.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000127241;r=3:186964149-187009745;t=ENST00000169293

ENST00000345514
http://www.ncbi.nlm.nih.gov/gene?term=ENST00000345514
http://useast.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000127152;r=14:99635624-99737822;t=ENST00000345514


Filter used:
Manual (non-Perl)
    Homo sapiens genes (GRCh37.p3)
    Filters
        with protein ID(s): Only
    Attributes
        Ensembl Gene ID
        Ensembl Transcript ID

Same problem occurs using Perl filter as well
    $query->addFilter("with_protein_id", ["Only"]);


From: pip pipster <[hidden email]>
To: Rhoda Kinsella <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Sent: Monday, August 22, 2011 8:07 AM
Subject: Re: [BioMart Users] Bug or User error with filtering?

[...]

Re: [BioMart Users] Bug or User error with filtering?

Elena Rivkin
Hi Phillipe, 
Can you let me know the GenBank protein accessions for these two transcripts? I can't find them.

Thank you. 

Elena Rivkin, PhD
Outreach and Training Coordinator, Informatics and Bio-computing

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel: 647-258-4316
Toll-free: 1-866-678-6427
www.oicr.on.ca


This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.

 


From: pip pipster <[hidden email]>
Reply-To: pip pipster <[hidden email]>
Date: Mon, 22 Aug 2011 10:32:43 -0400
To: Rhoda Kinsella <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Bug or User error with filtering?

[...]

Re: [BioMart Users] Bug or User error with filtering?

Junjun Zhang
Hi Phillipe,

I am forwarding your questions to the Ensembl Helpdesk. The Ensembl team is best placed to answer questions about the data content of the Ensembl databases.

Cheers,
Junjun


From: Elena Rivkin <[hidden email]>
Date: Mon, 22 Aug 2011 10:46:35 -0400
To: pip pipster <[hidden email]>, Rhoda Kinsella <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Bug or User error with filtering?

[...]

Re: [BioMart Users] Bug or User error with filtering?

pip pipster
Thank you Junjun.

Elena, to answer your question, I believe the NCBI links earlier in this thread include a link to the protein, where you can get the protein accession number.  For example, for the two transcripts in question you will find links to the following proteins.  You will also see at those URLs that the transcripts are correctly shown as protein coding.

http://www.ncbi.nlm.nih.gov/protein/471128 (accession BAA05928)
and
http://www.ncbi.nlm.nih.gov/protein/11558488 (accession CAC17726)

Thank you,
Phillipe


From: Junjun Zhang <[hidden email]>
To: pip pipster <[hidden email]>; "[hidden email]" <[hidden email]>
Cc: Rhoda Kinsella via RT <[hidden email]>
Sent: Monday, August 22, 2011 12:59 PM
Subject: Re: [BioMart Users] Bug or User error with filtering?

[...]

Re: [BioMart Users] Bug or User error with filtering?

Elena Rivkin
Hi Phillipe,
When I enter GenBank protein ID BAA05928 and retrieve the Ensembl gene ID and transcript ID, I get the following:

When I enter GenBank protein ID CAC17726 and retrieve the Ensembl gene ID and transcript ID, I get the following:
ENSG00000127152, ENST00000357195

It appears that in the Ensembl mart you are querying, these GenBank IDs correspond to different transcripts (although to the same gene ID).
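The distinction between a gene-level and a transcript-level link can be checked mechanically. The Python sketch below uses only the mapping reported in this message for CAC17726, with the IDs written in the 11-digit form used earlier in the thread; the function `check_mapping` and the tuple layout are hypothetical, chosen for illustration.

```python
def check_mapping(rows, accession, gene_id, transcript_id):
    """Report whether an accession is linked to the gene and to the exact transcript.

    rows: iterable of (accession, gene ID, transcript ID) tuples.
    """
    hits = [(g, t) for a, g, t in rows if a == accession]
    return {
        "gene_linked": any(g == gene_id for g, _ in hits),
        "transcript_linked": any(t == transcript_id for _, t in hits),
    }


# Mapping reported in this thread for CAC17726.
rows = [("CAC17726", "ENSG00000127152", "ENST00000357195")]
result = check_mapping(rows, "CAC17726", "ENSG00000127152", "ENST00000345514")
print(result)  # {'gene_linked': True, 'transcript_linked': False}
```

A transcript-level filter would therefore drop ENST00000345514 even though its gene is linked to the accession.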
Regards, 

Elena Rivkin, PhD

From: pip pipster <[hidden email]>
Reply-To: pip pipster <[hidden email]>
Date: Mon, 22 Aug 2011 13:51:56 -0400
To: Junjun Zhang <[hidden email]>, "[hidden email]" <[hidden email]>
Cc: Rhoda Kinsella via RT <[hidden email]>
Subject: Re: [BioMart Users] Bug or User error with filtering?

[...]

Re: [BioMart Users] Bug or User error with filtering?

pip pipster
Elena,
You should be able to follow the chain below to get the accession numbers.

a.  From Transcript
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=search&term=ENST00000169293

b.  To Gene (link to this Gene URL is located on Transcript link above)
http://www.ncbi.nlm.nih.gov/nuccore/D28593.1

c.  To Protein (link to this Protein URL is located on Gene link above)
http://www.ncbi.nlm.nih.gov/protein/471128

From this standpoint, I am led to believe that the transcript maps to a GenBank protein accession and should not be filtered out by the $query->addFilter("with_protein_id", ["Only"]) filter.  Either way, I would like to understand why it is being filtered out, since I have to trust the data I get back and deal with it accordingly.

Likewise, the following URL also appears to chain the Gene to the proper transcripts.
http://www.ebi.ac.uk/ena/data/view/D28593

It appears that for some reason the data in Ensembl does not map transcript ENST00000169293 (and many others in similar categories) to the proper protein accession.  But that's just my theory, and I would love to understand it better.  Thoughts?

Best regards,
Phillipe






From: Elena Rivkin <[hidden email]>
To: pip pipster <[hidden email]>; Junjun Zhang <[hidden email]>; "[hidden email]" <[hidden email]>
Cc: Rhoda Kinsella via RT <[hidden email]>
Sent: Monday, August 22, 2011 2:04 PM
Subject: Re: [BioMart Users] Bug or User error with filtering?

Hi Philliple, 
When entering Protein GeneBank ID: BAA05928, and retrieving Ensembl gene id and transcript id, I get the following:

When entering Protein GeneBank ID: CAC17726, and retrieving Ensembl gene id and transcript id, I get the following:
ENSG000000127152, ENST000000357195

It appears that in the Ensembl mart that you are querying, these GeneBank Ids coorespond to a different transcripts (although to the same gene ID).
Regards, 
Elena Rivkin, PhD
Outreach and Training Coordinator, Informatics and Bio-computing

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel: 647-258-4316
Toll-free: 1-866-678-6427
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
 

From: pip pipster <[hidden email]>
Reply-To: pip pipster <[hidden email]>
Date: Mon, 22 Aug 2011 13:51:56 -0400
To: Junjun Zhang <[hidden email]>, "[hidden email]" <[hidden email]>
Cc: Rhoda Kinsella via RT <[hidden email]>
Subject: Re: [BioMart Users] Bug or User error with filtering?

Thank you Junjun.

Elena, to answer your question, I believe the NCBI links in the thread below include a link to the protein, where you can get the protein accession number.  For example, for the 2 transcripts below you will find links to the following proteins.  You will also see that the transcripts correctly show up at those URLs as protein coding.

and

Thank you,
Phillipe


From: Junjun Zhang <[hidden email]>
To: pip pipster <[hidden email]>; "[hidden email]" <[hidden email]>
Cc: Rhoda Kinsella via RT <[hidden email]>
Sent: Monday, August 22, 2011 12:59 PM
Subject: Re: [BioMart Users] Bug or User error with filtering?

Hi Phillipe,

I am forwarding your questions to the Ensembl Helpdesk. The Ensembl team is best placed to answer questions about the data content of the Ensembl databases.

Cheers,
Junjun


From: Elena Rivkin <[hidden email]>
Date: Mon, 22 Aug 2011 10:46:35 -0400
To: pip pipster <[hidden email]>, Rhoda Kinsella <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Bug or User error with filtering?

Hi Phillipe, 
Can you let me know what the Genbank protein accessions are for these two transcripts? I can't find them.

Thank you. 
Elena Rivkin, PhD

From: pip pipster <[hidden email]>
Reply-To: pip pipster <[hidden email]>
Date: Mon, 22 Aug 2011 10:32:43 -0400
To: Rhoda Kinsella <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] Bug or User error with filtering?

After doing more investigation, something definitely isn't adding up.  As it turns out, filtering by Genbank protein accession is what we want, and we need the ability to exclude.  The 2 transcripts below are examples (they show up as protein coding in Genbank as well as in Ensembl), but there are thousands more like this.  The filter below is taking them out despite them having a Genbank protein accession.  What may be causing this?



Filter used:
Manual (non-Perl)
    Homo sapiens genes (GRCh37.p3)
    Filters
        with protein ID(s): Only
    Attributes
        Ensembl Gene ID
        Ensembl Transcript ID

Same problem occurs using Perl filter as well
    $query->addFilter("with_protein_id", ["Only"]);


From: pip pipster <[hidden email]>
To: Rhoda Kinsella <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Sent: Monday, August 22, 2011 8:07 AM
Subject: Re: [BioMart Users] Bug or User error with filtering?

Rhoda,
Thank you for the feedback, it was very helpful.  The Gene type filter with 'protein_coding' will likely work; however, it doesn't allow me to do an 'exclude' type filter (i.e. give me everything except for the non-protein-coding genes).  Do you know if you can still do an exclude using the method you described?

Thank you!
Phillipe
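
One possible workaround for the exclude question above is to filter on the client side: retrieve the gene type as an extra column and drop the unwanted rows locally. The following is only a minimal sketch under that assumption. The helper name exclude_biotype, the three-column tab-separated layout, and the sample rows are hypothetical placeholders, not real BioMart output or part of the BioMart API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tab-separated rows: gene ID, transcript ID, gene type.
# The IDs and gene types are placeholders, not real BioMart output.
my @rows = (
    "GENE_A\tTRANSCRIPT_A1\tprotein_coding",
    "GENE_B\tTRANSCRIPT_B1\tpseudogene",
    "GENE_C\tTRANSCRIPT_C1\tmiRNA",
);

# Return the rows whose third column (gene type) does NOT equal
# the given gene type, i.e. a client-side "exclude" filter.
sub exclude_biotype {
    my ($biotype, @lines) = @_;
    return grep { (split /\t/, $_)[2] ne $biotype } @lines;
}

# Keep everything except the protein_coding rows.
my @kept = exclude_biotype('protein_coding', @rows);
print "$_\n" for @kept;
```

Swapping `ne` for `eq` in the helper would give the opposite ("only") behaviour, so the same rows can be split either way without running a second query.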


From: Rhoda Kinsella <[hidden email]>
To: pip pipster <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Sent: Monday, August 22, 2011 5:04 AM
Subject: Re: [BioMart Users] Bug or User error with filtering?

Hi Phillipe
You are filtering using the protein ID (Genbank protein accession) and as this Ensembl protein ID does not have a corresponding Genbank protein accession, you will not get this ENSP. Please filter using the Gene type filter and select protein_coding. That way you will get the ENSP data you require.
Regards
Rhoda


On 21 Aug 2011, at 22:54, pip pipster wrote:

We are seeing strange things occur with the protein ID filter.  For example, transcript ENST00000345514 is being filtered out by the following search below.  However, you can see that it indeed has a Protein ID shown here:  http://useast.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000127152;r=14:99635624-99737861;t=ENST00000345514 .  Any idea why this is being filtered?  Could this be a bug in Biomart/Data or User Error?

Manual (non-Perl)
    Homo sapiens genes (GRCh37.p3)
    Filters
        with protein ID(s): Only
    Attributes
        Ensembl Gene ID
        Ensembl Transcript ID

Same problem occurs using Perl filter as well
    $query->addFilter("with_protein_id", ["Only"]);

Thank you,
Phillipe
_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus, 
Hinxton
Cambridge CB10 1SD,
UK.
