[biomart-users] transcript length or coding length

[biomart-users] transcript length or coding length

Quanwei Zhang
I get the transcript length for human protein-coding genes through BioMart.
The transcript length is the total length of the mature RNA, right? And are the UTR regions included in the transcript?
Is there a way to get the length of the coding region (i.e., the translated region)?
Thanks

--
You received this message because you are subscribed to the Google Groups "biomart-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
Visit this group at http://groups.google.com/group/biomart-users.
For more options, visit https://groups.google.com/d/optout.

Re: [biomart-users] transcript length or coding length

William Spooner
Hi,

There is a 'CDS length' attribute under the 'Structures' attribute section.

Best,

Will

--
William Spooner
Chief Science Officer
M: +44 (0)7779663045 | T: @wspoonr | L: linkedin
Eagle Genomics Ltd
Disclaimer


Re: [biomart-users] transcript length or coding length

Quanwei Zhang


Thanks. I got the CDS length, but I have two questions.

(1) For some coding sequences there is no UTR annotation. Does that mean the UTRs are not known? And some coding sequences have an annotation for only the 3' UTR or only the 5' UTR. Does that mean the UTR on the other side is not known?

(2) For some coding sequences, why is there a huge difference between the transcript length (excluding the UTRs) and the coding length?

Take one hit for the gene "HIST2H4A" as an example (see below): 3UTR_length = 149804678 - 149804561 = 117 and 5UTR_length = 149804248 - 149804221 = 27.

The transcript length is 955, so 955 - 117 - 27 = 811. Why is the length of the coding sequence only 312?


Examples:

GeneName   CDS_Length   3' UTR_End   3' UTR_Start   5' UTR_End   5' UTR_Start   Transcript_length   Ensembl_Transcript_ID

CCDC163P       357                                   45965751 45965282      2229 ENST00000415578 1

CCDC163P       357                                                     2229 ENST00000415578

HIST2H4A       312   149804678  149804561  149804248  149804221  955   ENST00000369165




Re: [biomart-users] transcript length or coding length

William Spooner
On Fri, Jul 17, 2015 at 3:37 PM, Quanwei Zhang <[hidden email]> wrote:
>
>
> Thanks. I got the CDS length, but I have two questions.
>
> (1)I found for some coding sequence, there is no annotation for UTRs, does
> it mean the UTRs are not known? And some coding sequence only have
> annotation for either 3'UTR or 5'UTR. Does it mean the other side UTR is not
> known?

Correct



>
> (2)For some coding sequence I wonder why there is a huge difference between
> transcript length (exclude the UTRs) and coding length?

Do the 5' UTR_Start and 3' UTR_End coordinates correspond to the
transcript start/end coordinates? It does not look like it to me, which
suggests there's an error in the UTR coordinates in the database.

http://feb2014.archive.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000183941;r=1:149804221-149806197;t=ENST00000369165

Best,

Will


Re: [biomart-users] transcript length or coding length

Thomas Maurel
Hello, 

On 17 Jul 2015, at 18:01, William Spooner <[hidden email]> wrote:

> Do the 5' UTR_Start and 3' UTR_End coordinates correspond to the
> transcript start/end coordinates? It does not look like it to me, which
> suggests there's an error in the UTR coordinates in the database.
>
> http://feb2014.archive.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000183941;r=1:149804221-149806197;t=ENST00000369165
>
> Best,
>
> Will

Yes, the 5' UTR_Start and 3' UTR_End coordinates correspond to the transcript start/end coordinates, as the output below shows: 5_utr_start in row 1 equals transcript_start, and 3_utr_end in row 2 equals transcript_end. The line quoted above displayed only the values for exon 1 of the transcript (ENSE00002688671).

> GRCh37 = useEnsembl(biomart="ensembl",dataset="hsapiens_gene_ensembl",GRCh=37)
> transcript_info=getBM(attributes=c("ensembl_transcript_id","external_gene_name","transcript_start","transcript_end", "transcript_length","5_utr_start", "5_utr_end", "3_utr_start", "3_utr_end","cds_length","ensembl_exon_id"),filters=c('ensembl_transcript_id'),values="ENST00000369165",mart=GRCh37)
> transcript_info
  ensembl_transcript_id external_gene_name transcript_start transcript_end transcript_length 5_utr_start
1       ENST00000369165           HIST2H4A        149804221      149806197               955   149804221
2       ENST00000369165           HIST2H4A        149804221      149806197               955          NA
  5_utr_end 3_utr_start 3_utr_end cds_length ensembl_exon_id
1 149804248   149804561 149804678        312 ENSE00002688671
2        NA   149805701 149806197        312 ENSE00002715154
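
As a rough check of the table above, the per-exon UTR rows can be added up and subtracted from the transcript length. This is only a sketch in plain Python (not the thread's R, so that it runs without a BioMart connection) with the coordinates copied from the getBM() output. The row layout and the helper name `span_length` are made up for illustration, and it assumes Ensembl's inclusive coordinates.

```python
# Rows copied from the getBM() output above; None stands in for R's NA.
# The dictionary layout is an assumption made for this sketch, not a biomaRt structure.
rows = [
    {"exon": "ENSE00002688671", "utr5": (149804221, 149804248), "utr3": (149804561, 149804678)},
    {"exon": "ENSE00002715154", "utr5": None, "utr3": (149805701, 149806197)},
]
transcript_length = 955


def span_length(span):
    """Length of an inclusive (start, end) interval; 0 when the UTR is absent."""
    if span is None:
        return 0
    start, end = span
    return end - start + 1


# Sum the UTR pieces over all exon rows, then subtract them from the transcript length.
utr5_total = sum(span_length(r["utr5"]) for r in rows)
utr3_total = sum(span_length(r["utr3"]) for r in rows)
cds_length = transcript_length - utr5_total - utr3_total

print(utr5_total, utr3_total, cds_length)  # 28 615 312
```

The 3' UTR covers the end of exon 1 and all of exon 2, so subtracting only the exon 1 row from 955 overestimates the coding length.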

I believe the confusion comes from the mart “Transcript length” attribute, which reports the full transcript length, including both the UTRs and the CDS (as displayed in the Transcript table on the Ensembl website):
Name          Transcript ID     bp     Protein
HIST2H4A-004  ENST00000610125   716    103aa
HIST2H4A-003  ENST00000392939   1696   103aa
HIST2H4A-001  ENST00000369165   955    103aa
HIST2H4A-002  ENST00000392938   610    103aa

The transcript length in Ensembl is the sum of the exon lengths:

> transcript_info2=getBM(attributes=c("ensembl_transcript_id","external_gene_name","transcript_length","ensembl_exon_id","exon_chrom_start","exon_chrom_end","rank"),filters=c('ensembl_transcript_id'),values="ENST00000369165",mart=GRCh37)
> transcript_info2
  ensembl_transcript_id external_gene_name transcript_length ensembl_exon_id exon_chrom_start exon_chrom_end
1       ENST00000369165           HIST2H4A               955 ENSE00002688671        149804221      149804678
2       ENST00000369165           HIST2H4A               955 ENSE00002715154        149805701      149806197
  rank
1    1
2    2

Exon 1 length: 149804678 - 149804221 + 1 = 458
Exon 2 length: 149806197 - 149805701 + 1 = 497

Transcript length = 458 + 497 = 955
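
The same arithmetic as a small Python sketch (Python is used here only so that it runs without a BioMart connection). The exon coordinates are the ones returned by getBM() above; `exon_length` is a hypothetical helper, and the last check assumes that the CDS length counts the 103 aa protein from the Transcript table plus one stop codon.

```python
# Exon coordinates for ENST00000369165 (GRCh37), copied from the output above.
exons = {
    "ENSE00002688671": (149804221, 149804678),
    "ENSE00002715154": (149805701, 149806197),
}


def exon_length(start, end):
    """Ensembl coordinates are inclusive, hence the +1."""
    return end - start + 1


exon_lengths = {exon_id: exon_length(*coords) for exon_id, coords in exons.items()}
transcript_length = sum(exon_lengths.values())

# Sanity check on the CDS (assumption): 103 amino acids plus a stop codon, 3 bases each.
protein_aa = 103
cds_from_protein = (protein_aa + 1) * 3

print(exon_lengths)       # {'ENSE00002688671': 458, 'ENSE00002715154': 497}
print(transcript_length)  # 955
print(cds_from_protein)   # 312
```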

The Exon page of the Ensembl website, with the line numbering “Relative to the coding sequence” turned on, confirms that the CDS length is 312 and that the lengths of exon 1 and exon 2 are 458 and 497 (http://grch37.ensembl.org/Homo_sapiens/Share/148b9c0106c8adafc7543cb8e33aa175194447320).

Hope this helps,
Regards,
Thomas





--
Thomas Maurel
Bioinformatician - Ensembl Production Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom
