[biomart-users] BioMart Does not return complete result set

Ashok Ragavendran
Hi,
   I have been using biomaRt to annotate a set of gene IDs and came across some strange behaviour; any insight would be appreciated.

I tried to get the chromosome, start, and end for about 10000 genes from an Ensembl GTF file (an RNA-seq experiment).

When I query biomaRt for this information using Ensembl IDs, I get only 9977 results back. However, if I run a subsequent query with the 23 remaining gene IDs, I get all of the results.

This seems to be independent of which columns have data, as I ran the query retrieving only the Ensembl gene IDs and got the same result.

 Is this a bug, or am I missing something? Any help will be much appreciated.
     Cheers
     Ashok

Query the original 10000 genes
 ensembl <- useMart(biomart="ENSEMBL_MART_ENSEMBL", dataset="mmusculus_gene_ensembl", host="uswest.ensembl.org")
 getAttrs <- listAttributes(ensembl)[c(1,6,7,8, 9,10),1]

>     getAttrs
[1] "ensembl_gene_id" "chromosome_name" "start_position"  "end_position"   
[5] "strand"          "band"   

 geneAnno <- getBM(filters="ensembl_gene_id", values=as.character(inDat$gene[1:10000]),
                       attributes=getAttrs, mart=ensembl, uniqueRows=TRUE, checkFilters=FALSE)

>     dim(geneAnno)
[1] 9977    6



Query the 23 genes that were missing

>     nonGenes <- inDat$gene[1:10000]
>     nonGenes <- nonGenes[-which(geneAnno$ensembl_gene_id %in% inDat$gene[1:10000])]
>     nonGenes
 [1] "ENSMUSG00000045665" "ENSMUSG00000045671" "ENSMUSG00000045672"
 [4] "ENSMUSG00000045679" "ENSMUSG00000045689" "ENSMUSG00000045690"
 [7] "ENSMUSG00000045691" "ENSMUSG00000045694" "ENSMUSG00000045725"
[10] "ENSMUSG00000045733" "ENSMUSG00000045744" "ENSMUSG00000045746"
[13] "ENSMUSG00000045751" "ENSMUSG00000045752" "ENSMUSG00000045761"
[16] "ENSMUSG00000045763" "ENSMUSG00000045767" "ENSMUSG00000045777"
[19] "ENSMUSG00000045790" "ENSMUSG00000045795" "ENSMUSG00000045799"
[22] "ENSMUSG00000045817" "ENSMUSG00000045822"
>     geneAnno <- getBM(filters="ensembl_gene_id", values=as.character(nonGenes),
+                       attributes=getAttrs, mart=ensembl, uniqueRows=FALSE, checkFilters=FALSE)
>     geneAnno
      ensembl_gene_id chromosome_name start_position end_position strand band
1  ENSMUSG00000045665              15      102279456    102281744      1   F3
2  ENSMUSG00000045671              11       19924375     20024026      1 A3.1
3  ENSMUSG00000045672               4       63214004     63334991      1   B3
4  ENSMUSG00000045679              12       16988648     17000408     -1 A1.1
5  ENSMUSG00000045689              18       37307455     37311172      1   B3
6  ENSMUSG00000045690              12       75630596     75669537     -1   C3
7  ENSMUSG00000045691              14       55053884     55098986      1   C3
8  ENSMUSG00000045694               X      155150952    155151311     -1   F3
9  ENSMUSG00000045725               6       54327012     54330200      1   B3
10 ENSMUSG00000045733               7      140150628    140154877     -1   F4
11 ENSMUSG00000045744              17       24473884     24475469      1 A3.3
12 ENSMUSG00000045746               8       60983437     60985908      1 B3.1
13 ENSMUSG00000045751               4       24496451     24602950      1   A3
14 ENSMUSG00000045752               7      143069249    143071093      1   F5
15 ENSMUSG00000045761              17       71673261     71729669      1 E1.3
16 ENSMUSG00000045763              15       25363285     25413764     -1   B1
17 ENSMUSG00000045767              13       55693124     55703499      1   B1
18 ENSMUSG00000045777               7      142325837    142373753     -1   F5
19 ENSMUSG00000045790               5       52374651     52471521     -1   C1
20 ENSMUSG00000045795               7       81571292     81596836      1   D3
21 ENSMUSG00000045799              14        8799589      8800504     -1   A1
22 ENSMUSG00000045817              17       84183931     84187947     -1   E4
23 ENSMUSG00000045822               2      164805098    164822130      1   H3
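
A note on how the 23 IDs above were selected: `which(geneAnno$ensembl_gene_id %in% ...)` returns positions within `geneAnno`, and those positions are then dropped from the input vector. If every returned ID is present in the input, this simply removes the first 9977 input positions and keeps the last 23, whether or not those IDs were actually absent from the result. The sketch below illustrates the effect on toy data; `queried` and `returned` are hypothetical stand-ins for `inDat$gene[1:10000]` and `geneAnno$ensembl_gene_id`, not values from the thread.

```r
# Toy stand-ins (hypothetical values, not taken from the thread).
queried  <- c("g1", "g2", "g2", "g3", "g4")   # input containing one duplicate
returned <- c("g1", "g2", "g3", "g4")         # distinct IDs that came back

# Pattern from the post: which() yields positions in `returned` (1..4),
# which are then dropped from `queried`, leaving only its trailing element.
by_position <- queried[-which(returned %in% queried)]

# Set-based alternative: IDs in the input that are absent from the result.
truly_missing <- setdiff(queried, returned)

print(by_position)     # [1] "g4"
print(truly_missing)   # character(0)
```

Here `g4` is reported as "missing" even though it was returned, while `setdiff()` shows that no ID is actually absent.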


--
You received this message because you are subscribed to the Google Groups "biomart-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
Visit this group at http://groups.google.com/group/biomart-users.
For more options, visit https://groups.google.com/d/optout.
Re: [biomart-users] BioMart Does not return complete result set

Arek Kasprzyk
Hi Ashok,
Did you check your original file for redundancy, i.e. whether some of the gene names appear twice?

a.
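
One way to run the redundancy check suggested above is with base R's `duplicated()`. This is a minimal sketch: `ids` is a hypothetical stand-in for `inDat$gene[1:10000]`, and it assumes that a gene-level query returns at most one row per distinct ID, so repeated input IDs would not add rows.

```r
# Hypothetical input vector standing in for inDat$gene[1:10000].
ids <- c("ENSMUSG00000000001", "ENSMUSG00000000003",
         "ENSMUSG00000000003", "ENSMUSG00000000028")

n_submitted <- length(ids)           # IDs passed as filter values
n_distinct  <- length(unique(ids))   # distinct IDs among them
n_dups      <- sum(duplicated(ids))  # repeats beyond the first occurrence

# The IDs that occur more than once in the input.
dup_ids <- unique(ids[duplicated(ids)])

print(c(submitted = n_submitted, distinct = n_distinct, repeats = n_dups))
print(dup_ids)
```

If `sum(duplicated(ids))` came out as 23 on the real input, that would account for the gap between the 10000 IDs submitted and the 9977 rows returned.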

On 1 September 2015 at 22:04, Ashok Ragavendran <[hidden email]> wrote:
[...]


