[BioMart Users] biomaRt returning multiple columns out of order

Richard Hayes
Hi,

Our group maintains the biomart instance at the Phytozome plant genomics portal. We've had some users report problems with the result sets from the biomaRt interface. It is unclear if this is a biomaRt problem or a problem in our mart configuration. At the moment, we are still running biomart version 0.6, but are hoping to upgrade in the very near future to 0.7.

I had been testing with R 2.12.2 and biomaRt 2.6.0, but then upgraded to R 2.13.1 and biomaRt 2.8.1. The problems persist with these latest software releases.

I can connect to our mart and the main genome transcript dataset as follows, retrieving a single column of transcript names for Arabidopsis thaliana using our internal "orgid" filter for organism ID 167:

> library('biomaRt')
> phyto=useMart('phytozome_mart', dataset='phytozome')
> transcripts = getBM(attributes = c("transcript_name"), filters= "orgid", values="167", mart=phyto)
> transcripts[1:5,]
[1] "AT2G38230.1" "AT2G39920.2" "AT2G26530.1" "AT2G28630.1" "AT2G19280.1"

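However, when I construct a multicolumn query, the columns are not returned in the expected order. The values appear shifted one column to the left relative to their headers, and the organism name ends up in the last column: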

> multiTest = getBM(attributes= c("organism_name", "transcript_name", "exon_chrom_start", "exon_chrom_end"), filters="orgid", values="167", mart=phyto)
> multiTest[1:5,]
  organism_name transcript_name exon_chrom_start exon_chrom_end
1   AT5G47220.1        19171862         19172823      Athaliana
2   AT1G71920.3        27067059         27067098      Athaliana
3   AT1G71920.3        27067189         27067401      Athaliana
4   AT1G71920.3        27067506         27067589      Athaliana
5   AT1G71920.3        27067706         27067860      Athaliana

Any help diagnosing the source of this problem is much appreciated.
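
One way to confirm the mislabelling programmatically is to test whether the columns that should hold coordinates are actually numeric. The sketch below rebuilds the five rows printed above as a data frame, keeping the labels biomaRt assigned. The names `all_numeric` and `coordinate_cols` are hypothetical and introduced only for illustration, and the rule that both exon coordinate columns must be numeric is an assumption about this mart.

```r
# First five rows as printed above, under the labels biomaRt assigned.
multiTest <- data.frame(
  organism_name    = c("AT5G47220.1", "AT1G71920.3", "AT1G71920.3",
                       "AT1G71920.3", "AT1G71920.3"),
  transcript_name  = c(19171862, 27067059, 27067189, 27067506, 27067706),
  exon_chrom_start = c(19172823, 27067098, 27067401, 27067589, 27067860),
  exon_chrom_end   = rep("Athaliana", 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when every non-missing value parses as a number.
all_numeric <- function(x) {
  x <- x[!is.na(x)]
  length(x) > 0 && !anyNA(suppressWarnings(as.numeric(as.character(x))))
}

# Assumption: these two attributes should always contain coordinates.
coordinate_cols <- c("exon_chrom_start", "exon_chrom_end")
is_num <- vapply(multiTest[coordinate_cols], all_numeric, logical(1))

# Names of coordinate columns whose contents are not numeric.
mislabelled <- names(is_num)[!is_num]
print(mislabelled)
```

A non-empty `mislabelled` vector signals that the headers and the data no longer line up, which is cheaper to check after every query than inspecting the rows by eye.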

Best regards,

--
Richard D. Hayes, Ph.D.
Joint Genome Institute / Lawrence Berkeley National Lab
http://www.phytozome.net

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Re: [BioMart Users] biomaRt returning multiple columns out of order

Arek Kasprzyk
Hi Richard,
The best person to help you is Steffen Durinck, the original biomaRt coder (cc'ed on this email).

a


Re: [BioMart Users] biomaRt returning multiple columns out of order

Steffen Durinck
Hi Richard, Arek,

If you set verbose=TRUE in your getBM query, you'll see the XML query that is sent to the BioMart server (see below for your example).
The order of the attributes in the XML query is usually the same order in which we get the results back from the BioMart server.
However, for your example this is not the case, and there is no way for biomaRt to know this (Arek, correct me if I'm wrong). biomaRt adds column names to the returned matrix in query order, so the names will be wrong whenever the query order is not preserved in the returned result.
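
The positional labelling described here can be reproduced without a server. The sketch below is only an illustration: `response` is a simulated header-less, tab-separated result, and `server_order` is inferred from the output Richard posted rather than taken from any server documentation. It also shows a manual repair that works once the real column order is known.

```r
# Attribute order as requested in Richard's getBM call.
query_attrs <- c("organism_name", "transcript_name",
                 "exon_chrom_start", "exon_chrom_end")

# Simulated header-less TSV body (an assumption, not a captured response).
response <- paste(
  "AT5G47220.1\t19171862\t19172823\tAthaliana",
  "AT1G71920.3\t27067059\t27067098\tAthaliana",
  "",
  sep = "\n"
)

result <- read.delim(text = response, header = FALSE,
                     stringsAsFactors = FALSE)

# Positional labelling: names follow the query, not the data.
colnames(result) <- query_attrs
wrong_first_organism <- result$organism_name[1]  # holds a transcript name

# Manual repair: relabel with the order the server appears to use
# (inferred from the posted output), then reorder to match the query.
server_order <- c("transcript_name", "exon_chrom_start",
                  "exon_chrom_end", "organism_name")
colnames(result) <- server_order
result <- result[, query_attrs]
print(result)
```

This repair is only safe while the server keeps returning columns in the same inferred order, so it is a stopgap rather than a fix.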

> multiTest = getBM(attributes= c("organism_name", "transcript_name", "exon_chrom_start", "exon_chrom_end"), filters="orgid", values="167", mart=phyto,verbose=TRUE)

<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query  virtualSchemaName = 'default' uniqueRows = '1' count = '0' datasetConfigVersion = '0.6' requestid= "biomaRt"> <Dataset name = 'phytozome'><Attribute name = 'organism_name'/><Attribute name = 'transcript_name'/><Attribute name = 'exon_chrom_start'/><Attribute name = 'exon_chrom_end'/><Filter name = 'orgid' value = '167' /></Dataset></Query>


Cheers,
Steffen


Re: [BioMart Users] biomaRt returning multiple columns out of order

Richard Hayes


Okay, I see that on my end as well. Is this a consequence of biomart v0.6 on the backend that would be alleviated by our plans to upgrade to 0.7 soon?
 


Re: [BioMart Users] [BioC] biomaRt returning multiple columns out of order

Laurent Gatto
Dear all,

Any update about the column order in biomaRt results?
I have come across the same issue, as illustrated below.

> library(biomaRt)
> mart = useMart("plants_mart_10","athaliana_eg_gene")
> ans <- getBM(attributes=c("tair_locus","peptide"), filter="tair_locus", value=c("AT3G18780","AT2G26300"), mart=mart, verbose=TRUE)
<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query
virtualSchemaName = 'default' uniqueRows = '1' count = '0'
datasetConfigVersion = '0.6' requestid= "biomaRt"> <Dataset name =
'athaliana_eg_gene'><Attribute name = 'tair_locus'/><Attribute name =
'peptide'/><Filter name = 'tair_locus' value = 'AT3G18780,AT2G26300'
/></Dataset></Query>
> ans
                     tair_locus
1 MAEADDI[...]ASLIDQILFRILLHAN*
2 MGLLCSR[...]VKKRRRNLLEAGLL*
3 MAEADDI[...]ILASAGPGIVHRKCF*
    peptide
1 AT3G18780
2 AT2G26300
3 AT3G18780

I see the same for useMart("ensembl","ensembl_gene_id") using
ensembl_gene_id or ensembl_exon_id as filters.
In these cases, datasetConfigVersion is also 0.6, if that's of any help.
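
Because this query both filters on tair_locus and requests it as an attribute, the swap can be detected from the data: the column whose values all come from the filter list must be the locus column. The following is a minimal sketch of that idea, not a biomaRt feature. The data frame reproduces the output above with the sequences exactly as truncated there, and `holds_filter` and `key_col` are hypothetical names.

```r
# Values passed to the tair_locus filter.
loci <- c("AT3G18780", "AT2G26300")

# Result as printed above, with the (truncated) sequences under the
# tair_locus label and the locus IDs under the peptide label.
ans <- data.frame(
  tair_locus = c("MAEADDI[...]ASLIDQILFRILLHAN*",
                 "MGLLCSR[...]VKKRRRNLLEAGLL*",
                 "MAEADDI[...]ILASAGPGIVHRKCF*"),
  peptide    = c("AT3G18780", "AT2G26300", "AT3G18780"),
  stringsAsFactors = FALSE
)

# Which column contains nothing but the requested filter values?
holds_filter <- vapply(ans, function(col) all(col %in% loci), logical(1))
key_col <- names(ans)[holds_filter]

# Two-column case only: if the filter values sit under the wrong label,
# swapping the two labels restores the intended meaning.
if (length(key_col) == 1 && key_col != "tair_locus") {
  names(ans) <- rev(names(ans))
}
print(names(ans))
```

The check generalises poorly beyond two columns, since it identifies only the filtered attribute, but it is enough to catch the swap reported here.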

> sessionInfo()
R Under development (unstable) (2011-10-13 r57241)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.9.3

loaded via a namespace (and not attached):
[1] RCurl_1.6-10 XML_3.4-3

Best wishes,

Laurent


--
[ Laurent Gatto | slashhome.be ]

Re: [BioMart Users] [BioC] biomaRt returning multiple columns out of order

Steffen Durinck
Hi Laurent,

No, there is no update on this issue. Unless the BioMart server returns column headers, the biomaRt package cannot know in which order the query results come back, so it assumes they are in the same order as the attributes specified in the query (which holds for most queries).
Arek, let me know if you have any solution to this.
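
To make the header idea concrete, here is a rough sketch of how results could be aligned if the server did prepend a header row. Everything in it is an assumption: the thread does not show such a response, the display names in `response` are invented, and `display_to_attr` is a hypothetical lookup table from display names to internal attribute names.

```r
# Attribute order as requested in the query.
query_attrs <- c("organism_name", "transcript_name",
                 "exon_chrom_start", "exon_chrom_end")

# Hypothetical response whose first line is a header of display names.
response <- paste(
  "Transcript Name\tExon Start\tExon End\tOrganism",
  "AT5G47220.1\t19171862\t19172823\tAthaliana",
  "AT1G71920.3\t27067059\t27067098\tAthaliana",
  "",
  sep = "\n"
)

# Hypothetical mapping from display names to internal attribute names.
display_to_attr <- c(
  "Organism"        = "organism_name",
  "Transcript Name" = "transcript_name",
  "Exon Start"      = "exon_chrom_start",
  "Exon End"        = "exon_chrom_end"
)

# check.names = FALSE keeps display names with spaces intact.
result <- read.delim(text = response, header = TRUE, check.names = FALSE,
                     stringsAsFactors = FALSE)

# Fail loudly if the server sent a header we cannot translate.
stopifnot(all(colnames(result) %in% names(display_to_attr)))

# Rename by header content, then reorder to match the query.
colnames(result) <- unname(display_to_attr[colnames(result)])
result <- result[, query_attrs]
print(result)
```

The approach depends on display names being unique within a dataset; if two attributes shared a display name, the lookup would be ambiguous and a different key would be needed.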

Cheers,
Steffen


Re: [BioMart Users] [BioC] biomaRt returning multiple columns out of order

vasu punj
Try using the BioMart site; it should work.