getting results in Perl API in data structures instead of printResults()

getting results in Perl API in data structures instead of printResults()

Leandro Hermida
Hi,

I was wondering if there is a way, using the Perl API, to get results in a Perl data structure and, if possible, row by row.  For example, each row could be returned as an array or arrayref.  It seems inefficient to take the output of printResults() and break everything up again when I know that somewhere in the Perl API it was doing the reverse...

thanks,
Leandro


Re: getting results in Perl API in data structures instead of printResults()

Syed Haider
Hi Leandro,

The API does not return a data-structure representation of the results.
If you are feeling adventurous, please feel free to look in the
lib/BioMart/Formatter/ directory for the formatter that you are
interested in.


Best
Syed



On 09/06/2010 17:51, Leandro Hermida wrote:
> Hi,
>
> I was wondering if there is a way, using the Perl API, to get results in a Perl data structure and, if possible, row by row.  For example, each row could be returned as an array or arrayref.  It seems inefficient to take the output of printResults() and break everything up again when I know that somewhere in the Perl API it was doing the reverse...
>
> thanks,
> Leandro
>
>

Re: getting results in Perl API in data structures instead of printResults()

Leandro Hermida-2
Sorry, I forgot to post what I did before! For those of you who use the
BioMart APIs and want to get results back into Perl data structures,
here is the approach I use:

If using the Perl API:

use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $bm_initializer = BioMart::Initializer->new(
    registryFile => "/path/to/myRegistry.xml",
    action       => 'update',
);
my $bm_query = BioMart::Query->new(
    registry          => $bm_initializer->getRegistry(),
    virtualSchemaName => 'default',
);
$bm_query->setDataset('my_dataset');
$bm_query->addFilter('attr1', ['Q6LTE1']);
$bm_query->addAttribute('attr2');
$bm_query->addAttribute('attr3');
$bm_query->formatter('TSV');
my $bm_query_runner = BioMart::QueryRunner->new();
$bm_query_runner->uniqueRowsOnly(1);
$bm_query_runner->execute($bm_query);

# Capture the printResults() output in an in-memory filehandle, then read it back.
open(my $results_fh, '+>', \my $results) or die "$!\n";
$bm_query_runner->printResults($results_fh);
seek($results_fh, 0, 0);
while (my $line = <$results_fh>) {
    chomp $line;
    my @row_fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    # build up a data structure or process your fields here...
}
close($results_fh);


Using the REST API:

use strict;
use warnings;
use LWP::UserAgent ();

# Single-quoted heredoc so nothing in the XML is interpolated.
my $query_xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0"
uniqueRows="1" count="" datasetConfigVersion="0.7">
    <Dataset name="my_dataset" interface="default">
        <Filter name="attr1" value="Q6LTE1"/>
        <Attribute name="attr2" />
        <Attribute name="attr3" />
    </Dataset>
</Query>
XML

my $ua = LWP::UserAgent->new();
my $response = $ua->post(
    'http://myserver.mydomain:9002/biomart/martservice',
    [ query => $query_xml ],
);
# Decode once and keep the body in a variable we can take a reference to.
my $content = $response->decoded_content;
if ($response->is_success and $content !~ /BioMart::Exception/i) {
    # Read the response body line by line through an in-memory filehandle.
    open(my $results_fh, '<', \$content) or die "$!\n";
    while (my $line = <$results_fh>) {
        chomp $line;
        my @row_fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        # build up a data structure or process your fields here...
    }
    close($results_fh);
}
else {
    die $content, "\n";
}
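
Both loops above do the same splitting, so it can be factored into one helper. This is only a minimal sketch and not part of BioMart: parse_tsv_rows and the column names passed to it are hypothetical, the sample data is made up, and it assumes fields contain no embedded tabs or newlines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the BioMart API): turn TSV text into
# an array of hashrefs keyed by the column names you pass in.
sub parse_tsv_rows {
    my ($tsv_text, @column_names) = @_;
    my @rows;
    for my $line (split /\r?\n/, $tsv_text) {
        next if $line eq '';                  # skip blank lines
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        my %row;
        @row{@column_names} = @fields;        # hash slice: name => value
        push @rows, \%row;
    }
    return \@rows;
}

# Made-up sample data standing in for a BioMart TSV response.
my $sample = "P12345\tkinase\nQ6LTE1\t\n";
my $rows   = parse_tsv_rows($sample, qw(attr2 attr3));
printf "%s => '%s'\n", $_->{attr2}, $_->{attr3} for @$rows;
```

The column names have to be given in the same order as the addAttribute() calls (or the Attribute elements in the XML), since with header="0" the TSV itself carries no names.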


On Thu, Jun 10, 2010 at 12:03 AM, Syed Haider <[hidden email]> wrote:

> Hi Leandro,
>
> The API does not return a data-structure representation of the results. If
> you are feeling adventurous, please feel free to look in the
> lib/BioMart/Formatter/ directory for the formatter that you are
> interested in.
>
>
> Best
> Syed

Re: getting results in Perl API in data structures instead of printResults()

Syed Haider
Hi Leandro,

This is the only method that returns the results. What exactly are you
after?

Best
Syed

On 14/07/2010 13:14, Leandro Hermida wrote:

> Sorry, I forgot to post what I did before! For those of you who use the
> BioMart APIs and want to get results back into Perl data structures,
> here is the approach I use:

Re: getting results in Perl API in data structures instead of printResults()

Leandro Hermida-2
Hi Syed,

Since none of the BioMart APIs actually returns results in a data
structure (they only return formatted text such as TSV), I was trying
to be helpful and show other developers on this forum how they can
populate a Perl data structure from the results returned by BioMart.

How to do this is not obvious from the docs when you are getting
started. One initially expects the Perl API to have some method call
such as ->getResults() that returns an array of arrayrefs, or the REST
API to have an option to return, e.g., a JSON-serialized data structure
that can be deserialized into a native data structure for the language
you are using.
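
As a rough sketch of what such a call could look like: get_results below is a hypothetical wrapper, not an existing BioMart method. It assumes the formatter is TSV and takes any code ref that prints results to the filehandle it is given (with BioMart that would be something like sub { $bm_query_runner->printResults($_[0]) }). The stand-in printer and its data are made up so the sketch runs without BioMart. JSON::PP ships with Perl 5.14 and later.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical wrapper: $printer is any code ref that writes TSV to the
# filehandle it receives. Returns an arrayref with one arrayref per row.
sub get_results {
    my ($printer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!\n";
    $printer->($fh);
    close($fh);
    # Drop empty lines; -1 keeps trailing empty fields in each row.
    return [ map { [ split /\t/, $_, -1 ] } grep { length } split /\r?\n/, $buffer ];
}

# Stand-in printer so the sketch runs without a BioMart installation.
my $rows = get_results(sub {
    my ($fh) = @_;
    print {$fh} "Q6LTE1\tfoo\nP12345\tbar\n";
});

# The same structure serialized as JSON for consumers in other languages.
print JSON::PP->new->encode($rows), "\n";
```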

best,
Leandro

On Wed, Jul 14, 2010 at 3:21 PM, Syed Haider <[hidden email]> wrote:

> Hi Leandro,
>
> This is the only method that returns the results. What exactly are you
> after?
>
> Best
> Syed

Re: getting results in Perl API in data structures instead of printResults()

Leandro Hermida-2
Hi again,

In the new BioMart 0.8 will the SOAP and REST APIs have:
- an option to return results in JSON or other serialized data structure form?
- an option to return results sorted by some attribute(s)?
- an option to return results with LIMITs in full form i.e. start_row,
end_row (for paging)?

best,
Leandro

On Wed, Jul 14, 2010 at 4:32 PM, Leandro Hermida
<[hidden email]> wrote:

> Hi Syed,
>
> Since none of the BioMart APIs actually returns results in a data
> structure (they only return formatted text such as TSV), I was trying
> to be helpful and show other developers on this forum how they can
> populate a Perl data structure from the results returned by BioMart.

Re: getting results in Perl API in data structures instead of printResults()

Syed Haider
In reply to this post by Leandro Hermida-2
Hi Leandro,

I was just trying to understand the use case; I get it now. The reason
the web service and the Perl API both return a similar string-like data
structure is to aim for a coherent results structure across all API
endpoints. I agree there are several ways of publishing the results in
other formats, e.g. arrays, hashes, JSON, etc., that offer great
interoperability with various languages. Your suggestions are very
valuable and have been taken on board for the ongoing new libraries,
where we intend to offer these as extensions.

thanks
Syed

On 14/07/2010 15:32, Leandro Hermida wrote:

> Hi Syed,
>
> Since none of the BioMart APIs actually returns results in a data
> structure (they only return formatted text such as TSV), I was trying
> to be helpful and show other developers on this forum how they can
> populate a Perl data structure from the results returned by BioMart.

Re: getting results in Perl API in data structures instead of printResults()

Syed Haider
In reply to this post by Leandro Hermida-2


On 14/07/2010 15:46, Leandro Hermida wrote:
> Hi again,
>
> In the new BioMart 0.8 will the SOAP and REST APIs have:
> - an option to return results in JSON or other serialized data structure form?

A tentative yes for results requests. For all other API calls (metadata
calls), a definite yes. For the former, there is very little point in,
e.g., wrapping 1000 bytes of gene IDs in 20,000 bytes of JSON.
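
To get a feel for the overhead being discussed, here is a small sketch that encodes the same single-column result as TSV and as two JSON shapes and prints the sizes. The identifiers and the gene_id key are invented for illustration, and the actual ratio depends entirely on the data and on the JSON shape chosen.

```perl
use strict;
use warnings;
use JSON::PP ();

# Made-up single-column result: 100 identifiers of 10 characters each.
my @ids = map { sprintf 'GENE%06d', $_ } 1 .. 100;

my $json        = JSON::PP->new;
my $tsv         = join("\n", @ids) . "\n";
my $json_rows   = $json->encode([ map { [$_] } @ids ]);                 # array of arrays
my $json_hashes = $json->encode([ map { +{ gene_id => $_ } } @ids ]);   # array of hashes

printf "TSV:                   %d bytes\n", length $tsv;
printf "JSON array of arrays:  %d bytes\n", length $json_rows;
printf "JSON array of hashes:  %d bytes\n", length $json_hashes;
```

The array-of-hashes form repeats the key name in every row, which is where most of the extra bytes come from; an array of arrays stays much closer to the TSV size.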

> - an option to return results sorted by some attribute(s)?

No, that's a post-processing option and tends to be very expensive, as it
needs all results to be collected first. We can make it
optional, though. The BioMart web interface would have this option for sure.

> - an option to return results with LIMITs in full form i.e. start_row,
> end_row (for paging)?

You will have a limit with an offset of zero, e.g. you can retrieve the
first 100, the first 1000, the first 10000 and so on.

HTH,
Syed


Re: getting results in Perl API in data structures instead of printResults()

Leandro Hermida-2
On Wed, Jul 14, 2010 at 5:02 PM, Syed Haider <[hidden email]> wrote:

>
>
> On 14/07/2010 15:46, Leandro Hermida wrote:
>>
>> Hi again,
>>
>> In the new BioMart 0.8 will the SOAP and REST APIs have:
>> - an option to return results in JSON or other serialized data structure
>> form?
>
> A tentative yes for results requests. For all other API calls (metadata calls),
> a definite yes. For the former, there is very little point in, e.g., wrapping
> 1000 bytes of gene IDs in 20,000 bytes of JSON.
>

Good point, but many times you are returning much more than that:
records with many attributes.

>> - an option to return results sorted by some attribute(s)?
>
> No, that's a post-processing option and tends to be very expensive, as it
> needs all results to be collected first. We can make it
> optional, though. The BioMart web interface would have this option for sure.
>

Why not let the database do these things (i.e. ... ORDER BY x1 ASC, y1
DESC, z1 ASC)? I noticed that the current 0.7 also does many things as
post-processing in Perl, e.g. unique rows are filtered in Perl after
the database returns results. Why not just use SELECT DISTINCT
...?
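
For comparison, this is roughly what DISTINCT and a multi-column ORDER BY look like when done client-side in Perl on rows that have already been fetched. It is only a sketch with made-up rows, not how BioMart implements it, and it needs the whole result set in memory.

```perl
use strict;
use warnings;

# Made-up rows of (x1, y1, z1); x1 is a string, y1 and z1 are numbers.
my @rows = (
    [ 'b', 2, 1 ],
    [ 'a', 1, 5 ],
    [ 'a', 3, 2 ],
    [ 'a', 3, 2 ],   # duplicate row
    [ 'a', 3, 1 ],
);

# Equivalent of SELECT DISTINCT: keep the first occurrence of each row.
# Joining on "\t" is safe as a key because TSV fields cannot contain tabs.
my %seen;
my @unique = grep { !$seen{ join "\t", @$_ }++ } @rows;

# Equivalent of ORDER BY x1 ASC, y1 DESC, z1 ASC.
my @sorted = sort {
       $a->[0] cmp $b->[0]
    || $b->[1] <=> $a->[1]
    || $a->[2] <=> $b->[2]
} @unique;

print join("\t", @$_), "\n" for @sorted;
```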

>> - an option to return results with LIMITs in full form i.e. start_row,
>> end_row (for paging)?
>
> You will have a limit with an offset of zero, e.g. you can retrieve the
> first 100, the first 1000, the first 10000 and so on.

Again, why not let the database do it (e.g. ... LIMIT 100,500)?
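
On rows already parsed into an array, the same paging can be approximated client-side. A minimal sketch: page_rows is a hypothetical helper that mimics MySQL's LIMIT offset,count on an in-memory array, and the data is made up. Unlike a database-side LIMIT, it still requires fetching every row first.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $count rows starting at $offset,
# mirroring MySQL's "LIMIT offset, count".
sub page_rows {
    my ($rows, $offset, $count) = @_;
    return [] if $offset >= @$rows;
    my $last = $offset + $count - 1;
    $last = $#$rows if $last > $#$rows;      # clamp to the final row
    return [ @{$rows}[ $offset .. $last ] ];
}

my @all_rows = map { [ "id$_", $_ * 2 ] } 1 .. 1000;   # made-up data
my $page     = page_rows(\@all_rows, 100, 500);        # like LIMIT 100,500
printf "%d rows, first %s, last %s\n",
    scalar @$page, $page->[0][0], $page->[-1][0];
```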

>
> HTH,
> Syed
>
>>
>> best,
>> Leandro
>>
>> On Wed, Jul 14, 2010 at 4:32 PM, Leandro Hermida
>> <[hidden email]>  wrote:
>>>
>>> Hi Syed,
>>>
>>> Since none of the BioMart APIs actually return results in a data
>>> structure (it only returns formatted files like TSV, etc) I was trying
>>> to be helpful and show other developers on this forum how they can go
>>> about populating a Perl data structure from the results returned by
>>> BioMart.
>>>
>>> It's not obvious after reading the docs and when you get started how
>>> you need to do this, one initially expects in the APIs that there
>>> would be for e.g. in the Perl API some method call ->getResults()
>>> which returns an @array of arrayrefs structure or in the REST API that
>>> there would be an option to return for e.g. a JSON serialized data
>>> structure that can be unserialized into a native data structure for
>>> the language you are using.
>>>
>>> best,
>>> Leandro
>>>
>>> On Wed, Jul 14, 2010 at 3:21 PM, Syed Haider<[hidden email]>
>>>  wrote:
>>>>
>>>> Hi Leandro,
>>>>
>>>> this is the only method that returns the results. What exactly are you
>>>> after
>>>> ?
>>>>
>>>> Best
>>>> Syed
>>>>

Re: getting results in Perl API in data structures instead of printResults()

Leandro Hermida-2
Hi Syed,

More comments in line...

On Wed, Jul 14, 2010 at 5:04 PM, Leandro Hermida
<[hidden email]> wrote:

> On Wed, Jul 14, 2010 at 5:02 PM, Syed Haider <[hidden email]> wrote:
>>
>>
>> On 14/07/2010 15:46, Leandro Hermida wrote:
>>>
>>> Hi again,
>>>
>>> In the new BioMart 0.8 will the SOAP and REST APIs have:
>>> - an option to return results in JSON or other serialized data structure
>>> form?
>>
>> tentative yes for results request. For all other API call (meta data calls),
>> a definite yes. For the former, there is very little point to e.g wrap 1000
>> bytes of gene ids in  20,000 bytes of JSON.
>>
>
> good point, but many times you are returning much more than that,
> records with many attributes

Just to give you an example here: a complementary piece of software to BioMart
that I am using in one of my projects is Solr, a full-text search engine
and framework.  In some fundamental ways BioMart and Solr share
some goals: both have a web-based query interface.

With Solr you run a REST query like this:

http://myserver:8983/solr/select/?q=*&start=0&rows=10&wt=xml

<result name="response" numFound="390256" start="0">
    <doc>
      <str name="id">Q96GW9</str>
      <str name="organism">Homo sapiens (Human)</str>
      ...
    </doc>
    <doc>
      <str name="id">Q499X9</str>
      <str name="organism">Mus musculus (Mouse)</str>
    </doc>
    ...
</result>

or to return JSON...
http://myserver:8983/solr/select/?q=*&start=0&rows=10&wt=json

"response":{
  "numFound":390256,
  "start":0,
  "docs":[
        {
         "id":"Q96GW9",
         "organism":"Homo sapiens (Human)",
         ...
        },
        {
         "id":"Q499X9",
         "organism":"Mus musculus (Mouse)",
         ...
        },
        ...
  ]
}

You can specify the return type and also choose how many docs to skip
and how many docs to return (LIMIT x,y).  This is the way I would
recommend for BioMart, don't you think?
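
To illustrate why a serialized response is convenient on the client side, here is a minimal Perl sketch that decodes a JSON body shaped like the Solr response above into nested Perl data structures. The sample body is an assumption: it is the example above with the elided parts dropped and the enclosing braces added, and JSON::PP is just one possible decoder.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Sample body shaped like the Solr JSON response above (elided parts dropped).
my $body = <<'JSON';
{"response":{"numFound":390256,"start":0,"docs":[
  {"id":"Q96GW9","organism":"Homo sapiens (Human)"},
  {"id":"Q499X9","organism":"Mus musculus (Mouse)"}
]}}
JSON

# decode_json turns the serialized text into nested hashrefs and arrayrefs.
my $data = decode_json($body);
my @docs = @{ $data->{response}{docs} };

# Each doc is a hashref keyed by field name, so no splitting on tabs is needed.
for my $doc (@docs) {
    print join("\t", $doc->{id}, $doc->{organism}), "\n";
}
```

No line splitting or field counting is required; the field names travel with the data.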

best,
Leandro

>
>>> - an option to return results sorted by some attribute(s)?
>>
>> no, thats a post processing option and tends to be very expensive as it
>> needs all results to be collected in the first place. we can make it
>> optional though. BioMart web interface would have this option for sure.
>>
>
> why not let the database do these things? (i.e. ... ORDER BY x1 ASC,y1
> DESC, z1 ASC ) I noticed that also in the current 0.7 you do many
> things post-processed in Perl, e.g. unique rows are processed in Perl
> after returning database results, why not use just use SELECT DISTINCT
> ....?
>
>>> - an option to return results with LIMITs in full form i.e. start_row,
>>> end_row (for paging)?
>>
>> you will have limit as offset of zero. e.g you can retrieve, first 100,
>> first 1000, first 10000 and so on.
>
> again why not let the database do it? ( e.g. ... LIMIT 100,500 )
>
>>
>> HTH,
>> Syed
>>
>>>
>>> best,
>>> Leandro
>>>

Re: getting results in Perl API in data structures instead of printResults()

Syed Haider
In reply to this post by Leandro Hermida-2
Hi Leandro,

The summary of answers to your questions is as follows:

ORDER BY:
BioMart, being a data warehouse, is a denormalised database, and an ORDER BY
clause is an expensive operation that we refrain from using whenever
possible.

LIMIT:
The concept of a limit is only valid within the scope of one SQL query. When
we do data integration by means of data federation, results are joined
by the BioMart integration engine, hence your limit won't work.
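
Since sorting is not done on the server side, a client that needs ordered, unique rows has to post-process the TSV output itself. The following is only a sketch under assumptions: the TSV text is a hypothetical stand-in for what printResults() wrote to a filehandle, and the sort columns are arbitrary. It also shows why this is costly: every row must be held in memory before the sort can start.

```perl
use strict;
use warnings;

# Hypothetical TSV text, standing in for what printResults() wrote to a filehandle.
my $tsv = "Q499X9\tMus musculus\nQ96GW9\tHomo sapiens\nQ499X9\tMus musculus\n";

open(my $fh, '<', \$tsv) or die "$!\n";
my (@rows, %seen);
while (my $line = <$fh>) {
    chomp $line;
    next if $seen{$line}++;                 # drop duplicate rows (like SELECT DISTINCT)
    push @rows, [ split /\t/, $line, -1 ];  # -1 keeps trailing empty fields
}
close($fh);

# Sort by the second column ascending, then by the first column descending
# (comparable to ORDER BY col2 ASC, col1 DESC).
@rows = sort { $a->[1] cmp $b->[1] or $b->[0] cmp $a->[0] } @rows;

print join("\t", @$_), "\n" for @rows;
```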

Best,
Syed



On 14/07/2010 16:04, Leandro Hermida wrote:

> On Wed, Jul 14, 2010 at 5:02 PM, Syed Haider<[hidden email]>  wrote:
>>
>>
>> On 14/07/2010 15:46, Leandro Hermida wrote:
>>>
>>> Hi again,
>>>
>>> In the new BioMart 0.8 will the SOAP and REST APIs have:
>>> - an option to return results in JSON or other serialized data structure
>>> form?
>>
>> tentative yes for results request. For all other API call (meta data calls),
>> a definite yes. For the former, there is very little point to e.g wrap 1000
>> bytes of gene ids in  20,000 bytes of JSON.
>>
>
> good point, but many times you are returning much more than that,
> records with many attributes
>
>>> - an option to return results sorted by some attribute(s)?
>>
>> no, thats a post processing option and tends to be very expensive as it
>> needs all results to be collected in the first place. we can make it
>> optional though. BioMart web interface would have this option for sure.
>>
>
> why not let the database do these things? (i.e. ... ORDER BY x1 ASC,y1
> DESC, z1 ASC ) I noticed that also in the current 0.7 you do many
> things post-processed in Perl, e.g. unique rows are processed in Perl
> after returning database results, why not use just use SELECT DISTINCT
> ....?
>
>>> - an option to return results with LIMITs in full form i.e. start_row,
>>> end_row (for paging)?
>>
>> you will have limit as offset of zero. e.g you can retrieve, first 100,
>> first 1000, first 10000 and so on.
>
> again why not let the database do it? ( e.g. ... LIMIT 100,500 )
>
>>
>> HTH,
>> Syed
>>
>>>
>>> best,
>>> Leandro
>>>

Re: getting results in Perl API in data structures instead of printResults()

Syed Haider
In reply to this post by Leandro Hermida-2
Hi Leandro,

The Solr example, with a parameter specifying the formatter (e.g. json, xml),
is almost what we have already implemented for the new release.

About limit: like I said, at this stage the system should be able to send
results from a zero-based offset, i.e. the first N rows.

However, in subsequent releases we can of course add this batch
splitting of results. Sounds like a good idea.
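
Until batch splitting exists on the server, a client could emulate a page by requesting the first offset + count rows and discarding the leading ones. A minimal sketch, assuming the rows have already been parsed into an array of arrayrefs; the helper name page_of and the sample rows are hypothetical:

```perl
use strict;
use warnings;

# Return rows [$offset, $offset + $count) from a list that was fetched
# as "the first $offset + $count rows" (zero-based offset).
sub page_of {
    my ($rows, $offset, $count) = @_;
    my $last = $offset + $count - 1;
    $last = $#$rows if $last > $#$rows;     # clip to the rows actually returned
    return $offset > $last ? () : @{$rows}[ $offset .. $last ];
}

my @first_rows = map { [ "id$_", "value$_" ] } 1 .. 10;   # hypothetical rows
my @page = page_of(\@first_rows, 4, 3);                   # rows 5 to 7
print join("\t", @$_), "\n" for @page;
```

The cost grows with the offset, because every earlier row is still transferred and thrown away.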

thanks
Syed

On 14/07/2010 18:45, Leandro Hermida wrote:

> Hi Syed,
>
> More comments in line...
>
> On Wed, Jul 14, 2010 at 5:04 PM, Leandro Hermida
> <[hidden email]>  wrote:
>> On Wed, Jul 14, 2010 at 5:02 PM, Syed Haider<[hidden email]>  wrote:
>>>
>>>
>>> On 14/07/2010 15:46, Leandro Hermida wrote:
>>>>
>>>> Hi again,
>>>>
>>>> In the new BioMart 0.8 will the SOAP and REST APIs have:
>>>> - an option to return results in JSON or other serialized data structure
>>>> form?
>>>
>>> tentative yes for results request. For all other API call (meta data calls),
>>> a definite yes. For the former, there is very little point to e.g wrap 1000
>>> bytes of gene ids in  20,000 bytes of JSON.
>>>
>>
>> good point, but many times you are returning much more than that,
>> records with many attributes
>
> Just to give you an example here: a complementary piece of software to BioMart
> that I am using in one of my projects is Solr, a full-text search engine
> and framework.  In some fundamental ways BioMart and Solr share
> some goals: both have a web-based query interface.
>
> With Solr you run a REST query like this:
>
> http://myserver:8983/solr/select/?q=*&start=0&rows=10&wt=xml
>
> <result name="response" numFound="390256" start="0">
>      <doc>
>        <str name="id">Q96GW9</str>
>        <str name="organism">Homo sapiens (Human)</str>
>        ...
>      </doc>
>      <doc>
>        <str name="id">Q499X9</str>
>        <str name="organism">Mus musculus (Mouse)</str>
>      </doc>
>      ...
> </result>
>
> or to return JSON...
> http://myserver:8983/solr/select/?q=*&start=0&rows=10&wt=json
>
> "response":{
>    "numFound":390256,
>    "start":0,
>    "docs":[
>          {
>           "id":"Q96GW9",
>           "organism":"Homo sapiens (Human)",
>           ...
>          },
>          {
>           "id":"Q499X9",
>           "organism":"Mus musculus (Mouse)",
>           ...
>          },
>          ...
>    ]
> }
>
> You can specify the return type and also choose how many docs to skip
> and how many docs to return (LIMIT x,y).  This is the way I would
> recommend for BioMart, don't you think?
>
> best,
> Leandro
>
>>
>>>> - an option to return results sorted by some attribute(s)?
>>>
>>> no, thats a post processing option and tends to be very expensive as it
>>> needs all results to be collected in the first place. we can make it
>>> optional though. BioMart web interface would have this option for sure.
>>>
>>
>> why not let the database do these things? (i.e. ... ORDER BY x1 ASC,y1
>> DESC, z1 ASC ) I noticed that also in the current 0.7 you do many
>> things post-processed in Perl, e.g. unique rows are processed in Perl
>> after returning database results, why not use just use SELECT DISTINCT
>> ....?
>>
>>>> - an option to return results with LIMITs in full form i.e. start_row,
>>>> end_row (for paging)?
>>>
>>> you will have limit as offset of zero. e.g you can retrieve, first 100,
>>> first 1000, first 10000 and so on.
>>
>> again why not let the database do it? ( e.g. ... LIMIT 100,500 )
>>
>>>
>>> HTH,
>>> Syed
>>>
>>>>
>>>> best,
>>>> Leandro
>>>>