[Gmod-tripal-devel] Execution Time Problems with new API

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[Gmod-tripal-devel] Execution Time Problems with new API

Kucheran, Lacey Sanderson
Hi 

I started switching over all tripal_stock queries to use the tripal_core api and ran into a serious performance issue... I tried switching over all the queries in my node_load function and now I'm getting the following error:
Fatal error: Maximum execution time of 30 seconds exceeded in /XXX/sites/all/modules/tripal/tripal_core/chado_tables.schema.inc on line 3608
Executed 4310 queries in 5575.12 milliseconds.

This makes me think that the current approach of storing the entire schema in memory is causing serious performance issues... It would be better if we could get only the table description we need without trying to load all of them into memory. 

One way of doing this might be to put each definition in it's own function and have the function names be hook_tablename_schema (ie: tripal_stock_stock_schema or tripal_stock_stockprop_schema). I can programmatically create all these functions from the chado_tables.schema.inc file we have currently and then just move them into the appropriate module.api.inc
function tripal_stock_stock_schema() {
  return array(
     'description' => t('TODO: please describe this table!'),
     'fields' => array(
       'stock_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'serial',
         'not null' => TRUE,
       ),
       'dbxref_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => FALSE,
       ),
       'organism_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => TRUE,
       ),
       'name' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'varchar',
         'length' => '255',
         'not null' => FALSE,
       ),
       'uniquename' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'text',
         'not null' => TRUE,
       ),
       'description' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'text',
         'not null' => FALSE,
       ),
       'type_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => TRUE,
       ),
       'is_obsolete' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'boolean',
         'not null' => TRUE,
         'default' => 'als',
       ),
     ),
     'primary key' => array('stock_id'),
     'unique keys' => array(
       'stock_c1' => array('organism_id', 'uniquename', 'type_id')
     ),
     'indexes' => array(
       'stock_idx1' => array('dbxref_id'),
       'stock_idx2' => array('organism_id'),
       'stock_idx3' => array('type_id'),
       'stock_idx4' => array('uniquename'),
       'stock_name_ind1' => array('name'),
     ),
    'foreign keys' => array(
       'organism' => array(
         'table' => 'organism',
         'columns' => array('organism_id' => 'organism_id'),
       ),
       'dbxref' => array(
         'table' => 'dbxref',
         'columns' => array('dbxref_id' => 'dbxref_id'),
       ),
       'cvterm' => array(
         'table' => 'cvterm',
         'columns' => array('type_id' => 'cvterm_id'),
       ),
     ),
   );
}

Then you could do the following to get the table definition:
// Call all module definitions of the hook for a given table
// Ie: If $table is 'stock' then $table_desc would contain the array returned by
// any hook_stock_schema() implemented
// @todo: See how this performs if 2+ modules define a given hook
// are returned arrays merged?
$table_desc = module_invoke_all($table.'_schema');

This would change our $values array to be tablename => array(descriptors) rather than 'foreign key' => array(descriptors). For example, to select a feature:
$columns = array('feature_id', 'name');
$values = array(
'organism' => array(
'genus' => 'Lens',
'species' => 'culinaris',
),
'uniquename' => 'gi:432555',
'cvterm' => array(
'name' => 'gene',
'cv' => array(
'name' => 'sequence',
),
)
);
$features = tripal_core_chado_select('feature', $columns, $values);

What do you think of this?
~Lacey

------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan


------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT data
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel
Reply | Threaded
Open this post in threaded view
|

Re: [Gmod-tripal-devel] Execution Time Problems with new API

Stephen Ficklin-2

Hi Lacey,

 

Sorry for my slow response.  I’ve been very busy on another project and just now had time to think about other things…

 

I would prefer to leave the structure in this way: 

 

$values = array(

     organism_id => array(

           'genus' => 'Lens',

           'species' => 'culinaris',

     ),

     'uniquename' => 'gi:432555',

     'cvterm_id' => array(

           'name' => 'gene',

           'cv_id' => array(

                'name' => 'sequence',

           ),

     )

);

 

So, for this example, I would be passing in the above array and the table name ‘feature’ to the ‘select’, ‘update’, or ‘insert’ function.  But, In the code for the ‘select’, ‘insert’ and ‘update’ functions we do as you suggest. We have a hook that would call a function:  tripal_chado_<table_name>_schema (i.e. tripal_chado_feature_schema()) which would return the table definition for the ‘feature’ table.   From this table defition we would find out that the organism_id is a foreign key relationship with organism and we could then recurse down, in the same way that the code currently works.

 

I like the array structure as we previously defined it because it has all of the field names of the table in it and I think is a bit more intuitive.

 

Stephen

 

 

From: Lacey-Anne Sanderson [mailto:[hidden email]]
Sent: Friday, February 25, 2011 5:56 PM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Execution Time Problems with new API

 

Hi 

 

I started switching over all tripal_stock queries to use the tripal_core api and ran into a serious performance issue... I tried switching over all the queries in my node_load function and now I'm getting the following error:

Fatal error: Maximum execution time of 30 seconds exceeded in /XXX/sites/all/modules/tripal/tripal_core/chado_tables.schema.inc on line 3608

Executed 4310 queries in 5575.12 milliseconds.

 

This makes me think that the current approach of storing the entire schema in memory is causing serious performance issues... It would be better if we could get only the table description we need without trying to load all of them into memory. 

 

One way of doing this might be to put each definition in it's own function and have the function names be hook_tablename_schema (ie: tripal_stock_stock_schema or tripal_stock_stockprop_schema). I can programmatically create all these functions from the chado_tables.schema.inc file we have currently and then just move them into the appropriate module.api.inc

function tripal_stock_stock_schema() {

  return array(

     'description' => t('TODO: please describe this table!'),

     'fields' => array(

       'stock_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'serial',

         'not null' => TRUE,

       ),

       'dbxref_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => FALSE,

       ),

       'organism_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => TRUE,

       ),

       'name' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'varchar',

         'length' => '255',

         'not null' => FALSE,

       ),

       'uniquename' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'text',

         'not null' => TRUE,

       ),

       'description' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'text',

         'not null' => FALSE,

       ),

       'type_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => TRUE,

       ),

       'is_obsolete' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'boolean',

         'not null' => TRUE,

         'default' => 'als',

       ),

     ),

     'primary key' => array('stock_id'),

     'unique keys' => array(

       'stock_c1' => array('organism_id', 'uniquename', 'type_id')

     ),

     'indexes' => array(

       'stock_idx1' => array('dbxref_id'),

       'stock_idx2' => array('organism_id'),

       'stock_idx3' => array('type_id'),

       'stock_idx4' => array('uniquename'),

       'stock_name_ind1' => array('name'),

     ),

    'foreign keys' => array(

       'organism' => array(

         'table' => 'organism',

         'columns' => array('organism_id' => 'organism_id'),

       ),

       'dbxref' => array(

         'table' => 'dbxref',

         'columns' => array('dbxref_id' => 'dbxref_id'),

       ),

       'cvterm' => array(

         'table' => 'cvterm',

         'columns' => array('type_id' => 'cvterm_id'),

       ),

     ),

   );

}

 

Then you could do the following to get the table definition:

     // Call all module definitions of the hook for a given table

     // Ie: If $table is 'stock' then $table_desc would contain the array returned by

     // any hook_stock_schema() implemented

     // @todo: See how this performs if 2+ modules define a given hook

     //         are returned arrays merged?

     $table_desc = module_invoke_all($table.'_schema');

 

This would change our $values array to be tablename => array(descriptors) rather than 'foreign key' => array(descriptors). For example, to select a feature:

$columns = array('feature_id', 'name');

$values = array(

     'organism' => array(

           'genus' => 'Lens',

           'species' => 'culinaris',

     ),

     'uniquename' => 'gi:432555',

     'cvterm' => array(

           'name' => 'gene',

           'cv' => array(

                'name' => 'sequence',

           ),

     )

);

$features = tripal_core_chado_select('feature', $columns, $values);

 

What do you think of this?

~Lacey

 

------------------------------------------------------

Lacey-Anne Sanderson

Bioinformaticist

Pulse Crop Breeding and Genetics

Phone: (306) 966-2430

Room 3D10 Agriculture

Department of Plant Sciences

University of Saskatchewan

 


------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT data
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel
Reply | Threaded
Open this post in threaded view
|

Re: [Gmod-tripal-devel] Execution Time Problems with new API

Kucheran, Lacey Sanderson
<base href="x-msg://8/">I also prefer the previous form of the values array. The problem with that is -Since we are only pulling out the table description for one table I need to know which table that is. In order to keep the close to the current structure we could either add a 'table' bit to a foreign key definition.
ie:
$values = array(
     organism_id => array(
           'table' => 'organism',
           'genus' => 'Lens',
           'species' => 'culinaris',
     ),
     'uniquename' => 'gi:432555',
     'type_id' => array(
  'table' => 'cvterm',
           'name' => 'gene',
           'cv_id' => array(
                'name' => 'sequence',
           ),
     )
);

Or we could create another hook that compiles a list of primary_key => table definitions. Something along the lines of hook_table_description_chado_schema that would be used as follows:
In each module,
tripal_stock_table_description_chado_schema() {
  $description = array();

  $description['stock_id'] = array(
    'table' => 'stock',
    'primary_key' => 'stock_id',
  );

  $description['stockpro-_id'] = array(
    'table' => 'stockprop',
    'primary_key' => 'stockprop_id',
  );

  return $description;
}
Then in the select/update all the descriptions would be pulled in using module_invoke_all. This might cause the same time overload problem that we have with the current api though since it is loading all the chado table descriptions into memory (abeit a lot smaller table description than before).

Obviously, I'll leaning towards the first option.
~Lacey

------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan

On 2011-02-28, at 10:03 AM, Stephen Ficklin wrote:

Hi Lacey,
 
Sorry for my slow response.  I’ve been very busy on another project and just now had time to think about other things…
 
I would prefer to leave the structure in this way: 
 
$values = array(
     organism_id => array(
           'genus' => 'Lens',
           'species' => 'culinaris',
     ),
     'uniquename' => 'gi:432555',
     'cvterm_id' => array(
           'name' => 'gene',
           'cv_id' => array(
                'name' => 'sequence',
           ),
     )
);
 
So, for this example, I would be passing in the above array and the table name ‘feature’ to the ‘select’, ‘update’, or ‘insert’ function.  But, In the code for the ‘select’, ‘insert’ and ‘update’ functions we do as you suggest. We have a hook that would call a function:  tripal_chado_<table_name>_schema (i.e. tripal_chado_feature_schema()) which would return the table definition for the ‘feature’ table.   From this table defition we would find out that the organism_id is a foreign key relationship with organism and we could then recurse down, in the same way that the code currently works.
 
I like the array structure as we previously defined it because it has all of the field names of the table in it and I think is a bit more intuitive.
 
Stephen
 
 
From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Friday, February 25, 2011 5:56 PM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Execution Time Problems with new API
 
Hi 
 
I started switching over all tripal_stock queries to use the tripal_core api and ran into a serious performance issue... I tried switching over all the queries in my node_load function and now I'm getting the following error:
Fatal error: Maximum execution time of 30 seconds exceeded in /XXX/sites/all/modules/tripal/tripal_core/chado_tables.schema.inc on line 3608
Executed 4310 queries in 5575.12 milliseconds.
 
This makes me think that the current approach of storing the entire schema in memory is causing serious performance issues... It would be better if we could get only the table description we need without trying to load all of them into memory. 
 
One way of doing this might be to put each definition in it's own function and have the function names be hook_tablename_schema (ie: tripal_stock_stock_schema or tripal_stock_stockprop_schema). I can programmatically create all these functions from the chado_tables.schema.inc file we have currently and then just move them into the appropriate module.api.inc
function tripal_stock_stock_schema() {
  return array(
     'description' => t('TODO: please describe this table!'),
     'fields' => array(
       'stock_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'serial',
         'not null' => TRUE,
       ),
       'dbxref_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => FALSE,
       ),
       'organism_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => TRUE,
       ),
       'name' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'varchar',
         'length' => '255',
         'not null' => FALSE,
       ),
       'uniquename' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'text',
         'not null' => TRUE,
       ),
       'description' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'text',
         'not null' => FALSE,
       ),
       'type_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => TRUE,
       ),
       'is_obsolete' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'boolean',
         'not null' => TRUE,
         'default' => 'als',
       ),
     ),
     'primary key' => array('stock_id'),
     'unique keys' => array(
       'stock_c1' => array('organism_id', 'uniquename', 'type_id')
     ),
     'indexes' => array(
       'stock_idx1' => array('dbxref_id'),
       'stock_idx2' => array('organism_id'),
       'stock_idx3' => array('type_id'),
       'stock_idx4' => array('uniquename'),
       'stock_name_ind1' => array('name'),
     ),
    'foreign keys' => array(
       'organism' => array(
         'table' => 'organism',
         'columns' => array('organism_id' => 'organism_id'),
       ),
       'dbxref' => array(
         'table' => 'dbxref',
         'columns' => array('dbxref_id' => 'dbxref_id'),
       ),
       'cvterm' => array(
         'table' => 'cvterm',
         'columns' => array('type_id' => 'cvterm_id'),
       ),
     ),
   );
}
 
Then you could do the following to get the table definition:
     // Call all module definitions of the hook for a given table
     // Ie: If $table is 'stock' then $table_desc would contain the array returned by
     // any hook_stock_schema() implemented
     // @todo: See how this performs if 2+ modules define a given hook
     //         are returned arrays merged?
     $table_desc = module_invoke_all($table.'_schema');
 
This would change our $values array to be tablename => array(descriptors) rather than 'foreign key' => array(descriptors). For example, to select a feature:
$columns = array('feature_id', 'name');
$values = array(
     'organism' => array(
           'genus' => 'Lens',
           'species' => 'culinaris',
     ),
     'uniquename' => 'gi:432555',
     'cvterm' => array(
           'name' => 'gene',
           'cv' => array(
                'name' => 'sequence',
           ),
     )
);
$features = tripal_core_chado_select('feature', $columns, $values);
 
What do you think of this?
~Lacey
 
------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan
 


------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT data
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel
Reply | Threaded
Open this post in threaded view
|

Re: [Gmod-tripal-devel] Execution Time Problems with new API

Stephen Ficklin-2
<base href="x-msg://8/">

You can get the table to join on from the table description you are pulling.  So no need to add a new entry to the array to specify the table. 

 

So, for example.  The ‘feature’ table definition has a foreign key relationship for the organism_id.  We know that joins with the organism table because it is defined in the foreign key relationship described in the definition array that we get back from the hook. 

 

The changes to the code should be very minimal to make this work (it’s already programmed this way).   We just need to define our hooks and then call the appropriate hook in place of the function call that right now gets the huge schema array.

 

Stephen

 

From: Lacey-Anne Sanderson [mailto:[hidden email]]
Sent: Monday, February 28, 2011 11:33 AM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Re: Execution Time Problems with new API

 

I also prefer the previous form of the values array. The problem with that is -Since we are only pulling out the table description for one table I need to know which table that is. In order to keep the close to the current structure we could either add a 'table' bit to a foreign key definition.

ie:

$values = array(

     organism_id => array(

           'table' => 'organism',

           'genus' => 'Lens',

           'species' => 'culinaris',

     ),

     'uniquename' => 'gi:432555',

     'type_id' => array(

           'table' => 'cvterm',

           'name' => 'gene',

           'cv_id' => array(

                'name' => 'sequence',

           ),

     )

);

 

Or we could create another hook that compiles a list of primary_key => table definitions. Something along the lines of hook_table_description_chado_schema that would be used as follows:

In each module,

tripal_stock_table_description_chado_schema() {

  $description = array();

 

  $description['stock_id'] = array(

    'table' => 'stock',

    'primary_key' => 'stock_id',

  );

 

  $description['stockpro-_id'] = array(

    'table' => 'stockprop',

    'primary_key' => 'stockprop_id',

  );

 

  return $description;

}

Then in the select/update all the descriptions would be pulled in using module_invoke_all. This might cause the same time overload problem that we have with the current api though since it is loading all the chado table descriptions into memory (abeit a lot smaller table description than before).

 

Obviously, I'll leaning towards the first option.

~Lacey

 

------------------------------------------------------

Lacey-Anne Sanderson

Bioinformaticist

Pulse Crop Breeding and Genetics

Phone: (306) 966-2430

Room 3D10 Agriculture

Department of Plant Sciences

University of Saskatchewan

 

On 2011-02-28, at 10:03 AM, Stephen Ficklin wrote:



Hi Lacey,

 

Sorry for my slow response.  I’ve been very busy on another project and just now had time to think about other things…

 

I would prefer to leave the structure in this way: 

 

$values = array(

     organism_id => array(

           'genus' => 'Lens',

           'species' => 'culinaris',

     ),

     'uniquename' => 'gi:432555',

     'cvterm_id' => array(

           'name' => 'gene',

           'cv_id' => array(

                'name' => 'sequence',

           ),

     )

);

 

So, for this example, I would be passing in the above array and the table name ‘feature’ to the ‘select’, ‘update’, or ‘insert’ function.  But, In the code for the ‘select’, ‘insert’ and ‘update’ functions we do as you suggest. We have a hook that would call a function:  tripal_chado_<table_name>_schema (i.e. tripal_chado_feature_schema()) which would return the table definition for the ‘feature’ table.   From this table defition we would find out that the organism_id is a foreign key relationship with organism and we could then recurse down, in the same way that the code currently works.

 

I like the array structure as we previously defined it because it has all of the field names of the table in it and I think is a bit more intuitive.

 

Stephen

 

 

From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Friday, February 25, 2011 5:56 PM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Execution Time Problems with new API

 

Hi 

 

I started switching over all tripal_stock queries to use the tripal_core api and ran into a serious performance issue... I tried switching over all the queries in my node_load function and now I'm getting the following error:

Fatal error: Maximum execution time of 30 seconds exceeded in /XXX/sites/all/modules/tripal/tripal_core/chado_tables.schema.inc on line 3608

Executed 4310 queries in 5575.12 milliseconds.

 

This makes me think that the current approach of storing the entire schema in memory is causing serious performance issues... It would be better if we could get only the table description we need without trying to load all of them into memory. 

 

One way of doing this might be to put each definition in it's own function and have the function names be hook_tablename_schema (ie: tripal_stock_stock_schema or tripal_stock_stockprop_schema). I can programmatically create all these functions from the chado_tables.schema.inc file we have currently and then just move them into the appropriate module.api.inc

function tripal_stock_stock_schema() {

  return array(

     'description' => t('TODO: please describe this table!'),

     'fields' => array(

       'stock_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'serial',

         'not null' => TRUE,

       ),

       'dbxref_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => FALSE,

       ),

       'organism_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => TRUE,

       ),

       'name' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'varchar',

         'length' => '255',

         'not null' => FALSE,

       ),

       'uniquename' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'text',

         'not null' => TRUE,

       ),

       'description' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'text',

         'not null' => FALSE,

       ),

       'type_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => TRUE,

       ),

       'is_obsolete' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'boolean',

         'not null' => TRUE,

         'default' => 'als',

       ),

     ),

     'primary key' => array('stock_id'),

     'unique keys' => array(

       'stock_c1' => array('organism_id', 'uniquename', 'type_id')

     ),

     'indexes' => array(

       'stock_idx1' => array('dbxref_id'),

       'stock_idx2' => array('organism_id'),

       'stock_idx3' => array('type_id'),

       'stock_idx4' => array('uniquename'),

       'stock_name_ind1' => array('name'),

     ),

    'foreign keys' => array(

       'organism' => array(

         'table' => 'organism',

         'columns' => array('organism_id' => 'organism_id'),

       ),

       'dbxref' => array(

         'table' => 'dbxref',

         'columns' => array('dbxref_id' => 'dbxref_id'),

       ),

       'cvterm' => array(

         'table' => 'cvterm',

         'columns' => array('type_id' => 'cvterm_id'),

       ),

     ),

   );

}

 

Then you could do the following to get the table definition:

     // Call all module definitions of the hook for a given table

     // Ie: If $table is 'stock' then $table_desc would contain the array returned by

     // any hook_stock_schema() implemented

     // @todo: See how this performs if 2+ modules define a given hook

     //         are returned arrays merged?

     $table_desc = module_invoke_all($table.'_schema');

 

This would change our $values array to be tablename => array(descriptors) rather than 'foreign key' => array(descriptors). For example, to select a feature:

$columns = array('feature_id', 'name');

$values = array(

     'organism' => array(

           'genus' => 'Lens',

           'species' => 'culinaris',

     ),

     'uniquename' => 'gi:432555',

     'cvterm' => array(

           'name' => 'gene',

           'cv' => array(

                'name' => 'sequence',

           ),

     )

);

$features = tripal_core_chado_select('feature', $columns, $values);

 

What do you think of this?

~Lacey

 

------------------------------------------------------

Lacey-Anne Sanderson

Bioinformaticist

Pulse Crop Breeding and Genetics

Phone: (306) 966-2430

Room 3D10 Agriculture

Department of Plant Sciences

University of Saskatchewan

 

 


------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT data
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel
Reply | Threaded
Open this post in threaded view
|

Re: [Gmod-tripal-devel] Execution Time Problems with new API

Lacey-Anne Sanderson
<base href="x-msg://8/">Ah, yes. I hadn't thought of that. Okay, I will start implementing these changes:
1. Programatic creation of all table description functions -functions will be stored in the <tripal module>.api.inc file they are associated with (ie: stock table in tripal_stock.api.inc)
2. Addition of $table_desc = module_invoke_all($table.'_schema'); to select/update/insert
3. Using of previous table description to pull out table description for the table referred to be the foreign key

~Lacey
------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan

On 2011-02-28, at 10:57 AM, Stephen Ficklin wrote:

You can get the table to join on from the table description you are pulling.  So no need to add a new entry to the array to specify the table. 
 
So, for example.  The ‘feature’ table definition has a foreign key relationship for the organism_id.  We know that joins with the organism table because it is defined in the foreign key relationship described in the definition array that we get back from the hook. 
 
The changes to the code should be very minimal to make this work (it’s already programmed this way).   We just need to define our hooks and then call the appropriate hook in place of the function call that right now gets the huge schema array.
 
Stephen
 
From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Monday, February 28, 2011 11:33 AM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Re: Execution Time Problems with new API
 
I also prefer the previous form of the values array. The problem with that is -Since we are only pulling out the table description for one table I need to know which table that is. In order to keep the close to the current structure we could either add a 'table' bit to a foreign key definition.
ie:
$values = array(

     organism_id => array(
           'table' => 'organism',

           'genus' => 'Lens',

           'species' => 'culinaris',

     ),

     'uniquename' => 'gi:432555',

     'type_id' => array(
           'table' => 'cvterm',

           'name' => 'gene',

           'cv_id' => array(

                'name' => 'sequence',

           ),

     )

);
 
Or we could create another hook that compiles a list of primary_key => table definitions. Something along the lines of hook_table_description_chado_schema that would be used as follows:
In each module,
tripal_stock_table_description_chado_schema() {
  $description = array();
 
  $description['stock_id'] = array(
    'table' => 'stock',
    'primary_key' => 'stock_id',
  );
 
  $description['stockpro-_id'] = array(
    'table' => 'stockprop',
    'primary_key' => 'stockprop_id',
  );
 
  return $description;
}
Then in the select/update all the descriptions would be pulled in using module_invoke_all. This might cause the same time overload problem that we have with the current api though since it is loading all the chado table descriptions into memory (abeit a lot smaller table description than before).
 
Obviously, I'll leaning towards the first option.
~Lacey
 
------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan
 
On 2011-02-28, at 10:03 AM, Stephen Ficklin wrote:


Hi Lacey,
 
Sorry for my slow response.  I’ve been very busy on another project and just now had time to think about other things…
 
I would prefer to leave the structure in this way: 
 
$values = array(
     organism_id => array(
           'genus' => 'Lens',
           'species' => 'culinaris',
     ),
     'uniquename' => 'gi:432555',
     'cvterm_id' => array(
           'name' => 'gene',
           'cv_id' => array(
                'name' => 'sequence',
           ),
     )
);
 
So, for this example, I would be passing in the above array and the table name ‘feature’ to the ‘select’, ‘update’, or ‘insert’ function.  But, In the code for the ‘select’, ‘insert’ and ‘update’ functions we do as you suggest. We have a hook that would call a function:  tripal_chado_<table_name>_schema (i.e. tripal_chado_feature_schema()) which would return the table definition for the ‘feature’ table.   From this table defition we would find out that the organism_id is a foreign key relationship with organism and we could then recurse down, in the same way that the code currently works.
 
I like the array structure as we previously defined it because it has all of the field names of the table in it and I think is a bit more intuitive.
 
Stephen
 
 
From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Friday, February 25, 2011 5:56 PM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Execution Time Problems with new API
 
Hi 
 
I started switching over all tripal_stock queries to use the tripal_core api and ran into a serious performance issue... I tried switching over all the queries in my node_load function and now I'm getting the following error:
Fatal error: Maximum execution time of 30 seconds exceeded in /XXX/sites/all/modules/tripal/tripal_core/chado_tables.schema.inc on line 3608
Executed 4310 queries in 5575.12 milliseconds.
 
This makes me think that the current approach of storing the entire schema in memory is causing serious performance issues... It would be better if we could get only the table description we need without trying to load all of them into memory. 
 
One way of doing this might be to put each definition in it's own function and have the function names be hook_tablename_schema (ie: tripal_stock_stock_schema or tripal_stock_stockprop_schema). I can programmatically create all these functions from the chado_tables.schema.inc file we have currently and then just move them into the appropriate module.api.inc
function tripal_stock_stock_schema() {
  return array(
     'description' => t('TODO: please describe this table!'),
     'fields' => array(
       'stock_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'serial',
         'not null' => TRUE,
       ),
       'dbxref_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => FALSE,
       ),
       'organism_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => TRUE,
       ),
       'name' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'varchar',
         'length' => '255',
         'not null' => FALSE,
       ),
       'uniquename' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'text',
         'not null' => TRUE,
       ),
       'description' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'text',
         'not null' => FALSE,
       ),
       'type_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => TRUE,
       ),
       'is_obsolete' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'boolean',
         'not null' => TRUE,
         'default' => 'als',
       ),
     ),
     'primary key' => array('stock_id'),
     'unique keys' => array(
       'stock_c1' => array('organism_id', 'uniquename', 'type_id')
     ),
     'indexes' => array(
       'stock_idx1' => array('dbxref_id'),
       'stock_idx2' => array('organism_id'),
       'stock_idx3' => array('type_id'),
       'stock_idx4' => array('uniquename'),
       'stock_name_ind1' => array('name'),
     ),
    'foreign keys' => array(
       'organism' => array(
         'table' => 'organism',
         'columns' => array('organism_id' => 'organism_id'),
       ),
       'dbxref' => array(
         'table' => 'dbxref',
         'columns' => array('dbxref_id' => 'dbxref_id'),
       ),
       'cvterm' => array(
         'table' => 'cvterm',
         'columns' => array('type_id' => 'cvterm_id'),
       ),
     ),
   );
}
 
Then you could do the following to get the table definition:
     // Call all module definitions of the hook for a given table
     // Ie: If $table is 'stock' then $table_desc would contain the array returned by
     // any hook_stock_schema() implemented
     // @todo: See how this performs if 2+ modules define a given hook
     //         are returned arrays merged?
     $table_desc = module_invoke_all($table.'_schema');
 
This would change our $values array to be tablename => array(descriptors) rather than 'foreign key' => array(descriptors). For example, to select a feature:
$columns = array('feature_id', 'name');
$values = array(
     'organism' => array(
           'genus' => 'Lens',
           'species' => 'culinaris',
     ),
     'uniquename' => 'gi:432555',
     'cvterm' => array(
           'name' => 'gene',
           'cv' => array(
                'name' =& gt; 'sequence',
           ),
     )
);
$features = tripal_core_chado_select('feature', $columns, $values);
 
What do you think of this?
~Lacey
 
------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan
 
 


------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT data
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel
Reply | Threaded
Open this post in threaded view
|

Re: [Gmod-tripal-devel] Execution Time Problems with new API

Stephen Ficklin-2
<base href="x-msg://8/">

Sounds good :-)

 

From: Lacey-Anne Sanderson [mailto:[hidden email]]
Sent: Monday, February 28, 2011 12:10 PM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Re: Execution Time Problems with new API

 

Ah, yes. I hadn't thought of that. Okay, I will start implementing these changes:

1. Programatic creation of all table description functions -functions will be stored in the <tripal module>.api.inc file they are associated with (ie: stock table in tripal_stock.api.inc)

2. Addition of $table_desc = module_invoke_all($table.'_schema'); to select/update/insert

3. Using of previous table description to pull out table description for the table referred to be the foreign key

 

~Lacey

------------------------------------------------------

Lacey-Anne Sanderson

Bioinformaticist

Pulse Crop Breeding and Genetics

Phone: (306) 966-2430

Room 3D10 Agriculture

Department of Plant Sciences

University of Saskatchewan

 

On 2011-02-28, at 10:57 AM, Stephen Ficklin wrote:



You can get the table to join on from the table description you are pulling.  So no need to add a new entry to the array to specify the table. 

 

So, for example.  The ‘feature’ table definition has a foreign key relationship for the organism_id.  We know that joins with the organism table because it is defined in the foreign key relationship described in the definition array that we get back from the hook. 

 

The changes to the code should be very minimal to make this work (it’s already programmed this way).   We just need to define our hooks and then call the appropriate hook in place of the function call that right now gets the huge schema array.

 

Stephen

 

From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Monday, February 28, 2011 11:33 AM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Re: Execution Time Problems with new API

 

I also prefer the previous form of the values array. The problem with that is -Since we are only pulling out the table description for one table I need to know which table that is. In order to keep the close to the current structure we could either add a 'table' bit to a foreign key definition.

ie:

$values = array(


     organism_id => array(

           'table' => 'organism',


           'genus' => 'Lens',


           'species' => 'culinaris',


     ),


     'uniquename' => 'gi:432555',


     'type_id' => array(

           'table' => 'cvterm',


           'name' => 'gene',


           'cv_id' => array(


                'name' => 'sequence',


           ),


     )


);

 

Or we could create another hook that compiles a list of primary_key => table definitions. Something along the lines of hook_table_description_chado_schema that would be used as follows:

In each module,

tripal_stock_table_description_chado_schema() {

  $description = array();

 

  $description['stock_id'] = array(

    'table' => 'stock',

    'primary_key' => 'stock_id',

  );

 

  $description['stockpro-_id'] = array(

    'table' => 'stockprop',

    'primary_key' => 'stockprop_id',

  );

 

  return $description;

}

Then in the select/update all the descriptions would be pulled in using module_invoke_all. This might cause the same time overload problem that we have with the current api though since it is loading all the chado table descriptions into memory (abeit a lot smaller table description than before).

 

Obviously, I'll leaning towards the first option.

~Lacey

 

------------------------------------------------------

Lacey-Anne Sanderson

Bioinformaticist

Pulse Crop Breeding and Genetics

Phone: (306) 966-2430

Room 3D10 Agriculture

Department of Plant Sciences

University of Saskatchewan

 

On 2011-02-28, at 10:03 AM, Stephen Ficklin wrote:




Hi Lacey,

 

Sorry for my slow response.  I’ve been very busy on another project and just now had time to think about other things…

 

I would prefer to leave the structure in this way: 

 

$values = array(

     organism_id => array(

           'genus' => 'Lens',

           'species' => 'culinaris',

     ),

     'uniquename' => 'gi:432555',

     'cvterm_id' => array(

           'name' => 'gene',

           'cv_id' => array(

                'name' => 'sequence',

           ),

     )

);

 

So, for this example, I would be passing in the above array and the table name ‘feature’ to the ‘select’, ‘update’, or ‘insert’ function.  But, In the code for the ‘select’, ‘insert’ and ‘update’ functions we do as you suggest. We have a hook that would call a function:  tripal_chado_<table_name>_schema (i.e. tripal_chado_feature_schema()) which would return the table definition for the ‘feature’ table.   From this table defition we would find out that the organism_id is a foreign key relationship with organism and we could then recurse down, in the same way that the code currently works.

 

I like the array structure as we previously defined it because it has all of the field names of the table in it and I think is a bit more intuitive.

 

Stephen

 

 

From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Friday, February 25, 2011 5:56 PM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Execution Time Problems with new API

 

Hi 

 

I started switching over all tripal_stock queries to use the tripal_core api and ran into a serious performance issue... I tried switching over all the queries in my node_load function and now I'm getting the following error:

Fatal error: Maximum execution time of 30 seconds exceeded in /XXX/sites/all/modules/tripal/tripal_core/chado_tables.schema.inc on line 3608

Executed 4310 queries in 5575.12 milliseconds.

 

This makes me think that the current approach of storing the entire schema in memory is causing serious performance issues... It would be better if we could get only the table description we need without trying to load all of them into memory. 

 

One way of doing this might be to put each definition in it's own function and have the function names be hook_tablename_schema (ie: tripal_stock_stock_schema or tripal_stock_stockprop_schema). I can programmatically create all these functions from the chado_tables.schema.inc file we have currently and then just move them into the appropriate module.api.inc

function tripal_stock_stock_schema() {

  return array(

     'description' => t('TODO: please describe this table!'),

     'fields' => array(

       'stock_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'serial',

         'not null' => TRUE,

       ),

       'dbxref_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => FALSE,

       ),

       'organism_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => TRUE,

       ),

       'name' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'varchar',

         'length' => '255',

         'not null' => FALSE,

       ),

       'uniquename' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'text',

         'not null' => TRUE,

       ),

       'description' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'text',

         'not null' => FALSE,

       ),

       'type_id' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'int',

         'not null' => TRUE,

       ),

       'is_obsolete' => array(

         'description' => t('TODO: please describe this field!'),

         'type' => 'boolean',

         'not null' => TRUE,

         'default' => 'als',

       ),

     ),

     'primary key' => array('stock_id'),

     'unique keys' => array(

       'stock_c1' => array('organism_id', 'uniquename', 'type_id')

     ),

     'indexes' => array(

       'stock_idx1' => array('dbxref_id'),

       'stock_idx2' => array('organism_id'),

       'stock_idx3' => array('type_id'),

       'stock_idx4' => array('uniquename'),

       'stock_name_ind1' => array('name'),

     ),

    'foreign keys' => array(

       'organism' => array(

         'table' => 'organism',

         'columns' => array('organism_id' => 'organism_id'),

       ),

       'dbxref' => array(

         'table' => 'dbxref',

         'columns' => array('dbxref_id' => 'dbxref_id'),

       ),

       'cvterm' => array(

         'table' => 'cvterm',

         'columns' => array('type_id' => 'cvterm_id'),

       ),

     ),

   );

}

 

Then you could do the following to get the table definition:

     // Call all module definitions of the hook for a given table

     // Ie: If $table is 'stock' then $table_desc would contain the array returned by

     // any hook_stock_schema() implemented

     // @todo: See how this performs if 2+ modules define a given hook

     //         are returned arrays merged?

     $table_desc = module_invoke_all($table.'_schema');

 

This would change our $values array to be tablename => array(descriptors) rather than 'foreign key' => array(descriptors). For example, to select a feature:

$columns = array('feature_id', 'name');

$values = array(

     'organism' => array(

           'genus' => 'Lens',

           'species' => 'culinaris',

     ),

     'uniquename' => 'gi:432555',

     'cvterm' => array(

           'name' => 'gene',

           'cv' => array(

                'name' =& gt; 'sequence',

           ),

     )

);

$features = tripal_core_chado_select('feature', $columns, $values);

 

What do you think of this?

~Lacey

 

------------------------------------------------------

Lacey-Anne Sanderson

Bioinformaticist

Pulse Crop Breeding and Genetics

Phone: (306) 966-2430

Room 3D10 Agriculture

Department of Plant Sciences

University of Saskatchewan

 

 

 


------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT data
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel
Reply | Threaded
Open this post in threaded view
|

Re: [Gmod-tripal-devel] Execution Time Problems with new API

Lacey-Anne Sanderson
<base href="x-msg://8/">Okay, I have implemented the changes to the api. This has not changed the $values array but now only the table description needed is kept in memory rather than all table descriptions. 

Furthermore, the table description is acquired using hooks. A default hook implementation for each chado table was created programatically and is stored in tripal_core.schema.api.inc. These default description do not contain descriptions for the table/fields or foreign key definitions. 

To define foreign key constraints for your table of interest:
In the appropriate module (ie: tripal_stock for the stockprop table):
function <modulename>_chado_<tablename>_schema() {
  $description = array();

  $description['foreign keys'][<foreign tablename>] = array(
    'table' => <foreign tablename>,
    <foreign key in current table> => <primary key in foreign table>,
  );

  return $description;
}

For example,
function tripal_stock_chado_stockprop_schema() {
  $description = array();
  
  $description['foreign keys']['cvterm'] = array(
        'table' => 'cvterm',
        'columns' => array(
          'type_id' => 'cvterm_id',
        ),
  );
  
  return $description;
}

This does solve my execution timeout error and as such substantially speeds up select :)

~Lacey
------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan

On 2011-02-28, at 11:12 AM, Stephen Ficklin wrote:

Sounds good :-)
 
From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Monday, February 28, 2011 12:10 PM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Re: Execution Time Problems with new API
 
Ah, yes. I hadn't thought of that. Okay, I will start implementing these changes:
1. Programatic creation of all table description functions -functions will be stored in the <tripal module>.api.inc file they are associated with (ie: stock table in tripal_stock.api.inc)
2. Addition of $table_desc = module_invoke_all($table.'_schema'); to select/update/insert
3. Using of previous table description to pull out table description for the table referred to be the foreign key
 
~Lacey
------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan
 
On 2011-02-28, at 10:57 AM, Stephen Ficklin wrote:


You can get the table to join on from the table description you are pulling.  So no need to add a new entry to the array to specify the table. 
 
So, for example.  The ‘feature’ table definition has a foreign key relationship for the organism_id.  We know that joins with the organism table because it is defined in the foreign key relationship described in the definition array that we get back from the hook. 
 
The changes to the code should be very minimal to make this work (it’s already programmed this way).   We just need to define our hooks and then call the appropriate hook in place of the function call that right now gets the huge schema array.
 
Stephen
 
From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Monday, February 28, 2011 11:33 AM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Re: Execution Time Problems with new API
 
I also prefer the previous form of the values array. The problem with that is -Since we are only pulling out the table description for one table I need to know which table that is. In order to keep the close to the current structure we could either add a 'table' bit to a foreign key definition.
ie:
$values = array(


     organism_id => array(
           'table' => 'organism',


           'genus' => 'Lens',


           'species' => 'culinaris',


     ),


     'uniquename' => 'gi:432555',


     'type_id' => array(
           'table' => 'cvterm',


           'name' => 'gene',


           'cv_id' => array(


                'name' => 'sequence',


           ),


     )


);
 
Or we could create another hook that compiles a list of primary_key => table definitions. Something along the lines of hook_table_description_chado_schema that would be used as follows:
In each module,
tripal_stock_table_description_chado_schema() {
  $description = array();
 
  $description['stock_id'] = array(
    'table' => 'stock',
    'primary_key' => 'stock_id',
  );
 
  $description['stockpro-_id'] = array(
    'table' => 'stockprop',
    'primary_key' => 'stockprop_id',
  );
 
  return $description;
}
Then in the select/update all the descriptions would be pulled in using module_invoke_all. This might cause the same time overload problem that we have with the current api though since it is loading all the chado table descriptions into memory (abeit a lot smaller table description than before).
 
Obviously, I'll leaning towards the first option.
~Lacey
 
------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan
 
On 2011-02-28, at 10:03 AM, Stephen Ficklin wrote:



Hi Lacey,
 
Sorry for my slow response.  I’ve been very busy on another project and just now had time to think about other things…
 
I would prefer to leave the structure in this way: 
 
$values = array(
     organism_id => array(
           'genus' => 'Lens',
           'species' => 'culinaris',
     ),
     'uniquename' => 'gi:432555',
     'cvterm_id' => array(
           'name' => 'gene',
           'cv_id' => array(
                'name' => 'sequence',
           ),
     )
);
 
So, for this example, I would be passing in the above array and the table name ‘feature’ to the ‘select’, ‘update’, or ‘insert’ function.  But, In the code for the ‘select’, ‘insert’ and ‘update’ functions we do as you suggest. We have a hook that would call a function:  tripal_chado_<table_name>_schema (i.e. tripal_chado_feature_schema()) which would return the table definition for the ‘feature’ table.   From this table defition we would find out that the organism_id is a foreign key relationship with organism and we could then recurse down, in the same way that the code currently works.
 
I like the array structure as we previously defined it because it has all of the field names of the table in it and I think is a bit more intuitive.
 
Stephen
 
 
From: Lacey-Anne Sanderson [mailto:[hidden email]] 
Sent: Friday, February 25, 2011 5:56 PM
To: Stephen Ficklin
Cc: [hidden email]
Subject: Execution Time Problems with new API
 
Hi 
 
I started switching over all tripal_stock queries to use the tripal_core api and ran into a serious performance issue... I tried switching over all the queries in my node_load function and now I'm getting the following error:
Fatal error: Maximum execution time of 30 seconds exceeded in /XXX/sites/all/modules/tripal/tripal_core/chado_tables.schema.inc on line 3608
Executed 4310 queries in 5575.12 milliseconds.
 
This makes me think that the current approach of storing the entire schema in memory is causing serious performance issues... It would be better if we could get only the table description we need without trying to load all of them into memory. 
 
One way of doing this might be to put each definition in it's own function and have the function names be hook_tablename_schema (ie: tripal_stock_stock_schema or tripal_stock_stockprop_schema). I can programmatically create all these functions from the chado_tables.schema.inc file we have currently and then just move them into the appropriate module.api.inc
function tripal_stock_stock_schema() {
  return array(
     'description' => t('TODO: please describe this table!'),
     'fields' => array(
       'stock_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'serial',
         'not null' => TRUE,
       ),
       'dbxref_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => FALSE,
       ),
       'organism_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => TRUE,
       ),
       'name' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'varchar',
         'length' => '255',
         'not null' => FALSE,
       ),
       'uniquename' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'text',
         'not null' => TRUE,
       ),
       'description' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'text',
         'not null' => FALSE,
       ),
       'type_id' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'int',
         'not null' => TRUE,
       ),
       'is_obsolete' => array(
         'description' => t('TODO: please describe this field!'),
         'type' => 'boolean',
         'not null' => TRUE,
         'default' => 'als',
       ),
     ),
     'primary key' => array('stock_id'),
     'unique keys' => array(
       'stock_c1' => array('organism_id', 'uniquename', 'type_id')
     ),
     'indexes' => array(
       'stock_idx1' => array('dbxref_id'),
       'stock_idx2' => array('organism_id'),
       'stock_idx3' => array('type_id'),
       'stock_idx4' => array('uniquename'),
       'stock_name_ind1' => array('name'),
     ),
    'foreign keys' => array(
       'organism' => array(
         'table' => 'organism',
         'columns' => array('organism_id' => 'organism_id'),
       ),
       'dbxref' => array(
         'table' => 'dbxref',
         'columns' => array('dbxref_id' => 'dbxref_id'),
       ),
       'cvterm' => array(
         'table' => 'cvterm',
         'columns' => array('type_id' => 'cvterm_id'),
       ),
     ),
   );
}
 
Then you could do the following to get the table definition:
     // Call all module definitions of the hook for a given table
     // Ie: If $table is 'stock' then $table_desc would contain the array returned by
     // any hook_stock_schema() implemented
     // @todo: See how this performs if 2+ modules define a given hook
     //         are returned arrays merged?
     $table_desc = module_invoke_all($table.'_schema');
 
This would change our $values array to be tablename => array(descriptors) rather than 'foreign key' => array(descriptors). For example, to select a feature:
$columns = array('feature_id', 'name');
$values = array(
     'organism' => array(
           'genus' => 'Lens',
           'species' => 'culinaris',
     ),
     'uniquename' => 'gi:432555',
     'cvterm' => array(
           'name' => 'gene',
           'cv' => array(
                'name' =& gt; 'sequence',
           ),
     )
);
$features = tripal_core_chado_select('feature', $columns, $values);
 
What do you think of this?
~Lacey
 
------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-2430
Room 3D10 Agriculture
Department of Plant Sciences
University of Saskatchewan
 
 
 


------------------------------------------------------------------------------
Free Software Download: Index, Search & Analyze Logs and other IT data in
Real-Time with Splunk. Collect, index and harness all the fast moving IT data
generated by your applications, servers and devices whether physical, virtual
or in the cloud. Deliver compliance at lower cost and gain new business
insights. http://p.sf.net/sfu/splunk-dev2dev 
_______________________________________________
Gmod-tripal-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel